lunar (1) enca.1.gz

Provided by: enca_1.19-1.1_amd64 bug

NAME

       enca -- detect and convert encoding of text files

SYNOPSIS

       enca [-L LANGUAGE] [OPTION]... [FILE]...
       enconv [-L LANGUAGE] [OPTION]... [FILE]...

INTRODUCTION AND EXAMPLES

       If you are lucky enough, the only two things you will ever need to know are: command

              enca FILE

       will tell you which encoding file FILE uses (without changing it), and

              enconv FILE

       will  convert file FILE to your locale native encoding.  To convert the file to some other
       encoding use the -x option (see -x entry in section OPTIONS and  sections  CONVERSION  and
       ENCODINGS for details).

       Both work with multiple files and standard input (output) too.  E.g.

              enca -x latin2 <sometext | lpr

       assures file `sometext' is in ISO Latin 2 when it's sent to printer.

       The  main reason why these command will fail and turn your files into garbage is that Enca
       needs to know their language to detect the encoding.  It tries to determine your  language
       and preferred charset from locale settings, which might not be what you want.

       You can (or have to) use -L option to tell it the right language.  Suppose, you downloaded
       some Russian HTML file, `file.htm', it claims it's windows-1251 but it isn't.  So you run

              enca -L ru file.htm

       and find out it's KOI8-R (for example).  Be warned, currently there are not many supported
       languages (see section LANGUAGES).

       Another  warning  concerns the fact several Enca's features, namely its charset conversion
       capabilities, strongly depend on what other  tools  are  installed  on  your  system  (see
       section CONVERSION)--run

              enca --version

       to get list of features (see section FEATURES).  Also try

              enca --help

       to  get  description  of  all other Enca options (and to find the rest of this manual page
       redundant).

DESCRIPTION

       Enca reads given text files, or standard input when none are  given,  and  uses  knowledge
       about  their  language  (must  be  supported by you) and a mixture of parsing, statistical
       analysis, guessing and black magic to determine their encodings, which it then  prints  to
       standard output (or it confesses it doesn't have any idea what the encoding could be).  By
       default, Enca presents results as a multiline human-readable descriptions,  several  other
       formats are available--see Output type selectors below.

       Enca can also convert files to some other encoding ENC when you ask for it--either using a
       built-in converter, some conversion library, or by calling an external converter.

       Enca's primary goal is to be usable unattended, as an automatic conversion tool, though it
       perhaps have not reached this point yet (please see section SECURITY).

       Please  note except rare cases Enca really has to know the language of input files to give
       you a reliable answer.  On the other hand, it can then cope quite well with files that are
       not  purely  textual  or  even  detect charset of text strings inside some binary file; of
       course, it depends on the character of the non-text component.

       Enca doesn't care about structure of input files, it views them  as  a  uniform  piece  of
       text/data.  In case of multipart files (e.g. mailboxes), you have to use some tool knowing
       the structure to extract the individual parts first.  It's the cost of ability  to  detect
       encodings of any damaged, incomplete or otherwise incorrect files.

OPTIONS

       There  are  several  categories of options: operation mode options, output type selectors,
       guessing parameters, conversion parameters, general options and listings.

       All long options can be abbreviated as long as they are unambiguous, mandatory  parameters
       of long options are mandatory for short options too.

   Operation modes
       are following:

       -c, --auto-convert
              Equivalent to calling Enca as enconv.

              If  no  output  type  selector  is  specified,  detect  file  encodings, guess your
              preferred charset from locales, and  convert  files  to  it  (only  available  with
              +target-charset-auto feature).

       -g, --guess
              Equivalent to calling Enca as enca.

              If no output type selector is specified, detect file encodings and report them.

   Output type selectors
       select  what  action  Enca  will  take  when it determines the encoding; most of them just
       choose between different names, formats and conventions how encodings can be printed,  but
       one  of  them  (-x)  is special: it tells Enca to recode files to some other encoding ENC.
       These options are mutually exclusive; if you specify more than one  output  type  selector
       the last one takes precedence.

       Several  output types represent charset name used by some other program, but not all these
       programs know all the charsets which Enca recognises.  Be warned, Enca makes no difference
       between  unrecognised  charset  and  charset  having  no  name  in given namespace in such
       situations.

       -d, --details
              It used to print a few pages of details about the guessing process, but since  Enca
              is just a program linked against Enca library, this is not possible and this option
              is roughly equivalent to --human-readable, except it reports  failure  reason  when
              Enca doesn't recognize the encoding.

       -e, --enca-name
              Prints  Enca's  nice name of the charset, i.e., perhaps the most generally accepted
              and more or less human-readable charset identifier, with surfaces appended.

              This name is used when calling an external converter, too.

       -f, --human-readable
              Prints verbal description of the detected charset and surfaces--something  a  human
              understands best.  This is the default behaviour.

              The  precise  format  is following: the first line contains charset name alone, and
              it's followed by zero or more indented lines containing names of detected surfaces.
              This  format  is not, however, suitable or intended for further machine-processing,
              and the verbal charset descriptions are like to change in the future.

       -i, --iconv-name
              Prints how iconv(3) (and/or iconv(1)) calls the detected charset.  More  precisely,
              it prints one, more or less arbitrarily chosen, alias accepted by iconv.  A charset
              unknown to iconv counts as unknown.

              This output type makes sense only when Enca is compiled with iconv support (feature
              +iconv-interface).

       -r, --rfc1345-name
              Prints  RFC  1345  charset  name.   When such a name doesn't exist because RFC 1345
              doesn't define a given encoding, some other name defined in some other RFC or  just
              the name which author considers `the most canonical', is printed.

              Since RFC 1345 doesn't define surfaces, no surface info is appended.

       -m, --mime-name
              Prints  preferred  MIME  name  of  detected  charset.   This is the name you should
              normally use when fixing e-mails or web pages.

              A charset not present in http://www.iana.org/assignments/character-sets  counts  as
              unknown.

       -s, --cstocs-name
              Prints  how  cstocs(1)  calls  the  detected  charset.  A charset unknown to cstocs
              counts as unknown.

       -n, --name=WORD
              Prints charset (encoding) name selected by WORD (can be abbreviated as long  as  is
              unambiguous).  For names listed above, --name=WORD is equivalent to --WORD.

              Using  aliases as the output type causes Enca to print list of all accepted aliases
              of detected charset.

       -x, --convert-to=[..]ENC
              Converts file to encoding ENC.

              The optional `..' before encoding name has no special meaning, except you  can  use
              it  to  remind  yourself  that,  unlike  in  recode(1),  you should specify desired
              encoding, instead of current.

              You can use recode(1) recoding chains or  any  other  kind  of  braindead  recoding
              specification  for  ENC, provided that you tell Enca to use some tool understanding
              it for conversion (see section CONVERSION).

              When Enca fails to determine the encoding, it prints a warning and leaves  the  the
              file  as  is;  when  it is run as a filter it tries to do its best to copy standard
              input to standard output unchanged.  Nevertheless, you should not rely on it and do
              backup.

   Guessing parameters
       There's  only  one:  -L setting language of input files. This option is mandatory (but see
       below).

       -L, --language=LANG
              Sets language of input files to LANG.

              More precisely, LANG can be any valid locale  name  (or  alias  with  +locale-alias
              feature) of some supported language.  You can also specify `none' as language name,
              only multibyte encodings are recognised then.  Run

              enca --list languages

              to get list of supported languages.  When you don't specify any language Enca tries
              to  guess  your  language  from  locale  settings  and assumes input files use this
              language.  See section LANGUAGES for details.

   Conversion parameters
       give you finer control of how charset conversion will be  performed.   They  don't  affect
       anything  when  -x is not specified as output type.  Please see section CONVERSION for the
       gory conversion details.

       -C, --try-converters=LIST
              Appends comma separated LIST to the list of converters that will be tried when  you
              ask  for  conversion.   Their  names  can  be  abbreviated  as  long  as  they  are
              unambiguous.  Run

              enca --list converters

              to get list of all valid converter names (and  see  section  CONVERSION  for  their
              description).

              The default list depends on how Enca has been compiled, run

              enca --help

              to find out default converter list.

              Note  the  default  list is used only when you don't specify -C at all.  Otherwise,
              the list is built as if it were initially empty and every -C adds new  converter(s)
              to  it.   Moreover, specifying none as converter name causes clearing the converter
              list.

       -E, --external-converter-program=PATH
              Sets external converter program name to PATH.  Default external  converter  depends
              on  how  enca has been complied, and the possibility to use external converters may
              not be available at all.  Run

              enca --help

              to find out default converter program in your enca build.

   General options
       don't fit to other option categories...

       -p, --with-filename
              Forces Enca to prefix each result with corresponding file name.  By  default,  Enca
              prefixes results with filenames when run on multiple files.

              Standard input is printed as STDIN and standard output as STDOUT (the latter can be
              probably seen in error messages only).

       -P, --no-filename
              Forces Enca to not prefix results with file names.  By default, Enca doesn't prefix
              result with file name when run on a single file (including standard input).

       -V, --verbose
              Increases verbosity level (each use increases it by one).

              Currently  this  option  in not very useful because different parts of Enca respond
              differently to the same verbosity level, mostly not at all.

   Listings
       are all terminal, i.e. when Enca encounters some of them it prints  the  required  listing
       and terminates without processing any following options.

       -h, --help
              Prints brief usage help.

       -G, --license
              Prints full Enca license (through a pager, if possible).

       -l, --list=WORD
              Prints  list  specified  by WORD (can be abbreviated as long as it is unambiguous).
              Available lists include:

              built-in-charsets.  All encodings convertible by built-in converter, by group (both
              input  and  output encoding must be from this list and belong to the same group for
              internal conversion).

              built-in-encodings.  Equivalent to built-in-charsets, but considered obsolete; will
              be accepted with a warning, for a while.

              converters.  All valid converter names (to be used with -C).

              charsets.   All  encodings  (charsets).   You can select what names will be printed
              with --name or any name output type selector (of course, only  encodings  having  a
              name  in  given  namespace  will  be  printed then), the selector must be specified
              before --list.

              encodings.  Equivalent to charsets, but considered obsolete; will be accepted  with
              a warning, for a while.

              languages.  All supported languages together with charsets belonging to them.  Note
              output type selects language name style, not charset name style here.

              names.  All possible values of --name option.

              lists.  All possible values of this option.  (Crazy?)

              surfaces.  All surfaces Enca recognises.

       -v, --version
              Prints program version and list of features (see section FEATURES).

CONVERSION

       Though Enca has been originally designed as a tool for  guessing  encoding  only,  it  now
       features  several  methods  of  charset conversion.  You can control which of them will be
       used with -C.

       Enca sequentially tries converters from the list specified by -C until it finds some  that
       is  able to perform required conversion or until it exhausts the list.  You should specify
       preferred converters first, less preferred later.  External converter (extern)  should  be
       always  specified  last,  only  as last resort, since it's usually not possible to recover
       when it fails.  The default list of  converters  always  starts  with  built-in  and  then
       continues with the first one available from: librecode, iconv, nothing.

       It  should  be noted when Enca says it is not able to perform the conversion it only means
       none of the converters is able to perform it.  It can be still  possible  to  perform  the
       required  conversion  in  several  steps, using several converters, but to figure out how,
       human intelligence is probably needed.

   Built-in converter
       is the simplest and  far  the  fastest  of  all,  can  perform  only  a  few  byte-to-byte
       conversions  and  modifies  files  directly  in place (may be considered dangerous, but is
       pretty efficient).  You can get list of all encodings it can convert with

              enca --list built-in

       Beside speed, its main advantage (and also disadvantage)  is  that  it  doesn't  care:  it
       simply  converts  characters  having  a  representation  in target encoding, doesn't touch
       anything else and never prints any error message.

       This converter can be specified as built-in with -C.

   Librecode converter
       is an interface to GNU recode library, that does the actual recoding job.  It may  or  may
       not be compiled in; run

              enca --version

       to find out its availability in your enca build (feature +librecode-interface).

       You  should  be  familiar  with  recode(1)  before  using  it,  since  recode  is  a quite
       sophisticated and powerful charset conversion tool.  You may run into  problems  using  it
       together  with  Enca particularly because Enca's support for surfaces not 100% compatible,
       because recode tries too hard to make the transformation reversible, because it  sometimes
       silently  ignores  I/O  errors,  and because it's incredibly buggy.  Please see GNU recode
       info pages for details about recode library.

       This converter can be specified as librecode with -C.

   Iconv converter
       is an interface to the UNIX98 iconv(3) conversion functions, that do the  actual  recoding
       job.  It may or may not be compiled in; run

              enca --version

       to find out its availability in your enca build (feature +iconv-interface).

       While  iconv  is  present  on  most  today systems it only rarely offer some useful set of
       available conversions, the only notable exception  being  iconv  from  GNU  libc.   It  is
       usually quite picky about surfaces, too (while, at the same time, not implementing surface
       conversion).  It however probably represents the only standard(ized) tool able to  perform
       conversion  from/to  Unicode.   Please see iconv documentation about for details about its
       capabilities on your particular system.

       This converter can be specified as iconv with -C.

   External converter
       is an arbitrary external conversion tool that can be specified with -E option (at most one
       can  be  defined  simultaneously).   There are some standard, provided together with enca:
       cstocs, recode, map, umap, and piconv.  All are wrapper scripts: for cstocs(1), recode(1),
       map(1), umap(1), and piconv(1).

       Please  note  enca has little control what the external converter really does.  If you set
       it to /bin/rm you are fully responsible for the consequences.

       If you want to make your own converter to use with enca, you  should  know  it  is  always
       called

              CONVERTER ENC_CURRENT ENC FILE [-]

       where  CONVERTER is what has been set by -E, ENC_CURRENT is detected encoding, ENC is what
       has been specified with -x, and FILE is the file to convert, i.e. it is  called  for  each
       file  separately.   The  optional fourth parameter, -, should cause (when present) sending
       result of conversion to standard  output  instead  of  overwriting  the  file  FILE.   The
       converter  should  also take care of not changing file permissions, returning error code 1
       when it fails and  cleaning  its  temporary  files.   Please  see  the  standard  external
       converters for examples.

       This converter can be specified as extern with -C.

   Default target charset
       The straightforward way of specifying target charset is the -x option, which overrides any
       defaults.  When Enca is called as enconv, default target charset is selected  exactly  the
       same way as recode(1) does it.

       If the DEFAULT_CHARSET environment variable is set, it's used as the target charset.

       Otherwise,  if  you  system  provides the nl_langinfo(3) function, current locale's native
       charset is used as the target charset.

       When both methods fail, Enca complains and terminates.

   Reversibility notes
       If reversibility is crucial for you, you shouldn't use enca as converter at all (or  maybe
       you  can,  with  very  specifically  designed recode(1) wrapper).  Otherwise you should at
       least know that there four basic means of handling inconvertible character entities:

       fail--this is a possibility, too, and incidentally it's  exactly  what  current  GNU  libc
       iconv implementation does (recode can be also told to do it)

       don't  touch  them--this  is  what  enca internal converter always does and recode can do;
       though it is not reversible, a human being is usually able to reconstruct the original (at
       least in principle)

       approximate  them--this is what cstocs can do, and recode too, though differently; and the
       best choice if you just want to make the accursed text readable

       drop them out--this is what both recode and cstocs can do (cstocs can also  replace  these
       characters   by   some  fixed  character  instead  of  mere  ignoring);  useful  when  the
       to-be-omitted characters contain only noise.

       Please consult your favourite converter manual for details of this issue.   Generally,  if
       you  are  not  lucky  enough  to  have  all  convertible  characters  in  you file, manual
       intervention is needed anyway.

   Performance notes
       Poor performance of available converters has  been  one  of  main  reasons  for  including
       built-in  converter  in  enca.   Try  to  use  it  whenever  possible,  i.e. when files in
       consideration are charset-clean enough or charset-messy enough so that its  zero  built-in
       intelligence  doesn't  matter.   It  requires no extra disk space nor extra memory and can
       outperform recode(1) more than 10 times on large files and Perl version (i.e.  the  faster
       one)  of cstocs(1) more than 400 times on small files (in fact it's almost as fast as mere
       cp(1)).

       Try to avoid external converters when it's not absolutely necessary since all the  forking
       and moving stuff around is incredibly slow.

ENCODINGS

       You can get list of recognised character sets with

              enca --list charsets

       and  using  --name  parameter  you can select any name you want to be used in the listing.
       You can also list all surfaces with

              enca --list surfaces

       Encoding and surface names are case insensitive and non-alphanumeric  characters  are  not
       taken  into  account.  However, non-alphanumeric characters are mostly not allowed at all.
       The only allowed are: `-', `_', `.', `:', and  `/'  (as  charset/surface  separator).   So
       `ibm852' and `IBM-852' are the same, while `IBM 852' is not accepted.

   Charsets
       Following  list  of  recognised charsets uses Enca's names (-e) and verbal descriptions as
       reported by Enca (-f):

       ASCII         7bit ASCII characters
       ISO-8859-2    ISO 8859-2 standard; ISO Latin 2
       ISO-8859-4    ISO 8859-4 standard; Latin 4
       ISO-8859-5    ISO 8859-5 standard; ISO Cyrillic
       ISO-8859-13   ISO 8859-13 standard; ISO Baltic; Latin 7
       ISO-8859-16   ISO 8859-16 standard
       CP1125        MS-Windows code page 1125
       CP1250        MS-Windows code page 1250
       CP1251        MS-Windows code page 1251
       CP1257        MS-Windows code page 1257; WinBaltRim
       IBM852        IBM/MS code page 852; PC (DOS) Latin 2
       IBM855        IBM/MS code page 855
       IBM775        IBM/MS code page 775
       IBM866        IBM/MS code page 866
       baltic        ISO-IR-179; Baltic
       KEYBCS2       Kamenicky encoding; KEYBCS2
       macce         Macintosh Central European

       maccyr        Macintosh Cyrillic
       ECMA-113      Ecma Cyrillic; ECMA-113
       KOI-8_CS_2    KOI8-CS2 code (`T602')
       KOI8-R        KOI8-R Cyrillic
       KOI8-U        KOI8-U Cyrillic
       KOI8-UNI      KOI8-Unified Cyrillic
       TeX           (La)TeX control sequences
       UCS-2         Universal character set 2 bytes; UCS-2; BMP
       UCS-4         Universal character set 4 bytes; UCS-4; ISO-10646
       UTF-7         Universal transformation format 7 bits; UTF-7
       UTF-8         Universal transformation format 8 bits; UTF-8
       CORK          Cork encoding; T1
       GBK           Simplified Chinese National Standard; GB2312
       BIG5          Traditional Chinese Industrial Standard; Big5
       HZ            HZ encoded GB2312
       unknown       Unrecognized encoding

       where unknown is not any real encoding, it's reported when Enca is  not  able  to  give  a
       reliable answer.

   Surfaces
       Enca  has  some  experimental  support  for  so-called  surfaces  (see below).  It detects
       following surfaces (not all can be applied to all charsets):

       /CR     CR line terminators
       /LF     LF line terminators
       /CRLF   CRLF line terminators
       N.A.    Mixed line terminators
       N.A.    Surrounded by/intermixed with non-text data
       /21     Byte order reversed in pairs (1,2 -> 2,1)
       /4321   Byte order reversed in quadruples (1,2,3,4 -> 4,3,2,1)
       N.A.    Both little and big endian chunks, concatenated
       /qp     Quoted-printable encoded

       Note some surfaces have N.A. in place of identifier--they cannot be specified  on  command
       line, they can only be reported by Enca.  This is intentional because they only inform you
       why the file cannot be  considered  surface-consistent  instead  of  representing  a  real
       surface.

       Each  charset  has its natural surface (called `implied' in recode) which is not reported,
       e.g., for IBM 852 charset it's `CRLF line terminators'.  For UCS encodings, big endian  is
       considered  as  natural  surface;  unusual  byte  orders  are constructed from 21 and 4321
       permutations: 2143 is reported simply as 21, while 3412 is reported as combination of 4321
       and 21.

       Doubly-encoded UTF-8 is neither charset nor surface, it's just reported.

   About charsets, encodings and surfaces
       Charset  is  a set of character entities while encoding is its representation in the terms
       of bytes and bits.  In Enca, the word encoding means the same as `representation of text',
       i.e.  the  relation  between  sequence  of  character  entities  constituting the text and
       sequence of bytes (bits) constituting the file.

       So, encoding is both character set and so-called surface (line  terminators,  byte  order,
       combining,  Base64 transformation, etc.).  Nevertheless, it proves convenient to work with
       some {charset,surface} pairs as with genuine charsets.  So, as in recode(1), all UCS-  and
       UTF-  encodings  of  Universal  character  set  are  called  charsets.   Please see recode
       documentation for more details of this issue.

       The only good thing about surfaces is: when you don't start  playing  with  them,  neither
       Enca  won't  start  and  it  will  try  to behave as much as possible as a surface-unaware
       program, even when talking to recode.

LANGUAGES

       Enca needs to know the language of input files to work  reliably,  at  least  in  case  of
       regular  8bit  encoding.  Multibyte encodings should be recognised for any Latin, Cyrillic
       or Greek language.

       You can (or have to) use -L option to tell Enca the language.   Since  people  most  often
       work  with  files  in the same language for which they have configured locales, Enca tries
       tries to guess the language by examining value of LC_CTYPE  and  other  locale  categories
       (please  see  locale(7))  and  using  it  for the language when you don't specify any.  Of
       course, it may be completely wrong and will give you  nonsense  answers  and  damage  your
       files,  so please don't forget to use the -L option.  You can also use ENCAOPT environment
       variable to set a default language (see section ENVIRONMENT).

       Following languages are supported by Enca (each language is listed together with supported
       8bit encodings).

       Belarusian    CP1251 IBM866 ISO-8859-5 KOI8-UNI maccyr IBM855
       Bulgarian     CP1251 ISO-8859-5 IBM855 maccyr ECMA-113
       Czech         ISO-8859-2 CP1250 IBM852 KEYBCS2 macce KOI-8_CS_2 CORK
       Estonian      ISO-8859-4 CP1257 IBM775 ISO-8859-13 macce baltic
       Croatian      CP1250 ISO-8859-2 IBM852 macce CORK
       Hungarian     ISO-8859-2 CP1250 IBM852 macce CORK
       Lithuanian    CP1257 ISO-8859-4 IBM775 ISO-8859-13 macce baltic
       Latvian       CP1257 ISO-8859-4 IBM775 ISO-8859-13 macce baltic
       Polish        ISO-8859-2 CP1250 IBM852 macce ISO-8859-13 ISO-8859-16 baltic CORK
       Russian       KOI8-R CP1251 ISO-8859-5 IBM866 maccyr
       Slovak        CP1250 ISO-8859-2 IBM852 KEYBCS2 macce KOI-8_CS_2 CORK
       Slovene       ISO-8859-2 CP1250 IBM852 macce CORK
       Ukrainian     CP1251 IBM855 ISO-8859-5 CP1125 KOI8-U maccyr
       Chinese       GBK BIG5 HZ
       none

       The  special  language none can be shortened to __, it contains no 8bit encodings, so only
       multibyte encodings are detected.

       You can also use locale names instead of languages:

       Belarusian      be
       Bulgarian       bg
       Czech           cs
       Estonian        et
       Croatian        hr
       Hungarian       hu
       Lithuanian      lt
       Latvian         lv
       Polish          pl
       Russian         ru
       Slovak          sk
       Slovene         sl
       Ukrainian       uk
       Chinese         zh

FEATURES

       Several Enca's features depend on what  is  available  on  your  system  and  how  it  was
       compiled.  You can get their list with

              enca --version

       Plus  sign  before  a feature name means it's available, minus sign means this build lacks
       the particular feature.

       librecode-interface.   Enca  has  interface  to  GNU  recode  library  charset  conversion
       functions.

       iconv-interface.  Enca has interface to UNIX98 iconv charset conversion functions.

       external-converter.   Enca can use external conversion programs (if you have some suitable
       installed).

       language-detection.  Enca tries to guess language (-L) from locales.  You don't  need  the
       --language option, at least in principle.

       locale-alias.  Enca is able to decrypt locale aliases used for language names.

       target-charset-auto.   Enca  tries  to detect your preferred charset from locales.  Option
       --auto-convert and calling Enca as enconv works, at least in principle.

       ENCAOPT.  Enca is able to correctly parse this environment variable  before  command  line
       parameters.  Simple stuff like ENCAOPT="-L uk" will work even without this feature.

ENVIRONMENT

       The  variable  ENCAOPT  can  hold set of default Enca options.  Its content is interpreted
       before command line arguments.  Unfortunately, this doesn't  work  everywhere  (must  have
       +ENCAOPT feature).

       LC_CTYPE,  LC_COLLATE,  LC_MESSAGES  (possibly  inherited from LC_ALL or LANG) is used for
       guessing your language (must have +language-detection feature).

       The variable DEFAULT_CHARSET can be used by enconv as the default target charset.

DIAGNOSTICS

       Enca returns exit code 0 when all  input  files  were  successfully  proceeded  (i.e.  all
       encodings  were  detected and all files were converted to required encoding, if conversion
       was asked for).  Exit code 1 is returned when Enca wasn't able to either guess encoding or
       perform  conversion  on  any  input  file  because it's not clever enough.  Exit code 2 is
       returned in case of serious (e.g. I/O) troubles.

SECURITY

       It should be possible to let Enca work unattended, it's its goal. However:

       There's no warranty the detection works 100%.  Don't  bet  on  it,  you  can  easily  lose
       valuable data.

       Don't  use  enca  (the  program),  link to libenca instead if you want anything resembling
       security. You have to perform the eventual conversion yourself then.

       Don't use external converters. Ideally, disable them compile-time.

       Be aware  of  ENCAOPT  and  all  the  built-in  automagic  guessing  various  things  from
       environment, namely locales.

SEE ALSO

       autoconvert(1), cstocs(1), file(1), iconv(1), iconv(3), nl_langinfo(3), map(1), piconv(1),
       recode(1), locale(5), locale(7), ltt(1), umap(1), unicode(7), utf-8(7), xcode(1)

KNOWN BUGS

       It has too many unknown bugs.

       The idea of using LC_* value for language is certainly braindead.  However I like it.

       It can't backup files before mangling them.

       In certain situations, it may behave incorrectly on >31bit file systems  and/or  over  NFS
       (both untested but shouldn't cause problems in practice).

       Built-in  converter does not convert character `ch' from KOI8-CS2, and possibly some other
       characters you've probably never heard about anyway.

       EOL type recognition works poorly on Quoted-printable encoded files.  This should be fixed
       someday.

       There  are  no command line options to tune libenca parameters.  This is intentional (Enca
       should DWIM) but sometimes this is a nuisance.

       The manual page is too long, especially this section.  This doesn't  matter  since  nobody
       does read it.

       Send bug reports to <https://github.com/nijel/enca/issues>.

TRIVIA

       Enca  is  Extremely Naive Charset Analyser.  Nevertheless, the `enc' originally comes from
       `encoding' so the leading `e' should be read as in `encoding' not as in `extreme'.

AUTHORS

       David Necas (Yeti) <yeti@physics.muni.cz>

       Michal Cihar <michal@cihar.com>

       Unicode data has been generated from various (free) on-line resources or using GNU recode.
       Statistical  data  has  been  generated  from  various  texts on the Net, I hope character
       counting doesn't break anyone's copyright.

ACKNOWLEDGEMENTS

       Please see the file THANKS in distribution.

       Copyright (C) 2000-2003 David Necas (Yeti).

       Copyright (C) 2009 Michal Cihar <michal@cihar.com>.

       Enca is free software; you can redistribute it and/or modify it under the terms of version
       2 of the GNU General Public License as published by the Free Software Foundation.

       Enca  is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
       even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
       GNU General Public License for more details.

       You should have received a copy of the GNU General Public License along with Enca; if not,
       write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.