lunar (1) eoconv.1.gz

Provided by: eoconv_1.5-2_amd64 bug

NAME

       eoconv - Convert text files between various Esperanto encodings

USAGE

       eoconv [-q] --from=encoding --to=encoding [file ...]

        Options:
          --from       specify input encoding (see below)
          --to         specify output encoding (see below)
          -q, --quiet  suppress warnings

          --help       detailed help message
          --man        full documentation
          --version    display version information

        Valid encodings:
          post-h post-H post-x post-X post-caret pre-caret latex
          html-hex html-dec iso-8859-3 utf-7 utf-8 utf-16 utf-32

DESCRIPTION

       eoconv will read the given input files (or stdin if no files are specified) containing
       Esperanto text in the encoding specified by --from, and then output it in the encoding
       specified by --to.

OPTIONS

       --from=encoding  Specify character encoding for input

       --to=encoding    Specify character encoding for output

       -q --quiet       Suppress non-essential warning messages

       -? --help        Print a brief help message and exit.

       --man            Print the manual page and exit.

       --version        Print version information and exit.

   CHARACTER ENCODINGS
       post-h           ASCII postfix h notation

       post-H           ASCII postfix H notation

       post-x           ASCII postfix x notation

       post-X           ASCII postfix X notation

       post-caret       ASCII postfix caret (^) notation

       pre-caret        ASCII prefix caret (^) notation

       latex, LaTeX     ASCII LaTeX sequences

       html-hex, HTML-hex
                        ASCII HTML hexadecimal entities

       html-dec, HTML-dec
                        ASCII HTML decimal entities

       iso-8859-3, ISO-8859-3, latin3, latin-3, Latin3, Latin-3
                        ISO-8859-3

       utf-7, UTF-7, utf7, UTF7
                        Unicode UTF-7

       utf-8, UTF-8, utf8, UTF8
                        Unicode UTF-8

       utf-16, UTF-16, utf16, UTF16
                        Unicode UTF-16

       utf-32, UTF-32, utf32, UTF32
                        Unicode UTF-32

ESPERANTO ORTHOGRAPHY

       Esperanto is written in an alphabet of 28 letters.  However, only 22 of these letters can
       be found in the standard ASCII character set.  The remaining six -- `c', `g', `h', `j',
       and `s' with circumflex, and `u' with breve -- are not available in ASCII; neither are
       they among the characters available in the common 8-bit ISO-8859-1 character encoding.
       Therefore, while the six special Esperanto characters pose no problem for handwritten
       texts, they were impossible to represent on standard typewriters, and are somewhat
       problematic even on modern-day computers.  Various encoding systems have been developed to
       represent Esperanto text in printed and typed text.

   POSTFIX-h NOTATION
       This was the solution proposed by the creator of Esperanto, L. L. Zamenhof.  He
       recommended using `u' for `u-breve' and appending an `h' to a letter to indicate that it
       should have a circumflex.  However, the letters `u' and `h' are already part of the
       Esperanto alphabet, so using them for another purpose invites ambiguity and
       mispronunciation.  It also makes conversion of Esperanto text to postfix-h notation
       `lossy' or one-way; it is generally not possible to convert from postfix-h notation via
       automated means.  This notation suffers from the additional drawback that the text cannot
       be sorted with standard rules for ASCII text.

   POSTFIX-H NOTATION
       This is the same as postfix-h notation, except that `H' is used instead of `h' following a
       capital letter.

   POSTFIX-x NOTATION
       This is the most common ASCII notation encountered today.  It involves appending an `x' to
       a letter to indicate that it should have an accent (be it circumflex or breve).  Since `x'
       is not a letter in the Esperanto alphabet, no ambiguity results.  However, ASCII sorting
       algorithms still fail with postfix-x text.

   POSTFIX-X NOTATION
       This is the same as postfix-x notation, except that `X' is used instead of `x' following a
       capital letter.

   PREFIX- AND POSTFIX-CARET NOTATION
       Two slightly less popular ASCII encodings are to prepend or append a caret (`^') to a
       letter to indicate that it should have an accent.

   ISO-8859-3 (LATIN-3)
       ISO 8859-3, also known as Latin-3 or South European, is an 8-bit character encoding for
       Esperanto.  High-bit characters are used to encode the accented Esperanto letters.
       ISO-8859-3 can also be used for encoding English, Finnish, German, Italian, Latin,
       Maltese, Turkish, and Portuguese, making it useful for texts which mix Esperanto with one
       or more of these languages.

   UNICODE (ISO/IEC 10646)
       Unicode is a standard for matching every character of every human language to a specific
       code.  The mapping methods are known as Unicode Transformation Formats (UTF). Among them
       are UTF-32, UTF-16, UTF-8 and UTF-7, where the numbers indicate the number of bits in one
       unit.

   LaTeX SEQUENCES
       The popular LaTeX typesetting package is capable of representing virtually any accented
       character.  Note that conversion from LaTeX sequences assumes that characters to be
       accented are enclosed in braces -- for example, `\^{C}' will be recognized as `C' with
       circumflex, but `\^C' will not be.

   HTML ENTITIES
       Unicode codes for Esperanto characters can be escaped in HTML documents by using HTML
       entities.  The codes can be represented in either decimal (base-10) or hexadecimal
       (base-16) notation; the two are functionally equivalent.

BUGS AND LIMITATIONS

       Because the postfix-h and postfix-H notations are inherently ambiguous, conversion from
       postfix-h or -H text is unlikely to result in coherent text.  Use at your own risk, and
       carefully proofread the results.

       Report bugs to <psychonaut@nothingisreal.com>.

AUTHOR

       Tristan Miller <psychonaut@nothingisreal.com>

SEE ALSO

       charsets(7), ascii(7), iso_8859-3(7), unicode(7), utf-8(7), latex(1)

       Copyright (C) 2004-2016 Tristan Miller.

       Permission is granted to make and distribute verbatim or modified copies of this manual
       provided the copyright notice and this permission notice are preserved on all copies.