Ubuntu Manpage: apertium — machine translation application platform

NAME

     apertium — machine translation application platform

SYNOPSIS

     apertium [-aHu] [-d datadir] [-f format] [-m memory.tmx] [-o direction] language-pair
              [infile [outfile]]
     apertium [-d datadir] -l

DESCRIPTION

     apertium is the command most users will run: a single front-end that simplifies machine
     translation with lt-toolbox (which contains all the lexical processing modules and
     tools) and apertium (which contains the rest of the engine).

     The modules of the apertium machine translation architecture are, in order:

     de-formatter
             Separates the text to be translated from the format information.

     morphological-analyser
             Tokenizes the text into surface forms.

     part-of-speech tagger
             Chooses one lexical form for each ambiguous surface form (homograph).

     lexical transfer module
             Reads each source-language lexical form and delivers a corresponding target-language
             lexical form.

     structural transfer module
             Detects fixed-length patterns of lexical forms (chunks or phrases) needing special
             processing due to grammatical divergences between the two languages and performs the
             corresponding transformations.

     morphological generator
             Delivers a target-language surface form for each target-language lexical form, by
             suitably inflecting it.

     post-generator
             Performs orthographical operations such as contractions and apostrophations.

     re-formatter
             Restores the format information encapsulated by the de-formatter into the translated
             text and removes the encapsulation sequences used to protect certain characters in
             the source text.
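
     The module sequence above can be sketched as a shell pipeline.  This is only an
     illustration: the function name run_pipeline is hypothetical, the data file names
     follow a common naming pattern but are assumptions, and apertium-pretransfer is an
     extra step that many pairs run before transfer.  The real stages of a pair are
     defined in its modes.xml.

```shell
# run_pipeline DATADIR PAIR   (hypothetical helper, not part of apertium)
# Reads plain text on stdin and writes the translation to stdout.
# All data file names below are assumptions; check the pair's modes.xml.
run_pipeline() {
    data=$1
    pair=$2
    apertium-destxt |                                      # de-formatter
    lt-proc "$data/$pair.automorf.bin" |                   # morphological analyser
    apertium-tagger -g "$data/$pair.prob" |                # part-of-speech tagger
    apertium-pretransfer |                                 # extra step used by many pairs
    apertium-transfer "$data/apertium-$pair.$pair.t1x" \
        "$data/$pair.t1x.bin" "$data/$pair.autobil.bin" |  # lexical and structural transfer
    lt-proc -g "$data/$pair.autogen.bin" |                 # morphological generator
    lt-proc -p "$data/$pair.autopgen.bin" |                # post-generator
    apertium-retxt                                         # re-formatter
}

# Example call (the data directory is a placeholder):
#   echo "Esto es una prueba" | run_pipeline /path/to/apertium-es-ca es-ca
```

     In normal use none of this is typed by hand; running apertium with a language pair
     executes the stages configured for that pair.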

OPTIONS

     -d datadir
             The directory holding the linguistic data.  Defaults to the standard
             installation path.

     language-pair
             The language pair, written LANG1-LANG2 (for instance “es-ca” or “ca-es”).

     -f format
             Specifies the format of the input and output files.  Accepted values:

             txt     (default value) Input and output files are in text format.

             html    Input and output files are in “html” format, as accepted by the vast
                     majority of web browsers.

             html-noent
                     Input and output files are in “html” format, but native characters are
                     preserved rather than converted to HTML entities.

             rtf     Input and output files are in “rtf” format.  The accepted “rtf” is the one
                     generated by Microsoft WordPad and Microsoft Office up to and including
                     Office 97.

     -u      Disable marking of unknown words with the ‘*’ character.

     -H      Enable header-detection (only used in some language pairs; will lead to stray ‘❡’
             characters in pairs that don't support it).

     -a      Enable marking of disambiguated words with the ‘=’ character.
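
     The options above might be combined as in the following sketch.  The helper name
     translate_html and the file names are hypothetical, and the sketch assumes that the
     requested language pair is installed in the default data directory.

```shell
# translate_html PAIR INFILE OUTFILE   (hypothetical helper)
# Translate an HTML file, keeping native characters (-f html-noent)
# and leaving unknown words unmarked (-u).
translate_html() {
    apertium -u -f html-noent "$1" "$2" "$3"
}

# Example call (file names are placeholders):
#   translate_html es-ca pagina.html pagina.ca.html
#
# Plain text on standard input needs no -f option, because txt is the default:
#   echo "Esto es una prueba" | apertium es-ca
```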

FILES

     The following options and file arguments can be used with this command:

     -m memory.tmx
             Use a translation memory to recycle translations.

     -o direction
             Translation direction to use with the translation memory; by default, the
             direction given as language-pair is used.

     -l      List the available translation directions and exit.

     direction
             Typically LANG1-LANG2, but see modes.xml in the language data.

     infile  Input file (stdin by default).

     outfile
             Output file (stdout by default).
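
     As a sketch of how -l and -m might be used from a script: the helper names
     has_direction and translate_with_tm and the file names are hypothetical, and the
     check assumes that -l prints one direction per line, possibly indented.

```shell
# has_direction PAIR   (hypothetical helper)
# Succeeds if PAIR appears in the output of "apertium -l".
# Assumes one direction per line; leading whitespace is ignored.
has_direction() {
    apertium -l | awk -v want="$1" '$1 == want { found = 1 } END { exit !found }'
}

# translate_with_tm MEMORY PAIR [INFILE [OUTFILE]]   (hypothetical helper)
# Translate while recycling translations from a TMX translation memory (-m).
translate_with_tm() {
    tmx=$1
    shift
    apertium -m "$tmx" "$@"
}

# Example call (file names are placeholders):
#   has_direction es-ca && translate_with_tm memory.tmx es-ca in.txt out.txt
```

     When infile and outfile are omitted, translate_with_tm reads stdin and writes
     stdout, as apertium itself does.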

COPYRIGHT

     Copyright © 2005, 2006 Universitat d'Alacant / Universidad de Alicante.  This is free
     software.  You may redistribute copies of it under the terms of the GNU General Public
     License: https://www.gnu.org/licenses/gpl.html.

BUGS

     Many... lurking in the dark and waiting for you!
