Ubuntu Manpage: apertium - This application is part of ( apertium )

Provided by: apertium_3.4.0~r61013-5_amd64

NAME

       apertium - This application is part of (apertium)

       This tool is part of the apertium machine translation architecture: http://apertium.sf.net.

SYNOPSIS

       apertium [-d datadir] [-f format] [-u] [-a] {language-pair} [infile [outfile]]

DESCRIPTION

       apertium is the application that most people will use, as it simplifies the use of the apertium and
       lt-toolbox tools for machine translation purposes.

       This tool eases the use of lt-toolbox (which contains all the lexical processing modules and tools)
       and apertium (which contains the rest of the engine) by providing a single front-end to the end user.

       The modules behind the apertium machine translation architecture are, in order:
              • de-formatter: Separates the text to be translated from the format information.

              • morphological-analyser: Tokenizes the text into surface forms.

              • part-of-speech tagger: Chooses one analysis for each homograph (ambiguous surface form).

              • lexical transfer module: Reads each source-language lexical form and  delivers  a  corresponding
              target-language lexical form.

              •  structural  transfer module: Detects fixed-length patterns of lexical forms (chunks or phrases)
              needing special processing due to grammatical divergences between the two languages  and  performs
              the corresponding transformations.

              •  morphological  generator:  Delivers  a  target-language  surface  form for each target-language
              lexical form, by suitably inflecting it.

              • post-generator: Performs orthographical operations such as contractions and apostrophations.

              • re-formatter: Restores  the  format  information  encapsulated  by  the  de-formatter  into  the
              translated  text and removes the encapsulation sequences used to protect certain characters in the
              source text.
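
       The role of the first and last modules can be illustrated with a toy sketch. It is not how
       apertium implements them: the regular expression, the function names and the upper-casing
       "translation" are all assumptions made for illustration, and the six linguistic modules in
       between are collapsed into a single placeholder.

```python
import re
from typing import List, Tuple

# Assumption: "format information" is approximated by anything that looks like a tag.
TAG = re.compile(r"<[^>]+>")


def deformat(text: str) -> Tuple[List[str], List[str]]:
    """Toy de-formatter: separate translatable chunks from format information."""
    tags = TAG.findall(text)
    chunks = TAG.split(text)  # always one more chunk than there are tags
    return chunks, tags


def placeholder_translate(chunk: str) -> str:
    """Stand-in for the analyser, tagger, transfer and generation modules."""
    return chunk.upper()


def reformat(chunks: List[str], tags: List[str]) -> str:
    """Toy re-formatter: interleave the chunks with the saved format information."""
    out = []
    for i, chunk in enumerate(chunks):
        out.append(chunk)
        if i < len(tags):
            out.append(tags[i])
    return "".join(out)


def translate_keeping_format(text: str) -> str:
    chunks, tags = deformat(text)
    return reformat([placeholder_translate(c) for c in chunks], tags)


print(translate_keeping_format("<b>Hola</b> mundo"))  # prints "<b>HOLA</b> MUNDO"
```

       The point of the sketch is only that the format information bypasses the linguistic stages and
       is put back unchanged at the end.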

OPTIONS

       -d datadir The directory holding the linguistic data. By default it will use the expected installation
       path.

       language-pair The language pair: LANG1-LANG2 (for instance es-ca or ca-es).

       -f format Specifies the format of the input and output files which can have these values:
              • txt (default value) Input and output files are in text format.

              •  html  Input  and output files are in "html" format. This "html" is the one accepted by the vast
              majority of web browsers.

              • html-noent Input and  output  files  are  in  "html"  format,  but  preserving  native  encoding
              characters rather than using HTML text entities.

              •  rtf  Input  and  output  files  are in "rtf" format. The accepted "rtf" is the one generated by
              Microsoft WordPad (C) and Microsoft Office (C) up to and including Office-97.
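
       A script that calls apertium on many files may want to pick the -f value automatically. The
       helper below is a minimal sketch: the format names come from the list above, but the mapping
       from file extensions to formats and the name guess_format are assumptions, not something
       apertium does itself.

```python
import os

# Format names as documented for -f; the extension mapping is an assumption.
FORMAT_BY_EXTENSION = {
    ".txt": "txt",
    ".html": "html",
    ".htm": "html",
    ".rtf": "rtf",
}


def guess_format(filename: str) -> str:
    """Return a value for -f based on the file extension, falling back to txt."""
    extension = os.path.splitext(filename)[1].lower()
    return FORMAT_BY_EXTENSION.get(extension, "txt")
```

       html-noent is deliberately left out of the mapping, since choosing it depends on how the output
       should be encoded rather than on the file name.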

       -u Disable marking of unknown words with the '*' character.

       -a Enable marking of disambiguated words with the '=' character.

       -m memory.tmx Use a translation memory to recycle translations.

       -o direction Translation direction to use with the translation memory; by default, 'direction' is used
       instead.

       -l List the available translation directions and exit.

       direction The translation direction: typically LANG1-LANG2, but see modes.xml in the language data.

FILES

       These are the two files that can be used with this command:

       infile Input file (stdin by default).

       outfile Output file (stdout by default).
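
       As a hedged illustration of how the pieces of the SYNOPSIS fit together, the following sketch
       assembles an apertium command line from Python. The function and parameter names are
       hypothetical; only the flags -d, -f, -u and -a and the argument order are taken from this page.
       Running translate() assumes that apertium and the data for the chosen language pair are
       installed.

```python
import subprocess
from typing import List, Optional


def build_command(
    language_pair: str,
    infile: Optional[str] = None,
    outfile: Optional[str] = None,
    datadir: Optional[str] = None,
    fmt: Optional[str] = None,
    hide_unknown_marks: bool = False,
    mark_disambiguated: bool = False,
) -> List[str]:
    """Assemble an argument list in the order given in the SYNOPSIS."""
    if outfile is not None and infile is None:
        # outfile is nested inside [infile [outfile]], so it needs an infile.
        raise ValueError("outfile requires infile")
    cmd = ["apertium"]
    if datadir is not None:
        cmd += ["-d", datadir]
    if fmt is not None:
        cmd += ["-f", fmt]
    if hide_unknown_marks:
        cmd.append("-u")  # disable the '*' marks on unknown words
    if mark_disambiguated:
        cmd.append("-a")  # enable the '=' marks on disambiguated words
    cmd.append(language_pair)
    if infile is not None:
        cmd.append(infile)
    if outfile is not None:
        cmd.append(outfile)
    return cmd


def translate(text: str, language_pair: str) -> str:
    """Send text on stdin and return what apertium writes to stdout."""
    done = subprocess.run(
        build_command(language_pair),
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return done.stdout
```

       With no infile or outfile, build_command("es-ca") yields just the program name and the language
       pair, which matches the stdin and stdout defaults described above.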

BUGS

       Lots of...lurking in the dark and waiting for you!

AUTHOR

       (c) 2005,2006 Universitat d'Alacant / Universidad de Alicante. All rights reserved.

                                                   2006-03-08                                        apertium(1)
