Provided by: lttoolbox_3.3.3~r68466-2_amd64

NAME

       lt-proc - This application is part of the lexical processing modules and tools
       (lttoolbox)

       This tool is part of the apertium machine translation architecture:
       http://www.apertium.org.

SYNOPSIS

       lt-proc  [ -a | -b | -o | -c | -d | -e | -g | -n | -p | -s | -t | -v | -h -z -w ] fst_file
       [input_file [output_file]]

       lt-proc [ --analysis | --bilingual | --surf-bilingual | --case-sensitive |  --debugged-gen
       | --decompose-nouns | --generation | --non-marked-gen | --tagged-gen | --post-generation |
       --sao  |  --transliteration  |  --null-flush  --dictionary-case  --decompose-compounds   |
       --version | --help ] fst_file [input_file [output_file]]

DESCRIPTION

       lt-proc is the application responsible for providing the four lexical processing
       functionalities:

              • morphological analyser (option -a)

              • lexical transfer (option -b)

              • morphological generator (option -g)

              • post-generator (option -p)

       It accomplishes these tasks by reading binary files containing  a  compact  and  efficient
       representation  of  dictionaries  (a  class  of  finite-state transducers called augmented
       letter transducers). These files are generated by lt-comp(1).
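
       As an illustration of the invocation pattern given in the SYNOPSIS, the following
       Python sketch assembles an lt-proc command line for the four functionalities, using
       the flags listed under OPTIONS. The helper name build_lt_proc_command and the file
       names example.bin and input.txt are hypothetical placeholders, not part of lttoolbox.

```python
import shlex

# Flags for the four lexical processing modes, as listed under OPTIONS.
MODE_FLAGS = {
    "analysis": "-a",
    "bilingual": "-b",
    "generation": "-g",
    "post-generation": "-p",
}


def build_lt_proc_command(mode, fst_file, input_file=None, output_file=None):
    """Return an argv list following: lt-proc FLAG fst_file [input_file [output_file]]."""
    if output_file is not None and input_file is None:
        # The synopsis nests output_file inside the optional input_file.
        raise ValueError("output_file requires input_file")
    cmd = ["lt-proc", MODE_FLAGS[mode], fst_file]
    if input_file is not None:
        cmd.append(input_file)
        if output_file is not None:
            cmd.append(output_file)
    return cmd


# Hypothetical file names, for illustration only.
print(shlex.join(build_lt_proc_command("analysis", "example.bin", "input.txt")))
```

       The resulting list can be handed to subprocess.run() on a system where lt-proc and a
       dictionary compiled by lt-comp(1) are available.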

       Note that some characters (`[', `]', `$', `^', `/', `+') are special characters used
       for formatting and encapsulation. They must be escaped if they are to be used
       literally. For instance, text enclosed in `['...`]' is ignored, and a lexical unit is
       delimited as `^...$'.
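
       A minimal sketch of such escaping in Python is shown here. It assumes that the
       escape character is a backslash, which this page does not state, and it covers only
       the characters listed above. The function name escape_reserved is hypothetical.

```python
# Reserved characters as listed in the paragraph above.
RESERVED = "[]$^/+"


def escape_reserved(text, escape="\\"):
    """Prefix each reserved character, and the escape character itself, with `escape`.

    Assumption: backslash is the escape character (not stated on this page).
    """
    out = []
    for ch in text:
        if ch in RESERVED or ch == escape:
            out.append(escape)
        out.append(ch)
    return "".join(out)


print(escape_reserved("price: $5 [approx] a/b"))
```

       Text prepared this way can be passed through the pipeline without its brackets,
       slashes or dollar signs being read as format delimiters.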

OPTIONS

       -a, --analysis
              Tokenizes the text into surface forms (lexical units as they appear in texts) and
              delivers,  for  each  surface  form, one or more lexical forms consisting of lemma,
              lexical category and morphological  inflection  information.  Tokenization  is  not
              straightforward due to the existence, on the one hand, of contractions, and, on the
              other hand, of multi-word lexical units. For contractions, the system  reads  in  a
              single  surface  form  and  delivers  the  corresponding sequence of lexical forms.
              Multi-word surface forms are analysed in a  left-to-right,  longest-match  fashion.
              Multi-word  surface  forms  may  be invariable (such as a multi-word preposition or
              conjunction) or inflected (for example, in Spanish, "echaban de menos", "they missed",
              is  a  form  of  the  imperfect  indicative tense of the verb "echar de menos", "to
              miss"). Limited support for some kinds of discontinuous multi-word  units  is  also
              available. Single-word surface form analysis produces output like that in these
              examples: "cantar" -> `^cantar/cantar<vblex><inf>$' or "daba" ->
              `^daba/dar<vblex><pii><p1><sg>/dar<vblex><pii><p3><sg>$'.
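
              The structure of this output can be made explicit with a short Python sketch
              that splits one analysed unit into its surface form and its lexical forms
              (lemma plus tags). The function name parse_lexical_unit is hypothetical. The
              sketch assumes a backslash escape character and does not handle an escaped
              backslash directly before a slash.

```python
import re

# Split on "/" unless preceded by a backslash (assumed escape character).
_UNESCAPED_SLASH = re.compile(r"(?<!\\)/")
# A tag is any run of characters enclosed in angle brackets.
_TAG = re.compile(r"<([^<>]+)>")


def parse_lexical_unit(unit):
    """Split `^surface/lexical form/...$` into (surface, [(lemma, [tags]), ...])."""
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("expected a unit of the form ^...$")
    surface, *analyses = _UNESCAPED_SLASH.split(unit[1:-1])
    parsed = []
    for analysis in analyses:
        lemma = analysis.split("<", 1)[0]  # text before the first tag
        tags = _TAG.findall(analysis)
        parsed.append((lemma, tags))
    return surface, parsed


surface, readings = parse_lexical_unit(
    "^daba/dar<vblex><pii><p1><sg>/dar<vblex><pii><p3><sg>$"
)
print(surface)
for lemma, tags in readings:
    print(lemma, tags)
```

              For the "daba" example this yields one surface form with two lexical forms,
              both with the lemma "dar".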

       -b, --bilingual
              Performs lexical transfer, attaching queues of morphological symbols not specified
              in the dictionaries. Like the analysis mode, it supports multiple lexical forms in
              the target language for a given lexical form in the source language. It typically
              works with the output of apertium-pretransfer.

       -o, --surf-bilingual
              As with -b, but takes input from apertium-tagger -p, with surface forms; if the
              lexical form is not found in the bilingual dictionary, it outputs the surface
              form of the word.

       -c, --case-sensitive
              Use the literal case of the incoming characters.

       -d, --debugged-gen
              Morphological generation with all debugging information retained in the output.

       -e, --decompose-compounds
              Try to treat unknown words as compounds, and decompose them.

       -w, --dictionary-case
              Use the case information contained in the lexicon,  instead  of  the  surface  case
              (only applied in analysis mode).

       -g, --generation
              Delivers  a  target-language surface form for each target-language lexical form, by
              suitably inflecting it.

       -n, --non-marked-gen
              Morphological generation (like -g) but without unknown word marks (asterisk `*').

       -l, --tagged-gen
              Morphological generation (like -g) but retaining part-of-speech tags.

       -p, --post-generation
              Performs orthographical operations such as contractions  and  apostrophations.  The
              post-generator  is  usually  dormant  (just copies the input to the output) until a
              special alarm symbol contained in some target-language surface forms wakes it up to
              perform  a  particular  string  transformation  if  necessary; then it goes back to
              sleep.

       -s, --sao
              Processes input in the orthoepikon (previously `sao') annotation system format:
              http://orthoepikon.sf.net.

       -t, --transliteration
              Apply a transliteration dictionary.

       -z, --null-flush
              Flush output on the null character.

       -v, --version
              Display the version number.

       -h, --help
              Display this help.

FILES

       fst_file   The input compiled dictionary.

SEE ALSO

       lt-expand(1), lt-comp(1), apertium-tagger(1), apertium(1).

BUGS

       Lots of...lurking in the dark and waiting for you!

AUTHOR

       (c) 2005,2006 Universitat d'Alacant / Universidad de Alicante.

                                            2006-03-23                                 lt-proc(1)