Provided by: lttoolbox_3.3.3~r68466-2_amd64

NAME

       lt-proc - This application is part of the lexical processing modules and tools (lttoolbox).

       This tool is part of the apertium machine translation architecture: http://www.apertium.org.

SYNOPSIS

       lt-proc  [  -a  |  -b | -o | -c | -d | -e | -g | -n | -p | -s | -t | -v | -h -z -w ] fst_file [input_file
       [output_file]]

       lt-proc [ --analysis | --bilingual | --surf-bilingual | --case-sensitive | --debugged-gen |
       --decompose-compounds | --generation | --non-marked-gen | --tagged-gen | --post-generation | --sao |
       --transliteration | --version | --help  --null-flush  --dictionary-case ]  fst_file  [input_file
       [output_file]]

DESCRIPTION

       lt-proc is the application responsible for providing the four lexical processing functionalities:

              • morphological analyser  ( option -a )

              • lexical transfer  ( option -b )

              • morphological generator  ( option -g )

              • post-generator  ( option -p )

       It  accomplishes these tasks by reading binary files containing a compact and efficient representation of
       dictionaries (a class of finite-state transducers called augmented letter transducers). These  files  are
       generated by lt-comp(1).
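
       As an illustration of the calling convention in the SYNOPSIS, the following is a minimal Python sketch
       that builds an lt-proc command line and pipes text through it. It assumes that lt-proc is installed
       and on the PATH, and that it reads standard input and writes standard output when the optional file
       arguments are omitted. The helper names and the dictionary file name are hypothetical.

```python
import subprocess


def lt_proc_command(mode_flag, fst_file, input_file=None, output_file=None):
    """Build an argv list: lt-proc [option] fst_file [input_file [output_file]]."""
    if output_file is not None and input_file is None:
        # The synopsis nests output_file inside input_file.
        raise ValueError("output_file can only be given together with input_file")
    command = ["lt-proc", mode_flag, fst_file]
    if input_file is not None:
        command.append(input_file)
        if output_file is not None:
            command.append(output_file)
    return command


def run_lt_proc(mode_flag, fst_file, text):
    """Pipe text through lt-proc (assumption: stdin/stdout are used by default)."""
    completed = subprocess.run(
        lt_proc_command(mode_flag, fst_file),
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout


# Hypothetical usage; "es.automorf.bin" stands for a dictionary compiled by lt-comp:
# print(run_lt_proc("-a", "es.automorf.bin", "cantar\n"))
```

       Building the argument list separately from running it keeps the command easy to inspect before
       anything is executed.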

       Some characters (`[', `]', `$', `^', `/', `+') are special: they are used for format and encapsulation,
       and must be escaped when they are meant literally. For instance, text enclosed in `['...`]' is ignored,
       and each lexical unit is delimited as `^...$'.
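
       A minimal Python sketch of escaping the special characters listed above before text is handed to
       lt-proc. This page does not name the escape character; the sketch assumes a backslash, and also
       escapes the backslash itself. The function name is hypothetical.

```python
# Special characters as listed in this manual page.
SPECIAL_CHARS = "[]$^/+"


def escape_literal(text, escape="\\"):
    """Prefix every special character (and the escape character itself) with `escape`.

    Assumption: a backslash is the escape character; the page does not say so.
    """
    escaped = []
    for ch in text:
        if ch in SPECIAL_CHARS or ch == escape:
            escaped.append(escape)
        escaped.append(ch)
    return "".join(escaped)


print(escape_literal("1/2 [sic]"))  # prints: 1\/2 \[sic\]
```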

OPTIONS

       -a, --analysis
              Tokenizes the text in surface forms (lexical units as they appear in texts) and delivers, for each
              surface  form,  one  or more lexical forms consisting of lemma, lexical category and morphological
              inflection information. Tokenization is not straightforward due to the existence, on the one hand,
              of contractions, and, on the other hand, of multi-word lexical units. For contractions, the system
              reads in a single surface form and delivers the corresponding sequence of  lexical  forms.  Multi-
              word  surface  forms  are  analysed  in a left-to-right, longest-match fashion. Multi-word surface
              forms may be invariable (such as a  multi-word  preposition  or  conjunction)  or  inflected  (for
              example,  in es, "echaban de menos", "they missed", is a form of the imperfect indicative tense of
              the verb "echar de menos", "to miss"). Limited support for some kinds of discontinuous multi-word
              units is also available. Analysis of single-word surface forms produces output like the following:
              "cantar" -> `^cantar/cantar<vblex><inf>$' or
              "daba" -> `^daba/dar<vblex><pii><p1><sg>/dar<vblex><pii><p3><sg>$'.
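
              The analysis output shown above can be split back into a surface form and its lexical forms. The
              following Python sketch does this for simple cases; the function name is hypothetical, and the
              sketch deliberately ignores escaped special characters and contractions.

```python
import re


def parse_lexical_unit(unit):
    """Split '^surface/lemma<tag>.../lemma<tag>...$' into (surface, [(lemma, [tags]), ...]).

    Simplification: escaped special characters and contractions are not handled.
    """
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("a lexical unit must be delimited by '^' and '$'")
    surface, *analyses = unit[1:-1].split("/")
    parsed = []
    for analysis in analyses:
        lemma = analysis.split("<", 1)[0]            # text before the first tag
        tags = re.findall(r"<([^<>]+)>", analysis)   # tag names without angle brackets
        parsed.append((lemma, tags))
    return surface, parsed


surface, forms = parse_lexical_unit("^daba/dar<vblex><pii><p1><sg>/dar<vblex><pii><p3><sg>$")
print(surface)  # prints: daba
for lemma, tags in forms:
    print(lemma, tags)
```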

       -b, --bilingual
              Performs lexical transfer, attaching queues of morphological symbols not specified in the
              dictionaries. Like analysis mode, it supports multiple lexical forms in the target language for a
              given lexical form in the source language. It typically works on the output of
              apertium-pretransfer.

       -o, --surf-bilingual
              As with -b, but takes input from apertium-tagger -p, which includes surface forms; if the lexical
              form is not found in the bilingual dictionary, it outputs the surface form of the word.

       -c, --case-sensitive
              Use the literal case of the incoming characters.

       -d, --debugged-gen
              Morphological generation (like -g) with all debugging information kept in the output.

       -e, --decompose-compounds
              Try to treat unknown words as compounds, and decompose them.

       -w, --dictionary-case
              Use  the  case  information contained in the lexicon, instead of the surface case (only applied in
              analysis mode).

       -g, --generation
              Delivers a target-language surface  form  for  each  target-language  lexical  form,  by  suitably
              inflecting it.

       -n, --non-marked-gen
              Morphological generation (like -g) but without unknown word marks (asterisk `*').

       -l, --tagged-gen
              Morphological generation (like -g) but retaining part-of-speech tags.

       -p, --post-generation
              Performs orthographical operations such as contractions and apostrophations. The post-generator is
              usually  dormant  (just  copies the input to the output) until a special alarm symbol contained in
              some target-language surface forms wakes it up to perform a particular  string  transformation  if
              necessary; then it goes back to sleep.
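
              The wake-up behaviour described above can be sketched in Python as follows. This is a toy model,
              not the lt-proc implementation: it assumes the alarm symbol is `~', takes its rules from a plain
              dictionary rather than a compiled transducer, and simply drops an alarm symbol that no rule
              matches. All names are hypothetical.

```python
ALARM = "~"  # assumption: the alarm symbol; this page does not name it


def post_generate(text, rules):
    """Copy text unchanged until ALARM is seen, then apply the longest matching rule."""
    output = []
    i = 0
    while i < len(text):
        if text[i] != ALARM:
            # Dormant: copy the input to the output.
            output.append(text[i])
            i += 1
            continue
        # Awake: look for the longest rule source starting right after the alarm.
        rest = text[i + 1:]
        candidates = [source for source in rules if rest.startswith(source)]
        if candidates:
            source = max(candidates, key=len)
            output.append(rules[source])
            i += 1 + len(source)
        else:
            # Sketch choice: no rule applies, so drop the alarm and go back to sleep.
            i += 1
    return "".join(output)


rules = {"a el": "al", "de el": "del"}  # toy contraction rules for Spanish
print(post_generate("voy ~a el cine", rules))  # prints: voy al cine
```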

       -s, --sao
              Input    processing   is   in   orthoepikon   (previously   `sao')   annotation   system   format:
              http://orthoepikon.sf.net.

       -t, --transliteration
              Apply a transliteration dictionary.

       -z, --null-flush
              Flush output on the null character.

       -v, --version
              Display the version number.

       -h, --help
              Display this help.

FILES

       fst_file   The input compiled dictionary.

SEE ALSO

       lt-expand(1), lt-comp(1), apertium-tagger(1), apertium(1).

BUGS

       Lots of...lurking in the dark and waiting for you!

AUTHOR

       (c) 2005,2006 Universitat d'Alacant / Universidad de Alicante.

                                                   2006-03-23                                         lt-proc(1)