Provided by: lttoolbox_3.3.2~r63423-3_amd64

NAME

       lt-proc - This application is part of the lexical processing modules and tools (lttoolbox).

       This tool is part of the apertium machine translation architecture: http://www.apertium.org.

SYNOPSIS

       lt-proc  [  -a  |  -b | -o | -c | -d | -e | -g | -n | -p | -s | -t | -v | -h -z -w ] fst_file [input_file
       [output_file]]

       lt-proc [ --analysis | --bilingual | --surf-bilingual | --case-sensitive | --debugged-gen |  --decompose-
       nouns  | --generation | --non-marked-gen | --tagged-gen | --post-generation | --sao | --transliteration |
       --null-flush  --dictionary-case  --decompose-compounds  |  --version  |  --help  ]  fst_file  [input_file
       [output_file]]

DESCRIPTION

       lt-proc is the application responsible for providing the four lexical processing functionalities:

              • morphological analyser  ( option -a )

              • lexical transfer  ( option -b )

              • morphological generator  ( option -g )

              • post-generator  ( option -p )

       It  accomplishes these tasks by reading binary files containing a compact and efficient representation of
       dictionaries (a class of finite-state transducers called augmented letter transducers). These  files  are
       generated by lt-comp(1).
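
       As a rough illustration of how the modes above are selected, the following Python sketch builds an
       lt-proc command line and feeds text on standard input. It uses only the -a, -g, -p and -z options
       documented on this page. The helper names and the mode labels are made up for this example, and it
       assumes that lt-proc reads standard input when no input_file is given.

```python
import subprocess

# Short options documented on this page; the dictionary keys are labels
# invented for this example.
MODES = {"analysis": "-a", "generation": "-g", "post-generation": "-p"}


def lt_proc_argv(mode, fst_file, null_flush=False):
    """Build the argument list for one lt-proc invocation."""
    argv = ["lt-proc", MODES[mode]]
    if null_flush:
        argv.append("-z")
    argv.append(fst_file)
    return argv


def run_lt_proc(mode, fst_file, text):
    """Send text to lt-proc on standard input and return its standard output.

    Assumes lt-proc is installed and fst_file was produced by lt-comp.
    """
    result = subprocess.run(
        lt_proc_argv(mode, fst_file),
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

       For instance, run_lt_proc("analysis", "es.automorf.bin", "cantar") would run the analyser on one word,
       where es.automorf.bin is a hypothetical compiled dictionary name.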

       Note that some characters (`[', `]', `$', `^', `/', `+') are special characters used for format and
       encapsulation. They must be escaped if they are to be used literally. For instance, text enclosed in
       `['...`]' is ignored, and each lexical unit is delimited as `^...$'.
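
       A minimal sketch of such escaping, in Python. It assumes that the escape character is a backslash,
       which this page does not state, and it covers only the six characters listed above plus the
       backslash itself. The function name is hypothetical.

```python
# Special characters listed above. The backslash is added so that an
# escaped string stays unambiguous (assumption: backslash is the escape
# character).
SPECIAL = set("[]$^/+")


def escape(text):
    """Prefix every special character (and every backslash) with a backslash."""
    out = []
    for ch in text:
        if ch in SPECIAL or ch == "\\":
            out.append("\\")
        out.append(ch)
    return "".join(out)
```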

OPTIONS

       -a, --analysis
              Tokenizes the text in surface forms (lexical units as they appear in texts) and delivers, for each
              surface form, one or more lexical forms consisting of lemma, lexical  category  and  morphological
              inflection information. Tokenization is not straightforward due to the existence, on the one hand,
              of contractions, and, on the other hand, of multi-word lexical units. For contractions, the system
              reads  in  a  single surface form and delivers the corresponding sequence of lexical forms. Multi-
              word surface forms are analysed in a  left-to-right,  longest-match  fashion.  Multi-word  surface
              forms  may  be  invariable  (such  as  a  multi-word preposition or conjunction) or inflected (for
              example, in Spanish, "echaban de menos", "they missed", is a form of the imperfect indicative tense
              of the verb "echar de menos", "to miss"). Limited support for some kinds of discontinuous
              multi-word units is also available. Analysis of single-word surface forms produces output like
              the following:
              `cantar' -> `^cantar/cantar<vblex><inf>$'
              `daba' -> `^daba/dar<vblex><pii><p1><sg>/dar<vblex><pii><p3><sg>$'
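
              To make the output format concrete, here is a small Python sketch that splits one analysed
              unit into its surface form and its lexical forms (lemma plus tags). It is an illustration
              only: the function name is hypothetical, and it assumes a single unit with no escaped
              special characters.

```python
import re


def parse_unit(unit):
    """Split '^surface/lexform1/lexform2$' into (surface, [(lemma, tags), ...])."""
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("not a lexical unit: %r" % unit)
    # The first field is the surface form; the remaining fields are lexical forms.
    surface, *lexical_forms = unit[1:-1].split("/")
    parsed = []
    for form in lexical_forms:
        lemma = form.split("<", 1)[0]             # text before the first tag
        tags = re.findall(r"<([^<>]+)>", form)    # tag names, in order
        parsed.append((lemma, tags))
    return surface, parsed
```

              Applied to the second example above, this yields the surface form "daba" and two lexical
              forms of "dar", one tagged p1 and one tagged p3.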

       -b, --bilingual
              Performs lexical transfer, appending any trailing morphological symbols that are not specified
              in the dictionaries. Like the analysis mode, it supports multiple lexical forms in the target
              language for a given lexical form in the source language. It typically works on the output of
              apertium-pretransfer.

       -o, --surf-bilingual
              Like -b, but takes its input from apertium-tagger -p, which includes surface forms. If the
              lexical form is not found in the bilingual dictionary, it outputs the surface form of the word.

       -c, --case-sensitive
              Use the literal case of the incoming characters.

       -d, --debugged-gen
              Morphological generation (like -g) with all marks retained in the output, for debugging.

       -e, --decompose-compounds
              Try to treat unknown words as compounds, and decompose them.

       -w, --dictionary-case
              Use the case information contained in the lexicon, instead of the surface case  (only  applied  in
              analysis mode).

       -g, --generation
              Delivers  a  target-language  surface  form  for  each  target-language  lexical form, by suitably
              inflecting it.

       -n, --non-marked-gen
              Morphological generation (like -g) but without unknown word marks (asterisk `*').

       -l, --tagged-gen
              Morphological generation (like -g) but retaining part-of-speech tags.

       -p, --post-generation
              Performs orthographical operations such as contractions and apostrophations. The post-generator is
              usually  dormant  (just  copies the input to the output) until a special alarm symbol contained in
              some target-language surface forms wakes it up to perform a particular  string  transformation  if
              necessary; then it goes back to sleep.
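
              The dormant/awake behaviour can be sketched in Python as follows. Everything specific here
              is an assumption made for illustration: the alarm symbol is taken to be `~', the two
              Spanish contraction rules are hypothetical stand-ins for a compiled post-generation
              dictionary, and dropping an alarm symbol that matches no rule is a simplification.

```python
ALARM = "~"  # assumed alarm symbol; not specified on this page

# Hypothetical rules standing in for a post-generation dictionary.
RULES = {"~de el": "del", "~a el": "al"}


def post_generate(text, rules=RULES):
    """Copy text unchanged, applying a rule only where an alarm symbol occurs."""
    out = []
    i = 0
    while i < len(text):
        if text[i] != ALARM:
            out.append(text[i])  # dormant: copy input to output
            i += 1
            continue
        # Awake: try the longest rule that matches at this position.
        for source in sorted(rules, key=len, reverse=True):
            if text.startswith(source, i):
                out.append(rules[source])
                i += len(source)
                break
        else:
            i += 1  # no rule applies: drop the alarm symbol (assumption)
    return "".join(out)
```

              With these rules, "vengo ~de el mercado" becomes "vengo del mercado", while text without
              the alarm symbol passes through untouched.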

       -s, --sao
              Input processing is in orthoepikon (previously `sao') annotation system format:
              http://orthoepikon.sf.net.

       -t, --transliteration
              Apply a transliteration dictionary.

       -z, --null-flush
              Flush output on the null character.
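
              The idea behind null flushing can be sketched in Python: a long-running filter treats each
              null character as the end of one chunk, writes the result for that chunk, and flushes
              before reading further, so the caller can reuse the same process for many inputs. This
              illustrates the protocol only and is not lt-proc's implementation. The function name and
              the upper-casing stand-in for real processing are hypothetical.

```python
import io


def process_stream(inp, out, process):
    """Process NUL-terminated chunks from inp, flushing out after each one."""
    buf = bytearray()
    while True:
        byte = inp.read(1)
        if not byte:              # end of input
            break
        if byte == b"\0":
            # End of one chunk: emit its result, echo the NUL, and flush.
            out.write(process(bytes(buf)) + b"\0")
            out.flush()
            buf.clear()
        else:
            buf += byte
    if buf:                       # trailing data without a final NUL
        out.write(process(bytes(buf)))
        out.flush()


# Usage with in-memory streams; bytes.upper stands in for real processing.
demo_out = io.BytesIO()
process_stream(io.BytesIO(b"uno\0dos\0"), demo_out, bytes.upper)
```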

       -v, --version
              Display the version number.

       -h, --help
              Display this help.

FILES

       fst_file The compiled input dictionary.

SEE ALSO

       lt-expand(1), lt-comp(1), apertium-tagger(1), apertium(1).

BUGS

       Lots of...lurking in the dark and waiting for you!

AUTHOR

       (c) 2005,2006 Universitat d'Alacant / Universidad de Alicante.

                                                   2006-03-23                                         lt-proc(1)