Ubuntu Manpage: lt-proc - This application is part of the lexical processing modules and tools ( lttoolbox )

Provided by: lttoolbox_3.3.2~r63423-3_amd64

NAME

       lt-proc - This application is part of the lexical processing modules and tools ( lttoolbox )

       This tool is part of the Apertium machine translation architecture: http://www.apertium.org.

SYNOPSIS

       lt-proc  [  -a  |  -b | -o | -c | -d | -e | -g | -n | -p | -s | -t | -v | -h -z -w ] fst_file [input_file
       [output_file]]

       lt-proc [ --analysis | --bilingual | --surf-bilingual | --case-sensitive | --debugged-gen |  --decompose-
       nouns  | --generation | --non-marked-gen | --tagged-gen | --post-generation | --sao | --transliteration |
       --null-flush  --dictionary-case  --decompose-compounds  |  --version  |  --help  ]  fst_file  [input_file
       [output_file]]

DESCRIPTION

       lt-proc is the application responsible for providing the four lexical processing functionalities:

              • morphological analyser  ( option -a )

              • lexical transfer  ( option -b )

              • morphological generator  ( option -g )

              • post-generator  ( option -p )

       It  accomplishes these tasks by reading binary files containing a compact and efficient representation of
       dictionaries (a class of finite-state transducers called augmented letter transducers). These  files  are
       generated by lt-comp(1).
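
       As a rough illustration of the calling convention in the SYNOPSIS, the sketch below builds an lt-proc
       command line and pipes text through it from Python. The helper names and the dictionary file name
       es.automorf.bin are hypothetical, and the sketch assumes that lt-proc is on the PATH and reads standard
       input when no input_file is given.

```python
import subprocess


def lt_proc_command(flag, fst_file, input_file=None, output_file=None):
    """Build an argv list shaped like: lt-proc FLAG fst_file [input_file [output_file]]."""
    if output_file is not None and input_file is None:
        raise ValueError("output_file can only be given together with input_file")
    cmd = ["lt-proc", flag, fst_file]
    if input_file is not None:
        cmd.append(input_file)
        if output_file is not None:
            cmd.append(output_file)
    return cmd


def run_lt_proc(flag, fst_file, text):
    """Send text to lt-proc on stdin and return its stdout (needs lt-proc installed)."""
    result = subprocess.run(
        lt_proc_command(flag, fst_file),
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Hypothetical usage: morphological analysis with a compiled dictionary.
# print(run_lt_proc("-a", "es.automorf.bin", "cantar\n"))
```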

       Note that some characters (`[', `]', `$', `^', `/', `+') are special: they are used for formatting and
       encapsulation, and must be escaped with a backslash when they are meant literally. For instance, text
       enclosed in `['...`]' is ignored, and each lexical unit is delimited as `^...$'.
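
       A minimal sketch of escaping literal text before it is fed to lt-proc. It covers only the six
       characters listed above, and it assumes that the escape character is a backslash (which then has to be
       escaped as well); the function name is hypothetical.

```python
# The six reserved characters named in the paragraph above.
SPECIAL_CHARS = set("[]$^/+")


def escape_literal(text, escape="\\"):
    """Prefix every reserved character (and the escape character itself) with a backslash."""
    out = []
    for ch in text:
        if ch in SPECIAL_CHARS or ch == escape:
            out.append(escape)
        out.append(ch)
    return "".join(out)


print(escape_literal("a/b [c]"))
```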

OPTIONS

-a, --analysis
Tokenizes the text in surface forms (lexical units as they appear in texts) and delivers, for each
surface form, one or more lexical forms consisting of lemma, lexical category and morphological
inflection information. Tokenization is not straightforward due to the existence, on the one hand,
of contractions, and, on the other hand, of multi-word lexical units. For contractions, the system
reads in a single surface form and delivers the corresponding sequence of lexical forms. Multi-
word surface forms are analysed in a left-to-right, longest-match fashion. Multi-word surface
forms may be invariable (such as a multi-word preposition or conjunction) or inflected (for
example, in Spanish, "echaban de menos", "they missed", is a form of the imperfect indicative tense
of the verb "echar de menos", "to miss"). Limited support for some kinds of discontinuous multi-word
units is also available. Analysis of single-word surface forms produces output such as:
"cantar" -> `^cantar/cantar<vblex><inf>$' or "daba" ->
`^daba/dar<vblex><pii><p1><sg>/dar<vblex><pii><p3><sg>$'.
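
The left-to-right, longest-match strategy and the output format can be sketched with a toy analyser.
This is not how lt-proc is implemented (it walks a compiled transducer character by character); the
sketch works on whitespace-separated words, uses a hypothetical in-memory dictionary, and assumes
that unknown words are reported with an asterisk. The tags on the multi-word entry are illustrative.

```python
# Toy dictionary: surface form -> lexical forms. Hypothetical entries based on
# the examples above; the multi-word entry's tags are only illustrative.
TOY_DICT = {
    "cantar": ["cantar<vblex><inf>"],
    "daba": ["dar<vblex><pii><p1><sg>", "dar<vblex><pii><p3><sg>"],
    "echaban de menos": ["echar de menos<vblex><pii><p3><pl>"],
}


def analyse(text, dictionary=TOY_DICT):
    """Tokenize left to right, always preferring the longest dictionary match."""
    words = text.split()
    max_len = max(len(key.split()) for key in dictionary)
    units = []
    i = 0
    while i < len(words):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            surface = " ".join(words[i:i + n])
            if surface in dictionary:
                units.append("^" + surface + "/" + "/".join(dictionary[surface]) + "$")
                i += n
                break
        else:
            # No entry matched: emit the word as unknown (assumed '*' convention).
            units.append("^" + words[i] + "/*" + words[i] + "$")
            i += 1
    return " ".join(units)


print(analyse("echaban de menos cantar"))
```

Because the three-word span is tried before any single word, "echaban de menos" comes out as one
lexical unit instead of three.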

-b, --bilingual
Performs lexical transfer, attaching queues of morphological symbols not specified in the
dictionaries. Like the analysis mode, it supports multiple lexical forms in the target language for
a given lexical form in the source language. Typically works with the output of
apertium-pretransfer.

-o, --surf-bilingual
As with -b, but takes input from apertium-tagger -p, which includes surface forms; if the lexical
form is not found in the bilingual dictionary, it outputs the surface form of the word.

-c, --case-sensitive
Use the literal case of the incoming characters.

-d, --debugged-gen
Morphological generation (like -g) that keeps all marks in the output, for debugging.

-e, --decompose-compounds
Try to treat unknown words as compounds, and decompose them.

-w, --dictionary-case
Use the case information contained in the lexicon, instead of the surface case (only applied in
analysis mode).

-g, --generation
Delivers a target-language surface form for each target-language lexical form, by suitably
inflecting it.

-n, --non-marked-gen
Morphological generation (like -g) but without unknown word marks (asterisk `*').
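
The difference between marked and non-marked generation can be sketched with a toy generator. The
dictionary, the function name and the input convention (an unknown word arriving as `^*word$') are
assumptions made for illustration; forms missing from the toy dictionary are not modelled.

```python
# Toy generation dictionary: lexical form -> surface form (hypothetical entry).
TOY_GEN = {"cantar<vblex><inf>": "cantar"}


def generate(unit, mark_unknown=True):
    """Generate the surface form for one lexical unit such as '^cantar<vblex><inf>$'.

    mark_unknown=True mimics -g (the asterisk is kept on unknown words),
    mark_unknown=False mimics -n (the asterisk is dropped).
    """
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("expected a lexical unit delimited by ^ and $")
    form = unit[1:-1]
    if form.startswith("*"):
        return form if mark_unknown else form[1:]
    # Raises KeyError for forms the toy dictionary does not know.
    return TOY_GEN[form]


print(generate("^*xyz$"), generate("^*xyz$", mark_unknown=False))
```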

-l, --tagged-gen
Morphological generation (like -g) but retaining part-of-speech tags.

-p, --post-generation
Performs orthographical operations such as contractions and apostrophations. The post-generator is
usually dormant (just copies the input to the output) until a special alarm symbol contained in
some target-language surface forms wakes it up to perform a particular string transformation if
necessary; then it goes back to sleep.
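
A toy model of this dormant/awake behaviour. It assumes that the alarm symbol is `~' and uses two
hypothetical Spanish contraction rules; a real post-generation dictionary is a compiled transducer,
and this sketch ignores word boundaries after the matched text.

```python
ALARM = "~"  # assumed alarm symbol

# Hypothetical rules: text starting at the alarm symbol -> replacement.
TOY_RULES = {"~de el": "del", "~a el": "al"}


def post_generate(text, rules=TOY_RULES):
    """Copy text unchanged until an alarm symbol appears, then try the rules."""
    keys = sorted(rules, key=len, reverse=True)  # prefer the longest rule
    out = []
    i = 0
    while i < len(text):
        if text[i] != ALARM:
            out.append(text[i])  # dormant: plain copy
            i += 1
            continue
        for key in keys:
            if text.startswith(key, i):
                out.append(rules[key])  # awake: apply the transformation
                i += len(key)
                break
        else:
            i += 1  # toy choice: no rule applies, so just drop the alarm symbol
    return "".join(out)


print(post_generate("vengo ~de el mercado"))
```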

-s, --sao
Input processing is in orthoepikon (previously `sao') annotation system format:
http://orthoepikon.sf.net.

-t, --transliteration
Apply a transliteration dictionary.

-z, --null-flush
Flush output on the null character.
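
This mode is useful when a long-running process handles many independent requests over one pipe.
The sketch below shows the general idea only: input is cut at each null character, each chunk is
processed, and the result is written back followed by a null character and a flush. The function
name and the transform callback are hypothetical, and echoing the null character is an assumption.

```python
import io


def process_null_delimited(instream, outstream, transform):
    """Process NUL-terminated chunks one at a time, flushing after each."""
    buf = []
    while True:
        ch = instream.read(1)
        if ch == "":  # end of input
            break
        if ch == "\0":
            outstream.write(transform("".join(buf)))
            outstream.write("\0")
            outstream.flush()  # the reader sees this chunk immediately
            buf = []
        else:
            buf.append(ch)
    if buf:  # trailing text without a terminating NUL
        outstream.write(transform("".join(buf)))
        outstream.flush()


# Stand-in transform (upper-casing) in place of real dictionary processing.
src = io.StringIO("ab\0cd\0")
dst = io.StringIO()
process_null_delimited(src, dst, str.upper)
print(repr(dst.getvalue()))
```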

-v, --version
Display the version number.

-h, --help
Display this help.

FILES

       fst_file   The input compiled dictionary.

BUGS

       Lots of...lurking in the dark and waiting for you!

AUTHOR

       (c) 2005,2006 Universitat d'Alacant / Universidad de Alicante.

                                                   2006-03-23                                         lt-proc(1)