Ubuntu Manpage: lt-proc — lexical processor for Apertium

NAME

     lt-proc — lexical processor for Apertium

SYNOPSIS

     lt-proc [-a | -b | -o | -c | -d | -e | -g | -l | -n | -p | -s | -t | -v | -h | -z | -w] [-C] [-W]
             [-N N] [-L N] [-i icx_file] fst_file [input_file [output_file]]

DESCRIPTION

     lt-proc is the application responsible for providing the four lexical processing
     functionalities:

     •   morphological analyser (option -a)

     •   lexical transfer (option -b)

     •   morphological generator (option -g)

     •   post-generator (option -p)

     It accomplishes these tasks by reading binary files containing a compact and efficient
     representation of dictionaries (a class of finite-state transducers called augmented letter
     transducers).  These files are generated by lt-comp(1).

     It is worth mentioning that some characters (‘[’, ‘]’, ‘$’, ‘^’, ‘/’, ‘+’) are special
     characters used for format and encapsulation.  They must be escaped with a backslash (‘\’)
     when they are meant literally: for instance, text between ‘[’ and ‘]’ is treated as format
     and ignored, and each lexical unit is written as ‘^...$’.
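     As a minimal sketch of the escaping rule above, the following Python helpers prefix each
     reserved character with a backslash and undo it again.  Using the backslash as escape
     character, and escaping the backslash itself so the operation is reversible, are
     assumptions made here; escape() and unescape() are hypothetical names.

```python
# Reserved characters listed in this manual page, plus the backslash itself
# (assumption: escaping the escape character keeps the operation reversible).
SPECIAL = set("[]$^/+\\")

def escape(text: str) -> str:
    """Prefix every reserved character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)

def unescape(text: str) -> str:
    """Undo escape(): drop each backslash and keep the character after it."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(text[i + 1])  # keep the escaped character literally
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

print(escape("50$ / 2"))  # 50\$ \/ 2
```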

OPTIONS

-a, --analysis
Tokenizes the text into surface forms (lexical units as they appear in texts) and
delivers, for each surface form, one or more lexical forms consisting of lemma,
lexical category and morphological inflection information. Tokenization is not
straightforward because of contractions on the one hand and multi-word lexical
units on the other. For contractions, the system reads a single surface form and
delivers the corresponding sequence of lexical forms. Multi-word surface forms are
analysed in a left-to-right, longest-match fashion. They may be invariable (such as
a multi-word preposition or conjunction) or inflected (for example, in Spanish (es),
“echaban de menos”, “they missed”, is a form of the imperfect indicative tense of the
verb “echar de menos”, “to miss”). Limited support for some kinds of discontinuous
multi-word units is also available. Analysis of single-word surface forms produces
output like the following:

“cantar” → “^cantar/cantar<vblex><inf>$” or “daba” →
“^daba/dar<vblex><pii><p1><sg>/dar<vblex><pii><p3><sg>$”.
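The analysis output format shown above can be read back with a short parser. This is
a sketch assuming the unit contains no escaped special characters; parse_unit() is a
hypothetical helper, not part of lttoolbox.

```python
import re
from typing import List, Tuple

UNIT = re.compile(r"\^(.*?)\$")  # one "^...$" lexical unit
TAG = re.compile(r"<([^>]*)>")   # one "<tag>"

def parse_unit(unit: str) -> Tuple[str, List[Tuple[str, List[str]]]]:
    """Split "^surface/lemma<t1><t2>/..." into the surface form and
    a list of (lemma, tags) pairs, one per analysis."""
    m = UNIT.fullmatch(unit)
    if m is None:
        raise ValueError(f"not a lexical unit: {unit!r}")
    surface, *analyses = m.group(1).split("/")
    parsed = []
    for analysis in analyses:
        lemma = analysis.split("<", 1)[0]  # text before the first tag
        parsed.append((lemma, TAG.findall(analysis)))
    return surface, parsed

surface, forms = parse_unit("^daba/dar<vblex><pii><p1><sg>/dar<vblex><pii><p3><sg>$")
print(surface)   # daba
print(forms[1])  # ('dar', ['vblex', 'pii', 'p3', 'sg'])
```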

-b, --bilingual
Performs lexical transfer, attaching the trailing morphological symbols (tags) that
are not specified in the bilingual dictionary. As in analysis mode, it supports
multiple target-language lexical forms for a given source-language lexical form.
It typically works on the output of apertium-pretransfer(1).
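The tag-attaching behaviour of -b can be illustrated as follows: a dictionary entry only
needs to cover the lemma and its leading tags, and the remaining tags of the input are
copied onto each target form. The dictionary contents and transfer() are hypothetical,
and this is a simplified sketch rather than how lt-proc actually walks its transducer.

```python
from typing import Dict, List, Optional

# Hypothetical bilingual entries: source lemma + leading tags -> target forms.
BIDIX: Dict[str, List[str]] = {
    "perro<n>": ["dog<n>"],
    "banco<n>": ["bank<n>", "bench<n>"],
}

def transfer(lexical_form: str) -> List[str]:
    """Translate a source lexical form, copying the unmatched trailing tags."""
    best: Optional[str] = None
    for entry in BIDIX:
        rest = lexical_form[len(entry):]
        # The entry must be a prefix that ends exactly on a tag boundary.
        if lexical_form.startswith(entry) and (rest == "" or rest.startswith("<")):
            if best is None or len(entry) > len(best):
                best = entry  # prefer the longest matching entry
    if best is None:
        return []  # lexical form not covered by the dictionary
    tail = lexical_form[len(best):]  # e.g. "<m><sg>", absent from the entry
    return [target + tail for target in BIDIX[best]]

print(transfer("banco<n><m><sg>"))  # ['bank<n><m><sg>', 'bench<n><m><sg>']
```

The two results for “banco” correspond to the multiple target-language lexical forms
mentioned above.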

-o, --surf-bilingual
Like -b, but takes its input from apertium-tagger(1) -p, which includes surface
forms; if a lexical form is not found in the bilingual dictionary, the surface form
of the word is output instead.

-c, --case-sensitive
Use the literal case of the incoming characters

-d, --debugged-gen
Morphological generation (like -g) with full debugging information in the output.

-e, --decompose-compounds
Try to treat unknown words as compounds, and decompose them.

-w, --dictionary-case
Use the case information contained in the lexicon, instead of the surface case (only
applied in analysis mode).

-g, --generation
Delivers a target-language surface form for each target-language lexical form, by
suitably inflecting it.

-n, --non-marked-gen
Morphological generation (like -g) but without unknown word marks (asterisk ‘*’).

-l, --tagged-gen
Morphological generation (like -g) but retaining part-of-speech tags.

-p, --post-generation
Performs orthographical operations such as contractions and apostrophations. The
post-generator is usually dormant (just copies the input to the output) until a
special alarm symbol contained in some target-language surface forms wakes it up to
perform a particular string transformation if necessary; then it goes back to sleep.
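The dormant/awake behaviour of the post-generator can be sketched as a scanner that
copies text unchanged until it meets an alarm symbol and then tries a rewrite rule.
The alarm character ‘~’ and the two Spanish contraction rules are assumptions for
illustration only; lt-proc reads its real rules from a compiled post-generation
dictionary.

```python
# Hypothetical alarm symbol and rewrite rules, for illustration only.
ALARM = "~"
RULES = {"de el": "del", "a el": "al"}

def post_generate(text: str) -> str:
    out = []
    i = 0
    while i < len(text):
        if text[i] != ALARM:
            out.append(text[i])  # dormant: copy the input unchanged
            i += 1
            continue
        i += 1  # woken up: skip the alarm symbol itself
        # Try the longest rule that matches right after the alarm.
        for source in sorted(RULES, key=len, reverse=True):
            if text.startswith(source, i):
                out.append(RULES[source])
                i += len(source)
                break
        # With no matching rule the alarm is dropped and copying resumes.
    return "".join(out)

print(post_generate("Vengo ~de el mercado"))  # Vengo del mercado
```

A real rule set would also have to respect word boundaries; this sketch would rewrite
“~de ella” as “della”.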

-s, --sao
Processes input in the orthoepikon (formerly SAO) annotation system format:
https://orthoepikon.sf.net.

-t, --transliteration
Apply a transliteration dictionary

-i icx_file, --ignored-chars icx_file
Ignores characters specified in the file icx_file
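The effect of an ignored-characters file can be sketched as removing those characters
from the surface form before dictionary lookup. The soft hyphen (U+00AD) as ignored
character, the one-entry lexicon and lookup() are hypothetical; lt-proc reads the
actual set from icx_file.

```python
from typing import Optional

# Hypothetical ignored set; lt-proc takes the real one from icx_file.
IGNORED = {"\u00ad"}  # soft hyphen
LEXICON = {"cantar": "cantar<vblex><inf>"}

def lookup(surface: str) -> Optional[str]:
    """Drop ignored characters, then look the word up."""
    key = "".join(ch for ch in surface if ch not in IGNORED)
    return LEXICON.get(key)

print(lookup("can\u00adtar"))  # cantar<vblex><inf>
```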

-z, --null-flush
Flush output on the null character

-C, --careful-case
Use dictionary case if present, else surface

-N N, --analyses N
Output no more than N analyses (if the transducer is weighted, the N best analyses)

-L N, --weight-classes N
Output no more than N best weight classes (where analyses with equal weight
constitute a class)
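The difference between -N and -L can be shown with a small sketch over (lexical form,
weight) pairs. It assumes, as in tropical-semiring weighting, that a lower weight means
a better analysis; n_best() and best_weight_classes() are hypothetical helpers.

```python
from itertools import groupby
from typing import List, Tuple

Analysis = Tuple[str, float]  # (lexical form, weight)

def n_best(analyses: List[Analysis], n: int) -> List[Analysis]:
    """Like -N: keep at most n analyses, lowest weight first."""
    return sorted(analyses, key=lambda a: a[1])[:n]

def best_weight_classes(analyses: List[Analysis], k: int) -> List[Analysis]:
    """Like -L: keep every analysis whose weight is among the k lowest distinct weights."""
    ranked = sorted(analyses, key=lambda a: a[1])
    kept: List[Analysis] = []
    for index, (_, group) in enumerate(groupby(ranked, key=lambda a: a[1])):
        if index == k:
            break
        kept.extend(group)  # a whole class of equally weighted analyses
    return kept

analyses = [("a<n>", 1.0), ("b<n>", 2.0), ("c<n>", 1.0), ("d<n>", 3.0)]
print(n_best(analyses, 2))               # [('a<n>', 1.0), ('c<n>', 1.0)]
print(best_weight_classes(analyses, 2))  # [('a<n>', 1.0), ('c<n>', 1.0), ('b<n>', 2.0)]
```

With N = 2 the cut can fall inside a weight class, whereas -L 2 keeps every analysis
sharing one of the two best weights.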

-W, --show-weights
Print final analysis weights (if any)

-v, --version
Display the version number.

-h, --help
Display this help.

FILES

     fst_file
             The compiled dictionary (binary transducer) generated by lt-comp(1).

COPYRIGHT

     Copyright © 2005, 2006 Universitat d'Alacant / Universidad de Alicante.  This is free
     software.  You may redistribute copies of it under the terms of the GNU General Public
     License: https://www.gnu.org/licenses/gpl.html.

BUGS

     Many... lurking in the dark and waiting for you!