Ubuntu Manpage: lt-proc — lexical processor for Apertium

Provided by: lttoolbox_3.7.6-1build3_amd64

NAME

       lt-proc — lexical processor for Apertium

SYNOPSIS

       lt-proc  [-a  | -b | -o | -c | -d | -e | -g | -n | -p | -s | -t | -v | -h | -z | -w] [-W] [-N N] [-L N]
               [-i icx_file] fst_file [input_file [output_file]]

DESCRIPTION

       lt-proc is the application responsible for providing the four lexical processing functionalities:

       •   morphological analyser (option -a)

       •   lexical transfer (option -b)

       •   morphological generator (option -g)

       •   post-generator (option -p)

       It accomplishes these tasks by reading binary files containing a compact and efficient representation  of
       dictionaries  (a class of finite-state transducers called augmented letter transducers).  These files are
       generated by lt-comp(1).

       Some characters (‘[’, ‘]’, ‘$’, ‘^’, ‘/’, ‘+’) are special: they are used for format and
       encapsulation, and must be escaped if they are to be used literally.  For instance, text enclosed
       in ‘[’...‘]’ is ignored, and a lexical unit is delimited as ‘^...$’.
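
       As an illustration of the escaping rule, here is a minimal Python sketch. It assumes that the
       escape character is a backslash, which this page does not state, and the helper name
       escape_literal is hypothetical; it is not part of lttoolbox.

```python
# Characters this page lists as special in the stream format.
SPECIAL_CHARS = "[]$^/+"


def escape_literal(text: str) -> str:
    """Prefix each special character, and the backslash itself, with a backslash.

    Assumption: the escape character is a backslash; the man page does not name it.
    """
    out = []
    for ch in text:
        if ch in SPECIAL_CHARS or ch == "\\":
            out.append("\\")
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Escape a string before handing it to a tool that reads the stream format.
    print(escape_literal("price: 5$ [approx] a/b"))
```

       Text without special characters passes through unchanged, so the helper can be applied to any
       input.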

OPTIONS

       -a, --analysis
               Tokenizes the text into surface forms (lexical units as they appear in texts) and delivers, for
               each  surface  form,  one  or  more  lexical  forms  consisting  of  lemma,  lexical category and
               morphological inflection information.  Tokenization is not straightforward due to the  existence,
               on  the  one  hand,  of  contractions,  and, on the other hand, of multi-word lexical units.  For
               contractions, the system reads in a single surface form and delivers the  corresponding  sequence
               of  lexical  forms.   Multi-word  surface  forms  are  analysed in a left-to-right, longest-match
               fashion.  Multi-word surface forms may  be  invariable  (such  as  a  multi-word  preposition  or
               conjunction) or inflected (for example, in Spanish, “echaban de menos”, “they missed”, is a form of
               the imperfect indicative tense of the verb “echar de menos”, “to  miss”).   Limited  support  for
               some kinds of discontinuous multi-word units is also available.  Analysis of single-word
               surface forms produces output such as:

               “cantar” → “^cantar/cantar<vblex><inf>$” or
               “daba” → “^daba/dar<vblex><pii><p1><sg>/dar<vblex><pii><p3><sg>$”.
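
               The shape of this output can be read back with a short Python sketch. It is only an
               illustration: the names Analysis and parse_lexical_unit are hypothetical, and the
               parser assumes that no escaped special characters occur inside the unit.

```python
import re
from typing import List, NamedTuple, Tuple


class Analysis(NamedTuple):
    lemma: str
    tags: List[str]


def parse_lexical_unit(unit: str) -> Tuple[str, List[Analysis]]:
    """Split '^surface/analysis/...$' into the surface form and its analyses.

    Simplification: escaped characters inside the unit are not handled.
    """
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("expected a unit of the form ^...$")
    surface, *readings = unit[1:-1].split("/")
    analyses = []
    for reading in readings:
        lemma = reading.split("<", 1)[0]           # text before the first tag
        tags = re.findall(r"<([^>]*)>", reading)   # every <tag> in order
        analyses.append(Analysis(lemma, tags))
    return surface, analyses


if __name__ == "__main__":
    surface, analyses = parse_lexical_unit(
        "^daba/dar<vblex><pii><p1><sg>/dar<vblex><pii><p3><sg>$"
    )
    for analysis in analyses:
        print(surface, analysis.lemma, analysis.tags)
```

               For “daba” the sketch yields two analyses with the same lemma, “dar”, differing only
               in the person tag.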

       -b, --bilingual
               Performs lexical transfer, attaching any trailing morphological symbols not specified in the
               dictionaries.  Like analysis mode, it supports multiple lexical forms in the target language
               for a given lexical form in the source language.  Typically works with the output of
               apertium-pretransfer(1).

       -o, --surf-bilingual
               As  with  -b,  but takes input from apertium-tagger(1) -p, with surface forms, and if the lexical
               form is not found in the bilingual dictionary, it outputs the surface form of the word.

       -c, --case-sensitive
               Use the literal case of the incoming characters.

       -d, --debugged-gen
               Morphological generation with all the stuff

       -e, --decompose-compounds
               Try to treat unknown words as compounds, and decompose them.

       -w, --dictionary-case
               Use the case information contained in the lexicon, instead of the surface case (only  applied  in
               analysis mode).

       -g, --generation
               Delivers  a  target-language  surface  form  for  each  target-language lexical form, by suitably
               inflecting it.

       -n, --non-marked-gen
               Morphological generation (like -g) but without unknown word marks (asterisk ‘*’).

       -l, --tagged-gen
               Morphological generation (like -g) but retaining part-of-speech tags.

       -p, --post-generation
               Performs orthographical operations such as contractions and apostrophations.  The  post-generator
               is  usually  dormant (just copies the input to the output) until a special alarm symbol contained
               in some target-language surface forms wakes it up to perform a particular  string  transformation
               if necessary; then it goes back to sleep.

       -s, --sao
               Input    processing    is   in   orthoepikon   (previously   sao)   annotation   system   format:
               https://orthoepikon.sf.net.

       -t, --transliteration
               Apply a transliteration dictionary.

       -i icx_file, --ignored-chars icx_file
               Ignores the characters specified in the file icx_file.

       -z, --null-flush
               Flush output on the null character.

       -C, --careful-case
               Use the dictionary case if present; otherwise use the surface case.

       -N N, --analyses N
               Output no more than N analyses (if the transducer is weighted, the N best analyses).

       -L N, --weight-classes N
               Output no more than N best weight classes (where analyses with equal weight constitute a class).
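
               The difference between limiting analyses and limiting weight classes can be sketched
               in Python. This is not the lttoolbox implementation: the function names are
               hypothetical, the weights are made-up data, and the sketch assumes that a lower
               weight is better, which this page does not state.

```python
from itertools import groupby
from typing import List, Tuple

Weighted = Tuple[str, float]  # (analysis, weight)


def n_best(analyses: List[Weighted], n: int) -> List[Weighted]:
    """Keep at most n analyses, best first (assumption: lower weight is better)."""
    return sorted(analyses, key=lambda a: a[1])[:n]


def n_best_classes(analyses: List[Weighted], n: int) -> List[Weighted]:
    """Keep every analysis that falls in one of the n best weight classes.

    Analyses with equal weight form one class, so a class may hold several analyses.
    """
    ranked = sorted(analyses, key=lambda a: a[1])
    kept: List[Weighted] = []
    for index, (_, group) in enumerate(groupby(ranked, key=lambda a: a[1])):
        if index >= n:
            break
        kept.extend(group)
    return kept


if __name__ == "__main__":
    # Hypothetical analyses A, B, C with invented weights; A and B tie.
    candidates = [("A", 1.0), ("B", 1.0), ("C", 2.5)]
    print(n_best(candidates, 1))          # one analysis
    print(n_best_classes(candidates, 1))  # the whole best class: A and B
```

               With tied weights, limiting by class keeps all tied analyses, whereas limiting by
               count cuts the tie arbitrarily.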

       -W, --show-weights
               Print final analysis weights (if any).

       -v, --version
               Display the version number.

       -h, --help
               Display this help.

FILES

       fst_file
               The input compiled dictionary.

COPYRIGHT

       Copyright © 2005, 2006 Universitat d'Alacant / Universidad de Alicante.  This is free software.  You  may
       redistribute    copies    of    it    under    the   terms   of   the   GNU   General   Public   License:
       https://www.gnu.org/licenses/gpl.html.

BUGS

       Many... lurking in the dark and waiting for you!

Apertium                                         March 23, 2006                                       LT-PROC(1)
