Provided by: apertium_3.8.3-1build2_amd64

NAME

     apertium-tagger — part-of-speech tagger and trainer for Apertium

SYNOPSIS

     apertium-tagger [options] -g serialized_tagger [input [output]]
     apertium-tagger [options] -r iterations corpus serialized_tagger
     apertium-tagger [options] -s iterations dictionary corpus tagger_spec serialized_tagger
                     tagged_corpus untagged_corpus
     apertium-tagger [options] -s 0 dictionary tagger_spec serialized_tagger tagged_corpus
                     untagged_corpus
     apertium-tagger [options] -s 0 -u model serialized_tagger tagged_corpus
     apertium-tagger [options] -t iterations dictionary corpus tagger_spec serialized_tagger

DESCRIPTION

     apertium-tagger trains the Apertium part-of-speech tagger or tags text with it, depending on
     the mode option given.  It reads from standard input only in tagging mode (-g, --tagger).
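
     As an illustration of tagging mode, the following sketch assembles the -g form shown in the
     SYNOPSIS.  The file names en.prob and analysed.txt are hypothetical, and the command is only
     run when the program and both files are actually present.

```shell
# Hypothetical file names: a serialized tagger and some morphologically analysed text.
tagger_data="en.prob"
analysed_text="analysed.txt"

# Tagging form from the SYNOPSIS:
#   apertium-tagger [options] -g serialized_tagger [input [output]]
tag_cmd="apertium-tagger -g $tagger_data"
echo "would run: $tag_cmd"

# With input and output omitted, the tagger reads stdin and writes stdout.
# $tag_cmd is left unquoted on purpose so the shell splits it into words.
if command -v apertium-tagger >/dev/null 2>&1 \
   && [ -f "$tagger_data" ] && [ -f "$analysed_text" ]; then
    $tag_cmd < "$analysed_text"
fi
```

     Because the tagger reads standard input in this mode, it would normally sit in a pipeline
     after a morphological analyser such as lt-proc(1).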

MODES

     -g, --tagger
             Tags input text using the Viterbi algorithm.

     -r n, --retrain n
             Retrains the model with n additional Baum-Welch iterations (unsupervised).  This
             option is incompatible with -u (--unigram).

     -s n, --supervised n
             Initializes parameters against a hand-tagged text (supervised) through the maximum
             likelihood estimate method, then performs n iterations of the Baum-Welch training
             algorithm (unsupervised).  The corpus argument can be omitted only when n = 0.

     -t n, --train n
             Initializes parameters through Kupiec's method (unsupervised), then performs n
             iterations of the Baum-Welch training algorithm (unsupervised).
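
     The three training modes can be sketched as command lines built from the SYNOPSIS.  All
     file names below are hypothetical placeholders for the entries in the FILES section, and
     the iteration counts are arbitrary; the commands are only printed, not executed.

```shell
# Hypothetical file names, one per entry in the FILES section.
dic="en.dic"            # dictionary
crp="en.crp"            # corpus
tsx="en.tsx"            # tagger_spec
prob="en.prob"          # serialized_tagger
tagged="en.tagged"      # tagged_corpus
untagged="en.untagged"  # untagged_corpus

# -t: Kupiec initialization, then 8 Baum-Welch iterations (8 is an arbitrary choice).
train_cmd="apertium-tagger -t 8 $dic $crp $tsx $prob"

# -s 0: maximum likelihood estimate from the hand-tagged corpus with no
# Baum-Welch iterations, so the corpus argument is left out.
sup_cmd="apertium-tagger -s 0 $dic $tsx $prob $tagged $untagged"

# -r: 2 additional Baum-Welch iterations on an existing serialized tagger.
retrain_cmd="apertium-tagger -r 2 $crp $prob"

printf '%s\n' "$train_cmd" "$sup_cmd" "$retrain_cmd"
```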

MODELS

     -u, --unigram=MODEL
             Use unigram algorithm MODEL from
             <https://coltekin.net/cagri/papers/trmorph-tools.pdf>

     -w, --sliding-window
             Use the Light Sliding Window algorithm.

     -x, --perceptron
             Use the averaged perceptron algorithm.
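
     A minimal sketch of the unigram training form from the SYNOPSIS follows.  The MODEL value
     "1" is an assumption, since this page does not list the accepted values, and both file
     names are hypothetical.  The command is only printed.

```shell
# Assumption: "1" is an accepted MODEL value; this page does not list the valid values.
model="1"
prob="en-unigram.prob"   # hypothetical serialized_tagger
tagged="en.tagged"       # hypothetical tagged_corpus

# Form from the SYNOPSIS:
#   apertium-tagger [options] -s 0 -u model serialized_tagger tagged_corpus
unigram_cmd="apertium-tagger -s 0 -u $model $prob $tagged"
echo "$unigram_cmd"
```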

OPTIONS

     -d, --debug
             Print error (if any) or debug messages while operating.

     -e, --skip-on-error
             Used with -xs to ignore certain types of errors in the training corpus.

     -f, --first
             Used in conjunction with -g (--tagger); makes the tagger output all lexical forms of
             each word, with the chosen one in first place (after the lemma).

     -m, --mark
             Mark disambiguated words.

     -p, --show-superficial
             Prints the superficial form of the word alongside the lexical form in the output
             stream.

     -z, --null-flush
             Used in conjunction with -g (--tagger) to flush the output after getting each null
             character.

     --help  Display a help message.
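
     The options that modify tagging can be combined with -g as in the sketch below.  The helper
     tag_with and the file name en.prob are hypothetical; the helper only prints the command
     line, placing the options before -g as in the SYNOPSIS.

```shell
# Hypothetical helper: print a tagging command with the given options before -g.
# en.prob is a placeholder for a serialized tagger file.
tag_with() {
    echo "apertium-tagger $* -g en.prob"
}

tag_with -z   # flush the output after each null character
tag_with -f   # output all lexical forms, chosen one first
tag_with -p   # print the superficial form alongside the lexical form
```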

FILES

     These are the kinds of files used with each option:

     dictionary
             Fully expanded dictionary file

     corpus  Training text corpus file

     tagger_spec
             Tagger specification file, in XML format

     serialized_tagger
             Tagger data file, built during training and used when tagging

     tagged_corpus
             Hand-tagged text corpus

     untagged_corpus
             Untagged text corpus: the morphological analysis of the hand-tagged corpus, used
             together with it by the -s option

     input   Input file, stdin by default

     output  Output file, stdout by default

SEE ALSO

     apertium(1), lt-comp(1), lt-expand(1), lt-proc(1)

     Copyright © 2005, 2006 Universitat d'Alacant / Universidad de Alicante.  This is free
     software.  You may redistribute copies of it under the terms of the GNU General Public
     License: https://www.gnu.org/licenses/gpl.html.

BUGS

     Many... lurking in the dark and waiting for you!