Provided by: apertium_3.6.1-1build2_amd64

NAME

       apertium-tagger — part-of-speech tagger and trainer for Apertium

SYNOPSIS

       apertium-tagger [options] -g serialized_tagger [input [output]]
       apertium-tagger [options] -r iterations corpus serialized_tagger
       apertium-tagger [options] -s iterations dictionary corpus tagger_spec serialized_tagger tagged_corpus untagged_corpus
       apertium-tagger [options] -s 0 dictionary tagger_spec serialized_tagger tagged_corpus untagged_corpus
       apertium-tagger [options] -s 0 -u model serialized_tagger tagged_corpus
       apertium-tagger [options] -t iterations dictionary corpus tagger_spec serialized_tagger

DESCRIPTION

       apertium-tagger trains or applies the Apertium part-of-speech tagger, depending on the options given.
       The command reads from standard input only when the --tagger (-g) option is used.
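       For instance, a minimal tagging invocation might look as follows (the model file name is an
       illustrative placeholder, and the input is a lexical unit in the Apertium stream format):

       ```shell
       # Tag morphologically analysed text from stdin with a trained model;
       # "xx-yy.prob" is a hypothetical serialized tagger data file.
       echo '^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$' | \
           apertium-tagger -g xx-yy.prob
       ```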

OPTIONS

       -t n, --train n
               Initializes  parameters through Kupiec's method (unsupervised), then performs n iterations of the
               Baum-Welch training algorithm (unsupervised).

       -s n, --supervised n
                Initializes parameters against a hand-tagged text (supervised) through the maximum likelihood
                estimate method, then performs n iterations of the Baum-Welch training algorithm (unsupervised).
                The corpus argument can be omitted only when n = 0.

       -r n, --retrain n
               Retrains the model with n additional Baum-Welch iterations (unsupervised).

       -g, --tagger
                Tags input text by means of the Viterbi algorithm.

       -p, --show-superficial
                Prints the superficial form of the word alongside the lexical form in the output stream.

       -f, --first
                Used in conjunction with -g (--tagger); makes the tagger output all lexical forms of each word,
                with the chosen one in first place (after the lemma).

       -d, --debug
               Print error (if any) or debug messages while operating.

       -m, --mark
               Mark disambiguated words.

       -h, --help
               Display a help message.

       -z, --null-flush
                Used in conjunction with -g to flush the output after reading each null character.

       -u, --unigram=MODEL
                Use unigram algorithm MODEL, described in <http://coltekin.net/cagri/papers/trmorph-tools.pdf>.

       -w, --sliding-window
                Use the Light Sliding Window algorithm.

       -x, --perceptron
                Use the averaged perceptron algorithm.

       -e, --skip-on-error
                Used in conjunction with -x and -s to ignore certain types of errors in the training corpus.
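       Putting the synopsis and options together, a sketch of a supervised training run might look as
       follows (all file names are illustrative placeholders):

       ```shell
       # Supervised training: maximum-likelihood initialization from the
       # hand-tagged corpus, followed by 8 Baum-Welch iterations.
       # Argument order follows the synopsis: dictionary, corpus, tagger
       # specification, output model, tagged corpus, untagged corpus.
       apertium-tagger -s 8 xx.dic xx.crp xx.tsx xx.prob \
           xx.tagged xx.untagged
       ```

       With -s 0, only the maximum-likelihood step is performed, and the corpus argument is omitted.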

FILES

       These are the kinds of files used with each option:

       dictionary
               Full expanded dictionary file

       corpus  Training text corpus file

       tagger_spec
               Tagger specification file, in XML format

       serialized_tagger
               Tagger data file, built in the training and used while tagging

       tagged_corpus
               Hand-tagged text corpus

       untagged_corpus
                Untagged text corpus: the morphological analysis of the hand-tagged corpus, used jointly with
                it by the -s option

       input   Input file, stdin by default

       output  Output file, stdout by default

SEE ALSO

       apertium(1), lt-comp(1), lt-expand(1), lt-proc(1)

COPYRIGHT

       Copyright © 2005, 2006 Universitat d'Alacant / Universidad de Alicante.  This is free software.  You may
       redistribute copies of it under the terms of the GNU General Public License:
       https://www.gnu.org/licenses/gpl.html.

BUGS

       Many... lurking in the dark and waiting for you!

Apertium                                         August 30, 2006                              APERTIUM-TAGGER(1)