Provided by: apertium_3.7.2-2build2_amd64 

NAME
apertium-tagger — part-of-speech tagger and trainer for Apertium
SYNOPSIS
apertium-tagger [options] -g serialized_tagger [input [output]]
apertium-tagger [options] -r iterations corpus serialized_tagger
apertium-tagger [options] -s iterations dictionary corpus tagger_spec serialized_tagger tagged_corpus untagged_corpus
apertium-tagger [options] -s 0 dictionary tagger_spec serialized_tagger tagged_corpus untagged_corpus
apertium-tagger [options] -s 0 -u model serialized_tagger tagged_corpus
apertium-tagger [options] -t iterations dictionary corpus tagger_spec serialized_tagger
DESCRIPTION
apertium-tagger trains the Apertium part-of-speech tagger or tags text with it, depending on the
options given. It reads from standard input only when the -g (--tagger) option is used.
MODES
-g, --tagger
Tags input text using the Viterbi algorithm.
-r n, --retrain n
Retrains the model with n additional Baum-Welch iterations (unsupervised). This option is
incompatible with -u (--unigram).
-s n, --supervised n
Initializes parameters from a hand-tagged text (supervised) by maximum likelihood estimation,
then performs n iterations of the Baum-Welch training algorithm (unsupervised).
The corpus argument (CRP) can be omitted only when n = 0.
-t n, --train n
Initializes parameters through Kupiec's method (unsupervised), then performs n iterations of the
Baum-Welch training algorithm (unsupervised).
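To make the relationship between supervised initialization (-s) and tagging (-g) concrete, here is a minimal sketch in Python. It is not the Apertium implementation: it assumes a first-order hidden Markov model, a three-sentence toy corpus, and hypothetical helper names (mle, viterbi). It estimates start, transition, and emission probabilities by relative frequency from hand-tagged sentences, then decodes the most likely tag sequence with the Viterbi algorithm. The Baum-Welch iterations are left out.

```python
import math
from collections import Counter, defaultdict

# Toy hand-tagged corpus (hypothetical data): lists of (word, tag) pairs.
TAGGED = [
    [("the", "det"), ("dog", "n"), ("barks", "v")],
    [("the", "det"), ("run", "n"), ("ends", "v")],
    [("dogs", "n"), ("run", "v")],
]

def mle(tagged):
    """Maximum likelihood estimates: relative frequencies of observed events."""
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for sentence in tagged:
        prev = None
        for word, tag in sentence:
            emit[tag][word] += 1
            if prev is None:
                start[tag] += 1
            else:
                trans[prev][tag] += 1
            prev = tag

    def norm(counter):
        total = sum(counter.values())
        return {key: count / total for key, count in counter.items()}

    return (norm(start),
            {t: norm(c) for t, c in trans.items()},
            {t: norm(c) for t, c in emit.items()})

def viterbi(words, tags, start, trans, emit):
    """Return the most probable tag sequence for a non-empty list of words."""
    def lp(p):
        # Log-probability; unseen events get probability zero.
        return math.log(p) if p > 0 else float("-inf")

    # Each column maps tag -> (best log-probability, backpointer).
    columns = [{t: (lp(start.get(t, 0)) + lp(emit.get(t, {}).get(words[0], 0)), None)
                for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max((columns[-1][p][0] + lp(trans.get(p, {}).get(t, 0)), p)
                              for p in tags)
            column[t] = (score + lp(emit.get(t, {}).get(word, 0)), prev)
        columns.append(column)

    # Follow backpointers from the best final tag.
    path = [max(tags, key=lambda t: columns[-1][t][0])]
    for column in reversed(columns[1:]):
        path.append(column[path[-1]][1])
    return path[::-1]

start, trans, emit = mle(TAGGED)
tags = sorted(emit)
print(viterbi(["the", "run", "ends"], tags, start, trans, emit))  # ['det', 'n', 'v']
print(viterbi(["dogs", "run"], tags, start, trans, emit))         # ['n', 'v']
```

The ambiguous word "run" comes out as a noun after a determiner and as a verb after a noun, which is the kind of contextual disambiguation a tagger performs. A real model would also need smoothing for unseen words and transitions.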
MODELS
-u, --unigram=MODEL
Use unigram algorithm MODEL from <https://coltekin.net/cagri/papers/trmorph-tools.pdf>.
-w, --sliding-window
Use the Light Sliding Window algorithm.
-x, --perceptron
Use the averaged perceptron algorithm.
OPTIONS
-d, --debug
Print error (if any) or debug messages while operating.
-e, --skip-on-error
Used with -xs to ignore certain types of errors in the training corpus.
-f, --first
Used in conjunction with -g (--tagger). Makes the tagger output all lexical forms of each word,
with the chosen one in first place (after the lemma).
-m, --mark
Mark disambiguated words.
-p, --show-superficial
Prints the superficial form of the word alongside the lexical form in the output stream.
-z, --null-flush
Used in conjunction with -g (--tagger) to flush the output after getting each null character.
--help Display a help message.
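As an illustration of what -z (--null-flush) means for a program reading the tagger's output, the following Python sketch splits a byte stream into segments at each null character. It is an assumption about how a consumer might be written, not part of apertium-tagger; the function name split_on_nul and the chunk size are hypothetical.

```python
import io

def split_on_nul(stream, chunk_size=4096):
    """Yield the byte segments of a binary stream, split at each NUL byte."""
    buffer = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Emit every complete segment currently held in the buffer.
        while b"\0" in buffer:
            segment, buffer = buffer.split(b"\0", 1)
            yield segment
    if buffer:
        # Trailing data that was not terminated by a NUL byte.
        yield buffer

# Stand-in for the tagger's output stream (hypothetical content).
demo = io.BytesIO(b"first\0second\0rest")
print(list(split_on_nul(demo, chunk_size=4)))
```

Because the tagger flushes after each null character, a reader like this can process each segment as soon as it arrives instead of waiting for end of input.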
FILES
These are the kinds of files used with each option:
dictionary
Full expanded dictionary file
corpus Training text corpus file
tagger_spec
Tagger specification file, in XML format
serialized_tagger
Tagger data file, built in the training and used while tagging
tagged_corpus
Hand-tagged text corpus
untagged_corpus
Untagged text corpus: the morphological analysis of the hand-tagged corpus, used jointly with it
by the -s option
input Input file, stdin by default
output Output file, stdout by default
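The argument order in the SYNOPSIS is easy to get wrong, so here is a small Python sketch that assembles the argument lists for the -t, -r, and -g forms. The helper names and every file name below are hypothetical; only the flags and the positional order come from this page. The sketch builds and prints the commands and does not run them.

```python
import shlex

def train_cmd(iterations, dictionary, corpus, tagger_spec, serialized_tagger):
    """-t iterations dictionary corpus tagger_spec serialized_tagger"""
    return ["apertium-tagger", "-t", str(iterations),
            dictionary, corpus, tagger_spec, serialized_tagger]

def retrain_cmd(iterations, corpus, serialized_tagger):
    """-r iterations corpus serialized_tagger"""
    return ["apertium-tagger", "-r", str(iterations), corpus, serialized_tagger]

def tag_cmd(serialized_tagger, input_path=None, output_path=None, options=()):
    """[options] -g serialized_tagger [input [output]]"""
    if output_path is not None and input_path is None:
        raise ValueError("an output file can only be given after an input file")
    cmd = ["apertium-tagger", *options, "-g", serialized_tagger]
    if input_path is not None:
        cmd.append(input_path)
    if output_path is not None:
        cmd.append(output_path)
    return cmd

# Hypothetical file names, for illustration only.
print(shlex.join(train_cmd(8, "lang.dic", "lang.crp", "lang.tsx", "lang.prob")))
print(shlex.join(retrain_cmd(2, "lang.crp", "lang.prob")))
print(shlex.join(tag_cmd("lang.prob", options=["-f"])))
```

With no input file, the -g command reads standard input and writes standard output, so each list can be handed to subprocess.run together with the desired stdin and stdout.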
SEE ALSO
apertium(1), lt-comp(1), lt-expand(1), lt-proc(1)
COPYRIGHT
Copyright © 2005, 2006 Universitat d'Alacant / Universidad de Alicante. This is free software. You may
redistribute copies of it under the terms of the GNU General Public License:
https://www.gnu.org/licenses/gpl.html.
BUGS
Many... lurking in the dark and waiting for you!
Apertium February 22, 2021 APERTIUM-TAGGER(1)