Provided by: kytea_0.4.6+dfsg-2_amd64
NAME
train-kytea — a model training tool for KyTea, a word segmentation/pronunciation estimation tool
SYNOPSIS
train-kytea [options]
DESCRIPTION
This manual page briefly documents the train-kytea command. It was written for the Debian distribution because the original program does not have a manual page. KyTea is a morphological analysis system based on pointwise predictors. It separates sentences into words, tags them, and predicts their pronunciations. KyTea is pronounced the same as "cutie". The train-kytea command trains the models used by kytea.
OPTIONS
A summary of options is included below. Default values, where applicable, are shown in parentheses. For sample invocations, see the EXAMPLES section below.

Input/Output Options:

-encode      The text encoding to be used (utf8/euc/sjis; default: utf8)
-full        A fully annotated training corpus (multiple possible)
-tok         A training corpus that is tokenized with no tags (multiple possible)
-part        A partially annotated training corpus (multiple possible)
-conf        A confidence annotated training corpus (multiple possible)
-feat        A file containing features generated by -featout
-dict        A dictionary file (one 'word/pron' entry per line, multiple possible)
-subword     A file of subword units; this enables unknown word PE (pronunciation estimation)
-model       The file to write the trained model to
-modtext     Print a text model (instead of the default binary)
-featout     Write the features used in training the model to this file

Model Training Options (basic):

-nows        Don't train a word segmentation model
-notags      Skip the training of tagging and perform only word segmentation
-global      Train the nth tag with a global model (good for POS, bad for PE)
-debug       The debugging level during training (0=silent, 1=normal, 2=detailed)

Model Training Options (for advanced users):

-charw       The character window to use for WS (3)
-charn       The character n-gram length to use for WS (3)
-typew       The character type window to use for WS (3)
-typen       The character type n-gram length to use for WS (3)
-dictn       Dictionary words longer than -dictn will be grouped together (4)
-unkn        Language model n-gram order for unknown words (3)
-eps         The epsilon stopping criterion for classifier training
-cost        The cost hyperparameter for classifier training
-nobias      Don't use a bias value in classifier training
-solver      The solver (1=SVM, 7=logistic regression, etc.; default: 1, see the LIBLINEAR documentation for more details)

Format Options (for advanced users):

-wordbound   The separator for words in full annotation (" ")
-tagbound    The separator for tags in full/partial annotation ("/")
-elembound   The separator for candidates in full/partial annotation ("&")
-unkbound    Indicates an unannotated boundary in partial annotation (" ")
-skipbound   Indicates a skipped boundary in partial annotation ("?")
-nobound     Indicates the non-existence of a boundary in partial annotation ("-")
-hasbound    Indicates the existence of a boundary in partial annotation ("|")
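EXAMPLES
The following invocations are illustrative sketches only; the file names corpus.full, corpus.part, words.dict and mymodel.mod are placeholders and are not shipped with the package. In a typical fully annotated corpus, the default separators described under Format Options are used: words are separated by spaces and the word, tag and pronunciation of each token are separated by "/". Assuming such a corpus and an optional dictionary, a model can be trained with:

    train-kytea -full corpus.full -dict words.dict -model mymodel.mod

A partially annotated corpus marks the boundary between each pair of adjacent characters with "|" (boundary exists), "-" (no boundary), " " (unannotated) or "?" (skipped); an illustrative line might look like:

    こ-れ|は|日-本-語|で-す

Such a corpus is passed with -part instead of -full:

    train-kytea -part corpus.part -model mymodel.mod

To write a human-readable text model instead of the default binary format, add -modtext to either invocation.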
AUTHOR
This manual page was written by Koichi Akabe <vbkaisetsu@gmail.com> for the Debian system (and may be used by others). Permission is granted to copy, distribute and/or modify this document under the terms of the GNU General Public License, Version 2 or any later version published by the Free Software Foundation. On Debian systems, the complete text of the GNU General Public License can be found in /usr/share/common-licenses/GPL.