Ubuntu Manpage: hfst-tokenize - perform matching/lookup on text streams

NAME

       hfst-tokenize - perform matching/lookup on text streams

SYNOPSIS

       hfst-tokenize [--segment | --xerox | --cg | --giella-cg] [OPTIONS...] RULESET

DESCRIPTION

       Perform matching/lookup on text streams.

   Common options:
       -h, --help
              Print help message

       -V, --version
              Print version info

       -v, --verbose
              Print verbosely while processing

       -q, --quiet
              Only print fatal errors and requested output

       -s, --silent
              Alias of --quiet

       -n, --newline
              Newline as input separator (default is blank line)

       -a, --print-all
              Print nonmatching text

       -w, --print-weight
              Print weights (overrides earlier -W option)

       -W, --no-weights
              Don't print weights (default; overrides an earlier -w option, including the -w implied by -g)

       -m, --tokenize-multichar
              Tokenize multicharacter symbols (by default only one UTF-8 character is tokenized at a time,
              regardless of what is present in the alphabet)

       -b, --beam=B
              Output only analyses whose weight is within B of the best result

       -tS, --time-cutoff=S
              Limit search after having used S seconds per input

       -lN, --weight-classes=N
              Output no more than N best weight classes (where analyses with equal weight constitute a class)

       -u, --unique
              Remove duplicate analyses

       -z, --segment
              Segmenting / tokenization mode (default)

       -i, --space-separated
              Tokenization with one sentence per line, space-separated tokens

       -x, --xerox
              Xerox output

       -c, --cg
              Constraint Grammar output

       -S, --superblanks
              Ignore contents of unescaped [] (cf. apertium-destxt); flush on NUL

       -g, --giella-cg
              CG format used in Giella infrastructure (implies -w and -l2, treats @PMATCH_INPUT_MARK@ as
              subreading separator, expects tags to be Multichar_symbols, flush on NUL)

       -C, --conllu
              CoNLL-U format

       -f, --finnpos
              FinnPos output

       -L, --visl
              VISL input and output (implies -W, handles <s> as blocks and <STYLE> inline)
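
       The combined effect of -u, -b and -l can be pictured as a small filter over a list of weighted
       analyses. The Python sketch below is one plausible reading of the option descriptions above, not
       the tool's actual implementation. It assumes that lower weights are better, that a weight class
       is formed by exact weight equality, and that duplicates are dropped before the other filters
       apply. The name filter_analyses and its parameters are hypothetical.

```python
from itertools import groupby


def filter_analyses(analyses, beam=None, max_classes=None, unique=False):
    """Filter (analysis, weight) pairs; lower weight is assumed to be better.

    beam        -- keep analyses whose weight is within `beam` of the best (cf. -b)
    max_classes -- keep at most this many distinct weight values (cf. -l)
    unique      -- drop exact duplicate (analysis, weight) pairs (cf. -u)
    """
    if unique:
        seen = set()
        deduped = []
        for pair in analyses:
            if pair not in seen:
                seen.add(pair)
                deduped.append(pair)
        analyses = deduped

    # Stable sort: analyses with equal weight keep their input order.
    ranked = sorted(analyses, key=lambda pair: pair[1])
    if not ranked:
        return []

    if beam is not None:
        best = ranked[0][1]
        ranked = [pair for pair in ranked if pair[1] - best <= beam]

    if max_classes is not None:
        kept = []
        # Consecutive runs of equal weight form one weight class.
        classes = groupby(ranked, key=lambda pair: pair[1])
        for class_index, (_, members) in enumerate(classes):
            if class_index >= max_classes:
                break
            kept.extend(members)
        ranked = kept

    return ranked
```

       Under these assumptions, a beam of 2.0 applied to weights 1.0, 1.0, 2.5 and 4.0 keeps the first
       three analyses, while a limit of one weight class keeps only the two analyses of weight 1.0.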

       Use standard streams for input and output (for now).
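
       Because input and output go through the standard streams, the program is easy to drive from a
       script. The following Python sketch builds a command line from the synopsis and the long options
       listed above, then pipes text through it. The helper names build_argv and tokenize and the
       ruleset file name tokeniser.pmhfst are hypothetical, and the sketch assumes hfst-tokenize is
       installed and on PATH.

```python
import subprocess

# Output modes taken from the SYNOPSIS line.
MODE_FLAGS = {
    "segment": "--segment",
    "xerox": "--xerox",
    "cg": "--cg",
    "giella-cg": "--giella-cg",
}


def build_argv(ruleset, mode="segment", beam=None, time_cutoff=None,
               weight_classes=None, unique=False, print_weights=False):
    """Assemble an hfst-tokenize argument list from the documented long options."""
    argv = ["hfst-tokenize", MODE_FLAGS[mode]]
    if print_weights:
        argv.append("--print-weight")
    if unique:
        argv.append("--unique")
    if beam is not None:
        argv.append(f"--beam={beam}")
    if time_cutoff is not None:
        argv.append(f"--time-cutoff={time_cutoff}")
    if weight_classes is not None:
        argv.append(f"--weight-classes={weight_classes}")
    argv.append(ruleset)  # RULESET is the final positional argument
    return argv


def tokenize(text, ruleset, **options):
    """Send text on standard input and return what the tool writes to standard output."""
    result = subprocess.run(
        build_argv(ruleset, **options),
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

       For instance, build_argv("tokeniser.pmhfst", mode="cg", beam=2.5, unique=True) yields the
       arguments hfst-tokenize --cg --unique --beam=2.5 tokeniser.pmhfst.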

REPORTING BUGS

       Report bugs to <hfst-bugs@helsinki.fi> or directly to our bug tracker at:
       <https://github.com/hfst/hfst/issues>

       hfst-tokenize home page: <https://github.com/hfst/hfst/wiki/HfstTokenize>
       General help using HFST software: <https://github.com/hfst/hfst/wiki>

COPYRIGHT

       Copyright © 2017 University of Helsinki, License GPLv3: GNU GPL version 3
       <http://gnu.org/licenses/gpl.html>
       This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent
       permitted by law.

HFST                                               August 2018                                  HFST-TOKENIZE(1)