Provided by: hfst_3.13.0~r3461-2_amd64

NAME

       hfst-tokenize - perform matching/lookup on text streams

SYNOPSIS

       hfst-tokenize [--segment | --xerox | --cg | --giella-cg] [OPTIONS...] RULESET

DESCRIPTION

       Perform matching/lookup on text streams using the ruleset given as RULESET.

   Common options:
       -h, --help
              Print help message

       -V, --version
              Print version info

       -v, --verbose
              Print verbosely while processing

       -q, --quiet
              Only print fatal errors and requested output

       -s, --silent
              Alias of --quiet

       -n, --newline
              Newline as input separator (default is blank line)

       -a, --print-all
              Print nonmatching text

       -w, --print-weight
              Print weights

       -m, --tokenize-multichar
              Tokenize multicharacter symbols (by default only one UTF-8 character is tokenized at a time
              regardless of what is present in the alphabet)

       -tS, --time-cutoff=S
              Limit search after having used S seconds per input

       -lN, --weight-classes=N
              Output no more than N best weight classes (where analyses with equal weight constitute a class)

       -u, --unique
              Remove duplicate analyses

       -z, --segment
              Segmenting / tokenization mode (default)

       -i, --space-separated
              Tokenization with one sentence per line, space-separated tokens

       -x, --xerox
              Xerox output

       -c, --cg
              Constraint Grammar output

       -S, --superblanks
              Ignore contents of unescaped [] (cf. apertium-destxt); flush on NUL

       -g, --giella-cg
              CG format used in Giella infrastructure  (implies  -l2,  treats  @PMATCH_INPUT_MARK@  as  subreading
              separator, expects tags to start or end with +, flush on NUL)

       -C, --conllu
              CoNLL-U format

       -f, --finnpos
              FinnPos output

       Input is read from standard input and output is written to standard output (for now).
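
EXAMPLES

       Because the tool reads and writes the standard streams, it is normally used in a pipeline. The sketch
       below is a minimal illustration, not an official recipe: the ruleset file name tokeniser.pmhfst and the
       variables RULESET and HFST_TOKENIZE are assumptions made for this example. Only options listed above
       are used.

```shell
#!/bin/sh
# Sketch only: RULESET and HFST_TOKENIZE are illustrative names, not part of hfst-tokenize.
RULESET="${RULESET:-tokeniser.pmhfst}"           # hypothetical path to a compiled ruleset
HFST_TOKENIZE="${HFST_TOKENIZE:-hfst-tokenize}"  # binary to run

# Read text on stdin, write Constraint Grammar output with weights on stdout,
# dropping duplicate analyses.
tokenize_cg() {
    "$HFST_TOKENIZE" --cg --print-weight --unique "$RULESET"
}

# Same ruleset in the default segmenting mode, with newline as the input
# separator and search limited after 2 seconds per input.
tokenize_lines() {
    "$HFST_TOKENIZE" --segment --newline --time-cutoff=2 "$RULESET"
}

# Only run when both the tool and the ruleset are actually present.
if command -v "$HFST_TOKENIZE" >/dev/null 2>&1 && [ -f "$RULESET" ]; then
    printf 'A short test sentence.\n' | tokenize_cg
    printf 'first line\nsecond line\n' | tokenize_lines
fi
```

       Wrapping each invocation in a function keeps the chosen output mode and its options in one place, so
       the same ruleset can be reused with different modes without retyping the option list.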

REPORTING BUGS

       Report     bugs     to    <hfst-bugs@helsinki.fi>    or    directly    to    our    bug    tracker    at:
       <https://github.com/hfst/hfst/issues>

       hfst-tokenize home page: <https://kitwiki.csc.fi/twiki/bin/view/KitWiki//HfstTokenize>
       General help using HFST software: <https://kitwiki.csc.fi/twiki/bin/view/KitWiki//HfstHome>

       Copyright   ©    2017    University    of    Helsinki,    License    GPLv3:    GNU    GPL    version    3
       <http://gnu.org/licenses/gpl.html>
       This  is  free software: you are free to change and redistribute it.  There is NO WARRANTY, to the extent
       permitted by law.