Provided by: hfst_3.16.0-4build2_amd64

NAME

       hfst-tokenize - perform matching/lookup on text streams

SYNOPSIS

       hfst-tokenize [--segment | --xerox | --cg | --giella-cg] [OPTIONS...] RULESET

DESCRIPTION

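       Perform matching/lookup on text streams: input text is matched against the compiled
       ruleset RULESET, and the resulting tokens (and, in output modes that show them, their
       analyses) are written in the selected output format.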
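       The basic idea of matching text against a ruleset can be illustrated with a toy
       tokenizer. The sketch below is a minimal Python model, assuming leftmost-longest
       matching and using a plain word list in place of RULESET; the real tool works on a
       compiled transducer, not a Python set. The function tokenize and its print_all
       parameter are hypothetical; print_all only mimics the idea behind -a (print
       nonmatching text).

```python
from typing import Iterable, List, Optional, Tuple


def tokenize(text: str, lexicon: Iterable[str],
             print_all: bool = False) -> List[Tuple[str, bool]]:
    """Toy leftmost-longest matcher (hypothetical, not hfst-tokenize's code).

    Returns (token, matched) pairs. At each position the longest lexicon
    entry wins; unmatched characters are collected and, only when
    print_all is True, emitted as (text, False).
    """
    entries = set(lexicon)
    max_len = max((len(e) for e in entries), default=0)
    out: List[Tuple[str, bool]] = []
    pending = ""  # current run of nonmatching characters
    i = 0
    while i < len(text):
        match: Optional[str] = None
        # Try the longest candidate first, shrinking one character at a time.
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in entries:
                match = text[i:i + length]
                break
        if match is not None:
            if pending and print_all:
                out.append((pending, False))
            pending = ""
            out.append((match, True))
            i += len(match)
        else:
            pending += text[i]
            i += 1
    if pending and print_all:  # flush trailing nonmatching text
        out.append((pending, False))
    return out
```

       With the lexicon ["New", "New York", "York", "is", "big"], the input "New York is big"
       yields "New York" as one token rather than "New" followed by "York", because the
       longest entry starting at a position wins; with print_all=True the spaces between
       tokens are also returned as nonmatching text.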

   Common options:
       -h, --help
              Print help message

       -V, --version
              Print version info

       -v, --verbose
              Print verbosely while processing

       -q, --quiet
              Only print fatal errors and requested output

       -s, --silent
              Alias of --quiet

       -n, --newline
              Use newline as the input separator (default: blank line)

       -a, --print-all
              Print nonmatching text

       -w, --print-weight
              Print weights (overrides earlier -W option)

       -W, --no-weights
              Don't print weights (default; overrides an earlier -w, including the -w implied by -g)

       -m, --tokenize-multichar
              Tokenize multicharacter symbols (by default, only one UTF-8 character is tokenized
              at a time, regardless of what is present in the alphabet)

       -b, --beam=B
              Output only analyses whose weight is within B of the best result

       -tS, --time-cutoff=S
              Limit the search to S seconds per input

       -lN, --weight-classes=N
              Output no more than N best weight classes (where analyses with equal weight
              constitute a class)

       -u, --unique
              Remove duplicate analyses

       -z, --segment
              Segmenting / tokenization mode (default)

       -i, --space-separated
              Tokenization with one sentence per line, space-separated tokens

       -x, --xerox
              Xerox output

       -c, --cg
              Constraint Grammar output

       -S, --superblanks
              Ignore contents of unescaped [] (cf. apertium-destxt); flush on NUL

       -g, --giella-cg
              CG format used in Giella infrastructure (implies -w and -l2, treats
              @PMATCH_INPUT_MARK@ as subreading separator, expects tags to be Multichar_symbols,
              flushes on NUL)

       -C, --conllu
              CoNLL-U format

       -f, --finnpos
              FinnPos output

       -L, --visl
              VISL input and output (implies -W, handles <s> as blocks and <STYLE> inline)
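       The result-filtering options -u, -b and -l can be pictured as successive passes over
       the analyses found for one input. The Python sketch below is illustrative only and
       rests on several assumptions not taken from the hfst-tokenize source: a lower weight
       is better (HFST's usual convention), the passes run in the order unique, beam, weight
       classes, and -u compares analyses by their string, keeping the best-weighted copy.
       filter_analyses is a hypothetical name. Under this model, -g (which implies -l2)
       would keep only the two best weight classes.

```python
from typing import List, Optional, Tuple

# An analysis is modelled as (analysis string, weight). Lower weight = better,
# following HFST's usual tropical-weight convention (an assumption here).
Analysis = Tuple[str, float]


def filter_analyses(analyses: List[Analysis],
                    beam: Optional[float] = None,
                    weight_classes: Optional[int] = None,
                    unique: bool = False) -> List[Analysis]:
    """Hypothetical model of -u, -b and -l; not HFST's actual implementation."""
    # Best (lowest weight) first; sorted() is stable, so ties keep input order.
    result = sorted(analyses, key=lambda a: a[1])

    if unique:  # -u: drop repeated analysis strings, keeping the best copy
        seen = set()
        deduped: List[Analysis] = []
        for text, weight in result:
            if text not in seen:
                seen.add(text)
                deduped.append((text, weight))
        result = deduped

    if beam is not None and result:  # -b B: keep weights within B of the best
        best = result[0][1]
        result = [a for a in result if a[1] <= best + beam]

    if weight_classes is not None:  # -l N: keep the N best distinct weights
        kept: List[Analysis] = []
        classes: List[float] = []
        for a in result:
            if a[1] not in classes:
                if len(classes) == weight_classes:
                    break  # input is sorted, so every later weight is also new
                classes.append(a[1])
            kept.append(a)
        result = kept

    return result
```

       For example, given the analyses ("cat+N+Pl", 2.0) twice, ("cat+V+Sg3", 3.5),
       ("cats+N", 5.0) and ("cat+N+Gen", 3.5), calling filter_analyses with beam=2.0 and
       unique=True keeps one copy of the 2.0 analysis plus both 3.5 analyses, and drops the
       5.0 one because it lies more than 2.0 above the best weight.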

       Use standard streams for input and output (for now).
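       Because input arrives as a stream, it has to be split into input units: by default at
       blank lines, or at every newline with -n. A minimal Python sketch of that splitting
       follows; read_chunks is a hypothetical helper, and treating whitespace-only lines as
       blank is an assumption rather than documented behaviour.

```python
import io
from typing import Iterator, List, TextIO


def read_chunks(stream: TextIO, newline_separated: bool = False) -> Iterator[str]:
    """Yield input units from a text stream (hypothetical helper).

    Default: units are separated by blank lines (assumed here to include
    whitespace-only lines). newline_separated=True models -n: each line is a unit.
    """
    if newline_separated:
        for line in stream:
            yield line.rstrip("\n")
        return

    buffer: List[str] = []
    for line in stream:
        if line.strip() == "":  # a blank line closes the current unit
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        else:
            buffer.append(line.rstrip("\n"))
    if buffer:  # flush the last unit at end of input
        yield "\n".join(buffer)


if __name__ == "__main__":
    sample = "first line\nsecond line\n\nthird\n"
    print(list(read_chunks(io.StringIO(sample))))
    print(list(read_chunks(io.StringIO(sample), newline_separated=True)))
```

       In a real pipeline the stream argument would be sys.stdin; io.StringIO is used here
       only so the example runs without piped input.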

REPORTING BUGS

       Report bugs to <hfst-bugs@helsinki.fi> or directly to our bug tracker at:
       <https://github.com/hfst/hfst/issues>

       hfst-tokenize home page: <https://github.com/hfst/hfst/wiki/HfstTokenize>
       General help using HFST software: <https://github.com/hfst/hfst/wiki>

       Copyright © 2017 University of Helsinki, License GPLv3: GNU GPL version 3
       <http://gnu.org/licenses/gpl.html>
       This is free software: you are free to change and redistribute it. There is NO WARRANTY,
       to the extent permitted by law.