Ubuntu Manpage: frog - Dutch Natural Language Toolkit

NAME

       frog - Dutch Natural Language Toolkit

SYNOPSIS

       frog [options]

       frog -t test-file

DESCRIPTION

       frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch.
       The current version tokenizes, tags, lemmatizes, and morphologically segments word tokens in Dutch
       text files, adds IOB chunks, and assigns a dependency graph to each sentence.
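
       A minimal sketch of a typical run, assuming a hypothetical plain-text Dutch file named sample.txt in
       the current directory. It uses only the -t and -o options documented under OPTIONS; the file names
       and the basic_cmd variable are illustrative. The script prints the command line and runs it only
       when frog is installed and the input file exists.

```shell
#!/bin/sh
# Hypothetical file names, chosen only for this illustration.
input="sample.txt"
output="sample.frog.out"

# Assemble the command: -t names the input file, -o the output file.
set -- frog -t "$input" -o "$output"
basic_cmd="$*"
echo "$basic_cmd"

# Run it only when frog is installed and the input file exists.
if command -v frog >/dev/null 2>&1 && [ -f "$input" ]; then
    "$@"
fi
```

       Keeping the arguments in the positional parameters and expanding them with "$@" preserves file
       names that contain spaces.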

OPTIONS

       -c <configfile>
              set the configuration using 'configfile'

       --debug=<module><level>,...
              set the debug level per module: Tokenizer (t), Lemmatizer (l), Morphological Analyzer (a),
              Chunker (c), Multi-Word Units (m), Named Entity Recognition (n), or Parser (p).

              (e.g. --debug=l5,n3 sets the level for the Lemmatizer to 5 and for the NER to 3)

       -d <level>
              set the global debug level (for all modules).

       --deep-morph
              generate a deep morphological analysis and add it to the XML. This also includes compound
              information. The default tabbed output is also more detailed in the Morpheme field.

       -e <encoding>
              set input encoding. (default UTF8)

       -h
              give some help

       --keep-parser-files=[yes|no]
              keep the intermediate files from the parser. Last sentence only!

       -n
              assume the input file holds one sentence per line.

              Very useful when running interactively; otherwise an empty line is needed to signal end of input.

       -o <file>
              send  output  to  'file'  instead  of  stdout.  Defaults  to the name of the inputfile with '.out'
              appended.

       --outputdir <dir>
              send all output to 'dir' instead of stdout.  Creates  filenames  from  the  inputfilename(s)  with
              '.out' appended.

       --skip=[aclmnpt]
              skip parts of the process: Tokenizer (t), Chunker (c), Lemmatizer (l), Morphological Analyzer (a),
              Multi‐Word unit (m), Named‐Entity recognizer (n) or Parser (p)

       -Q
              Enable quote detection in the tokenizer. May wreak havoc!

       -S <port>
              Run a server on 'port'

       -t <file>
              process 'file'.

              When -t is omitted, Frog will run in interactive mode.

       -x <xmlfile>
              process 'xmlfile', which is expected to be in FoLiA format. If 'xmlfile' is empty and
              --testdir=<dir> is provided, all '.xml' files in 'dir' will be processed as FoLiA XML.

       --textclass=<cls>
              When -x is given, use 'cls' to find text in the FoLiA document(s).

       --testdir=<dir>
              process all files in 'dir'. When the input mode is XML, only '.xml' files are taken from 'dir'.
              See also --outputdir.

       --tmpdir=<dir>
              location to store intermediate files. Default /tmp.

       --threads=<n>
              use a maximum of 'n' threads. The default is to take whatever is needed. In server mode, frog
              always runs on 1 thread per session.

       -V or --version
              show version info

       --xmldir=<dir>
              generate FoLiA XML output and send it to 'dir'. Creates  filenames  from  the  inputfilename  with
              '.xml' appended. (Except when it already ends with '.xml')

       -X <file>
              generate  FoLiA  XML  output  and send it to 'file'. Defaults to the name of the inputfile(s) with
              '.xml' appended. (Except when it already ends with '.xml')

       --id=<id>
              When -X for FoLiA is given, use 'id' to give the document an ID.
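
EXAMPLES

       The options above can be combined as sketched below. This is an illustration, not a tested recipe:
       the directory names (corpus, folia), the file name sample.txt, the document ID sample-doc and the
       variables batch_cmd and single_cmd are hypothetical, and combining several module letters in one
       --skip value (here m and p) is an assumption based on the --skip=[aclmnpt] syntax. Each command is
       printed, and is run only when frog is installed and its input exists.

```shell
#!/bin/sh
# Hypothetical names, chosen only for this illustration.
indir="corpus"
xmldir="folia"
doc="sample.txt"

# Batch run: process every file in $indir, write FoLiA XML to $xmldir,
# and skip the multi-word unit module (m) and the parser (p).
set -- frog --testdir="$indir" --xmldir="$xmldir" --skip=mp
batch_cmd="$*"
echo "$batch_cmd"
if command -v frog >/dev/null 2>&1 && [ -d "$indir" ]; then
    # Create the output directory up front in case frog expects it to exist.
    mkdir -p "$xmldir"
    "$@"
fi

# Single file: one sentence per line (-n), FoLiA XML written to a named
# file (-X) with an explicit document ID (--id).
set -- frog -n -t "$doc" -X "${doc%.txt}.xml" --id=sample-doc
single_cmd="$*"
echo "$single_cmd"
if command -v frog >/dev/null 2>&1 && [ -f "$doc" ]; then
    "$@"
fi
```

       Skipping the parser is a natural choice for large batches when dependency graphs are not needed,
       since it removes one processing step per sentence.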

BUGS

       likely

AUTHORS

       Maarten van Gompel proycon@anaproy.nl

       Ko van der Sloot Timbl@uvt.nl

       Antal van den Bosch Timbl@uvt.nl
