Ubuntu Manpage: frog - Dutch Natural Language Toolkit

NAME

       frog - Dutch Natural Language Toolkit

SYNOPSIS

       frog [options]

       frog -t test-file

DESCRIPTION

       frog is an integration of memory-based natural language processing (NLP) modules
       developed for Dutch. The current version tokenizes, tags, lemmatizes, and
       morphologically segments word tokens in Dutch text files, adds IOB chunks, and assigns a
       dependency graph to each sentence.

OPTIONS

       -c <configfile>
              set the configuration using 'configfile'

       --debug=<module><level>,...
              set the debug level per module. Modules are Tokenizer (t), Lemmatizer (l),
              Morphological Analyzer (a), Chunker (c), Multi-Word Units (m), Named Entity
              Recognition (n), or Parser (p).

              (e.g. --debug=l5,n3 sets the level for the Lemmatizer to 5 and for the NER to 3)
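
              When frog is driven from a script, the --debug value can be assembled from a
              mapping of module letters to levels. The sketch below is only an illustration:
              debug_option is a hypothetical helper, not part of frog, and the letters are
              the ones listed above.

```python
# Module letters as listed for --debug in this page.
MODULES = {
    "t": "Tokenizer",
    "l": "Lemmatizer",
    "a": "Morphological Analyzer",
    "c": "Chunker",
    "m": "Multi-Word Units",
    "n": "Named Entity Recognition",
    "p": "Parser",
}


def debug_option(levels):
    """Build a --debug argument from a {module letter: level} mapping.

    Hypothetical helper for illustration; it only formats the string.
    """
    parts = []
    for letter, level in levels.items():
        if letter not in MODULES:
            raise ValueError(f"unknown module letter: {letter!r}")
        parts.append(f"{letter}{int(level)}")
    return "--debug=" + ",".join(parts)


print(debug_option({"l": 5, "n": 3}))  # prints --debug=l5,n3
```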

       -d <level>
              set global debug level. (for all modules)

       --deep-morph
              generate a deep morphological analysis and add it to the XML. This also includes
              compound information. The default tabbed output is also more detailed in the
              Morpheme field.

       -e <encoding>
              set input encoding. (default UTF8)

       -h or --help
              give some help

       --keep-parser-files=[yes|no]
              keep the intermediate files from the parser. Only the files for the last sentence
              are kept!

       --language='commaseparatedlistoflanguages'
              Set the languages to work on. This parameter is also passed to the tokenizer.   The
              strings are assumed to be ISO 639-2 codes.

              The first language in the list will be the default; unspecified languages are
              assumed to be that default.

              e.g. --language=nld,eng,por means: detect Dutch, English and Portuguese, with Dutch
              being the default.
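
              A minimal sketch of composing this value in a wrapper script, with the default
              language placed first as described above. language_option is a hypothetical
              helper, and the check for three lowercase letters is an assumption based on the
              shape of ISO 639-2 codes, not on frog's own validation.

```python
def language_option(default, *others):
    """Return a --language argument with `default` as the first code.

    Hypothetical helper; it only checks that each code looks like a
    three-letter lowercase ISO 639-2 code.
    """
    codes = [default, *others]
    for code in codes:
        if not (len(code) == 3 and code.isalpha() and code.islower()):
            raise ValueError(f"not a three-letter lowercase code: {code!r}")
    return "--language=" + ",".join(codes)


# Dutch is the default; English and Portuguese are also detected.
print(language_option("nld", "eng", "por"))  # prints --language=nld,eng,por
```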

       -n
              assume the input file holds one sentence per line.

              Very useful when running interactively; otherwise an empty line is needed to signal
              end of input.

       --nostdout
              suppress the columned output to stdout. (when no outputfile is specified with -o
              or --outputdir)

              Especially useful when XML output is specified with -X or --xmldir.

       -o <file>
              send output to 'file' instead of stdout. Defaults to the name of the inputfile with
              '.out' appended.

       --outputdir <dir>
              send  all  output  to  'dir'  instead  of  stdout.  Creates  filenames   from   the
              inputfilename(s) with '.out' appended.

       --retry
              assume a re-run on the same input file(s). Frog will only process those files that
              haven't been processed yet. This is accomplished by looking at the output file
              names. (so this has no effect if neither -o, --outputdir, -X nor --xmldir is used)
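
              The idea of deciding by output file name can be sketched as follows. This is not
              frog's implementation: pending_inputs is a hypothetical helper, and it assumes
              the '.out' naming rule given for --outputdir.

```python
import tempfile
from pathlib import Path


def pending_inputs(input_files, output_dir):
    """Return the inputs for which no '<name>.out' exists in output_dir."""
    output_dir = Path(output_dir)
    pending = []
    for name in input_files:
        expected = output_dir / (Path(name).name + ".out")
        if not expected.exists():
            pending.append(name)
    return pending


# Demonstration with a throwaway directory and hypothetical file names.
with tempfile.TemporaryDirectory() as out_dir:
    (Path(out_dir) / "a.txt.out").write_text("already processed\n")
    print(pending_inputs(["a.txt", "b.txt"], out_dir))  # prints ['b.txt']
```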

       --skip=[aclmnpt]
              skip parts of the process: Tokenizer (t), Chunker (c), Lemmatizer (l),
              Morphological Analyzer (a), Multi-Word unit (m), Named-Entity recognizer (n) or
              Parser (p)
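
              A small sketch of building the --skip value from readable module names. The
              names used as dictionary keys and the skip_option helper are hypothetical; only
              the single-letter codes come from the list above, and combining several letters
              in one value is assumed from the [aclmnpt] notation.

```python
# Hypothetical readable names mapped to the letters documented for --skip.
SKIP_LETTERS = {
    "tokenizer": "t",
    "chunker": "c",
    "lemmatizer": "l",
    "morph": "a",
    "mwu": "m",
    "ner": "n",
    "parser": "p",
}


def skip_option(*modules):
    """Return a --skip argument for the given module names.

    Duplicates are dropped and letters are sorted for a stable result.
    """
    letters = sorted({SKIP_LETTERS[name] for name in modules})
    return "--skip=" + "".join(letters)


# Skip the parser and the multi-word unit detector.
print(skip_option("parser", "mwu"))  # prints --skip=mp
```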

       -Q
              Enable quote detection in the tokenizer. May wreak havoc!

       -S <port>
              Run a server on 'port'

       -t <file>
              process 'file'.

              When -t is omitted, Frog will run in interactive mode.
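
              For non-interactive use from a script, the -t and -o options can be combined
              into an argument list. The sketch below assumes a frog executable on the PATH
              and uses hypothetical file names; frog_command and run_frog are illustrative
              helpers, not part of frog.

```python
import shutil
import subprocess


def frog_command(input_file, output_file=None):
    """Assemble a frog argument list from the -t and -o options."""
    cmd = ["frog", "-t", str(input_file)]
    if output_file is not None:
        cmd += ["-o", str(output_file)]
    return cmd


def run_frog(cmd):
    """Run the command if frog is installed; return None otherwise."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, check=False)


# Hypothetical file names; nothing is executed by building the list.
cmd = frog_command("sample.txt", "sample.txt.out")
print(" ".join(cmd))  # prints frog -t sample.txt -o sample.txt.out
```

              Calling run_frog(cmd) would then start the actual run when frog is available.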

       -x <xmlfile>
              process 'xmlfile', which must be in FoLiA format. If 'xmlfile' is empty and
              --testdir=<dir> is provided, all '.xml' files in 'dir' will be processed as
              FoLiA XML.

       --textclass=<cls>
              When -x is given, use 'cls' to find AND store text in the FoLiA document(s). Using
              --inputclass and --outputclass is in general a better choice.

       --inputclass=<cls>
              use 'cls' to find text in the FoLiA input document(s).

       --outputclass=<cls>
              use 'cls' to output text in the FoLiA input document(s). Preferably this is
              another class than the inputclass.

       --testdir=<dir>
              process all files in 'dir'. When the input mode is XML, only '.xml' files are taken
              from 'dir'. See also --outputdir.
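
              The file selection described here can be sketched as below. This is an
              illustration only: select_inputs is a hypothetical helper, and looking at the
              top level of the directory without recursion is an assumption, since the text
              does not say whether subdirectories are visited.

```python
import tempfile
from pathlib import Path


def select_inputs(test_dir, xml_mode=False):
    """List file names in test_dir; in XML mode keep only '.xml' files."""
    names = sorted(p.name for p in Path(test_dir).iterdir() if p.is_file())
    if xml_mode:
        names = [n for n in names if n.endswith(".xml")]
    return names


# Demonstration with a throwaway directory and hypothetical file names.
with tempfile.TemporaryDirectory() as test_dir:
    (Path(test_dir) / "a.txt").write_text("Dit is een zin.\n")
    (Path(test_dir) / "b.xml").write_text("<FoLiA/>\n")
    print(select_inputs(test_dir))                 # prints ['a.txt', 'b.xml']
    print(select_inputs(test_dir, xml_mode=True))  # prints ['b.xml']
```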

       --tmpdir=<dir>
              location to store intermediate files. Default /tmp.

       --uttmarker=<mark>
              assume all utterances are separated by 'mark'. (the default is none).

       --threads=<n>
              use a maximum of 'n' threads. The default  is  to  take  whatever  is  needed.   In
              servermode we always run on 1 thread per session.

       -V or --version
              show version info

       --xmldir=<dir>
              generate  FoLiA  XML  output  and  send  it  to  'dir'.  Creates filenames from the
              inputfilename with '.xml' appended. (Except when it already ends with '.xml')

       -X <file>
              generate FoLiA XML output and send it to  'file'.  Defaults  to  the  name  of  the
              inputfile(s) with '.xml' appended. (Except when it already ends with '.xml')
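
              The naming rule shared by -X and --xmldir can be written as a one-line check.
              xml_output_name is a hypothetical helper that only mirrors the rule stated
              above; it does not call frog.

```python
def xml_output_name(input_name):
    """Append '.xml' unless the name already ends with '.xml'."""
    if input_name.endswith(".xml"):
        return input_name
    return input_name + ".xml"


print(xml_output_name("story.txt"))  # prints story.txt.xml
print(xml_output_name("story.xml"))  # prints story.xml
```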

       --id=<id>
              When FoLiA XML output is requested with -X, use 'id' as the document ID.

BUGS

       likely

AUTHORS

       Maarten van Gompel

       Ko van der Sloot

       Antal van den Bosch

       e-mail: lamasoftware@science.ru.nl
