NAME

       mmorph - MULTEXT morphology tool

SYNOPSIS

       information:
              mmorph [ -vh ]

       parse only:
              mmorph -y | -z [ -a addfile ]
              -m morphfile [ -d debug_map ] [ -l logfile ] [ infile [ outfile ]]

       generate:
              mmorph -c | -n [ -t trace_level ] [ -s trace_level ] [ -a addfile ]
              -m morphfile [ -d debug_map ] [ -l logfile ] [ infile [ outfile ]]

       simple lookup:
              mmorph [ -fi ] [ -b | -k ] [ -r rejectfile ]
              -m morphfile [ -d debug_map ] [ -l logfile ] [ infile [ outfile ]]

       record/field lookup:
              mmorph -C classes [ -fU ] [ -E | -O ] [ -b | [ -k ] [ -B class ]]
              -m morphfile [ -d debug_map ] [ -l logfile ] [ infile [ outfile ]]

       dump database:
              mmorph -p | -q
              -m morphfile [ -d debug_map ] [ -l logfile ] [ infile [ outfile ]]

DESCRIPTION

In the simplest mode of operation, with just the -m morphfile option, mmorph operates in lookup mode: it
will open an existing database called morphfile.db and look up all the string segments (usually
corresponding to words) in the input.

To create the database from the lexical entries specified in morphfile, use -c -m morphfile. The file
morphfile.db should not exist. When the database is complete, mmorph will look up the segments in the input.
If used interactively (input and output are both a terminal), a prompt is printed when the program expects
the user to type a segment string. No prompting occurs in record/field mode.

To test the rule applications on the lexical entries specified in morphfile, without creating a database
and without looking up segments, use -n -m morphfile. This automatically sets the trace level to 1 if it
was not specified.

In order to do the same operations as above, but on the alternate set of lexical entries in addfile, use
the extra option -a addfile. The lexical entries in morphfile will be ignored. This is useful when
making additions to a standard morphological description. Be aware that entries added to the database
morphfile.db do not replace existing ones.

How to test a morphological description
Use the -n option. In the Grammar section, specify goal rules that will match the desired results. In
the Lexicon section, specify the lexical items you want to test. When mmorph runs, all rules are applied
(recursively) to the lexical items; whenever the rule applied is a goal, the result of the application is
printed on the output.

Suggestion: Put the two parts mentioned above (goal rules and Lexicon section) in separate files and
reference these files with an #include directive where they should occur in the main input file.

If you are using an existing description and want to test only new lexical entries, use the options -n -a
addfile, and put the lexical entries in addfile.

OPTIONS

-a addfile
Ignore lexical entries in morphfile, take them from addfile instead.

-B class
Specifies the record class that occurs before the beginning of a sentence. Capitalized words
occurring just after such records will also be looked up with all their letters converted to
lowercase (according to LC_CTYPE, see below).

-b Fold case before lookup. Uppercase letters are converted to lowercase letters (according to
LC_CTYPE, see below) before a word is looked up.

-C classes
Determines record/field mode. Specifies the record classes that should be looked up. Class names
should be separated by comma ",", TAB, space, bar "|" or backslash "\".
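
The separator rule for -C can be illustrated with a short Python sketch. It is not part of mmorph: the function name split_classes and the class names used in the test values are made up for illustration, and the handling of repeated separators and empty names is an assumption.

```python
import re
from typing import List

# Separators accepted between class names according to the -C description:
# comma, TAB, space, bar and backslash.
_SEPARATORS = re.compile(r"[,\t |\\]+")


def split_classes(classes: str) -> List[str]:
    """Split a -C argument into individual class names.

    Assumption: a run of separators counts as one, and empty names are dropped.
    """
    return [name for name in _SEPARATORS.split(classes) if name]
```

Since space, bar and backslash are special to most shells, an argument that uses them as separators normally has to be quoted on the command line.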

-c Create a new database for lookup. The name of the created file is the name of morphfile (-m
option) with suffix .db. It should not exist; if it exists, the user should remove it manually
before running mmorph -c (this is a minimal protection against accidentally overwriting a database
that might have taken a long time to create).
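
A wrapper script may want to check for an existing database before starting a long creation run. The following Python sketch shows one way to do that; the helper name build_create_command is hypothetical, and the sketch only builds the argument list for the documented -c and -m options without running mmorph.

```python
import os
from typing import List


def build_create_command(morphfile: str) -> List[str]:
    """Return the argument list for `mmorph -c -m morphfile`.

    Raises FileExistsError if morphfile.db is already present, mirroring the
    documented requirement that the database must not exist beforehand.
    """
    db_path = morphfile + ".db"
    if os.path.exists(db_path):
        raise FileExistsError(db_path + " exists; remove it before running mmorph -c")
    return ["mmorph", "-c", "-m", morphfile]
```

The returned list can be handed to subprocess.run by the caller. Removing an existing database is deliberately left as a manual step, as the option description recommends.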

-d debug_map
Specify which debug options are wanted. Each bit in debug_map corresponds to an option.
bit        decimal   hexadecimal   purpose
no bits    0         0x0           no debug option (default)
1          1         0x1           debug initialisation
2          2         0x2           debug yacc parsing
3          4         0x4           debug rule combination
4          8         0x8           debug spelling application
5          16        0x10          print statistics with -p or -q options
all bits   -1        0xffff        all debug options whatever they are
To combine options, add the decimal or hexadecimal values together. Example: -d 0x5 specifies bits
(options) 1 and 3.
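
The arithmetic behind debug_map can be checked with a small Python sketch. The values are copied from the table above; the names DEBUG_BITS and debug_map are illustrative only and are not part of mmorph.

```python
# Bit number -> value, copied from the table in the -d description.
DEBUG_BITS = {
    1: 0x1,   # debug initialisation
    2: 0x2,   # debug yacc parsing
    3: 0x4,   # debug rule combination
    4: 0x8,   # debug spelling application
    5: 0x10,  # print statistics with -p or -q
}


def debug_map(*bits: int) -> int:
    """Combine bit numbers (1 to 5) into a single debug_map value."""
    value = 0
    for bit in bits:
        value |= DEBUG_BITS[bit]
    return value


print(hex(debug_map(1, 3)))  # 0x5
```

Bits 1 and 3 have the values 0x1 and 0x4, so together they give 0x5; bits 1 and 4 would give 0x9.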

-E In record/field mode, extend the morphology annotations if they already exist (the default is to
leave existing annotations as is).

-O In record/field mode, overwrite the morphology annotations if they already exist (the default is
to leave existing annotations as is).

-f Flush the output after each segment lookup. This is useful only if input and output are piped from
and to a program that needs to synchronize them.

-h Print help and exit.

-i Prepend the result of each lookup with the identifier of the input segment it corresponds to.
Currently input segments are identified by their sequential number, starting at 0. With this
indication, the extra newline separating the solutions for different input segments is not printed
because it is not needed. If a lookup has no solutions, only the segment identifier is printed on
the output. The segment identifier is also prepended to rejected segments. A tab always follows
the segment identifier.
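
A program reading output produced with -i can regroup the solutions by segment. The Python sketch below assumes only what is stated above: each line starts with the numeric segment identifier followed by a tab, and a line with nothing after the tab means no solution was found. The function name group_by_segment is hypothetical, and the solution text is treated as an opaque string.

```python
from typing import Dict, Iterable, List


def group_by_segment(lines: Iterable[str]) -> Dict[int, List[str]]:
    """Group -i style output lines by segment identifier.

    Each line is expected to be "<id><TAB><solution>"; an empty solution
    part is taken to mean that the lookup produced no solution.
    """
    groups: Dict[int, List[str]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate stray empty lines
        ident, _, solution = line.partition("\t")
        solutions = groups.setdefault(int(ident), [])
        if solution:
            solutions.append(solution)
    return groups
```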

-k Fallback fold case. If a word lookup fails, convert all uppercase letters to lowercase and
try the lookup again (conversion is done according to LC_CTYPE, see below).
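
The difference between -b (fold before lookup) and -k (fold only after a failed lookup) can be sketched in Python as follows. This illustrates the behaviour described above and is not mmorph's implementation: the function and parameter names are hypothetical, the lookup callable stands in for the database, and str.lower() follows Unicode rules rather than LC_CTYPE.

```python
from typing import Callable, List


def lookup_with_case_options(
    word: str,
    lookup: Callable[[str], List[str]],
    fold_before: bool = False,    # behaviour described for -b
    fold_fallback: bool = False,  # behaviour described for -k
) -> List[str]:
    """Look up a word, optionally folding it to lowercase."""
    if fold_before:
        # -b: only the lowercase form is ever looked up.
        return lookup(word.lower())
    solutions = lookup(word)
    if not solutions and fold_fallback:
        # -k: retry in lowercase only when the original form failed.
        solutions = lookup(word.lower())
    return solutions
```

With -b an entry stored with an uppercase letter can never be reached, whereas -k still finds it and falls back to lowercase only when needed; this may be why the synopsis lists the two options as alternatives.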

-l logfile
Specify the file for writing trace and error messages. Defaults to standard error.

-m morphfile
Specify the file containing the morphology description. See mmorph(5) for a description of the
formalism's syntax.

-n No database creation or lookup (test mode).

-p Dump the typed feature structure database to outfile (or standard output). The count of distinct
tfs is given in the logfile (or standard error) if bit 5 of debug option is set.

-q Dump the forms in the database to outfile (or standard output). Some statistics are given in the
logfile (or standard error) if bit 5 of debug option is set.

-r rejectfile
In non record/field mode, specifies the file to which input segments that could not be
looked up are written. Defaults to standard error.

-s trace_level
Trace spelling rule application:
0 no tracing (default).
1 trace valid surface forms.
2 trace rules whose lexical part matches.
3 trace surface left context match (surface word construction).
4 trace surface right context mismatch and rule blocking.
5 trace rule non blocking.
A trace_level implies all preceding ones.

-t trace_level
Specify the level of tracing for rule application:
0 no tracing (default).
1 trace goal rules that apply.
2 trace all rules that apply, indentation indicates the recursion depth.
10 also trace rules that were tried but did not apply.
A trace_level implies all preceding ones.

-U In record/field mode, unknown words (i.e. words whose lookup failed) are annotated with
??\??.

-v Print version and exit.

-y Parse only: do not process the description other than for syntax checking. While developing a
morphology description, you may use this option to catch syntax errors quickly after each
modification before running it "for real".

-z Implies -y. Parse and output the lexical descriptions in normalized form.

infile file containing the segments to look up, one per line. Defaults to the standard input.

outfile
file in which the output of the program is written. One line per solution. Solutions of
different input segments are separated by an empty line. Defaults to the standard output.

WORD GRAMMAR AND SPELLING RULES

       For  a  detailed  account  of the principles and mechanisms used in mmorph, please refer to the documents
       cited in the SEE ALSO section below.

       Briefly sketched, morphosyntactic descriptions written for mmorph describe how words are  constructed  by
       the  concatenation  of  morphemes,  and  how  this  concatenation  process  changes the spelling of these
       morphemes.  The first part, the word structure grammar, is specified by restricted context  free  rewrite
       rules whose formalism is inspired by unification based systems (cf.  Shieber 1986).  The second part, the
       spelling changes, is specified by spelling rules  in  a  formalism  based  on  the  two  level  model  of
       morphology.  This approach to morphology is described in Ritchie, Russell et al. 1992 and more
       concisely in Pulman and Hepple 1993.

ENVIRONMENT VARIABLES

       To decide which characters are displayable on the output, mmorph uses the language  specific  description
       that  setlocale(3) sets according to the environment variable LC_CTYPE.  For the languages that are dealt
       with in MULTEXT it is a good idea to have that variable set to iso_8859_1.

EXAMPLES

       Here is a summary of the common usage of mmorph options:

              mmorph -n -m morphfile
       Test mode: reads the whole of morphfile and prints results on standard error.  No database is created, no
       words are looked up.

              mmorph -c -m morphfile
       Database  creation:   reads  the  whole of morphfile and stores the results in a database (morphfile.db).
       Typed feature structures are collected in a separate file (morphfile.tfs).  Standard input  is  read  for
       words to look up in the new database.

              mmorph -m morphfile
       Lookup mode: reads only the Alphabets, Attributes and Types sections of morphfile. Standard input is read
       for words to look up according to the existing database (morphfile.db and morphfile.tfs).

              mmorph -m morphfile -a addfile
       Addition mode:  ignores the Lexicon section of morphfile, but addfile is consulted, and the  results  are
       added  to  the database.  Standard input is read for words to look up according to the augmented database
       (morphfile.db and morphfile.tfs).

DIAGNOSTICS

       Error messages should be self-explanatory.  Please refer to mmorph(5) for a formal description of the
       syntax.

FILES

       morphfile.db
              database file of forms generated for descriptions in file morphfile given as option -m.

       morphfile.tfs
              database file of typed feature structures associated to morphfile.db.

AUTHOR

       Dominique Petitpierre, ISSCO, <petitp@divsun.unige.ch>

ACKNOWLEDGEMENTS

       The  parser  for  the  morphology  description formalism was written using yacc(1) and flex(1).  Flex was
       written by Vern Paxson, <vern@ee.lbl.gov>, and is distributed in the framework of the GNU  project  under
       the conditions of the GNU General Public License.

       The  database  module  in  the current version uses the db library package developed at the University of
       California, Berkeley by Margo Seltzer, Keith Bostic <bostic@cs.berkeley.edu> and Ozan Yigit.

       The crc procedures used for taking a signature of the typed feature structure declarations are taken from
       the fingerprint package by Daniel J. Bernstein and use code written by Gary S. Brown.

                                            Version 2.3, October 1995                                  MMORPH(1)