Provided by: mmorph_2.3.4.2-15_amd64

NAME

       mmorph - MULTEXT morphology tool

SYNOPSIS

       information:
              mmorph [ -vh ]

       parse only:
              mmorph -y | -z [ -a addfile ]
              -m morphfile [ -d debug_map ] [ -l logfile ] [ infile [ outfile ]]

       generate:
              mmorph -c | -n [ -t trace_level ] [ -s trace_level ] [ -a addfile ]
              -m morphfile [ -d debug_map ] [ -l logfile ] [ infile [ outfile ]]

       simple lookup:
              mmorph [ -fi ] [ -b | -k ] [ -r rejectfile ]
              -m morphfile [ -d debug_map ] [ -l logfile ] [ infile [ outfile ]]

       record/field lookup:
              mmorph -C classes [ -fU ] [ -E | -O ] [ -b | [ -k ] [ -B class ]]
              -m morphfile [ -d debug_map ] [ -l logfile ] [ infile [ outfile ]]

       dump database:
              mmorph -p | -q
              -m morphfile [ -d debug_map ] [ -l logfile ] [ infile [ outfile ]]

DESCRIPTION

       In the simplest mode of operation, with just the -m morphfile option, mmorph operates in lookup mode: it
       will open an existing database called morphfile.db and look up all the string segments (usually
       corresponding to words) in the input.

       To create the database from the lexical entries specified in morphfile, use -c -m morphfile. The file
       morphfile.db should not exist. When the database is complete, mmorph will look up the segments in the
       input. If used interactively (input and output are a terminal), a prompt is printed when the program
       expects the user to type a segment string. No prompting occurs in record/field mode.

       To test the rule applications on the lexical entries specified in morphfile, without creating a  database
       and without looking up segments, use -n -m morphfile.  This automatically sets the trace level to 1 if it
       was not specified.

       In order to do the same operations as above, but on the alternate set of lexical entries in addfile,  use
       the  extra  option  -a  addfile.   The lexical entries in morphfile will be ignored.  This is useful when
       making additions to a standard morphological description.  Be aware that entries added  to  the  database
       morphfile.db do not replace existing ones.

   How to test a morphological description
       Use the -n option. In the Grammar section, specify goal rules that will match the desired results. In
       the Lexicon section, specify the lexical items you want to test. When mmorph runs, all rules are applied
       (recursively) to the lexical items; whenever the applied rule is a goal rule, the result of the
       application is printed on the output.

       Suggestion: Put the two parts mentioned above (goal rules and Lexicon  section)  in  separate  files  and
       reference these files with an #include directive where they should occur in the main input file.

       If you are using an existing description and want to test only new lexical entries, use the options -n -a
       addfile, and put the lexical entries in addfile.

OPTIONS

       -a addfile
              Ignore lexical entries in morphfile, take them from addfile instead.

       -B class
              Specifies the record class that occurs before the beginning  of  a  sentence.   Capitalized  words
              occurring  just  after  such  records  will  also be looked up with all their letters converted to
              lowercase (according to LC_CTYPE, see below).

       -b     Fold case before lookup. Uppercase letters are converted to lowercase letters (according to
              LC_CTYPE, see below) before a word is looked up.

       -C classes
              Determines  record/field  mode. Specifies the record classes that should be looked up. Class names
              should be separated by comma ",", TAB, space, bar "|" or backslash "\".
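
              As an illustration of the separator set listed above, a classes argument could be split into
              class names as follows. This is only a sketch of the documented separators, not mmorph's own
              parsing code; the function name split_classes and the class names in the usage comment are
              hypothetical.

```python
import re

# Separators documented for -C: comma, TAB, space, bar "|" and backslash "\".
_SEPARATORS = re.compile(r"[,\t |\\]+")


def split_classes(spec):
    """Split a -C style argument into class names, dropping empty pieces."""
    return [name for name in _SEPARATORS.split(spec) if name]


# Hypothetical usage: split_classes("TOK,PUNCT|ABBR") yields the three names
# TOK, PUNCT and ABBR in that order.
```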

       -c     Create a new database for lookup. The name of the created file is the name of morphfile (-m
              option) with suffix .db. It should not exist; if it exists the user should remove it manually
              before running mmorph -c (this is a minimal protection against accidentally overwriting a
              database that might have taken a long time to create).

       -d debug_map
              Specify which debug options are wanted. Each bit in debug_map corresponds to an option.
                   bit  decimal  hexadecimal  purpose
               no bits        0  0x0          no debug option (default)
                     1        1  0x1          debug initialisation
                     2        2  0x2          debug yacc parsing
                     3        4  0x4          debug rule combination
                     4        8  0x8          debug spelling application
                     5       16  0x10         print statistics with -p or -q options
              all bits       -1  0xffff       all debug options whatever they are
              To combine options add the decimal or hexadecimal values together. Example: -d 0x5 (0x1 + 0x4)
              specifies bits (options) 1 and 3.
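
              The arithmetic for combining debug options can be sketched as below. The numeric values come
              from the table above; the dictionary keys and the function name debug_map are illustrative
              assumptions and are not names used by mmorph.

```python
# Values from the -d table; the keys are made-up labels for readability.
DEBUG_BITS = {
    "initialisation": 0x1,
    "yacc_parsing": 0x2,
    "rule_combination": 0x4,
    "spelling_application": 0x8,
    "statistics": 0x10,
}


def debug_map(*names):
    """Combine the selected debug options into one value for -d."""
    value = 0
    for name in names:
        value |= DEBUG_BITS[name]  # OR-ing distinct bits equals adding them
    return value


# 0x1 + 0x4: debug initialisation and rule combination.
print(hex(debug_map("initialisation", "rule_combination")))  # prints 0x5
```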

       -E     In record/field mode, extend the morphology annotations if they already exist (the default is to
              leave existing annotations as is).

       -O     In  record/field  mode, overwrite the morphology annotations if they already exist (the default is
              to leave existing annotations as is).

       -f     Flush the output after each segment lookup. This is useful only if input and output are piped from
              and to a program that needs to synchronize them.

       -h     Print help and exit.

       -i     Prepend  the  result  of  each  lookup with the identifier of the input segment it corresponds to.
              Currently input segments are identified by their sequential number,  starting  at  0.   With  this
              indication, the extra newline separating the solutions for different input segments is not printed
              because it is not needed.  If a lookup has no solutions, only the segment identifier is printed on
              the  output.  The segment identifier is also prepended to rejected segments.  A tab always follows
              the segment identifier.

       -k     Fallback fold case. If a word lookup fails, convert all uppercase letters to lowercase and try
              the lookup again (conversion is done according to LC_CTYPE, see below).
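
              The difference between -b (always fold) and -k (fold only after a failed lookup) can be
              sketched as follows. This is a simplified model under stated assumptions: a plain dictionary
              stands in for the database, str.lower() stands in for the LC_CTYPE-based conversion, and the
              function and parameter names are hypothetical.

```python
def lookup(word, database, fold=False, fallback=False):
    """Look up word in database, a dict mapping forms to lists of analyses.

    fold models -b: the word is lowercased before the only lookup.
    fallback models -k: the lowercased word is tried only if the first
    lookup returned nothing.
    """
    if fold:
        return database.get(word.lower(), [])
    result = database.get(word, [])
    if not result and fallback:
        result = database.get(word.lower(), [])
    return result
```

              With -k, a capitalized form that exists in the database as such is still found in its original
              spelling; with -b, only the lowercase form is ever consulted.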

       -l logfile
              Specify the file for writing trace and error messages.  Defaults to standard error.

       -m morphfile
              Specify  the  file containing the morphology description.  See mmorph (5) for a description of the
              formalism's syntax.

       -n     No database creation or lookup (test mode).

       -p     Dump the typed feature structure database to outfile (or standard output).  The count of  distinct
              tfs is given in the logfile (or standard error) if bit 5 of debug option is set.

       -q     Dump  the forms in the database to outfile (or standard output).  Some statistics are given in the
              logfile (or standard error) if bit 5 of debug option is set.

       -r rejectfile
              In non record/field mode, specifies the file to which input segments that could not be looked
              up are written. Defaults to standard error.

       -s trace_level
              Trace spelling rule application:
              0  no tracing (default).
              1  trace valid surface forms.
              2  trace rules whose lexical part matches.
              3  trace surface left context match (surface word construction).
              4  trace surface right context mismatch and rule blocking.
              5  trace rule non blocking.
              A trace_level implies all preceding ones.

       -t trace_level
              Specify the level of tracing for rule application:
              0  no tracing (default).
              1  trace goal rules that apply.
              2  trace all rules that apply, indentation indicates the recursion depth.
              10 trace also rules that were tried but did not apply.
              A trace_level implies all preceding ones.

       -U     In  record/field  mode, unknown words (i.e. that were unsuccessfully looked up) are annotated with
              ??\??.

       -v     Print version and exit.

       -y     Parse only: do not process the description other than for syntax checking. While developing a
              morphology description you may use this option to catch syntax errors quickly after each
              modification before running it "for real".

       -z     Implies -y. Parse and output the lexical descriptions in normalized form.

       infile file containing the segments to look up, one per line. Defaults to the standard input.

       outfile
              file in which the output of the  program  is  written.   One  line  per  solution.   Solutions  of
              different input segments are separated by an empty line.  Defaults to the standard output.
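
       Since solutions of different input segments are separated by an empty line, a consumer of outfile can
       regroup the lines per segment. The sketch below assumes the default output layout (no -i option) and
       treats each solution line as an opaque string; the function name group_solutions is hypothetical.
       Consecutive empty lines are collapsed, so segments without any solution are not represented in the
       result.

```python
def group_solutions(lines):
    """Group solution lines into one list per block of non-empty lines."""
    groups = []
    current = []
    for line in lines:
        line = line.rstrip("\n")
        if line:
            current.append(line)
        elif current:
            # An empty line closes the block belonging to one input segment.
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups
```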

WORD GRAMMAR AND SPELLING RULES

       For  a  detailed  account  of the principles and mechanisms used in mmorph, please refer to the documents
       cited in the SEE ALSO section below.

       Briefly sketched, morphosyntactic descriptions written for mmorph describe how words are  constructed  by
       the  concatenation  of  morphemes,  and  how  this  concatenation  process  changes the spelling of these
       morphemes.  The first part, the word structure grammar, is specified by restricted context  free  rewrite
       rules whose formalism is inspired by unification based systems (cf.  Shieber 1986).  The second part, the
       spelling changes, is specified by spelling rules  in  a  formalism  based  on  the  two  level  model  of
       morphology. This approach to morphology is described in Ritchie, Russell et al., 1992 and more
       concisely in Pulman and Hepple 1993.

ENVIRONMENT VARIABLES

       To decide which characters are displayable on the output, mmorph uses the language  specific  description
       that  setlocale(3) sets according to the environment variable LC_CTYPE.  For the languages that are dealt
       with in MULTEXT it is a good idea to have that variable set to iso_8859_1.

EXAMPLES

       Here is a summary of the common usage of mmorph options:

              mmorph -n -m morphfile
       Test mode: reads the whole of morphfile and prints results on standard error.  No database is created, no
       words are looked up.

              mmorph -c -m morphfile
       Database  creation:   reads  the  whole of morphfile and stores the results in a database (morphfile.db).
       Typed feature structures are collected in a separate file (morphfile.tfs).  Standard input  is  read  for
       words to look up in the new database.

              mmorph -m morphfile
       Lookup mode: reads only the Alphabets, Attributes and Types sections of morphfile. Standard input is read
       for words to look up according to the existing database (morphfile.db and morphfile.tfs).

              mmorph -m morphfile -a addfile
       Addition mode:  ignores the Lexicon section of morphfile, but addfile is consulted, and the  results  are
       added  to  the database.  Standard input is read for words to look up according to the augmented database
       (morphfile.db and morphfile.tfs).
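
       Because database creation expects that morphfile.db does not yet exist, a wrapper script may want to
       check for the file before invoking mmorph -c. The following is a minimal sketch of such a check; the
       function name creation_argv is hypothetical, and only the -c and -m options documented above are
       used.

```python
import os


def creation_argv(morphfile):
    """Return the argument list for database creation.

    Raises FileExistsError if morphfile.db is already present, so that an
    existing database is never handed to mmorph -c by accident.
    """
    db_path = morphfile + ".db"
    if os.path.exists(db_path):
        raise FileExistsError(db_path + " exists; remove it manually first")
    return ["mmorph", "-c", "-m", morphfile]


# Possible use, assuming mmorph is installed and on the PATH:
#     import subprocess
#     subprocess.run(creation_argv("morphfile"), check=True)
```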

DIAGNOSTICS

       Error messages should be self explanatory.  Please refer to mmorph(5) for a  formal  description  of  the
       syntax.

FILES

       morphfile.db
              database file of forms generated for descriptions in file morphfile given as option -m.

       morphfile.tfs
              database file of typed feature structures associated to morphfile.db.

SEE ALSO

       mmorph(5), setlocale(3).

       G. Russell and D. Petitpierre, MMORPH - The Multext Morphology Program, Version 2.3, October 1995,
              MULTEXT deliverable report for task 2.3.1.

       Ritchie, G. D., G.J. Russell, A.W. Black and S.G.  Pulman  (1992),  Computational  Morphology:  Practical
              Mechanisms for the English Lexicon, Cambridge Mass., MIT Press.

       Pulman,  S.G.  and M.R. Hepple, (1993) ``A feature-based formalism for two level phonology: a description
              and implementation'', Computer Speech and Language 7, pp.333-358.

       Shieber, S.M. (1986), An Introduction to Unification-Based Approaches to Grammar, CSLI Lecture Notes
              Number 4, Stanford University.

AUTHOR

       Dominique Petitpierre, ISSCO, <petitp@divsun.unige.ch>

ACKNOWLEDGEMENTS

       The parser for the morphology description formalism was written using yacc(1) and flex(1). Flex was
       written by Vern Paxson, <vern@ee.lbl.gov>, and is distributed in the framework of the GNU project under
       the conditions of the GNU General Public License.

       The  database  module  in  the current version uses the db library package developed at the University of
       California, Berkeley by Margo Seltzer, Keith Bostic <bostic@cs.berkeley.edu> and Ozan Yigit.

       The crc procedures used for taking a signature of the typed feature structure declarations are taken from
       the fingerprint package by Daniel J. Bernstein and use code written by Gary S. Brown.

                                            Version 2.3, October 1995                                  MMORPH(1)