Provided by: mmorph_2.3.4.2-12.1_i386

NAME

       mmorph - MULTEXT morphology tool

SYNOPSIS

       information:
              mmorph [ -vh ]

       parse only:
              mmorph -y | -z [ -a addfile ]
              -m  morphfile [ -d debug_map ] [ -l logfile ] [ infile [ outfile
              ]]

       generate:
              mmorph -c | -n [ -t trace_level  ]  [  -s  trace_level  ]  [  -a
              addfile ]
              -m  morphfile [ -d debug_map ] [ -l logfile ] [ infile [ outfile
              ]]

       simple lookup:
              mmorph [ -fi ] [ -b | -k ] [ -r rejectfile ]
              -m morphfile [ -d debug_map ] [ -l logfile ] [ infile [  outfile
              ]]

       record/field lookup:
              mmorph  -C  classes [ -fU ] [ -E | -O ] [ -b | [ -k ] [ -B class
              ]]
              -m morphfile [ -d debug_map ] [ -l logfile ] [ infile [  outfile
              ]]

       dump database:
              mmorph -p | -q
              -m  morphfile [ -d debug_map ] [ -l logfile ] [ infile [ outfile
              ]]

DESCRIPTION

       In the simplest mode of operation, with just the -m morphfile option,
       mmorph operates in lookup mode: it opens an existing database called
       morphfile.db and looks up all the string segments (usually
       corresponding to words) in the input.

       To create the database from the lexical entries specified in
       morphfile, use -c -m morphfile.  The file morphfile.db should not
       exist beforehand.  When the database is complete, mmorph looks up the
       segments in the input.  If used interactively (input and output are
       both a terminal), a prompt is printed whenever the program expects the
       user to type a segment string.  No prompting occurs in record/field
       mode.

       To test the rule applications  on  the  lexical  entries  specified  in
       morphfile, without creating a database and without looking up segments,
       use -n -m morphfile.  This automatically sets the trace level to  1  if
       it was not specified.

       In  order  to do the same operations as above, but on the alternate set
       of lexical entries in addfile, use the extra option  -a  addfile.   The
       lexical  entries  in  morphfile  will  be ignored.  This is useful when
       making additions to a standard  morphological  description.   Be  aware
       that entries added to the database morphfile.db do not replace existing
       ones.
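
       The modes described above differ only in the options that accompany
       -m.  As a rough sketch, the following Python assembles the
       corresponding argument lists.  The helper name mmorph_argv is
       hypothetical and not part of mmorph; the code only builds a command
       line and does not run anything.

```python
def mmorph_argv(morphfile, mode="lookup", addfile=None):
    """Build an mmorph argument list for one of the modes described above.

    mode: "lookup" (no extra flag), "create" (-c) or "test" (-n).
    addfile: if given, passed with -a so its lexical entries are used.
    """
    flags = {"lookup": [], "create": ["-c"], "test": ["-n"]}
    if mode not in flags:
        raise ValueError(f"unknown mode: {mode!r}")
    argv = ["mmorph"] + flags[mode]
    if addfile is not None:
        argv += ["-a", addfile]
    argv += ["-m", morphfile]
    return argv


print(" ".join(mmorph_argv("morphfile", mode="create")))
print(" ".join(mmorph_argv("morphfile", mode="test", addfile="addfile")))
```

       The resulting list could be handed to a process launcher such as
       subprocess.run, with infile and outfile appended if needed.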

   How to test a morphological description
       Use the -n option.  In the Grammar section, specify goal rules that
       will match the desired results.  In the Lexicon section, specify the
       lexical items you want to test.  When mmorph runs, all rules are
       applied (recursively) to the lexical items; whenever an applied rule
       is a goal, the result of the application is printed on the output.

       Suggestion: Put the two parts mentioned above (goal rules  and  Lexicon
       section)  in  separate files and reference these files with an #include
       directive where they should occur in the main input file.

       If you are using an existing description and  want  to  test  only  new
       lexical  entries,  use  the  options -n -a addfile, and put the lexical
       entries in addfile.

OPTIONS

       -a addfile
              Ignore lexical entries in  morphfile,  take  them  from  addfile
              instead.

       -B class
              Specifies the record class that occurs before the beginning of a
              sentence.  Capitalized words occurring just after  such  records
              will  also  be  looked  up  with  all their letters converted to
              lowercase (according to LC_CTYPE, see below).

       -b     Fold case before lookup.  Uppercase letters are converted to
              lowercase letters (according to LC_CTYPE, see below) before a
              word is looked up.

       -C classes
              Selects record/field mode and specifies the record classes that
              should be looked up.  Class names should be separated by a comma
              ",", TAB, space, bar "|" or backslash "\".

       -c     Create a new database for lookup.  The name of the created file
              is the name of morphfile (-m option) with suffix .db.  It should
              not exist; if it exists, the user should remove it manually
              before running mmorph -c (this is a minimal protection against
              accidentally overwriting a database that might have taken a long
              time to create).

       -d debug_map
              Specify  which  debug  options are wanted. Each bit in debug_map
              corresponds to an option.
          bit       decimal  hexadecimal  purpose
          no bits         0  0x0          no debug option (default)
          1               1  0x1          debug initialisation
          2               2  0x2          debug yacc parsing
          3               4  0x4          debug rule combination
          4               8  0x8          debug spelling application
          5              16  0x10         print statistics with -p or -q options
          all bits       -1  0xffff       all debug options whatever they are
              To combine options, add the decimal or hexadecimal values
              together.  Example: -d 0x5 (0x1 + 0x4) selects bits (options) 1
              and 3.

       -E     In record/field mode, extend the morphology annotations if they
              already exist (the default is to leave existing  annotations  as
              is).

       -O     In  record/field  mode,  overwrite the morphology annotations if
              they already exist (the default is to leave existing annotations
              as is).

       -f     Flush  the output after each segment lookup. This is useful only
              if input and output are piped from and to a program  that  needs
              to synchronize them.

       -h     Print help and exit.

       -i     Prefix the result of each lookup with the identifier of the
              input segment it corresponds to.  Currently input segments are
              identified by their sequential number, starting at 0.  With this
              indication, the  extra  newline  separating  the  solutions  for
              different  input  segments  is  not  printed  because  it is not
              needed.   If  a  lookup  has  no  solutions,  only  the  segment
              identifier  is  printed on the output. The segment identifier is
              also prepended to rejected segments.  A tab always  follows  the
              segment identifier.

       -k     Fallback fold case.  If a word lookup fails, convert all
              uppercase letters to lowercase and try the lookup again
              (conversion is done according to LC_CTYPE, see below).

       -l logfile
              Specify the file for writing trace and error messages.  Defaults
              to standard error.

       -m morphfile
              Specify the file containing  the  morphology  description.   See
              mmorph (5) for a description of the formalism's syntax.

       -n     No database creation or lookup (test mode).

       -p     Dump  the  typed  feature  structure  database  to  outfile  (or
              standard output).  The count of distinct tfs  is  given  in  the
              logfile (or standard error) if bit 5 of debug option is set.

       -q     Dump  the forms in the database to outfile (or standard output).
              Some statistics are given in the logfile (or standard error)  if
              bit 5 of debug option is set.

       -r rejectfile
              Outside record/field mode, specifies the file to which input
              segments that could not be looked up are written.  Defaults to
              standard error.

       -s trace_level
              Trace spelling rule application:
              0  no tracing (default).
              1  trace valid surface forms.
              2  trace rules whose lexical part match.
              3  trace surface left context match (surface word construction).
              4  trace surface right context mismatch and rule blocking.
              5  trace rule non blocking.
              A trace_level implies all preceding ones.

       -t trace_level
              Specify the level of tracing for rule application:
              0  no tracing (default).
              1  trace goal rules that apply.
              2   trace  all  rules  that  apply,  indentation  indicates  the
              recursion depth.
              10 also trace rules that were tried but did not apply.
              A trace_level implies all preceding ones.

       -U     In  record/field   mode,   unknown   words   (i.e.   that   were
              unsuccessfully looked up) are annotated with ??\??.

       -v     Print version and exit.

       -y     Parse only: do not process the description other than for syntax
              checking.  While developing a morphology description you may
              use this option to catch syntax errors quickly after each
              modification, before running it "for real".

       -z     Implies -y.  Parse and output the lexical descriptions in
              normalized form.

       infile File containing the segments to look up, one per line.  Defaults
              to the standard input.

       outfile
              file in which the output of the program is  written.   One  line
              per   solution.   Solutions  of  different  input  segments  are
              separated by an empty line.  Defaults to the standard output.
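
       The debug_map argument of the -d option above is a bit mask.  A
       minimal Python sketch of how the values in the -d table combine is
       given here; the names DEBUG_BITS and debug_map are illustrative only
       and are not part of mmorph.

```python
# Bit numbers (1-based, as in the -d table) mapped to their purpose.
DEBUG_BITS = {
    1: "debug initialisation",
    2: "debug yacc parsing",
    3: "debug rule combination",
    4: "debug spelling application",
    5: "print statistics with -p or -q options",
}


def debug_map(*bits):
    """Return the numeric debug_map that sets the given 1-based bits."""
    value = 0
    for bit in bits:
        if bit not in DEBUG_BITS:
            raise ValueError(f"unknown debug bit: {bit}")
        # Bit n of the table has the value 2 ** (n - 1).
        value |= 1 << (bit - 1)
    return value


# Bits 1 and 3 (initialisation and rule combination) combine to 0x5.
print(hex(debug_map(1, 3)))
```

       The printed value can be passed as is, for instance -d 0x5.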

WORD GRAMMAR AND SPELLING RULES

       For a detailed account of the principles and mechanisms used in mmorph,
       please refer to the documents cited in the SEE ALSO section below.

       Briefly  sketched,  morphosyntactic  descriptions  written  for  mmorph
       describe how words are constructed by the concatenation  of  morphemes,
       and  how  this  concatenation  process  changes  the  spelling of these
       morphemes.  The first part, the word structure grammar, is specified by
       restricted  context  free  rewrite rules whose formalism is inspired by
       unification based systems (cf.  Shieber 1986).  The  second  part,  the
       spelling  changes,  is specified by spelling rules in a formalism based
       on the two level model of morphology.  This approach to  morphology  is
       described in Ritchie, Russell et al. 1992 and more concisely in
       Pulman and Hepple 1993.

ENVIRONMENT VARIABLES

       To decide which characters are displayable on the output,  mmorph  uses
       the  language  specific description that setlocale(3) sets according to
       the environment variable LC_CTYPE.  For the languages  that  are  dealt
       with  in  MULTEXT  it  is  a  good  idea  to  have that variable set to
       iso_8859_1.
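
       The -b and -k options described above both fold case according to
       LC_CTYPE but differ in when the folding happens.  The difference can
       be sketched in Python as follows, assuming a hypothetical lookup
       function that returns a list of solutions (empty on failure).
       Python's str.lower() merely stands in for the locale-dependent
       conversion that mmorph performs.

```python
def lookup_with_folding(word, lookup, fold_before=False, fold_fallback=False):
    """Illustrate -b (fold_before) and -k (fold_fallback) semantics.

    lookup is a hypothetical callable mapping a word to a list of
    solutions; it is not an mmorph API.
    """
    if fold_before:
        # -b: lowercase first, then look up once.
        return lookup(word.lower())
    solutions = lookup(word)
    if not solutions and fold_fallback:
        # -k: retry in lowercase only if the first lookup failed.
        solutions = lookup(word.lower())
    return solutions


# Toy dictionary standing in for morphfile.db (made-up entries).
toy_db = {
    "paris": ["paris (common)"],
    "Paris": ["Paris (proper)"],
    "the": ["the (det)"],
}


def toy_lookup(word):
    return toy_db.get(word, [])


print(lookup_with_folding("Paris", toy_lookup, fold_before=True))
print(lookup_with_folding("Paris", toy_lookup, fold_fallback=True))
print(lookup_with_folding("The", toy_lookup, fold_fallback=True))
```

       With -b the capitalized entry is never consulted, whereas with -k it
       takes precedence whenever it exists.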

EXAMPLES

       Here is a summary of the common usage of mmorph options:

              mmorph -n -m morphfile
       Test mode: reads the whole of morphfile and prints results on  standard
       error.  No database is created, no words are looked up.

              mmorph -c -m morphfile
       Database creation:  reads the whole of morphfile and stores the results
       in a database (morphfile.db).  Typed feature structures  are  collected
       in  a  separate file (morphfile.tfs).  Standard input is read for words
       to look up in the new database.

              mmorph -m morphfile
       Lookup mode: reads only the Alphabets, Attributes and Types sections of
       morphfile. Standard input is read for words to look up according to the
       existing database (morphfile.db and morphfile.tfs).

              mmorph -m morphfile -a addfile
       Addition mode:  ignores the Lexicon section of morphfile,  but  addfile
       is  consulted,  and  the  results  are added to the database.  Standard
       input is read for words to look up according to the augmented  database
       (morphfile.db and morphfile.tfs).
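
       In the lookup runs above, the output has one line per solution, and
       the solutions of different input segments are separated by an empty
       line (see outfile in OPTIONS).  A minimal Python sketch that groups
       such output per segment follows; the sample text is made up for
       illustration and does not reproduce the actual syntax of mmorph
       solutions.

```python
def group_solutions(output_text):
    """Split lookup output into one list of solution lines per segment."""
    groups = []
    current = []
    for line in output_text.splitlines():
        if line.strip() == "":
            # An empty line closes the solutions of one input segment.
            if current:
                groups.append(current)
                current = []
        else:
            current.append(line)
    if current:
        groups.append(current)
    return groups


# Made-up output: two solutions for one segment, one for the next.
sample = "solution-1a\nsolution-1b\n\nsolution-2a\n"
for solutions in group_solutions(sample):
    print(solutions)
```

       This sketch cannot tell which input segment a group belongs to when
       some segments have no solution; the -i option addresses that by
       prefixing each result with a segment identifier.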

DIAGNOSTICS

       Error messages should be self-explanatory.  Please refer to mmorph(5)
       for a formal description of the syntax.

FILES

       morphfile.db
              database file  of  forms  generated  for  descriptions  in  file
              morphfile given as option -m.

       morphfile.tfs
              database file of typed feature structures associated with
              morphfile.db.

SEE ALSO

       mmorph(5), setlocale(3).

       G. Russell and D. Petitpierre, MMORPH - The Multext Morphology Program,
              Version 2.3, October 1995, MULTEXT deliverable report for task
              2.3.1.

       Ritchie, G. D., G.J.  Russell,  A.W.  Black  and  S.G.  Pulman  (1992),
              Computational  Morphology:  Practical Mechanisms for the English
              Lexicon, Cambridge Mass., MIT Press.

       Pulman, S.G. and M.R. Hepple, (1993) ``A  feature-based  formalism  for
              two   level   phonology:  a  description  and  implementation'',
              Computer Speech and Language 7, pp.333-358.

       Shieber, S.M. (1986), An Introduction to  Unification-Based  Approaches
              to Grammar, CSLI Lecture Notes Number 4, Stanford University.

AUTHOR

       Dominique Petitpierre, ISSCO, <petitp@divsun.unige.ch>

ACKNOWLEDGEMENTS

       The  parser  for the morphology description formalism was written using
       yacc(1)   and   flex(1).    Flex   was   written   by   Vern    Paxson,
       <vern@ee.lbl.gov>, and is distributed in the framework of the GNU
       project under the terms of the GNU General Public License.

       The database module in the current version uses the db library  package
       developed  at  the University of California, Berkeley by Margo Seltzer,
       Keith Bostic <bostic@cs.berkeley.edu> and Ozan Yigit.

       The crc procedures used for taking a signature  of  the  typed  feature
       structure declarations are taken from the fingerprint package by Daniel
       J. Bernstein and use code written by Gary S. Brown.

                           Version 2.3, October 1995                 MMORPH(1)