Provided by: mmorph_2.3.4.2-17_amd64

NAME

       mmorph - MULTEXT morphology tool

SYNOPSIS

       information:
              mmorph [ -vh ]

       parse only:
              mmorph -y | -z [ -a addfile ]
              -m morphfile [ -d debug_map ] [ -l logfile ] [ infile [ outfile ]]

       generate:
              mmorph -c | -n [ -t trace_level ] [ -s trace_level ] [ -a addfile ]
              -m morphfile [ -d debug_map ] [ -l logfile ] [ infile [ outfile ]]

       simple lookup:
              mmorph [ -fi ] [ -b | -k ] [ -r rejectfile ]
              -m morphfile [ -d debug_map ] [ -l logfile ] [ infile [ outfile ]]

       record/field lookup:
              mmorph -C classes [ -fU ] [ -E | -O ] [ -b | [ -k ] [ -B class ]]
              -m morphfile [ -d debug_map ] [ -l logfile ] [ infile [ outfile ]]

       dump database:
              mmorph -p | -q
              -m morphfile [ -d debug_map ] [ -l logfile ] [ infile [ outfile ]]

DESCRIPTION

       In the simplest mode of operation, with just the -m morphfile option, mmorph operates in
       lookup mode: it opens an existing database called morphfile.db and looks up all the
       string segments (usually corresponding to words) in the input.

       To create the database from the lexical entries specified in "morphfile", use -c -m
       morphfile.  The file morphfile.db must not already exist.  When the database is complete,
       mmorph looks up the segments in the input.  If used interactively (input and output are
       terminals), a prompt is printed when the program expects the user to type a segment
       string.  No prompting occurs in record/field mode.
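
       The requirement that morphfile.db be absent before -c is used can be checked ahead of
       time.  The following is a minimal Python sketch, not part of mmorph: the helper names
       database_path, can_create and creation_command are hypothetical, and only the -c and -m
       options and the .db suffix are taken from this page.

```python
import os

def database_path(morphfile):
    """Name of the database that mmorph -c creates: morphfile plus '.db'."""
    return morphfile + ".db"

def can_create(morphfile, exists=os.path.exists):
    """True when the database file is absent, so mmorph -c may be run."""
    return not exists(database_path(morphfile))

def creation_command(morphfile):
    """Argument list for database creation (hypothetical helper)."""
    return ["mmorph", "-c", "-m", morphfile]

if __name__ == "__main__":
    name = "morphfile"  # placeholder name of a description file
    if can_create(name):
        print(" ".join(creation_command(name)))
    else:
        print(database_path(name) + " exists; remove it before running mmorph -c")
```

       The exists parameter only makes the check easy to try without touching the file system;
       by default it is os.path.exists.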

       To  test  the  rule  applications  on  the lexical entries specified in morphfile, without
       creating a  database  and  without  looking  up  segments,  use  -n  -m  morphfile.   This
       automatically sets the trace level to 1 if it was not specified.

       In  order  to do the same operations as above, but on the alternate set of lexical entries
       in addfile, use the extra option -a addfile.  The lexical entries  in  morphfile  will  be
       ignored.   This  is  useful when making additions to a standard morphological description.
       Be aware that entries added to the database morphfile.db do not replace existing ones.

   How to test a morphological description
       Use the -n option.  In the Grammar section, specify goal rules that will match the desired
       results.  In the Lexicon section, specify the lexical items you want to test.  When mmorph
       runs, all rules are applied (recursively) to the lexical items; whenever an applied rule
       is a goal, the result of the application is printed on the output.

       Suggestion: Put the two parts mentioned above (goal rules and Lexicon section) in separate
       files and reference these files with an #include directive where they should occur in  the
       main input file.

       If  you  are  using an existing description and want to test only new lexical entries, use
       the options -n -a addfile, and put the lexical entries in addfile.

OPTIONS

       -a addfile
              Ignore lexical entries in morphfile, take them from addfile instead.

       -B class
              Specifies the record  class  that  occurs  before  the  beginning  of  a  sentence.
              Capitalized words occurring just after such records will also be looked up with all
              their letters converted to lowercase (according to LC_CTYPE, see below).

       -b     Fold case before lookup.  Uppercase letters are converted to lowercase letters
              (according to LC_CTYPE, see below) before a word is looked up.

       -C classes
              Determines  record/field  mode.  Specifies the record classes that should be looked
              up. Class names should be separated by comma ",", TAB, space, bar "|" or  backslash
              "\".

       -c     Create a new database for lookup.  The name of the created file is the name of
              morphfile (-m option) with suffix .db.  It should not exist; if it exists, the user
              should remove it manually before running mmorph -c (this is a minimal protection
              against accidentally overwriting a database that might have taken a long time to
              create).

       -d debug_map
              Specify  which  debug  options  are wanted. Each bit in debug_map corresponds to an
              option.
              bit       decimal  hexadecimal  purpose
              no bits         0  0x0          no debug option (default)
              1               1  0x1          debug initialisation
              2               2  0x2          debug yacc parsing
              3               4  0x4          debug rule combination
              4               8  0x8          debug spelling application
              5              16  0x10         print statistics with -p or -q options
              all bits       -1  0xffff       all debug options, whatever they are
              To combine options, add the decimal or hexadecimal values together.  Example: -d 0x5
              specifies bits (options) 1 and 3 (0x1 + 0x4).
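
              In the table, bit n carries the value 2 to the power n-1, so a debug_map is the
              sum (bitwise OR) of the values of the wanted bits.  A minimal Python sketch of
              that arithmetic follows; the names DEBUG_OPTIONS, debug_map and selected_options
              are hypothetical helpers, not part of mmorph.

```python
# Debug options from the -d table: bit number -> purpose.
DEBUG_OPTIONS = {
    1: "debug initialisation",
    2: "debug yacc parsing",
    3: "debug rule combination",
    4: "debug spelling application",
    5: "print statistics with -p or -q options",
}

def debug_map(*bits):
    """Combine bit numbers (1-based, as in the table) into a debug_map value."""
    value = 0
    for bit in bits:
        if bit not in DEBUG_OPTIONS:
            raise ValueError("unknown debug bit: %r" % (bit,))
        value |= 1 << (bit - 1)  # bit n has the value 2**(n-1)
    return value

def selected_options(value):
    """List the purposes of the bits that are set in a debug_map value."""
    return [DEBUG_OPTIONS[bit] for bit in sorted(DEBUG_OPTIONS)
            if value & (1 << (bit - 1))]

if __name__ == "__main__":
    value = debug_map(1, 3)
    print("-d", hex(value))
    for purpose in selected_options(value):
        print("  ", purpose)
```

              Passing -1 to selected_options lists every option in the table, which mirrors the
              "all bits" row.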

       -E     In record/field mode, extends the morphology annotations if they already exist (the
              default is to leave existing annotations as is).

       -O     In record/field mode, overwrite the morphology annotations if  they  already  exist
              (the default is to leave existing annotations as is).

       -f     Flush the output after each segment lookup. This is useful only if input and output
              are piped from and to a program that needs to synchronize them.

       -h     Print help and exit.

       -i     Prepend the result of each lookup with the  identifier  of  the  input  segment  it
              corresponds to. Currently input segments are identified by their sequential number,
              starting at 0.  With this indication, the extra newline  separating  the  solutions
              for  different input segments is not printed because it is not needed.  If a lookup
              has no solutions, only the segment identifier is printed on the output. The segment
              identifier  is  also  prepended  to  rejected  segments.   A tab always follows the
              segment identifier.
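
              Because each line then starts with the segment number followed by a tab, the
              output can be regrouped per input segment by a consumer program.  The sketch
              below is a hypothetical Python helper (group_by_identifier is not part of
              mmorph), and the solution strings in the sample are placeholders, not real
              mmorph output.

```python
def group_by_identifier(lines):
    """Map each segment identifier to the list of its solutions."""
    groups = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate stray empty lines
        identifier, _tab, solution = line.partition("\t")
        solutions = groups.setdefault(int(identifier), [])
        if solution:  # nothing after the tab: the lookup had no solutions
            solutions.append(solution)
    return groups

if __name__ == "__main__":
    # Placeholder lines in the "identifier TAB solution" layout.
    sample = ["0\tsolution-a\n", "0\tsolution-b\n", "1\t\n", "2\tsolution-c\n"]
    print(group_by_identifier(sample))
```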

       -k     Fallback fold case.  If a word lookup fails, convert all uppercase letters to
              lowercase and try the lookup again (conversion is done according to LC_CTYPE, see
              below).
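
              The difference between -b and -k can be illustrated with a small Python model.
              This is only a sketch of the behaviour described here, not mmorph's
              implementation: the dict stands in for morphfile.db, the function lookup and its
              fold parameter are hypothetical, and str.lower() follows Unicode rules rather
              than LC_CTYPE.

```python
def lookup(word, database, fold=None):
    """Look up word in database (a dict standing in for morphfile.db).

    fold="always"   models -b: lowercase the word, then look it up.
    fold="fallback" models -k: look up as typed, lowercase only on failure.
    """
    if fold == "always":
        return database.get(word.lower(), [])
    solutions = database.get(word, [])
    if not solutions and fold == "fallback":
        solutions = database.get(word.lower(), [])
    return solutions

if __name__ == "__main__":
    # Placeholder entries; the values are not real mmorph analyses.
    database = {"paris": ["common-noun"], "Paris": ["proper-noun"]}
    for mode in (None, "always", "fallback"):
        print(mode, lookup("Paris", database, mode), lookup("PARIS", database, mode))
```

              With -b a capitalized entry in the database is never reached, whereas with -k it
              is preferred when the input matches it exactly.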

       -l logfile
              Specify the file for writing trace and error messages.  Defaults to standard error.

       -m morphfile
              Specify the file containing the morphology  description.   See  mmorph  (5)  for  a
              description of the formalism's syntax.

       -n     No database creation or lookup (test mode).

       -p     Dump  the  typed  feature  structure database to outfile (or standard output).  The
              count of distinct tfs is given in the logfile (or standard error) if bit 5 of debug
              option is set.

       -q     Dump  the  forms  in the database to outfile (or standard output).  Some statistics
              are given in the logfile (or standard error) if bit 5 of debug option is set.

       -r rejectfile
              In non record/field mode, specifies the file where to  write  input  segments  that
              could not be looked up.  Defaults to standard error.

       -s trace_level
              Trace spelling rule application:
              0  no tracing (default).
              1  trace valid surface forms.
              2  trace rules whose lexical part matches.
              3  trace surface left context match (surface word construction).
              4  trace surface right context mismatch and rule blocking.
              5  trace rule non blocking.
              A trace_level implies all preceding ones.

       -t trace_level
              Specify the level of tracing for rule application:
              0  no tracing (default).
              1  trace goal rules that apply.
              2  trace all rules that apply, indentation indicates the recursion depth.
              10 trace also rules that were tried but did not apply.
              A trace_level implies all preceding ones.

       -U     In  record/field  mode, unknown words (i.e. that were unsuccessfully looked up) are
              annotated with ??\??.

       -v     Print version and exit.

       -y     Parse only: do not process the description other than for syntax checking.  While
              developing a morphology description you may use this option to catch syntax errors
              quickly after each modification before running it "for real".

       -z     Implies -y.  Parse and output the lexical descriptions in normalized form.

       infile File containing the segments to look up, one per line.  Defaults to the standard
              input.

       outfile
              File in which the output of the program is written.  One line per solution.
              Solutions of different input segments are separated by an empty line.  Defaults  to
              the standard output.
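
       The default output layout (one line per solution, an empty line between input segments)
       can be split back into groups by a consumer program.  A minimal Python sketch follows;
       group_solutions is a hypothetical helper, not part of mmorph, and it assumes that every
       group contains at least one solution line, so a segment without solutions leaves no
       trace in the result.

```python
def group_solutions(lines):
    """Split output lines into groups separated by empty lines."""
    groups, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line:
            current.append(line)
        elif current:  # an empty line closes the current group
            groups.append(current)
            current = []
    if current:  # last group when the output does not end with an empty line
        groups.append(current)
    return groups

if __name__ == "__main__":
    # Placeholder solution lines, not real mmorph output.
    sample = ["solution-a\n", "solution-b\n", "\n", "solution-c\n"]
    print(group_solutions(sample))
```

       When groups must be matched to input segments reliably, the -i option is the safer
       choice, since it labels every line with its segment identifier.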

WORD GRAMMAR AND SPELLING RULES

       For  a  detailed  account of the principles and mechanisms used in mmorph, please refer to
       the documents cited in the SEE ALSO section below.

       Briefly sketched, morphosyntactic descriptions written for mmorph describe how  words  are
       constructed  by the concatenation of morphemes, and how this concatenation process changes
       the spelling of these morphemes.  The first part, the word structure grammar, is specified
       by  restricted context free rewrite rules whose formalism is inspired by unification based
       systems (cf.  Shieber 1986).  The second part,  the  spelling  changes,  is  specified  by
       spelling  rules  in a formalism based on the two level model of morphology.  This approach
       to morphology is described in Ritchie, Russell et al. (1992) and more concisely in Pulman
       and Hepple (1993).

ENVIRONMENT VARIABLES

       To  decide  which  characters  are  displayable  on  the  output, mmorph uses the language
       specific  description  that  setlocale(3)  sets  according  to  the  environment  variable
       LC_CTYPE.  For the languages that are dealt with in MULTEXT it is a good idea to have that
       variable set to iso_8859_1.
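
       When mmorph is started from another program, the variable can be set in the child
       environment only.  The Python sketch below shows one way to prepare such an environment;
       mmorph_environment is a hypothetical helper.  It also removes LC_ALL, because LC_ALL,
       when set, takes precedence over LC_CTYPE.  Whether the locale name iso_8859_1 is
       recognized depends on the locales installed on the system.

```python
import os

def mmorph_environment(base=None, ctype="iso_8859_1"):
    """Copy an environment and set LC_CTYPE for a child mmorph process."""
    env = dict(os.environ if base is None else base)
    # LC_ALL, when set, overrides LC_CTYPE, so drop it from the copy.
    env.pop("LC_ALL", None)
    env["LC_CTYPE"] = ctype
    return env

if __name__ == "__main__":
    env = mmorph_environment()
    print("LC_CTYPE =", env["LC_CTYPE"])
```

       The returned dictionary can be passed as the env argument of subprocess.run together
       with a command such as ["mmorph", "-m", "morphfile"].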

EXAMPLES

       Here is a summary of the common usage of mmorph options:

              mmorph -n -m morphfile
       Test mode: reads the whole of morphfile and prints results on standard error.  No database
       is created, no words are looked up.

              mmorph -c -m morphfile
       Database  creation:   reads  the  whole  of morphfile and stores the results in a database
       (morphfile.db).   Typed  feature   structures   are   collected   in   a   separate   file
       (morphfile.tfs).  Standard input is read for words to look up in the new database.

              mmorph -m morphfile
       Lookup  mode:  reads  only  the  Alphabets,  Attributes  and  Types sections of morphfile.
       Standard input  is  read  for  words  to  look  up  according  to  the  existing  database
       (morphfile.db and morphfile.tfs).

              mmorph -m morphfile -a addfile
       Addition  mode:   ignores  the Lexicon section of morphfile, but addfile is consulted, and
       the results are added to the database.  Standard input  is  read  for  words  to  look  up
       according to the augmented database (morphfile.db and morphfile.tfs).

DIAGNOSTICS

       Error messages should be self-explanatory.  Please refer to mmorph(5) for a formal
       description of the syntax.

FILES

       morphfile.db
              database file of forms generated for descriptions in file morphfile given as option
              -m.

       morphfile.tfs
              database file of typed feature structures associated to morphfile.db.

SEE ALSO

       mmorph(5), setlocale(3).

       G.  Russell  and  D.  Petitpierre,  MMORPH  - The Multext Morphology Program, Version 2.3,
              October 1995, MULTEXT deliverable report for task 2.3.1.

       Ritchie, G. D., G.J. Russell, A.W. Black and S.G. Pulman (1992), Computational Morphology:
              Practical Mechanisms for the English Lexicon, Cambridge Mass., MIT Press.

       Pulman,  S.G. and M.R. Hepple, (1993) ``A feature-based formalism for two level phonology:
              a description and implementation'', Computer Speech and Language 7, pp.333-358.

       Shieber, S.M. (1986), An Introduction to Unification-Based  Approaches  to  Grammar,  CSLI
              Lecture Notes Number 4, Stanford University.

AUTHOR

       Dominique Petitpierre, ISSCO, <petitp@divsun.unige.ch>

ACKNOWLEDGEMENTS

       The parser for the morphology description formalism was written using yacc(1) and flex(1).
       Flex was written by Vern Paxson, <vern@ee.lbl.gov>, and is distributed in the framework of
       the GNU project under the conditions of the GNU General Public License.

       The  database  module  in the current version uses the db library package developed at the
       University of California, Berkeley by Margo Seltzer, Keith Bostic <bostic@cs.berkeley.edu>
       and Ozan Yigit.

       The crc procedures used for taking a signature of the typed feature structure declarations
       are taken from the fingerprint package by Daniel J. Bernstein and use code written by Gary
       S. Brown.

                                    Version 2.3, October 1995                           MMORPH(1)