Provided by: mmorph_2.3.4.2-8_i386

NAME

       mmorph - MULTEXT morphology tool formalism syntax

DESCRIPTION

       A mmorph morphology description file is divided into declaration
       sections.  Each section starts with a section header (‘@ Alphabets’,
       ‘@ Attributes’, etc.) followed by a sequence of declarations.  Each
       declaration starts with a name, followed by a colon (‘:’) and the
       definition associated with the name.  Here is a brief description of
       each section:

@ Alphabets

       In this section the lexical and surface alphabets are declared.  All
       symbols forming each alphabet have to be listed.  A symbol may appear
       in both the lexical and the surface alphabet definition, in which case
       it is considered a bi-level symbol; otherwise it is a lexical-only or
       surface-only symbol.  Symbols are usually letters (e.g. a, b, c), but
       may also consist of longer names (beta, schwa).  Symbol names
       consisting of one special character (‘:’ or ‘(’) may be specified by
       enclosing them in double quotes (‘":"’ or ‘"("’).
       Example:

              Lexical : a b c d e f g h i j k l m n o p q r s t u v w x y z
                     "-" "." "," "?" "!" "\"" "'" ":" ";" "(" ")" strong_e

              Surface : a b c d e f g h i j k l m n o p q r s t u v w x y z
                     "-" "." "," "?" "!" "\"" "'" ":" ";" "(" ")" " "

       In  this  example,  the symbol strong_e is lexical only, the symbol " "
       (space) is surface only.  All the other symbols are bi-level.

       All the strings appearing in the rest of the grammar must consist
       exclusively of symbols declared in this section.
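
       The three kinds of symbol can be pictured as set operations on the
       two alphabets.  The Python sketch below is only an illustration and
       is not part of mmorph; it uses a shortened version of the alphabets
       from the example above.

```python
# Shortened alphabets, for illustration only.
lexical = set("abcdefghijklmnopqrstuvwxyz") | {"-", ".", ",", "strong_e"}
surface = set("abcdefghijklmnopqrstuvwxyz") | {"-", ".", ",", " "}

bi_level = lexical & surface        # declared in both alphabets
lexical_only = lexical - surface    # declared only in the lexical alphabet
surface_only = surface - lexical    # declared only in the surface alphabet

print(sorted(lexical_only))
print(sorted(surface_only))
```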

@ Attributes

       This section declares the names of attributes (sometimes called
       features) and their associated value sets.  At most 32 different
       values may be declared for an attribute.
       Examples:

              Gender : feminine masculine neuter
              Number : singular plural
              Person : 1st 2nd 3rd
              Transitive : yes no
              Inflection : base intermediate final

       In  the  current  version of the implementation value sets of different
       attributes are incompatible, even if they are defined identically.   To
       overcome  this  restriction,  in  a future version this section will be
       split  into  two:   declaration  of  value  sets  and  declaration   of
       attributes.

@ Types

       In  this  section,  the  different  types  of  feature  structures  are
       declared.  The attributes allowed for each type are listed.  Attributes
       that  are  only  used  within the scope of the tool and have no meaning
       outside can be listed after a bar (‘|’).  The values of these local
       attributes are not stored in the database or written on the final
       output of the program.
       Examples:

              Noun : Gender Number
              Verb : Tense Person Gender Number Transitive | Inflection

Typed feature structures

       Typed feature structures are used in the grammar and spelling rules.
       A typed feature structure is the specification of a type and the
       values of some associated attributes.  The list of attribute
       specifications is enclosed in square brackets (‘[’ and ‘]’).
       Example:

              Noun[ Gender=feminine Number=singular ]

       It is possible to specify a set of values for an attribute by listing
       the possible values separated with a bar (‘|’), or the complement of a
       set (with respect to all possible values of that attribute), indicated
       with ‘!=’ instead of ‘=’.
       Example:  Assuming the declaration of Gender as above, the following
       two typed feature structures are equivalent:

              Noun[ Gender=masculine|neuter ]
              Noun[ Gender!=feminine ]
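
       The equivalence can be checked with ordinary set arithmetic.  The
       following Python sketch is an illustration only; the helper
       value_set is hypothetical and does not reflect how mmorph stores
       value sets internally.

```python
# Declared value set of Gender, taken from the @ Attributes example.
GENDER = ("feminine", "masculine", "neuter")

def value_set(declared, spec, negated=False):
    """Return the set denoted by 'a|b' after '=' or, if negated, after '!='."""
    listed = set(spec.split("|"))
    unknown = listed - set(declared)
    if unknown:
        raise ValueError(f"undeclared values: {sorted(unknown)}")
    # '!=' denotes the complement with respect to all declared values.
    return frozenset(declared) - listed if negated else frozenset(listed)

same = value_set(GENDER, "masculine|neuter") == value_set(GENDER, "feminine", negated=True)
print(same)
```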

@ Grammar

       This section contains the rules that specify the structure of words.
       It has the general shape of a context-free grammar over typed feature
       structures.  There are three basic types of rules:  binary, goal and
       affix.

       Binary rules specify the result of the concatenation of  two  elements.
       This is written as:

              Rule_name : Lhs <- Rhs1 Rhs2

       where Lhs is called the left hand side, and Rhs1 and Rhs2 the first and
       second part of the right hand side.  Lhs, Rhs1 and Rhs2  are  specified
       as typed feature structures.
       Example:

              Rule_1  : Noun[ Gender=feminine Number=singular ]
                      <- Noun[ Gender=feminine Number=singular ]
                         NounSuffix[ Gender=feminine ]

       Variables  can  be  used to indicate that some attributes have the same
       value.  A variable is a name starting with a dollar (‘$’).
       Example:

              Rule_2  : Noun[ Gender=$A Number=$number ]
                      <- Noun[ Gender=$A Number=$number ]
                         NounSuffix[ Gender=$A ]

       If needed, both a variable and a value specification can be  given  for
       an attribute (only once per attribute):
       Example:

              Rule_3  : Noun[ Gender=$A Number=$number ]
                      <- Noun[ Gender=$A Number=$number ]
                         NounSuffix[ Gender=$A=masculine|neuter ]
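
       One way to read a shared variable is that every occurrence narrows
       the same set of values.  The Python sketch below follows the Gender
       attribute through Rule_3 under that reading.  It is an assumption
       made for illustration, not a description of the algorithm mmorph
       uses, and the names bind and apply_rule_3 are hypothetical.

```python
GENDER = frozenset({"feminine", "masculine", "neuter"})

def bind(bindings, var, values):
    """Narrow the values bound to var by intersecting with values.

    Returns the updated bindings, or None if no value remains."""
    current = bindings.get(var, values)
    narrowed = current & values
    if not narrowed:
        return None
    return {**bindings, var: narrowed}

def apply_rule_3(stem_gender, suffix_gender):
    """Gender part of Rule_3: $A is shared by the stem, the suffix and the
    result; the suffix also restricts it to masculine|neuter."""
    bindings = bind({}, "A", stem_gender)
    if bindings is None:
        return None
    restricted = suffix_gender & frozenset({"masculine", "neuter"})
    bindings = bind(bindings, "A", restricted)
    return None if bindings is None else bindings["A"]

print(apply_rule_3(frozenset({"masculine"}), GENDER))
print(apply_rule_3(frozenset({"feminine"}), GENDER))
```

       In this reading a feminine stem cannot combine with the suffix of
       Rule_3, because the intersection of its Gender values with
       masculine|neuter is empty.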

       Affix rules define basic elements of the concatenations specified by
       binary rules (together with lexical entries, see the section @ Lexicon
       below).  An affix rule consists of a lexical string associated with a
       typed feature structure.
       Examples:

              Plural_s : "s" NounSuffix[ Number=plural ]
              Feminine_e : "e" NounSuffix[ Gender=feminine ]
              ing : "ing" VerbSuffix[ Tense=present_participle ]

       Goal rules specify the valid results constructed by the grammar.   They
       consist of just a typed feature structure.
       Examples:

              Goal_1  : Noun[]
              Goal_2  : Verb[ Inflection=final ]

       In addition to these three basic rule types, there are prefix or suffix
       composite rules and unary rules.  A unary rule consists of a left hand
       side and a right hand side made of a single element.
       Example:

              Rule_4  : Noun[ gender=$G number=plural ]
                      <- Noun[ gender=$G number=singular invariant=yes]

       Prefix  and  suffix composite rules have the same shape as binary rules
       except that one part of the right hand side is an affix  (i.e.  has  an
       associated string).
       Examples:

              Append_e   : Noun[ Gender=feminine Number=$number ]
                      <- Noun[ Gender=feminine Number=$number ]
                         "e" NounSuffix[ Gender=feminine ]

              anti    : Noun[ Gender=$gender Number=$number ]
                      <- "anti" NounPrefix[]
                         Noun[ Gender=$gender Number=$number ]

@ Classes

       This  optional  section contains the definition of symbol classes. Each
       class is defined as a set of symbols, or other classes.  If  the  class
       contains only bi-level elements it is a bi-level class, otherwise it is
       a lexical or surface class.
       Examples:

              Dental : d t
              Vowel : a e i o u
              Vowel_y : Vowel y
              Consonant: b c d f g h j k l m n p q r s t v w x z

@ Pairs

       This optional section contains the  definition  of  pair  disjunctions.
       Each  disjunction is defined as a set of pairs.  Explicit pairs specify
       a sequence of surface symbols and a sequence of  zero  or  one  lexical
       symbol,  one  of  them  possibly empty.  A sequence is enclosed between
       angle brackets ‘<’ and ‘>’.  The empty sequence is indicated with ‘<>’.
       In  the current implementation only the surface part of a pair can be a
       sequence of more than one element.  The special symbol ‘?’  stands  for
       the  class  of  all  possible  symbols, including the morpheme and word
       boundary.
       Examples:

              s_x_z_1 : s/s x/x z/z
              VowelPair1: a/a e/e i/i o/o u/u
              VowelPair2: Vowel/Vowel
              ie.y: <i e>/y
              Delete_e: <>/e
              Insert_d: d/<>
              Surface_Vowel: Vowel/?
              Lexical_s:  ?/s

              DoubleConsonant: <b b>/b <d d>/d <f f>/f <g g>/g <k k>/k <m m>/m
                     <p p>/p <s s>/s <t t>/t <v v>/v <z z>/z

       Note  that  VowelPair1  and  VowelPair2  don’t  specify the same thing:
       VowelPair2 would match a/o but VowelPair1 would not.
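
       The difference becomes visible when both disjunctions are expanded
       into explicit (surface, lexical) pairs.  This Python sketch is an
       illustration only; representing a pair as a tuple is an assumption
       of the example.

```python
VOWEL = set("aeiou")

# VowelPair1: a/a e/e i/i o/o u/u  (identical surface and lexical vowel)
vowel_pair1 = {(v, v) for v in VOWEL}

# VowelPair2: Vowel/Vowel  (any surface vowel with any lexical vowel)
vowel_pair2 = {(s, l) for s in VOWEL for l in VOWEL}

print(("a", "o") in vowel_pair1)
print(("a", "o") in vowel_pair2)
print(len(vowel_pair1), len(vowel_pair2))
```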

       Implicit pairs are specified by the name of a bi-level symbol or a  bi-
       level class.
       Examples:   the  following s_x_z_2 and VowelPair3 are equivalent to the
       above s_x_z_1 and VowelPair2 (assuming that s, x, z and Vowel  are  bi-
       level symbols and classes).

              s_x_z_2 : s x z
              VowelPair3 : Vowel

       In  a pair disjunction all lexical parts should be disjoint. This means
       you cannot specify for the same pair disjunction a/a and o/a or a/a and
       Vowel/Vowel.
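
       The disjointness requirement can be checked mechanically.  The
       Python sketch below is an illustration only: it assumes each pair
       has already been expanded into a set of surface symbols and a set
       of lexical symbols, and the function name is hypothetical.

```python
VOWEL = set("aeiou")

def overlapping_lexical_parts(pairs):
    """Return the lexical symbols claimed by more than one pair.

    Each pair is (surface_symbols, lexical_symbols), both given as sets."""
    seen, clashes = set(), set()
    for _surface, lexical in pairs:
        clashes |= seen & lexical
        seen |= lexical
    return clashes

ok = [({"s"}, {"s"}), ({"x"}, {"x"}), ({"z"}, {"z"})]   # s/s x/x z/z
bad1 = [({"a"}, {"a"}), ({"o"}, {"a"})]                 # a/a and o/a
bad2 = [({"a"}, {"a"}), (VOWEL, VOWEL)]                 # a/a and Vowel/Vowel

print(overlapping_lexical_parts(ok))
print(overlapping_lexical_parts(bad1))
print(overlapping_lexical_parts(bad2))
```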

       In  a  future  version  this section will be split in two:  simple pair
       disjunctions and pair sequences.

@ Spelling

       In this section the two-level spelling rules are declared.  A spelling
       rule consists of a kind indicator followed by a left context, a focus
       and a right context.  The kind indicator is ‘=>’ if the rule is
       optional, ‘<=>’ if it is obligatory and ‘<=’ if it is a surface
       coercion rule.  The contexts may be empty.  The focus is surrounded by
       two ‘-’.  The contexts and the focus consist of a sequence of pairs or
       pair disjunctions declared in the ‘@ Pairs’ section.  A morpheme
       boundary is indicated by a ‘+’ or a ‘*’, a word boundary is indicated
       by a ‘~’.
       Examples:

              Sibilant_s: <=> s_x_z_1 * - e/<> - s
              Gemination: <=>
                      Consonant Vowel - DoubleConsonant - * Vowel
              i_y_optionnel: => a - i/y - * ?/e
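
       The layout of a rule (kind indicator, left context, focus between
       two ‘-’, right context) can be made concrete with a small Python
       sketch.  It is an illustration only and makes simplifying
       assumptions: elements are separated by blanks, no element contains
       a blank, no quoted ":" or "-" symbol occurs, and the rule has no
       constraints.  The function name split_rule is hypothetical.

```python
ARROWS = {"=>": "optional", "<=>": "obligatory", "<=": "surface coercion"}

def split_rule(text):
    """Split one spelling rule into its name, kind, contexts and focus."""
    name, body = text.split(":", 1)
    tokens = body.split()
    kind = ARROWS[tokens[0]]              # the kind indicator comes first
    rest = tokens[1:]
    first = rest.index("-")               # opening focus delimiter
    second = rest.index("-", first + 1)   # closing focus delimiter
    return {
        "name": name.strip(),
        "kind": kind,
        "left": rest[:first],
        "focus": rest[first + 1:second],
        "right": rest[second + 1:],
    }

rule = split_rule("Sibilant_s: <=> s_x_z_1 * - e/<> - s")
print(rule["kind"], rule["left"], rule["focus"], rule["right"])
```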

       Constraints  may  be  specified  in the form of a list of typed feature
       structures.  They are affix-driven:  the rule is licensed if  at  least
       one  of  them  subsumes  the closest corresponding affix.  The morpheme
       boundary indicated by a star (‘*’) will  be  used  to  determine  which
       affix  it  is.  If there is no such indication, then the affix adjacent
       to the morpheme where the first character of the focus occurs is  used.
       In  case  there is no affix, the typed feature structure of the lexical
       stem is used.
       Example:

              Sibilant_s: <=>
                  s_x_z_1 * - e/<> - s NounSuffix[ Number=plural ]

@ Lexicon

       This section is optional and can also be repeated.  It lists all the
       lexical entries of the morphological description.  Unlike in the other
       sections, definitions do not have a name.  A definition consists of a
       typed feature structure followed by a list of lexical stems that share
       that feature structure.  A lexical stem consists of the string used in
       the concatenation specified by the grammar rules followed by ‘=’ and a
       reference string.  The reference string can be anything and is usually
       used to indicate the canonical form of the word or an identifier of an
       external database entry.
       Examples:

              Noun[ Number=singular ] "table" = "table" "chair" = "chair"
              Verb[ Transitive=yes|no Inflection=base ] "bow" = "bow1"
              Noun[ Number=singular ] "bow" = "bow2"

       If the stem string and the reference string are identical, only one
       needs to be specified.
       Example:

              Noun[ Number=singular ] "table" "chair"

FORMAL SYNTAX

       The  formal syntax description below is in Backus Naur Form (BNF).  The
       following conventions apply:

       <id>      is a non-terminal symbol (within angle brackets).
       ID        is a token (terminal symbol, all uppercase).
       <id>?     means zero or one occurrence of <id> (i.e. <id> is optional).
       <id>*     is zero or more occurrences of <id>.
       <id>+     is one or more occurrences of <id>.
       ::=       separates a non-terminal symbol and its expansion.
       |         indicates an alternative expansion.
       ;         starts a comment (not part of the definition).

       The start symbol corresponding  to  a  complete  description  is  named
       <Start>.    Symbols   that   parse  but  do  nothing  are  marked  with
       ‘; not operational’.

       <Start>           ::= <AlphabetDecl> <AttDecl> <TypeDecl> <GramDecl>
                             <ClassDecl>? <PairDecl>? <SpellDecl>? <LexDecl>*

       <AlphabetDecl>    ::= ALPHABETS <LexicalDef> <SurfaceDef>

       <LexicalDef>      ::= <LexicalName> COLON <LexicalSymbol>+

       <SurfaceDef>      ::= <SurfaceName> COLON <SurfaceSymbol>+

       <LexicalSymbol>   ::= <LexicalSymbolName>    ; lexical only
                         |   <BiLevelSymbolName>    ; both lexical and surface

       <SurfaceSymbol>   ::= <SurfaceSymbolName>    ; surface only
                         |   <BiLevelSymbolName>    ; both lexical and surface

       <AttDecl>         ::= ATTRIBUTES <AttDef>+

       <AttDef>          ::= <AttName> COLON <ValName>+

       <TypeDecl>        ::= TYPES <TypeDef>+

       <TypeDef>         ::= <TypeName> COLON <AttName>+ <NoProjAtt>?

       <NoProjAtt>       ::= BAR <AttName>+

       <LexDecl>         ::= LEXICON <LexDef>+

       <LexDef>          ::= <Tfs> <Lexical>+

       <Lexical>         ::= LEXICALSTRING <BaseForm>?

       <BaseForm>        ::= EQUAL LEXICALSTRING

       <Tfs>             ::= <TypeName> <AttSpec>?

       <VarTfs>          ::= <TypeName> <VarAttSpec>?

       <AttSpec>         ::= LBRA <AttVal>* RBRA

       <VarAttSpec>      ::= LBRA <VarAttVal>* RBRA

       <AttVal>          ::= <AttName> <ValSpec>

       <VarAttVal>       ::= <AttName> <VarValSpec>

       <ValSpec>         ::= EQUAL <ValSet>
                         |   NOTEQUAL <ValSet>

       <VarValSpec>      ::= <ValSpec>
                         |   EQUAL DOLLAR <VarName>
                         |   EQUAL DOLLAR <VarName> <ValSpec>

       <ValSet>          ::= <ValName> <ValSetRest>*

       <ValSetRest>      ::= BAR <ValName>

       <GramDecl>        ::= GRAMMAR <RuleDef>+

       <RuleDef>         ::= <RuleName> COLON <RuleBody>

       <RuleBody>        ::= <VarTfs> LARROW <Rhs>
                         |   <Tfs>    ; goal rule
                         |   LEXICALSTRING <Tfs>    ; lexical affix

       <Rhs>             ::= <VarTfs>    ; unary rule
                         |   <VarTfs> <VarTfs>    ; binary rule
                         |   LEXICALSTRING <Tfs> <VarTfs>   ; prefix rule
                         |   <VarTfs> LEXICALSTRING <Tfs>    ; suffix rule

       <ClassDecl>       ::= CLASSES <ClassDef>+

       <ClassDef>        ::= <LexicalClassName> COLON <LexicalClass>+
                         |   <SurfaceClassName> COLON <SurfaceClass>+
                         |   <BiLevelClassName> COLON <BiLevelClass>+

       <LexicalClass>    ::= <LexicalSymbol>
                         |   <LexicalClassName>
                         |   <BiLevelClassName>

       <SurfaceClass>    ::= <SurfaceSymbol>
                         |   <SurfaceClassName>
                         |   <BiLevelClassName>

       <BiLevelClass>    ::= <BiLevelSymbolName>
                         |   <BiLevelClassName>

       <PairDecl>        ::= PAIRS <PairDef>+

       <PairDef>         ::= <PairName> COLON <Pair>+

       <Pair>            ::= <SurfaceSequence> SLASH <LexicalSequence>
                         |   <PairName>
                         |   <BiLevelClassName>
                         |   <BiLevelSymbolName>

       <SurfaceSequence> ::= LANGLE <SurfaceSymbol>* RANGLE
                         |   SURFACESTRING
                         |   <SurfaceClass>
                         |   ANY

       <LexicalSequence> ::= LANGLE <LexicalSymbol>* RANGLE
                         |   LEXICALSTRING
                         |   <LexicalClass>
                         |   ANY

       <SpellDecl>       ::= SPELLING <SpellDef>+

       <SpellDef>        ::= <SpellName> COLON <Arrow> <LeftContext> <Focus>
                                 <RightContext> <Constraint>*

       <LeftContext>     ::= <Pattern>*

       <RightContext>    ::= <Pattern>*

       <Focus>           ::= CONTEXTBOUNDARY <Pattern>+ CONTEXTBOUNDARY

       <Pattern>         ::= <Pair>
                         |   MORPHEMEBOUNDARY
                         |   WORDBOUNDARY
                         |   CONCATBOUNDARY

       <Constraint>      ::= <Tfs>

       <Arrow>           ::= RARROW
                         |   BIARROW
                         |   COERCEARROW

       <AttName>           ::= NAME
       <BiLevelClassName>  ::= NAME
       <BiLevelSymbolName> ::= NAME  | SYMBOLSTRING
       <LexicalClassName>  ::= NAME
       <LexicalName>       ::= NAME
       <LexicalSymbolName> ::= NAME  | SYMBOLSTRING
       <PairName>          ::= NAME
       <RuleName>          ::= NAME
       <SpellName>         ::= NAME
       <SurfaceClassName>  ::= NAME
       <SurfaceName>       ::= NAME
       <SurfaceSymbolName> ::= NAME  | SYMBOLSTRING
       <TypeName>          ::= NAME
       <ValName>           ::= NAME
       <VarName>           ::= NAME

   Simple tokens
       Simple tokens of the BNF above are defined as follows.  The token name
       on the left corresponds to the literal character or characters on the
       right:

       ANY                 ?
       BAR                 |
       BIARROW             <=>
       COERCEARROW         <=
       COLON               :
       CONCATBOUNDARY      *
       CONTEXTBOUNDARY     -
       DOLLAR              $
       EQUAL               =
       LANGLE              <
       LARROW              <-
       LBRA                [
       MORPHEMEBOUNDARY    +
       NOTEQUAL            !=
       RARROW              =>
       RANGLE              >
       RBRA                ]
       SLASH               /
       WORDBOUNDARY        ~

       ALPHABETS           @Alphabets
       ATTRIBUTES          @Attributes
       CLASSES             @Classes
       GRAMMAR             @Grammar
       LEXICON             @Lexicon
       PAIRS               @Pairs
       SPELLING            @Spelling
       TYPES               @Types

       In the section header tokens above, spaces may separate  the  ‘@’  from
       the reserved word.
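
       A section header can therefore be recognised with a single regular
       expression.  The Python sketch below is an illustration only; it
       assumes that spaces and tabs are the only separators allowed after
       the ‘@’, and the function name section_name is hypothetical.

```python
import re

HEADER = re.compile(
    r"@[ \t]*(Alphabets|Attributes|Classes|Grammar|Lexicon|Pairs|Spelling|Types)\b"
)

def section_name(text):
    """Return the reserved word of a section header, or None."""
    m = HEADER.match(text)
    return m.group(1) if m else None

print(section_name("@ Alphabets"))
print(section_name("@Pairs"))
print(section_name("@ Unknown"))
```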

   Complex tokens
       NAME
              is any sequence of letters, digits, underscores (‘_’) and
              periods (‘.’).
              Examples:
              category
              33
              Rule_9
              __2__
              Proper.Noun

       LEXICALSTRING
              is a string of lexical symbols

       SURFACESTRING
              is a string of surface symbols

       SYMBOLSTRING
              is a string of just one character (used only in alphabet
              declaration).

       A string consists of zero or more characters within double quotes
       (‘"’).  Characters preceded by a backslash (‘\’) are escaped (the usual
       C escaping conventions apply).  Symbols that have a name longer than
       one character are represented using an SGML-entity-like notation:
       ‘&symbolname;’.  The maximum number of symbols in a string is 127.
       Examples:

              "table"
              ","
              ""
              "double quote is \" and backslash is \\"
              "&strong_e;"
              "escape like in C : \t is ASCII tab"
              "escape with octal code: \011 is ASCII tab"

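       The ‘&symbolname;’ notation means that the number of symbols in a
       string can be smaller than its number of characters.  The Python
       sketch below is an illustration only: it works on string content
       whose backslash escapes have already been resolved, assumes ASCII
       letters in symbol names, and treats an ‘&’ that does not start a
       well-formed entity as an ordinary character.  The function name
       symbols is hypothetical.

```python
import re

# Either an entity such as &strong_e; or any single character.
SYMBOL = re.compile(r"&([A-Za-z0-9_.]+);|(.)", re.DOTALL)
MAX_SYMBOLS = 127

def symbols(content):
    """Split string content into a list of symbol names."""
    result = [m.group(1) or m.group(2) for m in SYMBOL.finditer(content)]
    if len(result) > MAX_SYMBOLS:
        raise ValueError("more than 127 symbols in a string")
    return result

print(symbols("table"))
print(symbols("&strong_e;"))
```
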
       Tokens can be separated by one or more blanks or comments.
       A blank separator is a space, tab or newline.
       A comment starts with a semicolon and finishes at the next newline
       (except when the semicolon occurs in a string).
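
       The comment rule can be sketched in Python as a single scan over a
       line that keeps track of whether it is inside a string.  This is
       an illustration only, not the flex scanner used by mmorph; the
       function name strip_comment is hypothetical.

```python
def strip_comment(line):
    """Remove a ';' comment from one line, ignoring ';' inside strings."""
    in_string = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_string and ch == "\\":
            i += 2                      # skip the escaped character
            continue
        if ch == '"':
            in_string = not in_string   # entering or leaving a string
        elif ch == ";" and not in_string:
            return line[:i]             # comment runs to the end of line
        i += 1
    return line

print(strip_comment('Surface : a b ";" c ; the surface alphabet'))
```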

       Inclusion  of  files  can  be  specified  with  the  usual   ‘#include’
       directive:
       Example:

              #include "verb.entries"

       will  splice in the content of the file verb.entries at the point where
       this directive occurs.

       The ‘#’ should be the first character on the line.  Tabs or spaces  may
       separate  ‘#’  and ‘include’.  The file name must be quoted.  Only tabs
       or spaces may occur on the rest of the line.  Inclusion can  be  nested
       up to 10 levels.
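
       The inclusion rules can be sketched in Python as a recursive
       expansion with a depth limit.  This is an illustration only: the
       files are held in a dictionary instead of being read from disk,
       the way nesting levels are counted is an assumption, and the
       function name expand is hypothetical.

```python
import re

# '#' in the first column, optional blanks, 'include', a quoted file name,
# then only tabs or spaces up to the end of the line.
INCLUDE = re.compile(r'^#[ \t]*include[ \t]*"([^"]+)"[ \t]*$')
MAX_DEPTH = 10

def expand(name, files, depth=0):
    """Return the lines of files[name] with #include directives spliced in."""
    if depth > MAX_DEPTH:
        raise ValueError("inclusion nested more than 10 levels")
    out = []
    for line in files[name].splitlines():
        m = INCLUDE.match(line)
        if m:
            out.extend(expand(m.group(1), files, depth + 1))
        else:
            out.append(line)
    return out

files = {
    "main": '@ Lexicon\n#include "verb.entries"\nNoun[] "table"',
    "verb.entries": 'Verb[] "bow"',
}
print(expand("main", files))
```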

SEE ALSO

       mmorph(1).

       G. Russell and D. Petitpierre, MMORPH - The Multext Morphology Program,
              Version 2.3, October 1995, MULTEXT deliverable report  for  task
              2.3.1.

AUTHOR

       Dominique Petitpierre, ISSCO, <petitp@divsun.unige.ch>

COMMENTS

       The parser for the morphology description formalism above was written
       using yacc(1) and flex(1).  Flex was written by Vern Paxson,
       <vern@ee.lbl.gov>, and is distributed in the framework of the GNU
       project under the conditions of the GNU General Public License.

                           Version 2.3, October 1995                 MMORPH(5)