Provided by: wordnet_3.0-37_amd64

NAME

       morphy - discussion of WordNet's morphological processing

DESCRIPTION

       Although  only  base forms of words are usually stored in WordNet, searches may be done on
       inflected forms.  A set of morphology functions, Morphy, is applied to the  search  string
       to generate a form that is present in WordNet.

       Morphology in WordNet uses two types of processes to try to convert the string passed into
       one that can be found in the WordNet database.  There are lists of  inflectional  endings,
       based  on  syntactic category, that can be detached from individual words in an attempt to
       find a form of the word that is in WordNet.  There are also exception list files, one  for
       each syntactic category, in which a search for an inflected form is done.  Morphy tries to
       use these two processes in an intelligent manner to translate the  string  passed  to  the
       base  form  found  in WordNet.  Morphy first checks for exceptions, then uses the rules of
       detachment.   The  Morphy  functions  are  not  independent  from  WordNet.   After   each
       transformation,  WordNet  is  searched  for the resulting string in the syntactic category
       specified.

       The Morphy functions are passed a string and a syntactic category.  A string is  either  a
       single  word or a collocation.  Since some words, such as axes, can have more than one base
       form (axe and axis), Morphy works in the following manner.  The first time that Morphy  is
       called with a specific string, it returns a base form.  For each subsequent call to Morphy
       made with a NULL string argument, Morphy  returns  another  base  form.   Whenever  Morphy
       cannot perform a transformation, whether on the first call for a word or subsequent calls,
       NULL is returned.  A transformation to a valid English string will return NULL if the base
       form of the string is not in WordNet.
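
       The calling convention described above can be illustrated with a small Python sketch.  It
       is not the WordNet C interface (see morph(3WN) for that); the class name BaseFormIterator
       and its lookup table are hypothetical, and None stands in for NULL.

```python
class BaseFormIterator:
    """Toy stand-in for the Morphy call protocol (not the WordNet C API)."""

    def __init__(self, table):
        self._table = table   # hypothetical: inflected form -> list of base forms
        self._pending = []    # base forms not yet returned for the current string

    def call(self, string=None):
        # A non-None argument starts a new lookup; None continues the previous one.
        if string is not None:
            self._pending = list(self._table.get(string, []))
        if not self._pending:
            return None       # no transformation, or no more base forms
        return self._pending.pop(0)


morphy = BaseFormIterator({"axes": ["axe", "axis"]})
results = []
form = morphy.call("axes")
while form is not None:
    results.append(form)
    form = morphy.call(None)
print(results)  # ['axe', 'axis']
```

       The caller loops until None is returned, which covers both "no base form at all" and "no
       further base forms".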

       The  morphological  functions  are  found  in  the  WordNet  library.   See morph(3WN) for
       information on using these functions.

   Rules of Detachment
       The following table shows the rules of detachment used by Morphy.  If a word ends with one
       of the suffixes, it is stripped from the word and the corresponding ending is added.  Then
       WordNet is searched for the resulting string.  No rules are applicable to adverbs.

                                         POS  │ Suffix │ Ending
                                         ─────┼────────┼────────
                                         NOUN │ "s"    │ ""
                                         NOUN │ "ses"  │ "s"
                                         NOUN │ "xes"  │ "x"
                                         NOUN │ "zes"  │ "z"
                                         NOUN │ "ches" │ "ch"
                                         NOUN │ "shes" │ "sh"
                                         NOUN │ "men"  │ "man"
                                         NOUN │ "ies"  │ "y"
                                         VERB │ "s"    │ ""
                                         VERB │ "ies"  │ "y"
                                         VERB │ "es"   │ "e"
                                         VERB │ "es"   │ ""
                                         VERB │ "ed"   │ "e"
                                         VERB │ "ed"   │ ""
                                         VERB │ "ing"  │ "e"
                                         VERB │ "ing"  │ ""
                                         ADJ  │ "er"   │ ""
                                         ADJ  │ "est"  │ ""
                                         ADJ  │ "er"   │ "e"
                                         ADJ  │ "est"  │ "e"
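
       As an illustration, the table can be written as data and applied in a few lines of Python.
       This is a sketch only: the function detach, the set TOY_LEXICON (a stand-in for a WordNet
       lookup) and the choice to return the first match in table order are assumptions, not a
       description of the library code.

```python
# (suffix, ending) pairs per part of speech, copied from the table above.
RULES = {
    "noun": [("s", ""), ("ses", "s"), ("xes", "x"), ("zes", "z"),
             ("ches", "ch"), ("shes", "sh"), ("men", "man"), ("ies", "y")],
    "verb": [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
             ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")],
    "adj": [("er", ""), ("est", ""), ("er", "e"), ("est", "e")],
    "adv": [],  # no rules are applicable to adverbs
}


def detach(word, pos, lexicon):
    """Return the first rule-generated candidate found in `lexicon`, else None."""
    for suffix, ending in RULES[pos]:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + ending
            if candidate in lexicon:
                return candidate
    return None


# Hypothetical stand-in for searching WordNet in the requested category.
TOY_LEXICON = {"box", "make", "large"}

print(detach("boxes", "noun", TOY_LEXICON))   # box
print(detach("making", "verb", TOY_LEXICON))  # make
print(detach("largest", "adj", TOY_LEXICON))  # large
```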

   Exception Lists
       There is one exception list file for each syntactic category.  The exception lists contain
       the morphological transformations for strings that are not regular and therefore cannot be
       processed in an algorithmic manner.  Each line of an exception list contains an  inflected
       form  of  a  word or collocation, followed by one or more base forms.  The list is kept in
       alphabetical order and a binary search  is  used  to  find  words  in  these  lists.   See
       wndb(5WN) for information on the format of the exception list files.
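
       A lookup in a sorted exception list might be sketched as follows.  The sample lines are
       made up for illustration, and the sketch searches an in-memory list with Python's bisect
       module rather than the files themselves; see binsrch(3WN) for the library routines.

```python
import bisect

# Hypothetical exception-list lines: inflected form followed by its base forms,
# kept in alphabetical order of the inflected form.
SAMPLE_LINES = ["axes axe axis", "geese goose", "mice mouse", "oxen ox"]


def load_exceptions(lines):
    """Split each line once; return parallel lists of keys and base-form lists."""
    entries = [line.split() for line in lines]
    keys = [entry[0] for entry in entries]
    bases = [entry[1:] for entry in entries]
    return keys, bases


def lookup_exception(keys, bases, word):
    """Binary-search `keys` for `word`; return its base forms or an empty list."""
    i = bisect.bisect_left(keys, word)
    if i < len(keys) and keys[i] == word:
        return bases[i]
    return []


keys, bases = load_exceptions(SAMPLE_LINES)
print(lookup_exception(keys, bases, "axes"))   # ['axe', 'axis']
print(lookup_exception(keys, bases, "tables")) # []
```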

   Single Words
       In  general, single words are relatively easy to process.  Morphy first looks for the word
       in the exception list.  If it is found, the first base form is returned.  Subsequent  calls
       with  a  NULL  argument return additional base forms, if present.  A NULL is returned when
       there are no more base forms of the word.

       If the word is not found in the exception list corresponding to the syntactic category, an
       algorithmic  process  using  the  rules  of  detachment looks for a matching suffix.  If a
       matching suffix is found, a corresponding ending is applied (sometimes this ending is an
       empty string, so in effect the suffix is removed from the word), and WordNet is consulted
       to see if the resulting word is found in the desired part of speech.
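
       The order of the two steps for single words (exception list first, rules of detachment
       only if the word is not listed) can be sketched in Python for nouns.  The names
       noun_base_forms, exceptions and lexicon are hypothetical, and collecting every matching
       candidate into a list is a simplification of the one-result-per-call interface.

```python
NOUN_RULES = [("s", ""), ("ses", "s"), ("xes", "x"), ("zes", "z"),
              ("ches", "ch"), ("shes", "sh"), ("men", "man"), ("ies", "y")]


def noun_base_forms(word, exceptions, lexicon):
    """Return base forms of `word`: exception list first, then detachment rules."""
    # Step 1: a word listed in the exception list is not processed further.
    if word in exceptions:
        return list(exceptions[word])
    # Step 2: try each matching suffix and keep candidates present in the lexicon.
    found = []
    for suffix, ending in NOUN_RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + ending
            if candidate in lexicon and candidate not in found:
                found.append(candidate)
    return found


toy_exceptions = {"geese": ["goose"]}          # hypothetical exception entries
toy_lexicon = {"goose", "dog", "church"}       # hypothetical WordNet stand-in

print(noun_base_forms("geese", toy_exceptions, toy_lexicon))     # ['goose']
print(noun_base_forms("churches", toy_exceptions, toy_lexicon))  # ['church']
print(noun_base_forms("xyzzy", toy_exceptions, toy_lexicon))     # []
```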

   Collocations
       As opposed to single words, collocations can be quite difficult to transform into  a  base
       form  that  is  present  in  WordNet.   In  general,  only base forms of words, even those
       comprising collocations, are stored in WordNet, such  as  attorney general.   Transforming
       the collocation attorneys general is then simply a matter of finding the base forms of the
       individual words comprising the collocation.  This usually works for nouns; non-conforming
       nouns, such as customs duty, are therefore entered in the noun exception list.

       Verb  collocations  that contain prepositions, such as ask for it, are more difficult.  As
       with single words, the exception list is searched first.  If the collocation is not found,
       special  code  in Morphy determines whether a verb collocation includes a preposition.  If
       it does, a function is called to try to find the base form in the following manner.  It is
       assumed that the first word in the collocation is a verb and that the last word is a noun.
       The algorithm then builds a search string with the  base  forms  of  the  verb  and  noun,
       leaving the remainder of the collocation (usually just the preposition, but more words may
       be involved) in the middle.  For example, if Morphy is passed asking for it, the database
       search is performed with ask for it, which is found in WordNet and is therefore returned by
       Morphy.  If a verb collocation does not contain a preposition, then the base form of each
       word in the collocation is found and WordNet is searched for the resulting string.
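
       The handling of verb collocations that contain a preposition can be sketched in Python as
       follows.  Everything named here is an assumption made for illustration: the PREPOSITIONS
       subset, the callbacks verb_base and noun_base (which return the word unchanged when they
       know no base form), and the toy lexicon.

```python
PREPOSITIONS = {"for", "of", "to", "in", "on", "at", "by", "with"}  # hypothetical subset


def verb_collocation_base(collocation, verb_base, noun_base, lexicon):
    """Base-form the first word as a verb and the last as a noun; keep the middle."""
    words = collocation.split()
    middle = words[1:-1]
    if len(words) < 3 or not any(w in PREPOSITIONS for w in middle):
        return None  # no preposition in the middle: not handled by this sketch
    candidate = " ".join([verb_base(words[0])] + middle + [noun_base(words[-1])])
    return candidate if candidate in lexicon else None


# Toy base-form callbacks and lexicon, standing in for Morphy and WordNet.
toy_verb_base = lambda w: {"asking": "ask"}.get(w, w)
toy_noun_base = lambda w: w
toy_lexicon = {"ask for it"}

print(verb_collocation_base("asking for it", toy_verb_base, toy_noun_base, toy_lexicon))
# ask for it
```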

   Hyphenation
       Hyphenation  also  presents  special  difficulties  when searching WordNet.  It is often a
       subjective decision as to whether a word is hyphenated,  joined  as  one  word,  or  is  a
       collocation  of  several  words,  and which of the various forms are entered into WordNet.
       When Morphy breaks a string into  "words",  it  looks  for  both  spaces  and  hyphens  as
       delimiters.   It  also  looks for periods in strings and removes them if an exact match is
       not found.  A search for an abbreviation like oct. returns the synset for { October, Oct }.
       Not  every pattern of hyphenated and collocated string is searched for properly, so it may
       be advantageous to specify several search strings if the results of a search attempt  seem
       incomplete.
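
       The two behaviors described above, splitting on spaces and hyphens and retrying without
       periods, might look like this in Python.  The function names and the lowercase toy lexicon
       are hypothetical, and case handling is left out of the sketch.

```python
import re


def split_words(string):
    """Break a search string into words, treating spaces and hyphens as delimiters."""
    return [w for w in re.split(r"[ -]+", string) if w]


def lookup_with_period_fallback(string, lexicon):
    """Try an exact match first; if that fails, retry with periods removed."""
    if string in lexicon:
        return string
    stripped = string.replace(".", "")
    return stripped if stripped in lexicon else None


print(split_words("mother-in-law figure"))           # ['mother', 'in', 'law', 'figure']
print(lookup_with_period_fallback("oct.", {"oct"}))  # oct
```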

   Special Processing for nouns ending with 'ful'
       Morphy contains code that searches for nouns ending with ful and performs a transformation
       on the substring preceding it.  It then appends 'ful' back onto the resulting string  and
       returns it.  For example, if passed the noun boxesful, it will return boxful.
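
       A minimal Python sketch of this special case follows.  The function noun_ful_base and the
       base_form callback are hypothetical; the sketch does not check whether the rebuilt string
       is itself in WordNet.

```python
def noun_ful_base(word, base_form):
    """Transform the part before a trailing 'ful', then put 'ful' back."""
    if not word.endswith("ful"):
        return None
    stem = word[:-3]          # e.g. "boxesful" -> "boxes"
    base = base_form(stem)    # e.g. "boxes" -> "box"
    return None if base is None else base + "ful"


# Toy callback standing in for the noun processing described earlier.
toy_base_form = lambda w: {"boxes": "box"}.get(w)

print(noun_ful_base("boxesful", toy_base_form))  # boxful
```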

BUGS

       Since many noun collocations contain prepositions, such as line of products, an algorithm
       similar to that used for verbs should be written for nouns.  In  the  present  scheme,  if
       Morphy  is  passed  lines of products, the search string becomes line of product, which is
       not in WordNet.

       Morphy will allow non-words to be converted to words, if they  follow  one  of  the  rules
       described above.  For example, it will happily convert plantes to plants.

ENVIRONMENT VARIABLES (UNIX)

       WNHOME              Base directory for WordNet.  Default is /usr/local/WordNet-3.0.

       WNSEARCHDIR         Directory  in  which the WordNet database has been installed.  Default
                           is WNHOME/dict.
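
       The way the two variables and their defaults combine can be sketched in Python.  The
       function wordnet_search_dir is hypothetical, and treating a variable that is set but empty
       the same as any other value is an assumption of the sketch.

```python
import os


def wordnet_search_dir(env=None):
    """Resolve the database directory from WNSEARCHDIR, WNHOME and the defaults."""
    env = os.environ if env is None else env
    home = env.get("WNHOME", "/usr/local/WordNet-3.0")
    return env.get("WNSEARCHDIR", home + "/dict")


print(wordnet_search_dir({}))                    # /usr/local/WordNet-3.0/dict
print(wordnet_search_dir({"WNHOME": "/opt/wn"})) # /opt/wn/dict
```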

REGISTRY (WINDOWS)

       HKEY_LOCAL_MACHINE\SOFTWARE\WordNet\3.0\WNHome
                           Base directory for WordNet.  Default is C:\Program Files\WordNet\3.0.

FILES

       pos.exc             morphology exception lists

SEE ALSO

       wn(1WN), wnb(1WN), binsrch(3WN), morph(3WN), wndb(5WN), wninput(7WN).