bionic (5) senseidx.5WN.gz

Provided by: wordnet-sense-index_3.0-35_all

NAME

       index.sense, sense.idx - WordNet's sense index

DESCRIPTION

       The WordNet sense index provides an alternate method for accessing synsets and word senses in the WordNet
       database.  It is useful to applications that retrieve synsets or other information related to a  specific
       sense  in  WordNet,  rather than all the senses of a word or collocation.  It can also be used with tools
       like grep and Perl to find all senses of a word in one or more  parts  of  speech.   A  specific  WordNet
       sense, encoded as a sense_key, can be used as an index into this file to obtain its WordNet sense number,
       the database byte offset of the synset containing the sense, and the number of times it has  been  tagged
       in the semantic concordance texts.
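       The grep-style use mentioned above can be sketched in Python.  This is only an illustration: the helper
       name senses_of is hypothetical, and the sample lines (offsets and counts included) are made up rather
       than copied from a real index.sense file.

```python
def senses_of(lines, lemma, ss_types=None):
    """Return the index lines whose sense_key belongs to `lemma`.

    `ss_types` optionally restricts matches to a set of one-digit
    synset type codes given as strings, e.g. {"1"} for nouns.
    """
    # Lemmas are lower case, and collocations are joined with underscores.
    prefix = lemma.lower().replace(" ", "_") + "%"
    matches = []
    for line in lines:
        if not line.startswith(prefix):
            continue
        # The character right after '%' is the ss_type digit.
        ss_type = line[len(prefix):len(prefix) + 1]
        if ss_types is None or ss_type in ss_types:
            matches.append(line.rstrip("\n"))
    return matches


# Illustrative lines only; the offsets and tag counts are invented.
sample = [
    "bank%1:14:00:: 08420278 2 20\n",
    "bank%1:17:01:: 09213565 1 25\n",
    "bank%2:40:00:: 02343374 5 0\n",
    "banker%1:18:00:: 09837824 1 5\n",
]

noun_senses = senses_of(sample, "bank", {"1"})
all_senses = senses_of(sample, "bank")
```

       Matching on lemma plus "%" keeps "banker" from being returned for "bank", which a bare prefix match on
       the lemma alone would not.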

       Concatenating  the  lemma  and lex_sense fields of a semantically tagged word (represented in a <wf ... >
       attribute/value pair) in a semantic concordance file, using % as the concatenation character, creates the
       sense_key for that sense, which can in turn be used to search the sense index file.
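       As a minimal sketch of that concatenation, assuming the lemma and lex_sense values have already been
       extracted from the tagged word (the function name and the example values are hypothetical):

```python
def make_sense_key(lemma, lex_sense):
    """Join a lemma and a lex_sense value with '%' to form a sense_key."""
    return f"{lemma}%{lex_sense}"


# Example values are illustrative, not taken from a real concordance file.
key = make_sense_key("dog", "1:05:00::")
```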

       A  sense_key  is  the  best  way  to represent a sense in semantic tagging or other systems that refer to
       WordNet senses.  sense_keys are independent of WordNet  sense  numbers  and  synset_offsets,  which  vary
       between  versions  of the database.  Using the sense index and a sense_key, the corresponding synset (via
       the synset_offset) and WordNet sense number can easily be obtained.  A mapping from  noun  sense_keys  in
       WordNet  1.6  to  corresponding  2.0  sense_keys  is  provided  with  version  2.0,  and  is described in
       sensemap(5WN).

       See wndb(5WN) for a thorough discussion of the WordNet database files.

   File Format
       The sense index file lists all of the senses in the WordNet database  with  each  line  representing  one
       sense.  The file is in alphabetical order, fields are separated by one space, and each line is terminated
       with a newline character.

       Each line is of the form:

              sense_key  synset_offset  sense_number  tag_cnt

       sense_key is an encoding of the word sense.  Programs can construct a sense key in this format and use it
       as a binary search key into the sense index file.  The format of a sense_key is described below.
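       A binary search of this kind might look like the following sketch.  It works on an in-memory list of
       lines rather than on the file itself, and it assumes the list is sorted in the same order Python uses
       to compare strings; the sample is sorted explicitly so that the assumption holds.  The function name
       lookup and the sample data are hypothetical.

```python
from bisect import bisect_left


def lookup(sorted_lines, sense_key):
    """Binary-search sorted index lines for the entry with this sense_key.

    Returns the matching line, or None if the key is not present.
    """
    i = bisect_left(sorted_lines, sense_key)
    # A matching line is the key followed by the single-space field separator.
    if i < len(sorted_lines) and sorted_lines[i].startswith(sense_key + " "):
        return sorted_lines[i]
    return None


# Illustrative entries only; the offsets and counts are invented.
index = sorted([
    "bank%2:40:00:: 02343374 5 0",
    "bank%1:17:01:: 09213565 1 25",
    "banker%1:18:00:: 09837824 1 5",
    "bank%1:14:00:: 08420278 2 20",
])

hit = lookup(index, "bank%1:17:01::")
miss = lookup(index, "bank%1:99:00::")
```

       For the real file, the WordNet library's own search routines (see binsrch(3WN)) avoid loading the
       whole index into memory.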

       synset_offset is the byte offset at which the synset containing the sense is found in the database
       "data" file corresponding to the part of speech encoded in the sense_key.  synset_offset is an 8 digit,
       zero-filled decimal integer, and can be used with fseek(3) to read a synset from the data file.  Passing
       it to the WordNet library function read_synset(), along with the syntactic category, returns a data
       structure containing the parsed synset.
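       The seek-and-read step can be sketched without the WordNet library.  The example below uses an
       in-memory stand-in for a data file so that it is self-contained; the helper name read_synset_line and
       the record contents are hypothetical.  With a real data file, the same function would be given a file
       object opened in binary mode.

```python
import io


def read_synset_line(data_file, synset_offset):
    """Seek to synset_offset (an 8 digit decimal string) and read one line."""
    data_file.seek(int(synset_offset, 10))
    raw = data_file.readline()
    return raw.decode("ascii", errors="replace").rstrip("\n")


# Build a fake data file: one header line followed by one made-up record.
header = b"  placeholder header line\n"
offset = len(header)  # byte position at which the record starts
record = f"{offset:08d} 03 n 01 example 0 000 | made-up gloss\n".encode("ascii")
fake_data = io.BytesIO(header + record)

line = read_synset_line(fake_data, f"{offset:08d}")
```

       Opening the file in binary mode matters here, because the offsets count bytes, not decoded characters.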

       sense_number  is  a  decimal  integer  indicating the sense number of the word, within the part of speech
       encoded in sense_key, in the WordNet database.  See wndb(5WN) for information about how sense numbers are
       assigned.

       tag_cnt is a decimal integer giving the number of times the sense is tagged in the semantic concordance
       texts.  A tag_cnt of 0 indicates that the sense has not been semantically tagged.
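       The four fields can be split apart as in the sketch below.  The SenseEntry type and the
       parse_index_line function are hypothetical names, and the sample line is invented for illustration.

```python
from typing import NamedTuple


class SenseEntry(NamedTuple):
    sense_key: str
    synset_offset: int   # byte offset into the part-of-speech data file
    sense_number: int
    tag_cnt: int


def parse_index_line(line):
    """Split one sense index line into its four space-separated fields."""
    sense_key, offset, number, count = line.rstrip("\n").split(" ")
    return SenseEntry(sense_key, int(offset), int(number), int(count))


# Illustrative line only; the numbers are made up.
entry = parse_index_line("bank%1:17:01:: 09213565 1 25\n")
```

       Converting synset_offset to an integer drops the zero fill; format it with "%08d" (or f"{n:08d}") if
       the 8 digit form is needed again.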

   Sense Key Encoding
       A sense_key is represented as:

              lemma%lex_sense

       where lex_sense is encoded as:

              ss_type:lex_filenum:lex_id:head_word:head_id

       lemma is the ASCII text of the word or collocation as found in the WordNet database index file
       corresponding to the part of speech (pos) of the sense.  lemma is in lower case, and collocations are
       formed by joining individual words with an underscore (_) character.

       ss_type is a one digit decimal integer representing the synset type for the sense.  See Synset Type below
       for a listing of the numbers corresponding to each synset type.

       lex_filenum is a two digit decimal integer representing the name of the lexicographer file containing the
       synset for the sense.  See lexnames(5WN) for the list of lexicographer file names and their corresponding
       numbers.

       lex_id  is a two digit decimal integer that, when appended onto lemma, uniquely identifies a sense within
       a lexicographer file.  lex_id numbers usually start with 00, and are incremented as additional senses  of
       the  word are added to the same file, although there is no requirement that the numbers be consecutive or
       begin with 00.  Note that a value of 00 is the default, and therefore is  not  present  in  lexicographer
       files.   Only  non-default  lex_id  values  must  be  explicitly  assigned  in  lexicographer files.  See
       wninput(5WN) for information on the format of lexicographer files.

       head_word is only present if the sense is in an adjective satellite synset.  It is the lemma of the first
       word of the satellite's head synset.

       head_id  is a two digit decimal integer that, when appended onto head_word, uniquely identifies the sense
       of head_word within a lexicographer file, as described for lex_id.  There is a value in this  field  only
       if head_word is present.

   Synset Type
       The synset type is encoded as follows:

              1    NOUN
              2    VERB
              3    ADJECTIVE
              4    ADVERB
              5    ADJECTIVE SATELLITE

NOTES

       For non-satellite senses the head_word and head_id fields have no values; however, the field separator
       character (:) is still present.
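       The encoding, including the empty trailing fields of non-satellite senses, can be decoded as in the
       following sketch.  It assumes that neither lemma nor head_word contains a "%" or ":" character; the
       function name parse_sense_key and both example keys are hypothetical.

```python
SS_TYPES = {
    1: "NOUN",
    2: "VERB",
    3: "ADJECTIVE",
    4: "ADVERB",
    5: "ADJECTIVE SATELLITE",
}


def parse_sense_key(sense_key):
    """Decode lemma%ss_type:lex_filenum:lex_id:head_word:head_id."""
    lemma, _, lex_sense = sense_key.partition("%")
    # All five fields are always delimited, even when the last two are empty.
    ss_type, lex_filenum, lex_id, head_word, head_id = lex_sense.split(":")
    return {
        "lemma": lemma,
        "ss_type": int(ss_type),
        "ss_type_name": SS_TYPES[int(ss_type)],
        "lex_filenum": int(lex_filenum),
        "lex_id": int(lex_id),
        "head_word": head_word or None,
        "head_id": int(head_id) if head_id else None,
    }


# Both keys are invented for illustration.
noun = parse_sense_key("bank%1:17:01::")
satellite = parse_sense_key("fast%5:00:00:quick:00")
```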

ENVIRONMENT VARIABLES (UNIX)

       WNHOME              Base directory for WordNet.  Default is /usr/local/WordNet-3.0.

       WNSEARCHDIR         Directory in which the WordNet database has been installed.  Default is WNHOME/dict.
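       The way these two variables and their defaults combine can be sketched as follows.  The function name
       is hypothetical, and the precedence shown (WNSEARCHDIR first, then WNHOME/dict, then the built-in
       default) is an assumption based on the descriptions above rather than on the library source.

```python
import os


def wordnet_search_dir(env=None):
    """Return the directory expected to hold the WordNet database files."""
    env = os.environ if env is None else env
    search_dir = env.get("WNSEARCHDIR")
    if search_dir:
        return search_dir
    home = env.get("WNHOME") or "/usr/local/WordNet-3.0"
    return home.rstrip("/") + "/dict"


# Passing a plain dict makes the behaviour easy to check without
# touching the real environment.
default_dir = wordnet_search_dir({})
custom_dir = wordnet_search_dir({"WNHOME": "/opt/wordnet"})
sense_index_path = default_dir + "/index.sense"
```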

REGISTRY (WINDOWS)

       HKEY_LOCAL_MACHINE\SOFTWARE\WordNet\3.0\WNHome
                           Base directory for WordNet.  Default is C:\Program Files\WordNet\3.0.

FILES

       index.sense         sense index

SEE ALSO

       binsrch(3WN), wnsearch(3WN), lexnames(5WN), wnintro(5WN), sensemap(5WN), wndb(5WN), wninput(5WN).