Ubuntu Manpage: WordNet::QueryData - direct perl interface to WordNet database

Provided by: libwordnet-querydata-perl_1.49-1_all

NAME

       WordNet::QueryData - direct perl interface to WordNet database

SYNOPSIS

         use WordNet::QueryData;

         my $wn = WordNet::QueryData->new( noload => 1);

         print "Synset: ", join(", ", $wn->querySense("cat#n#7", "syns")), "\n";
         print "Hyponyms: ", join(", ", $wn->querySense("cat#n#1", "hypo")), "\n";
         print "Parts of Speech: ", join(", ", $wn->querySense("run")), "\n";
         print "Senses: ", join(", ", $wn->querySense("run#v")), "\n";
         print "Forms: ", join(", ", $wn->validForms("lay down#v")), "\n";
         print "Noun count: ", scalar($wn->listAllWords("noun")), "\n";
         print "Antonyms: ", join(", ", $wn->queryWord("dark#n#1", "ants")), "\n";

DESCRIPTION

       WordNet::QueryData provides a direct interface to the WordNet database files.  It requires the WordNet
       package (http://www.cogsci.princeton.edu/~wn/).  It allows the user direct access to the full WordNet
       semantic lexicon.  All parts of speech are supported and access is generally very efficient because the
       index and morphological exclusion tables are loaded at initialization. The module can optionally be used to
       load the indexes into memory for extra-fast lookups.

USAGE

   LOCATING THE WORDNET DATABASE
       To use QueryData, you must tell it where your WordNet database is.  There are two ways you can do this:
       1) by setting the appropriate environment variables, or 2) by passing the location to QueryData when you
       invoke the "new" function.

       QueryData knows about two environment variables, WNHOME and WNSEARCHDIR.  If WNSEARCHDIR is set,
       QueryData looks for WordNet data files there.  Otherwise, QueryData looks for WordNet data files in
       WNHOME/dict (WNHOME\dict on a PC).  If WNHOME is not set, it defaults to "/usr/local/WordNet-3.0" on Unix
       and "C:\Program Files\WordNet\3.0" on a PC.  Normally, all you have to do is to set the WNHOME variable
       to the location where you unpacked your WordNet distribution.  The database files are normally unpacked
       to the "dict" subdirectory.
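
       The lookup order described above can be sketched as a small function.  This only illustrates the
       documented rules and is not the module's own code; the name wordnet_dict_dir and its two parameters
       are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WordNet::QueryData): returns the directory
# that the documented lookup order would select.
#   $env - hash reference of environment variables (defaults to %ENV)
#   $os  - operating system name as in $^O (defaults to the current one)
sub wordnet_dict_dir {
    my ($env, $os) = @_;
    $env ||= \%ENV;
    $os  ||= $^O;

    # 1. WNSEARCHDIR takes precedence when it is set.
    return $env->{WNSEARCHDIR} if defined $env->{WNSEARCHDIR};

    # 2. Otherwise start from WNHOME, or from its documented default.
    my $is_windows = $os eq 'MSWin32';
    my $home = defined $env->{WNHOME} ? $env->{WNHOME}
             : $is_windows            ? 'C:\\Program Files\\WordNet\\3.0'
             :                          '/usr/local/WordNet-3.0';

    # 3. The data files are expected in the "dict" subdirectory of WNHOME.
    return $home . ($is_windows ? '\\' : '/') . 'dict';
}

print wordnet_dict_dir(), "\n";
```

       Passing the environment as a hash reference keeps the function easy to check without touching the
       real %ENV.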

       You can also pass the location of the database files directly to QueryData.  To do this, pass the
       location to "new":

         my $wn = WordNet::QueryData->new("/usr/local/wordnet/dict");

       You can instead call the constructor with a hash of parameters, as in:

         my $wn = WordNet::QueryData->new(
             dir => "/usr/local/wordnet/dict",
             verbose => 0,
             noload => 1
         );

       When calling "new" in this fashion, two additional arguments are supported: "verbose" will output
       debugging information, and "noload" will cause the object to *not* load the indexes at startup.

   CACHING VERSUS NOLOAD
       The "noload" option results in data being retrieved using a dictionary lookup rather than caching the
       indexes in RAM.  This method yields an immediate startup time but *slightly* (though less than you might
       think) longer lookup time. For the curious, here are some profile data for each method on a dual-core
       Intel Mac, in seconds, averaged over 10000 iterations:

       Caching versus noload times in seconds

                                                 noload => 1  noload => 0
       ------------------------------------------------------------------
       new()                                     0.00001      2.55
       queryWord("descending")                   0.0009       0.0001
       querySense("sunset#n#1", "hype")          0.0007       0.0001
       validForms("lay down#2")                  0.0004       0.0001

       Obviously the new() comparison is not very useful, because nothing is happening with the constructor in
       the case of noload => 1. Similarly, lookups with caching are basically just hash lookups, and therefore
       very fast. The lookup times for noload => 1 illustrate the tradeoff between caching at new() time and
       using dictionary lookups.

       Because new() is so much faster with noload => 1, many users will find it useful to set noload to 1
       during development cycles, when the program is restarted often, and to 0 when RAM is less of a concern
       than lookup speed. The bottom line is that noload => 1 saves you over 2 seconds of startup time, and
       costs you about 0.0005 seconds per lookup.
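
       As a rough worked example of this tradeoff, the timings in the table can be turned into a break-even
       count: the number of lookups after which the cache load at new() time has paid for itself.  The sketch
       assumes the table's figures carry over to your machine, which they may not; break_even_lookups is a
       name invented here.

```perl
use strict;
use warnings;

# Timings copied from the table above, in seconds.
my $startup_cached = 2.55;       # new() with noload => 0
my $startup_noload = 0.00001;    # new() with noload => 1
my %lookup = (                   # [ noload => 1, noload => 0 ]
    queryWord  => [ 0.0009, 0.0001 ],
    querySense => [ 0.0007, 0.0001 ],
    validForms => [ 0.0004, 0.0001 ],
);

# Number of lookups after which caching has repaid its extra startup cost.
sub break_even_lookups {
    my ($noload_time, $cached_time) = @_;
    return ($startup_cached - $startup_noload) / ($noload_time - $cached_time);
}

for my $name (sort keys %lookup) {
    printf "%-10s break-even after about %.0f lookups\n",
        $name, break_even_lookups(@{ $lookup{$name} });
}
```

       With these figures the break-even point is roughly 3,200 lookups for queryWord and roughly 8,500 for
       validForms, so a script that performs only a few hundred lookups per run is likely better off with
       noload => 1.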

   QUERYING THE DATABASE
       There are two primary query functions, 'querySense' and 'queryWord'.  querySense accesses semantic (sense
       to sense) relations; queryWord accesses lexical (word to word) relations.  The majority of relations are
       semantic.  Some relations, including "also see", antonym, pertainym, "participle of verb", and derived
       forms are lexical.  See the following WordNet documentation for additional information:

         http://wordnet.princeton.edu/man/wninput.5WN#sect3

       Both functions take as their first argument a query string of one of three types:

         (1) word (e.g. "dog")
         (2) word#pos (e.g. "house#n")
         (3) word#pos#sense (e.g. "ghostly#a#1")
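
       To make the three types concrete, here is a minimal sketch that splits a query string on "#" and
       reports its type.  The helper parse_query is hypothetical and is not how the module itself parses
       its input; it only mirrors the notation shown above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WordNet::QueryData): splits a query string
# into word, part of speech, and sense number, and records which type it is.
sub parse_query {
    my ($query) = @_;
    my ($word, $pos, $sense) = split /#/, $query, 3;
    my $type = defined $sense ? 3
             : defined $pos   ? 2
             :                  1;
    return { type => $type, word => $word, pos => $pos, sense => $sense };
}

for my $query ("dog", "house#n", "ghostly#a#1") {
    my $parsed = parse_query($query);
    printf "%-12s is a type (%d) query string\n", $query, $parsed->{type};
}
```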

       A type (1) or (2) string passed to querySense or queryWord returns a list of possible query strings at
       the next level of specificity.  A type (3) string passed to either function requires a second argument,
       a relation.  Most relations work with only one of the two functions, though some can be either semantic
       or lexical and hence may work with both.  Below is a list of known relations, grouped according to the
       function they're most likely to work with:

         queryWord
         ---------
         also - also see
         ants - antonyms
         deri - derived forms (nouns and verbs only)
         part - participle of verb (adjectives only)
         pert - pertainym (pertains to noun) (adjectives only)
         vgrp - verb group (verbs only)

         querySense
         ----------
         also - also see
         glos - word definition
         syns - synset words
         hype - hypernyms
         inst - instance of
         hypes - hypernyms and "instance of"
         hypo - hyponyms
         hasi - has instance
         hypos - hyponyms and "has instance"
         mmem - member meronyms
         msub - substance meronyms
         mprt - part meronyms
         mero - all meronyms
         hmem - member holonyms
         hsub - substance holonyms
         hprt - part holonyms
         holo - all holonyms
         attr - attributes (?)
         sim  - similar to (adjectives only)
         enta - entailment (verbs only)
         caus - cause (verbs only)
         domn - domain - all
         dmnc - domain - category
         dmnu - domain - usage
         dmnr - domain - region
         domt - member of domain - all (nouns only)
         dmtc - member of domain - category (nouns only)
         dmtu - member of domain - usage (nouns only)
         dmtr - member of domain - region (nouns only)

       When called in this manner, querySense and queryWord will return a list of related words/senses.  Note
       that as of WordNet 2.1, many hypernyms have become "instance of" and many hyponyms have become "has
       instance."

       Note that querySense and queryWord use type (3) query strings in different ways.  A type (3) string
       passed to querySense specifies a synset.  A type (3) string passed to queryWord specifies a specific
       sense of a specific word.
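
       One way to keep the two functions apart in calling code is to choose between them by relation name.
       The sketch below is an assumption-laden illustration: the helper related and the %LEXICAL table are
       invented here, the grouping simply follows the two lists above, and "also" (which appears in both
       lists) is arbitrarily treated as lexical.  The live query at the end runs only if the module and its
       data files are available.

```perl
use strict;
use warnings;

# Relations listed above under queryWord; every other relation goes to querySense.
my %LEXICAL = map { $_ => 1 } qw(also ants deri part pert vgrp);

# Hypothetical helper: $wn is any object that provides querySense and queryWord.
sub related {
    my ($wn, $query, $relation) = @_;
    my $method = $LEXICAL{$relation} ? 'queryWord' : 'querySense';
    return $wn->$method($query, $relation);
}

# Live query, attempted only when WordNet::QueryData can be loaded and used.
my $ok = eval {
    require WordNet::QueryData;
    my $wn = WordNet::QueryData->new(noload => 1);
    print "Hypernyms: ", join(", ", related($wn, "cat#n#1", "hype")), "\n";
    print "Antonyms: ",  join(", ", related($wn, "dark#n#1", "ants")), "\n";
    1;
};
print "Live query skipped (WordNet::QueryData or its data files unavailable)\n"
    unless $ok;
```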

   OTHER FUNCTIONS
       "validForms" accepts a type (1) or (2) query string.  It returns a list of all alternate forms (alternate
       spellings, conjugations, plural/singular forms, etc.).  The type (1) query returns alternates for all
       parts of speech (noun, verb, adjective, adverb).  WARNING: Only the first element of the list returned
       by validForms is certain to be valid (i.e. recognized by WordNet).  The remaining elements may not be
       valid.

       "listAllWords" accepts a part of speech and returns the full list of words in the WordNet database for
       that part of speech.

       "level" accepts a type (3) query string and returns a distance (not necessarily the shortest or longest)
       to the root in the hypernym directed acyclic graph.

       "offset" accepts a type (3) query string and returns the binary offset of that sense's location in the
       corresponding data file.

       "tagSenseCnt" accepts a type (2) query string and returns the tagsense_cnt value for that lemma: "number
       of senses of lemma that are ranked according to their frequency of occurrence in semantic concordance
       texts."

       "lexname" accepts a type (3) query string and returns the lexname of the sense; see the WordNet
       lexnames man page for more information.

       "frequency" accepts a type (3) query string and returns the frequency count of the sense from tagged
       text; see the WordNet cntlist man page for more information.

       See test.pl for additional example usage.
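
       As a further illustration, the per-sense functions described above can be collected into a single
       hash.  This is a minimal sketch: sense_summary is a hypothetical helper, and the live query at the
       end runs only if the module and its data files are available.

```perl
use strict;
use warnings;

# Hypothetical helper: gathers the per-sense values into one hash reference.
# $wn is any object providing level, offset, lexname and frequency;
# $sense is a type (3) query string such as "cat#n#1".
sub sense_summary {
    my ($wn, $sense) = @_;
    return {
        sense     => $sense,
        level     => scalar $wn->level($sense),
        offset    => scalar $wn->offset($sense),
        lexname   => scalar $wn->lexname($sense),
        frequency => scalar $wn->frequency($sense),
    };
}

# Live query, attempted only when WordNet::QueryData can be loaded and used.
my $ok = eval {
    require WordNet::QueryData;
    my $wn   = WordNet::QueryData->new(noload => 1);
    my $info = sense_summary($wn, "cat#n#1");
    for my $key (sort keys %$info) {
        my $value = defined $info->{$key} ? $info->{$key} : 'n/a';
        print "$key: $value\n";
    }
    1;
};
print "Live query skipped (WordNet::QueryData or its data files unavailable)\n"
    unless $ok;
```

       Forcing scalar context on each call keeps the hash well formed even if a method returns an empty
       list.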

NOTES

       Requires access to WordNet database files (data.noun/noun.dat, index.noun/noun.idx, etc.)

COPYRIGHT

       Copyright 2000-2005 Jason Rennie.  All rights reserved.

       This module is free software; you can redistribute it and/or modify it under the same terms as Perl
       itself.
