oracular (3) Biblio::Thesaurus.3pm.gz

Provided by: libbiblio-thesaurus-perl_0.43-3_all

NAME

       Biblio::Thesaurus - Perl extension for managing ISO thesaurus

SYNOPSIS

         use Biblio::Thesaurus;

         $obj = thesaurusNew();
         $obj = thesaurusLoad('iso-file');
         $obj = thesaurusRetrieve('storable-file');
         $obj = thesaurusMultiLoad('iso-file1','iso-file2',...);

         $obj->save('iso-file');
         $obj->storeOn('storable-file');

         $obj->addTerm('term');
         $obj->addRelation('term','relation','term1',...,'termn');
         $obj->deleteTerm('term');

         $obj->isDefined('term');

         $obj->describe( { rel=>'NT', desc=>"Narrow Term", lang=>"UK" } );

         $obj->addInverse('Relation1','Relation2');

         $obj->order('rel1', 'rel2', ....);
         @order = $obj->order();

         $obj->languages('l1', 'l2', ....);
         @langs = $obj->languages();

         $obj->baselang('l');
         $lang = $obj->baselang();

         $obj->topName('term');
         $term = $obj->topName();

         $html = $obj->navigate(+{configuration},%parameters);

         $html = $obj->getHTMLTop();

         $output = $obj->downtr(\%handler);
         $output = $obj->downtr(\%handler,'term', ... );

         $obj->appendThesaurus("iso-file");
         $obj->appendThesaurus($tobj);

         $obj->tc('term', 'relation1', 'relation2');
         $obj->depth_first('term', 2, "NT", "UF")

         $latex = $obj->toTex( ...)
         $xml   = $obj->toXml( ...)

ABSTRACT

       This module provides transparent methods to maintain Thesaurus files.  The module uses a
       subset from ISO 2788 which defines some standard features to be found on thesaurus files.
       The module also supports multilingual thesauri and some extensions to the ISO standard.

DESCRIPTION

       A Thesaurus is a classification structure. We can see it as a graph where the nodes are
       terms and the edges are relations between terms.

       This module provides transparent methods to maintain Thesaurus files.  The module uses a
       subset from ISO 2788 which defines some standard features to be found on thesaurus files.
       ISO 2788 includes a set of relations that can be regarded as standard, but this module
       also accepts user-defined ones, so it can be used with both ISO and non-ISO thesaurus files.

File Structure

       Thesaurus files used with this module are standard ASCII documents. A file can contain
       processing instructions, comments or term definitions. The instructions area is used to
       define new relations and mathematical properties between them.

       We can see the file with this structure:

          ______________
         |              |
         |    HEADER    | --> Can contain, only, processing instructions,
         |______________|     comment or empty lines.
         |              |
         |  Def Term 1  | --> Each term definition should be separated
         |              |     from each other with an empty line.
         |  Def Term 2  |
         |              |
         |     .....    |
         |              |
         |  Def Term n  |
         |______________|

       Comments can appear on any line. However, the comment character (#) must be the first
       character on the line (with no spaces before it).  A comment spans to the end of the line
       (until the first carriage return).

       Processing instruction lines follow the same rule as comments, but start with the percent
       sign (%). We describe these instructions later in this document.

       Term definitions can't contain any empty line, because empty lines are used to separate
       definitions from each other. The first line of a term definition record contains the
       defined term. The following lines define relations with other terms. Each of them starts
       with the abbreviation of the relation (in upper case) followed by spaces, and then a
       comma-separated list of terms.

       There can be more than one line with the same relation. The Thesaurus module will
       concatenate the lists. If you want to continue a list on the next line, you can repeat the
       relation abbreviation or leave some spaces between the start of the line and the terms list.

       Here is an example:

         Animal
         NT cat, dog, cow
            fish, ant
         NT camel
         BT Life being

         cat
         BT Animal
         SN domestic animal to be kicked when
            anything bad occurs.
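
       The record layout described above can be sketched in plain Perl. This is only an
       illustration of the file format, not the parser used by Biblio::Thesaurus: the helpers
       "parse_terms" and "split_list" are hypothetical names, and the sketch treats every
       relation as a comma-separated term list (it does not special-case free-text relations
       such as SN).

```perl
use strict;
use warnings;

# Hypothetical helpers (not part of Biblio::Thesaurus).
# parse_terms() turns term records into { term => { RELATION => [ terms... ] } }.
sub split_list {
    my ($list) = @_;
    # split on commas, trim each item, drop empty items
    return grep { length } map { s/^\s+|\s+$//gr } split /,/, $list;
}

sub parse_terms {
    my ($text) = @_;
    my (%thesaurus, $term, $rel);
    for my $line (split /\n/, $text) {
        next if $line =~ /^[#%]/;                  # comment or processing instruction
        if ($line =~ /^\s*$/) {                    # empty line closes the current record
            ($term, $rel) = (undef, undef);
        }
        elsif (!defined $term) {                   # first line of a record is the term
            ($term = $line) =~ s/^\s+|\s+$//g;
            $thesaurus{$term} ||= {};
        }
        elsif ($line =~ /^([A-Z]+)\s+(.*)$/) {     # relation abbreviation, then the list
            $rel = $1;
            push @{ $thesaurus{$term}{$rel} }, split_list($2);
        }
        elsif (defined $rel && $line =~ /^\s+(\S.*)$/) {   # indented continuation line
            push @{ $thesaurus{$term}{$rel} }, split_list($1);
        }
    }
    return \%thesaurus;
}

my $iso = <<'EOT';
# sample records
Animal
NT cat, dog, cow
   fish, ant
NT camel
BT Life being

cat
BT Animal
EOT

my $t = parse_terms($iso);
print join(", ", @{ $t->{Animal}{NT} }), "\n";   # prints "cat, dog, cow, fish, ant, camel"
```

       Both ways of continuing a list (repeating the relation or indenting the next line) end up
       in the same array, which mirrors the concatenation behaviour described above.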

       A special term ("_top_") can be defined. It should be used when you want a top-level tree
       for thesaurus navigation. So, we normally define the "_top_" term with the most
       interesting terms to be navigated.

       The ISO subset used is:

       TT - Top Term
           The broadest term we can define about the current term.

       NT - Narrower Term
           Terms more specific than current term.

       BT - Broader Term
           More generic terms than current term.

       USE - Synonym
           Another term to use in place of the current one when looking for a synonym.

       UF - Quasi-Synonym
           Terms that are not synonyms of the current term but can sometimes be used with that
           meaning.

       RT - Related Term
           A related term that does not fit in any other category.

       SN - Scope Note
           Text. A note on the context of the current term. Use it for definitions or comments
           about the scope in which you are using that term.

   Processing Instructions
       Processing instructions, as said before, are written on a line starting with the percent
       sign. Current commands are:

       top When presenting a thesaurus, we need a term, to know where to start.  Normally, we
           want the thesaurus to have some kind of top level, where to start navigating. This
           command specifies that term, the term that should be used when no term is specified.

           Example:

             %top Contents

             Contents
             NT Biography ...
             RT ...

       encoding
           This command defines the encoding used in the thesaurus file.

           Example:

            %enc utf8

       inverse
           This command defines the mathematical inverse of the relation. That is, if you define
           "inverse A B" and you know that "foo" is related by "A" with "bar", then "bar" is
           related by "B" with "foo".

           Example:

             %inv BT NT
             %inverse UF USE

       description
           This command defines a description for some relation class. These descriptions are
           used when outputting thesaurus on HTML.

           Example:

             %desc SN Note of Scope
             %description IOF Instance of

           If you are constructing a multilingual thesaurus, you will want to translate the
           relation class description. To do this, you should use the "description" command
           followed by the language in square brackets:

             %desc[PT] SN Nota de Contexto
             %description[PT] IOF Instancia de

       externals
           This defines classes that do not relate terms but, instead, relate a term with some
           text (a scope note, a URL, etc.). This can be used like this:

             %ext SN URL
             %externals SN URL

           Note that you can specify more than one relation type per line.

       languages
           This command permits the construction of a multilingual thesaurus. To specify
           language classifiers (like PT, EN, FR, and so on) you can use one of these lines:

             %lang PT EN FR
             %languages PT EN FR

           To give the languages readable names, use the description command. For example, you
           could append:

             %description PT Portuguese
             %description EN English
             %description FR French

       baselanguage
           This one makes it possible to explicitly name the base language for the thesaurus.
           This command should be used with the "description" one, to describe the language name.
           Here is a simple example:

             %baselang PT
             %languages EN FR

             %description PT Portuguese
             %description EN English
             %description FR French
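
       As a rough illustration of how these header lines can be read, the following plain-Perl
       sketch collects a few of the instructions shown above into a hash. It is not the module's
       own parser: "parse_header" is a hypothetical helper, it only recognises the spellings
       that appear in the examples above, and storing untagged descriptions under the key "_"
       is an assumption made for this sketch.

```perl
use strict;
use warnings;

# parse_header() is a hypothetical helper, not part of Biblio::Thesaurus.
# It collects a few processing instructions into a plain hash.
sub parse_header {
    my ($text) = @_;
    my %meta = (inverse => {}, desc => {}, languages => [], baselang => undef);
    for my $line (split /\n/, $text) {
        # %command, an optional [LANG] tag, then the arguments
        next unless $line =~ /^%(\w+)(?:\[(\w+)\])?\s+(.*?)\s*$/;
        my ($cmd, $lang, $args) = ($1, $2, $3);
        if ($cmd eq 'inv' || $cmd eq 'inverse') {
            my ($r1, $r2) = split ' ', $args;
            @{ $meta{inverse} }{ $r1, $r2 } = ($r2, $r1);   # record both directions
        }
        elsif ($cmd eq 'desc' || $cmd eq 'description') {
            my ($rel, $label) = split ' ', $args, 2;
            $meta{desc}{ $lang // '_' }{$rel} = $label;     # '_' means no language tag
        }
        elsif ($cmd eq 'lang' || $cmd eq 'languages') {
            push @{ $meta{languages} }, split ' ', $args;
        }
        elsif ($cmd eq 'baselang') {
            $meta{baselang} = $args;
        }
    }
    return \%meta;
}

my $meta = parse_header(<<'EOT');
%baselang PT
%languages EN FR
%inv BT NT
%desc[PT] SN Nota de Contexto
%description IOF Instance of
EOT

print "$meta->{inverse}{NT} / $meta->{desc}{PT}{SN} / @{ $meta->{languages} }\n";
# prints "BT / Nota de Contexto / EN FR"
```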

   I18N
       The internationalization functions, "interfaceLanguages" and "interfaceSetLanguage",
       should be used before any other function or constructor. Note that when loading a saved
       thesaurus, descriptions defined in that file will not be translated.  That's important!

         interfaceLanguages()

       This function returns a list of languages that can be used on the current Thesaurus
       version.

         interfaceSetLanguage( <lang-name> )

       This function turns on the language specified. So, it is the first function you should
       call when using this module. By default, it uses Portuguese. Future versions may change
       this, so you should call it anyway.

API

       This module uses Perl's object-oriented programming model, so you must create an object
       with one of the "thesaurusNew", "thesaurusLoad" or "thesaurusRetrieve" commands. The
       remaining commands should be called as methods on that object.

Constructors

   thesaurusNew
       Creates an empty thesaurus object. The newly created object contains the inversion
       properties from the ISO classes and some stub descriptions for the same classes.

   thesaurusLoad
       To use the "thesaurusLoad" function, you must supply a file name.  This file name should
       correspond to an ISO ASCII file as defined in earlier sections. It returns the object
       with the contents of the file. If the file does not define relations and descriptions
       for the ISO classes, they are added.

       Also,

         $obj = thesaurusLoad({ completed => 1}, 'iso-file');

       can be used to say that the thesaurus does not need to be completed after loading.

   thesaurusMultiLoad
       You can join different thesaurus ISO files using this function:

         $obj = thesaurusMultiLoad('iso-file1','iso-file2',...);

   appendThesaurus
       You can also append an ISO thesaurus file (or another thesaurus object) to a loaded
       thesaurus. For that, use one of:

         $obj->appendThesaurus("iso-file");
         $obj->appendThesaurus( $other_thesaurus_object );

   thesaurusLoadM
       This method is used to load a thesaurus in the meta-thesaurus format. This is still under
       development.

   thesaurusRetrieve
       Reading and parsing text files is slow compared with loading a serialized data structure,
       so this module can save thesauri to, and load them from, Storable files. This function
       should receive the name of a file that was saved using the "storeOn" function.

Methods

   save
       This method dumps the object to an ISO ASCII file. Note that the sequence "thesaurusLoad",
       "save" is not the identity function: comments are removed and processing instructions can
       be added. To use it, you should supply a file name.

       Note: if the process fails, this method will return 0. Any other method dies when it fails
       to save to a file.

   meta2str
       This method returns the ISO ASCII description of the metadata.

   storeOn
       This method saves the thesaurus object in Storable format. You should use it when you want
       to load with the "thesaurusRetrieve" function.

   addTerm
       You can add term definitions using the Perl API. This method adds a term to the
       thesaurus. Note that if that term already exists, all its relations will be deleted.

   all_terms
       Returns an array with all terms for the thesaurus base language.  NOTE: this function is
       deprecated. Use allTerms instead.

   allTerms
       Returns an array with all terms for the thesaurus base language.

   topName
       Returns the term at the top of the thesaurus, or defines a new one if called with an
       argument.

   top_name
       Deprecated. See "topName".

   addRelation
       To add relations to a term, use this method. It can be called again and again. Previously
       inserted relations will not be deleted.  This method can be used with a list of terms for
       the relation, like:

         $obj->addRelation('Animal','NT','cat','dog','cow','camel');

       Note: After you add a large number of relations, autocomplete the thesaurus using the
       $obj->complete() method. Completing after each relation addition is time and CPU
       consuming.

   hasRelation
       Checks if a specific relation exists in the Thesaurus:

         if ($obj->hasRelation('Animal','NT','cat')) { ... }

       You can check if a term has a relation "X" with anything:

         if ($obj->hasRelation('Animal','SN')) { ... }

   deleteRelation
         $obj->deleteRelation('Animal','NT','cat','dog','cow','camel');

   deleteTerm
       Use this method to remove the supplied term from the thesaurus. Note that all references
       to it will be deleted as well.

   describe
       You can use this method to describe some relation class. You can use it to change the
       description of an existing class (like the ISO ones) or to define a new class.

   isDefined
       Use this method to check if a term exists in the thesaurus.

   setExternal
       Use this method to define that a relation is "extern".

   isExternal
       Use this method to check if a relation is "extern".

   isLanguage
       Use this method to check if a relation is a Language.

   getdefinition
       Deprecated. Use "getDefinition".

   getDefinition
       Returns the definition for a term. The definition is a feature structure containing the
       term information.

   getDescription
       Given a relation name and a language (or the default will be used), it returns the
       description for that relation.

   relations
       Call this method with a term, and it returns a list of the relations defined for that
       term.

   addInverse
       This method should be used to declare that two relation classes are inverses of each
       other. Note that if there is some previous property about any of the relations, it will be
       deleted. If any of the relations does not exist, it will be added.

   order
       With this method you can define (and access) the order of classes. This order is used
       whenever you call a dump function, or the navigation CGI.

   navigate
       This method works like a small CGI application wrapped in an object method. You must
       supply an associative array of CGI parameters. It returns the HTML of a thesaurus page for
       Web navigation.

       The typical thesaurus navigation CGI is:

         #!/usr/bin/perl
         use strict;
         use warnings;

         use CGI qw/:standard/;
         use Biblio::Thesaurus;

         print header;
         # Copy every CGI parameter into a hash for navigate()
         my %arg;
         for (param()) { $arg{$_} = param($_) }
         my $thesaurus = thesaurusLoad("thesaurus_file");
         print $thesaurus->navigate(%arg);

       This method can receive, as first argument, a reference to an associative array with some
       configuration variables like what relations to be expanded and what language to be used by
       default.

       So, in the last example we could write

         $thesaurus->navigate(+{expand=>['NT', 'USE'],
                                lang  =>'EN'})

       meaning that the structure should show two levels of 'NT' and 'USE' relations, and that it
       should use the English language.

       These options include:

       capitalize
           try to capitalize terms when they are the title of the page.

       expand
           a reference to a list of relations that should be expanded at first level; Defaults to
           the empty list.

       title
           can be "yes" or "no". If it is "no", the current term will not be shown as a title;
           Defaults to "yes".

       scriptname
           the name of the script the links should point to. Defaults to current page name.

       level1hide
           a reference to a list of relations not to show on the first level.  Defaults to the
           empty list. Useful to hide the 'LEN' relation when using Library::Simple.

       level2size
           the number of terms to be shown on each second level relation; Defaults to 0 (all
           terms).

       level2hide
           a reference to a list of relations not to show on the second level. Defaults to the
           empty list.

       topic_name
           the name of the topic CGI parameter (default: "t")

   dumpHTML
       This method returns a single large string containing the whole thesaurus in HTML. It is
       mainly used for debugging.

   getHTMLTop
       This method returns the HTML needed for the top level of the browsing thesaurus. It can be
       useful when putting a top level on the first page of a portal.

   complete
       This function completes the thesaurus based on the invertibility properties. This
       operation is only needed when adding terms and relations by this API. Whenever the system
       loads a thesaurus ISO file, it is completed.
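
       To make the idea of completion concrete, here is a minimal plain-Perl sketch of filling
       in inverse relations. It does not reproduce the module's implementation:
       "complete_relations" is a hypothetical function, and the hash layout it works on is an
       assumption chosen for the example.

```perl
use strict;
use warnings;

# complete_relations() is a hypothetical stand-in, not the module's code.
#   $rels:    { term => { RELATION => { other_term => 1 } } }
#   $inverse: { RELATION => INVERSE_RELATION }
sub complete_relations {
    my ($rels, $inverse) = @_;
    for my $term (keys %$rels) {                       # key list is taken before the loop
        for my $rel (keys %{ $rels->{$term} }) {
            my $inv = $inverse->{$rel} or next;        # skip relations without an inverse
            for my $other (keys %{ $rels->{$term}{$rel} }) {
                $rels->{$other}{$inv}{$term} = 1;      # add the mirrored relation
            }
        }
    }
    return $rels;
}

my %inverse = (NT => 'BT', BT => 'NT', UF => 'USE', USE => 'UF');
my %rels    = (Animal => { NT => { cat => 1, dog => 1 } });

complete_relations(\%rels, \%inverse);
print join(", ", sort keys %{ $rels{cat}{BT} }), "\n";   # prints "Animal"
```

       A single pass is enough here because the inverse table is symmetric: every relation the
       pass adds is the mirror of one that already exists.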

   baselang
       Use this method to set or retrieve the base language of the thesaurus.  If no base
       language has been set, the value "_" is returned.

   downtr
       The "downtr" method is used to produce something from a set of terms.  When no term is
       given, the whole thesaurus is taken.  It should be passed an associative array (handler)
       with anonymous subroutines that process each relation, optionally followed by the terms
       to process. These functions can use the pre-instantiated variables $term, $rel, @terms.
       The handler can have three special functions: "-default" (default handler for relations
       that don't have a defined function in the handler), "-eachTerm" (executed with each term
       output, received as $_), and "-end" (executed over the output of the other functions,
       received as $_).

       If a "-order" array reference is provided, the corresponding order of the relations will
       be used.

       Example:

          $the->downtr( { NT       => sub{ ""},    #Do nothing with NT relations
                          -default => sub{ print "$rel", join(",",@terms) }
                        },
                        "frog" );

          print $thesaurus->downtr(
            {-default  => sub { "\n$rel \t".join("\n\t",@terms)},
             -eachTerm => sub { "\n______________ $term $_"},
             -end      => sub { "Thesaurus :\n $_ \nFIM\n"},
             -order    => ["BT","NT","RT"],
            });

       Both calls return an output value: the concatenation of the values returned by the handler
       functions (but those functions can also work through side effects).

   depth_first
       The "depth_first" method is used to get the list of terms (in fact the tree of terms)
       related with $term by relations @r up to the level $lev

         $hashref = $the->depth_first($term ,$lev, @r)

         $hashref = $the->depth_first("frog", 2, "NT","UF")

       $lev should be an integer greater than 0.

   tc transitive closure
       The "tc" method is used to compute the transitive closure of the relations @r starting
       from a term $term.

         $the->tc($term , @r)

         $the->tc("frog", "NT","UF")
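
       For readers unfamiliar with the term, a transitive closure over a set of relations can
       be sketched in plain Perl as follows. This is an illustration of the concept only:
       "transitive_closure" and the hash layout are hypothetical, and whether the module's own
       result includes the starting term is not stated here, so this sketch leaves it out.

```perl
use strict;
use warnings;

# transitive_closure() is a hypothetical helper over a plain hash,
# not the implementation of Biblio::Thesaurus::tc.
#   $rels: { term => { RELATION => [ terms... ] } }
sub transitive_closure {
    my ($rels, $start, @r) = @_;
    my %seen  = ($start => 1);
    my @queue = ($start);
    my @found;
    while (defined(my $term = shift @queue)) {
        next unless exists $rels->{$term};      # term has no outgoing relations
        for my $rel (@r) {
            for my $next (@{ $rels->{$term}{$rel} || [] }) {
                next if $seen{$next}++;         # visit each term once, even with cycles
                push @found, $next;
                push @queue, $next;
            }
        }
    }
    return @found;
}

my %rels = (
    animal    => { NT => ['amphibian'] },
    amphibian => { NT => ['frog'] },
    frog      => { NT => ['tree frog'], UF => ['toad'], BT => ['amphibian'] },
);

print join(", ", transitive_closure(\%rels, 'animal', 'NT', 'UF')), "\n";
# prints "amphibian, frog, tree frog, toad"
```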

   terms
       The "terms" method is used to get all the terms related by relations @r with $term

         $the->terms($term , @r);

         $the->terms("frog", "NT", "UF");

   toTex
       Writes a thesaurus in LaTeX format.  The first argument is used to pass a tag
       substitution hash.  It uses the downtr function to make the translation; a downtr handler
       can be given to tune some transformation details.

         print $thesaurus->toTex(
                {EN=>["\\\\\\emph{Ingles} -- ",""]},
                {FR => sub{""}})

   toXml
       This method writes a thesaurus in XML format.  The first argument is used to pass a tag
       substitution hash.  It uses the downtr function to make the translation; a downtr handler
       can be given to tune some transformation details.

         print $thesaurus->toXml();

   toJson
       Returns a JSON tree based on the NT relation. Another relation can be supplied as an
       argument. Future versions might include language selection.

         print $thesaurus->toJson();

   toHash
       Returns a hash reference with a tree based on the NT relation. Another relation can be
       supplied as an argument. Future versions might include language selection.

         $hashref = $thesaurus->toHash();

AUTHOR

       Alberto Simoes, <albie@alfarrabio.di.uminho.pt>

       José João Almeida, <jj@di.uminho.pt>

       Sara Correia,  <sara.correia@portugalmail.com>

       This module is included in the Natura project. You can visit it at
       http://natura.di.uminho.pt, and access the SVN tree.

       Copyright 2000-2012 Project Natura.

       This program is free software; you can redistribute it and/or modify it under the same
       terms as Perl itself.

SEE ALSO

       The example thesaurus file ("examples/thesaurus"),

       Manpages:

         Biblio::WebPortal(3)
         Biblio::Catalog(3)
         Biblio::Catalog::Bibtex(3)
         perl(1) manpages.
