Provided by: libsearch-estraier-perl_0.09-5_all

NAME

       Search::Estraier - pure perl module to use Hyper Estraier search engine

SYNOPSIS

   Simple indexer
               use Search::Estraier;

               # create and configure node
               my $node = new Search::Estraier::Node(
                       url => 'http://localhost:1978/node/test',
                       user => 'admin',
                       passwd => 'admin',
                       create => 1,
                       label => 'Label for node',
                       croak_on_error => 1,
               );

               # create document
               my $doc = new Search::Estraier::Document;

               # add attributes
               $doc->add_attr('@uri', "http://estraier.gov/example.txt");
               $doc->add_attr('@title', "Over the Rainbow");

               # add body text to document
               $doc->add_text("Somewhere over the rainbow.  Way up high.");
               $doc->add_text("There's a land that I heard of once in a lullaby.");

               die "error: ", $node->status,"\n" unless (eval { $node->put_doc($doc) });

   Simple searcher
               use Search::Estraier;

               # create and configure node
               my $node = new Search::Estraier::Node(
                       url => 'http://localhost:1978/node/test',
                       user => 'admin',
                       passwd => 'admin',
                       croak_on_error => 1,
               );

               # create condition
               my $cond = new Search::Estraier::Condition;

               # set search phrase
               $cond->set_phrase("rainbow AND lullaby");

               my $nres = $node->search($cond, 0);

               if (defined($nres)) {
                       print "Got ", $nres->hits, " results\n";

                       # for each document in results
                       for my $i ( 0 .. $nres->doc_num - 1 ) {
                               # get result document
                               my $rdoc = $nres->get_doc($i);
                               # display attributes
                               print "URI: ", $rdoc->attr('@uri'),"\n";
                               print "Title: ", $rdoc->attr('@title'),"\n";
                               print $rdoc->snippet,"\n";
                       }
               } else {
                       die "error: ", $node->status,"\n";
               }

DESCRIPTION

       This module is an implementation of the node API of Hyper Estraier. Since it is a pure-Perl
       module that depends only on standard Perl modules, it runs on every platform that Perl runs
       on. It requires neither compilation nor Hyper Estraier development files on the target
       machine.

       It is implemented as multiple packages which closely resemble the Ruby implementation. It
       also includes methods to manage nodes.

       There are a few examples in the "scripts" directory of this distribution.

Inheritable common methods

       These methods should really move somewhere else.

   _s
       Collapse runs of whitespace in a string into a single space and strip whitespace from the
       beginning and end.

        my $text = $self->_s(" this  is a text  ");
        # $text is now 'this is a text'
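       The normalization can be sketched in plain Perl as follows. This is not the module's code:
       squeeze_ws is a hypothetical name, and the private _s method may treat a slightly
       different set of characters as whitespace.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the private _s helper (not the
# module's code): collapse whitespace runs to one space, then trim.
sub squeeze_ws {
    my ($text) = @_;
    return undef unless defined $text;
    $text =~ s/\s+/ /g;   # spaces, tabs and newlines all become one space
    $text =~ s/^ //;      # after the line above, at most one space leads
    $text =~ s/ $//;      # ... and at most one trails
    return $text;
}

print squeeze_ws(" this  is a text  "), "\n";   # prints "this is a text"
```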

Search::Estraier::Document

       This class implements a document, which is a single item in Hyper Estraier.

       It is a collection of:

       attributes
           'key' => 'value' pairs which can later be used to filter results

           You can add common filters to "attrindex" in estmaster's "_conf" file for better
           performance. See "attrindex" in Hyper Estraier P2P Guide
           <http://hyperestraier.sourceforge.net/nguide-en.html>.

       vectors
           also 'key' => 'value' pairs

       display text
           Text that becomes part of the searchable corpus of your index and is included in
           snippet output.

       hidden text
           Text which will be searchable, but will not be included in snippet.
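       A minimal sketch, assuming Search::Estraier is installed, that puts all four kinds of
       content into one document and prints it. No node server is needed because a document is
       a local object; the attribute values and vector keys are made up for illustration.

```perl
use strict;
use warnings;
use Search::Estraier;

# One document carrying all four kinds of content listed above.
my $doc = Search::Estraier::Document->new;

# attributes: available later for filtering and ordering results
$doc->add_attr('@uri'   => 'http://estraier.gov/example.txt');
$doc->add_attr('@title' => 'Over the Rainbow');

# vectors: additional key => value pairs
$doc->add_vectors('rainbow' => 42, 'lullaby' => 17);

# display text: searchable and used for snippets
$doc->add_text('Somewhere over the rainbow.');

# hidden text: searchable, but never shown in snippets
$doc->add_hidden_text('wizard of oz soundtrack');

# texts() returns only the display sentences
print join("\n", $doc->texts), "\n";

# dump_draft() serializes everything, hidden text included
print $doc->dump_draft;
```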

   new
       Create a new document, either empty or from a draft.

         my $doc = new Search::Estraier::Document;
         my $doc2 = new Search::Estraier::Document( $draft );
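       Because "dump_draft" and the draft form of "new" are counterparts, a document can be
       copied through its draft. The sketch below assumes they round-trip attributes and display
       text unchanged for simple values like these; the URI is illustrative.

```perl
use strict;
use warnings;
use Search::Estraier;

my $orig = Search::Estraier::Document->new;
$orig->add_attr('@uri' => 'file:///tmp/example.txt');
$orig->add_text('first sentence');
$orig->add_text('second sentence');

# Serialize to a draft, then build a second, independent object from it.
my $draft = $orig->dump_draft;
my $copy  = Search::Estraier::Document->new($draft);

print $copy->attr('@uri'), "\n";
print join(' | ', $copy->texts), "\n";
```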

   add_attr
       Add an attribute.

         $doc->add_attr( name => 'value' );

       Delete an attribute by setting its value to undef:

         $doc->add_attr( name => undef );
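       A short sketch of removing one attribute while keeping another; the attribute names and
       values are illustrative.

```perl
use strict;
use warnings;
use Search::Estraier;

my $doc = Search::Estraier::Document->new;
$doc->add_attr('@title'  => 'Working title');
$doc->add_attr('@author' => 'someone');

# passing undef as the value removes only that attribute
$doc->add_attr('@title' => undef);

print join(', ', $doc->attr_names), "\n";
```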

   add_text
       Add a sentence of text.

         $doc->add_text('this is example text to display');

   add_hidden_text
       Add a hidden sentence.

         $doc->add_hidden_text('this is example text just for search');

   add_vectors
       Add vectors.

         $doc->add_vectors(
               'vector_name' => 42,
               'another' => 12345,
         );

   set_score
       Set the substitute score

         $doc->set_score(12345);

   score
       Get the substitute score

   id
       Get the ID number of the document. If the object has never been registered, "-1" is returned.

         print $doc->id;

   attr_names
       Returns array with attribute names from document object.

         my @attrs = $doc->attr_names;

   attr
       Returns value of an attribute.

         my $value = $doc->attr( 'attribute' );

   texts
       Returns array with text sentences.

         my @texts = $doc->texts;

   cat_texts
       Return the whole text as a single scalar.

         my $text = $doc->cat_texts;

   dump_draft
       Dump draft data from document object.

         print $doc->dump_draft;

   delete
       Empty the document object.

         $doc->delete;

       This function is an addition to the original Ruby API; since it was included in the C
       wrappers, it is provided here as a convenience. Document objects that go out of scope are
       destroyed automatically.

Search::Estraier::Condition

   new
         my $cond = new Search::Estraier::Condition;

   set_phrase
         $cond->set_phrase('search phrase');

   add_attr
         $cond->add_attr('@URI STRINC /~dpavlin/');

   set_order
         $cond->set_order('@mdate NUMD');

   set_max
         $cond->set_max(42);

   set_options
         $cond->set_options( 'SURE' );

         $cond->set_options( qw/AGITO NOIDF SIMPLE/ );

       Possible options are:

       SURE    check every N-gram

       USUAL   check every second N-gram

       FAST    check every third N-gram

       AGITO   check every fourth N-gram

       NOIDF   don't perform TF-IDF tuning

       SIMPLE  use simplified query phrase

       Skipping N-grams speeds up the search but reduces accuracy. Every call to "set_options"
       resets the previous options.

       The handling of options changed in version 0.04 of this module; the change is backwards
       compatible.
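       The reset behaviour can be checked without a node server, since a condition is a local
       object. The sketch below relies only on the option names listed above; the number printed
       by "options" is whatever value the module assigns to "SURE".

```perl
use strict;
use warnings;
use Search::Estraier;

my $cond = Search::Estraier::Condition->new;
$cond->set_phrase('rainbow AND lullaby');

# several options can be combined in one call ...
$cond->set_options(qw/FAST NOIDF/);

# ... but a later call replaces them instead of adding to them
$cond->set_options('SURE');

# options() reports the combined flags as a number
print $cond->options, "\n";
```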

   phrase
       Return search phrase.

         print $cond->phrase;

   order
       Return search result order.

         print $cond->order;

   attrs
       Return the attribute expressions added with "add_attr".

         my @cond_attrs = $cond->attrs;
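       A sketch of a condition that combines a phrase with two attribute filters and a sort
       order. The operators ("STRBW", "STRINC", "NUMD") follow Hyper Estraier's attribute search
       syntax as described in its user guide; the URI prefix and title word are made up.

```perl
use strict;
use warnings;
use Search::Estraier;

my $cond = Search::Estraier::Condition->new;
$cond->set_phrase('rainbow');

# each add_attr call appends one more attribute expression
$cond->add_attr('@uri STRBW http://estraier.gov/');
$cond->add_attr('@title STRINC Rainbow');

# sort by modification date, newest first
$cond->set_order('@mdate NUMD');

print "$_\n" for $cond->attrs;
print $cond->order, "\n";
```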

   max
       Return maximum number of results.

         print $cond->max;

       "-1" is returned for an uninitialized value; 0 means unlimited.

   options
       Return options for this condition.

         print $cond->options;

       Options are returned in numerical form.

   set_skip
       Set the number of documents to skip from the beginning of the results.

         $cond->set_skip(42);

       This is similar to "OFFSET" in an RDBMS.

   skip
       Return skip for this condition.

         print $cond->skip;
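       Together with "set_max", "set_skip" gives LIMIT/OFFSET-style paging. page_condition below
       is a hypothetical helper, not part of the module, and pages are counted from 0.

```perl
use strict;
use warnings;
use Search::Estraier;

# Hypothetical helper (not part of Search::Estraier): build a condition
# that returns page $page (counted from 0) with $per_page results.
sub page_condition {
    my ($phrase, $page, $per_page) = @_;
    my $cond = Search::Estraier::Condition->new;
    $cond->set_phrase($phrase);
    $cond->set_max($per_page);              # like LIMIT
    $cond->set_skip($page * $per_page);     # like OFFSET
    return $cond;
}

# third page of ten: results 21 to 30
my $cond = page_condition('rainbow', 2, 10);
print 'skip=', $cond->skip, ' max=', $cond->max, "\n";
```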

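   set_distinct
       Set the name of an attribute whose values must be unique among the results; documents
       with a duplicate value are left out.

         $cond->set_distinct('@author');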

   distinct
       Return distinct attribute

         print $cond->distinct;

   set_mask
       Exclude some links from the search.

       The argument is a list of link numbers, starting with 0 (the current node).

         $cond->set_mask(qw/0 1 4/);

Search::Estraier::ResultDocument

   new
         my $rdoc = new Search::Estraier::ResultDocument(
               uri => 'http://localhost/document/uri/42',
               attrs => {
                       foo => 1,
                       bar => 2,
               },
               snippet => 'this is a text of snippet',
               keywords => "these\tare\tkeywords",
         );

   uri
       Return URI of result document

         print $rdoc->uri;

   attr_names
       Returns array with attribute names from result document object.

         my @attrs = $rdoc->attr_names;

   attr
       Returns value of an attribute.

         my $value = $rdoc->attr( 'attribute' );

   snippet
       Return snippet from result document

         print $rdoc->snippet;

   keywords
       Return keywords from result document

         print $rdoc->keywords;

Search::Estraier::NodeResult

   new
         my $res = new Search::Estraier::NodeResult(
               docs => \@array_of_rdocs,
               hints => \%hash_with_hints,
         );

   doc_num
       Return number of documents

         print $res->doc_num;

       This returns the actual number of documents in the result (limited by "max"). If you want
       the total number of hits, see "hits".

   get_doc
       Return single document

         my $doc = $res->get_doc( 42 );

       Returns undef if document doesn't exist.

   hint
       Return specific hint from results.

         print $res->hint( 'VERSION' );

       Possible hints are: "VERSION", "NODE", "HIT", "HINT#n", "DOCNUM", "WORDNUM", "TIME",
       "LINK#n", "VIEW".

   hints
       A more Perlish version of "hint" that returns all hints as a hash.

         my %hints = $res->hints;

   hits
       Syntactic sugar for the total number of hits for this query

         print $res->hits;

       It is the same as

         print $res->hint('HIT');

       but shorter.
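       Result objects are normally built by "search", but constructing them by hand is a
       convenient way to see how the accessors fit together without a running estmaster. This
       sketch assumes that NodeResult takes an array reference under "docs" and a hash reference
       under "hints" (an assumption about the constructor's arguments); the URIs, titles and hit
       count are made up.

```perl
use strict;
use warnings;
use Search::Estraier;

# Build result documents by hand (normally Node::search does this).
my @rdocs;
for my $n (1 .. 3) {
    push @rdocs, Search::Estraier::ResultDocument->new(
        uri      => "file:///doc/$n",
        attrs    => { '@title' => "Document $n" },
        snippet  => "snippet of document $n",
        keywords => "rainbow\t10\tlullaby\t5",
    );
}

# Assumption: docs and hints are passed as references.
my $res = Search::Estraier::NodeResult->new(
    docs  => \@rdocs,
    hints => { HIT => 125 },
);

for my $i (0 .. $res->doc_num - 1) {
    my $rdoc = $res->get_doc($i);
    printf "%s: %s\n", $rdoc->uri, $rdoc->attr('@title');
}

# doc_num counts returned documents, hits reports the total match count
printf "showing %d of %d hits\n", $res->doc_num, $res->hits;
```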

Search::Estraier::Node

   new
         my $node = new Search::Estraier::Node;

       or optionally with "url" as a parameter

         my $node = new Search::Estraier::Node( 'http://localhost:1978/node/test' );

       or in a more verbose form

         my $node = new Search::Estraier::Node(
               url => 'http://localhost:1978/node/test',
               user => 'admin',
               passwd => 'admin',
               create => 1,
               label => 'optional node label',
               debug => 1,
               croak_on_error => 1
         );

       with the following arguments:

       url
           URL of the node

       user
           specify username for node server authentication

       passwd
           password for authentication

       create
           create the node if it doesn't exist

       label
           optional label for new node if "create" is used

       debug
           dumps a lot of debugging output

       croak_on_error
           very helpful during development. It croaks on all errors instead of silently
           returning "-1" (which is the convention of the Hyper Estraier API in other languages).
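       The difference "croak_on_error" makes can be seen by sending the same document to a node
       that cannot be reached. This sketch assumes nothing is listening on 127.0.0.1 port 9, so
       every request fails; the exact status value reported in the quiet case is not asserted.

```perl
use strict;
use warnings;
use Search::Estraier;

my $doc = Search::Estraier::Document->new;
$doc->add_attr('@uri' => 'http://estraier.gov/example.txt');
$doc->add_text('Somewhere over the rainbow.');

# Assumption: nothing listens on this port, so every request fails.
my $url = 'http://127.0.0.1:9/node/test';

# Default behaviour: the failure shows up as a false return value.
my $quiet = Search::Estraier::Node->new( url => $url );
$quiet->set_timeout(5);
my $ok = $quiet->put_doc($doc);
print 'quiet node: ', ($ok ? 'stored' : 'failed, status ' . $quiet->status), "\n";

# croak_on_error => 1: the same failure dies, so trap it with eval.
my $loud = Search::Estraier::Node->new( url => $url, croak_on_error => 1 );
$loud->set_timeout(5);
my $died = !eval { $loud->put_doc($doc); 1 };
print 'loud node: ', ($died ? "croaked: $@" : 'stored'), "\n";
```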

   set_url
       Specify the URL of the node

         $node->set_url('http://localhost:1978/node/test');

   set_proxy
       Specify proxy server to connect to node server

         $node->set_proxy('proxy.example.com', 8080);

   set_timeout
       Specify timeout of connection in seconds

         $node->set_timeout( 15 );

   set_auth
       Specify name and password for authentication to node server.

         $node->set_auth('clint','eastwood');

   status
       Return status code of last request.

         print $node->status;

       "-1" means connection failure.

   put_doc
       Add a document

         $node->put_doc( $doc ) or die "can't add document";

       Return true on success or false on failure.

   out_doc
       Remove a document

         $node->out_doc( $document_id ) or die "can't remove document";

       Return true on success or false on failure.

   out_doc_by_uri
       Remove a registered document using its URI

         $node->out_doc_by_uri( 'file:///document/uri/42' ) or die "can't remove document";

       Return true on success or false on failure.

   edit_doc
       Edit attributes of a document

         $node->edit_doc( $doc ) or die "can't edit document";

       Return true on success or false on failure.

   get_doc
       Retrieve a document

         my $doc = $node->get_doc( $document_id ) or die "can't get document";

       Return a "Search::Estraier::Document" object on success or false on failure.

   get_doc_by_uri
       Retrieve a document specified by URI

         my $doc = $node->get_doc_by_uri( 'file:///document/uri/42' ) or die "can't get document";

       Return a "Search::Estraier::Document" object on success or false on failure.

   get_doc_attr
       Retrieve the value of an attribute of a document

         my $val = $node->get_doc_attr( $document_id, 'attribute_name' ) or
               die "can't get document attribute";

   get_doc_attr_by_uri
       Retrieve the value of an attribute of a document specified by URI

         my $val = $node->get_doc_attr_by_uri( 'file:///document/uri/42', 'attribute_name' ) or
               die "can't get document attribute";

   etch_doc
       Extract document keywords

         my $keywords = $node->etch_doc( $document_id ) or die "can't etch document";

   etch_doc_by_uri
       Extract keywords of a document specified by URI

         my $keywords = $node->etch_doc_by_uri( 'file:///document/uri/42' ) or die "can't etch document";

       Return true on success or false on failure.

   uri_to_id
       Get ID of document specified by URI

         my $id = $node->uri_to_id( 'file:///document/uri/42' );

       This method won't croak, even if using "croak_on_error".

   _fetch_doc
       Private function used to implement "get_doc", "get_doc_by_uri", "etch_doc" and
       "etch_doc_by_uri".

        # this will decode received draft into Search::Estraier::Document object
        my $doc = $node->_fetch_doc( id => 42 );
        my $doc = $node->_fetch_doc( uri => 'file:///document/uri/42' );

        # to extract keywords, add etch
        my $doc = $node->_fetch_doc( id => 42, etch => 1 );
        my $doc = $node->_fetch_doc( uri => 'file:///document/uri/42', etch => 1 );

        # to get a document attribute, add attr
        my $doc = $node->_fetch_doc( id => 42, attr => '@mdate' );
        my $doc = $node->_fetch_doc( uri => 'file:///document/uri/42', attr => '@mdate' );

        # more general form which allows implementation of
        # uri_to_id
        my $id = $node->_fetch_doc(
               uri => 'file:///document/uri/42',
               path => '/uri_to_id',
               chomp_resbody => 1
        );

   name
         my $node_name = $node->name;

   label
         my $node_label = $node->label;

   doc_num
         my $documents_in_node = $node->doc_num;

   word_num
         my $words_in_node = $node->word_num;

   size
         my $node_size = $node->size;

   search
       Search documents which match condition

         my $nres = $node->search( $cond, $depth );

       $cond is a "Search::Estraier::Condition" object, while $depth specifies the depth for meta
       search.

       The function returns a "Search::Estraier::NodeResult" object.

   cond_to_query
       Return a URI-encoded query string generated from a Search::Estraier::Condition object

         my $args = $node->cond_to_query( $cond, $depth );
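       Since "cond_to_query" only builds a string, it can be tried without contacting the
       server; creating a node without "create" is assumed here not to send any request. A
       sketch with an illustrative node URL that is never contacted; the parameter order in the
       returned string is up to the module.

```perl
use strict;
use warnings;
use Search::Estraier;

my $cond = Search::Estraier::Condition->new;
$cond->set_phrase('rainbow AND lullaby');
$cond->set_max(10);

# No "create" argument, so the node object just stores its settings.
my $node = Search::Estraier::Node->new( url => 'http://localhost:1978/node/test' );
$node->set_snippet_width( 200, 50, 50 );

# Depth 0 is passed explicitly as the meta search depth.
my $args = $node->cond_to_query( $cond, 0 );
print "$args\n";
```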

   shuttle_url
       This method uses "LWP::UserAgent" to communicate with the Hyper Estraier node master.

         my $rv = $node->shuttle_url( $url, $content_type, $req_body, \$resbody );

       $resheads and $resbody control whether response headers and/or the response body are
       saved within the object.

   set_snippet_width
       Set width of snippets in results

         $node->set_snippet_width( $wwidth, $hwidth, $awidth );

       $wwidth specifies the whole width of the snippet. It is 480 by default. If it is 0, no
       snippet is sent with the results. If it is negative, the whole document text is sent
       instead of a snippet.

       $hwidth specifies the width of strings taken from the beginning of the text. The default
       value is 96. A negative or zero value keeps the previous value.

       $awidth specifies the width of strings around each highlighted word. It is 96 by default.
       If a negative or zero value is provided, the previous value is kept unchanged.

   set_user
       Manage users of node

         $node->set_user( 'name', $mode );

       $mode can be one of:

       0   delete account

       1   set administrative right for user

       2   set user account as guest

       Return true on success, otherwise false.

   set_link
       Manage node links

         $node->set_link('http://localhost:1978/node/another', 'another node label', $credit);

       If $credit is negative, link is removed.

   admins
        my @admins = @{ $node->admins };

       Return array of users with admin rights on node

   guests
        my @guests = @{ $node->guests };

       Return array of users with guest rights on node

   links
        my @links = @{ $node->links };

       Return array of links for this node

   cacheusage
       Return cache usage for a node

         my $cache = $node->cacheusage;

   master
       Perform actions on the Hyper Estraier node master ("estmaster" process)

         $node->master(
               action => 'sync'
         );

       All available actions are documented in
       <http://hyperestraier.sourceforge.net/nguide-en.html#protocol>

PRIVATE METHODS

       You could call those directly, but you don't have to. I hope.

   _set_info
       Set information for node

         $node->_set_info;

   _clear_info
       Clear information for node

         $node->_clear_info;

       On the next call to "name", "label", "doc_num", "word_num" or "size", node info will be
       fetched again from Hyper Estraier.

EXPORT

       Nothing.

SEE ALSO

       <http://hyperestraier.sourceforge.net/>

       Hyper Estraier Ruby interface on which this module is based.

       Hyper Estraier now also has pure-perl binding included in distribution. It's a faster way
       to access databases directly if you are not running "estmaster" P2P server.

AUTHOR

       Dobrica Pavlinusic, <dpavlin@rot13.org>

       Robert Klep <robert@klep.name> contributed refactored search code

COPYRIGHT AND LICENSE

       Copyright (C) 2005-2006 by Dobrica Pavlinusic

       This library is free software; you can redistribute it and/or modify it under the GPL v2
       or later.