Provided by: libsearch-estraier-perl_0.09-5_all
NAME
Search::Estraier - pure perl module to use Hyper Estraier search engine
SYNOPSIS
Simple indexer

        use Search::Estraier;

        # create and configure node
        my $node = new Search::Estraier::Node(
            url => 'http://localhost:1978/node/test',
            user => 'admin',
            passwd => 'admin',
            create => 1,
            label => 'Label for node',
            croak_on_error => 1,
        );

        # create document
        my $doc = new Search::Estraier::Document;

        # add attributes
        $doc->add_attr('@uri', "http://estraier.gov/example.txt");
        $doc->add_attr('@title', "Over the Rainbow");

        # add body text to document
        $doc->add_text("Somewhere over the rainbow. Way up high.");
        $doc->add_text("There's a land that I heard of once in a lullaby.");

        die "error: ", $node->status,"\n" unless (eval { $node->put_doc($doc) });

Simple searcher

        use Search::Estraier;

        # create and configure node
        my $node = new Search::Estraier::Node(
            url => 'http://localhost:1978/node/test',
            user => 'admin',
            passwd => 'admin',
            croak_on_error => 1,
        );

        # create condition
        my $cond = new Search::Estraier::Condition;

        # set search phrase
        $cond->set_phrase("rainbow AND lullaby");

        my $nres = $node->search($cond, 0);

        if (defined($nres)) {
            print "Got ", $nres->hits, " results\n";

            # for each document in results
            for my $i ( 0 .. $nres->doc_num - 1 ) {
                # get result document
                my $rdoc = $nres->get_doc($i);
                # display attributes
                print "URI: ", $rdoc->attr('@uri'),"\n";
                print "Title: ", $rdoc->attr('@title'),"\n";
                print $rdoc->snippet,"\n";
            }
        } else {
            die "error: ", $node->status,"\n";
        }
DESCRIPTION
This module is an implementation of the node API of Hyper Estraier. Since it is a pure-Perl module with dependencies only on standard Perl modules, it will run on all platforms on which Perl runs. It requires neither compilation nor Hyper Estraier development files on the target machine.

It is implemented as multiple packages which closely resemble the Ruby implementation. It also includes methods to manage nodes.

There are a few examples in the "scripts" directory of this distribution.
Inheritable common methods
These methods should really move somewhere else.

_s
    Remove runs of whitespace from a string, as well as whitespace at the beginning or end.

        my $text = $self->_s("  this  is a text  ");
        # $text is now 'this is a text'
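The clean-up done by _s can be pictured with a small stand-alone function. This is a sketch, not the module's own code, and the name squeeze_ws is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-alone equivalent of the _s helper described above:
# collapse every run of whitespace to one space, then trim both ends.
sub squeeze_ws {
    my ($text) = @_;
    return unless defined $text;   # undef in, undef out
    $text =~ s/\s+/ /g;            # collapse runs of spaces, tabs, newlines
    $text =~ s/^ //;               # trim the (single) leading space
    $text =~ s/ $//;               # trim the (single) trailing space
    return $text;
}

print squeeze_ws("  this  is a text  "), "\n";   # prints "this is a text"
```

Collapsing first means at most one space is left at each end, so two simple substitutions are enough for trimming.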
Search::Estraier::Document
This class implements a Document, which is a single item in Hyper Estraier. It is a collection of:

attributes
    'key' => 'value' pairs which can later be used for filtering of results.

    You can add common filters to "attrindex" in estmaster's "_conf" file for better performance. See "attrindex" in Hyper Estraier P2P Guide <http://hyperestraier.sourceforge.net/nguide-en.html>.

vectors
    also 'key' => 'value' pairs

display text
    Text which will be used to create the searchable corpus of your index and included in snippet output.

hidden text
    Text which will be searchable, but will not be included in snippets.

new
    Create a new document, empty or from a draft.

        my $doc = new Search::Estraier::Document;
        my $doc2 = new Search::Estraier::Document( $draft );

add_attr
    Add an attribute.

        $doc->add_attr( name => 'value' );

    Delete an attribute using

        $doc->add_attr( name => undef );

add_text
    Add a sentence of text.

        $doc->add_text('this is example text to display');

add_hidden_text
    Add a hidden sentence.

        $doc->add_hidden_text('this is example text just for search');

add_vectors
    Add vectors.

        $doc->add_vectors(
            'vector_name' => 42,
            'another' => 12345,
        );

set_score
    Set the substitute score.

        $doc->set_score(12345);

score
    Get the substitute score.

id
    Get the ID number of the document. If the object has never been registered, "-1" is returned.

        print $doc->id;

attr_names
    Returns an array with the attribute names from the document object.

        my @attrs = $doc->attr_names;

attr
    Returns the value of an attribute.

        my $value = $doc->attr( 'attribute' );

texts
    Returns an array with the text sentences.

        my @texts = $doc->texts;

cat_texts
    Returns the whole text as a single scalar.

        my $text = $doc->cat_texts;

dump_draft
    Dump draft data from the document object.

        print $doc->dump_draft;

delete
    Empty the document object.

        $doc->delete;

    This function is an addition to the original Ruby API; since it was included in the C wrappers, it is provided here as a convenience. Document objects which go out of scope will be destroyed automatically.
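To make the split between attributes, display text and hidden text more concrete, here is a rough sketch of how those parts could be serialized into a draft. It assumes the Hyper Estraier draft layout (attribute lines, a blank line, then text, with hidden sentences prefixed by a TAB); sketch_draft is a hypothetical name, and the real dump_draft may differ in details such as vectors and the substitute score, which are left out here.

```perl
use strict;
use warnings;

# Hypothetical sketch (not the module's own code), ASSUMING the draft
# layout: "name=value" attribute lines, a blank line, then one display
# sentence per line, with hidden sentences prefixed by a TAB.
sub sketch_draft {
    my (%doc) = @_;
    my $draft = '';
    for my $name (sort keys %{ $doc{attrs} || {} }) {
        my $value = $doc{attrs}{$name};
        next unless defined $value;          # undef marks a deleted attribute
        $draft .= "$name=$value\n";
    }
    $draft .= "\n";                          # separates attributes from text
    $draft .= "$_\n"   for @{ $doc{texts}  || [] };
    $draft .= "\t$_\n" for @{ $doc{hidden} || [] };
    return $draft;
}

my $draft = sketch_draft(
    attrs  => { '@uri' => 'http://estraier.gov/example.txt', '@title' => 'Over the Rainbow' },
    texts  => [ 'Somewhere over the rainbow.', 'Way up high.' ],
    hidden => [ 'this is example text just for search' ],
);
print $draft;
```

Sorting the attribute names only makes the output deterministic for the example.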
Search::Estraier::Condition
new

        my $cond = new Search::Estraier::Condition;

set_phrase

        $cond->set_phrase('search phrase');

add_attr

        $cond->add_attr('@URI STRINC /~dpavlin/');

set_order

        $cond->set_order('@mdate NUMD');

set_max

        $cond->set_max(42);

set_options

        $cond->set_options( 'SURE' );

        $cond->set_options( qw/AGITO NOIDF SIMPLE/ );

    Possible options are:

    SURE    check every N-gram
    USUAL   check every second N-gram
    FAST    check every third N-gram
    AGITO   check every fourth N-gram
    NOIDF   don't perform TF-IDF tuning
    SIMPLE  use simplified query phrase

    Skipping N-grams will speed up search, but reduce accuracy. Every call to "set_options" will reset previous options.

    This behavior changed in version 0.04 of this module. It is backwards compatible.

phrase
    Return the search phrase.

        print $cond->phrase;

order
    Return the search result order.

        print $cond->order;

attrs
    Return the attribute conditions added with "add_attr".

        my @cond_attrs = $cond->attrs;

max
    Return the maximum number of results.

        print $cond->max;

    "-1" is returned for an uninitialized value; 0 means unlimited.

options
    Return the options for this condition.

        print $cond->options;

    Options are returned in numerical form.

set_skip
    Set the number of documents to skip from the beginning of the results.

        $cond->set_skip(42);

    Similar to "offset" in an RDBMS.

skip
    Return the skip value for this condition.

        print $cond->skip;

set_distinct

        $cond->set_distinct('@author');

distinct
    Return the distinct attribute.

        print $cond->distinct;

set_mask
    Filter out some links when searching. The argument is an array of link numbers, starting with 0 (the current node).

        $cond->set_mask(qw/0 1 4/);
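Since options are reported in numerical form, a sketch of how a list of option names, or a list of link numbers for set_mask, could be packed into a single integer may help. The bit values are assumptions modeled on the condition constants of Hyper Estraier's C API, the bit-field reading of the mask is also an assumption, and options_to_int and mask_to_int are hypothetical helpers.

```perl
use strict;
use warnings;

# ASSUMED bit values, modeled on the condition constants of Hyper
# Estraier's C API; check them against your version before relying on them.
my %OPTION_BIT = (
    SURE   => 1 << 0,
    USUAL  => 1 << 1,
    FAST   => 1 << 2,
    AGITO  => 1 << 3,
    NOIDF  => 1 << 4,
    SIMPLE => 1 << 10,
);

# Hypothetical helper: OR named options into one integer. Like set_options,
# it starts from zero on every call, so earlier options are not kept.
sub options_to_int {
    my $bits = 0;
    for my $name (@_) {
        die "unknown option '$name'\n" unless exists $OPTION_BIT{$name};
        $bits |= $OPTION_BIT{$name};
    }
    return $bits;
}

# Hypothetical helper: one bit per link number (0 = the current node),
# ASSUMING the mask is transmitted as such a bit field.
sub mask_to_int {
    my $mask = 0;
    $mask |= 1 << $_ for @_;
    return $mask;
}

print options_to_int('SURE'), "\n";                  # prints 1
print options_to_int(qw/AGITO NOIDF SIMPLE/), "\n";  # prints 1048
print mask_to_int(qw/0 1 4/), "\n";                  # prints 19
```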
Search::Estraier::ResultDocument
new

        my $rdoc = new Search::Estraier::ResultDocument(
            uri => 'http://localhost/document/uri/42',
            attrs => {
                foo => 1,
                bar => 2,
            },
            snippet => 'this is a text of snippet',
            keywords => "this\tare\tkeywords"
        );

uri
    Return the URI of the result document.

        print $rdoc->uri;

attr_names
    Returns an array with the attribute names from the result document object.

        my @attrs = $rdoc->attr_names;

attr
    Returns the value of an attribute.

        my $value = $rdoc->attr( 'attribute' );

snippet
    Return the snippet from the result document.

        print $rdoc->snippet;

keywords
    Return the keywords from the result document.

        print $rdoc->keywords;
Search::Estraier::NodeResult
new

        my $res = new Search::Estraier::NodeResult(
            docs => \@array_of_rdocs,
            hints => \%hash_with_hints,
        );

doc_num
    Return the number of documents.

        print $res->doc_num;

    This returns the number of documents actually included in the result (limited by "max"). If you want the total number of hits, see "hits".

get_doc
    Return a single document.

        my $doc = $res->get_doc( 42 );

    Returns undef if the document doesn't exist.

hint
    Return a specific hint from the results.

        print $res->hint( 'VERSION' );

    Possible hints are: "VERSION", "NODE", "HIT", "HINT#n", "DOCNUM", "WORDNUM", "TIME", "LINK#n", "VIEW".

hints
    A more Perlish version of "hint". This one returns a hash.

        my %hints = $res->hints;

hits
    Syntactic sugar for the total number of hits for this query.

        print $res->hits;

    It's the same as

        print $res->hint('HIT');

    but shorter.
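The difference between doc_num and hits matters when paging through results. The sketch below builds a hints hash from made-up "NAME<TAB>VALUE" lines (an assumed format, for illustration only) and uses the HIT hint to work out how many pages of a given size a query spans, and which skip value a page would need. parse_hints and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch: build a hints hash (like $res->hints) from
# "NAME<TAB>VALUE" lines. This line format is an ASSUMPTION for
# illustration; the module parses the server reply itself.
sub parse_hints {
    my %hints;
    for my $line (@_) {
        my ($name, $value) = split /\t/, $line, 2;
        next unless defined $name && length $name;   # skip empty lines
        $hints{$name} = defined $value ? $value : '';
    }
    return %hints;
}

# Made-up sample lines, not real server output.
my %hints = parse_hints("HIT\t123", "HINT#1\trainbow\t123", "DOCNUM\t42");

# hits counts every match; doc_num only what one request returned.
my $per_page = 10;                                   # value given to set_max
my $pages = int(($hints{HIT} + $per_page - 1) / $per_page);
my $skip_for_page_3 = (3 - 1) * $per_page;           # value for set_skip
print "hits: $hints{HIT}, pages: $pages\n";          # prints "hits: 123, pages: 13"
```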
Search::Estraier::Node
new

        my $node = new Search::Estraier::Node;

    or optionally with "url" as a parameter

        my $node = new Search::Estraier::Node( 'http://localhost:1978/node/test' );

    or in a more verbose form

        my $node = new Search::Estraier::Node(
            url => 'http://localhost:1978/node/test',
            user => 'admin',
            passwd => 'admin',
            create => 1,
            label => 'optional node label',
            debug => 1,
            croak_on_error => 1
        );

    with the following arguments:

    url             URL to node
    user            username for node server authentication
    passwd          password for authentication
    create          create the node if it doesn't exist
    label           optional label for the new node if "create" is used
    debug           dumps a lot of debugging output
    croak_on_error  very helpful during development. It will croak on all errors instead of silently returning "-1" (which is the convention of the Hyper Estraier API in other languages).

set_url
    Specify the URL of the node server.

        $node->set_url('http://localhost:1978');

set_proxy
    Specify a proxy server to connect to the node server.

        $node->set_proxy('proxy.example.com', 8080);

set_timeout
    Specify the connection timeout in seconds.

        $node->set_timeout( 15 );

set_auth
    Specify the name and password for authentication to the node server.

        $node->set_auth('clint','eastwood');

status
    Return the status code of the last request.

        print $node->status;

    "-1" means connection failure.

put_doc
    Add a document.

        $node->put_doc( $document_draft ) or die "can't add document";

    Returns true on success or false on failure.

out_doc
    Remove a document.

        $node->out_doc( document_id ) or die "can't remove document";

    Returns true on success or false on failure.

out_doc_by_uri
    Remove a registered document using its URI.

        $node->out_doc_by_uri( 'file:///document/uri/42' ) or die "can't remove document";

    Returns true on success or false on failure.

edit_doc
    Edit the attributes of a document.

        $node->edit_doc( $document_draft ) or die "can't edit document";

    Returns true on success or false on failure.
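For set_auth, a minimal sketch of the resulting credential, assuming the node server uses HTTP Basic authentication. basic_auth_header is a hypothetical helper; the module talks to the server through LWP::UserAgent (see shuttle_url), so you normally never build this header yourself.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical helper, ASSUMING HTTP Basic authentication: the header
# value is "Basic " followed by base64("name:password").
sub basic_auth_header {
    my ($user, $passwd) = @_;
    # second argument '' stops encode_base64 from appending a newline
    return 'Basic ' . encode_base64("$user:$passwd", '');
}

print basic_auth_header('clint', 'eastwood'), "\n";   # prints "Basic Y2xpbnQ6ZWFzdHdvb2Q="
```

Base64 is an encoding, not encryption, so the password is only protected if the connection itself is.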
get_doc
    Retrieve a document.

        my $doc = $node->get_doc( document_id ) or die "can't get document";

    Returns a Search::Estraier::Document object on success or false on failure.

get_doc_by_uri
    Retrieve a document by its URI.

        my $doc = $node->get_doc_by_uri( 'file:///document/uri/42' ) or die "can't get document";

    Returns a Search::Estraier::Document object on success or false on failure.

get_doc_attr
    Retrieve the value of an attribute from an object.

        my $val = $node->get_doc_attr( document_id, 'attribute_name' ) or die "can't get document attribute";

get_doc_attr_by_uri
    Retrieve the value of an attribute from an object identified by its URI.

        my $val = $node->get_doc_attr_by_uri( 'file:///document/uri/42', 'attribute_name' ) or die "can't get document attribute";

etch_doc
    Extract document keywords.

        my $keywords = $node->etch_doc( document_id ) or die "can't etch document";

etch_doc_by_uri
    Extract the keywords of a document identified by its URI.

        my $keywords = $node->etch_doc_by_uri( 'file:///document/uri/42' ) or die "can't etch document";

    Returns true on success or false on failure.

uri_to_id
    Get the ID of the document specified by a URI.

        my $id = $node->uri_to_id( 'file:///document/uri/42' );

    This method won't croak, even if using "croak_on_error".

_fetch_doc
    Private function used to implement "get_doc", "get_doc_by_uri", "etch_doc" and "etch_doc_by_uri".
        # this will decode received draft into Search::Estraier::Document object
        my $doc = $node->_fetch_doc( id => 42 );
        my $doc = $node->_fetch_doc( uri => 'file:///document/uri/42' );

        # to extract keywords, add etch
        my $doc = $node->_fetch_doc( id => 42, etch => 1 );
        my $doc = $node->_fetch_doc( uri => 'file:///document/uri/42', etch => 1 );

        # to get a document attribute, add attr
        my $doc = $node->_fetch_doc( id => 42, attr => '@mdate' );
        my $doc = $node->_fetch_doc( uri => 'file:///document/uri/42', attr => '@mdate' );

        # more general form which allows implementation of
        # uri_to_id
        my $id = $node->_fetch_doc(
            uri => 'file:///document/uri/42',
            path => '/uri_to_id',
            chomp_resbody => 1
        );

name

        my $node_name = $node->name;

label

        my $node_label = $node->label;

doc_num

        my $documents_in_node = $node->doc_num;

word_num

        my $words_in_node = $node->word_num;

size

        my $node_size = $node->size;

search
    Search documents which match a condition.

        my $nres = $node->search( $cond, $depth );

    $cond is a "Search::Estraier::Condition" object, while $depth specifies the depth for meta search.

    The function returns a "Search::Estraier::NodeResult" object.

cond_to_query
    Return a URI-encoded string generated from a Search::Estraier::Condition.

        my $args = $node->cond_to_query( $cond, $depth );

shuttle_url
    This method uses "LWP::UserAgent" to communicate with the Hyper Estraier node master.

        my $rv = $node->shuttle_url( $url, $content_type, $req_body, \$resbody );

    $resheads and $resbody booleans control whether response headers and/or the response body will be saved within the object.

set_snippet_width
    Set the width of snippets in results.

        $node->set_snippet_width( $wwidth, $hwidth, $awidth );

    $wwidth specifies the whole width of a snippet. It's 480 by default. If it's 0, no snippet is sent with results. If it is negative, the whole document text is sent instead of a snippet.

    $hwidth specifies the width of strings taken from the beginning of the text. The default value is 96. A negative or zero value keeps the previous value.
    $awidth specifies the width of strings around each highlighted word. It's 96 by default. If a negative or zero value is provided, the previous value is kept unchanged.

set_user
    Manage users of the node.

        $node->set_user( 'name', $mode );

    $mode can be one of:

    0   delete account
    1   set administrative rights for user
    2   set user account as guest

    Returns true on success, otherwise false.

set_link
    Manage node links.

        $node->set_link('http://localhost:1978/node/another', 'another node label', $credit);

    If $credit is negative, the link is removed.

admins

        my @admins = @{ $node->admins };

    Return an array of users with admin rights on the node.

guests

        my @guests = @{ $node->guests };

    Return an array of users with guest rights on the node.

links

        my @links = @{ $node->links };

    Return an array of links for this node.

cacheusage
    Return the cache usage for a node.

        my $cache = $node->cacheusage;

master
    Set actions on the Hyper Estraier node master ("estmaster" process).

        $node->master( action => 'sync' );

    All available actions are documented in <http://hyperestraier.sourceforge.net/nguide-en.html#protocol>.
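The cond_to_query method described earlier turns a condition into URL query parameters. The following sketch shows the general idea with a plain hash instead of a Search::Estraier::Condition object. The parameter names are assumed to follow the estmaster search protocol, and cond_to_query_sketch and uri_escape_utf8_sketch are hypothetical names; only core modules are used, so it runs without URI::Escape.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Percent-encode a string as UTF-8, leaving only unreserved characters as-is.
sub uri_escape_utf8_sketch {
    my $s = encode('UTF-8', shift);
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Hypothetical cond_to_query-style serializer. It takes a plain hash, and
# the parameter names (phrase, attr1..attrN, order, max, options, skip,
# distinct, depth) are ASSUMED to match the estmaster search protocol.
sub cond_to_query_sketch {
    my ($cond, $depth) = @_;
    my @args;
    push @args, 'phrase=' . uri_escape_utf8_sketch($cond->{phrase})
        if defined $cond->{phrase};
    my @attrs = @{ $cond->{attrs} || [] };
    push @args, 'attr' . ($_ + 1) . '=' . uri_escape_utf8_sketch($attrs[$_])
        for 0 .. $#attrs;                          # attr1, attr2, ...
    push @args, 'order=' . uri_escape_utf8_sketch($cond->{order})
        if defined $cond->{order};
    for my $num (qw(max options skip)) {           # numeric fields, omitted when 0/undef
        push @args, "$num=" . int($cond->{$num}) if $cond->{$num};
    }
    push @args, 'distinct=' . uri_escape_utf8_sketch($cond->{distinct})
        if defined $cond->{distinct};
    push @args, 'depth=' . int($depth) if $depth;  # meta search depth
    return join '&', @args;
}

my $query = cond_to_query_sketch(
    { phrase => 'rainbow AND lullaby', attrs => ['@uri STRINC example'], max => 10 },
    0,
);
print "$query\n";
# prints "phrase=rainbow%20AND%20lullaby&attr1=%40uri%20STRINC%20example&max=10"
```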
PRIVATE METHODS
You could call these directly, but you shouldn't have to. I hope.

_set_info
    Set information for the node.

        $node->_set_info;

_clear_info
    Clear information for the node.

        $node->_clear_info;

    On the next call to "name", "label", "doc_num", "word_num" or "size", node info will be fetched again from Hyper Estraier.
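The fetch-once, clear, fetch-again behaviour of _set_info and _clear_info follows a common lazy-caching pattern. Here is a sketch with a stub fetcher that counts calls instead of contacting estmaster; NodeInfoCache is a hypothetical class, not part of Search::Estraier.

```perl
use strict;
use warnings;

# Hypothetical class illustrating the caching behaviour described above.
# The fetcher is a stub code ref instead of a request to estmaster.
package NodeInfoCache;

sub new {
    my ($class, $fetch) = @_;
    return bless { fetch => $fetch, info => undef }, $class;
}

sub _set_info   { my $self = shift; $self->{info} = $self->{fetch}->(); return }
sub _clear_info { my $self = shift; $self->{info} = undef; return }

# Generate accessors that fetch lazily on first use.
for my $field (qw(name label doc_num word_num size)) {
    no strict 'refs';
    *{$field} = sub {
        my $self = shift;
        $self->_set_info unless defined $self->{info};
        return $self->{info}{$field};
    };
}

package main;

my $calls = 0;
my $node = NodeInfoCache->new(sub { $calls++; return { name => 'test', doc_num => 2 } });
print $node->name, "\n";     # first access triggers a fetch
print $node->doc_num, "\n";  # served from the cache
$node->_clear_info;
print $node->name, "\n";     # fetched again after clearing
print "fetches: $calls\n";   # prints "fetches: 2"
```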
EXPORT
Nothing.
SEE ALSO
<http://hyperestraier.sourceforge.net/>
    Hyper Estraier Ruby interface on which this module is based.

Hyper Estraier now also includes a pure-Perl binding in its distribution. It is a faster way to access databases directly if you are not running the "estmaster" P2P server.
AUTHOR
Dobrica Pavlinusic, <dpavlin@rot13.org>

Robert Klep <robert@klep.name> contributed refactored search code.
COPYRIGHT AND LICENSE
Copyright (C) 2005-2006 by Dobrica Pavlinusic This library is free software; you can redistribute it and/or modify it under the GPL v2 or later.