Provided by: libsphinx-search-perl_0.29-2_all

NAME

       Sphinx::Search - Sphinx search engine API Perl client

VERSION

       Please note that you *MUST* install a version which is compatible with your version of
       Sphinx.

       Use version 0.29 for Sphinx-2.2.8-release or later (or use DBI instead)

       Use version 0.28 for Sphinx-2.0.8-release or later

       Use version 0.27.2 for Sphinx-2.0.3-release (svn-r3043)

       Use version 0.26.1 for Sphinx-2.0.1-beta (svn-r2792)

       Use version 0.25_03 for Sphinx svn-r2575

       Use version 0.24.1 for Sphinx-1.10-beta (svn-r2420)

       Use version 0.23_02 for Sphinx svn-r2269 (experimental)

       Use version 0.22 for Sphinx 0.9.9-rc2 and later (Please read the Compatibility Note under
       SetEncoders regarding encoding changes)

       Use version 0.15 for Sphinx 0.9.9-svn-r1674

       Use version 0.12 for Sphinx 0.9.8

       Use version 0.11 for Sphinx 0.9.8-rc1

       Use version 0.10 for Sphinx 0.9.8-svn-r1112

       Use version 0.09 for Sphinx 0.9.8-svn-r985

       Use version 0.08 for Sphinx 0.9.8-svn-r871

       Use version 0.06 for Sphinx 0.9.8-svn-r820

       Use version 0.05 for Sphinx 0.9.8-cvs-20070907

       Use version 0.02 for Sphinx 0.9.8-cvs-20070818

SYNOPSIS

           use Sphinx::Search;

           $sph = Sphinx::Search->new();

           # Standard API query
           $results = $sph->SetSortMode(SPH_SORT_RELEVANCE)
                          ->Query("search terms");

           # SphinxQL query
           $results = $sph->SphinxQL("SELECT * FROM myindex WHERE MATCH('search terms')");

DESCRIPTION

       This is the Perl API client for the Sphinx open-source SQL full-text indexing search
       engine, <http://www.sphinxsearch.com>.

       Since 0.9.9, Sphinx supports a native MySQL-protocol client, i.e. DBI with DBD::mysql.
       That is, you can configure the server to have a mysql41 listening port and then simply do

         my $dbh = DBI->connect('dbi:mysql:host=127.0.0.1;port=9306;mysql_enable_utf8=1') or die "Failed to connect via DBI";
         my $sth = $dbh->prepare_cached("SELECT * FROM myindex WHERE MATCH('search terms')");
         $sth->execute();
         while (my $row = $sth->fetchrow_arrayref) {
             ... # Collect results
         }

       The DBI client turns out to be significantly (about 5x) faster than this pure-Perl API.
       You should probably be using that instead.

       This module also supports SphinxQL queries, with the small advantage that you can use
       either the standard API or the SphinxQL API over the one port (i.e. the searchd server
       does not need to be configured with a mysql41 listening port).

       Given that the DBI client has several advantages over this API, future updates of this
       module are unlikely.

CONSTRUCTOR

   new
           $sph = Sphinx::Search->new;
           $sph = Sphinx::Search->new(\%options);

       Create a new Sphinx::Search instance.

       OPTIONS

       log Specify an optional logger instance.  This can be any class that provides error, warn,
           info, and debug methods (e.g. see Log::Log4perl).  Logging is disabled if no logger
           instance is provided.

       debug
           Debug flag.  If set (and a logger instance is specified), debugging messages will be
           generated.
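
       The logger only needs to respond to the four method names listed above.  As a minimal
       sketch (the class name My::MemoryLogger and its messages accessor are hypothetical and
       not part of Sphinx::Search), the following logger keeps messages in memory rather than
       printing them:

```perl
use strict;
use warnings;

# Hypothetical minimal logger: collects messages in memory instead of printing.
package My::MemoryLogger;

sub new {
    my ($class) = @_;
    return bless { messages => [] }, $class;
}

# Generate the four methods that the "log" option is documented to require.
for my $level (qw(error warn info debug)) {
    no strict 'refs';
    *{ __PACKAGE__ . "::$level" } = sub {
        my ($self, @msg) = @_;
        push @{ $self->{messages} }, "[$level] @msg";
        return;
    };
}

# Accessor for everything logged so far.
sub messages { return @{ $_[0]{messages} } }

package main;

my $logger = My::MemoryLogger->new;
$logger->debug("connecting");
```

       Such an object could then be passed to the constructor as
       Sphinx::Search->new({ log => $logger, debug => 1 }).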

METHODS

   GetLastError
           $error = $sph->GetLastError;

       Get last error message (string)

   GetLastWarning
           $warning = $sph->GetLastWarning;

       Get last warning message (string)

   IsConnectError
       Check connection error flag (to differentiate between network connection errors and bad
       responses).  Returns true value on connection error.

   SetEncoders
           $sph->SetEncoders(\&encode_function, \&decode_function)

       COMPATIBILITY NOTE: SetEncoders() was introduced in version 0.17.  Prior to that, all
       strings were considered to be sequences of bytes which may have led to issues with multi-
       byte characters.  If you were previously encoding/decoding strings external to
       Sphinx::Search, you will need to disable encoding/decoding by setting Sphinx::Search to
       use raw values as explained below (or modify your code and let Sphinx::Search do the
       recoding).

       Set the string encoder/decoder functions for transferring strings between perl and Sphinx.
       The encoder should take the perl internal representation and convert to the bytestream
       that searchd expects, and the decoder should take the bytestream returned by searchd and
       convert to perl format.

       The searchd format will depend on the 'charset_type' index setting in the Sphinx
       configuration file.

       The coders default to encode_utf8 and decode_utf8 respectively, which are compatible with
       the 'utf8' charset_type.

       If either the encoder or decoder functions are left undefined in the call to SetEncoders,
       they return to their default values.

       If you wish to send raw values (no encoding/decoding), supply a function that simply
       returns its argument, e.g.

           $sph->SetEncoders( sub { shift }, sub { shift });

       Returns $sph.
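
       As an illustration of a custom encoder/decoder pair, the sketch below uses the core
       Encode module.  It assumes, purely for the sake of example, an index that expects
       single-byte ISO-8859-1 data; the function names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Encoder: Perl character string -> ISO-8859-1 bytes (assumed searchd format).
sub latin1_encoder {
    my ($string) = @_;
    return encode('iso-8859-1', $string);
}

# Decoder: bytes received from searchd -> Perl character string.
sub latin1_decoder {
    my ($bytes) = @_;
    return decode('iso-8859-1', $bytes);
}

# Round trip of a four-character string containing e-acute (U+00E9).
my $bytes = latin1_encoder("caf\x{e9}");
my $chars = latin1_decoder($bytes);
```

       The pair would be installed with $sph->SetEncoders(\&latin1_encoder, \&latin1_decoder).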

   SetServer
           $sph->SetServer($host, $port);
           $sph->SetServer($path, $port);

       In the first form, sets the host (string) and port (integer) details for the searchd
       server using a network (INET) socket (default is localhost:9312).

       In the second form, where $path is a local filesystem path (optionally prefixed by
       'unix://'), sets the client to access the searchd server via a local (UNIX domain) socket
       at the specified path.

       Returns $sph.

   SetConnectTimeout
           $sph->SetConnectTimeout($timeout)

       Set server connection timeout (in seconds).

       Returns $sph.

   SetConnectRetries
           $sph->SetConnectRetries($retries)

       Set the number of server connection retries (used when a connection attempt fails).

       Returns $sph.

   SetLimits
           $sph->SetLimits($offset, $limit);
           $sph->SetLimits($offset, $limit, $max);

       Set match offset/limits, and optionally the max number of matches to return.

       Returns $sph.
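
       A common use of the offset and limit is result paging.  The helper below is a small
       sketch of that arithmetic; page_limits is a hypothetical name, and pages are assumed to
       be numbered from 1.

```perl
use strict;
use warnings;

# Translate a 1-based page number and page size into ($offset, $limit).
sub page_limits {
    my ($page, $per_page) = @_;
    die "page must be >= 1\n"     if $page < 1;
    die "per_page must be >= 1\n" if $per_page < 1;
    return (($page - 1) * $per_page, $per_page);
}

# Third page of 20 results per page.
my ($offset, $limit) = page_limits(3, 20);
```

       The returned pair can be passed straight through, e.g. $sph->SetLimits(page_limits(3, 20)).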

   SetMaxQueryTime
           $sph->SetMaxQueryTime($millisec);

       Set maximum query time, in milliseconds, per index.

       The value may not be negative; 0 means "do not limit".

       Returns $sph.

   SetMatchMode
       ** DEPRECATED **

           $sph->SetMatchMode($mode);

       Set match mode, which may be one of:

       •   SPH_MATCH_ALL

           Match all words

       •   SPH_MATCH_ANY

           Match any words

       •   SPH_MATCH_PHRASE

           Exact phrase match

       •   SPH_MATCH_BOOLEAN

           Boolean match, using AND (&), OR (|), NOT (!,-) and parenthetic grouping.

       •   SPH_MATCH_EXTENDED

           Extended match, which includes the Boolean syntax plus field, phrase and proximity
           operators.

       Returns $sph.

   SetRankingMode
           $sph->SetRankingMode(SPH_RANK_BM25, $rank_exp);

       Set ranking mode, which may be one of:

       •   SPH_RANK_PROXIMITY_BM25

           Default mode, phrase proximity major factor and BM25 minor one

       •   SPH_RANK_BM25

           Statistical mode, BM25 ranking only (faster but worse quality)

       •   SPH_RANK_NONE

           No ranking, all matches get a weight of 1

       •   SPH_RANK_WORDCOUNT

           Simple word-count weighting, rank is a weighted sum of per-field keyword occurrence
           counts

       •   SPH_RANK_MATCHANY

           Returns rank as it was computed in SPH_MATCH_ANY mode earlier, and is internally used
           to emulate SPH_MATCH_ANY queries.

       •   SPH_RANK_FIELDMASK

           Returns a 32-bit mask with N-th bit corresponding to N-th fulltext field, numbering
           from 0. The bit will only be set when the respective field has any keyword occurrences
           satisfying the query.

       •   SPH_RANK_SPH04

           SPH_RANK_SPH04 is generally based on the default SPH_RANK_PROXIMITY_BM25 ranker, but
           additionally boosts the matches when they occur in the very beginning or the very end
           of a text field.

       •   SPH_RANK_EXPR

           Allows the ranking formula to be specified at run time. It exposes a number of
           internal text factors and lets you define how the final weight should be computed from
           those factors.  $rank_exp should be set to the ranking expression string, e.g. to
           emulate SPH_RANK_PROXIMITY_BM25, use "sum(lcs*user_weight)*1000+bm25".

       Returns $sph.

   SetSortMode
           $sph->SetSortMode(SPH_SORT_RELEVANCE);
           $sph->SetSortMode($mode, $sortby);

       Set sort mode, which may be any of:

       SPH_SORT_RELEVANCE - sort by relevance
       SPH_SORT_ATTR_DESC, SPH_SORT_ATTR_ASC
           Sort by attribute descending/ascending.  $sortby specifies the sorting attribute.

       SPH_SORT_TIME_SEGMENTS
           Sort by time segments (last hour/day/week/month) in descending order, and then by
           relevance in descending order.  $sortby specifies the time attribute.

       SPH_SORT_EXTENDED
           Sort by SQL-like syntax.  $sortby is the sorting specification.

       SPH_SORT_EXPR
           Sort by an arithmetic expression.  $sortby is the expression.

       Returns $sph.

   SetWeights
       ** DEPRECATED **

           $sph->SetWeights([ 1, 2, 3, 4]);

       This method is deprecated.  Use SetFieldWeights instead.

       Set per-field (integer) weights.  The ordering of the weights corresponds to the ordering
       of fields as indexed.

       Returns $sph.

   SetFieldWeights
           $sph->SetFieldWeights(\%weights);

       Set per-field (integer) weights by field name.  The weights hash provides field name to
       weight mappings.

       Takes precedence over SetWeights.

       Unknown names will be silently ignored.  Missing fields will be given a weight of 1.

       Returns $sph.

   SetIndexWeights
           $sph->SetIndexWeights(\%weights);

       Set per-index (integer) weights.  The weights hash is a mapping of index name to integer
       weight.

       Returns $sph.

   SetIDRange
           $sph->SetIDRange($min, $max);

       Set the document ID range.  Only those records where the document ID is between $min and
       $max (including $min and $max) will be matched.

       Returns $sph.

   SetFilter
           $sph->SetFilter($attr, \@values);
           $sph->SetFilter($attr, \@values, $exclude);

       Sets the results to be filtered on the given attribute.  Only results which have
       attributes matching the given values will be returned.  (Attribute values must be
       integers).

       This may be called multiple times with different attributes to select on multiple
       attributes.

       If 'exclude' is set, excludes results that match the filter.

       Returns $sph.

   SetFilterString
           $sph->SetFilterString($attr, $value)
           $sph->SetFilterString($attr, $value, $exclude)

       Adds new string value filter.  Only those documents where $attr column value matches the
       string value from $value will be matched (or rejected, if $exclude is true).

   SetFilterRange
           $sph->SetFilterRange($attr, $min, $max);
           $sph->SetFilterRange($attr, $min, $max, $exclude);

       Sets the results to be filtered on a range of values for the given attribute. Only those
       records where $attr column value is between $min and $max (including $min and $max) will
       be returned.

       If 'exclude' is set, excludes results that fall within the given range.

       Returns $sph.

   SetFilterFloatRange
           $sph->SetFilterFloatRange($attr, $min, $max, $exclude);

       Same as SetFilterRange, but allows floating point values.

       Returns $sph.

   SetGeoAnchor
           $sph->SetGeoAnchor($attrlat, $attrlong, $lat, $long);

       Set up the anchor point for geosphere distance calculations in filters and sorting.
       Distances will be computed with respect to this point.

       $attrlat is the name of latitude attribute
       $attrlong is the name of longitude attribute
       $lat is anchor point latitude, in radians
       $long is anchor point longitude, in radians

       Returns $sph.
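
       Since the anchor point must be given in radians, coordinates held in degrees need
       converting first.  A minimal sketch follows; degrees_to_radians is a hypothetical helper
       and the coordinates are illustrative only.

```perl
use strict;
use warnings;

# Pi derived from atan2, avoiding any non-core dependency.
use constant PI => 4 * atan2(1, 1);

# Convert an angle in degrees to radians.
sub degrees_to_radians {
    my ($degrees) = @_;
    return $degrees * PI / 180;
}

# Example anchor point given in degrees (illustrative coordinates only).
my ($lat, $long) = map { degrees_to_radians($_) } (-34.9285, 138.6007);
printf "lat=%.4f rad, long=%.4f rad\n", $lat, $long;
```

       The converted values would then be supplied as, for example,
       $sph->SetGeoAnchor('latitude', 'longitude', $lat, $long), where the attribute names are
       assumptions about your index.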

   SetGroupBy
           $sph->SetGroupBy($attr, $func);
           $sph->SetGroupBy($attr, $func, $groupsort);

       Sets attribute and function of results grouping.

       In grouping mode, all matches are assigned to different groups based on grouping function
       value. Each group keeps track of the total match count, and the best match (in this group)
       according to current sorting function. The final result set contains one best match per
       group, with grouping function value and matches count attached.

       $attr is any valid attribute.  Use ResetGroupBy to disable grouping.

       $func is one of:

       •   SPH_GROUPBY_DAY

           Group by day (assumes timestamp type attribute of form YYYYMMDD)

       •   SPH_GROUPBY_WEEK

           Group by week (assumes timestamp type attribute of form YYYYNNN)

       •   SPH_GROUPBY_MONTH

           Group by month (assumes timestamp type attribute of form YYYYMM)

       •   SPH_GROUPBY_YEAR

           Group by year (assumes timestamp type attribute of form YYYY)

       •   SPH_GROUPBY_ATTR

           Group by attribute value

       •   SPH_GROUPBY_ATTRPAIR

           Group by two attributes, being the given attribute and the attribute that immediately
           follows it in the sequence of indexed attributes.  The specified attribute may
           therefore not be the last of the indexed attributes.

       Groups in the set of results can be sorted by any SQL-like sorting clause, including both
       document attributes and the following special internal Sphinx attributes:

       @id - document ID;
       @weight, @rank, @relevance -  match weight;
       @group - group by function value;
       @count - number of matches in group.

       The default mode is to sort by groupby value in descending order, i.e. by "@group desc".

       In the results set, "total_found" contains the total number of matching groups over the
       whole index.

       WARNING: grouping is done in fixed memory and thus its results are only approximate; so
       there might be more groups reported in total_found than actually present. @count might
       also be underestimated.

       For example, if sorting by relevance and grouping by a "published" attribute with
       SPH_GROUPBY_DAY function, then the result set will contain only the most relevant match
       for each day when there were any matches published, with day number and per-day match
       count attached, and sorted by day number in descending order (i.e. recent days first).

   SetGroupDistinct
           $sph->SetGroupDistinct($attr);

       Set count-distinct attribute for group-by queries

   SetRetries
           $sph->SetRetries($count, $delay);

       Set distributed retries count and delay

   SetOverride
       ** DEPRECATED **

           $sph->SetOverride($attrname, $attrtype, $values);

       Set attribute values override. There can be only one override per attribute.
       $values must be a hash that maps document IDs to attribute values.

   SetSelect
           $sph->SetSelect($select)

       Set select list (attributes or expressions).  SQL-like syntax.

   SetQueryFlag
           $sph->SetQueryFlag($flag_name, $flag_value);

   SetOuterSelect
         $sph->SetOuterSelect($orderby, $offset, $limit)

   ResetFilters
           $sph->ResetFilters;

       Clear all filters.

   ResetGroupBy
           $sph->ResetGroupBy;

       Clear all group-by settings (for multi-queries)

   ResetOverrides
       Clear all attribute value overrides (for multi-queries)

   ResetQueryFlag
       Clear all query flags.

   ResetOuterSelect
       Clear all outer select settings.

   Query
           $results = $sph->Query($query, $index);

       Connect to searchd server and run given search query.

       query is query string
       index is index name to query, default is "*" which means to query all indexes.  Use a
       space or comma separated list to search multiple indexes.

       Returns undef on failure

       Returns hash which has the following keys on success:

       matches
           Array containing hashes with found documents ( "doc", "weight", "group", "stamp" )

       total
           Total number of matches retrieved (up to SPH_MAX_MATCHES, see sphinx.h)

       total_found
           Total number of matching documents in index

       time
           Search time

       words
           Hash which maps query terms (stemmed!) to ( "docs", "hits" ) hash

       Returns the results hash on success, undef on error.
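
       To show how the keys above fit together, here is a sketch that formats a result for
       display.  It assumes the result is a hash reference whose "matches" entry is an array
       reference and whose "words" entry is a hash reference; summarize_results and the sample
       data are made up for illustration and do not come from a real server.

```perl
use strict;
use warnings;

# Build a short human-readable summary from a Query()-style result.
sub summarize_results {
    my ($results) = @_;
    return "query failed" unless defined $results;

    my @lines = ( sprintf("%d of %d matches", $results->{total}, $results->{total_found}) );

    # One line per matched document.
    for my $match (@{ $results->{matches} || [] }) {
        push @lines, sprintf("doc %s (weight %s)", $match->{doc}, $match->{weight});
    }

    # Per-term statistics, keyed by the (stemmed) query term.
    for my $word (sort keys %{ $results->{words} || {} }) {
        my $stats = $results->{words}{$word};
        push @lines, sprintf("word '%s': %d docs, %d hits",
                             $word, $stats->{docs}, $stats->{hits});
    }
    return join("\n", @lines);
}

# Hand-built sample data with the documented keys.
my $sample = {
    matches     => [ { doc => 12, weight => 2 }, { doc => 7, weight => 1 } ],
    total       => 2,
    total_found => 2,
    time        => 0.001,
    words       => { search => { docs => 2, hits => 3 } },
};
print summarize_results($sample), "\n";
```

       In real use the argument would be the return value of $sph->Query(...); on undef, consult
       $sph->GetLastError for the reason.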

   AddQuery
          $sph->AddQuery($query, $index);

       Add a query to a batch request.

       Batch queries enable searchd to perform internal optimizations, if possible; and reduce
       network connection overheads in all cases.

       For instance, running exactly the same query with different groupby settings will enable
       searchd to perform expensive full-text search and ranking operation only once, but
       compute multiple groupby results from its output.

       Parameters are exactly the same as in Query() call.

       Returns corresponding index to the results array returned by RunQueries() call.

   RunQueries
           $sph->RunQueries

       Run batch of queries, as added by AddQuery.

       Returns undef on network IO failure.

       Returns an array of result sets on success.

       Each result set in the returned array is a hash which contains the same keys as the hash
       returned by Query, plus:

       •   error

           Errors, if any, for this query.

       •   warning

           Any warnings associated with the query.
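
       The AddQuery/RunQueries pairing can be wrapped in a small helper that maps caller-chosen
       names to result sets.  This is only a sketch: run_batch and the name/query/index keys
       are hypothetical, and it assumes RunQueries returns an array reference.  It works with
       any object that provides AddQuery and RunQueries.

```perl
use strict;
use warnings;

# Queue several queries, run them in one round trip, and key the result
# sets by a caller-supplied name.  Returns undef on network IO failure.
sub run_batch {
    my ($sph, @queries) = @_;

    # AddQuery returns each query's position in the RunQueries result.
    my @positions = map { $sph->AddQuery($_->{query}, $_->{index}) } @queries;

    my $results = $sph->RunQueries;
    return undef unless defined $results;

    my %by_name;
    for my $i (0 .. $#queries) {
        $by_name{ $queries[$i]{name} } = $results->[ $positions[$i] ];
    }
    return \%by_name;
}
```

       Each entry should still be checked individually, since a batch can succeed as a whole
       while a single result set carries a non-empty "error" or "warning".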

   SphinxQL
         my $results = $sph->SphinxQL($sphinxql_query);

       This is an alternative implementation of the SphinxQL API to the DBI option. Frankly, it
       was an experiment, and the DBI driver proved to have much better performance. Whilst this
       may be useful to some, in general if you are considering using this method then you should
       probably look at connecting directly via DBI instead.

       Results are returned in a hash containing an array of 'columns' and 'rows' and possibly a
       warning count. If a server-side error occurs, the hash contains the 'error' field. If a
       communication error occurs, the return value will be undefined. In either error case,
       GetLastError will return the error.

   BuildExcerpts
           $excerpts = $sph->BuildExcerpts($docs, $index, $words, $opts)

       Generate document excerpts for the specified documents.

       docs
           An array reference of strings which represent the document contents

       index
           A string specifying the index whose settings will be used for stemming, lexing and
           case folding

       words
           A string which contains the words to highlight

       opts
           A hash which contains additional optional highlighting parameters:

           before_match - a string to insert before a set of matching words, default is "<b>"
           after_match - a string to insert after a set of matching words, default is "</b>"
           chunk_separator - a string to insert between excerpts chunks, default is " ... "
           limit - max excerpt size in symbols (codepoints), default is 256
           limit_passages - Limits the maximum number of passages that can be included into the
           snippet. Integer, default is 0 (no limit).
           limit_words - Limits the maximum number of keywords that can be included into the
           snippet. Integer, default is 0 (no limit).
           around - how many words to highlight around each match, default is 5
           exact_phrase - whether to highlight exact phrase matches only, default is false
           single_passage - whether to extract single best passage only, default is false
           use_boundaries
           weight_order - Whether to sort the extracted passages in order of relevance
           (decreasing weight), or in order of appearance in the document (increasing position).
           Boolean, default is false.
           query_mode - Whether to handle $words as a query in extended syntax, or as a bag of
           words (default behavior). For instance, in query mode ("one two" | "three four") will
           only highlight and include those occurrences "one two" or "three four" when the two
           words from each pair are adjacent to each other. In default mode, any single
           occurrence of "one", "two", "three", or "four" would be highlighted. Boolean, default
           is false.
           force_all_words - Ignores the snippet length limit until it includes all the keywords.
           Boolean, default is false.
           start_passage_id - Specifies the starting value of %PASSAGE_ID% macro (that gets
           detected and expanded in before_match, after_match strings). Integer, default is 1.
           load_files - Whether to handle $docs as data to extract snippets from (default
           behavior), or to treat it as file names, and load data from specified files on the
           server side. Boolean, default is false.
           html_strip_mode - HTML stripping mode setting. Defaults to "index", which means that
           index settings will be used. The other values are "none" and "strip", that forcibly
           skip or apply stripping regardless of index settings; and "retain", that retains
           HTML markup and protects it from highlighting. The "retain" mode can only be used when
           highlighting full documents and thus requires that no snippet size limits are set.
           String, allowed values are "none", "strip", "index", and "retain".
           allow_empty - Allows empty string to be returned as highlighting result when a snippet
           could not be generated (no keywords match, or no passages fit the limit). By default,
           the beginning of original text would be returned instead of an empty string. Boolean,
           default is false.
           passage_boundary
           emit_zones
           load_files_scattered

       Returns undef on failure.

       Returns an array ref of string excerpts on success.

   BuildKeywords
           $results = $sph->BuildKeywords($query, $index, $hits)

       Generate keyword list for a given query.

       Returns undef on failure.

       Returns an array of hashes on success, where each hash describes a word in the query with
       the following keys:

       •   tokenized

           Tokenised term from query

       •   normalized

           Normalised term from query

       •   docs

           Number of docs in which word was found (if $hits is true)

       •   hits

           Number of occurrences of word (if $hits is true)

   EscapeString
           $escaped = $sph->EscapeString('abcde!@#$%')

       Inserts backslash before all non-word characters in the given string.
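
       The effect described above can be pictured with a plain substitution.  The sketch below
       is an illustration of the documented behaviour, not the module's own code;
       escape_non_word is a hypothetical name, and in real code you would call
       $sph->EscapeString instead.

```perl
use strict;
use warnings;

# Put a backslash in front of every non-word character (anything that
# does not match \w), leaving letters, digits and underscores untouched.
sub escape_non_word {
    my ($text) = @_;
    (my $escaped = $text) =~ s/(\W)/\\$1/g;
    return $escaped;
}

print escape_non_word('abcde!@#$%'), "\n";   # prints abcde\!\@\#\$\%
```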

   UpdateAttributes
           $sph->UpdateAttributes($index, \@attrs, \%values);
           $sph->UpdateAttributes($index, \@attrs, \%values, $mva);
           $sph->UpdateAttributes($index, \@attrs, \%values, $mva, $ignorenonexistent);

       Update specified attributes on specified documents

       index
           Name of the index to be updated

       attrs
           Array of attribute name strings

       values
           A hash with key as document id, value as an array of new attribute values

       mva If set, indicates that MVA attributes are being updated

       ignorenonexistent
           If set, the update will silently ignore any warnings about trying to update a column
           which does not exist in the current index schema.

       Returns number of actually updated documents (0 or more) on success

       Returns undef on failure

       Usage example:

        $sph->UpdateAttributes("test1", [ qw/group_id/ ], { 1 => [ 456 ] });

   Open
           $sph->Open()

       Opens a persistent connection for subsequent queries.

       To reduce the network connection overhead of making Sphinx queries, you can call
       $sph->Open(), then run any number of queries, and call $sph->Close() when finished.

       Returns 1 on success, 0 on failure.

   Close
           $sph->Close()

       Closes a persistent connection.

       Returns 1 on success, 0 on failure.

   Status
           $status = $sph->Status()
           $status = $sph->Status($session)

       Queries searchd status, and returns a hash of status variable name and value pairs.

       Returns undef on failure.

   FlushAttributes

NOTES

       There is (or was) a bundled Sphinx.pm in the contrib area of the Sphinx source
       distribution, which was used as the starting point of Sphinx::Search.  Maintenance of that
       version appears to have lapsed at sphinx-0.9.7, so many of the newer API calls are not
       available there.  Sphinx::Search is mostly compatible with the old Sphinx.pm except:

       On failure, Sphinx::Search returns undef rather than 0 or -1.
       Sphinx::Search 'Set' functions are cascadable, e.g. you can do Sphinx::Search->new
       ->SetMatchMode(SPH_MATCH_ALL) ->SetSortMode(SPH_SORT_RELEVANCE) ->Query("search terms")

       Sphinx::Search also provides documentation and unit tests, which were the main motivations
       for branching from the earlier work.

AUTHOR

       Jon Schutz

       <http://notes.jschutz.net>

BUGS

       Please report any bugs or feature requests to "bug-sphinx-search at rt.cpan.org", or
       through the web interface at
       <http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Sphinx-Search>.  I will be notified, and
       then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

       You can find documentation for this module with the perldoc command.

           perldoc Sphinx::Search

       You can also look for information at:

       •   AnnoCPAN: Annotated CPAN documentation

           <http://annocpan.org/dist/Sphinx-Search>

       •   CPAN Ratings

           <http://cpanratings.perl.org/d/Sphinx-Search>

       •   RT: CPAN's request tracker

           <http://rt.cpan.org/NoAuth/Bugs.html?Dist=Sphinx-Search>

       •   Search CPAN

           <http://search.cpan.org/dist/Sphinx-Search>

ACKNOWLEDGEMENTS

       This module is based on Sphinx.pm (not deployed to CPAN) for Sphinx version 0.9.7-rc1, by
       Len Kranendonk, which was in turn based on the Sphinx PHP API.

       Thanks to Alexey Kholodkov for contributing a significant patch for handling persistent
       connections.

COPYRIGHT & LICENSE

       Copyright 2015 Jon Schutz, all rights reserved.

       This program is free software; you can redistribute it and/or modify it under the terms of
       the GNU General Public License.
