Provided by: libsearch-queryparser-perl_0.95-2_all bug

NAME

       Search::QueryParser - parses a query string into a data structure suitable for external
       search engines

SYNOPSIS

         my $qp = new Search::QueryParser;
         my $s = '+mandatoryWord -excludedWord +field:word "exact phrase"';
         my $query = $qp->parse($s)  or die "Error in query : " . $qp->err;
         $someIndexer->search($query);

         # query with comparison operators and implicit plus (second arg is true)
         $query = $qp->parse("txt~'^foo.*' date>='01.01.2001' date<='02.02.2002'", 1);

         # boolean operators (example below is equivalent to "+a +(b c) -d")
         $query = $qp->parse("a AND (b OR c) AND NOT d");

         # subset of rows
         $query = $qp->parse("Id#123,444,555,666 AND (b OR c)");

DESCRIPTION

       This module parses a query string into a data structure to be handled by external search
       engines.  For examples of such engines, see File::Tabular and Search::Indexer.

       The query string can contain simple terms, "exact phrases", field names and comparison
       operators, '+/-' prefixes, parentheses, and boolean connectors.

       The parser can be parameterized by regular expressions for specific notions of "term",
       "field name" or "operator" ; see the new method. The parser has no support for
       lemmatization or other term transformations : these should be done externally, before
       passing the query data structure to the search engine.

       The data structure resulting from a parsed query is a tree of terms and operators, as
       described below in the parse method.  The interpretation of the structure is up to the
       external search engine that will receive the parsed query ; the present module does not
       make any assumption about what it means to be "equal" or to "contain" a term.

QUERY STRING

       The query string is decomposed into "items", where each item has an optional sign prefix,
       an optional field name and comparison operator, and a mandatory value.

   Sign prefix
       Prefix '+' means that the item is mandatory.  Prefix '-' means that the item must be
       excluded.  No prefix means that the item will be searched for, but is not mandatory.

       As far as the result set is concerned, "+a +b c" is strictly equivalent to "+a +b" : the
       search engine will return documents containing both terms 'a' and 'b', and possibly also
       term 'c'. However, if the search engine also returns relevance scores, query "+a +b c"
       might give a better score to documents containing also term 'c'.

       See also section "Boolean connectors" below, which is another way to combine items into a
       query.

   Field name and comparison operator
       Internally, each query item has a field name and comparison operator; if not written
       explicitly in the query, these take default values '' (empty field name) and ':' (colon
       operator).

       Operators have a left operand (the field name) and a right operand (the value to be
       compared with); for example, "foo:bar" means "search documents containing term 'bar' in
       field 'foo'", whereas "foo=bar" means "search documents where field 'foo' has exact value
       'bar'".

       Here is the list of admitted operators with their intended meaning :

       ":" treat value as a term to be searched within field.  This is the default operator.

       "~" or "=~"
           treat value as a regex; match field against the regex.

       "!~"
           negation of above

       "==" or "=", "<=", ">=", "!=", "<", ">"
           classical relational operators

       "#" Inclusion in the set of comma-separated integers supplied on the right-hand side.

       Operators ":", "~", "=~", "!~" and "#" admit an empty left operand (so the field name will
       be '').  Search engines will usually interpret this as "any field" or "the whole data
       record".

   Value
       A value (right operand to a comparison operator) can be

       •   just a term (as recognized by regex "rxTerm", see new method below)

       •   A quoted phrase, i.e. a collection of terms within single or double quotes.

           Quotes can be used not only for "exact phrases", but also to prevent misinterpretation
           of some values : for example "-2" would mean "value '2' with prefix '-'", in other
           words "exclude term '2'", so if you want to search for value -2, you should write "-2"
           instead. In the last example of the synopsis, quotes were used to prevent splitting of
           dates into several search terms.

       •   a subquery within parentheses.  Field names and operators distribute over parentheses,
           so for example "foo:(bar bie)" is equivalent to "foo:bar foo:bie".  Nested field names
           such as "foo:(bar:bie)" are not allowed.  Sign prefixes do not distribute : "+(foo
           bar) +bie" is not equivalent to "+foo +bar +bie".

   Boolean connectors
       Queries can contain boolean connectors 'AND', 'OR', 'NOT' (or their equivalent in some
       other languages).  This is mere syntactic sugar for the '+' and '-' prefixes : "a AND b"
       is translated into "+a +b"; "a OR b" is translated into "(a b)"; "NOT a" is translated
       into "-a".  "+a OR b" does not make sense, but it is translated into "(a b)", under the
       assumption that the user understands "OR" better than a '+' prefix.  "-a OR b" does not
       make sense either, but has no meaningful approximation, so it is rejected.

       Combinations of AND/OR clauses must be surrounded by parentheses, i.e. "(a AND b) OR c" or
       "a AND (b OR c)" are allowed, but "a AND b OR c" is not.

METHODS

       new
             new(rxTerm   => qr/.../, rxOp => qr/.../, ...)

           Creates a new query parser, initialized with (optional) regular expressions :

           rxTerm
               Regular expression for matching a term.  Of course it should not match the empty
               string.  Default value is "qr/[^\s()]+/".  A term should not be allowed to include
               parenthesis, otherwise the parser might get into trouble.

           rxField
               Regular expression for matching a field name.  Default value is "qr/\w+/" (meaning
               of "\w" according to "use locale").

           rxOp
               Regular expression for matching an operator.  Default value is
               "qr/==|<=|>=|!=|=~|!~|:|=|<|>|~/".  Note that the longest operators come first in
               the regex, because "alternatives are tried from left to right" (see "Version 8
               Regular Expressions" in perlre) : this is to avoid "a<=3" being parsed as "a <
               '=3'".

           rxOpNoField
               Regular expression for a subset of the operators which admit an empty left operand
               (no field name).  Default value is "qr/=~|!~|~|:/".  Such operators can be
               meaningful for comparisons with "any field" or with "the whole record" ; the
               precise interpretation depends on the search engine.

           rxAnd
               Regular expression for boolean connector AND.  Default value is
               "qr/AND|ET|UND|E/".

           rxOr
               Regular expression for boolean connector OR.  Default value is "qr/OR|OU|ODER|O/".

           rxNot
               Regular expression for boolean connector NOT.  Default value is
               "qr/NOT|PAS|NICHT|NON/".

           defField
               If no field is specified in the query, use defField.  The default is the empty
               string "".

       parse
             $q = $queryParser->parse($queryString, $implicitPlus);

           Returns a data structure corresponding to the parsed string.  The second argument is
           optional; if true, it adds an implicit '+' in front of each term without prefix, so
           "parse("+a b c -d", 1)" is equivalent to "parse("+a +b +c -d")".  This is often seen
           in common WWW search engines as an option "match all words".

           The return value has following structure :

             { '+' => [{field=>'f1', op=>':', value=>'v1', quote=>'q1'},
                       {field=>'f2', op=>':', value=>'v2', quote=>'q2'}, ...],
               ''  => [...],
               '-' => [...]
             }

           In other words, it is a hash ref with 3 keys '+', '' and '-', corresponding to the 3
           sign prefixes (mandatory, ordinary or excluded items). Each key holds either a ref to
           an array of items, or "undef" (no items with this prefix in the query).

           An item is a hash ref containing

           "field"
               scalar, field name (may be the empty string)

           "op"
               scalar, operator

           "quote"
               scalar, character that was used for quoting the value ('"', "'" or undef)

           "value"
               Either

               •   a scalar (simple term), or

               •   a recursive ref to another query structure. In that case, "op" is necessarily
                   '()' ; this corresponds to a subquery in parentheses.

           In case of a parsing error, "parse" returns "undef"; method err can be called to get
           an explanatory message.

       err
             $msg = $queryParser->err;

           Message describing the last parse error

       unparse
             $s = $queryParser->unparse($query);

           Returns a string representation of the $query data structure.

AUTHOR

       Laurent Dami, <laurent.dami AT etat ge ch>

COPYRIGHT AND LICENSE

       Copyright (C) 2005, 2007 by Laurent Dami.

       This library is free software; you can redistribute it and/or modify it under the same
       terms as Perl itself.