Ubuntu Manpage: Lucy::Index::Similarity - Judge how well a document matches a query.

NAME

       Lucy::Index::Similarity - Judge how well a document matches a query.

SYNOPSIS

           package MySimilarity;
           use base qw( Lucy::Index::Similarity );    # inherit new() and the default methods

           sub length_norm { return 1.0 }    # disable length normalization

           package MyFullTextType;
           use base qw( Lucy::Plan::FullTextType );

           sub make_similarity { MySimilarity->new }

DESCRIPTION

       After determining whether a document matches a given query, a score must be calculated
       which indicates how well the document matches the query.  The Similarity class is used to
       judge how "similar" the query and the document are to each other; the closer the
       resemblance, the higher the document scores.

       The default implementation uses Lucene's modified cosine similarity measure.  Subclasses
       might tweak the existing algorithms, or might be used in conjunction with custom Query
       subclasses to implement arbitrary scoring schemes.
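       The following minimal Perl sketch shows the textbook cosine similarity between a query and
       a document, each represented as a hash of term => weight. It illustrates the underlying idea
       only and is not Lucy's modified measure: the function name cosine_similarity and the sample
       weights are hypothetical, and Lucy's actual scoring includes further factors such as the
       length normalization described under length_norm.

```perl
use strict;
use warnings;

# Textbook cosine similarity between two sparse term-weight vectors,
# each a hash ref of term => weight. Hypothetical helper; this is NOT
# Lucy's "modified" measure, only the idea it builds on.
sub cosine_similarity {
    my ( $query, $doc ) = @_;
    my ( $dot, $q_sq, $d_sq ) = ( 0, 0, 0 );
    for my $term ( keys %$query ) {
        $q_sq += $query->{$term}**2;
        # Only terms present in both vectors contribute to the dot product.
        $dot += $query->{$term} * $doc->{$term} if exists $doc->{$term};
    }
    $d_sq += $_**2 for values %$doc;
    return 0 if $q_sq == 0 || $d_sq == 0;    # avoid division by zero
    return $dot / ( sqrt($q_sq) * sqrt($d_sq) );
}

# Hypothetical sample weights.
my %query = ( kafka => 1, trial => 1 );
my %doc   = ( kafka => 3, trial => 4 );
printf "%.3f\n", cosine_similarity( \%query, \%doc );
```

       With these sample weights the dot product is 7 and the vector lengths are sqrt(2) and 5, so
       the script prints 0.990; a document sharing no terms with the query would score 0.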

       Most of the methods operate on single fields, but some are used to combine scores from
       multiple fields.

CONSTRUCTORS

   new()
           my $sim = Lucy::Index::Similarity->new;

       Constructor. Takes no arguments.

METHODS

   length_norm(num_tokens)
       Dampen the scores of long documents.

       After a field is broken up into terms at index-time, each term must be assigned a weight.
       One of the factors in calculating this weight is the number of tokens that the original
       field was broken into.

       Typically, we assume that the more tokens a field has, the less important any one of them
       is. For example, 5 mentions of "Kafka" in a short article are given more heft than 5
       mentions of "Kafka" in an entire book.  The default implementation of length_norm
       expresses this using the inverted square root of the token count, 1 / sqrt(num_tokens).
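       A minimal sketch of the inverted square root described above, written as a standalone Perl
       function rather than as a Lucy method. The name inverted_sqrt_norm, the guard returning 0
       for a field with no tokens, and the two field lengths are assumptions for illustration.

```perl
use strict;
use warnings;

# Inverted square root of the token count, as described for the default
# length_norm. Hypothetical standalone helper; the zero-token guard is an
# assumption of this sketch, added to avoid division by zero.
sub inverted_sqrt_norm {
    my ($num_tokens) = @_;
    return 0 if $num_tokens == 0;
    return 1 / sqrt($num_tokens);
}

# Hypothetical field lengths: a short article vs. an entire book.
for my $num_tokens ( 100, 100_000 ) {
    printf "%6d tokens -> norm %.5f\n", $num_tokens, inverted_sqrt_norm($num_tokens);
}
```

       Because the norm falls off as 1 / sqrt(num_tokens), the 100,000-token book receives a norm
       about sqrt(1000), or roughly 31.6, times smaller than the 100-token article, so the same 5
       mentions of "Kafka" carry far less weight in the book.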

       However, the inverted square root tends to give very short fields disproportionately high
       weights, which isn't always appropriate for fields you expect to contain many tokens on average.
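       One hypothetical way to temper this is to put a floor under the token count before taking
       the inverted square root, so that fields shorter than some threshold are all treated alike.
       The function floored_length_norm and the MIN_TOKENS threshold of 50 are illustrative
       assumptions, not part of Lucy. In a real subclass the same logic would go in an overridden
       length_norm method, as in the SYNOPSIS, assuming the override receives the invocant
       followed by the token count, as Perl methods normally do.

```perl
use strict;
use warnings;
use List::Util qw( max );

# Assumed threshold; tune for your own data.
use constant MIN_TOKENS => 50;

# Hypothetical alternative norm: every field is treated as having at
# least MIN_TOKENS tokens, so very short fields no longer receive an
# outsized boost, while longer fields are still dampened by 1 / sqrt(n).
sub floored_length_norm {
    my ($num_tokens) = @_;
    my $effective = max( $num_tokens, MIN_TOKENS );
    return 1 / sqrt($effective);
}

# Inside a Similarity subclass this might look like:
#   sub length_norm {
#       my ( $self, $num_tokens ) = @_;
#       return floored_length_norm($num_tokens);
#   }
```

       The floor keeps the inverted square root's behavior for long fields, where dampening is
       wanted, and only flattens the curve at the short end, where it rewards fields most steeply.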

INHERITANCE

       Lucy::Index::Similarity isa Lucy::Object::Obj.