Provided by: liblucy-perl_0.3.3-6build1_amd64 

NAME
Lucy::Index::Similarity - Judge how well a document matches a query.
SYNOPSIS
package MySimilarity;
use base qw( Lucy::Index::Similarity );
sub length_norm { return 1.0 } # disable length normalization
package MyFullTextType;
use base qw( Lucy::Plan::FullTextType );
sub make_similarity { MySimilarity->new }
DESCRIPTION
After determining whether a document matches a given query, a score must be calculated which indicates
how well the document matches the query. The Similarity class is used to judge how "similar" the query
and the document are to each other; the closer the resemblance, the higher the document scores.
The default implementation uses Lucene's modified cosine similarity measure. Subclasses might tweak the
existing algorithms, or might be used in conjunction with custom Query subclasses to implement arbitrary
scoring schemes.
Most of the methods operate on single fields, but some are used to combine scores from multiple fields.
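For readers unfamiliar with the measure named above, here is a minimal sketch of plain textbook cosine similarity between two term-weight vectors. It is not Lucy's scoring code and does not reproduce Lucene's modifications; the function name cosine_similarity and the sample weights are made up for illustration.

```perl
use strict;
use warnings;

# Textbook cosine similarity between two sparse term-weight vectors,
# each given as a hash ref of term => weight. Illustration only:
# this is NOT the modified measure that Lucy actually uses.
sub cosine_similarity {
    my ( $query, $doc ) = @_;

    # Dot product over the terms the two vectors share.
    my $dot = 0;
    for my $term ( keys %$query ) {
        $dot += $query->{$term} * ( $doc->{$term} // 0 );
    }

    # Squared Euclidean length of each vector.
    my $query_len = 0;
    $query_len += $_**2 for values %$query;
    my $doc_len = 0;
    $doc_len += $_**2 for values %$doc;

    return 0 if $query_len == 0 || $doc_len == 0;
    return $dot / sqrt( $query_len * $doc_len );
}

# Hypothetical term weights for one query and two documents.
my %query = ( kafka  => 1, trial => 1 );
my %close = ( kafka  => 3, trial => 2, castle => 1 );
my %far   = ( castle => 4, trial => 1 );

printf "close: %.3f\n", cosine_similarity( \%query, \%close );
printf "far:   %.3f\n", cosine_similarity( \%query, \%far );
```

The document that shares more heavily weighted terms with the query receives the higher value, which is the sense in which a closer resemblance yields a higher score.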
CONSTRUCTORS
new()
my $sim = Lucy::Index::Similarity->new;
Constructor. Takes no arguments.
METHODS
length_norm(num_tokens)
Dampen the scores of long documents.
After a field is broken up into terms at index-time, each term must be assigned a weight. One of the
factors in calculating this weight is the number of tokens that the original field was broken into.
Typically, we assume that the more tokens in a field, the less important any one of them is -- so that,
e.g. 5 mentions of "Kafka" in a short article are given more heft than 5 mentions of "Kafka" in an entire
book. The default implementation of length_norm expresses this using an inverted square root.
However, the inverted square root has a tendency to reward very short fields highly, which isn't always
appropriate for fields you expect to have a lot of tokens on average.
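The effect can be seen with a little arithmetic. The sketch below assumes that "inverted square root" means 1 / sqrt(num_tokens), and compares it with one hypothetical alternative that stops rewarding fields shorter than a chosen floor. The names inverse_sqrt_norm and floored_norm and the floor of 100 tokens are illustrative assumptions, not part of the Lucy API.

```perl
use strict;
use warnings;

# Assumption: "inverted square root" means 1 / sqrt(num_tokens),
# with empty fields mapped to 0 to avoid dividing by zero.
sub inverse_sqrt_norm {
    my ($num_tokens) = @_;
    return 0 if $num_tokens <= 0;
    return 1 / sqrt($num_tokens);
}

# Hypothetical alternative: treat every field shorter than $floor tokens
# as if it had $floor tokens, so very short fields get no extra reward.
sub floored_norm {
    my ( $num_tokens, $floor ) = @_;
    return 0 if $num_tokens <= 0;
    $num_tokens = $floor if $num_tokens < $floor;
    return 1 / sqrt($num_tokens);
}

for my $n ( 4, 100, 10_000 ) {
    printf "%6d tokens: inverse sqrt %.3f, floored at 100 %.3f\n",
        $n, inverse_sqrt_norm($n), floored_norm( $n, 100 );
}
```

Under these assumptions a 4-token field is weighted five times as heavily as a 100-token field by the plain inverse square root, while the floored variant weights them equally. A Similarity subclass could return a value computed this way from its own length_norm override, in the same way the SYNOPSIS returns a constant 1.0.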
INHERITANCE
Lucy::Index::Similarity isa Lucy::Object::Obj.
perl v5.22.1 2015-12-18 Lucy::Index::Similarity(3pm)