Ubuntu Manpage: KinoSearch1::Analysis::Token

Provided by: libkinosearch1-perl_1.01-4build2_amd64

NAME

       KinoSearch1::Analysis::Token - unit of text

SYNOPSIS

           # private class - no public API

PRIVATE CLASS

       Token objects cannot be instantiated at the Perl level.  You can, however, modify
       individual Tokens within a TokenBatch by way of TokenBatch's (experimental) API.

DESCRIPTION

       Token is the fundamental unit used by KinoSearch1's Analyzer subclasses.  Each Token has
       four attributes: text, start_offset, end_offset, and pos_inc (for position increment).

       The text of a token is a string.

       A Token's start_offset and end_offset locate it within a larger text, even if the Token's
       text attribute gets modified -- by stemming, for instance.  The Token for "beating" in the
       text "beating a dead horse" begins life with a start_offset of 0 and an end_offset of 7;
       after stemming, the text is "beat", but the end_offset is still 7.
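
       As a rough illustration of the offsets surviving a change to the text, the sketch below
       models a Token as a plain Perl hash.  The hash, the $source variable, and the
       regex-based "stemmer" are hypothetical stand-ins for this example only; they are not
       part of KinoSearch1's API, and Token itself remains a private class.

```perl
use strict;
use warnings;

my $source = "beating a dead horse";

# Hypothetical plain-hash stand-in for a Token; NOT the KinoSearch1 class.
my %token = (
    text         => "beating",
    start_offset => 0,
    end_offset   => 7,
    pos_inc      => 1,
);

# Crude stand-in for a stemmer: strip a trailing "ing" from the text.
# Only the text attribute changes; the offsets are left alone.
$token{text} =~ s/ing\z//;

# The offsets still select the original span within the source text.
my $span = substr(
    $source,
    $token{start_offset},
    $token{end_offset} - $token{start_offset},
);

print "text: $token{text}\n";    # text: beat
print "span: $span\n";           # span: beating
```

       Because the offsets refer to the source text rather than to the Token's current text,
       they can still pick out "beating" in the original string after the text has become
       "beat".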

       The position increment, which defaults to 1, is an advanced tool for manipulating phrase
       matching.  Ordinarily, Tokens are assigned consecutive position numbers: 0, 1, and 2 for
       "three blind mice".  However, if you set the position increment for "blind" to, say, 1000,
       then the three tokens will end up assigned to positions 0, 1, and 1001 -- and will no
       longer produce a phrase match for the query '"three blind mice"'.
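
       The arithmetic in that example can be sketched in plain Perl.  This is only a model: the
       assign_positions function and the hash-based tokens are hypothetical, and the rule that
       each token takes the running position before its own pos_inc is added is an assumption
       inferred from the positions quoted above, not a statement about KinoSearch1's internals.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of KinoSearch1.
# Assumption: a token is placed at the running position, and the running
# position is then advanced by that token's own pos_inc.
sub assign_positions {
    my @tokens   = @_;
    my $position = 0;
    my @positions;
    for my $token (@tokens) {
        push @positions, $position;
        $position += $token->{pos_inc};
    }
    return @positions;
}

# Default increments of 1 give consecutive positions.
my @default = assign_positions(
    { text => "three", pos_inc => 1 },
    { text => "blind", pos_inc => 1 },
    { text => "mice",  pos_inc => 1 },
);

# Raising the increment on "blind" opens a gap before "mice".
my @gapped = assign_positions(
    { text => "three", pos_inc => 1 },
    { text => "blind", pos_inc => 1000 },
    { text => "mice",  pos_inc => 1 },
);

print "default: @default\n";    # default: 0 1 2
print "gapped:  @gapped\n";     # gapped:  0 1 1001
```

       Under this model "blind" and "mice" are no longer at adjacent positions, which is why a
       phrase query requiring consecutive positions would fail to match.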

COPYRIGHT

       Copyright 2006-2010 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.

       See KinoSearch1 version 1.01.