Ubuntu Manpage: KinoSearch1::Analysis::PolyAnalyzer

Provided by: libkinosearch1-perl_1.01-5build1_amd64

NAME

       KinoSearch1::Analysis::PolyAnalyzer - multiple analyzers in series

SYNOPSIS

           my $analyzer = KinoSearch1::Analysis::PolyAnalyzer->new(
               language  => 'es',
           );

           # or supply your own Analyzers, listed in the order they should run
           my $analyzer = KinoSearch1::Analysis::PolyAnalyzer->new(
               analyzers => [
                   $lc_normalizer,
                   $custom_tokenizer,
                   $snowball_stemmer,
               ],
           );

DESCRIPTION

       A PolyAnalyzer is a series of Analyzers -- objects which inherit from
       KinoSearch1::Analysis::Analyzer -- each of which will be called upon to "analyze" text in
       turn.  You can either provide the Analyzers yourself, or you can specify a supported
       language, in which case a PolyAnalyzer consisting of an LCNormalizer, a Tokenizer, and a
       Stemmer will be generated for you.

       Supported languages:

           en => English,
           da => Danish,
           de => German,
           es => Spanish,
           fi => Finnish,
           fr => French,
           it => Italian,
           nl => Dutch,
           no => Norwegian,
           pt => Portuguese,
           ru => Russian,
           sv => Swedish,
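
       As a rough sketch of what the "language" shorthand stands for, the chain described
       above can also be assembled by hand.  The constructor arguments for LCNormalizer,
       Tokenizer, and Stemmer are assumptions here (none are documented on this page), so
       check each class's own documentation before relying on them.

```perl
use strict;
use warnings;

use KinoSearch1::Analysis::PolyAnalyzer;
use KinoSearch1::Analysis::LCNormalizer;
use KinoSearch1::Analysis::Tokenizer;
use KinoSearch1::Analysis::Stemmer;

# Shorthand: let PolyAnalyzer build the default chain for Spanish.
my $default_analyzer = KinoSearch1::Analysis::PolyAnalyzer->new(
    language => 'es',
);

# Explicit form.  ASSUMPTION: LCNormalizer and Tokenizer can be constructed
# without arguments, and Stemmer accepts a "language" parameter.
my $lc_normalizer = KinoSearch1::Analysis::LCNormalizer->new;
my $tokenizer     = KinoSearch1::Analysis::Tokenizer->new;
my $stemmer       = KinoSearch1::Analysis::Stemmer->new( language => 'es' );

# Same order as the generated chain: normalize, tokenize, stem.
my $explicit_analyzer = KinoSearch1::Analysis::PolyAnalyzer->new(
    analyzers => [ $lc_normalizer, $tokenizer, $stemmer ],
);
```

       The explicit form is mainly useful as a starting point when one stage needs to be
       swapped for a custom Analyzer.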

CONSTRUCTOR

   new()
           my $analyzer = KinoSearch1::Analysis::PolyAnalyzer->new(
               language   => 'en',
           );

       Construct a PolyAnalyzer object.  If the parameter "analyzers" is specified, it will
       override "language" and no attempt will be made to generate a default set of Analyzers.

       •   language - Must be one of the two-letter ISO 639-1 codes from the list of
           supported languages.

       •   analyzers - Must be an arrayref.  Each element in the array must inherit from
           KinoSearch1::Analysis::Analyzer.  The order of the analyzers matters.  Don't put a
           Stemmer before a Tokenizer (a Stemmer works on individual words, not on whole
           documents or paragraphs), or a Stopalizer after a Stemmer (stemmed words, e.g.
           "themselv", will not appear in a stoplist).  In general, the sequence should be:
           normalize, tokenize, stopalize, stem.
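
       The ordering advice above can be illustrated with a toy pipeline in plain Perl.
       Nothing in this sketch is KinoSearch1 code: the four stages, the two-word stoplist,
       and the suffix-stripping "stemmer" are hypothetical stand-ins, chosen only to show
       why a stoplist lookup fails once words have been stemmed.

```perl
use strict;
use warnings;

# Hypothetical stoplist; a real one would be far longer.
my %stoplist = map { $_ => 1 } qw( they themselves );

# Hypothetical stand-ins for the four stages (not the KinoSearch1 classes).
my %stage = (
    normalize => sub { map { lc($_) } @_ },
    tokenize  => sub { map { /\w+/g } @_ },
    stopalize => sub { grep { !$stoplist{$_} } @_ },
    # Toy stemmer: strips a trailing "es" or "s" from each item.
    stem      => sub { map { ( my $w = $_ ) =~ s/(?:es|s)$//; $w } @_ },
);

# Feed the output of each named stage into the next one.
sub run_chain {
    my ( $text, @order ) = @_;
    my @items = ($text);
    @items = $stage{$_}->(@items) for @order;
    return @items;
}

my $text = 'They helped Themselves';
my @good = run_chain( $text, qw( normalize tokenize stopalize stem ) );
my @bad  = run_chain( $text, qw( normalize tokenize stem stopalize ) );

print "stopalize, then stem: @good\n";    # stopalize, then stem: helped
print "stem, then stopalize: @bad\n";     # stem, then stopalize: helped themselv
```

       In the second chain "themselves" has already become "themselv" by the time the
       stoplist is consulted, so it slips through.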

COPYRIGHT

       Copyright 2005-2010 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.

       See KinoSearch1 version 1.01.