Provided by: libkinosearch1-perl_1.01-4build2_amd64

NAME

       KinoSearch1::Analysis::PolyAnalyzer - multiple analyzers in series

SYNOPSIS

           my $analyzer = KinoSearch1::Analysis::PolyAnalyzer->new(
               language  => 'es',
           );

           # or...
           my $analyzer = KinoSearch1::Analysis::PolyAnalyzer->new(
               analyzers => [
                   $lc_normalizer,
                   $custom_tokenizer,
                   $snowball_stemmer,
               ],
           );

DESCRIPTION

       A PolyAnalyzer is a series of Analyzers -- objects which inherit from KinoSearch1::Analysis::Analyzer --
       each of which will be called upon to "analyze" text in turn.  You can either provide the Analyzers
       yourself, or you can specify a supported language, in which case a PolyAnalyzer consisting of an
       LCNormalizer, a Tokenizer, and a Stemmer will be generated for you.

       Supported languages:

           en => English,
           da => Danish,
           de => German,
           es => Spanish,
           fi => Finnish,
           fr => French,
           it => Italian,
           nl => Dutch,
           no => Norwegian,
           pt => Portuguese,
           ru => Russian,
           sv => Swedish,

CONSTRUCTOR

   new()
           my $analyzer = KinoSearch1::Analysis::PolyAnalyzer->new(
               language   => 'en',
           );

       Construct a PolyAnalyzer object.  If the parameter "analyzers" is specified, it will override "language"
       and no attempt will be made to generate a default set of Analyzers.

       •   language - Must be a two-letter ISO 639-1 code from the list of supported languages.

       •   analyzers - Must be an arrayref.  Each element in the array must inherit from
           KinoSearch1::Analysis::Analyzer.  The order of the analyzers matters.  Don't put a Stemmer before a
           Tokenizer (a Stemmer works on individual words, not on whole documents or paragraphs), or a
           Stopalizer after a Stemmer (stemmed words, e.g. "themselv", will not appear in a stoplist).  In
           general, the sequence should be: normalize, tokenize, stopalize, stem.
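
       The ordering rule above can be illustrated with a minimal, self-contained sketch.  Nothing in it is
       KinoSearch1 code: normalize(), tokenize(), stopalize(), stem(), run_chain() and the three-word
       stoplist are hypothetical stand-ins, and the "stemmer" merely strips a trailing "es" or "s".  It only
       shows why a stoplist lookup has to happen before stemming.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the four stages; not the KinoSearch1 classes.
my %stoplist = map { $_ => 1 } qw( the of themselves );

sub normalize { return lc $_[0] }                      # string -> lowercased string
sub tokenize  { return [ $_[0] =~ /(\w+)/g ] }         # string -> arrayref of tokens
sub stopalize { return [ grep { !$stoplist{$_} } @{ $_[0] } ] }

# Toy stemmer: strip a trailing "es" or "s" from each token.
sub stem {
    return [ map { my $t = $_; $t =~ s/(?:es|s)$//; $t } @{ $_[0] } ];
}

# Feed the output of each stage into the next, in the order given.
sub run_chain {
    my ( $input, @stages ) = @_;
    $input = $_->($input) for @stages;
    return $input;
}

my $text = "The Dogs help themselves";

# Recommended order: normalize, tokenize, stopalize, stem.
my $good = run_chain( $text, \&normalize, \&tokenize, \&stopalize, \&stem );

# Stopalizing after stemming: "themselv" is not in the stoplist, so it survives.
my $bad = run_chain( $text, \&normalize, \&tokenize, \&stem, \&stopalize );

print "recommended order: @$good\n";
print "stem before stop:  @$bad\n";
```

       In the second chain the stopword "themselves" has already been reduced to "themselv" by the time the
       stoplist is consulted, so it is kept; a real Stopalizer placed after a Stemmer would miss it in the
       same way.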

COPYRIGHT

       Copyright 2005-2010 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.

       See KinoSearch1 version 1.01.