Ubuntu Manpage: Lucy::Analysis::PolyAnalyzer - Multiple Analyzers in series.

Provided by: liblucy-perl_0.3.3-6build1_amd64

NAME

       Lucy::Analysis::PolyAnalyzer - Multiple Analyzers in series.

SYNOPSIS

           my $schema = Lucy::Plan::Schema->new;
           my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
               language => 'en',
           );
           my $type = Lucy::Plan::FullTextType->new(
               analyzer => $polyanalyzer,
           );
           $schema->spec_field( name => 'title',   type => $type );
           $schema->spec_field( name => 'content', type => $type );

DESCRIPTION

       A PolyAnalyzer is a series of Analyzers, each of which will be called upon to "analyze"
       text in turn.  You can either provide the Analyzers yourself, or you can specify a
       supported language, in which case a PolyAnalyzer consisting of a CaseFolder, a
       RegexTokenizer, and a SnowballStemmer will be generated for you.

       Supported languages:

           en => English,
           da => Danish,
           de => German,
           es => Spanish,
           fi => Finnish,
           fr => French,
           hu => Hungarian,
           it => Italian,
           nl => Dutch,
           no => Norwegian,
           pt => Portuguese,
           ro => Romanian,
           ru => Russian,
           sv => Swedish,
           tr => Turkish,

CONSTRUCTORS

   new( [labeled params] )
           my $analyzer = Lucy::Analysis::PolyAnalyzer->new(
               language  => 'es',
           );

           # or...

           my $case_folder  = Lucy::Analysis::CaseFolder->new;
           my $tokenizer    = Lucy::Analysis::RegexTokenizer->new;
           my $stemmer      = Lucy::Analysis::SnowballStemmer->new( language => 'en' );
           my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
               analyzers => [ $case_folder, $tokenizer, $stemmer ],
           );

       •   language - A two-letter ISO 639-1 code from the list of supported languages.

       •   analyzers - An array of Analyzers.  The order of the analyzers matters.  Don't put a
           SnowballStemmer before a RegexTokenizer (a stemmer works on individual words, not
           whole documents or paragraphs), or a SnowballStopFilter after a SnowballStemmer
           (stemmed words, e.g. "themselv", will not appear in a stoplist).  In general, the
           sequence should be: normalize, tokenize, stopalize (remove stopwords), stem.
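
       The ordering rule above can be illustrated with a small pure-Perl sketch.  It does not
       use Lucy at all: the subs normalize, tokenize, stop_filter, stem and run_chain, the
       three-word stoplist, and the suffix-stripping "stemmer" are all hypothetical stand-ins
       invented for this example, chosen only to show why a stop filter placed after a stemmer
       misses stemmed forms such as "themselv".

```perl
use strict;
use warnings;

# Toy stand-ins for the real analyzers; none of this is Lucy code.
my %stoplist = map { $_ => 1 } qw( the themselves by );

# Lowercase the whole text (the role a CaseFolder plays).
sub normalize { return lc $_[0] }

# Split text into word tokens (the role a RegexTokenizer plays).
sub tokenize { return [ $_[0] =~ /\w+/g ] }

# Drop tokens found in the stoplist (the role a SnowballStopFilter plays).
sub stop_filter { return [ grep { !$stoplist{$_} } @{ $_[0] } ] }

# Crude suffix stripping; an assumption for illustration, not the Snowball algorithm.
sub stem {
    return [ map { my $t = $_; $t =~ s/(?:es|s)$//; $t } @{ $_[0] } ];
}

# Feed the output of each step into the next, in the order given.
sub run_chain {
    my ( $text, @steps ) = @_;
    my $data = $text;
    $data = $_->($data) for @steps;
    return $data;
}

my $text = 'The cats washed themselves';

# Recommended order: normalize, tokenize, stopalize, stem.
my $good = run_chain( $text, \&normalize, \&tokenize, \&stop_filter, \&stem );

# Stop filter after the stemmer: "themselv" is not in the stoplist.
my $bad = run_chain( $text, \&normalize, \&tokenize, \&stem, \&stop_filter );

print "good: @$good\n";    # good: cat washed
print "bad:  @$bad\n";     # bad:  cat washed themselv
```

       In the second chain the stemmer has already reduced "themselves" to "themselv" by the
       time the stop filter runs, so the stopword slips through into the token list.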

METHODS

   get_analyzers()
       Getter for the "analyzers" member: returns the Analyzers that make up this
       PolyAnalyzer.
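
       A minimal usage sketch follows.  It assumes that the Perl binding hands the
       "analyzers" member back as an array reference, which this page does not state
       explicitly, and that Lucy is installed; check both against your installed version.

```perl
use strict;
use warnings;
use Lucy::Analysis::PolyAnalyzer;

# Build the language-based chain described in DESCRIPTION.
my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new( language => 'en' );

# Assumption: the getter returns a Perl array reference of Analyzer objects.
my $analyzers = $polyanalyzer->get_analyzers;

# Print the class of each Analyzer in the chain.
print ref($_), "\n" for @$analyzers;
```

       Listing the classes this way is a quick check that a chain was assembled in the
       intended order before indexing with it.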

INHERITANCE

       Lucy::Analysis::PolyAnalyzer isa Lucy::Analysis::Analyzer isa Lucy::Object::Obj.