Provided by: liblucy-perl_0.3.3-6build1_amd64

NAME

       Lucy::Analysis::StandardTokenizer - Split a string into tokens.

SYNOPSIS

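           my $tokenizer = Lucy::Analysis::StandardTokenizer->new;

           # Once you have a tokenizer, put it into a PolyAnalyzer along with
           # the other analyzers in the chain:
           my $case_folder  = Lucy::Analysis::CaseFolder->new;
           my $stemmer      = Lucy::Analysis::SnowballStemmer->new( language => 'en' );
           my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
               analyzers => [ $case_folder, $tokenizer, $stemmer ], );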
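
       A PolyAnalyzer calls each analyzer in turn, handing it the output of the
       one before it, so the order of the "analyzers" array matters. The
       plain-Perl sketch below imitates that chaining without Lucy, under stated
       assumptions: the subroutines case_fold, tokenize, toy_stem and analyze are
       hypothetical, tokenize approximates StandardTokenizer with Perl's \b{wb}
       assertion (Perl 5.22 or later), and toy_stem is a crude suffix stripper
       standing in for Lucy::Analysis::SnowballStemmer, not the Snowball
       algorithm.

```perl
use v5.22;    # \b{wb} (UAX #29 word boundaries) requires Perl 5.22+
use warnings;

# Hypothetical plain-Perl stand-ins for the three analyzers in the
# SYNOPSIS. Each takes a list of strings and returns a new list.
sub case_fold { return map { lc($_) } @_ }

sub tokenize {
    # Split at word boundaries; keep segments starting with a letter/digit.
    return grep { /^[[:alnum:]]/ } map { split(/\b{wb}/, $_) } @_;
}

sub toy_stem {
    # Toy stemmer, NOT Snowball: drop a trailing "s" from words
    # longer than three characters.
    return map { length($_) > 3 ? s/s\z//r : $_ } @_;
}

# Apply the analyzers in order, each consuming the previous one's output,
# the way a PolyAnalyzer runs its "analyzers" array.
sub analyze {
    my ($text, @analyzers) = @_;
    my @tokens = ($text);
    @tokens = $_->(@tokens) for @analyzers;
    return @tokens;
}

say join ' ', analyze("Blind Cats Chase Mice", \&case_fold, \&tokenize, \&toy_stem);
# prints "blind cat chase mice"
```

       Because case folding runs first, the tokenizer and stemmer in this sketch
       only ever see lowercase text.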

DESCRIPTION

       In general, "tokenizing" is the process of breaking up a string into an array of "tokens".
       For instance, the string "three blind mice" might be tokenized into "three", "blind",
       "mice".
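
       As a minimal illustration, assuming nothing beyond core Perl, the
       hypothetical helper whitespace_tokenize below (not part of Lucy) splits on
       whitespace only. It handles "three blind mice", but the second call shows
       its weakness: trailing punctuation stays attached to the last word.

```perl
use strict;
use warnings;

# Naive tokenizer: split on runs of whitespace only.
# (Hypothetical helper for comparison; not part of Lucy.)
sub whitespace_tokenize {
    my ($text) = @_;
    # split ' ' is awk-style: it splits on any run of whitespace
    # and ignores leading whitespace.
    return split ' ', $text;
}

print join('|', whitespace_tokenize("three blind mice")), "\n";
# prints "three|blind|mice"
print join('|', whitespace_tokenize("  three blind mice.")), "\n";
# prints "three|blind|mice." (the period stays attached)
```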

       Lucy::Analysis::StandardTokenizer breaks up the text at the word boundaries defined in
       Unicode Standard Annex #29. It then returns only those words that start with an
       alphabetic or numeric character, so segments such as whitespace and punctuation are
       discarded.
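
       The behavior described above can be approximated in core Perl, which
       (since 5.22) provides \b{wb}, a UAX #29 word-boundary assertion. This is a
       sketch, not Lucy's implementation: uax29_tokenize is a hypothetical name,
       Perl slightly tailors \b{wb}, and [[:alnum:]] is assumed as the test for
       "alphabetic or numeric", so results may differ from StandardTokenizer on
       some input.

```perl
use v5.22;    # \b{wb} requires Perl 5.22 or later
use warnings;

# Approximate StandardTokenizer's behavior in plain Perl:
#   1. split the text at UAX #29 word boundaries (\b{wb}), then
#   2. keep only segments whose first character is a letter or digit.
# Perl tailors \b{wb} slightly (e.g. it keeps runs of whitespace
# together), so this is a sketch, not a match for Lucy's output.
sub uax29_tokenize {
    my ($text) = @_;
    my @segments = split /\b{wb}/, $text;
    return grep { /^[[:alnum:]]/ } @segments;
}

say join '|', uax29_tokenize("Don't panic! It's 3.14.");
# prints "Don't|panic|It's|3.14"
```

       The apostrophes in "Don't" and "It's" and the decimal point in "3.14" stay
       inside their tokens, because UAX #29 does not break there, while "!" and
       the final "." form separate segments that the alphanumeric filter drops.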

CONSTRUCTORS

   new()
           my $tokenizer = Lucy::Analysis::StandardTokenizer->new;

       Constructor.  Takes no arguments.

INHERITANCE

       Lucy::Analysis::StandardTokenizer isa Lucy::Analysis::Analyzer isa Lucy::Object::Obj.