NAME

       Lucy::Analysis::Normalizer - Unicode normalization, case folding and accent stripping

       Normalizer is an Analyzer which normalizes tokens to one of the Unicode normalization
       forms.

SYNOPSIS

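           my $tokenizer  = Lucy::Analysis::StandardTokenizer->new;
           my $normalizer = Lucy::Analysis::Normalizer->new;
           my $stemmer    = Lucy::Analysis::SnowballStemmer->new( language => 'en' );

           # Tokenize first so that highlighting keeps working (see DESCRIPTION).
           my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
               analyzers => [ $tokenizer, $normalizer, $stemmer ],
           );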

DESCRIPTION

       In addition to normalization, Normalizer can optionally perform Unicode case folding
       and convert accented characters to their base characters.

       If you use highlighting, run Normalizer after tokenization. Normalization might add or
       remove characters, so running it before the tokenizer can leave token offsets out of
       step with the original text.
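
       To make the effect of the three options concrete, the following sketch approximates
       them on a plain string using only core Perl modules (Unicode::Normalize and fc). It is
       an illustration, not Lucy's implementation: normalize_text is a hypothetical helper
       that merely mirrors the constructor's parameter names and defaults, and the order in
       which it applies the steps is an assumption.

```perl
use strict;
use warnings;
use feature qw(fc unicode_strings);    # fc() needs Perl 5.16 or later
use Unicode::Normalize qw(NFC NFD NFKC NFKD);

# Hypothetical helper, not part of Lucy: applies the three options to one string.
sub normalize_text {
    my ( $text, %opt ) = @_;
    my %forms = ( NFC => \&NFC, NFD => \&NFD, NFKC => \&NFKC, NFKD => \&NFKD );
    my $form      = $opt{normalization_form} // 'NFKC';
    my $normalize = $forms{$form} or die "Unknown normalization form: $form";
    my $case_fold     = $opt{case_fold}     // 1;
    my $strip_accents = $opt{strip_accents} // 0;

    if ($strip_accents) {
        $text = NFD($text);       # split accented letters into base + combining marks
        $text =~ s/\p{Mn}+//g;    # remove the nonspacing combining marks
    }
    $text = fc($text) if $case_fold;
    return $normalize->($text);
}

# U+FB01 (LATIN SMALL LIGATURE FI) becomes the two letters "fi" under NFKC,
# so the string grows from one character to two.
my $ligature = normalize_text("\x{FB01}");

# "Cafe" with an acute accent on the "e" becomes "cafe" once accents are
# stripped and case is folded.
my $cafe = normalize_text( "Caf\x{E9}", strip_accents => 1 );
```

       The ligature case shows why character counts can change: a tokenizer that runs on
       the normalized text would see positions that no longer line up with the original
       input.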

CONSTRUCTORS

   new( [labeled params] )
           my $normalizer = Lucy::Analysis::Normalizer->new(
               normalization_form => 'NFKC',
               case_fold          => 1,
               strip_accents      => 0,
           );

       •   normalization_form - Unicode normalization form. One of 'NFC', 'NFKC', 'NFD' or
           'NFKD'. Defaults to 'NFKC'.

       •   case_fold - Perform case folding. Defaults to true.

       •   strip_accents - Strip accents. Defaults to false.
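
       As a sketch of how these parameters might be combined, the fragment below configures
       accent-insensitive and case-insensitive matching for a highlightable field. The field
       name 'content' and the choice of StandardTokenizer are assumptions made for the
       example. The tokenizer is listed first, in line with the highlighting advice in
       DESCRIPTION.

```perl
use strict;
use warnings;
use Lucy;

# Fold case and strip accents so that "Cafe" and its accented spelling
# index to the same term.
my $normalizer = Lucy::Analysis::Normalizer->new(
    normalization_form => 'NFKC',
    case_fold          => 1,
    strip_accents      => 1,
);

# Tokenizer first, Normalizer second.
my $analyzer = Lucy::Analysis::PolyAnalyzer->new(
    analyzers => [ Lucy::Analysis::StandardTokenizer->new, $normalizer ],
);

# 'content' is a hypothetical field name.
my $schema = Lucy::Plan::Schema->new;
my $type   = Lucy::Plan::FullTextType->new(
    analyzer      => $analyzer,
    highlightable => 1,
);
$schema->spec_field( name => 'content', type => $type );
```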

INHERITANCE

       Lucy::Analysis::Normalizer isa Lucy::Analysis::Analyzer isa Lucy::Object::Obj.