Lucy::Analysis::Normalizer
Unicode normalization, case folding and accent stripping.
Normalizer is an Analyzer which normalizes tokens to one of the Unicode normalization forms.
- Provided by: liblucy-perl (Version: 0.3.3-6build1)
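my $normalizer = Lucy::Analysis::Normalizer->new;
# $tokenizer and $stemmer are assumed to be constructed elsewhere.
# Analyzers run in the order listed; see the note on highlighting.
my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
    analyzers => [ $normalizer, $tokenizer, $stemmer ],
);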
Optionally, Normalizer performs Unicode case folding and converts accented characters to their base characters.
If you use highlighting, Normalizer should be run after tokenization, because it might add or remove characters and so change the length of the text being tokenized.
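To illustrate why normalization can add characters, here is a minimal sketch using the core Unicode::Normalize module rather than Lucy itself. It only demonstrates the general Unicode behaviour; the variable names are illustrative and nothing here is taken from Lucy's implementation.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD NFKC);

# U+FB01 LATIN SMALL LIGATURE FI is a single character that
# NFKC expands into the two characters "f" and "i".
my $ligature = "\x{FB01}";
my $expanded = NFKC($ligature);

# U+00E9 (precomposed e with acute) is split by NFD into
# "e" followed by U+0301 COMBINING ACUTE ACCENT.
my $composed   = "\x{E9}";
my $decomposed = NFD($composed);

# Both strings grow from one character to two.
printf "%d -> %d\n", length($ligature), length($expanded);
printf "%d -> %d\n", length($composed), length($decomposed);
```

Because the character count changes, positions measured in the normalized text no longer line up with positions in the original text.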
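my $normalizer = Lucy::Analysis::Normalizer->new(
    normalization_form => 'NFKC',    # one of 'NFC', 'NFKC', 'NFD', 'NFKD'
    case_fold          => 1,         # perform Unicode case folding
    strip_accents      => 0,         # keep accented characters as they are
);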
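As a rough picture of what the three options mean, the following sketch approximates them with core Perl functions. The helper normalize_token is hypothetical and not part of Lucy, it always uses NFKC instead of honouring normalization_form, and Lucy's own implementation may differ in detail.

```perl
use strict;
use warnings;
use feature qw(fc unicode_strings);    # fc() needs Perl 5.16 or newer
use Unicode::Normalize qw(NFD NFKC);

# Hypothetical helper: approximates case_fold and strip_accents
# on top of a fixed NFKC normalization.
sub normalize_token {
    my ($text, %opt) = @_;
    $text = NFKC($text);
    $text = fc($text) if $opt{case_fold};
    if ($opt{strip_accents}) {
        $text = NFD($text);        # split base characters from marks
        $text =~ s/\p{Mn}+//g;     # drop the nonspacing marks
        $text = NFKC($text);       # return to the chosen form
    }
    return $text;
}

# "Caf\x{E9}" is "Café" with a precomposed e-acute.
my $folded   = normalize_token("Caf\x{E9}", case_fold => 1);
my $stripped = normalize_token("Caf\x{E9}", case_fold => 1, strip_accents => 1);
```

Here $folded keeps the accent but is case folded, while $stripped is reduced to plain "cafe".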
Lucy::Analysis::Normalizer isa Lucy::Analysis::Analyzer isa Lucy::Object::Obj.