Provided by: liblingua-stem-perl_2.30-1_all

NAME

       Lingua::Stem::En - Porter's stemming algorithm for 'generic' English

SYNOPSIS

           use Lingua::Stem::En;
           my $stems   = Lingua::Stem::En::stem({ -words => $word_list_reference,
                                               -locale => 'en',
                                           -exceptions => $exceptions_hash,
                                            });

DESCRIPTION

       This routine applies the Porter Stemming Algorithm to its parameters, returning the
       stemmed words.

       It is derived from the C program "stemmer.c" as found in freewais and elsewhere, which
       contains these notes:

          Purpose:    Implementation of the Porter stemming algorithm documented
                      in: Porter, M.F., "An Algorithm For Suffix Stripping,"
                      Program 14 (3), July 1980, pp. 130-137.
          Provenance: Written by B. Frakes and C. Cox, 1986.

       I have re-interpreted areas that use Frakes and Cox's "WordSize" function. My version may
       misbehave on short words starting with "y", but I can't think of any examples.

       The step numbers correspond to Frakes and Cox, and are probably in Porter's article (which
       I've not seen).  Porter's algorithm still has rough spots (e.g. current/currency, -ings
       words), which I've not attempted to cure, although I have added support for the British
       -ise suffix.
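
       One way to inspect the rough spots mentioned above is to stem related words and compare
       the results. The following is a minimal sketch that uses only the stem() call documented
       under METHODS; the word pairs are illustrative choices, and no output is shown because it
       depends on the installed version.

```perl
use strict;
use warnings;
use Lingua::Stem::En;

# Illustrative word pairs: related words a reader might expect to share a stem.
my @pairs = ( [ 'current', 'currency' ], [ 'organize', 'organise' ] );

my @results;
for my $pair (@pairs) {
    # Passing a fresh list of plain strings selects the normal 'by value' mode,
    # so stem() returns a reference to a new array of stems.
    my $stems = Lingua::Stem::En::stem({ -words => [ @$pair ], -locale => 'en' });
    push @results, $stems;

    my $verdict = $stems->[0] eq $stems->[1] ? 'same stem' : 'different stems';
    printf "%-10s %-10s => %-10s %-10s (%s)\n", @$pair, @$stems, $verdict;
}
```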

CHANGES

        1999.06.15 - Changed to '.pm' module, moved into Lingua::Stem namespace,
                     optionalized the export of the 'stem' routine
                     into the caller's namespace, added named parameters

        1999.06.24 - Switch core implementation of the Porter stemmer to
                     the one written by Jim Richardson <jimr@maths.usyd.edu.au>

        2000.08.25 - 2.11 Added stemming cache

        2000.09.14 - 2.12 Fixed *major* :( implementation error of Porter's algorithm
                     Error was entirely my fault - I completely forgot to include
                     rule sets 2,3, and 4 starting with Lingua::Stem 0.30.
                     -- Jerilyn Franz

        2003.09.28 - 2.13 Corrected documentation error pointed out by Simon Cozens.

        2005.11.20 - 2.14 Changed rule declarations to conform to Perl style convention
                     for 'private' subroutines. Changed Exporter invocation to more
                     portable 'require' vice 'use'.

        2006.02.14 - 2.15 Added ability to pass word list by 'handle' for in-place stemming.

        2009.07.27 - 2.16 Documentation Fix

        2020.06.20 - 2.30 Version renumber for module consistency.

METHODS

       stem({ -words => \@words, -locale => 'en', -exceptions => \%exceptions });
           Stems a list of passed words using the rules of US English. Returns an anonymous array
           reference to the stemmed words.

           Example:

             my @words         = ( 'wordy', 'another' );
             my $stemmed_words = Lingua::Stem::En::stem({ -words => \@words,
                                                         -locale => 'en',
                                                     -exceptions => \%exceptions,
                                     });
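
           The example above leaves %exceptions undefined. The following sketch fills it in,
           assuming the exceptions hash maps each exempted word to the stem that should be
           returned for it (the layout used by Lingua::Stem's exception handling); the sample
           words are illustrative only.

```perl
use strict;
use warnings;
use Lingua::Stem::En;

# Assumed layout: keys are the words to exempt, values are the stems to return.
my %exceptions = ( 'emily' => 'emily', 'driven' => 'driven' );
my @words      = ( 'emily', 'driven', 'wordy', 'another' );

my $stemmed_words = Lingua::Stem::En::stem({ -words      => \@words,
                                             -locale     => 'en',
                                             -exceptions => \%exceptions,
                                          });

# In this 'by value' mode @words is not modified; the stems are in a new array.
print "$words[$_] -> $stemmed_words->[$_]\n" for 0 .. $#words;
```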

           If the first element of the list passed as -words is itself a list reference, then
           the stemming is performed 'in place' on that inner list (modifying it directly
           instead of copying it to a new array).

           This is only useful if you do not need to keep the original list. If you do need to
           keep it, use the normal semantics of having 'stem' return a new list. That is faster
           than making your own copy and then stemming 'in place', since the main difference
           between 'in place' and 'by value' stemming is the creation of a copy of the original
           list.  If you don't need the original list, then 'in place' stemming is about 60%
           faster.

           Example of 'in place' stemming:

             my $words         = [ 'wordy', 'another' ];
             my $stemmed_words = Lingua::Stem::En::stem({ -words => [$words],
                                     -locale => 'en',
                                 -exceptions => \%exceptions,
                                 });

           The 'in place' mode returns a reference to the original list with the words stemmed.
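
           The effect of 'in place' mode can be checked with a short sketch such as the one
           below. The @original copy is a hypothetical helper kept only so that the change can
           be printed; it is not needed in real use.

```perl
use strict;
use warnings;
use Lingua::Stem::En;

my $words    = [ 'wordy', 'another' ];
my @original = @$words;    # kept only so the example can show what changed

# Wrapping $words in another array reference requests 'in place' stemming.
my $stemmed_words = Lingua::Stem::En::stem({ -words => [$words], -locale => 'en' });

# The returned reference and $words refer to the same array.
print "same array\n" if $stemmed_words == $words;
print "$original[$_] -> $words->[$_]\n" for 0 .. $#original;
```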

       stem_caching({ -level => 0|1|2 });
           Sets the level of stem caching.

           '0' means 'no caching'. This is the default level.

           '1' means 'cache per run'. This caches stemming results during a single
               call to 'stem'.

           '2' means 'cache indefinitely'. This caches stemming results until
               either the process exits or the 'clear_stem_cache' method is called.

       clear_stem_cache;
           Clears the cache of stemmed words.
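
           The two caching routines might be combined as in the sketch below. It assumes both
           are called as plain functions in the same style as stem(); the batches of words are
           illustrative only.

```perl
use strict;
use warnings;
use Lingua::Stem::En;

# Level 2: keep cached stems across separate calls to stem().
Lingua::Stem::En::stem_caching({ -level => 2 });

my @batches = ( [ 'wordy', 'another' ], [ 'another', 'wordy', 'words' ] );
my @results;
for my $batch (@batches) {
    # Words already seen in an earlier batch can be served from the cache.
    push @results, Lingua::Stem::En::stem({ -words => $batch, -locale => 'en' });
}

# Release the cached stems, then return to the default of no caching.
Lingua::Stem::En::clear_stem_cache();
Lingua::Stem::En::stem_caching({ -level => 0 });

print join( ' ', @$_ ), "\n" for @results;
```

           Level 2 trades memory for speed, so clearing the cache once a large job has finished
           keeps a long-running process from holding on to stems it no longer needs.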

NOTES

       This code is almost entirely derived from the Porter 2.1 module written by Jim Richardson.

SEE ALSO

        Lingua::Stem

AUTHOR

         Jim Richardson, University of Sydney
         jimr@maths.usyd.edu.au or http://www.maths.usyd.edu.au:8000/jimr.html

         Integration in Lingua::Stem by
         Jerilyn Franz, FreeRun Technologies,
         <cpan@jerilyn.info>

COPYRIGHT

       Jim Richardson, University of Sydney
       Jerilyn Franz, FreeRun Technologies

       This code is freely available under the same terms as Perl.

BUGS

TODO