Provided by: liblingua-stopwords-perl_0.12-1_all bug

NAME

       Lingua::StopWords - Stop words for several languages.

SYNOPSIS

           use Lingua::StopWords qw( getStopWords );
           my $stopwords = getStopWords('en');

           my @words = qw( i am the walrus goo goo g'joob );

           # prints "walrus goo goo g'joob"
           print join ' ', grep { !$stopwords->{$_} } @words;

DESCRIPTION

       In keyword search, it is common practice to suppress a collection of "stopwords": words
       such as "the", "and", "maybe", etc. which exist in in a large number of documents and do
       not tell you anything important about any document which contains them.  This module
       provides such "stoplists" in several languages.

   Supported Languages
           |-----------------------------------------------------------|
           | Language   | ISO code | default encoding | also available |
           |-----------------------------------------------------------|
           | Danish     | da       | ISO-8859-1       | UTF-8          |
           | Dutch      | nl       | ISO-8859-1       | UTF-8          |
           | English    | en       | ISO-8859-1       | UTF-8          |
           | Finnish    | fi       | ISO-8859-1       | UTF-8          |
           | French     | fr       | ISO-8859-1       | UTF-8          |
           | German     | de       | ISO-8859-1       | UTF-8          |
           | Hungarian  | hu       | ISO-8859-2       | UTF-8          |
           | Indonesian | id       | ISO-8859-1       | UTF-8          |
           | Italian    | it       | ISO-8859-1       | UTF-8          |
           | Norwegian  | no       | ISO-8859-1       | UTF-8          |
           | Portuguese | pt       | ISO-8859-1       | UTF-8          |
           | Romanian   | ro       | ISO-8859-2       | UTF-8          |
           | Spanish    | es       | ISO-8859-1       | UTF-8          |
           | Swedish    | sv       | ISO-8859-1       | UTF-8          |
           | Russian    | ru       | KOI8-R           | UTF-8          |
           |-----------------------------------------------------------|

FUNCTIONS

   getStopWords
           my $stoplist      = getStopWords('en');
           my $utf8_stoplist = getStopWords('en', 'UTF-8');

       Retrieve a stoplist in the form of a hashref where the keys are all stopwords and the
       values are all 1.

           $stoplist = {
               and => 1,
               if  => 1,
               # ...
           };

       getStopWords() expects 1-2 arguments.  The first, which is required, is an ISO code
       representing a supported language.  If the ISO code cannot be found, getStopWords returns
       undef.

       The second argument should be 'UTF-8' if you want the stopwords encoded in UTF-8.  The
       UTF-8 flag will be turned on, so make sure you understand all the implications of that.

INSTALLATION

       To install this module type the following:

          perl Build.PL
          ./Build
          ./Build test
          ./Build install

SEE ALSO

       The stoplists supplied by this module were created as part of the Snowball project (see
       <http://snowball.tartarus.org>, Lingua::Stem::Snowball).

       Lingua::EN::StopWords provides a different stoplist for English.

SOURCE REPOSITORY

       <https://github.com/wollmers/Lingua-StopWords>

AUTHOR

       Maintained by Helmut Wollmersdorfer <helmut@wollmersdorfer.at> and Marvin Humphrey <marvin
       at rectangular dot com>.  Original author Fabien Potencier, <fabpot at cpan dot org>.

COPYRIGHT

       Copyright 2021 Helmut Wollmersdorfer Copyright 2004-2008 Fabien Potencier, Marvin Humphrey

LICENSE

       This library is free software; you can redistribute it and/or modify it under the same
       terms as Perl itself, either Perl version 5.8.3 or, at your option, any later version of
       Perl 5 you may have available.