Provided by: libcatmandu-mediawiki-perl_0.021-2_all

NAME

       Catmandu::Importer::MediaWiki - Catmandu importer that imports pages from MediaWiki

DESCRIPTION

       This importer uses the MediaWiki query API to retrieve a list of pages that match certain
       requirements.

       It retrieves the pages and their content by using MediaWiki generators:

       <http://www.mediawiki.org/wiki/API:Query#Generators>

       The default generator is 'allpages'.

       The list could also be retrieved with the module 'list':

       <http://www.mediawiki.org/wiki/API:Lists>

       However, the 'list' module is very limited: it returns only a small set of attributes for
       each page (pageid, ns and title).

       The module 'properties' on the other hand lets you add properties:

       <http://www.mediawiki.org/wiki/API:Properties>

       However, its selection parameters (titles, pageids and revids) are too specific to run a
       query in one call: one would first have to execute a list query, and then feed the
       resulting pageids to the 'properties' submodule.

       Generators make it possible to execute a query and add properties to the pages in a single
       call:

       <http://www.mediawiki.org/wiki/API:Query#Generators>
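
       The difference between the two approaches can be illustrated by the request URLs
       involved. The following is a minimal sketch in core Perl, not the importer's own code:
       the helpers escape() and query_string() are hypothetical, the page ids are placeholder
       values, and the base URL is the one used in the SYNOPSIS.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode everything except unreserved characters.
sub escape {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf("%%%02X", ord($1))/ge;
    return $s;
}

# Hypothetical helper: build a query string with sorted keys for stable output.
sub query_string {
    my (%params) = @_;
    return join "&", map { escape($_) . "=" . escape($params{$_}) } sort keys %params;
}

my $base = "http://en.wikipedia.org/w/api.php";

# Two-step approach, call 1: list the pages (returns pageid, ns and title only).
my $list_url = "$base?" . query_string(
    action => "query", format => "json",
    list   => "allpages", aplimit => 3,
);

# Two-step approach, call 2: feed the page ids to 'properties'.
my @pageids  = (1, 2, 3);    # placeholder ids, normally taken from call 1
my $prop_url = "$base?" . query_string(
    action  => "query", format => "json",
    prop    => "revisions", rvprop => "ids|timestamp",
    pageids => join("|", @pageids),
);

# Generator approach: selection and properties in a single call.
my $generator_url = "$base?" . query_string(
    action    => "query", format => "json",
    generator => "allpages", gaplimit => 3,
    prop      => "revisions", rvprop => "ids|timestamp",
);

print "$_\n" for $list_url, $prop_url, $generator_url;
```

       Note that the generator request carries the same limit as the list request, only under
       the name 'gaplimit' instead of 'aplimit'.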

       These parameters are set automatically, and cannot be overwritten:

           action       = "query"
           indexpageids = 1
           generator    = <generate>
           format       = "json"

       Additional parameters can be set in the constructor argument 'args'. The arguments for a
       generator are those of the list module with the same name, but each must be prefixed with
       'g'. For example, the argument 'aplimit' of the list module 'allpages' becomes 'gaplimit'.
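
       As a rough illustration of these two rules, the sketch below prefixes list-module
       argument names with 'g' and applies the fixed parameters last so that they cannot be
       replaced. It uses core Perl only; as_generator_args() and build_params() are
       hypothetical helpers, and this is not necessarily how the importer implements it.

```perl
use strict;
use warnings;

# Parameters the documentation says are always set by the importer.
my %fixed = (action => "query", indexpageids => 1, format => "json");

# Hypothetical helper: prefix list-module argument names with 'g'.
sub as_generator_args {
    my (%list_args) = @_;
    return map { ("g$_" => $list_args{$_}) } keys %list_args;
}

# Hypothetical helper: fixed parameters come last, so they win over user input.
sub build_params {
    my ($generate, %args) = @_;
    return (%args, %fixed, generator => $generate);
}

my %params = build_params(
    "allpages",
    prop   => "revisions",
    format => "xml",    # has no effect: 'format' is fixed to "json"
    as_generator_args(aplimit => 100, apprefix => "plato"),
);

print "$_ = $params{$_}\n" for sort keys %params;
```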

ARGUMENTS

       generate
           type: string

           explanation: type of generator to use. For a complete list, see
           <http://www.mediawiki.org/wiki/API:Lists>. Because Catmandu::Iterable already
           defines 'generator', this parameter has been renamed to 'generate'.

           default: 'allpages'.

       args
           type: hash

           explanation: extra arguments. These arguments are merged with the defaults.

           default:

               {
                   prop => "revisions",
                   rvprop => "ids|flags|timestamp|user|comment|size|content",
                   gaplimit => 100,
                   gapfilterredir => "nonredirects"
               }

           which means:

               prop             add revisions to the list of page attributes
               rvprop           specific properties for the list of revisions
               gaplimit         limit for generator 'allpages' (every 'generator' has its own limit).
               gapfilterredir   filter out redirect pages
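
           The merge with the defaults can be pictured as a plain hash merge. This is a
           minimal sketch in core Perl; it assumes that values passed in 'args' take
           precedence over the defaults, which the text above does not state explicitly.

```perl
use strict;
use warnings;

# Defaults as documented above.
my %defaults = (
    prop           => "revisions",
    rvprop         => "ids|flags|timestamp|user|comment|size|content",
    gaplimit       => 100,
    gapfilterredir => "nonredirects",
);

# Example user-supplied 'args': a smaller batch size and a title prefix.
my %args = (gaplimit => 10, gapprefix => "plato");

# Assumption: later keys win, so user values override the defaults.
my %merged = (%defaults, %args);

print "$_ = $merged{$_}\n" for sort keys %merged;
```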

       lgname
           type: string

           explanation: login name. Only used when both lgname and lgpassword are set.

           <https://www.mediawiki.org/wiki/API:Login>

       lgpassword
           type: string

           explanation: login password. Only used when both lgname and lgpassword are set.

SYNOPSIS

           use Catmandu::Sane;
           use Catmandu::Importer::MediaWiki;

           binmode STDOUT,":utf8";

           my $importer = Catmandu::Importer::MediaWiki->new(
               url => "http://en.wikipedia.org/w/api.php",
               generate => "allpages",
               args => {
                   prop => "revisions",
                   rvprop => "ids|flags|timestamp|user|comment|size|content",
                   gaplimit => 100,
                   gapprefix => "plato",
                   gapfilterredir => "nonredirects"
               }
           );
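           # iterate over every page returned by the generator
           $importer->each(sub{
               my $r = shift;
               # content of the first returned revision, stored under the key "*"
               my $content = $r->{revisions}->[0]->{"*"};
               say $r->{title};
           });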
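
       The record handling in the callback can be tried without network access. The sketch
       below uses a hand-built record that mimics the shape accessed above (a 'title' and a
       'revisions' array whose entries keep their content under "*"). The record values and
       the helper first_revision_text() are invented for illustration, and real records may
       carry more keys.

```perl
use strict;
use warnings;

# Hand-built record; the values are made up for this example.
my $r = {
    title     => "Plato",
    revisions => [
        { user => "Example", size => 11, "*" => "Hello Plato" },
    ],
};

# Hypothetical helper: return the content of the first revision,
# or undef when the record carries no revisions.
sub first_revision_text {
    my ($rec)  = @_;
    my $revs = $rec->{revisions};
    return undef unless ref($revs) eq 'ARRAY' && @$revs;
    return $revs->[0]->{"*"};
}

my $content = first_revision_text($r);
print "$r->{title}: ", length($content), " characters\n" if defined $content;
```

       Guarding against a missing 'revisions' key is a defensive choice here, since whether
       it is present depends on the 'prop' value passed in 'args'.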

AUTHORS

       Nicolas Franck "<nicolas.franck at ugent.be>"

SEE ALSO

       Catmandu, MediaWiki::API