Provided by: libmediawiki-dumpfile-perl_0.2.2-1_all bug

NAME

       Parse::MediaWikiDump::Revisions - Object capable of processing dump files with multiple
       revisions per article

ABOUT

       This object is used to access the metadata associated with a MediaWiki instance and
       provide an iterative interface for extracting the individual article revisions out of the
       same. To guarantee that there is only a single revision per article use the
       Parse::MediaWikiDump::Pages object.

SYNOPSIS

         use MediaWiki::DumpFile::Compat;

         $pmwd = Parse::MediaWikiDump->new;
         $revisions = $pmwd->revisions('pages-articles.xml');
         $revisions = $pmwd->revisions(\*FILEHANDLE);

         #print the title and id of each article inside the dump file
         while(defined($page = $revisions->next)) {
           print "title '", $page->title, "' id ", $page->id, "\n";
         }

METHODS

       $revisions->new
           Open the specified MediaWiki dump file. If the single argument to this method is a
           string it will be used as the path to the file to open. If the argument is a reference
           to a filehandle the contents will be read from the filehandle as specified.

       $revisions->next
           Returns an instance of the next available Parse::MediaWikiDump::page object or returns
           undef if there are no more articles left.

       $revisions->version
           Returns a plain text string of the dump file format revision number

       $revisions->sitename
           Returns a plain text string that is the name of the MediaWiki instance.

       $revisions->base
           Returns the URL to the instances main article in the form of a string.

       $revisions->generator
           Returns a string containing 'MediaWiki' and a version number of the instance that
           dumped this file.  Example: 'MediaWiki 1.14alpha'

       $revisions->case
           Returns a string describing the case sensitivity configured in the instance.

       $revisions->namespaces
           Returns a reference to an array of references. Each reference is to another array with
           the first item being the unique identifier of the namespace and the second element
           containing a string that is the name of the namespace.

       $revisions->namespaces_names
           Returns an array reference the array contains strings of all the namespaces each as an
           element.

       $revisions->current_byte
           Returns the number of bytes that has been processed so far

       $revisions->size
           Returns the total size of the dump file in bytes.

EXAMPLE

   Extract the article text of each revision of an article using a given title
         #!/usr/bin/perl

         use strict;
         use warnings;
         use MediaWiki::DumpFile::Compat;

         my $file = shift(@ARGV) or die "must specify a MediaWiki dump of the current pages";
         my $title = shift(@ARGV) or die "must specify an article title";
         my $pmwd = Parse::MediaWikiDump->new;
         my $dump = $pmwd->revisions($file);
         my $found = 0;

         binmode(STDOUT, ':utf8');
         binmode(STDERR, ':utf8');

         #this is the only currently known value but there could be more in the future
         if ($dump->case ne 'first-letter') {
           die "unable to handle any case setting besides 'first-letter'";
         }

         $title = case_fixer($title);

         while(my $revision = $dump->next) {
           if ($revision->title eq $title) {
             print STDERR "Located text for $title revision ", $revision->revision_id, "\n";
             my $text = $revision->text;
             print $$text;

             $found = 1;
           }
         }

         print STDERR "Unable to find article text for $title\n" unless $found;
         exit 1;

         #removes any case sensativity from the very first letter of the title
         #but not from the optional namespace name
         sub case_fixer {
           my $title = shift;

           #check for namespace
           if ($title =~ /^(.+?):(.+)/) {
             $title = $1 . ':' . ucfirst($2);
           } else {
             $title = ucfirst($title);
           }

           return $title;
         }

LIMITATIONS

   Version 0.4
       This class was updated to support version 0.4 dump files from a MediaWiki instance but it
       does not currently support any of the new information available in those files.