Ubuntu Manpage: Mail::Mbox::MessageParser - A fast and simple mbox folder reader

Provided by: libmail-mbox-messageparser-perl_1.5105-1_all

NAME

       Mail::Mbox::MessageParser - A fast and simple mbox folder reader

SYNOPSIS

         #!/usr/bin/perl

         use FileHandle;
         use Mail::Mbox::MessageParser;

         # Compressed mailboxes are supported and are decompressed automatically
         my $file_name = 'mail/saved-mail.xz';
         my $file_handle = new FileHandle($file_name);

         # Set up cache. (Not necessary if enable_cache is false.)
         Mail::Mbox::MessageParser::SETUP_CACHE(
           { 'file_name' => '/tmp/cache' } );

         my $folder_reader =
           new Mail::Mbox::MessageParser( {
             'file_name' => $file_name,
             'file_handle' => $file_handle,
             'enable_cache' => 1,
             'enable_grep' => 1,
           } );

         die $folder_reader unless ref $folder_reader;

         # Any newlines or such before the start of the first email
         my $prologue = $folder_reader->prologue;
         print $prologue;

         # This is the main loop. It's executed once for each email
         while(!$folder_reader->end_of_file())
         {
           my $email = $folder_reader->read_next_email();
           print $$email;
         }

DESCRIPTION

       This module implements a fast but simple mbox folder reader. One of three implementations (Cache, Grep,
       Perl) is used, depending on the wishes of the user and the system configuration. The first
       implementation is a cache-based one which stores information about mailboxes on the file system.
       Subsequent accesses are faster because the mailbox does not need to be analyzed again. The second
       implementation is based on GNU grep, and is significantly faster than the Perl version for mailboxes
       which contain very large (10MB) emails. The final implementation is a fast Perl-based one which should
       always be applicable.

       The Cache implementation is about 6 times faster than the standard Perl implementation. The Grep
       implementation is about 4 times faster than the standard Perl implementation. If you have GNU grep, it's
       best to enable both the Cache and Grep implementations. If the cache information is available, you'll get
       very fast speeds. Otherwise the Grep implementation is used instead, at the cost of about a 1/3
       performance hit relative to the Cache implementation.

       The overriding requirement for this module is speed. If you want more sophisticated parsing, use
       Mail::MboxParser (which is based on this module) or Mail::Box.

   METHODS AND FUNCTIONS
       SETUP_CACHE(...)
             SETUP_CACHE( { 'file_name' => <cache file name> } );

             <cache file name> - the file name of the cache

           Call this function once to set up the cache before creating any parsers. You must provide the
           location of the cache file. There is no default value.

       new(...)
             new( { 'file_name' => <mailbox file name>,
               'file_handle' => <mailbox file handle>,
               'enable_cache' => <1 or 0>,
               'enable_grep' => <1 or 0>,
               'force_processing' => <1 or 0>,
               'debug' => <1 or 0>,
             } );

             <mailbox file name> - the file name of the mailbox
             <mailbox file handle> - the already opened file handle for the mailbox
             <enable_cache> - true to attempt to use the cache implementation
             <enable_grep> - true to attempt to use the grep implementation
             <force_processing> - true to force processing of files that look invalid
             <debug> - true to print some debugging information to STDERR

           The constructor takes either a file name or a file handle, or both. If the file handle is not
           defined, Mail::Mbox::MessageParser will attempt to open the file using the file name. You should
           always pass the file name if you have it, so that the parser can cache the mailbox information.

           This module will automatically decompress the mailbox as necessary. If a file name is available but
           the file handle is undef, the module will call bzip, bzip2, gzip, lzip, or xz to decompress the file
           in memory if the file name ends with the appropriate suffix. If the file handle is defined, the
           module will detect the type of compression and apply the correct decompression program.

           The Cache, Grep, or Perl implementation of the parser will be loaded, whichever is most appropriate.
           For example, the first time you use caching, there will be no cache. In this case, the grep
           implementation can be used instead. The cache will be updated in memory as the grep implementation
           parses the mailbox, and the cache will be written after the program exits. The file name is optional,
           but if it is omitted, enable_cache and enable_grep must both be false.

           force_processing will cause the module to process folders that look to be binary, or whose text data
           doesn't look like a mailbox.

           Returns a reference to a Mail::Mbox::MessageParser object on success, and a scalar describing an
           error on failure ("Not a mailbox", "Can't open <filename>: <system error>", or "Can't execute
           <uncompress command> for file <filename>").
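
           The success and failure convention described above (an object on success, a plain string on
           failure) can be checked with ref. The following is a minimal sketch, not a definitive usage
           pattern: the open_mailbox helper is a hypothetical name introduced for illustration, and turning
           off both the cache and grep is an assumption made so that no call to SETUP_CACHE is required.

```perl
use strict;
use warnings;
use Mail::Mbox::MessageParser;

# Hypothetical helper (not part of the module): returns a parser object,
# or dies with the error string that new() handed back.
sub open_mailbox {
    my ($file_name) = @_;

    my $reader = Mail::Mbox::MessageParser->new( {
        'file_name'    => $file_name,
        'enable_cache' => 0,    # caching off, so SETUP_CACHE is not needed
        'enable_grep'  => 0,    # grep off, so the Perl implementation is used
    } );

    # On failure new() returns a plain scalar rather than a reference.
    die "Cannot read $file_name: $reader\n" unless ref $reader;

    return $reader;
}

# Count the emails in each mailbox named on the command line.
for my $file_name (@ARGV) {
    my $reader = open_mailbox($file_name);
    my $count  = 0;

    while ( !$reader->end_of_file() ) {
        my $email = $reader->read_next_email();
        last unless defined $email;    # undef signals the end of the file
        $count++;
    }

    print "$file_name: $count\n";
}
```

           Run with one or more mailbox file names as arguments. With no arguments the loop does nothing.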

       reset()
           Reset the filehandle and all internal state. Note that this will not work with filehandles which are
           streams. If there is enough demand, I may add the ability to store the previously read stream data
           internally so that reset() will work correctly.
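
           As an illustration of reset(), the sketch below reads a mailbox twice with the same parser object.
           It assumes the mailbox is a regular file rather than a stream, and the count_remaining helper is a
           hypothetical name used only for this example.

```perl
use strict;
use warnings;
use Mail::Mbox::MessageParser;

# Hypothetical helper: read to the end of the mailbox and return the
# number of emails that were seen along the way.
sub count_remaining {
    my ($reader) = @_;
    my $count = 0;

    while ( !$reader->end_of_file() ) {
        my $email = $reader->read_next_email();
        last unless defined $email;
        $count++;
    }

    return $count;
}

for my $file_name (@ARGV) {
    my $reader = Mail::Mbox::MessageParser->new( {
        'file_name'    => $file_name,
        'enable_cache' => 0,
        'enable_grep'  => 0,
    } );
    die "$file_name: $reader\n" unless ref $reader;

    my $first_pass = count_remaining($reader);

    # Rewind to the first email. This is not expected to work on streams.
    $reader->reset();

    my $second_pass = count_remaining($reader);
    print "$file_name: $first_pass emails, then $second_pass after reset()\n";
}
```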

       endline()
           Returns "\n" or "\r\n", depending on the file format.

       prologue()
           Returns any newlines or other content at the start of the mailbox prior to the first email.

       end_of_file()
           Returns true if the end of the file has been encountered.

       line_number()
           Returns the line number for the start of the last email read.

       number()
           Returns the number of the last email read. The first email has the number 1.

       length()
           Returns the length of the last email read.

       offset()
           Returns the byte offset of the last email read.

       read_next_email()
           Returns a reference to a scalar holding the text of the next email in the mailbox, or undef at the
           end of the file.
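
           The accessor methods above all report on the email most recently returned by read_next_email(), so
           they are typically called right after it. The following sketch builds a small index of a mailbox
           that way. The describe_emails helper and the hash keys it uses are hypothetical names chosen for
           this example.

```perl
use strict;
use warnings;
use Mail::Mbox::MessageParser;

# Hypothetical helper: returns one hash reference per email, recording the
# values the parser reports for the email that was just read.
sub describe_emails {
    my ($reader) = @_;
    my @rows;

    while ( !$reader->end_of_file() ) {
        my $email = $reader->read_next_email();
        last unless defined $email;

        push @rows, {
            'number' => $reader->number(),         # 1 for the first email
            'line'   => $reader->line_number(),    # line where the email starts
            'offset' => $reader->offset(),         # byte offset of the email
            'length' => $reader->length(),         # length of the email
        };
    }

    return @rows;
}

for my $file_name (@ARGV) {
    my $reader = Mail::Mbox::MessageParser->new( {
        'file_name'    => $file_name,
        'enable_cache' => 0,
        'enable_grep'  => 0,
    } );
    die "$file_name: $reader\n" unless ref $reader;

    for my $row ( describe_emails($reader) ) {
        printf "email %d: line %d, offset %d, length %d\n",
            @{$row}{qw(number line offset length)};
    }
}
```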

BUGS

       No known bugs.

       Contact david@coppit.org for bug reports and suggestions.

AUTHOR

       David Coppit <david@coppit.org>.

LICENSE

       This code is distributed under the GNU General Public License (GPL) Version 2. See the file LICENSE in
       the distribution for details.

HISTORY

       This code was originally part of the grepmail distribution. See http://grepmail.sf.net/ for previous
       versions of grepmail which included early versions of this code.
