Ubuntu Manpage: MCE::Grep - Parallel grep model similar to the native grep function

NAME

       MCE::Grep - Parallel grep model similar to the native grep function

VERSION

       This document describes MCE::Grep version 1.608

SYNOPSIS

          ## Exports mce_grep, mce_grep_f, and mce_grep_s
          use MCE::Grep;

          ## Array or array_ref
          my @a = mce_grep { $_ % 5 == 0 } 1..10000;
          my @b = mce_grep { $_ % 5 == 0 } [ 1..10000 ];

          ## File_path, glob_ref, or scalar_ref
          my @c = mce_grep_f { /pattern/ } "/path/to/file";
          my @d = mce_grep_f { /pattern/ } $file_handle;
          my @e = mce_grep_f { /pattern/ } \$scalar;

          ## Sequence of numbers (begin, end [, step, format])
          my @f = mce_grep_s { $_ % 3 == 0 } 1, 10000, 5;
          my @g = mce_grep_s { $_ % 3 == 0 } [ 1, 10000, 5 ];

          my @h = mce_grep_s { $_ % 3 == 0 } {
             begin => 1, end => 10000, step => 5, format => undef
          };

DESCRIPTION

       This module provides a parallel grep implementation via Many-Core Engine (MCE).  MCE
       incurs a small overhead from passing data between processes, so a fast code block runs
       faster natively.  The overhead matters less as the code block becomes more expensive.

          my @m1 =     grep { $_ % 5 == 0 } 1..1000000;          ## 0.065 secs
          my @m2 = mce_grep { $_ % 5 == 0 } 1..1000000;          ## 0.194 secs

       Chunking, enabled by default, greatly reduces the overhead behind the scenes.  The time for
       mce_grep below also includes the time for data exchanges between the manager and worker
       processes. The benefit of parallelization grows as the code block consumes more CPU time.

          my @m1 =     grep { /[2357][1468][9]/ } 1..1000000;    ## 0.353 secs
          my @m2 = mce_grep { /[2357][1468][9]/ } 1..1000000;    ## 0.218 secs
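
       The timings above can be reproduced roughly with a sketch like the following. It is
       only an illustration: the input size is arbitrary, Time::HiRes is used for wall-clock
       timing, and the numbers will differ by machine and worker count. Note that the first
       mce_grep call also includes the time to spawn the workers.

```perl
use strict;
use warnings;

use Time::HiRes qw( time );
use MCE::Grep;

## Arbitrary input size chosen for illustration only.
my @input = 1 .. 200_000;

## Native grep, timed in wall-clock seconds.
my $start  = time();
my @native = grep { /[2357][1468][9]/ } @input;
my $native_secs = time() - $start;

## Parallel grep; this first call also pays for spawning the workers.
$start = time();
my @parallel = mce_grep { /[2357][1468][9]/ } @input;
my $parallel_secs = time() - $start;

MCE::Grep::finish;

printf "grep     : %d matches, %.3f secs\n", scalar @native,   $native_secs;
printf "mce_grep : %d matches, %.3f secs\n", scalar @parallel, $parallel_secs;
```

       Both lists should contain the same matches; only the elapsed time differs.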

       Even faster is mce_grep_s; useful when input data is a range of numbers.  Workers generate
       sequences mathematically among themselves without any interaction from the manager
       process. Two arguments are required for mce_grep_s (begin, end). Step defaults to 1 if
       begin is smaller than end, otherwise -1.

          my @m3 = mce_grep_s { /[2357][1468][9]/ } 1, 1000000;  ## 0.165 secs
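
       The default step described above can be exercised with a small sketch. The ranges
       are arbitrary; the only point is that begin and end may be given in either order
       without supplying a step.

```perl
use strict;
use warnings;

use MCE::Grep;

## begin (1) is smaller than end (20): step defaults to 1.
my @up = mce_grep_s { $_ % 5 == 0 } 1, 20;

## begin (20) is larger than end (1): step defaults to -1.
my @down = mce_grep_s { $_ % 5 == 0 } 20, 1;

MCE::Grep::finish;

print "ascending : @up\n";
print "descending: @down\n";
```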

       Although this document is about MCE::Grep, the MCE::Stream module can write results
       immediately without waiting for all chunks to complete. This is made possible by passing
       the reference to an array (in this case @m4 and @m5).

          use MCE::Stream default_mode => 'grep';

          my @m4; mce_stream \@m4, sub { /[2357][1468][9]/ }, 1..1000000;

             ## Completed in 0.203 secs. This is amazing considering the
             ## overhead for passing data between the manager and workers.

          my @m5; mce_stream_s \@m5, sub { /[2357][1468][9]/ }, 1, 1000000;

             ## Completed in 0.120 secs. As with mce_grep_s, passing a sequence
             ## specification turns out to be faster due to less overhead
             ## for the manager process.

       A common scenario is grepping for pattern(s) inside a massive log file.  Notice how the
       benefit of parallelism grows as the pattern becomes more complex.  Testing was done
       against a 300 MB file containing 250k lines.

          use MCE::Grep;

          my @m; open my $LOG, "<", "/path/to/log/file" or die "$!\n";

          @m = grep { /pattern/ } <$LOG>;                      ##  0.756 secs
          @m = grep { /foobar|[2357][1468][9]/ } <$LOG>;       ## 24.681 secs

          ## Parallelism with mce_grep. This involves the manager process
          ## due to processing a file handle.

          @m = mce_grep { /pattern/ } <$LOG>;                  ##  0.997 secs
          @m = mce_grep { /foobar|[2357][1468][9]/ } <$LOG>;   ##  7.439 secs

          ## Even faster with mce_grep_f. Workers access the file directly
          ## with zero interaction from the manager process.

          my $path = "/path/to/file";
          @m = mce_grep_f { /pattern/ } $path;                 ##  0.112 secs
          @m = mce_grep_f { /foobar|[2357][1468][9]/ } $path;  ##  6.840 secs

OVERRIDING DEFAULTS

       The following lists the 5 options that may be overridden when loading the module.

          use Sereal qw( encode_sereal decode_sereal );
          use CBOR::XS qw( encode_cbor decode_cbor );
          use JSON::XS qw( encode_json decode_json );

          use MCE::Grep
                max_workers => 4,               ## Default 'auto'
                chunk_size => 100,              ## Default 'auto'
                tmp_dir => "/path/to/app/tmp",  ## $MCE::Signal::tmp_dir
                freeze => \&encode_sereal,      ## \&Storable::freeze
                thaw => \&decode_sereal         ## \&Storable::thaw
          ;

       There is a simpler way to enable Sereal with MCE 1.5. The following will attempt to use
       Sereal if available, otherwise defaults to Storable for serialization.

          use MCE::Grep Sereal => 1;

          ## Serialization is by the Sereal module if available.
          my @m2 = mce_grep { $_ % 5 == 0 } 1..10000;

CUSTOMIZING MCE

       MCE::Grep->init ( options )
       MCE::Grep::init { options }
          The init function accepts a hash of MCE options. The gather option, if specified, is
          ignored due to being used internally by the module.

             use MCE::Grep;

             MCE::Grep::init {
                chunk_size => 1, max_workers => 4,

                user_begin => sub {
                   print "## ", MCE->wid, " started\n";
                },

                user_end => sub {
                   print "## ", MCE->wid, " completed\n";
                }
             };

             my @a = mce_grep { $_ % 5 == 0 } 1..100;

             print "\n", "@a", "\n";

             -- Output

             ## 2 started
             ## 3 started
             ## 1 started
             ## 4 started
             ## 3 completed
             ## 4 completed
             ## 1 completed
             ## 2 completed

             5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100

API DOCUMENTATION

       MCE::Grep->run ( sub { code }, iterator )
       mce_grep { code } iterator
          An iterator reference can be specified for input_data. Iterators are described under
          "SYNTAX for INPUT_DATA" at MCE::Core.

             my @a = mce_grep { $_ % 3 == 0 } make_iterator(10, 30, 2);
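
          The make_iterator function is not provided by MCE::Grep. A minimal sketch of one
          possible definition follows, assuming a closure that returns the next value on
          each call and an empty list once the input is exhausted. The argument names
          (first, last, step) are illustrative, not part of the module's API.

```perl
use strict;
use warnings;

use MCE::Grep;

## Hypothetical helper: returns a closure producing $n, $n + $step, ...
## up to and including $last.
sub make_iterator {
   my ($n, $last, $step) = @_;

   return sub {
      return if $n > $last;      ## empty list: no more input

      my $current = $n;
      $n += $step;

      return $current;
   };
}

## Values 10, 12, ..., 30 are fed to the workers through the iterator.
my @a = mce_grep { $_ % 3 == 0 } make_iterator(10, 30, 2);

MCE::Grep::finish;

print "@a\n";
```

          The closure keeps its own state, so each call to make_iterator yields an
          independent iterator.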

       MCE::Grep->run ( sub { code }, list )
       mce_grep { code } list
          Input data can be defined using a list.

             my @a = mce_grep { /[2357]/ } 1..1000;
             my @b = mce_grep { /[2357]/ } [ 1..1000 ];

       MCE::Grep->run_file ( sub { code }, file )
       mce_grep_f { code } file
          Of the three input types, a file path (/path/to/file) is the fastest. Workers
          communicate the next offset position among themselves without any interaction from the
          manager process.

             my @c = mce_grep_f { /pattern/ } "/path/to/file";
             my @d = mce_grep_f { /pattern/ } $file_handle;
             my @e = mce_grep_f { /pattern/ } \$scalar;
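
          As an illustration of the scalar reference form, the sketch below greps an
          in-memory buffer. The buffer contents are made up for the example. Records are
          read as lines, much like reading from a file handle, so chomp is applied to a
          copy of the results before printing.

```perl
use strict;
use warnings;

use MCE::Grep;

## Made-up input: twelve newline-terminated records held in memory.
my $scalar = join "", map { "line $_\n" } 1 .. 12;

## Pass a reference to the buffer; each record is tested in turn.
my @e = mce_grep_f { /^line 1/ } \$scalar;

MCE::Grep::finish;

## Strip trailing newlines from a copy of the matches for display.
chomp( my @clean = @e );

print "$_\n" for @clean;
```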

       MCE::Grep->run_seq ( sub { code }, $beg, $end [, $step, $fmt ] )
       mce_grep_s { code } $beg, $end [, $step, $fmt ]
          A sequence can be defined as a list, an array reference, or a hash reference.  Both
          begin and end values are required. Step and format are optional. The format is passed
          to sprintf (% may be omitted below).

             my ($beg, $end, $step, $fmt) = (10, 20, 0.1, "%4.1f");

             my @f = mce_grep_s { /[1234]\.[5678]/ } $beg, $end, $step, $fmt;
             my @g = mce_grep_s { /[1234]\.[5678]/ } [ $beg, $end, $step, $fmt ];

             my @h = mce_grep_s { /[1234]\.[5678]/ } {
                begin => $beg, end => $end, step => $step, format => $fmt
             };
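
          The note about omitting % can be checked with a short sketch. It assumes, per the
          description above, that "4.1f" and "%4.1f" are treated the same way, so both
          calls should return the same formatted strings.

```perl
use strict;
use warnings;

use MCE::Grep;

## Format given with the leading %.
my @with = mce_grep_s { /[1234]\.[5678]/ } 10, 20, 0.1, "%4.1f";

## Same format with the % omitted.
my @without = mce_grep_s { /[1234]\.[5678]/ } 10, 20, 0.1, "4.1f";

MCE::Grep::finish;

print "with %   : @with\n";
print "without %: @without\n";
```

          Because the block sees the formatted string, the pattern matches against text
          such as "11.5" rather than the raw floating-point value.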

MANUAL SHUTDOWN

       MCE::Grep->finish
       MCE::Grep::finish
          Workers remain persistent as much as possible after running. Shutdown occurs
          automatically when the script terminates. Call finish when workers are no longer
          needed.

             use MCE::Grep;

             MCE::Grep::init {
                chunk_size => 20, max_workers => 'auto'
             };

             my @a = mce_grep { ... } 1..100;

             MCE::Grep::finish;

INDEX

       MCE

AUTHOR

       Mario E. Roy, <marioeroy AT gmail DOT com>