Ubuntu Manpage: PDL::ParallelCPU - Parallel processor multi-threading support in PDL

NAME

       PDL::ParallelCPU - Parallel processor multi-threading support in PDL

DESCRIPTION

       PDL has support for splitting up numerical processing between multiple parallel processor
       threads (or pthreads) using the set_autopthread_targ and set_autopthread_size functions.
       This can improve processing performance (by greater than 2-4X in most cases) by taking
       advantage of multi-core and/or multi-processor machines.

       As of 2.059, "online_cpus" in PDL::Core is used to set the number of threads used if
       "PDL_AUTOPTHREAD_TARG" is not set.

SYNOPSIS

         use PDL;

         # Set target of 4 parallel pthreads to create, with a lower limit of
         #  5Meg elements for splitting processing into parallel pthreads.
         set_autopthread_targ(4);
         set_autopthread_size(5);

         $x = zeroes(5000,5000); # Create 25Meg element array

         $y = $x + 5; # Processing will be split up into multiple pthreads

         # Get the actual number of pthreads for the last
         #  processing operation.
         $actualPthreads = get_autopthread_actual();

         # Or compare these to see CPU usage (first one only 1 pthread, second one 10)
         # in the PDL shell:
         $x = ones(10,1000,10000); set_autopthread_targ(1); $y = sin($x)*cos($x); p get_autopthread_actual;
         $x = ones(10,1000,10000); set_autopthread_targ(10); $y = sin($x)*cos($x); p get_autopthread_actual;

Terminology

       To reduce the confusion that existed in PDL before 2.075, this document uses pthreading to
       refer to processor multi-threading, which is the use of multiple processor threads to
       split up numerical processing into parallel operations.

Functions that control PDL pthreads

       This is a brief listing and description of the PDL pthreading functions, see the PDL::Core
       docs for detailed information.

       set_autopthread_targ
            Set the target number of processor-threads (pthreads) for multi-threaded processing.
            Setting auto_pthread_targ to 0 means that no pthreading will occur.

            See PDL::Core for details.

       set_autopthread_size
            Set the minimum size (in Meg-elements or 2**20 elements) of the largest PDL involved
            in a function where auto-pthreading will be performed. For small PDLs, it probably
            isn't worth starting multiple pthreads, so this function is used to define a minimum
            threshold where auto-pthreading won't be attempted.

            See PDL::Core for details.

       get_autopthread_actual
            Get the actual number of pthreads executed for the last pdl processing function.

            See PDL::get_autopthread_actual for details.

Global Control of PDL pthreading using Environment Variables

       PDL pthreading can be globally turned on, without modifying existing code by setting
       environment variables PDL_AUTOPTHREAD_TARG and PDL_AUTOPTHREAD_SIZE before running a PDL
       script.  These environment variables are checked when PDL starts up and calls to
       set_autopthread_targ and set_autopthread_size functions made with the environment
       variable's values.

       For example, if the environment var PDL_AUTOPTHREAD_TARG is set to 3, and
       PDL_AUTOPTHREAD_SIZE is set to 10, then any pdl script will run as if the following lines
       were at the top of the file:

        set_autopthread_targ(3);
        set_autopthread_size(10);

How It Works

       The auto-pthreading process works by analyzing broadcast array dimensions in PDL
       operations (those above the operation's "signature" dimensions) and splitting up
       processing according to those and the desired number of pthreads (i.e. the pthread target
       or pthread_targ). The offsets, increments, and dimension-sizes (in case the whole
       dimension does not divide neatly by the number of pthreads) that PDL uses to step thru the
       data in memory are modified for each pthread so each one sees a different set of data when
       performing processing.

       Example

        $x = sequence(20,4,3); # Small 3-D Array, size 20,4,3

        # Setup auto-pthreading:
        set_autopthread_targ(2); # Target of 2 pthreads
        set_autopthread_size(0); # Zero so that the small PDLs in this example will be pthreaded

        # This will be split up into 2 pthreads
        $c = maximum($x);

       For the above example, the maximum function has a signature of "(a(n); [o]c())", which
       means that the first dimension of $x (size 20) is a Core dimension of the maximum
       function. The other dimensions of $x (size 4,3) are broadcast dimensions (i.e. will be
       broadcasted-over in the maximum function.

       The auto-pthreading algorithm examines the broadcasted dims of size (4,3) and picks the 4
       dimension, since it is evenly divisible by the autopthread_targ of 2. The processing of
       the maximum function is then split into two pthreads on the size-4 dimension, with dim
       indexes 0,2 processed by one pthread
        and dim indexes 1,3 processed by the other pthread.

Limitations

   Must have POSIX Threads Enabled
       Auto-pthreading only works if your PDL installation was compiled with POSIX threads
       enabled. This is normally the case if you are running on Windows, Linux, MacOS X, or other
       unix variants.

   Non-Threadsafe Code
       Not all the libraries that PDL intefaces to are thread-safe, i.e. they aren't written to
       operate in a multi-threaded environment without crashing or causing side-effects. Some
       examples in the PDL core is the fft function and the pnmout functions.

       To operate properly with these types of functions, the PPCode flag NoPthread has been
       introduced to indicate a function as not being pthread-safe. See PDL::PP docs for details.

   Size of PDL Dimensions and pthread Target
       As of PDL 2.058, the broadcasted dimension sizes do not need to divide exactly by the
       pthread target, although if one does, it will be used.

       If no dimension is as large as the pthread target, the number of pthreads will be the size
       of the largest broadcasted dimension.

       In order to minimise idle CPUs on the last iteration at the end of the broadcasted
       dimension, the algorithm that picks the dimension to pthread on aims for the largest
       remainder in dividing the pthread target into the sizes of the broadcasted dimensions. For
       example, if a PDL has broadcasted dimension sizes of (9,6,2) and the auto_pthread_targ is
       4, the algorithm will pick the 1-th (size 6), as that will leave a remainder of 2 (leaving
       2 idle at the end) in preference to one with size 9, which would leave 3 idle.

   Speed improvement might be less than you expect.
       If you have an 8-core machine and call auto_pthread_targ with 8 to generate 8 parallel
       pthreads, you probably won't get a 8X improvement in speed, due to memory bandwidth
       issues. Even though you have 8 separate CPUs crunching away on data, you will have (for
       most common machine architectures) common RAM that now becomes your bottleneck. For simple
       calculations (e.g simple additions) you can run into a performance limit at about 4
       pthreads. For more CPU-bound calculations the limit will be higher.

COPYRIGHT

       Copyright 2011 John Cerney. You can distribute and/or modify this document under the same
       terms as the current Perl license.

       See: http://dev.perl.org/licenses/