Provided by: pdl_2.020-3_amd64

NAME

       PDL::ParallelCPU - Parallel Processor MultiThreading Support in PDL (Experimental)

DESCRIPTION

       PDL has support (currently experimental) for splitting up numerical processing between multiple parallel
       processor threads (or pthreads) using the set_autopthread_targ and set_autopthread_size functions. This
       can improve processing performance (in most cases by a factor of 2-4 or more) by taking advantage of
       multi-core and/or multi-processor machines.

SYNOPSIS

         use PDL;

         # Set a target of 4 parallel pthreads, with a lower limit of
         #  5 Meg-elements (5 * 2**20) before processing is split into pthreads.
         set_autopthread_targ(4);
         set_autopthread_size(5);

         $x = zeroes(5000,5000); # Create a 25-million-element array

         $y = $x + 5; # Processing will be split up into multiple pthreads

         # Get the actual number of pthreads for the last
         #  processing operation.
         $actualPthreads = get_autopthread_actual();

Terminology

       The use of the term threading can be confusing with PDL, because it can refer to PDL threading, as
       defined in the PDL::Threading docs, or to processor multi-threading.

       To reduce confusion with the existing PDL threading terminology, this document uses pthreading to refer
       to processor multi-threading, which is the use of multiple processor threads to split up numerical
       processing into parallel operations.

Functions that control PDL PThreads

       This is a brief listing and description of the PDL pthreading functions; see the PDL::Core docs for
       detailed information.

       set_autopthread_targ
            Set the target number of processor threads (pthreads) for multi-threaded processing. Setting the
            target to 0 (set_autopthread_targ(0)) means that no pthreading will occur.

            See PDL::Core for details.

       set_autopthread_size
            Set the minimum size (in Meg-elements, i.e. units of 2**20 elements) of the largest PDL involved in a
            function for auto-pthreading to be performed. For small PDLs it probably isn't worth starting multiple
            pthreads, so this function defines a minimum threshold below which auto-pthreading won't be
            attempted.

            See PDL::Core for details.

       get_autopthread_actual
            Get the actual number of pthreads executed for the last pdl processing function.

            See PDL::Core for details.
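
       The size threshold is counted in units of 2**20 elements, so the SYNOPSIS setting of
       set_autopthread_size(5) corresponds to 5,242,880 elements, well below the 25,000,000 elements of a
       5000x5000 PDL. The pure-Perl sketch below models that decision with a hypothetical helper,
       would_autopthread, which is not part of PDL. It assumes the largest PDL is compared with ">=" against
       the threshold; this document does not specify the exact comparison.

```perl
use strict;
use warnings;

# Hypothetical helper (not a PDL function): would a PDL operation be
# considered for auto-pthreading, given the target, the size threshold in
# Meg-elements (units of 2**20 elements), and the element counts of the PDLs
# involved? Assumes the largest PDL is compared with ">=" against the
# threshold.
sub would_autopthread {
    my ($targ, $size_meg, @nelems) = @_;
    return 0 if $targ == 0;                      # target 0: no pthreading
    my ($largest) = sort { $b <=> $a } @nelems;  # largest PDL involved
    return $largest >= $size_meg * 2**20 ? 1 : 0;
}

my $threshold = 5 * 2**20;     # set_autopthread_size(5)
my $nelem     = 5000 * 5000;   # zeroes(5000,5000)
print "threshold=$threshold elements, PDL=$nelem elements\n";
print "5000x5000: ", (would_autopthread(4, 5, $nelem) ? "pthreaded" : "not pthreaded"), "\n";
print "1000x1000: ", (would_autopthread(4, 5, 1000 * 1000) ? "pthreaded" : "not pthreaded"), "\n";
```

       With these settings the 5000x5000 PDL clears the threshold, while a 1000x1000 PDL (1,000,000
       elements) does not.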

Global Control of PDL PThreading using Environment Variables

       PDL pthreading can be turned on globally, without modifying existing code, by setting the environment
       variables PDL_AUTOPTHREAD_TARG and PDL_AUTOPTHREAD_SIZE before running a PDL script. These environment
       variables are checked when PDL starts up, and set_autopthread_targ and set_autopthread_size are called
       with their values.

       For example, if the environment variable PDL_AUTOPTHREAD_TARG is set to 3 and PDL_AUTOPTHREAD_SIZE is set
       to 10, then any PDL script will run as if the following lines were at the top of the file:

        set_autopthread_targ(3);
        set_autopthread_size(10);

How It Works

       The auto-pthreading process works by analyzing the threaded array dimensions in PDL operations and
       splitting up processing based on the threaded dimension sizes and the desired number of pthreads (i.e.
       the pthread target, or pthread_targ). The offsets and increments that PDL uses to step through the data
       in memory are modified for each pthread, so each one sees a different set of data when performing
       processing.

       Example

        $x = sequence(20,4,3); # Small 3-D Array, size 20,4,3

        # Set up auto-pthreading:
        set_autopthread_targ(2); # Target of 2 pthreads
        set_autopthread_size(0); # Zero so that the small PDLs in this example will be pthreaded

        # This will be split up into 2 pthreads
        $c = maximum($x);

       For the above example, the maximum function has a signature of "(a(n); [o]c())", which means that the
       first dimension of $x (size 20) is a core dimension of the maximum function. The other dimensions of $x
       (size 4,3) are threaded dimensions (i.e. they will be threaded over in the maximum function).
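
       As a rough illustration of the core/threaded split, the hypothetical pure-Perl helper
       core_and_threaded_dims below reads the dimension list of the first parameter in a signature and treats
       the remaining dims of the input as threaded. It is a simplification for this one-input case, not how
       PDL::PP actually parses signatures.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PDL): split an input's dims into core
# dims (named in the first parameter of the signature) and threaded dims
# (everything left over). Simplifying assumption: only the first parameter
# is examined, and its dims match the leading dims of the input.
sub core_and_threaded_dims {
    my ($signature, @dims) = @_;
    my ($dimlist) = $signature =~ /\w+\(([^)]*)\)/   # e.g. "n" from "a(n)"
        or die "cannot parse signature '$signature'\n";
    my $ncore    = grep { length } split /\s*,\s*/, $dimlist;
    my @core     = @dims[0 .. $ncore - 1];
    my @threaded = @dims[$ncore .. $#dims];
    return (\@core, \@threaded);
}

my ($core, $threaded) = core_and_threaded_dims('(a(n); [o]c())', 20, 4, 3);
print "core dims: (@$core), threaded dims: (@$threaded)\n";
# prints: core dims: (20), threaded dims: (4 3)
```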

       The auto-pthreading algorithm examines the threaded dims of size (4,3) and picks the size-4 dimension,
       since it is evenly divisible by the autopthread_targ of 2. The processing of the maximum function is then
       split into two pthreads on the size-4 dimension, with dim indexes 0,2 processed by one pthread and dim
       indexes 1,3 processed by the other pthread.
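
       The interleaved 0,2 / 1,3 assignment can be produced purely by changing each pthread's starting offset
       and increment, as described under How It Works. The pure-Perl sketch below models this with
       hypothetical helpers (dim_incs, pthread_view). It assumes the first dimension varies fastest in memory
       and that the split dimension divides evenly, and it models the idea rather than reproducing PDL's C
       code.

```perl
use strict;
use warnings;

# Hypothetical helper: memory increment (stride, in elements) of each dim,
# assuming the first dimension varies fastest.
sub dim_incs {
    my @dims = @_;
    my ($inc, @incs) = (1);
    for my $d (@dims) {
        push @incs, $inc;
        $inc *= $d;
    }
    return @incs;
}

# Hypothetical helper: (offset, increment, count) that pthread $t of $nthr
# uses along $split_dim, so each pthread visits a different set of indexes.
sub pthread_view {
    my ($dims, $split_dim, $nthr, $t) = @_;
    my $base = (dim_incs(@$dims))[$split_dim];
    return (
        $t * $base,                      # start $t positions into the dim
        $nthr * $base,                   # step over the other pthreads' indexes
        $dims->[$split_dim] / $nthr,     # number of indexes this pthread visits
    );
}

# $x = sequence(20,4,3): split the size-4 dim (dim 1) over 2 pthreads
my @dims = (20, 4, 3);
my $base = (dim_incs(@dims))[1];         # 20 elements per step along dim 1
for my $t (0 .. 1) {
    my ($offset, $inc, $count) = pthread_view(\@dims, 1, 2, $t);
    my @indexes = map { ($offset + $_ * $inc) / $base } 0 .. $count - 1;
    print "pthread $t: offset=$offset inc=$inc dim-1 indexes=@indexes\n";
}
# prints:
#   pthread 0: offset=0 inc=40 dim-1 indexes=0 2
#   pthread 1: offset=20 inc=40 dim-1 indexes=1 3
```

       Dim 1 has a memory increment of 20 elements, so pthread 0 starts at offset 0 and pthread 1 at offset
       20, and both step by 40 to skip over the other pthread's rows.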

Limitations

   Must have POSIX Threads Enabled
       Auto-pthreading only works if your PDL installation was compiled with POSIX threads enabled. This is
       normally the case if you are running on Linux or other Unix variants.

   Non-Threadsafe Code
       Not all the libraries that PDL interfaces to are thread-safe, i.e. some aren't written to operate in a
       multi-threaded environment without crashing or causing side-effects. Examples in the PDL core are the
       fft function and the pnmout functions.

       To operate properly with these types of functions, the PPCode flag NoPthread has been introduced to
       mark a function as not being pthread-safe. See the PDL::PP docs for details.

   Size of PDL Dimensions and PThread Target
       Due to the way a PDL is split up for operation using multiple pthreads, the size of a dimension must be
       evenly divisible by the pthread target. For example, if a PDL has threaded dimension sizes of (4,3,3)
       and autopthread_targ has been set to 2, then the first threaded dimension (size 4) will be picked and
       split into two pthreads, each handling 2 of its 4 indexes. However, if the threaded dimension sizes are
       (3,3,3) and autopthread_targ is still 2, then pthreading won't occur, because no threaded dimension is
       divisible by 2.

       The algorithm that picks the actual number of pthreads has some smarts (but could probably be improved)
       to adjust down from autopthread_targ to a number of pthreads that evenly divides one of the threaded
       dimensions. For example, if a PDL has threaded dimension sizes of (9,2,2) and autopthread_targ is 4,
       the algorithm will see that no dimension is divisible by 4, then adjust the target down to 3, resulting
       in the first threaded dimension (size 9) being split up into 3 pthreads.
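
       The selection rule described in the two paragraphs above can be modeled in a few lines of Perl. The
       helper pick_pthread_split below is hypothetical and simplified: it tries targets from autopthread_targ
       down to 2 and takes the first threaded dimension that divides evenly. PDL's actual implementation may
       differ in its details, but this model reproduces the three examples given above.

```perl
use strict;
use warnings;

# Hypothetical, simplified model of picking the actual pthread count.
# Returns (threaded-dim index, pthread count), or an empty list when no
# split into 2 or more pthreads is possible (i.e. no pthreading).
sub pick_pthread_split {
    my ($targ, @threaded_dims) = @_;
    for (my $n = $targ; $n >= 2; $n--) {           # adjust the target down
        for my $i (0 .. $#threaded_dims) {
            return ($i, $n) if $threaded_dims[$i] % $n == 0;
        }
    }
    return;
}

for my $case ([2, 4, 3, 3], [2, 3, 3, 3], [4, 9, 2, 2]) {
    my ($targ, @dims) = @$case;
    my @split = pick_pthread_split($targ, @dims);
    if (@split) {
        print "dims (@dims), target $targ: split dim $split[0] into $split[1] pthreads\n";
    } else {
        print "dims (@dims), target $targ: no pthreading\n";
    }
}
# prints:
#   dims (4 3 3), target 2: split dim 0 into 2 pthreads
#   dims (3 3 3), target 2: no pthreading
#   dims (9 2 2), target 4: split dim 0 into 3 pthreads
```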

   Speed improvement might be less than you expect.
       If you have an 8-core machine and call set_autopthread_targ(8) to generate 8 parallel pthreads, you
       probably won't get an 8X improvement in speed, due to memory bandwidth limits. Even though you have 8
       separate CPUs crunching away on data, you will have (for most common machine architectures) shared RAM
       that now becomes your bottleneck. For simple calculations (e.g. simple additions) you can run into a
       performance limit at about 4 pthreads. For more complex calculations the limit will be higher.

COPYRIGHT

       Copyright 2011 John Cerney. You can distribute and/or modify this document under the same terms as the
       current Perl license.

       See: http://dev.perl.org/licenses/

perl v5.30.0                                       2020-01-18                                    PARALLELCPU(1p)