Provided by: lam-runtime_7.1.2-2build1_amd64


NAME
       LAM SSI collectives - overview of LAM's MPI collective SSI modules


DESCRIPTION
       The  "kind"  for  collectives  SSI  modules  is  "coll".   Specifically, the string "coll"
       (without the quotes) is the prefix that should be used with the mpirun command  line  with
       the -ssi switch.  For example:

       mpirun -ssi coll_base_crossover 4 C my_mpi_program
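
       For scripted runs, the same command line can be assembled from a table of SSI
       parameters.  The sketch below is illustrative only: build_mpirun_args is a hypothetical
       helper, not part of LAM/MPI, and it assumes that each parameter is passed as its own
       "-ssi <key> <value>" group, as in the example above.

```python
def build_mpirun_args(ssi_params, where, program):
    """Assemble an mpirun argument list (hypothetical helper, not a LAM API).

    ssi_params maps SSI parameter names to values; each entry becomes one
    "-ssi <key> <value>" group placed before the process-location argument.
    """
    args = ["mpirun"]
    for key, value in ssi_params.items():
        args += ["-ssi", key, str(value)]
    args += [where, program]
    return args


if __name__ == "__main__":
    # Rebuilds the command line shown in the example above.
    print(" ".join(build_mpirun_args({"coll_base_crossover": 4}, "C", "my_mpi_program")))
```

       Keeping the arguments as a list, rather than one string, lets them be handed to a
       process launcher without shell quoting.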

       LAM currently has four coll modules:

       lam_basic
           A full implementation of MPI collectives on intracommunicators.  The algorithms are
           the same as were in the LAM 6.5 series.  Collectives on intercommunicators are
           undefined, and will result in run-time errors.

       impi
           Collective functions for IMPI communicators.  These are mostly unimplemented; only
           the basics exist: MPI_BARRIER and MPI_REDUCE.

       shmem
           Shared memory collectives.

       smp SMP-aware collectives (based on the  MagPIe  algorithms).   The  following  algorithms
           provide   SMP-aware   performance  on  multiprocessors:  MPI_ALLREDUCE,  MPI_ALLTOALL,
           MPI_SCATTER,   and   MPI_SCATTERV.    Note  that  the  reduction  algorithms  must  be
           specifically enabled by marking the operations as  associative  before  they  will  be
           used.  All other MPI collectives will fall back to their lam_basic equivalents.

       More collective modules are likely to be implemented in the future.


       In the discussion below, the parameters are discussed in terms of kind and value.  Unlike
       other SSI module kinds, coll modules are selected on a per-communicator basis, so the
       kind and value may also be specified as attributes on a parent communicator.

   Selecting a coll module
       coll  modules  are  selected  on  a  per-communicator  basis.   They are selected when the
       communicator is created,  and  remain  the  active  coll  module  for  the  life  of  that
       communicator.   For  example, different coll modules may be assigned to MPI_COMM_WORLD and
       MPI_COMM_SELF.  In most cases LAM/MPI will select the best coll module automatically.  For
       example,  when  a communicator spans multiple nodes and at least one node has multiple MPI
       processes, the smp module will automatically be selected.
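
       The automatic-selection example above can be restated as a small predicate.  This is a
       sketch of the rule as worded here, not LAM's actual selection code;
       smp_would_be_selected and the node names are hypothetical.

```python
from collections import Counter


def smp_would_be_selected(rank_to_node):
    """Model the rule quoted above (illustrative; not LAM's actual code).

    rank_to_node lists, for each process in the communicator, the name of
    the node it runs on.  The smp module is expected when the communicator
    spans more than one node and some node hosts more than one process.
    """
    per_node = Counter(rank_to_node)
    spans_multiple_nodes = len(per_node) > 1
    some_node_is_shared = any(count > 1 for count in per_node.values())
    return spans_multiple_nodes and some_node_is_shared


if __name__ == "__main__":
    print(smp_would_be_selected(["n0", "n0", "n1"]))  # two nodes, n0 is shared
    print(smp_would_be_selected(["n0", "n1"]))        # two nodes, one process each
```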

       However, the LAM_MPI_SSI_COLL keyval can be used to set an  attribute  on  a  communicator
       that  is  used  to  create a new communicator.  The attribute should have the value of the
       string name of the coll module to use.  If that module cannot be used,  an  MPI  exception
       will  occur.   This  attribute  is  only  examined  on  the parent communicator when a new
       communicator is created.
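
       The timing of the attribute lookup is easy to misread, so the toy model below walks
       through it.  It does not use the MPI or LAM API: ToyComm, create_child, the set of
       available modules, and the default module are all assumptions made for illustration.
       Only the behavior it models comes from the text above: the attribute is read from the
       parent when the child is created, an unusable module is an error, and the chosen
       module stays fixed afterwards.

```python
AVAILABLE_MODULES = {"lam_basic", "smp"}  # assumed for this sketch
DEFAULT_MODULE = "lam_basic"              # stand-in for automatic selection


class ToyComm:
    """Toy stand-in for a communicator; not the MPI or LAM API."""

    def __init__(self, coll_module=DEFAULT_MODULE):
        self.coll_module = coll_module  # fixed for the communicator's life
        self.attrs = {}                 # plays the role of MPI attributes

    def create_child(self):
        # The attribute is read from the parent, and only at creation time.
        requested = self.attrs.get("LAM_MPI_SSI_COLL")
        if requested is None:
            return ToyComm()
        if requested not in AVAILABLE_MODULES:
            raise RuntimeError(f"coll module {requested!r} cannot be used")
        return ToyComm(requested)


if __name__ == "__main__":
    parent = ToyComm()
    parent.attrs["LAM_MPI_SSI_COLL"] = "smp"
    child = parent.create_child()
    parent.attrs["LAM_MPI_SSI_COLL"] = "lam_basic"  # too late to affect child
    print(parent.coll_module, child.coll_module)
```

       In the model, setting the attribute changes neither the parent's own module nor any
       child that already exists; it only influences communicators created afterwards.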

   coll SSI Parameters
       The coll modules accept several parameters:

           Because of specific wording in the MPI standard, LAM/MPI effectively cannot assume
           that any reduction operator is associative (at least, not without additional
           overhead).  Hence, LAM/MPI relies on the user to indicate that certain operations are
           associative.  If the user sets the coll_associative SSI parameter to 1, LAM/MPI may
           assume that the reduction operator is associative, and may be able to optimize the
           overall reduction operation.  If it is 0 or undefined, LAM/MPI will assume that the
           reduction operation is not associative, and will use strict linear ordering of
           reduction operations (regardless of data locality).  This attribute is checked every
           time a reduction operator is invoked.  The User's Guide contains more information on
           this topic.

           This  parameter determines the maximum number of processes in a communicator that will
           use linear algorithms.  This SSI parameter is only checked during MPI_INIT.

           During reduction operations, it makes sense to use the number of bytes to be
           transferred, rather than the number of processes, as the metric for choosing between
           linear and logarithmic algorithms.  This parameter indicates the maximum number of
           bytes to be transferred by each process by a linear algorithm.  This SSI parameter is
           only checked during MPI_INIT.
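
       The two crossover parameters described above amount to threshold tests.  The sketch
       below shows one reading of them; the function names are hypothetical, treating the
       threshold as inclusive is an assumption drawn from the word "maximum", and the step
       counts are textbook estimates for a rooted collective rather than measurements of
       LAM/MPI.

```python
def pick_algorithm(num_procs, crossover):
    """Linear up to and including `crossover` processes, else logarithmic."""
    return "linear" if num_procs <= crossover else "logarithmic"


def pick_reduce_algorithm(bytes_per_proc, reduce_crossover):
    """Same decision for reductions, keyed on bytes sent per process."""
    return "linear" if bytes_per_proc <= reduce_crossover else "logarithmic"


def sequential_steps(num_procs, algorithm):
    """Rough count of sequential communication steps at the root."""
    if num_procs <= 1:
        return 0
    if algorithm == "linear":
        return num_procs - 1               # root exchanges with each peer in turn
    return (num_procs - 1).bit_length()    # ceil(log2(num_procs)): tree depth


if __name__ == "__main__":
    for n in (4, 64):
        algo = pick_algorithm(n, crossover=4)
        print(n, algo, sequential_steps(n, algo))
```

       For small communicators the linear form has little to lose and avoids the bookkeeping
       of a tree, which is the usual motivation for a crossover point.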

   Notes on the smp coll Module
       The smp coll module is based on the algorithms from the MagPIe project.  It is not yet
       complete; there are still more algorithms that can be optimized for SMP-aware execution.
       By the time that LAM/MPI was frozen in preparation for release, only some of the
       algorithms had been completed.  It is expected that future versions of LAM/MPI will have
       more SMP-optimized algorithms.

       The User's Guide contains much more detail about  the  smp  module.   In  particular,  the
       coll_associative  SSI  parameter  must  be  1 for the SMP-aware reduction algorithms to be
       used.  If it is 0 or undefined, the corresponding lam_basic algorithms will be used.   The
       coll_associative attribute is checked at every invocation of the reduction algorithms.
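
       Why associativity matters can be seen without MPI at all.  The sketch below compares a
       strict left-to-right reduction with a pairwise (tree-shaped) one; linear_reduce and
       tree_reduce are hypothetical names, and the tree order is only a stand-in for whatever
       regrouping an SMP-aware algorithm might perform.

```python
from functools import reduce
from operator import add


def linear_reduce(op, values):
    """Strict left-to-right order: ((v0 op v1) op v2) op ..."""
    return reduce(op, values)


def tree_reduce(op, values):
    """Pairwise order: reduce each half, then combine the two partial results."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return op(tree_reduce(op, values[:mid]), tree_reduce(op, values[mid:]))


if __name__ == "__main__":
    data = [1e16, 1.0, -1e16, 1.0]
    print(linear_reduce(add, data))  # ((1e16 + 1.0) + -1e16) + 1.0
    print(tree_reduce(add, data))    # (1e16 + 1.0) + (-1e16 + 1.0)
```

       With IEEE 754 doubles the two orders give 1.0 and 0.0 respectively, because adding 1.0
       to a value of magnitude 1e16 is lost to rounding.  Integer addition gives the same
       answer in either order, which is the kind of operation that can safely be marked
       associative.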


SEE ALSO
       lamssi(7), mpirun(1), LAM User's Guide