Provided by: lam4-dev_7.1.4-3.1_amd64

NAME

       MPI_Comm_spawn -  Spawn dynamic MPI processes

SYNOPSIS

       #include <mpi.h>
       int
       MPI_Comm_spawn(char* command, char** argv, int maxprocs, MPI_Info info,
                      int root, MPI_Comm comm, MPI_Comm *intercomm,
                      int *errcodes)

INPUT PARAMETERS

       command
              - Name of program to spawn (only significant at root)
       argv   - arguments to command (only significant at root)
       maxprocs
              - max number of processes to start (only significant at root)
       info   - startup hints
       root   - rank of process to perform the spawn
       comm   - parent intracommunicator

OUTPUT PARAMETERS

       intercomm
              - child intercommunicator containing spawned processes
       errcodes
              - one code per process

DESCRIPTION

       A  group  of  processes  can create another group of processes with MPI_Comm_spawn .  This
       function is a collective operation over the parent communicator.  The child  group  starts
       up  like  any MPI application.  The processes must begin by calling MPI_Init , after which
       the pre-defined communicator, MPI_COMM_WORLD ,  may  be  used.   This  world  communicator
       contains  only  the child processes.  It is distinct from the MPI_COMM_WORLD of the parent
       processes.

       MPI_Comm_spawn_multiple is used to manually specify a group of different  executables  and
       arguments to spawn.  MPI_Comm_spawn is used to specify one executable and set of arguments
       (although a LAM/MPI appschema(5) can be provided to MPI_Comm_spawn via the "lam_file" info
       key).
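
       For illustration, a minimal parent-side sketch might look like the following (the child
       executable name "worker" is hypothetical, and error handling is kept to a minimum):

           #include <mpi.h>
           #include <stdio.h>

           int main(int argc, char *argv[])
           {
               MPI_Comm children;
               int      errcodes[4];
               int      i;

               MPI_Init(&argc, &argv);

               /* Collective over MPI_COMM_WORLD; command, argv, and maxprocs
                  are only examined at the root (rank 0 here). */
               MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                              0, MPI_COMM_WORLD, &children, errcodes);

               for (i = 0; i < 4; ++i)
                   if (errcodes[i] != MPI_SUCCESS)
                       printf("child %d failed to start\n", i);

               MPI_Finalize();
               return 0;
           }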

       Communication With Spawned Processes

       The  natural  communication  mechanism  between  two groups is the intercommunicator.  The
       second communicator argument to MPI_Comm_spawn returns an  intercommunicator  whose  local
       group  contains  the  parent processes (same as the first communicator argument) and whose
       remote  group  contains  child  processes.  The  child  processes  can  access  the   same
       intercommunicator  by  using  the  MPI_Comm_get_parent call.  The remote group size of the
       parent communicator is zero if the process was created by mpirun (1) instead of one of the
       spawn  functions.   Both  groups  can  decide  to  merge  the  intercommunicator  into  an
       intracommunicator (with the MPI_Intercomm_merge function) and take advantage of other  MPI
       collective  operations.   They  can  then  use  the merged intracommunicator to create new
       communicators and reach other processes in the MPI application.
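
       As a sketch of the child side (hypothetical; the parents would make a matching
       MPI_Intercomm_merge call on the intercommunicator returned by MPI_Comm_spawn):

           #include <mpi.h>

           int main(int argc, char *argv[])
           {
               MPI_Comm parent, merged;

               MPI_Init(&argc, &argv);
               MPI_Comm_get_parent(&parent);

               if (parent != MPI_COMM_NULL) {
                   /* high = 1 orders the children after the parents in the
                      merged intracommunicator. */
                   MPI_Intercomm_merge(parent, 1, &merged);
                   /* ... communicate with the parents via "merged" ... */
                   MPI_Comm_free(&merged);
               }

               MPI_Finalize();
               return 0;
           }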

       Resource Allocation

       LAM/MPI offers some MPI_Info keys for the placement of  spawned  applications.   Keys  are
       looked  for in the order listed below.  The first key that is found is used; any remaining
       keys are ignored.

       lam_spawn_file

       The value of this key can be the filename of an appschema(5).  This allows the  programmer
       to specify an arbitrary set of LAM CPUs or nodes to spawn MPI processes on.  In this case,
       only the appschema is used to spawn the application; command , argv , and maxprocs are all
       ignored  (even  at  the  root).   Note that even though maxprocs is ignored, errcodes must
       still be an array long enough to hold an integer error code for every process  that  tried
       to   launch,   or   be   the   MPI   constant   MPI_ERRCODES_IGNORE   .   Also  note  that
       MPI_Comm_spawn_multiple does not accept the  "lam_spawn_file"  info  key.   As  such,  the
       "lam_spawn_file"  info key to MPI_Comm_spawn is mainly intended to spawn MPMD applications
       and/or specify an arbitrary number of nodes to run on.

       Also note that this "lam_spawn_file" key is not portable to other MPI implementations;  it
       is  a  LAM/MPI-specific info key.  If specifying exact LAM nodes or CPUs is not necessary,
       users should probably use MPI_Comm_spawn_multiple to make their program more portable.
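
       A sketch of how this key might be passed (the appschema filename "my.appschema" is
       hypothetical):

           #include <mpi.h>

           int main(int argc, char *argv[])
           {
               MPI_Comm children;
               MPI_Info info;

               MPI_Init(&argc, &argv);

               MPI_Info_create(&info);
               MPI_Info_set(info, "lam_spawn_file", "my.appschema");

               /* command, argv, and maxprocs are ignored when "lam_spawn_file"
                  is given, but errcodes must still be valid; MPI_ERRCODES_IGNORE
                  is used here. */
               MPI_Comm_spawn("unused", MPI_ARGV_NULL, 1, info,
                              0, MPI_COMM_WORLD, &children, MPI_ERRCODES_IGNORE);

               MPI_Info_free(&info);
               MPI_Finalize();
               return 0;
           }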

       file

       This key is a synonym for "lam_spawn_file".  Since "file" is not a LAM-specific name,  yet
       this   key   carries   a   LAM-specific  meaning,  its  use  is  deprecated  in  favor  of
       "lam_spawn_file".

       lam_spawn_sched_round_robin

       The value of this key is a string representing a LAM  CPU  or  node  (using  standard  LAM
       nomenclature  --  see  mpirun(1))  to  begin  spawning on.  The use of this key allows the
       programmer to indicate which node/CPU LAM should start spawning on without having to
       write out a temporary app schema file.

       The  CPU  number  is  relative  to the boot schema given to lamboot(1).  Only a single LAM
       node/CPU may be specified, such as "n3" or "c1".  If a node is specified, LAM  will  spawn
       one MPI process per node.  If a CPU is specified, LAM will schedule one MPI process per
       CPU.  An error is returned if "N" or "C" is used.

       Note that LAM is not involved with run-time scheduling of the MPI processes -- LAM only
       spawns processes on the indicated nodes.  The operating system schedules these processes
       for execution just like any other process.  No attempt is made by LAM to bind processes
       to CPUs.  Hence, the "cX" nomenclature is just a convenience mechanism to indicate how
       many MPI processes should be spawned on a given node; it is not indicative of operating
       system scheduling.

       For  "nX"  values,  the  first  MPI  process  will  be spawned on the indicated node.  The
       remaining (maxprocs - 1) MPI processes will be spawned on successive nodes.  Specifically,
       if X is the starting node number, process i will be launched on "nK", where K = ((X + i) %
       total_nodes).  LAM takes the node number modulo the total number of nodes in the current
       LAM universe to prevent errors, thereby creating a "wraparound" effect.  Hence,
       this mechanism can be used for round-robin scheduling, regardless of how many nodes are in
       the LAM universe.

       For  "cX" values, the algorithm is essentially the same, except that LAM will resolve "cX"
       to a specific node before spawning, and successive processes are spawned on the node where
       "cK" resides, where K = ((X + i) % total_cpus).

       For  example,  if  there  are  8 nodes and 16 CPUs in the current LAM universe (2 CPUs per
       node), a "lam_spawn_sched_round_robin" key is given with the value of "c14", and  maxprocs
       is 4, LAM will spawn MPI processes on:

       CPU  Node  MPI_COMM_WORLD rank
       ---  ----  -------------------
       c14  n7    0
       c15  n7    1
       c0   n0    2
       c1   n0    3
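
       A sketch that requests the placement shown above (the executable name "worker" is
       hypothetical):

           #include <mpi.h>

           int main(int argc, char *argv[])
           {
               MPI_Comm children;
               MPI_Info info;
               int      errcodes[4];

               MPI_Init(&argc, &argv);

               MPI_Info_create(&info);
               /* Begin round-robin placement at CPU c14. */
               MPI_Info_set(info, "lam_spawn_sched_round_robin", "c14");

               MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, info,
                              0, MPI_COMM_WORLD, &children, errcodes);

               MPI_Info_free(&info);
               MPI_Finalize();
               return 0;
           }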

       lam_no_root_node_schedule

       This  key is used to designate that the spawned processes must not be spawned or scheduled
       on the "root node" (the node doing the spawn). There is no specific value associated  with
       this key, but it should be given some non-null/non-empty dummy value.

       It  is a node-specific key and not a CPU-specific one. Hence if the root node has multiple
       CPUs, none of the CPUs on this root node will take part in the scheduling of  the  spawned
       processes.

       No keys given

       If none of the info keys listed above are used, the value MPI_INFO_NULL should be given
       for info (any other keys are ignored, so there is no harm in providing them).
       In  this case, LAM schedules the given number of processes onto LAM nodes by starting with
       CPU 0 (or the lowest numbered CPU), and continuing through higher CPU numbers, placing one
       process  on  each  CPU.  If the process count is greater than the CPU count, the procedure
       repeats.

       Predefined Attributes

       The pre-defined attribute on MPI_COMM_WORLD , MPI_UNIVERSE_SIZE , can be useful in
       determining how many CPUs are currently unused.  The value in MPI_UNIVERSE_SIZE is the
       number of CPUs that LAM was booted with (see MPI_Init(3)).  Subtracting the size of
       MPI_COMM_WORLD from this value gives the number of CPUs in the current LAM universe that
       the current application is not using (and that are therefore likely idle).
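
       A minimal sketch of querying the attribute (MPI_Attr_get could be used in place of
       MPI_Comm_get_attr; note that the attribute value is returned as a pointer to an int):

           #include <mpi.h>
           #include <stdio.h>

           int main(int argc, char *argv[])
           {
               int *universe_size;
               int  world_size, flag;

               MPI_Init(&argc, &argv);
               MPI_Comm_size(MPI_COMM_WORLD, &world_size);

               MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE,
                                 &universe_size, &flag);
               if (flag)
                   printf("roughly %d CPUs appear to be unused\n",
                          *universe_size - world_size);

               MPI_Finalize();
               return 0;
           }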

       Process Termination

       Note that the process[es] spawned by MPI_COMM_SPAWN (and MPI_COMM_SPAWN_MULTIPLE )
       effectively become orphans.  That is, the spawning MPI application does not wait for the
       spawned application to finish.  Hence, there is no guarantee that the spawned application
       has finished when the spawning application completes.  Similarly, killing the spawning
       application will have no effect on the spawned application.

       User applications that need such synchronization can achieve it with an MPI_BARRIER
       between the spawning and spawned processes before MPI_FINALIZE .

       Note that lamclean will kill *all* MPI processes.

       Process Count

       The maxprocs parameter to MPI_Comm_spawn specifies the exact number  of  processes  to  be
       started.   If  it is not possible to start the desired number of processes, MPI_Comm_spawn
       will return an error code.  Note that even though maxprocs is only relevant on  the  root,
       all  ranks  must  have  an  errcodes array long enough to handle an integer error code for
       every process that tries to launch, or  give  MPI  constant  MPI_ERRCODES_IGNORE  for  the
       errcodes  argument.   While  this  appears  to  be  a  contradiction,  it is per the MPI-2
       standard.  :-\

       Frequently, an application wishes to choose a process count so as to fill all  processors
       available  to  a job.  MPI indicates the maximum number of processes recommended for a job
       in the pre-defined attribute, MPI_UNIVERSE_SIZE , which is cached on MPI_COMM_WORLD .

       The typical usage is to subtract the number of processes currently in the job from the
       value of MPI_UNIVERSE_SIZE and spawn the difference.  LAM sets MPI_UNIVERSE_SIZE to
       the number of CPUs in the user's LAM session (as defined in the boot schema [bhost(5)] via
       lamboot (1)).

       See MPI_Init(3) for other pre-defined attributes that are helpful when spawning.

       Locating an Executable Program

       The executable program file must be located on the node(s) where the process(es) will run.
       On any node, the directories  specified  by  the  user's  PATH  environment  variable  are
       searched to find the program.

       All MPI runtime options selected by mpirun (1) in the initial application launch remain in
       effect for all child processes created by the spawn functions.

       Command-line Arguments

       The argv parameter to MPI_Comm_spawn should not contain the program name since it is given
       in  the  first  parameter.   The command line that is passed to the newly launched program
       will be the program name followed by the strings in argv .
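
       For example, to pass two arguments to a hypothetical "worker" program (note that the
       argument array must be terminated by a NULL entry and does not contain the program
       name):

           #include <mpi.h>

           int main(int argc, char *argv[])
           {
               /* Arguments for the children; NULL-terminated, no program name. */
               char    *child_argv[] = { "--input", "data.txt", NULL };
               MPI_Comm children;

               MPI_Init(&argc, &argv);
               MPI_Comm_spawn("worker", child_argv, 2, MPI_INFO_NULL,
                              0, MPI_COMM_WORLD, &children, MPI_ERRCODES_IGNORE);
               MPI_Finalize();
               return 0;
           }

       The spawned processes then see a command line equivalent to "worker --input data.txt".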

USAGE WITH IMPI EXTENSIONS

       The IMPI standard only supports MPI-1 functions.  Hence, this function  is  currently  not
       designed to operate within an IMPI job.

ERRORS

       If  an  error occurs in an MPI function, the current MPI error handler is called to handle
       it.  By default, this error handler aborts the MPI job.  The error handler may be  changed
       with  MPI_Errhandler_set  ;  the predefined error handler MPI_ERRORS_RETURN may be used to
       cause error values to be returned (in C and Fortran; this error handler is less useful
       with the C++ MPI bindings.  The predefined error handler MPI::ERRORS_THROW_EXCEPTIONS
       should be used in C++ if the error value needs to be recovered).  Note that MPI  does  not
       guarantee that an MPI program can continue past an error.

       All  MPI  routines  (except MPI_Wtime and MPI_Wtick ) return an error value; C routines as
       the value of the function and Fortran routines in the last argument.  The C++ bindings for
       MPI  do  not  return  error  values;  instead,  error  values are communicated by throwing
       exceptions of type MPI::Exception (but not by default).  Exceptions are only thrown if the
       error value is not MPI::SUCCESS .

       Note that if the MPI::ERRORS_RETURN handler is set in C++, while MPI functions will return
       upon an error, there will be no way to recover what the actual error value was.
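
       A sketch of checking the spawn result directly rather than aborting (the executable name
       "worker" is hypothetical):

           #include <mpi.h>
           #include <stdio.h>

           int main(int argc, char *argv[])
           {
               MPI_Comm children;
               char     msg[MPI_MAX_ERROR_STRING];
               int      rc, len, errcodes[2];

               MPI_Init(&argc, &argv);

               /* Return error codes instead of aborting the job. */
               MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

               rc = MPI_Comm_spawn("worker", MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                                   0, MPI_COMM_WORLD, &children, errcodes);
               if (rc != MPI_SUCCESS) {
                   MPI_Error_string(rc, msg, &len);
                   printf("MPI_Comm_spawn failed: %s\n", msg);
               }

               MPI_Finalize();
               return 0;
           }
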
       MPI_SUCCESS
              - No error; MPI routine completed successfully.
       MPI_ERR_COMM
              - Invalid communicator.  A common error is to use a null  communicator  in  a  call
              (not even allowed in MPI_Comm_rank ).
       MPI_ERR_SPAWN
               - Spawn error; one or more of the applications could not be launched.  Check the
               returned error code array.
       MPI_ERR_ARG
              - Invalid argument.  Some argument is invalid and is not identified by  a  specific
              error class.  This is typically a NULL pointer or other such error.
       MPI_ERR_ROOT
              -  Invalid  root.  The root must be specified as a rank in the communicator.  Ranks
              must be between zero and the size of the communicator minus one.
       MPI_ERR_OTHER
              - Other error; use MPI_Error_string to get more information about this error code.
       MPI_ERR_INTERN
              - An internal error has been detected.  This is fatal.  Please send a bug report to
              the LAM mailing list (see http://www.lam-mpi.org/contact.php ).
       MPI_ERR_NO_MEM
              -  This error class is associated with an error code that indicates that free space
              is exhausted.

SEE ALSO

       appschema(5),  bhost(5),   lamboot(1),   MPI_Comm_get_parent(3),   MPI_Intercomm_merge(3),
       MPI_Comm_spawn_multiple(3),   MPI_Info_create(3),   MPI_Info_set(3),   MPI_Info_delete(3),
       MPI_Info_free(3), MPI_Init(3), mpirun(1)

MORE INFORMATION

       For more information, please see the official MPI Forum web site, which contains the  text
       of both the MPI-1 and MPI-2 standards.  These documents contain detailed information about
       each MPI function (most of which is not duplicated in these man pages).

       http://www.mpi-forum.org/

ACKNOWLEDGEMENTS

       The LAM Team would like to thank the MPICH Team for the handy  program  to  generate  man
       pages   ("doctext"  from  ftp://ftp.mcs.anl.gov/pub/sowing/sowing.tar.gz  ),  the  initial
       formatting, and some initial text for most of the MPI-1 man pages.

LOCATION

       spawn.c