Provided by: lam-runtime_7.1.4-7.2_amd64

NAME

       lamssi_boot - overview of LAM's boot SSI modules

DESCRIPTION

       The "kind" for boot SSI modules is "boot".  Specifically, the string "boot" (without the
       quotes) is the prefix used for argument names when passing values to boot modules at run
       time.  For example:

       lamboot -ssi boot rsh hostfile
           Specifies to use the "rsh" boot module, and lamboot across all the nodes listed in the
           file hostfile.

       LAM currently has several boot modules: bproc, globus, rsh (which  includes  ssh),  slurm,
       and tm.

ADDITIONAL INFORMATION

       The  LAM/MPI  User's  Guide contains much detail about all of the boot modules.  All users
       are strongly encouraged to read  it.   This  man  page  is  a  summary  of  the  available
       information.

SELECTING A BOOT MODULE

       Only one boot module may be selected per command execution.  Hence, the selection of which
       module to use occurs once, when a given command initializes.  Once the module is chosen, it
       is used for the duration of the program run.

       In  most  cases,  LAM  will  automatically select the "best" module at run-time.  LAM will
       query all available modules at run time to obtain a list of priorities.  The  module  with
       the highest priority will be used.  If multiple modules return the same priority, LAM will
       select one at random.  Priorities are in the range of 0 to 100, with 0  being  the  lowest
       priority  and  100  being the highest.  At run time, each module will examine the run-time
       environment and return a priority value that is appropriate.

       For example, when running a PBS job,  the  tm  module  will  return  a  sufficiently  high
       priority value such that it will be selected and the other available modules will not.
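
       The selection rule described above can be sketched in shell.  This is only an
       illustration, not LAM's implementation: the "module priority" pairs are written by hand
       (using the defaults this page gives for rsh and tm in a PBS job), and ties are resolved by
       sort order rather than randomly as LAM does.

```shell
# Hand-written "module priority" pairs; rsh=0 and tm=50 are the defaults
# stated in this page for a PBS job.
priorities='rsh 0
tm 50'

# Sort by priority, highest first, and keep the module name of the top entry.
# Unlike LAM, this sketch does not break ties randomly.
selected=$(printf '%s\n' "$priorities" | sort -k2,2nr | head -n 1 | cut -d' ' -f1)
echo "selected boot module: $selected"
```

       With the values shown, the sketch prints "selected boot module: tm".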

       Most modules accept run-time parameters that override the priorities they return, which
       changes the order (and therefore the ultimate selection) of the available boot modules.
       See below.

       Alternatively, a specific module may be selected by the user by specifying a value for the
       boot parameter (either by environment variable or by the -ssi command line parameter).  In
       this  case,  no other modules will be queried by LAM.  If the named module returns a valid
       priority, it will be used.  For example:

       lamboot -ssi boot rsh hostfile
           Tells LAM to only query the rsh boot module and see if it is available to run.

       If the boot module that is selected is unable to run (e.g., attempting to use the tm  boot
       module  when  not  running in a PBS job), an appropriate error message will be printed and
       execution will abort.

AVAILABLE MODULES

       As with all SSI modules, it is possible to pass parameters  at  run  time.   This  section
       discusses  the  built-in  LAM  boot  modules, as well as the run-time parameters that they
       accept.

       In the discussion below, parameters to boot modules are discussed in  terms  of  name  and
       value.   The  name  and  value  may be specified as command line arguments to the lamboot,
       lamgrow, recon, and lamwipe commands  with  the  -ssi  switch,  or  they  may  be  set  in
       environment variables of the form LAM_MPI_SSI_name=value.  Note that using the -ssi
       command line switch will take precedence over any previously-set environment variables.
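
       As a minimal sketch of the environment variable form, the following selects the rsh boot
       module without using the -ssi switch.  The file name "hostfile" is a placeholder, and the
       lamboot call is guarded so that the snippet does nothing on a machine without LAM
       installed.

```shell
# Parameter name "boot", value "rsh", in the LAM_MPI_SSI_name=value form.
LAM_MPI_SSI_boot=rsh
export LAM_MPI_SSI_boot

# "hostfile" is a placeholder boot schema file name.
if command -v lamboot >/dev/null 2>&1; then
    # No -ssi switch here, so the exported variable applies.  Adding
    # "-ssi boot <module>" to this command line would take precedence over it.
    lamboot hostfile
fi
```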

   bproc Boot Module
       The bproc boot module uses native bproc functionality (e.g.,  the  bproc_execmove  library
       call) to launch jobs on slave nodes from the head node.  Checks are made before launching
       to ensure that the nodes are available and are "owned"  by  the  user  and/or  the  user's
       group.   Appropriate  error messages will be displayed if the user is unable to execute on
       the target nodes.

       Hostnames should be specified using bproc  notation:  -1  indicates  the  head  node,  and
       integer  numbers  starting  with  0  represent  slave  nodes.  The string "localhost" will
       automatically be converted to "-1".

       The default behavior is to mark the bproc head node as "non-schedulable", meaning that
       the expansion of "N" and "C" when used with mpirun and lamexec will exclude the bproc head
       node.  For example, "mpirun C my_mpi_program" will run copies  of  my_mpi_program  on  all
       lambooted slave nodes, but not the bproc head node.

       Note that the bproc boot module is only usable from the bproc head node.
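
       The bproc notation described above might be used as follows.  This is a sketch under
       assumptions: the node numbers 0 and 1 are examples, the boot schema is written to a
       temporary file, and the LAM commands are only run if lamboot is found in the PATH.

```shell
# Write a boot schema in bproc notation to a temporary file:
# -1 is the head node; 0 and 1 are example slave nodes.
hostfile=$(mktemp)
cat > "$hostfile" <<'EOF'
-1
0
1
EOF

if command -v lamboot >/dev/null 2>&1; then
    # Boot with the bproc module, then run on the slave nodes only:
    # "C" excludes the head node because it is marked non-schedulable.
    lamboot -ssi boot bproc "$hostfile" && mpirun C my_mpi_program
fi
```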

       The bproc boot module only has one tunable parameter:

       boot_bproc_priority
           Using the priority argument can override LAM's automatic run-time boot module
           selection algorithms.  This parameter only has effect when the bproc module is
           eligible to be run (i.e., when running on a bproc cluster).

       See the bproc notes in the user documentation for more details.

   globus Boot Module
       The  globus  boot  module  uses the globus-job-run command to launch executables on remote
       nodes.  It is currently limited to only allowing jobs that can use the fork job manager on
       the Globus gatekeeper.  Other job managers are not yet supported.

       LAM  will  effectively  never  select  the globus boot module by default because it has an
       extremely low default priority; it must be manually selected with the boot  SSI  parameter
       or  have  its  priority raised.  Additionally, LAM must be able to find the globus-job-run
       command in your PATH.

       The boot schema requires hosts to be listed as the Globus contact string.  For example:

       "host1:port1:/O=xxx/OU=yyy/CN=aaa bbb ccc"

       Note the use of quotes because the CN includes spaces -- the entire contact name  must  be
       enclosed  in  quotes.  Additionally, since globus-job-run does not invoke the user's "dot"
       files on the remote nodes, no PATH or environment is set up.  Hence, the attribute
       lam_install_path  must  be  specified  for each contact string in the hostfile so that LAM
       knows where to find its executables on the remote nodes.  For example:

       "host1:port1:/O=xxx/OU=yyy/CN=aaa bbb ccc" lam_install_path=/home/lam

       The globus boot module only has one tunable parameter:

       boot_globus_priority
           Using the  priority  argument  can  override  LAM's  automatic  run-time  boot  module
           selection algorithms.
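
       Putting the pieces of this section together, a globus boot might look like the sketch
       below.  The contact string and install path are the placeholder values used above, the
       boot schema is written to a temporary file, and lamboot is only invoked when both it and
       globus-job-run are found in the PATH.

```shell
# Boot schema with one quoted Globus contact string and its install path.
# The quoted here-document keeps the embedded double quotes intact.
hostfile=$(mktemp)
cat > "$hostfile" <<'EOF'
"host1:port1:/O=xxx/OU=yyy/CN=aaa bbb ccc" lam_install_path=/home/lam
EOF

# The globus module must be selected by hand because of its low priority.
if command -v lamboot >/dev/null 2>&1 &&
   command -v globus-job-run >/dev/null 2>&1; then
    lamboot -ssi boot globus "$hostfile"
fi
```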

   rsh Boot Module
       The  rsh  boot  module  uses  rsh  or  ssh (or any other command line agent that acts like
       rsh/ssh) to launch executables on remote nodes.   It  requires  that  executables  can  be
       started  on  remote  nodes  without  being prompted for a password, and without outputting
       anything to stderr.

       The rsh boot module is always available, and unless overridden, always  assigns  itself  a
       priority of 0.

       The rsh module accepts a few run-time parameters:

       boot_rsh_agent
           Used to override the compiled-in default remote agent program that was selected when
           LAM was compiled.  For example, this parameter can be set to use "ssh" if LAM was
           compiled  to  use  "rsh"  by  default.   Previous  versions of LAM/MPI used the LAMRSH
           environment variable for this purpose.  While the LAMRSH  environment  variable  still
           works, its use is deprecated in favor of the boot_rsh_agent SSI module argument.

       boot_rsh_priority
           Using  the  priority  argument  can  override  LAM's  automatic  run-time  boot module
           selection algorithms.

       boot_rsh_username
           If the user has a different username on the remote machine, this parameter can be used
           to  pass  the -l argument to the underlying remote agent.  Note that this is a coarse-
           grained control -- this one username will be used for all remote nodes.  If more fine-
           grained  control is required, the username should be specified in the boot schema file
           on a per-host basis.
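
       The rsh parameters above can also be supplied through the environment.  In this sketch
       the username "jdoe" and the file name "hostfile" are hypothetical, and lamboot is only
       run if it is found in the PATH.

```shell
# Use ssh as the remote agent and log in as "jdoe" (a made-up username)
# on every remote node.
LAM_MPI_SSI_boot_rsh_agent=ssh
LAM_MPI_SSI_boot_rsh_username=jdoe
export LAM_MPI_SSI_boot_rsh_agent LAM_MPI_SSI_boot_rsh_username

if command -v lamboot >/dev/null 2>&1; then
    # Explicitly select the rsh boot module; it reads the variables above.
    lamboot -ssi boot rsh hostfile
fi
```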

   slurm Boot Module
       The slurm boot module uses the srun command to launch the LAM daemons in a SLURM execution
       environment.  When it detects that it is running under SLURM, it automatically sets its
       priority to 50.  It can be used in two different modes: batch (where a script is
       submitted  to  SLURM  and it is run on the first node in the node allocation) and allocate
       (where the -A option is used to srun to obtain an interactive allocation).  The slurm boot
       module  does  not support running in a script that is launched by SLURM on all nodes in an
       allocation.

       No boot schema file is required when using the slurm boot module; LAM  will  automatically
       determine the host and CPU count from SLURM itself.

       The slurm boot module only has one tunable parameter:

       boot_slurm_priority
           Using  the  priority  argument  can  override  LAM's  automatic  run-time  boot module
           selection algorithms.  This parameter  only  has  effect  when  the  slurm  module  is
           eligible to be run (i.e., when running in a SLURM allocation).
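
       For batch mode, a job script might look like the sketch below, which writes the script
       to a temporary file and makes it executable.  The program name my_mpi_program is a
       placeholder, and how the script is submitted to SLURM depends on the site and SLURM
       version, so submission is not shown.

```shell
# Generate a batch script; SLURM runs it on the first node of the allocation.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/sh
lamboot -ssi boot slurm    # no boot schema file: hosts come from SLURM
mpirun C my_mpi_program    # placeholder MPI program
lamhalt                    # shut down the LAM daemons when done
EOF
chmod +x "$script"
```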

   tm Boot Module
       The tm boot module uses the Task Management (TM) interface to launch executables on remote
       nodes.  Currently, OpenPBS and PBSPro are the only two systems that implement the TM
       interface.  Hence, when LAM detects that it is running in a PBS job, it will automatically
       set the tm priority to 50.  When not running in a PBS job,  the  tm  module  will  not  be
       available.

       The tm boot module only has one tunable parameter:

       boot_tm_priority
           Using  the  priority  argument  can  override  LAM's  automatic  run-time  boot module
           selection algorithms.  This parameter only has effect when the tm module  is  eligible
           to be run (i.e., when running in a PBS job).
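
       Priority parameters can change which module wins inside a PBS job.  As a sketch, raising
       the rsh priority above the tm value of 50 should cause rsh to be chosen instead of tm.
       The value 75 and the file name "hostfile" are arbitrary examples, and lamboot is only run
       if it is found in the PATH.

```shell
# Any value from 51 to 100 outranks the tm module's priority of 50.
LAM_MPI_SSI_boot_rsh_priority=75
export LAM_MPI_SSI_boot_rsh_priority

if command -v lamboot >/dev/null 2>&1; then
    # No module is named explicitly, so LAM compares the returned priorities.
    lamboot hostfile
fi
```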

SEE ALSO

       lamssi(7), mpirun(1), LAM User's Guide