Provided by: slurm-client_15.08.7-1build1_amd64

NAME

       srun - Run parallel jobs

SYNOPSIS

       srun [OPTIONS...]  executable [args...]

DESCRIPTION

       Run a parallel job on a cluster managed by Slurm.  If necessary, srun will first create a
       resource allocation in which to run the parallel job.

       The following document describes the influence of various options on the allocation of
       CPUs to jobs and tasks.
       http://slurm.schedmd.com/cpu_management.html

OPTIONS

       --accel-bind=<options>
              Control  how  tasks  are  bound  to  generic  resources  of  type gpu, mic and nic.
              Multiple options may be specified. Supported options include:

              g      Bind each task to GPUs which are closest to the allocated CPUs.

              m      Bind each task to MICs which are closest to the allocated CPUs.

              n      Bind each task to NICs which are closest to the allocated CPUs.

              v      Verbose mode. Log how tasks are bound to GPU and NIC devices.

       -A, --account=<account>
              Charge resources used by  this  job  to  specified  account.   The  account  is  an
              arbitrary  string.  The  account name may be changed after job submission using the
              scontrol command.

       --acctg-freq
              Define the job accounting and profiling sampling intervals.  This can  be  used  to
              override  the  JobAcctGatherFrequency  parameter  in  Slurm's  configuration  file,
              slurm.conf.  The supported format is as follows:

              --acctg-freq=<datatype>=<interval>
                          where <datatype>=<interval> specifies the task  sampling  interval  for
                          the  jobacct_gather  plugin or a sampling interval for a profiling type
                          by   the   acct_gather_profile   plugin.   Multiple,    comma-separated
                          <datatype>=<interval>  intervals  may be specified. Supported datatypes
                          are as follows:

                          task=<interval>
                                 where <interval> is the task sampling interval  in  seconds  for
                                 the  jobacct_gather  plugins  and  for  task  profiling  by  the
                                 acct_gather_profile plugin.  NOTE: This frequency is used to
                                 monitor memory usage. If memory limits are enforced, the highest
                                 frequency a user can request is the one configured in the
                                 slurm.conf file, and the user cannot disable sampling (=0).

                          energy=<interval>
                                 where  <interval> is the sampling interval in seconds for energy
                                 profiling using the acct_gather_energy plugin.

                          network=<interval>
                                 where  <interval>  is  the  sampling  interval  in  seconds  for
                                 infiniband profiling using the acct_gather_infiniband plugin.

                          filesystem=<interval>
                                 where  <interval>  is  the  sampling  interval  in  seconds  for
                                 filesystem profiling using the acct_gather_filesystem plugin.

              The default value for the task sampling interval is 30 seconds. The default value
              for all other intervals is 0.  An interval of 0 disables sampling of the specified
              type.  If the task sampling interval is 0, accounting information is collected only
              at job termination (reducing Slurm interference with the job).  Smaller (non-zero)
              values have a greater impact upon job performance, but a value of 30 seconds is not
              likely to be noticeable for applications having fewer than 10,000 tasks.
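
              As an illustration of the format described above, the following shell sketch
              assembles an --acctg-freq value from two <datatype>=<interval> pairs. The interval
              values and the executable name ./a.out are assumptions chosen for the example, and
              the srun command line is only printed, not executed.

```shell
# Illustrative sampling intervals in seconds (assumed values, not recommendations).
task_interval=60
energy_interval=30

# Join the <datatype>=<interval> pairs with commas.
acctg_freq="task=${task_interval},energy=${energy_interval}"

# Print the resulting command line; "./a.out" is a placeholder executable.
echo "srun --acctg-freq=${acctg_freq} ./a.out"
```

              Remember that when memory limits are enforced, a task interval of 0 or one more
              frequent than the slurm.conf setting cannot be requested.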

       -B, --extra-node-info=<sockets[:cores[:threads]]>
              Request  a  specific allocation of resources with details as to the number and type
              of computational resources  within  a  cluster:  number  of  sockets  (or  physical
              processors)  per node, cores per socket, and threads per core.  The total amount of
              resources being requested is the product of all of the terms.  Each value specified
              is  considered  a minimum.  An asterisk (*) can be used as a placeholder indicating
              that all available resources of that type are to be utilized.  As with  nodes,  the
              individual levels can also be specified in separate options if desired:
                  --sockets-per-node=<sockets>
                  --cores-per-socket=<cores>
                  --threads-per-core=<threads>
              If the task/affinity plugin is enabled, then specifying an allocation in this manner
              also sets a default --cpu_bind option of threads if the -B option specifies a
              thread count, otherwise an option of cores if a core count is specified, otherwise
              an option of sockets.  If SelectType is configured to select/cons_res, it must have
              a parameter of CR_Core, CR_Core_Memory, CR_Socket, or CR_Socket_Memory for this
              option to be honored.  This option is not supported on BlueGene systems
              (select/bluegene plugin is configured).  If not specified, scontrol show job
              will display 'ReqS:C:T=*:*:*'.
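
              The statement that the request is the product of all of the terms can be checked
              with a little shell arithmetic. The sketch below uses assumed example counts (2
              sockets, 4 cores per socket, 2 threads per core) and a placeholder executable, and
              it only prints the two equivalent command lines.

```shell
# Assumed example topology request; each value is treated by Slurm as a minimum.
sockets=2
cores=4
threads=2

# The total amount of resources requested is the product of the three terms.
total=$(( sockets * cores * threads ))

# Compact form and the equivalent separate options ("./a.out" is a placeholder).
echo "srun -B ${sockets}:${cores}:${threads} ./a.out   # product of terms: ${total}"
echo "srun --sockets-per-node=${sockets} --cores-per-socket=${cores} --threads-per-core=${threads} ./a.out"
```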

       --bb=<spec>
              Burst buffer specification. The form of  the  specification  is  system  dependent.
              Also see --bbf.

       --bbf=<file_name>
              Path  of file containing burst buffer specification.  The form of the specification
              is system dependent.  Also see --bb.

       --bcast[=<dest_path>]
              Copy executable file to allocated compute nodes.  If a file name is specified, copy
              the  executable  to  the  specified destination file path. If no path is specified,
              copy the file to a file named "slurm_bcast_<job_id>.<step_id>" in the current
              working directory.  For example, "srun --bcast=/tmp/mine -N3 a.out" will copy the file
              "a.out" from your current directory to the file "/tmp/mine" on each  of  the  three
              allocated compute nodes and execute that file.

       --begin=<time>
              Defer  initiation  of  this  job until the specified time.  It accepts times of the
              form HH:MM:SS to run a job at a specific time of day (seconds are  optional).   (If
              that  time  is  already  past,  the  next  day  is  assumed.)  You may also specify
              midnight, noon, fika (3 PM) or teatime (4  PM)  and  you  can  have  a  time-of-day
              suffixed with AM or PM for running in the morning or the evening.  You can also say
              what day the job will be run, by specifying a date of the form MMDDYY, MM/DD/YY, or
              YYYY-MM-DD.  Combine date and time using the following format:
              YYYY-MM-DD[THH:MM[:SS]]. You can also give times like now + count time-units, where
              the time-units can be seconds (default), minutes, hours, days, or weeks and you can
              tell Slurm to run the job today with the keyword today and to run the job  tomorrow
              with the keyword tomorrow.  The value may be changed after job submission using the
              scontrol command.  For example:
                 --begin=16:00
                 --begin=now+1hour
                 --begin=now+60           (seconds by default)
                 --begin=2010-01-20T12:34:00

              Notes on date/time specifications:
               - Although the 'seconds' field of the HH:MM:SS time specification  is  allowed  by
              the  code,  note that the poll time of the Slurm scheduler is not precise enough to
              guarantee dispatch of the job on the exact second.  The job  will  be  eligible  to
              start  on  the  next  poll  following  the  specified time. The exact poll interval
              depends on the Slurm scheduler (e.g., 60 seconds with the default sched/builtin).
               - If no time (HH:MM:SS) is specified, the default is (00:00:00).
               - If a date is specified without a year (e.g., MM/DD) then  the  current  year  is
              assumed,  unless  the combination of MM/DD and HH:MM:SS has already passed for that
              year, in which case the next year is used.

       --checkpoint=<time>
              Specifies the interval between creating checkpoints of the job step.   By  default,
              the  job  step  will  have no checkpoints created.  Acceptable time formats include
              "minutes",      "minutes:seconds",      "hours:minutes:seconds",      "days-hours",
              "days-hours:minutes" and "days-hours:minutes:seconds".

       --checkpoint-dir=<directory>
              Specifies  the  directory  into  which  the  job or job step's checkpoint should be
              written (used by  the  checkpoint/blcr  and  checkpoint/xlch  plugins  only).   The
              default  value  is  the current working directory.  Checkpoint files will be of the
              form "<job_id>.ckpt" for jobs and "<job_id>.<step_id>.ckpt" for job steps.

       --comment=<string>
              An arbitrary comment.

       -C, --constraint=<list>
              Nodes can have features assigned to them by the  Slurm  administrator.   Users  can
              specify  which  of  these  features  are required by their job using the constraint
              option.  Only nodes having features matching the job constraints will  be  used  to
              satisfy  the request.  Multiple constraints may be specified with AND, OR, matching
              OR, resource counts, etc.  Supported constraint options include:

              Single Name
                     Only nodes which have the specified feature  will  be  used.   For  example,
                     --constraint="intel"

              Node Count
                     A  request  can  specify  the  number  of  nodes needed with some feature by
                     appending an asterisk  and  count  after  the  feature  name.   For  example
                     "--nodes=16 --constraint=graphics*4 ..."  indicates that the job requires 16
                     nodes and  that  at  least  four  of  those  nodes  must  have  the  feature
                     "graphics."

              AND    Only nodes with all of the specified features will be used.  The ampersand is
                     used for an AND operator.  For example, --constraint="intel&gpu"

              OR     Only nodes with at least one of the specified features will be used.  The
                     vertical bar is used for an OR operator.  For example,
                     --constraint="intel|amd"

              Matching OR
                     If only one of a set of possible options should be used  for  all  allocated
                     nodes,  then  use  the  OR  operator  and  enclose the options within square
                     brackets.  For example:  "--constraint=[rack1|rack2|rack3|rack4]"  might  be
                     used  to  specify  that  all nodes must be allocated on a single rack of the
                     cluster, but any of those four racks can be used.

              Multiple Counts
                     Specific counts of multiple resources may be  specified  by  using  the  AND
                     operator  and  enclosing  the  options within square brackets.  For example:
                     "--constraint=[rack1*2&rack2*4]" might be used to  specify  that  two  nodes
                     must be allocated from nodes with the feature of "rack1" and four nodes must
                     be allocated from nodes with the feature "rack2".

              WARNING: When srun is executed from within salloc or sbatch,
              the constraint value can only contain a single feature  name.  None  of  the  other
              operators are currently supported for job steps.

       --contiguous
              If  set, then the allocated nodes must form a contiguous set.  Not honored with the
              topology/tree or topology/3d_torus plugins, both  of  which  can  modify  the  node
              ordering.  Not honored for a job step's allocation.

       --cores-per-socket=<cores>
              Restrict  node  selection  to nodes with at least the specified number of cores per
              socket.  See additional information under -B option above when task/affinity plugin
              is enabled.

       --cpu_bind=[{quiet,verbose},]type
              Bind  tasks  to  CPUs.   Used  only when the task/affinity or task/cgroup plugin is
              enabled.  The configuration parameter TaskPluginParam may override  these  options.
              For  example,  if TaskPluginParam is configured to bind to cores, your job will not
              be able to bind tasks to sockets.   NOTE:  To  have  Slurm  always  report  on  the
              selected  CPU  binding for all commands executed in a shell, you can enable verbose
              mode by setting the SLURM_CPU_BIND environment variable value to "verbose".

              The following informational environment variables are set  when  --cpu_bind  is  in
              use:
                   SLURM_CPU_BIND_VERBOSE
                   SLURM_CPU_BIND_TYPE
                   SLURM_CPU_BIND_LIST

              See  the  ENVIRONMENT  VARIABLES  section  for  a  more detailed description of the
              individual SLURM_CPU_BIND variables. These variables are available only if the
              task/affinity plugin is configured.

              When using --cpus-per-task to run multithreaded tasks, be aware that CPU binding is
              inherited from the parent of the process.  This means that the  multithreaded  task
              should  either  specify or clear the CPU binding itself to avoid having all threads
              of the multithreaded task use the same mask/CPU as the parent.  Alternatively,  fat
              masks  (masks  which specify more than one allowed CPU) could be used for the tasks
              in order to provide multiple CPUs for the multithreaded tasks.

              By default, a job step has access to every CPU allocated to  the  job.   To  ensure
              that distinct CPUs are allocated to each job step, use the --exclusive option.

              Note  that a job step can be allocated different numbers of CPUs on each node or be
              allocated CPUs not starting at location zero. Therefore one of  the  options  which
              automatically generate the task binding is recommended.  Explicitly specified masks
              or bindings are only honored when the job step has been allocated  every  available
              CPU on the node.

              Binding  a task to a NUMA locality domain means to bind the task to the set of CPUs
              that belong to the NUMA locality domain or "NUMA node".  If  NUMA  locality  domain
              options  are used on systems with no NUMA support, then each socket is considered a
              locality domain.

              Auto Binding
                     Applies only when task/affinity is  enabled.  If  the  job  step  allocation
                     includes  an allocation with a number of sockets, cores, or threads equal to
                     the number of tasks times cpus-per-task, then the tasks will by  default  be
                     bound  to  the  appropriate  resources  (auto binding). Disable this mode of
                     operation     by     explicitly     setting      "--cpu_bind=none".      Use
                     TaskPluginParam=autobind=[threads|cores|sockets]   to   set  a  default  cpu
                     binding in case "auto binding" doesn't find a match.

              Supported options include:

                     q[uiet]
                            Quietly bind before task runs (default)

                     v[erbose]
                            Verbosely report binding before task runs

                     no[ne] Do not bind tasks to CPUs (default unless auto binding is applied)

                     rank   Automatically bind by task rank.  The lowest numbered  task  on  each
                            node is bound to socket (or core or thread) zero, etc.  Not supported
                            unless the entire node is allocated to the job.

                     map_cpu:<list>
                            Bind by mapping CPU  IDs  to  tasks  as  specified  where  <list>  is
                            <cpuid1>,<cpuid2>,...<cpuidN>.   The  mapping is specified for a node
                            and identical mapping is applied to the tasks on every node (i.e. the
                            lowest  task  ID on each node is mapped to the first CPU ID specified
                            in the list, etc.).  CPU IDs are interpreted as decimal values unless
                            they  are  preceded  with  '0x' in which case they are interpreted as
                            hexadecimal  values.   Not  supported  unless  the  entire  node   is
                            allocated to the job.

                     mask_cpu:<list>
                            Bind  by  setting  CPU  masks  on  tasks as specified where <list> is
                            <mask1>,<mask2>,...<maskN>.  The mapping is specified for a node  and
                            identical  mapping  is  applied  to the tasks on every node (i.e. the
                            lowest task ID on each node is mapped to the first mask specified  in
                            the  list,  etc.).   CPU  masks are always interpreted as hexadecimal
                            values but can be preceded  with  an  optional  '0x'.  Not  supported
                            unless the entire node is allocated to the job.

                     rank_ldom
                            Bind  to  a  NUMA  locality  domain by rank. Not supported unless the
                            entire node is allocated to the job.

                     map_ldom:<list>
                            Bind by mapping NUMA locality domain IDs to tasks as specified  where
                            <list>  is  <ldom1>,<ldom2>,...<ldomN>.   The locality domain IDs are
                            interpreted as decimal values unless they are preceded with  '0x'  in
                            which case they are interpreted as hexadecimal values.  Not supported
                            unless the entire node is allocated to the job.

                     mask_ldom:<list>
                            Bind by setting NUMA locality domain  masks  on  tasks  as  specified
                            where  <list>  is  <mask1>,<mask2>,...<maskN>.   NUMA locality domain
                            masks are  always  interpreted  as  hexadecimal  values  but  can  be
                            preceded with an optional '0x'.  Not supported unless the entire node
                            is allocated to the job.

                     sockets
                            Automatically generate masks binding tasks to sockets.  Only the CPUs
                            on  the socket which have been allocated to the job will be used.  If
                            the number of tasks differs from the number of allocated sockets this
                            can result in sub-optimal binding.

                     cores  Automatically  generate  masks binding tasks to cores.  If the number
                            of tasks differs from the number of allocated cores this  can  result
                            in sub-optimal binding.

                     threads
                            Automatically generate masks binding tasks to threads.  If the number
                            of tasks differs from the number of allocated threads this can result
                            in sub-optimal binding.

                     ldoms  Automatically  generate masks binding tasks to NUMA locality domains.
                            If the number of tasks differs from the number of allocated  locality
                            domains this can result in sub-optimal binding.

                     boards Automatically  generate masks binding tasks to boards.  If the number
                            of tasks differs from the number of allocated boards this can  result
                            in  sub-optimal  binding. This option is supported by the task/cgroup
                            plugin only.

                     help   Show help message for cpu_bind
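
              For the mask_cpu option above, the hexadecimal masks can be derived from lists of
              CPU IDs. The sketch below assumes the usual affinity-mask convention in which bit N
              of a mask stands for CPU ID N; the helper name cpu_mask, the CPU IDs and the
              executable ./a.out are hypothetical. The srun command line is printed rather than
              run.

```shell
# Hypothetical helper: print a hexadecimal mask with one bit set per CPU ID argument.
# Assumes bit N of the mask corresponds to CPU ID N.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '0x%x' "$mask"
}

# Two tasks per node: first task on CPUs 0-1, second task on CPUs 2-3 (fat masks).
mask_list="$(cpu_mask 0 1),$(cpu_mask 2 3)"

# Print the command line only; "./a.out" is a placeholder executable.
echo "srun -n2 --cpu_bind=verbose,mask_cpu:${mask_list} ./a.out"
```

              As stated for mask_cpu, explicit masks are not supported unless the entire node is
              allocated to the job, so one of the automatically generated bindings (sockets,
              cores, threads) is usually the safer choice.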

       --cpu-freq=<p1[-p2[:p3]]>

              Request that the job step initiated by this srun command be run at  some  requested
              frequency if possible, on the CPUs selected for the step on the compute node(s).

              p1  can  be   [####  |  low  | medium | high | highm1] which will set the frequency
              scaling_speed to the corresponding value, and set the frequency scaling_governor to
              UserSpace. See below for definition of the values.

              p1  can  be  [Conservative | OnDemand | Performance | PowerSave] which will set the
              scaling_governor to the corresponding value. The governor has to be in the list set
              by the slurm.conf option CpuFreqGovernors.

              When  p2  is  present,  p1 will be the minimum scaling frequency and p2 will be the
              maximum scaling frequency.

              p2 can be [#### | medium | high | highm1].  p2 must be greater than p1.

              p3 can be [Conservative | OnDemand | Performance |  PowerSave  |  UserSpace]  which
              will set the governor to the corresponding value.

              If  p3  is  UserSpace, the frequency scaling_speed will be set by a power or energy
              aware scheduling strategy to a value between p1 and p2 that lets the job run within
              the site's power goal. The job may be delayed if p1 is higher than a frequency that
              allows the job to run within the goal.

              If the current frequency is < min, it will be set to min. Likewise, if the  current
              frequency is > max, it will be set to max.

              Acceptable values at present include:

              ####          frequency in kilohertz

              Low           the lowest available frequency

              High          the highest available frequency

              HighM1        (high minus one) will select the next highest available frequency

              Medium        attempts to set a frequency in the middle of the available range

              Conservative  attempts to use the Conservative CPU governor

              OnDemand      attempts to use the OnDemand CPU governor (the default value)

              Performance   attempts to use the Performance CPU governor

              PowerSave     attempts to use the PowerSave CPU governor

              UserSpace     attempts to use the UserSpace CPU governor

              The following informational environment variable is set in the job
              step when --cpu-freq option is requested.
                      SLURM_CPU_FREQ_REQ

              This  environment  variable  can  also  be  used  to  supply  the value for the CPU
              frequency request if it is set when the 'srun' command is issued.   The  --cpu-freq
              on the command line will override the environment variable value.  The format of the
              environment variable is the same as on the command line.  See the ENVIRONMENT
              VARIABLES section for a description of the SLURM_CPU_FREQ_REQ variable.

              NOTE: This parameter is treated as a request, not a requirement.  If the job step's
              node does not support setting the CPU frequency, or the requested value is  outside
              the  bounds  of  the  legal  frequencies,  an  error is logged, but the job step is
              allowed to continue.

              NOTE: Setting the frequency for just the CPUs of the  job  step  implies  that  the
              tasks    are    confined    to    those   CPUs.    If   task   confinement   (i.e.,
              TaskPlugin=task/affinity  or  TaskPlugin=task/cgroup  with   the   "ConstrainCores"
              option) is not configured, this parameter is ignored.

              NOTE:  When  the step completes, the frequency and governor of each selected CPU is
              reset to the configured CpuFreqDef value with a default value of the  OnDemand  CPU
              governor.

              NOTE: Submitting jobs with the --cpu-freq option while linuxproc is configured as
              the ProctrackType can cause jobs to complete before accounting is able to poll for
              job information. As a result, not all accounting information will be present.
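
              To make the p1[-p2[:p3]] form concrete, the following shell sketch builds a
              frequency range request. The frequencies (1200 MHz and 2400 MHz, converted to the
              kilohertz values the option expects), the governor and the executable name are
              assumptions for illustration; the governor must also appear in the site's
              CpuFreqGovernors list. The command line is printed, not executed.

```shell
# Assumed example frequencies, converted from MHz to kilohertz.
min_khz=$(( 1200 * 1000 ))
max_khz=$(( 2400 * 1000 ))
governor=OnDemand

# p2 must be greater than p1; refuse to build the request otherwise.
if [ "$max_khz" -le "$min_khz" ]; then
    echo "error: p2 must be greater than p1" >&2
    exit 1
fi

# Assemble p1-p2:p3 and print the command line ("./a.out" is a placeholder).
cpu_freq="${min_khz}-${max_khz}:${governor}"
echo "srun --cpu-freq=${cpu_freq} ./a.out"
```

              Because the option is a request rather than a requirement, a value outside the
              legal frequencies is logged as an error and the job step still runs.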

       -c, --cpus-per-task=<ncpus>
              Request  that  ncpus  be  allocated  per  process. This may be useful if the job is
              multithreaded and requires more than one CPU per task for optimal performance.  The
              default  is one CPU per process.  If -c is specified without -n, as many tasks will
              be allocated per node as possible while satisfying the -c restriction. For instance
              on  a  cluster  with 8 CPUs per node, a job request for 4 nodes and 3 CPUs per task
              may be allocated 3 or 6 CPUs per node (1  or  2  tasks  per  node)  depending  upon
              resource consumption by other jobs. Such a job may be unable to execute more than a
              total of 4 tasks.  This option may also be useful to spawn tasks without allocating
              resources to the job step from the job's allocation when running multiple job steps
              with the --exclusive option.
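
              The arithmetic behind the example above (8 CPUs per node, 4 nodes, 3 CPUs per task)
              can be sketched in shell as follows. The variable names are illustrative only, and
              the actual allocation depends on resource consumption by other jobs.

```shell
# Values taken from the example in the text.
cpus_per_node=8
nodes=4
cpus_per_task=3

# At most floor(8 / 3) = 2 tasks fit on one node, using 2 * 3 = 6 CPUs.
max_tasks_per_node=$(( cpus_per_node / cpus_per_task ))
max_cpus_per_node=$(( max_tasks_per_node * cpus_per_task ))

# With only one task per node the job runs 4 tasks; with two per node, 8.
min_total_tasks=$(( nodes * 1 ))
max_total_tasks=$(( nodes * max_tasks_per_node ))

echo "tasks per node: 1 to ${max_tasks_per_node}"
echo "CPUs per node: ${cpus_per_task} to ${max_cpus_per_node}"
echo "total tasks: ${min_total_tasks} to ${max_total_tasks}"
```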

              WARNING: There are configurations and options interpreted differently  by  job  and
              job step requests which can result in inconsistencies for this option.  For example
              srun -c2 --threads-per-core=1 prog may allocate two cores for the job, but if  each
              of those cores contains two threads, the job allocation will include four CPUs. The
              job step allocation will then launch two threads per CPU for a total of two tasks.

              WARNING:  When  srun  is  executed  from  within  salloc  or  sbatch,   there   are
              configurations and options which can result in inconsistent allocations when -c has
              a value greater than -c on salloc or sbatch.

       -d, --dependency=<dependency_list>
              Defer the start of this job until the specified dependencies have been satisfied.
              This option does not apply to job steps (executions of srun within an existing
              salloc or sbatch allocation), only to job allocations.  <dependency_list>
              is     of     the     form     <type:job_id[:job_id][,type:job_id[:job_id]]>     or
              <type:job_id[:job_id][?type:job_id[:job_id]]>.  All dependencies must be  satisfied
              if the "," separator is used.  Any dependency may be satisfied if the "?" separator
              is used.  Many jobs can share the same dependency and these jobs may even belong to
              different  users. The  value may be changed after job submission using the scontrol
              command.  Once a job dependency fails due to the termination state of  a  preceding
              job, the dependent job will never be run, even if the preceding job is requeued and
              has a different termination state in a subsequent execution.

              after:job_id[:jobid...]
                     This job can begin execution after the specified jobs have begun execution.

              afterany:job_id[:jobid...]
                     This job can begin execution after the specified jobs have terminated.

              afternotok:job_id[:jobid...]
                     This job can begin execution after the specified  jobs  have  terminated  in
                     some failed state (non-zero exit code, node failure, timed out, etc).

              afterok:job_id[:jobid...]
                     This  job  can  begin  execution  after the specified jobs have successfully
                     executed (ran to completion with an exit code of zero).

              expand:job_id
                     Resources allocated to this job should be used to expand the specified  job.
                     The  job  to  expand  must  share  the  same  QOS  (Quality  of Service) and
                     partition.  Gang scheduling of  resources  in  the  partition  is  also  not
                     supported.

              singleton
                     This  job can begin execution after any previously launched jobs sharing the
                     same job name and user have terminated.
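
              A dependency list in the form described above can be assembled from job IDs with a
              few lines of shell. The job IDs and the executable name below are hypothetical, and
              the command line is printed rather than run.

```shell
# Hypothetical IDs of previously submitted jobs.
build_jobs="1001 1002"
cleanup_job=1003

# Append each job ID to its dependency type with ":".
afterok="afterok"
for id in $build_jobs; do
    afterok="${afterok}:${id}"
done

# "," means every dependency must be satisfied; "?" would accept any one of them.
dependency="${afterok},afterany:${cleanup_job}"

# Print the command line only; "./a.out" is a placeholder executable.
echo "srun --dependency=${dependency} ./a.out"
```

              Here the job would start only after jobs 1001 and 1002 have completed with an exit
              code of zero and job 1003 has terminated in any state.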

       -D, --chdir=<path>
              Have the remote processes do a  chdir  to  path  before  beginning  execution.  The
              default  is to chdir to the current working directory of the srun process. The path
              can be specified as full path or relative path to the directory where  the  command
              is executed.

       -e, --error=<mode>
              Specify  how  stderr  is  to  be  redirected.  By default in interactive mode, srun
              redirects stderr to the same file as stdout,  if  one  is  specified.  The  --error
              option  is  provided  to  allow  stdout  and  stderr  to be redirected to different
              locations.  See IO Redirection below for  more  options.   If  the  specified  file
              already exists, it will be overwritten.

       -E, --preserve-env
              Pass  the  current  values  of  environment variables SLURM_NNODES and SLURM_NTASKS
              through to the executable, rather than computing them from command line parameters.

       --epilog=<executable>
              srun will run executable just after the  job  step  completes.   The  command  line
              arguments  for  executable  will  be the command and arguments of the job step.  If
              executable is "none", then no srun epilog will be run. This parameter overrides the
              SrunEpilog  parameter  in slurm.conf. This parameter is completely independent from
              the Epilog parameter in slurm.conf.

       --exclusive[=user]
              This option has two slightly different meanings for job and job  step  allocations.
              When  used  to  initiate  a  job,  the job allocation cannot share nodes with other
              running jobs   (or  just  other  users  with  the  "=user"  option).   The  default
              shared/exclusive  behavior  depends  on  system  configuration  and the partition's
              Shared option takes precedence over the job's option.

              This option can also be used when initiating more  than  one  job  step  within  an
              existing resource allocation, where you want separate processors to be dedicated to
              each job step. If sufficient processors are not available to initiate the job step,
              it  will  be deferred. This can be thought of as providing a mechanism for resource
              management to the job within its allocation.

              The exclusive allocation of CPUs only applies to job steps explicitly invoked  with
              the  --exclusive  option.  For example, a job might be allocated one node with four
              CPUs and a remote shell invoked on the allocated node. If that shell is not invoked
              with  the  --exclusive  option, then it may create a job step with four tasks using
              the  --exclusive  option  and  not  conflict  with  the  remote  shell's   resource
              allocation.  Use the --exclusive option to invoke every job step to ensure distinct
              resources for each step.

              Note that all CPUs allocated to a job are available to each job step unless the
              --exclusive option is used and task affinity is configured. Since resource
              management is provided by processor, the --ntasks option must be specified, but the
              following options should NOT be specified: --relative, --distribution=arbitrary.
              See EXAMPLE below.

       --export=<environment variables | NONE>
              Identify which environment variables are propagated to  the  launched  application.
              Multiple  environment  variable  names  should  be  comma  separated.   Environment
              variable names may be specified to propagate the current value of  those  variables
              (e.g.  "--export=EDITOR")  or  specific  values  for  the variables may be exported
              (e.g. "--export=EDITOR=/bin/vi") in addition to the environment variables that
              would otherwise be set.  By default all environment variables are propagated.

       --gid=<group>
              If  srun  is run as root, and the --gid option is used, submit the job with group's
              group access permissions.  group may be the group name or the numerical group ID.

       --gres=<list>
              Specifies a comma delimited list of generic consumable resources.   The  format  of
              each  entry  on  the  list  is  "name[[:type]:count]".   The  name  is  that of the
              consumable resource.  The count is the number of those  resources  with  a  default
              value  of  1.   The  specified resources will be allocated to the job on each node.
              The available generic consumable resources are configurable by the system
              administrator.  A list of available generic consumable resources will be printed
              and the command will exit if the  option  argument  is  "help".   Examples  of  use
              include "--gres=gpu:2,mic:1", "--gres=gpu:kepler:2", and "--gres=help".  NOTE: By
              default, a job step is allocated all of the generic resources that have been allocated
              to  the  job.  To change the behavior so that each job step is allocated no generic
              resources, explicitly set the value of --gres  to  specify  zero  counts  for  each
              generic  resource  OR  set  "--gres=none"  OR  set  the SLURM_STEP_GRES environment
              variable to "none".
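              The "name[[:type]:count]" entry format described above can be illustrated with a
              small parser. This is a minimal sketch, not Slurm code: parse_gres is a
              hypothetical helper, and it does not handle the special "help" and "none"
              keywords.

```python
def parse_gres(spec):
    """Parse a --gres style list into (name, type, count) tuples.

    Hypothetical helper for illustration only; it is not part of Slurm.
    """
    entries = []
    for item in spec.split(","):
        parts = item.split(":")
        if len(parts) == 1:        # "name": count defaults to 1
            name, gtype, count = parts[0], None, 1
        elif len(parts) == 2:      # "name:count"
            name, gtype, count = parts[0], None, int(parts[1])
        elif len(parts) == 3:      # "name:type:count"
            name, gtype, count = parts[0], parts[1], int(parts[2])
        else:
            raise ValueError(f"malformed gres entry: {item!r}")
        entries.append((name, gtype, count))
    return entries


print(parse_gres("gpu:2,mic"))      # [('gpu', None, 2), ('mic', None, 1)]
print(parse_gres("gpu:kepler:2"))   # [('gpu', 'kepler', 2)]
```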

       -H, --hold
              Specify the job is to be submitted in a held state (priority of zero).  A held  job
              can  now  be  released using scontrol to reset its priority (e.g. "scontrol release
              <job_id>").

       -h, --help
              Display help information and exit.

       --hint=<type>
              Bind tasks according to application hints.

              compute_bound
                     Select settings for compute  bound  applications:  use  all  cores  in  each
                     socket, one thread per core.

              memory_bound
                     Select  settings  for  memory  bound applications: use only one core in each
                     socket, one thread per core.

              [no]multithread
                     [don't] use extra threads with in-core  multi-threading  which  can  benefit
                     communication intensive applications.  Only supported with the task/affinity
                     plugin.

              help   show this help message

       -I, --immediate[=<seconds>]
              Exit if resources are not available within the time period specified.  If no
              argument  is  given,  resources  must  be  available immediately for the request to
              succeed.  By default,  --immediate  is  off,  and  the  command  will  block  until
              resources  become  available.  Since this option's argument is optional, for proper
              parsing the single letter option must be followed immediately with  the  value  and
              not include a space between them. For example "-I60" and not "-I 60".
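              When an srun command line is assembled by a script, the rule above means the
              value must be glued to the single letter option. A minimal sketch follows;
              short_option is a hypothetical helper and "hostname" is only a placeholder
              executable.

```python
def short_option(letter, value=None):
    """Format a single letter option whose argument is optional.

    The value, if any, is attached directly to the letter ("-I60"),
    never passed as a separate word ("-I 60").
    """
    if value is None:
        return f"-{letter}"
    return f"-{letter}{value}"


# Argument vector for a wait of at most 60 seconds for resources.
argv = ["srun", short_option("I", 60), "hostname"]
print(argv)  # ['srun', '-I60', 'hostname']
```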

       -i, --input=<mode>
              Specify how stdin is to be redirected.  By default, srun redirects stdin from the
              terminal to all tasks. See IO Redirection below for more options.  For OS X, the
              poll() function does not support stdin, so input from a terminal is not possible.

       -J, --job-name=<jobname>
              Specify  a  name  for the job. The specified name will appear along with the job id
              number when querying running jobs on  the  system.  The  default  is  the  supplied
              executable   program's   name.  NOTE:  This  information  may  be  written  to  the
              slurm_jobacct.log file. This file is space delimited, so a space in the job name
              will cause problems in properly displaying the contents of the
              slurm_jobacct.log file when the sacct command is used.

       --jobid=<jobid>
              Initiate a job step under an already allocated job with job id jobid.  Using this
              option  will  cause  srun  to  behave  exactly  as  if the SLURM_JOB_ID environment
              variable was set.

       -K, --kill-on-bad-exit[=0|1]
              Controls whether or not to terminate a job if any task exits with a  non-zero  exit
              code.  If  this  option is not specified, the default action will be based upon the
              Slurm configuration parameter of KillOnBadExit. If this  option  is  specified,  it
              will  take  precedence  over  KillOnBadExit.  An  option  argument of zero will not
              terminate the job. A non-zero argument or  no  argument  will  terminate  the  job.
              Note:  This option takes precedence over the -W, --wait option to terminate the job
              immediately if a task exits  with  a  non-zero  exit  code.   Since  this  option's
              argument  is optional, for proper parsing the single letter option must be followed
              immediately with the value and not include a space between them. For example  "-K1"
              and not "-K 1".

       -k, --no-kill
              Do  not  automatically  terminate  a  job if one of the nodes it has been allocated
              fails.  This option is only recognized on a job allocation, not for the  submission
              of   individual   job   steps.   The  job  will  assume  all  responsibilities  for
              fault-tolerance.  Tasks launched using this option will not be considered terminated
              (e.g.  -K,  --kill-on-bad-exit  and -W, --wait options will have no effect upon the
              job step).  The active job step (MPI job) will likely suffer  a  fatal  error,  but
              subsequent job steps may be run if this option is specified.  The default action is
              to terminate the job upon node failure.

       --launch-cmd
              Print external launch command instead of running job normally through  Slurm.  This
              option is only valid if using something other than the launch/slurm plugin.

       --launcher-opts=<options>
              Options  for  the  external launcher if using something other than the launch/slurm
              plugin.

       -l, --label
              Prepend task number to lines of stdout/err.  The --label option will prepend  lines
              of output with the remote task id.

       -L, --licenses=<license>
              Specification  of  licenses  (or  other  resources  available  on  all nodes of the
              cluster) which must be allocated to this job.  License names can be followed  by  a
              colon and count (the default count is one).  Multiple license names should be comma
              separated (e.g.  "--licenses=foo:4,bar").

       -m, --distribution=
              *|block|cyclic|arbitrary|plane=<options> [:*|block|cyclic|fcyclic[:*|block|
              cyclic|fcyclic]][,Pack|NoPack]

              Specify  alternate distribution methods for remote processes.  This option controls
              the distribution of tasks to the nodes on which resources have been allocated,  and
              the distribution of those resources to tasks for binding (task affinity). The first
              distribution method (before the first ":") controls the distribution  of  tasks  to
              nodes.   The  second  distribution  method  (after  the  first  ":")  controls  the
              distribution of allocated CPUs across sockets  for  binding  to  tasks.  The  third
              distribution  method  (after the second ":") controls the distribution of allocated
              CPUs across cores for binding to tasks.  The second and third  distributions  apply
              only  if task affinity is enabled.  The third distribution is supported only if the
              task/cgroup plugin is configured. The default value for each distribution  type  is
              specified by *.

              Note  that  with  select/cons_res,  the number of CPUs allocated on each socket and
              node may be different. Refer to http://slurm.schedmd.com/mc_support.html  for  more
              information  on resource allocation, distribution of tasks to nodes, and binding of
              tasks to CPUs.

              First distribution method (distribution of tasks across nodes):

              *      Use the default method for distributing tasks to nodes (block).

              block  The block distribution method will distribute tasks  to  a  node  such  that
                     consecutive tasks share a node. For example, consider an allocation of three
                     nodes each with two  cpus.  A  four-task  block  distribution  request  will
                     distribute  those  tasks  to  the  nodes with tasks one and two on the first
                     node, task three on the second node, and task four on the third node.  Block
                     distribution  is  the  default  behavior  if the number of tasks exceeds the
                     number of allocated nodes.

              cyclic The cyclic distribution method will distribute tasks to  a  node  such  that
                     consecutive  tasks  are distributed over consecutive nodes (in a round-robin
                     fashion). For example, consider an allocation of three nodes each  with  two
                     cpus. A four-task cyclic distribution request will distribute those tasks to
                     the nodes with tasks one and four on the first node, task two on the  second
                     node,  and  task  three  on  the  third  node.  Note that when SelectType is
                     select/cons_res, the same number of CPUs may not be allocated on each  node.
                     Task  distribution  will be round-robin among all the nodes with CPUs yet to
                     be assigned to tasks.  Cyclic distribution is the default  behavior  if  the
                     number of tasks is no larger than the number of allocated nodes.

              plane  The  tasks  are  distributed  in  blocks  of  a specified size.  The options
                     include a number representing the size of the task block.  This is  followed
                     by  an optional specification of the task distribution scheme within a block
                     of tasks and between the blocks of tasks.  The number of  tasks  distributed
                     to  each  node  is  the  same  as  for  cyclic distribution, but the taskids
                     assigned to each node depend on the plane size. For more details  (including
                     examples and diagrams), please see
                     http://slurm.schedmd.com/mc_support.html
                     and
                     http://slurm.schedmd.com/dist_plane.html

              arbitrary
                     The arbitrary method of distribution will allocate processes in-order as
                     listed in the file designated by the environment variable SLURM_HOSTFILE.
                     If this variable is set it will override any other method specified.  If
                     not set the method will default to block.  The hostfile must contain at
                     minimum the number of hosts requested, listed one per line or comma
                     separated.  If specifying a task count (-n, --ntasks=<number>), your tasks
                     will be laid out on the nodes in the order of the file.
                     NOTE:  The  arbitrary  distribution option on a job allocation only controls
                     the nodes to be allocated to the job and not the allocation of CPUs on those
                     nodes. This option is meant primarily to control a job step's task layout in
                     an existing job allocation for the srun command.
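              The block and cyclic examples above (three nodes with two cpus each, four tasks)
              can be reproduced with a simplified model. This is only a sketch and not Slurm's
              actual algorithm: the function names are hypothetical, and it assumes that block
              distribution places the same number of tasks on each node as cyclic distribution
              but numbers them consecutively, which is what the examples in the text imply.

```python
def cyclic_layout(cpus_per_node, ntasks):
    """Assign task ids (1-based) round-robin over nodes with a free CPU."""
    layout = [[] for _ in cpus_per_node]
    free = list(cpus_per_node)
    task = 1
    while task <= ntasks:
        placed = False
        for node in range(len(free)):
            if task > ntasks:
                break
            if free[node] > 0:
                layout[node].append(task)
                free[node] -= 1
                task += 1
                placed = True
        if not placed:
            raise ValueError("more tasks than available CPUs")
    return layout


def block_layout(cpus_per_node, ntasks):
    """Same per-node task counts as cyclic, but consecutive ids per node.

    Assumption made for this sketch, chosen to match the man page example.
    """
    counts = [len(tasks) for tasks in cyclic_layout(cpus_per_node, ntasks)]
    layout, next_task = [], 1
    for count in counts:
        layout.append(list(range(next_task, next_task + count)))
        next_task += count
    return layout


nodes = [2, 2, 2]  # three nodes, two cpus each
print(block_layout(nodes, 4))   # [[1, 2], [3], [4]]
print(cyclic_layout(nodes, 4))  # [[1, 4], [2], [3]]
```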

              Second distribution method (distribution of CPUs across sockets for binding):

              *      Use the default method for distributing CPUs across sockets (cyclic).

              block  The block distribution method will distribute allocated  CPUs  consecutively
                     from the same socket for binding to tasks, before using the next consecutive
                     socket.

              cyclic The cyclic distribution method will distribute allocated CPUs for binding to
                     a  given  task  consecutively  from  the  same  socket,  and  from  the next
                     consecutive socket for the  next  task,  in  a  round-robin  fashion  across
                     sockets.

              fcyclic
                     The  fcyclic  distribution method will distribute allocated CPUs for binding
                     to tasks from consecutive  sockets  in  a  round-robin  fashion  across  the
                     sockets.
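              The difference between the three socket-level methods is easier to see on a
              small case. The sketch below is a simplified model, not Slurm's implementation:
              bind_cpus is a hypothetical function, CPU ids are invented for the example, and
              the spill-over behavior when a socket runs out of CPUs is an assumption.

```python
def bind_cpus(sockets, ntasks, cpus_per_task, method):
    """Return the list of CPU ids bound to each task.

    sockets is a list of per-socket CPU id lists, e.g. [[0, 1], [2, 3]].
    """
    free = [list(cpus) for cpus in sockets]
    nsock = len(free)
    cursor = 0  # next socket to try, used by fcyclic only
    result = []
    for task in range(ntasks):
        cpus = []
        if method == "block":
            # Drain sockets in order, consecutive CPUs first.
            for sock in free:
                while sock and len(cpus) < cpus_per_task:
                    cpus.append(sock.pop(0))
        elif method == "cyclic":
            # Each task starts on the next socket and takes consecutive
            # CPUs from it (spilling over only if it is exhausted).
            sock, tried = task % nsock, 0
            while len(cpus) < cpus_per_task and tried < nsock:
                while free[sock] and len(cpus) < cpus_per_task:
                    cpus.append(free[sock].pop(0))
                sock, tried = (sock + 1) % nsock, tried + 1
        elif method == "fcyclic":
            # Every single CPU is taken from the next socket in turn.
            misses = 0
            while len(cpus) < cpus_per_task and misses < nsock:
                if free[cursor]:
                    cpus.append(free[cursor].pop(0))
                    misses = 0
                else:
                    misses += 1
                cursor = (cursor + 1) % nsock
        else:
            raise ValueError(f"unknown method: {method}")
        if len(cpus) < cpus_per_task:
            raise ValueError("not enough free CPUs")
        result.append(cpus)
    return result


two_sockets = [[0, 1, 2, 3], [4, 5, 6, 7]]
for method in ("block", "cyclic", "fcyclic"):
    print(method, bind_cpus(two_sockets, 2, 2, method))
```

              With two tasks of two CPUs each, this model gives [[0, 1], [2, 3]] for block,
              [[0, 1], [4, 5]] for cyclic and [[0, 4], [1, 5]] for fcyclic.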

              Third distribution method (distribution of CPUs across cores for binding):

              *      Use  the  default  method for distributing CPUs across cores (inherited from
                     second distribution method).

              block  The block distribution method will distribute allocated  CPUs  consecutively
                     from  the  same core for binding to tasks, before using the next consecutive
                     core.

              cyclic The cyclic distribution method will distribute allocated CPUs for binding to
                     a given task consecutively from the same core, and from the next consecutive
                     core for the next task, in a round-robin fashion across cores.

              fcyclic
                     The fcyclic distribution method will distribute allocated CPUs  for  binding
                     to tasks from consecutive cores in a round-robin fashion across the cores.

              Optional control for task distribution over nodes:

              Pack   Rather than evenly distributing a job step's tasks across its allocated
                     nodes, pack them as tightly as possible on the nodes.

              NoPack Rather than packing a job step's tasks as tightly as possible on the  nodes,
                     distribute   them   evenly.    This   user   option   will   supersede   the
                     SelectTypeParameters CR_Pack_Nodes configuration parameter.

       --mail-type=<type>
              Notify user by email when certain event types occur.  Valid type values  are  NONE,
              BEGIN,  END,  FAIL,  REQUEUE,  ALL  (equivalent  to  BEGIN, END, FAIL, REQUEUE, and
              STAGE_OUT), STAGE_OUT (burst buffer stage out completed), TIME_LIMIT, TIME_LIMIT_90
              (reached  90  percent  of  time  limit),  TIME_LIMIT_80 (reached 80 percent of time
              limit), and TIME_LIMIT_50 (reached 50 percent of time limit).  Multiple type values
              may  be  specified in a comma separated list.  The user to be notified is indicated
              with --mail-user.
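              The relationship between ALL and the individual event types can be written out
              as a small expansion routine. This is an illustrative sketch only:
              expand_mail_types is a hypothetical helper, and treating the names as case
              sensitive is an assumption of the sketch, not a statement about srun.

```python
# Event types that ALL stands for, as listed in the text above.
ALL_EXPANSION = {"BEGIN", "END", "FAIL", "REQUEUE", "STAGE_OUT"}

VALID_TYPES = ALL_EXPANSION | {
    "NONE", "ALL", "TIME_LIMIT",
    "TIME_LIMIT_90", "TIME_LIMIT_80", "TIME_LIMIT_50",
}


def expand_mail_types(spec):
    """Expand a comma separated --mail-type value into a set of events."""
    events = set()
    for name in spec.split(","):
        name = name.strip()
        if name not in VALID_TYPES:
            raise ValueError(f"unknown mail type: {name!r}")
        if name == "ALL":
            events |= ALL_EXPANSION
        else:
            events.add(name)
    return events


print(sorted(expand_mail_types("ALL,TIME_LIMIT_90")))
```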

       --mail-user=<user>
              User to receive email notification of state changes as defined by --mail-type.  The
              default value is the submitting user.

       --mem=<MB>
              Specify  the  real  memory  required  per  node  in  MegaBytes.   Default  value is
              DefMemPerNode and the maximum value is MaxMemPerNode. If configured, both
              parameters can be seen using the scontrol show config command.  This parameter
              would   generally   be   used   if   whole   nodes   are    allocated    to    jobs
              (SelectType=select/linear).   Specifying a memory limit of zero for a job step will
              restrict the job step to the amount of memory allocated to the job, but not  remove
              any  of  the job's memory allocation from being available to other job steps.  Also
              see --mem-per-cpu.  --mem and --mem-per-cpu are mutually exclusive.  NOTE: A memory
              size specification of zero is treated as a special case and grants the job access
              to all of the memory on each node.  NOTE: Enforcement of memory limits currently
              relies upon the task/cgroup plugin or enabling of accounting, which samples memory
              use on a periodic basis (data need not be stored, just collected). In both cases
              memory use is based upon the job's Resident Set Size (RSS). A task may exceed the
              memory limit until the next periodic accounting sample.

       --mem-per-cpu=<MB>
              Minimum  memory  required  per  allocated  CPU  in  MegaBytes.   Default  value  is
              DefMemPerCPU and the maximum value is MaxMemPerCPU (see exception below). If
              configured, both parameters can be seen using the scontrol show config command.
              Note  that  if  the  job's --mem-per-cpu value exceeds the configured MaxMemPerCPU,
              then the user's limit will be treated as a memory  limit  per  task;  --mem-per-cpu
              will be reduced to a value no larger than MaxMemPerCPU; --cpus-per-task will be set
              and the value of --cpus-per-task multiplied by the  new  --mem-per-cpu  value  will
              equal the original --mem-per-cpu value specified by the user.  This parameter would
              generally   be   used   if   individual   processors   are   allocated   to    jobs
              (SelectType=select/cons_res).  If resources are allocated by the core, socket or
              whole nodes, the number of CPUs allocated to a job may be higher than the task
              count  and the value of --mem-per-cpu should be adjusted accordingly.  Specifying a
              memory limit of zero for a job step will restrict the job step  to  the  amount  of
              memory allocated to the job, but not remove any of the job's memory allocation from
              being available to other job steps.  Also see --mem.  --mem and  --mem-per-cpu  are
              mutually exclusive.
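              The adjustment applied when --mem-per-cpu exceeds MaxMemPerCPU can be sketched
              numerically. This is a rough model under stated assumptions, not Slurm's code:
              adjust_mem_per_cpu is a hypothetical function, it assumes one CPU per task
              before the adjustment, and it rounds up, so the product equals the original
              request only when the request divides evenly.

```python
def adjust_mem_per_cpu(requested_mb, max_mem_per_cpu_mb):
    """Return (mem_per_cpu, cpus_per_task) after applying the limit.

    The original request is treated as a per-task memory limit and is
    spread over enough CPUs that each share fits under the maximum.
    """
    if requested_mb <= max_mem_per_cpu_mb:
        return requested_mb, 1  # no adjustment needed
    # Ceiling division: smallest CPU count whose shares fit the limit.
    cpus_per_task = -(-requested_mb // max_mem_per_cpu_mb)
    mem_per_cpu = -(-requested_mb // cpus_per_task)
    return mem_per_cpu, cpus_per_task


mem, cpus = adjust_mem_per_cpu(8192, 2048)
print(mem, cpus, mem * cpus)  # 2048 4 8192
```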

       --mem_bind=[{quiet,verbose},]type
              Bind  tasks  to  memory. Used only when the task/affinity plugin is enabled and the
              NUMA memory functions are available.  Note that the resolution of  CPU  and  memory
              binding may differ on some architectures. For example, CPU binding may be performed
              at the level of the cores within a processor while memory binding will be performed
              at  the  level  of nodes, where the definition of "nodes" may differ from system to
              system. The use of any type other than "none" or "local" is  not  recommended.   If
              you  want  greater  control,  try  running  a  simple  test  code  with the options
              "--cpu_bind=verbose,none  --mem_bind=verbose,none"  to   determine   the   specific
              configuration.

              NOTE:  To  have Slurm always report on the selected memory binding for all commands
              executed in a shell, you can enable verbose  mode  by  setting  the  SLURM_MEM_BIND
              environment variable value to "verbose".

              The  following  informational  environment  variables are set when --mem_bind is in
              use:

                   SLURM_MEM_BIND_VERBOSE
                   SLURM_MEM_BIND_TYPE
                   SLURM_MEM_BIND_LIST

              See the ENVIRONMENT VARIABLES section  for  a  more  detailed  description  of  the
              individual SLURM_MEM_BIND* variables.

              Supported options include:

              q[uiet]
                     quietly bind before task runs (default)

              v[erbose]
                     verbosely report binding before task runs

              no[ne] don't bind tasks to memory (default)

              rank   bind by task rank (not recommended)

              local  Use memory local to the processor in use

              map_mem:<list>
                     bind by mapping a node's memory to tasks as specified where <list> is
                     <cpuid1>,<cpuid2>,...<cpuidN>.  CPU IDs are interpreted as decimal values
                     unless they are preceded with '0x' in which case they are interpreted as
                     hexadecimal values (not recommended)

              mask_mem:<list>
                     bind by setting memory masks on tasks as specified where <list> is
                     <mask1>,<mask2>,...<maskN>.  Memory masks are always interpreted as
                     hexadecimal values.  Note that masks must be preceded with a '0x' if they
                     don't begin with [0-9] so they are seen as numerical values by srun.

              help   show this help message
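              The number-base rules for map_mem and mask_mem lists differ, which is easy to
              get wrong. The following sketch shows the two interpretations side by side;
              the function names are hypothetical and the code only mirrors the rules stated
              above, it does not perform any binding.

```python
def parse_map_mem(spec):
    """IDs are decimal unless prefixed with '0x', then hexadecimal."""
    ids = []
    for item in spec.split(","):
        base = 16 if item.startswith("0x") else 10
        ids.append(int(item, base))
    return ids


def parse_mask_mem(spec):
    """Masks are always hexadecimal, with or without a '0x' prefix."""
    return [int(item, 16) for item in spec.split(",")]


print(parse_map_mem("0,10,0x10"))   # [0, 10, 16]
print(parse_mask_mem("10,0x3,0xc")) # [16, 3, 12]
```

              Note that the same text "10" yields 10 in a map_mem list but 16 in a mask_mem
              list.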

       --mincpus=<n>
              Specify a minimum number of logical cpus/processors per node.

       --msg-timeout=<seconds>
              Modify  the job launch message timeout.  The default value is MessageTimeout in the
              Slurm  configuration  file  slurm.conf.   Changes  to  this   are   typically   not
              recommended, but could be useful to diagnose problems.

       --mpi=<mpi_type>
              Identify the type of MPI to be used. May result in unique initiation procedures.

              list   Lists available mpi types to choose from.

              lam    Initiates  one 'lamd' process per node and establishes necessary environment
                     variables for LAM/MPI.

              mpich1_shmem
                     Initiates  one  process  per  node  and  establishes  necessary  environment
                     variables for mpich1 shared memory model.  This also works for mvapich built
                     for shared memory.

              mpichgm
                     For use with Myrinet.

              mvapich
                     For use with Infiniband.

              openmpi
                     For use with OpenMPI.

              pmi2   To enable PMI2 support. The PMI2 support in Slurm  works  only  if  the  MPI
                     implementation supports it, in other words if the MPI has the PMI2 interface
                     implemented. The --mpi=pmi2  will  load  the  library  lib/slurm/mpi_pmi2.so
                     which  provides  the  server  side  functionality  but  the client side must
                     implement PMI2_Init() and the other interface calls.

              none   No special MPI processing. This is the default and  works  with  many  other
                     versions of MPI.

       --multi-prog
              Run  a  job  with different programs and different arguments for each task. In this
              case, the executable program specified is actually a configuration file  specifying
              the  executable  and  arguments  for  each task. See MULTIPLE PROGRAM CONFIGURATION
              below for details on the configuration file contents.

       -N, --nodes=<minnodes[-maxnodes]>
              Request that a minimum of minnodes nodes be allocated to this job.  A maximum  node
              count  may  also be specified with maxnodes.  If only one number is specified, this
              is used as both the minimum and maximum node count.  The  partition's  node  limits
              supersede  those  of  the  job.   If  a  job's node limits are outside of the range
              permitted for its associated partition, the job will be left in  a  PENDING  state.
              This  permits  possible  execution  at  a  later  time, when the partition limit is
              changed.  If a job node limit  exceeds  the  number  of  nodes  configured  in  the
              partition,   the  job  will  be  rejected.   Note  that  the  environment  variable
              SLURM_JOB_NUM_NODES (and SLURM_NNODES for backwards compatibility) will be  set  to
              the  count  of  nodes  actually allocated to the job. See the ENVIRONMENT VARIABLES
              section for more information.  If -N is not specified, the default behavior  is  to
              allocate  enough  nodes  to satisfy the requirements of the -n and -c options.  The
              job will be allocated as many nodes as possible  within  the  range  specified  and
              without  delaying  the  initiation  of  the  job.  The node count specification may
              include a numeric value followed by a suffix of "k" (multiplies  numeric  value  by
              1,024) or "m" (multiplies numeric value by 1,048,576).
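              The accepted forms of the node count (a single number, a min-max range, and the
              "k" and "m" suffixes) can be summarized with a short parser. This is a sketch
              for illustration; parse_nodes and parse_count are hypothetical names and the
              error handling is an assumption.

```python
SUFFIXES = {"k": 1024, "m": 1048576}


def parse_count(text):
    """Parse a number with an optional 'k' or 'm' multiplier suffix."""
    if text[-1:] in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1]]
    return int(text)


def parse_nodes(spec):
    """Return (minnodes, maxnodes) for a -N style specification."""
    low, sep, high = spec.partition("-")
    minnodes = parse_count(low)
    # A single number serves as both the minimum and the maximum.
    maxnodes = parse_count(high) if sep else minnodes
    if maxnodes < minnodes:
        raise ValueError("maximum node count is below the minimum")
    return minnodes, maxnodes


print(parse_nodes("2-4"))  # (2, 4)
print(parse_nodes("8"))    # (8, 8)
print(parse_nodes("1k"))   # (1024, 1024)
```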

       -n, --ntasks=<number>
              Specify the number of tasks to run. Request that srun allocate resources for ntasks
              tasks.  The default is one task per node, but note that the --cpus-per-task  option
              will change this default.

       --network=<type>
              Specify  information  pertaining  to  the switch or network.  The interpretation of
              type is system dependent.  This option is supported when running Slurm  on  a  Cray
              natively.  It is used to request using Network Performance Counters.  Only one value
              per request is valid.  All options are case insensitive.  In this configuration
              supported values include:

              system
                    Use  the  system-wide network performance counters. Only nodes requested will
                    be marked in use for the job allocation.  If the job does  not  fill  up  the
                    entire  system  the  rest  of the nodes are not able to be used by other jobs
                    using NPC, if idle their state will appear  as  PerfCnts.   These  nodes  are
                    still available for other jobs not using NPC.

              blade Use  the  blade  network  performance  counters. Only nodes requested will be
                    marked in use for the job allocation.  If the job does not fill up the entire
                    blade(s) allocated to the job those blade(s) are not able to be used by other
                    jobs using NPC, if idle their state will appear as PerfCnts.  These nodes are
                    still available for other jobs not using NPC.

              In all cases the job or step allocation request must specify the
              --exclusive option.  Otherwise the request will be denied.

              Also  with any of these options steps are not allowed to share blades, so resources
              would remain idle inside an allocation if the step running on a blade does not take
              up all the nodes on the blade.

              The  network  option  is  also supported on systems with IBM's Parallel Environment
              (PE).  See IBM's LoadLeveler job command keyword documentation  about  the  keyword
              "network"  for  more  information.   Multiple  values  may  be specified in a comma
              separated list.  All options are case insensitive.  Supported values include:

              BULK_XFER[=<resources>]
                          Enable bulk transfer of data using Remote Direct-Memory Access  (RDMA).
                          The  optional resources specification is a numeric value which can have
                          a suffix of "k", "K", "m", "M", "g" or "G" for kilobytes, megabytes  or
                          gigabytes.   NOTE:  The resources specification is not supported by the
                          underlying IBM infrastructure as of Parallel  Environment  version  2.2
                          and  no  value should be specified at this time.  The devices allocated
                          to a job must all be of the same type.  The default value depends upon
                          what hardware is available and in order of preferences is
                          IPONLY (which is not considered in User Space mode), HFI, IB, HPCE, and
                          KMUX.

              CAU=<count> Number  of  Collective Acceleration Units (CAU) required.  Applies only
                          to IBM Power7-IH processors.  Default value is zero.   Independent  CAU
                          will be allocated for each programming interface (MPI, LAPI, etc.)

              DEVNAME=<name>
                          Specify  the  device  name  to  use  for communications (e.g. "eth0" or
                          "mlx4_0").

              DEVTYPE=<type>
                          Specify the device type  to  use  for  communications.   The  supported
                          values   of   type  are:  "IB"  (InfiniBand),  "HFI"  (P7  Host  Fabric
                          Interface), "IPONLY" (IP-Only interfaces), "HPCE" (HPC  Ethernet),  and
                          "KMUX" (Kernel Emulation of HPCE).  The devices allocated to a job must
                          all be of the same type.  The default value depends upon
                          what hardware is available and in order of preferences is IPONLY (which
                          is not considered in User Space mode), HFI, IB, HPCE, and KMUX.

              IMMED=<count>
                          Number of immediate send slots per window required.   Applies  only  to
                          IBM Power7-IH processors.  Default value is zero.

              INSTANCES=<count>
                          Specify  number  of  network  connections for each task on each network
                          connection.  The default instance count is 1.

              IPV4        Use Internet Protocol (IP) version 4 communications (default).

              IPV6        Use Internet Protocol (IP) version 6 communications.

              LAPI        Use the LAPI programming interface.

              MPI         Use the MPI programming interface.  MPI is the default interface.

              PAMI        Use the PAMI programming interface.

              SHMEM       Use the OpenSHMEM programming interface.

              SN_ALL      Use all available switch networks (default).

              SN_SINGLE   Use one available switch network.

              UPC         Use the UPC programming interface.

              US          Use User Space communications.

              Some examples of network specifications:

              Instances=2,US,MPI,SN_ALL
                          Create two user space  connections  for  MPI  communications  on  every
                          switch network for each task.

              US,MPI,Instances=3,Devtype=IB
                          Create  three  user  space  connections for MPI communications on every
                          InfiniBand network for each task.

              IPV4,LAPI,SN_Single
                          Create an IP version 4 connection for LAPI communications on one switch
                          network for each task.

              Instances=2,US,LAPI,MPI
                          Create  two user space connections each for LAPI and MPI communications
                          on every switch network for each task. Note that SN_ALL is the  default
                          option  so  every  switch  network  is used. Also note that Instances=2
                          specifies that two connections are established for each protocol  (LAPI
                          and  MPI)  and  each task.  If there are two networks and four tasks on
                          the node then a total of 32 connections are established (2 instances  x
                          2 protocols x 2 networks x 4 tasks).
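
              The arithmetic in the last example can be checked with a short sketch.  The
              function name total_connections is hypothetical and is not part of Slurm; it
              only multiplies the four factors named above.

```python
def total_connections(instances: int, protocols: int, networks: int, tasks: int) -> int:
    """Connections on one node = instances x protocols x networks x tasks."""
    return instances * protocols * networks * tasks


# Instances=2,US,LAPI,MPI with two switch networks and four tasks on the node.
print(total_connections(instances=2, protocols=2, networks=2, tasks=4))  # prints 32
```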

       --nice[=adjustment]
              Run  the job with an adjusted scheduling priority within Slurm.  With no adjustment
              value the scheduling priority is decreased by 100. The  adjustment  range  is  from
              -10000  (highest  priority)  to  10000 (lowest priority). Only privileged users can
              specify  a  negative  adjustment.  NOTE:  This  option  is  presently  ignored   if
              SchedulerType=sched/wiki or SchedulerType=sched/wiki2.
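
              The rules above can be summarized in a minimal sketch.  The helper resolve_nice
              and its privileged flag are hypothetical names used for illustration; this is
              not how srun itself is implemented.

```python
def resolve_nice(adjustment=None, privileged=False):
    """Return the nice adjustment that would apply, following the rules in the text."""
    if adjustment is None:
        # --nice given with no value: priority is decreased by 100.
        return 100
    if not -10000 <= adjustment <= 10000:
        raise ValueError("adjustment must be between -10000 and 10000")
    if adjustment < 0 and not privileged:
        # Only privileged users can specify a negative adjustment.
        raise PermissionError("negative adjustment requires a privileged user")
    return adjustment
```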

       --ntasks-per-core=<ntasks>
              Request the maximum ntasks be invoked on each core.  This option applies to the job
              allocation, but not to step allocations.   Meant  to  be  used  with  the  --ntasks
              option.   Related to --ntasks-per-node except at the core level instead of the node
              level.  Masks will automatically be generated to bind the tasks to specific cores
              unless  --cpu_bind=none  is  specified.   NOTE: This option is not supported unless
              SelectTypeParameters=CR_Core or SelectTypeParameters=CR_Core_Memory is configured.

       --ntasks-per-node=<ntasks>
              Request that ntasks be invoked on each node.  If used with the --ntasks option, the
              --ntasks option will take precedence and the --ntasks-per-node will be treated as a
              maximum count of tasks per node.  Meant to be used with the --nodes  option.   This
              is  related  to --cpus-per-task=ncpus, but does not require knowledge of the actual
              number of cpus on each node.  In some cases, it is more convenient to  be  able  to
              request  that  no  more  than  a  specific number of tasks be invoked on each node.
              Examples of this include submitting a hybrid MPI/OpenMP  app  where  only  one  MPI
              "task/rank"  should  be  assigned to each node while allowing the OpenMP portion to
              utilize all of the  parallelism  present  in  the  node,  or  submitting  a  single
              setup/cleanup/monitoring  job to each node of a pre-existing allocation as one step
              in a larger job script.

       --ntasks-per-socket=<ntasks>
              Request the maximum ntasks be invoked on each socket.  This option applies  to  the
              job  allocation,  but  not to step allocations.  Meant to be used with the --ntasks
              option.  Related to --ntasks-per-node except at the socket  level  instead  of  the
              node  level.   Masks  will automatically be generated to bind the tasks to specific
              sockets unless --cpu_bind=none is specified.  NOTE: This option  is  not  supported
              unless  SelectTypeParameters=CR_Socket  or SelectTypeParameters=CR_Socket_Memory is
              configured.

       -O, --overcommit
              Overcommit resources.  When applied to job allocation, only one CPU is allocated to
              the  job per node and options used to specify the number of tasks per node, socket,
              core, etc.  are ignored.  When applied to job step allocations  (the  srun  command
              when executed within an existing job allocation), this option can be used to launch
              more than one task per CPU.  Normally, srun will not allocate more than one process
              per  CPU.   By  specifying  --overcommit  you are explicitly allowing more than one
              process per CPU. However, no more than MAX_TASKS_PER_NODE tasks are permitted to
              execute per node.  NOTE: MAX_TASKS_PER_NODE is defined in the file slurm.h and is
              not a variable; it is set at Slurm build time.

       -o, --output=<mode>
              Specify the mode for stdout redirection.  By  default  in  interactive  mode,  srun
              collects  stdout  from  all  tasks and sends this output via TCP/IP to the attached
              terminal. With --output stdout may be redirected to a file, to one file  per  task,
              or  to  /dev/null.  See section IO Redirection below for the various forms of mode.
              If the specified file already exists, it will be overwritten.

              If --error is not also specified on the command line, both stdout and stderr will
              be directed to the file specified by --output.

       --open-mode=<append|truncate>
              Open  the  output  and error files using append or truncate mode as specified.  The
              default value is specified by the system configuration parameter JobFileAppend.

       -p, --partition=<partition_names>
              Request a specific partition for the resource allocation.  If  not  specified,  the
              default  behavior  is to allow the slurm controller to select the default partition
              as designated by the system administrator.  If  the  job  can  use  more  than  one
              partition, specify their names in a comma-separated list and the one offering
              earliest initiation will be used  with  no  regard  given  to  the  partition  name
              ordering  (although higher priority partitions will be considered first).  When the
              job is initiated, the name of the partition used will be placed first  in  the  job
              record partition string.

       --power=<flags>
              Comma-separated list of power management plugin options.  Currently available flags
              include: level (all nodes allocated to the job should have  identical  power  caps,
              may be disabled by the Slurm configuration option PowerParameters=job_no_level).

       --priority=<value>
              Request  a  specific  job  priority.   May  be  subject  to  configuration specific
              constraints.  Only Slurm operators and administrators can set  the  priority  of  a
              job.

       --profile=<all|none|[energy[,|task[,|filesystem[,|network]]]]>
              Enables detailed data collection by the acct_gather_profile plugin.  Detailed data
              are typically time-series that are stored in an HDF5 file for the job.

              All       All data types are collected. (Cannot be combined with other values.)

              None      No data types are collected. This is the default.
                        (Cannot be combined with other values.)

              Energy    Energy data is collected.

              Task      Task (I/O, Memory, ...) data is collected.

              Filesystem
                        Filesystem data is collected.

              Network   Network (InfiniBand) data is collected.

       --prolog=<executable>
              srun will run executable just before launching the  job  step.   The  command  line
              arguments  for  executable  will  be the command and arguments of the job step.  If
              executable is "none", then no srun prolog will be run. This parameter overrides the
              SrunProlog  parameter  in slurm.conf. This parameter is completely independent from
              the Prolog parameter in slurm.conf.

       --propagate[=rlimits]
              Allows users to specify which of the modifiable (soft) resource limits to propagate
              to  the  compute  nodes and apply to their jobs.  If rlimits is not specified, then
              all resource limits will be propagated.  The following rlimit names  are  supported
              by Slurm (although some options may not be supported on some systems):

              ALL       All limits listed below

              AS        The maximum address space for a process

              CORE      The maximum size of core file

              CPU       The maximum amount of CPU time

              DATA      The maximum size of a process's data segment

              FSIZE     The  maximum  size  of files created. Note that if the user sets FSIZE to
                        less than the current size of the slurmd.log, job launches will fail with
                        a 'File size limit exceeded' error.

              MEMLOCK   The maximum size that may be locked into memory

              NOFILE    The maximum number of open files

              NPROC     The maximum number of processes available

              RSS       The maximum resident set size

              STACK     The maximum stack size
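
              To see which soft limits a given rlimits value would refer to on the submitting
              host, a sketch using Python's standard resource module follows.  The helper
              soft_limits is hypothetical, and treating rlimits as a comma-separated list of
              the names above is an assumption not stated in this section.

```python
import resource

# Names listed above, without the ALL shorthand.
RLIMIT_NAMES = ["AS", "CORE", "CPU", "DATA", "FSIZE",
                "MEMLOCK", "NOFILE", "NPROC", "RSS", "STACK"]


def soft_limits(spec="ALL"):
    """Map each requested rlimit name to the current soft limit of this process."""
    if spec.upper() == "ALL":
        names = RLIMIT_NAMES
    else:
        # Assumption: names are separated by commas.
        names = [n.strip().upper() for n in spec.split(",")]
    limits = {}
    for name in names:
        if name not in RLIMIT_NAMES:
            raise ValueError(f"unknown rlimit name: {name}")
        const = getattr(resource, "RLIMIT_" + name, None)
        if const is None:
            continue  # this limit is not supported on this system
        soft, _hard = resource.getrlimit(const)
        limits[name] = soft
    return limits
```

              For example, soft_limits("NOFILE,STACK") returns a dictionary with the current
              soft limits for open files and stack size.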

       --pty  Execute   task  zero  in  pseudo  terminal  mode.   Implicitly  sets  --unbuffered.
              Implicitly sets --error and --output to /dev/null for all tasks except  task  zero,
              which  may  cause  those tasks to exit immediately (e.g. shells will typically exit
              immediately in that situation).  Not currently supported on AIX platforms.

       -Q, --quiet
              Suppress informational messages from srun. Errors will still be displayed.

       -q, --quit-on-interrupt
              Quit immediately on single SIGINT (Ctrl-C). Use of this option disables the  status
              feature  normally  available  when srun receives a single Ctrl-C and causes srun to
              instead immediately terminate the running job.

       --qos=<qos>
              Request a quality of service for the job.  QOS  values  can  be  defined  for  each
              user/cluster/account  association  in the Slurm database.  Users will be limited to
              their association's defined set of qos's when the  Slurm  configuration  parameter,
              AccountingStorageEnforce, includes "qos" in its definition.

       -r, --relative=<n>
              Run  a  job  step relative to node n of the current allocation.  This option may be
              used to spread several job steps out among the nodes of the current job. If  -r  is
              used,  the  current  job step will begin at node n of the allocated nodelist, where
              the first node is considered node 0.  The -r option is not permitted with the -w or -x
              option  and will result in a fatal error when not running within a prior allocation
              (i.e. when SLURM_JOB_ID is not set). The default for  n  is  0.  If  the  value  of
              --nodes  exceeds  the  number  of  nodes  identified  with the --relative option, a
              warning message will be printed and the --relative option will take precedence.

       --reboot
              Force the allocated nodes  to  reboot  before  starting  the  job.   This  is  only
              supported with some system configurations and will otherwise be silently ignored.

       --resv-ports
              Reserve communication ports for this job. Users can specify the number of ports
              they want to reserve. The parameter MpiParams=ports=12000-12999 must be specified
              in slurm.conf. If not specified, the default number of reserved ports equals the
              number of tasks. If the number of reserved ports is zero, no ports are reserved.
              Used for OpenMPI.

       --reservation=<name>
              Allocate resources for the job from the named reservation.

       --restart-dir=<directory>
              Specifies  the directory from which the job or job step's checkpoint should be read
              (used by the checkpoint/blcrm and checkpoint/xlch plugins only).

       -s, --share
              The job allocation can share resources with other running jobs.  The  resources  to
              be   shared   can   be  nodes,  sockets,  cores,  or  hyperthreads  depending  upon
              configuration.  The default shared behavior depends on system configuration and the
              partition's  Shared option takes precedence over the job's option.  This option may
              result in the allocation being granted sooner than if the --share  option  was  not
              set  and  allow  higher system utilization, but application performance will likely
              suffer due to competition for resources.  Also see the --exclusive option.

       -S, --core-spec=<num>
              Count of specialized cores per node reserved by the job for system  operations  and
              not  used by the application. The application will not use these cores, but will be
              charged  for  their  allocation.   Default  value  is  dependent  upon  the  node's
              configured  CoreSpecCount  value.   If  a value of zero is designated and the Slurm
              configuration option AllowSpecResourcesUsage is enabled, the job will be allowed to
              override  CoreSpecCount and use the specialized resources on nodes it is allocated.
              This option can not be used with the --thread-spec option.

       --sicp Identify a job as one which jobs submitted to other clusters can be dependent upon.

       --signal=<sig_num>[@<sig_time>]
              When a job is within sig_time seconds of its end time, send it the signal  sig_num.
              Due  to  the resolution of event handling by Slurm, the signal may be sent up to 60
              seconds earlier than specified.  sig_num may either be  a  signal  number  or  name
              (e.g.  "10"  or  "USR1").  sig_time must have an integer value between 0 and 65535.
              By default, no signal is sent before the job's end time.  If a sig_num is specified
              without any sig_time, the default time will be 60 seconds.
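
              The argument syntax can be illustrated with a small parser.  This is a sketch of
              the rules stated above only; parse_signal_spec is a hypothetical helper, and it
              does not check that the signal name or number is valid.

```python
def parse_signal_spec(spec):
    """Split '<sig_num>[@<sig_time>]' into (sig_num, sig_time_seconds)."""
    sig, sep, when = spec.partition("@")
    if not sig:
        raise ValueError("a signal number or name is required")
    # Without @<sig_time>, the default time is 60 seconds.
    seconds = int(when) if sep else 60
    if not 0 <= seconds <= 65535:
        raise ValueError("sig_time must be between 0 and 65535")
    return sig, seconds
```

              For example, parse_signal_spec("USR1@120") yields ("USR1", 120), and
              parse_signal_spec("10") yields ("10", 60).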

       --slurmd-debug=<level>
              Specify a debug level for slurmd(8). The level may be specified either as an integer
              value between 0 [quiet, only errors are displayed] and 4 [verbose operation] or the
              SlurmdDebug tags.

              quiet     Log nothing

              fatal     Log only fatal errors

              error     Log only errors

              info      Log errors and general informational messages

              verbose   Log errors and verbose informational messages

              The slurmd debug information is copied onto the stderr of
              the job. By default only errors are displayed.

       --sockets-per-node=<sockets>
              Restrict  node  selection  to  nodes with at least the specified number of sockets.
              See additional information under -B  option  above  when  task/affinity  plugin  is
              enabled.

       --switches=<count>[@<max-time>]
              When  a  tree  topology is used, this defines the maximum count of switches desired
              for the job allocation and optionally the maximum time to wait for that  number  of
              switches.  If  Slurm  finds  an  allocation containing more switches than the count
              specified, the job remains pending until it either finds an allocation with desired
              switch count or the time limit expires.  If there is no switch count limit, there
              is no delay in starting  the  job.   Acceptable  time  formats  include  "minutes",
              "minutes:seconds",  "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
              "days-hours:minutes:seconds".  The job's maximum time delay may be limited  by  the
              system administrator using the SchedulerParameters configuration parameter with the
              max_switch_wait parameter option.  The default max-time is the max_switch_wait
              value in SchedulerParameters.

       -T, --threads=<nthreads>
              Allows  limiting the number of concurrent threads used to send the job request from
              the srun process to the slurmd processes on the allocated nodes. Default is to  use
              one  thread per allocated node up to a maximum of 60 concurrent threads. Specifying
              this option limits the number of concurrent threads to nthreads (less than or equal
              to  60).   This  should  only be used to set a low thread count for testing on very
              small memory computers.

       -t, --time=<time>
              Set a limit on the total run time of the job allocation.   If  the  requested  time
              limit  exceeds  the partition's time limit, the job will be left in a PENDING state
              (possibly indefinitely).  The default time limit is the  partition's  default  time
              limit.   When the time limit is reached, each task in each job step is sent SIGTERM
              followed by SIGKILL.  The interval  between  signals  is  specified  by  the  Slurm
              configuration  parameter  KillWait.   The OverTimeLimit configuration parameter may
              permit the job to run longer than scheduled.  Time resolution  is  one  minute  and
              second values are rounded up to the next minute.

              A  time  limit  of  zero  requests  that no time limit be imposed.  Acceptable time
              formats    include    "minutes",    "minutes:seconds",     "hours:minutes:seconds",
              "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".
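
              The accepted formats and the rounding rule can be sketched as follows.  The
              function parse_time_limit is a hypothetical illustration of the rules stated
              above, not Slurm's own parser.

```python
def parse_time_limit(spec):
    """Convert an accepted time string to whole minutes, rounding seconds up."""
    days = 0
    if "-" in spec:
        # days-hours, days-hours:minutes, days-hours:minutes:seconds
        day_part, rest = spec.split("-", 1)
        days = int(day_part)
        fields = [int(f) for f in rest.split(":")]
        if len(fields) > 3:
            raise ValueError(f"unrecognized time format: {spec}")
        fields += [0] * (3 - len(fields))
        hours, minutes, seconds = fields
    else:
        fields = [int(f) for f in spec.split(":")]
        if len(fields) == 1:        # minutes
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:      # minutes:seconds
            hours, minutes, seconds = 0, fields[0], fields[1]
        elif len(fields) == 3:      # hours:minutes:seconds
            hours, minutes, seconds = fields
        else:
            raise ValueError(f"unrecognized time format: {spec}")
    total_seconds = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    # Resolution is one minute; any leftover seconds round up.
    return -(-total_seconds // 60)
```

              With this sketch, "10:30" becomes 11 minutes, "1:30:00" becomes 90 minutes, and
              "1-12" becomes 2160 minutes.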

       --task-epilog=<executable>
              The  slurmstepd  daemon  will  run executable just after each task terminates. This
              will be executed before any TaskEpilog parameter in slurm.conf is executed. This is
              meant  to  be  a  very  short-lived  program. If it fails to terminate within a few
              seconds, it will be killed along with any descendant processes.

       --task-prolog=<executable>
              The slurmstepd daemon will run executable just before  launching  each  task.  This
              will be executed after any TaskProlog parameter in slurm.conf is executed.  Besides
              the normal environment variables, this has SLURM_TASK_PID available to identify the
              process  ID  of  the  task being started.  Standard output from this program of the
              form "export NAME=value" will be used to set environment  variables  for  the  task
              being spawned.
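
              How "export NAME=value" lines map to environment variables can be sketched as
              below.  The helper env_from_task_prolog is hypothetical; it handles only the
              form described above and ignores every other line of output.

```python
def env_from_task_prolog(stdout_text):
    """Collect NAME=value pairs from 'export NAME=value' lines of prolog output."""
    env = {}
    for line in stdout_text.splitlines():
        line = line.strip()
        if not line.startswith("export "):
            continue  # not of the documented form
        name, sep, value = line[len("export "):].partition("=")
        if sep and name.strip():
            env[name.strip()] = value
    return env
```

              A task prolog that prints "export SCRATCH=/tmp/job" would, under this reading,
              make SCRATCH=/tmp/job visible to the task being spawned.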

       --test-only
              Returns  an  estimate of when a job would be scheduled to run given the current job
              queue and all the other srun arguments specifying  the  job.   This  limits  srun's
              behavior  to  just return information; no job is actually submitted.  EXCEPTION: On
              Bluegene/Q systems when running within an existing job allocation, this disables
              the  use  of "runjob" to launch tasks. The program will be executed directly by the
              slurmd daemon.

       --thread-spec=<num>
              Count of specialized threads per node reserved by the job for system operations and
              not  used  by the application. The application will not use these threads, but will
              be charged for their allocation.  This option can not be used with the  --core-spec
              option.

       --threads-per-core=<threads>
              Restrict  node selection to nodes with at least the specified number of threads per
              core.  NOTE: "Threads" refers to the number of processing units on each core rather
              than  the  number  of  application  tasks  to be launched per core.  See additional
              information under -B option above when task/affinity plugin is enabled.

       --time-min=<time>
              Set a minimum time limit on the job allocation.  If specified,  the  job  may  have
              its --time limit lowered to a value no lower than --time-min if doing so permits
              the job to begin execution earlier than otherwise possible.  The job's  time  limit
              will  not  be changed after the job is allocated resources.  This is performed by a
              backfill scheduling algorithm to allocate resources otherwise reserved  for  higher
              priority  jobs.   Acceptable  time  formats  include  "minutes", "minutes:seconds",
              "hours:minutes:seconds",       "days-hours",        "days-hours:minutes"        and
              "days-hours:minutes:seconds".

       --tmp=<MB>
              Specify a minimum amount of temporary disk space.

       -u, --unbuffered
              By  default  the connection between slurmstepd and the user launched application is
              over a pipe. The stdio output written by the application is buffered by glibc
              until  it  is  flushed  or the output is set as unbuffered.  See setbuf(3). If this
              option is specified the tasks are executed with  a  pseudo  terminal  so  that  the
              application output is unbuffered.

       --usage
              Display brief help message and exit.

       --uid=<user>
              Attempt  to  submit  and/or  run a job as user instead of the invoking user id. The
              invoking user's credentials will be used to check access permissions for the target
              partition. User root may use this option to run jobs as a normal user in a RootOnly
              partition for example. If run as root, srun will drop its permissions  to  the  uid
              specified  after  node  allocation  is  successful.  user  may  be the user name or
              numerical user ID.

       -V, --version
              Display version information and exit.

       -v, --verbose
              Increase the verbosity  of  srun's  informational  messages.   Multiple  -v's  will
              further increase srun's verbosity.  By default only errors will be displayed.

       -W, --wait=<seconds>
              Specify  how  long  to  wait after the first task terminates before terminating all
              remaining tasks. A value of 0 indicates an unlimited wait (a warning will be issued
              after  60 seconds). The default value is set by the WaitTime parameter in the slurm
              configuration file (see slurm.conf(5)). This option can be useful to ensure that a
              job is terminated in a timely fashion in the event that one or more tasks terminate
              prematurely.  Note: The -K, --kill-on-bad-exit option  takes  precedence  over  -W,
              --wait to terminate the job immediately if a task exits with a non-zero exit code.

       -w, --nodelist=<host1,host2,... or filename>
              Request  a  specific  list  of  hosts.  The job will contain all of these hosts and
              possibly additional hosts as needed to satisfy resource requirements.  The list may
              be  specified as a comma-separated list of hosts, a range of hosts (host[1-5,7,...]
              for example), or a filename.  The host list will be assumed to be a filename if  it
              contains  a "/" character.  If you specify a minimum node or processor count larger
              than can be satisfied by the supplied  host  list,  additional  resources  will  be
              allocated  on  other  nodes  as needed.  Rather than repeating a host name multiple
              times, an asterisk and a repetition count may be  appended  to  a  host  name.  For
              example "host1,host1" and "host1*2" are equivalent.
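
              The host list notation can be illustrated with a simplified expander.  This is a
              sketch, not Slurm's hostlist implementation: expand_hostlist is a hypothetical
              helper that handles one bracketed range group per name and the asterisk
              repetition count, and it rejects values that would be treated as a filename.

```python
import re


def expand_hostlist(spec):
    """Expand e.g. 'host[1-3,7]' or 'host1*2' into a list of host names."""
    if "/" in spec:
        raise ValueError("a value containing '/' is treated as a filename")
    hosts = []
    # Split on commas that are outside square brackets.
    for token in re.findall(r"[^,\[]+(?:\[[^\]]*\][^,\[]*)*", spec):
        repeat = 1
        starred = re.fullmatch(r"(.+)\*(\d+)", token)
        if starred:
            token, repeat = starred.group(1), int(starred.group(2))
        bracket = re.fullmatch(r"([^\[]*)\[([^\]]*)\](.*)", token)
        if bracket:
            prefix, ranges, suffix = bracket.groups()
            names = []
            for part in ranges.split(","):
                if "-" in part:
                    low, high = part.split("-")
                    width = len(low)  # keep zero padding such as 01-03
                    names += [f"{prefix}{i:0{width}d}{suffix}"
                              for i in range(int(low), int(high) + 1)]
                else:
                    names.append(prefix + part + suffix)
        else:
            names = [token]
        for name in names:
            hosts += [name] * repeat
    return hosts
```

              Under these assumptions, expand_hostlist("host1*2") and
              expand_hostlist("host1,host1") return the same list.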

       --wckey=<wckey>
              Specify  wckey  to  be used with job.  If TrackWCKey=no (default) in the slurm.conf
              this value is ignored.

       -X, --disable-status
              Disable the display of task status when srun receives  a  single  SIGINT  (Ctrl-C).
              Instead  immediately  forward the SIGINT to the running job.  Without this option a
              second Ctrl-C in one second is required to forcibly terminate the job and srun will
              immediately    exit.    May    also   be   set   via   the   environment   variable
              SLURM_DISABLE_STATUS.

       -x, --exclude=<host1,host2,... or filename>
              Request that a specific list of hosts not be included in the resources allocated to
              this job. The host list will be assumed to be a filename if it contains a "/"
              character.

       -Z, --no-allocate
              Run the specified tasks on a set of nodes without creating a  Slurm  "job"  in  the
              Slurm  queue structure, bypassing the normal resource allocation step.  The list of
              nodes must be specified with the -w,  --nodelist  option.   This  is  a  privileged
              option only available for the users "SlurmUser" and "root".

       The following options support Blue Gene systems, but may be applicable to other systems as
       well.

       --blrts-image=<path>
              Path to blrts image for bluegene block.  BGL only.  Default from bluegene.conf if
              not set.

       --cnload-image=<path>
              Path to compute node image for bluegene block.  BGP only.  Default from
              bluegene.conf if not set.

       --conn-type=<type>
              Require the block connection type to be of a certain type.  On Blue Gene the
              acceptable values of type are MESH, TORUS and NAV.  If NAV, or if not set, then
              Slurm will try to fit what the DefaultConnType is set to in bluegene.conf; if
              that isn't set, the default is TORUS.  You should not normally set this option.
              If running on a BGP system and wanting to run in HTC mode (only for 1 midplane
              and below), you can use HTC_S for SMP, HTC_D for Dual, HTC_V for virtual node
              mode, and HTC_L for Linux mode.  For systems that allow a different connection
              type per dimension, a comma-separated list of connection types may be specified,
              one for each dimension (e.g. M,T,T,T will give you a torus connection in all
              dimensions except the first).

       -g, --geometry=<XxYxZ> | <AxXxYxZ>
              Specify the geometry requirements for the job. On BlueGene/L and BlueGene/P systems
              there are three numbers giving dimensions in the X, Y and Z  directions,  while  on
              BlueGene/Q  systems  there  are four numbers giving dimensions in the A, X, Y and Z
              directions  and  can  not  be   used   to   allocate   sub-blocks.    For   example
              "--geometry=1x2x3x4",  specifies  a  block of nodes having 1 x 2 x 3 x 4 = 24 nodes
              (actually midplanes on BlueGene).
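
              The size implied by a geometry string is the product of its dimensions, as in
              this sketch.  The helper geometry_size is hypothetical and only checks that
              three or four positive integers are given.

```python
from math import prod


def geometry_size(spec):
    """Return (dimensions, total) for a geometry such as '1x2x3x4'."""
    dims = [int(d) for d in spec.lower().split("x")]
    if len(dims) not in (3, 4) or any(d <= 0 for d in dims):
        raise ValueError("expected XxYxZ or AxXxYxZ with positive integers")
    return dims, prod(dims)


print(geometry_size("1x2x3x4"))  # prints ([1, 2, 3, 4], 24)
```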

       --ioload-image=<path>
              Path to io image for bluegene block.  BGP only.  Default from bluegene.conf if not
              set.

       --linux-image=<path>
              Path to linux image for bluegene block.  BGL only.  Default from bluegene.conf if
              not set.

       --mloader-image=<path>
              Path to mloader image for bluegene block.  Default from bluegene.conf if not set.

       -R, --no-rotate
              Disables rotation of the job's requested geometry in order to  fit  an  appropriate
              block.  By default the specified geometry can rotate in three dimensions.

       --ramdisk-image=<path>
              Path to ramdisk image for bluegene block.  BGL only.  Default from bluegene.conf if
              not set.

       srun will submit the job request to the slurm job controller, then initiate all  processes
       on  the  remote nodes. If the request cannot be met immediately, srun will block until the
       resources are free to run the job. If the -I (--immediate) option is specified  srun  will
       terminate if resources are not immediately available.

       When initiating remote processes srun will propagate the current working directory, unless
       --chdir=<path> is specified, in which case path will become the working directory for  the
       remote processes.

       The  -n, -c, and -N options control how CPUs  and nodes will be allocated to the job. When
       specifying only the number of processes to run with -n, a default of one CPU  per  process
       is  allocated.  By specifying the number of CPUs required per task (-c), more than one CPU
       may be allocated per process. If the number of nodes  is  specified  with  -N,  srun  will
       attempt to allocate at least the number of nodes specified.

       Combinations  of  the  above  three  options  may  be  used  to  change  how processes are
       distributed across nodes and  cpus.  For  instance,  by  specifying  both  the  number  of
       processes  and  number  of  nodes  on  which  to  run, the number of processes per node is
       implied. However, if the number of CPUs per process is more important, then the number of
       processes (-n) and the number of CPUs per process (-c) should be specified.

       srun  will  refuse  to  allocate more than one process per CPU unless --overcommit (-O) is
       also specified.

       srun will attempt to meet the above specifications "at a minimum." That is,  if  16  nodes
       are requested for 32 processes, and some nodes do not have 2 CPUs, the allocation of nodes
       will be increased in order to meet the demand for CPUs. In other words, a  minimum  of  16
       nodes  are being requested. However, if 16 nodes are requested for 15 processes, srun will
       consider this an error, as 15 processes cannot run across 16 nodes.
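
       The "at a minimum" behavior can be illustrated with a simplified model.  This is not
       Slurm's selection algorithm: nodes_needed is a hypothetical helper that takes candidate
       nodes in the order given, assumes one CPU per process, and ignores every other
       constraint.

```python
def nodes_needed(node_cpus, min_nodes, ntasks):
    """Return how many nodes are used to satisfy both the node and CPU demand.

    node_cpus is a list with the CPU count of each candidate node, in order.
    """
    if ntasks < min_nodes:
        # e.g. 15 processes cannot run across 16 nodes.
        raise ValueError(f"{ntasks} processes cannot run across {min_nodes} nodes")
    total_cpus = 0
    for count, cpus in enumerate(node_cpus, start=1):
        total_cpus += cpus
        if count >= min_nodes and total_cpus >= ntasks:
            return count
    raise ValueError("not enough nodes or CPUs to satisfy the request")
```

       In this model, requesting 16 nodes for 32 processes on a cluster of fourteen 2-CPU
       nodes followed by six 1-CPU nodes uses 18 nodes, because the first 16 provide only 30
       CPUs.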

       IO Redirection

       By default, stdout and stderr will be redirected from all tasks to the stdout  and  stderr
       of srun, and stdin will be redirected from the standard input of srun to all remote tasks.
       If stdin is only to be read by a subset of the spawned tasks, specifying a  file  to  read
       from  rather  than  forwarding  stdin from the srun command may be preferable as it avoids
       moving and storing data that will never be read.

       For OS X, the poll() function does not support stdin, so input  from  a  terminal  is  not
       possible.

       For BGQ srun only supports stdin to 1 task running on the system.  By default it is taskid
       0   but   can   be   changed   with   the    -i<taskid>    as    described    below,    or
       --launcher-opts="--stdinrank=<taskid>".

       This behavior may be changed with the --output, --error, and --input (-o, -e, -i) options.
       Valid format specifications for these options are

       all       stdout and stderr are redirected from all tasks to srun.  stdin is broadcast to all
                 remote tasks.  (This is the default behavior)

       none      stdout and stderr are not received from any task.  stdin is not sent to any task
                 (stdin is closed).

       taskid    stdout and/or stderr are redirected from only the task with relative id equal to
                 taskid, where 0 <= taskid < ntasks and ntasks is the total number of tasks
                 in the current job step.  stdin is redirected from the stdin  of  srun  to  this
                 same task.  This file will be written on the node executing the task.

       filename  srun will redirect stdout and/or stderr to the named file from all tasks.  stdin
                 will be redirected from the named file and broadcast to all tasks  in  the  job.
                 filename  refers  to  a  path  on  the  host  that  runs srun.  Depending on the
                 cluster's file system layout,  this  may  result  in  the  output  appearing  in
                 different places depending on whether the job is run in batch mode.

       format string
                 srun  allows  for  a  format  string  to  be  used to generate the named IO file
                 described above. The following list of format specifiers  may  be  used  in  the
                 format  string  to  generate  a  filename  that will be unique to a given jobid,
                 stepid, node, or task. In each case, the appropriate number of files are  opened
                 and associated with the corresponding tasks. Note that any format string
                 containing %t, %n, and/or %N will be written on the node executing the task
                 rather than the node where srun executes. These format specifiers are not
                 supported on a BGQ system.

                 %A     Job array's master job allocation number.

                 %a     Job array ID (index) number.

                 %J     jobid.stepid of the running job. (e.g. "128.0")

                 %j     jobid of the running job.

                 %s     stepid of the running job.

                 %N     short hostname. This will create a separate IO file per node.

                 %n     Node identifier relative to current job (e.g. "0" is the first node of
                        the running job). This will create a separate IO file per node.

                 %t     task  identifier  (rank)  relative  to  current  job.  This will create a
                        separate IO file per task.

                 %u     User name.

                 A number placed between the percent character and format specifier may  be  used
                 to  zero-pad the result in the IO filename. This number is ignored if the format
                 specifier corresponds to  non-numeric data (%N for example).

                 Some examples of how the format string may be used for a 4 task job step with  a
                 Job ID of 128 and step id of 0 are included below:

                 job%J.out      job128.0.out

                 job%4j.out     job0128.out

                 job%j-%2t.out  job128-00.out, job128-01.out, ...
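
                 As a rough illustration of the zero-padding rule, the sketch below reproduces
                 the last example with the shell's own printf. The io_names function is a
                 hypothetical helper, not part of Slurm; it only mimics how %j and %2t would
                 expand for job 128 with four tasks.

```shell
# Hypothetical helper (not part of Slurm): print the file names that the
# pattern "job%j-%2t.out" would yield for a given job ID and task count.
io_names() {
    jobid=$1
    ntasks=$2
    t=0
    while [ "$t" -lt "$ntasks" ]; do
        # %j -> job ID as-is; %2t -> task rank zero-padded to width 2
        printf 'job%d-%02d.out\n' "$jobid" "$t"
        t=$((t + 1))
    done
}

io_names 128 4
```

                 On a cluster, the corresponding request would be along the lines of
                 "srun -n4 -o job%j-%2t.out a.out".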

INPUT ENVIRONMENT VARIABLES

       Some  srun  options  may  be  set via environment variables.  These environment variables,
       along with their corresponding options, are listed below.  Note: Command line options will
       always override these settings.

       PMI_FANOUT            This is used exclusively with PMI (MPICH2 and MVAPICH2) and controls
                             the fanout of data communications. The srun command  sends  messages
                             to application programs (via the PMI library) and those applications
                             may be called upon to forward that data to  up  to  this  number  of
                             additional  tasks.  Higher values offload work from the srun command
                             to  the  applications  and  likely  increase  the  vulnerability  to
                             failures.  The default value is 32.

       PMI_FANOUT_OFF_HOST   This is used exclusively with PMI (MPICH2 and MVAPICH2) and controls
                             the fanout of data communications.  The srun command sends  messages
                             to application programs (via the PMI library) and those applications
                             may be called upon to forward that  data  to  additional  tasks.  By
                             default,  srun  sends one message per host and one task on that host
                             forwards the data to other tasks on that host up to PMI_FANOUT.   If
                             PMI_FANOUT_OFF_HOST  is  defined,  the  user task may be required to
                             forward   the   data   to   tasks   on   other    hosts.     Setting
                             PMI_FANOUT_OFF_HOST  may  increase  performance.  Since more work is
                             performed by  the  PMI  library  loaded  by  the  user  application,
                             failures also can be more common and more difficult to diagnose.

       PMI_TIME              This is used exclusively with PMI (MPICH2 and MVAPICH2) and controls
                             how much the communications from the tasks to the  srun  are  spread
                             out  in  time  in  order to avoid overwhelming the srun command with
                             work.  The  default  value  is  500  (microseconds)  per  task.   On
                             relatively  slow  processors  or  systems  with very large processor
                             counts (and large PMI data sets), higher values may be required.

       SLURM_CONF            The location of the Slurm configuration file.

       SLURM_ACCOUNT         Same as -A, --account

       SLURM_ACCTG_FREQ      Same as --acctg-freq

       SLURM_BCAST           Same as --bcast

       SLURM_BLRTS_IMAGE     Same as --blrts-image

       SLURM_BURST_BUFFER    Same as --bb

       SLURM_CHECKPOINT      Same as --checkpoint

       SLURM_CHECKPOINT_DIR  Same as --checkpoint-dir

       SLURM_CNLOAD_IMAGE    Same as --cnload-image

       SLURM_CONN_TYPE       Same as --conn-type

       SLURM_CORE_SPEC       Same as --core-spec

       SLURM_CPU_BIND        Same as --cpu_bind

       SLURM_CPU_FREQ_REQ    Same as --cpu-freq

       SLURM_CPUS_PER_TASK   Same as -c, --cpus-per-task

       SLURM_DEBUG           Same as -v, --verbose

       SLURMD_DEBUG          Same as -d, --slurmd-debug

       SLURM_DEPENDENCY      Same as -P, --dependency=<jobid>

       SLURM_DISABLE_STATUS  Same as -X, --disable-status

       SLURM_DIST_PLANESIZE  Same as -m plane

       SLURM_DISTRIBUTION    Same as -m, --distribution

       SLURM_EPILOG          Same as --epilog

       SLURM_EXCLUSIVE       Same as --exclusive

       SLURM_EXIT_ERROR      Specifies the exit code generated when a Slurm  error  occurs  (e.g.
                             invalid  options).   This  can  be  used  by a script to distinguish
                             application exit codes from various Slurm  error  conditions.   Also
                             see SLURM_EXIT_IMMEDIATE.

       SLURM_EXIT_IMMEDIATE  Specifies  the  exit  code  generated when the --immediate option is
                             used and resources are not currently available.  This can be used by
                             a  script  to  distinguish application exit codes from various Slurm
                             error conditions.  Also see SLURM_EXIT_ERROR.

       SLURM_GEOMETRY        Same as -g, --geometry

       SLURM_HINT            Same as --hint

       SLURM_GRES            Same as --gres. Also see SLURM_STEP_GRES

       SLURM_IMMEDIATE       Same as -I, --immediate

       SLURM_IOLOAD_IMAGE    Same as --ioload-image

       SLURM_JOB_ID (and SLURM_JOBID for backwards compatibility)
                             Same as --jobid

       SLURM_JOB_NAME        Same as -J, --job-name except  within  an  existing  allocation,  in
                             which  case it is ignored to avoid using the batch job's name as the
                             name of each job step.

       SLURM_JOB_NUM_NODES (and SLURM_NNODES for backwards compatibility)
                             Total number of nodes in the job’s resource allocation.

       SLURM_KILL_BAD_EXIT   Same as -K, --kill-on-bad-exit

       SLURM_LABELIO         Same as -l, --label

       SLURM_LINUX_IMAGE     Same as --linux-image

       SLURM_MEM_BIND        Same as --mem_bind

       SLURM_MEM_PER_CPU     Same as --mem-per-cpu

       SLURM_MEM_PER_NODE    Same as --mem

       SLURM_MLOADER_IMAGE   Same as --mloader-image

       SLURM_MPI_TYPE        Same as --mpi

       SLURM_NETWORK         Same as --network

       SLURM_NNODES          Same as -N, --nodes

       SLURM_NO_ROTATE       Same as -R, --no-rotate

       SLURM_NTASKS (and SLURM_NPROCS for backwards compatibility)
                             Same as -n, --ntasks

       SLURM_NTASKS_PER_CORE Same as --ntasks-per-core

       SLURM_NTASKS_PER_NODE Same as --ntasks-per-node

       SLURM_NTASKS_PER_SOCKET
                             Same as --ntasks-per-socket

       SLURM_OPEN_MODE       Same as --open-mode

       SLURM_OVERCOMMIT      Same as -O, --overcommit

       SLURM_PARTITION       Same as -p, --partition

       SLURM_PMI_KVS_NO_DUP_KEYS
                             If set, then PMI key-pairs will contain no duplicate keys.  MPI  can
                             use  this  variable  to  inform the PMI library that it will not use
                             duplicate keys so PMI can skip the check for duplicate  keys.   This
                             is  the  case  for  MPICH2  and  reduces  overhead  in  testing  for
                             duplicates for improved performance.

       SLURM_POWER           Same as --power

       SLURM_PROFILE         Same as --profile

       SLURM_PROLOG          Same as --prolog

       SLURM_QOS             Same as --qos

       SLURM_RAMDISK_IMAGE   Same as --ramdisk-image

       SLURM_REMOTE_CWD      Same as -D, --chdir

       SLURM_REQ_SWITCH      When a tree topology is used, this  defines  the  maximum  count  of
                             switches  desired  for the job allocation and optionally the maximum
                             time to wait for that number of switches. See --switches

       SLURM_RESERVATION     Same as --reservation

       SLURM_RESTART_DIR     Same as --restart-dir

       SLURM_RESV_PORTS      Same as --resv-ports

       SLURM_SICP            Same as --sicp

       SLURM_SIGNAL          Same as --signal

       SLURM_STDERRMODE      Same as -e, --error

       SLURM_STDINMODE       Same as -i, --input

       SLURM_SRUN_REDUCE_TASK_EXIT_MSG
                             If set and non-zero, successive task exit messages with the same
                             exit code will be printed only once.

       SLURM_STEP_GRES       Same  as --gres (only applies to job steps, not to job allocations).
                             Also see SLURM_GRES

       SLURM_STEP_KILLED_MSG_NODE_ID=ID
                             If set, only the specified node will log when the job or step is
                             killed by a signal.

       SLURM_STDOUTMODE      Same as -o, --output

       SLURM_TASK_EPILOG     Same as --task-epilog

       SLURM_TASK_PROLOG     Same as --task-prolog

       SLURM_TEST_EXEC       If defined, then verify existence of the executable program on the
                             local computer before attempting to launch it on compute nodes.

       SLURM_THREAD_SPEC     Same as --thread-spec

       SLURM_THREADS         Same as -T, --threads

       SLURM_TIMELIMIT       Same as -t, --time

       SLURM_UNBUFFEREDIO    Same as -u, --unbuffered

       SLURM_WAIT            Same as -W, --wait

       SLURM_WAIT4SWITCH     Max time waiting for requested switches. See --switches

       SLURM_WCKEY           Same as --wckey

       SLURM_WORKING_DIR     Same as -D, --chdir
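
       SLURM_EXIT_ERROR and SLURM_EXIT_IMMEDIATE can be combined in a wrapper script roughly as
       follows. This is a minimal sketch: the codes 200 and 201 are arbitrary values assumed
       never to be returned by the application, and classify_srun_exit is a hypothetical helper,
       not a Slurm command.

```shell
# Assumed values; any codes the application itself never returns will do.
SLURM_EXIT_ERROR=200
SLURM_EXIT_IMMEDIATE=201
export SLURM_EXIT_ERROR SLURM_EXIT_IMMEDIATE

# Hypothetical helper: turn an srun exit status into a readable label.
classify_srun_exit() {
    case "$1" in
        0)                        echo "success" ;;
        "$SLURM_EXIT_ERROR")      echo "slurm-error" ;;
        "$SLURM_EXIT_IMMEDIATE")  echo "resources-unavailable" ;;
        *)                        echo "application-exit-$1" ;;
    esac
}

# Typical use (needs a Slurm cluster, so it is left commented out;
# ./my_app is a placeholder program name):
#   srun --immediate -n4 ./my_app; classify_srun_exit $?
classify_srun_exit 201
```

       Keeping the two codes distinct lets the script decide, for example, to retry later when
       resources were merely unavailable but to stop on an invalid option.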

OUTPUT ENVIRONMENT VARIABLES

       srun will set some environment variables in the environment of the executing tasks on  the
       remote compute nodes.  These environment variables are:

       SLURM_CHECKPOINT_IMAGE_DIR
                             Directory   into  which  checkpoint  images  should  be  written  if
                             specified on the execute line.

       SLURM_CLUSTER_NAME    Name of the cluster on which the job is executing.

       SLURM_CPU_BIND_VERBOSE
                             --cpu_bind verbosity (quiet,verbose).

       SLURM_CPU_BIND_TYPE   --cpu_bind type (none,rank,map_cpu:,mask_cpu:).

       SLURM_CPU_BIND_LIST   --cpu_bind map or mask list (list of Slurm CPU IDs or masks for this
                             node,   CPU_ID   =   Board_ID  x  threads_per_board  +  Socket_ID  x
                             threads_per_socket + Core_ID x threads_per_core + Thread_ID).

       SLURM_CPU_FREQ_REQ    Contains the value requested for cpu frequency on the  srun  command
                             as  a  numerical  frequency  in  kilohertz,  or  a coded value for a
                             request of low, medium, highm1 or high for the frequency.  See the
                             description of the --cpu-freq option or the SLURM_CPU_FREQ_REQ input
                             environment variable.

       SLURM_CPUS_ON_NODE    Count of processors available to the job on  this  node.   Note  the
                             select/linear  plugin  allocates  entire nodes to jobs, so the value
                             indicates  the  total  count  of  CPUs  on  the   node.    For   the
                             select/cons_res plugin, this number indicates the number of cores on
                             this node allocated to the job.

       SLURM_CPUS_PER_TASK   Number of cpus requested per task.  Only set if the  --cpus-per-task
                             option is specified.

       SLURM_DISTRIBUTION    Distribution  type for the allocated jobs. Set the distribution with
                             -m, --distribution.

       SLURM_GTIDS           Global task IDs  running  on  this  node.   Zero  origin  and  comma
                             separated.

       SLURM_JOB_CPUS_PER_NODE
                             Number of CPUs per node.

       SLURM_JOB_DEPENDENCY  Set to value of the --dependency option.

       SLURM_JOB_ID (and SLURM_JOBID for backwards compatibility)
                             Job id of the executing job.

       SLURM_JOB_NAME        Set  to  the value of the --job-name option or the command name when
                             srun is used to create a new job allocation. Not set  when  srun  is
                             used  only  to  create  a  job  step  (i.e.  within  an existing job
                             allocation).

       SLURM_JOB_PARTITION   Name of the partition in which the job is running.

       SLURM_LAUNCH_NODE_IPADDR
                             IP address of the node from which  the  task  launch  was  initiated
                             (where the srun command ran from).

       SLURM_LOCALID         Node local task ID for the process within a job.

       SLURM_MEM_BIND_VERBOSE
                             --mem_bind verbosity (quiet,verbose).

       SLURM_MEM_BIND_TYPE   --mem_bind type (none,rank,map_mem:,mask_mem:).

       SLURM_MEM_BIND_LIST   --mem_bind map or mask list (<list of IDs or masks for this node>).

       SLURM_NNODES          Total number of nodes in the job's resource allocation.

       SLURM_NODE_ALIASES    Sets  of  node  name,  communication  address and hostname for nodes
                             allocated to the job from the cloud. Each element in the set is
                             colon separated and each set is comma separated. For example:
                             SLURM_NODE_ALIASES=ec0:1.2.3.4:foo,ec1:1.2.3.5:bar

       SLURM_NODEID          The relative node ID of the current node.

       SLURM_NODELIST        List of nodes allocated to the job.

       SLURM_NTASKS (and SLURM_NPROCS for backwards compatibility)
                             Total number of processes in the current job.

       SLURM_PRIO_PROCESS    The scheduling priority (nice value) at the time of job  submission.
                             This value is propagated to the spawned processes.

       SLURM_PROCID          The MPI rank (or relative process ID) of the current process.

       SLURM_SRUN_COMM_HOST  IP address of srun communication host.

       SLURM_SRUN_COMM_PORT  srun communication port.

       SLURM_STEP_LAUNCHER_PORT
                             Step launcher port.

       SLURM_STEP_NODELIST   List of nodes allocated to the step.

       SLURM_STEP_NUM_NODES  Number of nodes allocated to the step.

       SLURM_STEP_NUM_TASKS  Number of processes in the step.

       SLURM_STEP_TASKS_PER_NODE
                             Number of processes per node within the step.

       SLURM_STEP_ID (and SLURM_STEPID for backwards compatibility)
                             The step ID of the current job.

       SLURM_SUBMIT_DIR      The directory from which srun was invoked.

       SLURM_SUBMIT_HOST     The hostname of the computer from which salloc was invoked.

       SLURM_TASK_PID        The process ID of the task being started.

       SLURM_TASKS_PER_NODE  Number  of  tasks  to  be  initiated  on each node. Values are comma
                             separated and in the same order as SLURM_NODELIST.  If two  or  more
                             consecutive  nodes  are  to  have the same task count, that count is
                             followed by "(x#)" where "#" is the repetition count.  For  example,
                             "SLURM_TASKS_PER_NODE=2(x3),1"  indicates that the first three nodes
                             will each execute two tasks and the fourth node will execute one
                             task.

       SLURM_TOPOLOGY_ADDR   This  is  set  only  if  the  system  has  the  topology/tree plugin
                             configured.  The value will be set to the names of the network
                             switches which may be involved in the job's communications, from the
                             system's top level switch down to the leaf switch and ending with
                             the node name.  A period is used to separate each hardware component
                             name.

       SLURM_TOPOLOGY_ADDR_PATTERN
                             This  is  set  only  if  the  system  has  the  topology/tree plugin
                             configured.  The value will be set to the component types listed in
                             SLURM_TOPOLOGY_ADDR.   Each  component  will be identified as either
                             "switch" or "node".  A period is  used  to  separate  each  hardware
                             component type.

       SRUN_DEBUG            Set  to  the  logging level of the srun command.  Default value is 3
                             (info level).  The value is incremented or  decremented  based  upon
                             the --verbose and --quiet options.

       MPIRUN_NOALLOCATE     Do not allocate a block (Blue Gene systems only).

       MPIRUN_NOFREE         Do not free a block (Blue Gene systems only).

       MPIRUN_PARTITION      The block name (Blue Gene systems only).
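
       The compressed SLURM_TASKS_PER_NODE format described above can be expanded into one count
       per node with a few lines of shell. The function name expand_tasks_per_node is an
       illustration only and is not provided by Slurm.

```shell
# Hypothetical helper: print one task count per line, one line per node,
# in the same order as SLURM_NODELIST.
expand_tasks_per_node() {
    printf '%s\n' "$1" | tr ',' '\n' | while read -r item; do
        count=${item%%"("*}          # text before "(" is the task count
        reps=1
        case "$item" in
            *"(x"*")")
                reps=${item#*"(x"}   # drop everything through "(x"
                reps=${reps%")"}     # drop the trailing ")"
                ;;
        esac
        i=0
        while [ "$i" -lt "$reps" ]; do
            printf '%s\n' "$count"
            i=$((i + 1))
        done
    done
}

# Inside a task one would pass "$SLURM_TASKS_PER_NODE"; a literal is used here.
expand_tasks_per_node "2(x3),1"
```

       For the value "2(x3),1" this prints 2 three times and then 1, that is, two tasks on each of
       the first three nodes and one task on the fourth.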

SIGNALS AND ESCAPE SEQUENCES

       Signals  sent  to  the  srun  command  are  automatically  forwarded  to  the  tasks it is
       controlling with a few exceptions. The escape sequence <control-c> will report  the  state
       of  all tasks associated with the srun command. If <control-c> is entered twice within one
       second, then the associated SIGINT signal will be sent to  all  tasks  and  a  termination
       sequence will be entered sending SIGCONT, SIGTERM, and SIGKILL to all spawned tasks.  If a
       third <control-c> is received, the srun program will be  terminated  without  waiting  for
       remote tasks to exit or their I/O to complete.

       The escape sequence <control-z> is presently ignored. Our intent is for this to put the
       srun command into a mode where various special actions may be invoked.

MPI SUPPORT

       MPI use depends upon the type of MPI being used.  There are three fundamentally different
       modes of operation used by these various MPI implementations.

       1.  Slurm  directly  launches  the  tasks  and  performs  initialization of communications
       (Quadrics MPI, MPICH2, MPICH-GM, MVAPICH, MVAPICH2 and some MPICH1  modes).  For  example:
       "srun -n16 a.out".

       2.  Slurm  creates  a resource allocation for the job and then mpirun launches tasks using
       Slurm's infrastructure (OpenMPI, LAM/MPI, HP-MPI and some MPICH1 modes).

       3. Slurm creates a resource allocation for the job and then mpirun  launches  tasks  using
       some  mechanism other than Slurm, such as SSH or RSH (BlueGene MPI and some MPICH1 modes).
       These tasks are initiated outside of Slurm's monitoring or control. Slurm's epilog should be
       configured to purge these tasks when the job's allocation is relinquished.

       See http://slurm.schedmd.com/mpi_guide.html for more information on use of these various
       MPI implementations with Slurm.

MULTIPLE PROGRAM CONFIGURATION

       Comments in the configuration file must have a "#" in column one.  The configuration  file
       contains the following fields separated by white space:

       Task rank
              One  or  more  task  ranks to use this configuration.  Multiple values may be comma
              separated.  Ranges may be indicated with two numbers separated with a '-' with  the
              smaller  number  first  (e.g.  "0-4"  and  not  "4-0").   To indicate all tasks not
              otherwise specified, specify a rank of '*' as the last line of  the  file.   If  an
              attempt  is made to initiate a task for which no executable program is defined, the
              following error message will be produced "No executable program specified for  this
              task".

       Executable
              The name of the program to execute.  May be a fully qualified pathname if desired.

       Arguments
              Program  arguments.   The  expression "%t" will be replaced with the task's number.
              The expression "%o" will be replaced with the task's offset within this range (e.g.
              a  configured  task rank value of "1-5" would have offset values of "0-4").  Single
              quotes may be used to avoid having the enclosed values interpreted.  This field  is
              optional.   Any arguments for the program entered on the command line will be added
              to the arguments specified in the configuration file.

       For example:
       ###################################################################
       # srun multiple program configuration file
       #
       # srun -n8 -l --multi-prog silly.conf
       ###################################################################
       4-6       hostname
       1,7       echo  task:%t
       0,2-3     echo  offset:%o

       > srun -n8 -l --multi-prog silly.conf
       0: offset:0
       1: task:1
       2: offset:1
       3: offset:2
       4: linux15.llnl.gov
       5: linux16.llnl.gov
       6: linux17.llnl.gov
       7: task:7
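
       The %o values in the output above can be reproduced with a small shell function. This is a
       sketch that assumes, based on the example, that the offset counts positions across the
       whole comma separated rank list; rank_offset is a hypothetical helper and does not handle
       the '*' wildcard.

```shell
# Hypothetical helper: print the offset of TASK within a rank spec such as
# "0,2-3". Returns 1 if the task is not covered, 2 for the '*' wildcard.
rank_offset() {
    spec=$1
    task=$2
    case "$spec" in
        *'*'*) return 2 ;;           # wildcard lines are out of scope here
    esac
    offset=0
    for part in $(printf '%s' "$spec" | tr ',' ' '); do
        lo=${part%-*}                # "2-3" -> 2, "0" -> 0
        hi=${part#*-}                # "2-3" -> 3, "0" -> 0
        r=$lo
        while [ "$r" -le "$hi" ]; do
            if [ "$r" -eq "$task" ]; then
                echo "$offset"
                return 0
            fi
            offset=$((offset + 1))
            r=$((r + 1))
        done
    done
    return 1
}

rank_offset "0,2-3" 3
```

       With the rank list "0,2-3", tasks 0, 2 and 3 map to offsets 0, 1 and 2, matching the
       "offset:" lines printed by srun above.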

EXAMPLES

       This simple example demonstrates the execution of the command hostname in eight tasks.  At
       least eight processors will be allocated to the job (the same as the task count) on
       however many nodes are required to satisfy the request. The output of each task will be
       preceded by its task number.  (The machine "dev" in the example below has a total of two
       CPUs per node.)

       > srun -n8 -l hostname
       0: dev0
       1: dev0
       2: dev1
       3: dev1
       4: dev2
       5: dev2
       6: dev3
       7: dev3

       The srun -r option is used within a job script to run two job steps on disjoint  nodes  in
       the  following example. The script is run using allocate mode instead of as a batch job in
       this case.

       > cat test.sh
       #!/bin/sh
       echo $SLURM_NODELIST
       srun -lN2 -r2 hostname
       srun -lN2 hostname

       > salloc -N4 test.sh
       dev[7-10]
       0: dev9
       1: dev10
       0: dev7
       1: dev8
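
       The effect of -r on node selection can be pictured as taking a slice of the allocation's
       node list. The sketch below is only a model of that behavior, not how srun is implemented;
       select_nodes is a hypothetical helper and the node names are those of the example above.

```shell
# Hypothetical helper: print COUNT node names starting at relative index
# OFFSET (zero origin) from the list of allocated nodes that follows.
select_nodes() {
    offset=$1
    count=$2
    shift 2
    i=0
    for node in "$@"; do
        if [ "$i" -ge "$offset" ] && [ "$i" -lt $((offset + count)) ]; then
            echo "$node"
        fi
        i=$((i + 1))
    done
}

select_nodes 2 2 dev7 dev8 dev9 dev10   # models: srun -N2 -r2
select_nodes 0 2 dev7 dev8 dev9 dev10   # models: srun -N2 (no -r)
```

       The first call yields dev9 and dev10 and the second yields dev7 and dev8, the same nodes
       reported by the two job steps above.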

       The following script runs two job steps in parallel within an allocated set of nodes.

       > cat test.sh
       #!/bin/bash
       srun -lN2 -n4 -r 2 sleep 60 &
       srun -lN2 -r 0 sleep 60 &
       sleep 1
       squeue
       squeue -s
       wait

       > salloc -N4 test.sh
         JOBID PARTITION     NAME     USER  ST      TIME  NODES NODELIST
         65641     batch  test.sh   grondo   R      0:01      4 dev[7-10]

       STEPID     PARTITION     USER      TIME NODELIST
       65641.0        batch   grondo      0:01 dev[7-8]
       65641.1        batch   grondo      0:01 dev[9-10]

       This example demonstrates how one executes a simple MPICH job.  We use  srun  to  build  a
       list  of  machines  (nodes)  to be used by mpirun in its required format. A sample command
       line and the script to be executed follow.

       > cat test.sh
       #!/bin/sh
       MACHINEFILE="nodes.$SLURM_JOB_ID"

       # Generate Machinefile for mpich such that hosts are in the same
       #  order as if run via srun
       #
       srun -l /bin/hostname | sort -n | awk '{print $2}' > $MACHINEFILE

       # Run using generated Machine file:
       mpirun -np $SLURM_NTASKS -machinefile $MACHINEFILE mpi-app

       rm $MACHINEFILE

       > salloc -N2 -n4 test.sh
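
       The sort and awk stage of the script can be tried without a cluster by feeding it canned
       text. In this sketch, sample_output is a hypothetical stand-in for the output of
       "srun -l /bin/hostname" (with lines deliberately out of order) and make_machinefile wraps
       the same pipeline used above.

```shell
# Stand-in for `srun -l /bin/hostname`; real output may arrive in any order.
sample_output() {
    printf '%s\n' '2: dev1' '0: dev0' '3: dev1' '1: dev0'
}

# Same pipeline as in test.sh: order lines by task number, keep the hostname.
make_machinefile() {
    sort -n | awk '{print $2}'
}

sample_output | make_machinefile
```

       Sorting numerically on the task label restores rank order, so the resulting host list
       places each MPI rank on the node where srun would have run that task.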

       This simple example demonstrates the execution of different jobs on different nodes in the
       same srun.  You can do this for any number of nodes or any number of jobs.  The commands
       to run on each node are selected by the SLURM_NODEID environment variable, which starts at
       0 and goes up to one less than the number of nodes specified on the srun command line.

       > cat test.sh
       case $SLURM_NODEID in
           0) echo "I am running on "
              hostname ;;
           1) hostname
              echo "is where I am running" ;;
       esac

       > srun -N2 test.sh
       dev1
       is where I am running
       I am running on
       dev0

       This  example  demonstrates  use  of  multi-core  options  to control layout of tasks.  We
       request that four sockets per node and two cores per socket be dedicated to the job.

       > srun -N2 -B 4-4:2-2 a.out

       This example shows a script in which Slurm is used to provide resource  management  for  a
       job  by executing the various job steps as processors become available for their dedicated
       use.

       > cat my.script
       #!/bin/bash
       srun --exclusive -n4 prog1 &
       srun --exclusive -n3 prog2 &
       srun --exclusive -n1 prog3 &
       srun --exclusive -n1 prog4 &
       wait

COPYING

       Copyright (C) 2006-2007 The Regents of the University of California.  Produced at Lawrence
       Livermore National Laboratory (cf, DISCLAIMER).
       Copyright (C) 2008-2010 Lawrence Livermore National Security.
       Copyright (C) 2010-2015 SchedMD LLC.

       This   file   is  part  of  Slurm,  a  resource  management  program.   For  details,  see
       <http://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it under the  terms  of  the
       GNU  General Public License as published by the Free Software Foundation; either version 2
       of the License, or (at your option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
       even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
       GNU General Public License for more details.

SEE ALSO

       salloc(1), sattach(1), sbatch(1), sbcast(1), scancel(1), scontrol(1), squeue(1),
       slurm.conf(5), sched_setaffinity(2), numa(3), getrlimit(2)