bionic (1) smap.1.gz

Provided by: slurm-client_17.11.2-1build1_amd64 bug

NAME

       smap - graphically view information about Slurm jobs, partitions, and set configurations parameters.

SYNOPSIS

       smap [OPTIONS...]

DESCRIPTION

       smap  is  used  to graphically view job, partition and node information for a system running Slurm.  Note
       that information about nodes and partitions to which you lack access will always be  displayed  to  avoid
       obvious gaps in the output.  This is equivalent to the --all option of the sinfo and squeue commands.

OPTIONS

       -c, --commandline
              Print output to the commandline, no curses.

       -D <option>, --display=<option>
              sets  the  display  mode  for  smap,  showing  relevant  information  about  the selected view and
              displaying a corresponding node chart.  Note that unallocated nodes are indicated  by  a  '.'  and
              nodes  in  the  DOWN, DRAINED or FAIL state by a '#'.  When the --iterate=<seconds> option is also
              selected, you can switch displays by typing a different letter from the list below (except 'c').

              b              Displays information about BlueGene partitions on the system

              c              Displays current BlueGene node states and allows users  to  configure  the  system.
                             Type  'quit'  to end the configure mode.  Type 'exit' to end the configuration mode
                             and exit smap.

              j              Displays information about jobs running on system.

              r              Display information about advanced reservations.   While  all  current  and  future
                             reservations  will be listed, only currently active reservations will appear on the
                             node map.

              s              Displays information about slurm partitions on the system

       -h, --noheader
              Do not print a header on the output.

       -H, --show_hidden
              Display hidden partitions and their jobs.

       --help,
              Print a message describing all smap options.

       -i <seconds> , --iterate=<seconds>
              Print the state on a periodic basis.  Sleep for the indicated number of seconds  between  reports.
              User  can  exit  at anytime by typing 'q' or hitting the return key.  If user is in configure mode
              type 'exit' to exit program, 'quit' to exit configure mode.

       -I, --ionodes
              Only show objects with these ionodes this support is only for bluegene  systems.  This  should  be
              used  inconjuction  with the '-n' option.  Only specify the ionode number range here.  Specify the
              node name with the '-n' option.

       -M, --clusters=<string>
              Clusters to issue commands to.  Note that the  SlurmDBD  must  be  up  for  this  option  to  work
              properly.

       -n, --nodes
              Only  show  objects  with  these  nodes.   If  querying to the ionode level use the option '-I' in
              conjunction with this option.

       -Q, --quiet
              Avoid printing error messages.

       -R <RACK_MIDPLANE_ID/XYZ>, --resolve=<RACK_MIDPLANE_ID/XYZ>
              Returns the XYZ coords for a Rack/Midplane id or vice-versa.

              To get the XYZ coord for a Rack/Midplane id input -R R101 where 10  is  the  rack  and  1  is  the
              midplane.

              To get the Rack/Midplane id from a XYZ coord input -R 101 where X=1 Y=1 Z=1 with no leading 'R'.

       --usage
              Print a brief message listing the smap options.

       -V , --version
              Print version information and exit.

INTERACTIVE OPTIONS

       When  using  smap in curses mode and when the --iterate=<seconds> option is also selected, you can scroll
       through the different windows using the arrow keys.  The  up  and  down  arrow  keys  scroll  the  window
       containing the grid, and the left and right arrow keys scroll the window containing the text information.

       With  the iterate option selected, you can use any of the options available to the -D option listed above
       (except 'c') to change screens.  You can also hide or make visible hidden partitions by pressing  'h'  at
       any moment.

OUTPUT FIELD DESCRIPTIONS

       ACCESS_CONTROL
              Identifies  the  users or bank accounts which can use this advanced reservation.  A prefix of "A:"
              indicates that the following account names may use this reservation.  A prefix of  "U:"  indicates
              that the following user names may use this reservation.

       AVAIL  Partition state: up or down.

       BG_BLOCK
              BlueGene Block Name.

       CONN   Connection Type: TORUS or MESH or SMALL (for small blocks).

       END_TIME
              The time when an advanced reservation ended.

       ID     Key to identify the nodes associated with this entity in the node chart.

       MODE   Mode Type: COPROCESS or VIRTUAL.

       NAME   Name of the job or advanced reservation.

       NODELIST or BP_LIST
              Names of nodes or base partitions associated with this configuration, partition or reservation.

       NODES  Count of nodes or base partitions with this particular configuration.

       PARTITION
              Name of a partition.  Note that the suffix "*" identifies the default partition.

       ST     State of a job in compact form. Possible states include: PD (pending), R (running), S (suspended),
              CD  (completed), CF (configuring), CG  (completing),  F  (failed),  TO  (timeout),  and  NF  (node
              failure). See JOB STATE CODES section below for more information.

       START_TIME
              The time when an advanced reservation started.

       STATE  State  of  the  nodes.   Possible  states include: allocated, completing, down, drained, draining,
              fail, failing, idle, and unknown plus their abbreviated forms: alloc,  comp,  donw,  drain,  drng,
              fail,  failg,  idle,  and  unk  respectively.   Note that the suffix "*" identifies nodes that are
              presently not responding.  See NODE STATE CODES section below for more information.

       TIMELIMIT
              Maximum time limit for any user job in days-hours:minutes:seconds.  infinite is used  to  identify
              jobs or partitions without a job time limit.

       TOPOGRAPHY INFORMATION

       The node chart is designed to indicate relative locations of the nodes.  On most Linux clusters this will
       represent a one-dimensional array of nodes. Larger clusters will utilize multiple as  needed  with  right
       side of one line being logically followed by the left side of the next line.

       On BlueGene systems, the node chart will indicate the three
       dimensional topography of the system.
       The X dimension will increase from left to right on a given line.
       The Y dimension will increase in planes from bottom to top.
       The Z dimension will increase within a plane from the back
       line to the front line of a plane.
       Note the example below:

          a a a a b b d d
         a a a a b b d d
        a a a a b b c c
       a a a a b b c c

          a a a a b b d d
         a a a a b b d d
        a a a a b b c c
       a a a a b b c c

          a a a a . . d d
         a a a a . . d d
        a a a a . . e e              Y
       a a a a . . e e               |
                                     |
          a a a a . . d d            0----X
         a a a a . . d d            /
        a a a a . . . .            /
       a a a a . . . #            Z

       ID JOBID PARTITION BG_BLOCK USER   NAME ST  TIME NODES BP_LIST
       a  12345 batch     RMP0     joseph tst1 R  43:12   32k bgl[000x333]
       b  12346 debug     RMP1     chris  sim3 R  12:34    8k bgl[420x533]
       c  12350 debug     RMP2     danny  job3 R   0:12    4k bgl[622x733]
       d  12356 debug     RMP3     dan    colu R  18:05    8k bgl[600x731]
       e  12378 debug     RMP4     joseph asx4 R   0:34    2k bgl[612x713]

CONFIGURATION INSTRUCTIONS

       For  Admin  use.  From this screen one can create a configuration file that is used to partition and wire
       the system into usable blocks.

       OUTPUT

              BG_BLOCK
                     BlueGene Block Name.

              CONN   Connection Type: TORUS or MESH or SMALL (for small blocks).

              ID     Key to identify the nodes associated with this entity in the node chart.

              MODE   Mode Type: COPROCESS or VIRTUAL.

       INPUT COMMANDS

              resolve <RACK_MIDPLANE_ID/XYZ>
                     Returns the XYZ coords for a Rack/Midplane id or vice-versa.

                     To get the XYZ coord for a Rack/Midplane id input -R R101 where 10 is the rack and 1 is the
                     midplane.

                     To get the Rack/Midplane id from a XYZ coord input -R 101 where X=1 Y=1 Z=1 with no leading
                     'R'.

              load <bluegene.conf file>
                     Load an already existent bluegene.conf file. This will verify and  mapout  a  bluegene.conf
                     file.  After loaded the configuration may be edited and saved as a new file.

              create <size> <options>
                     Submit  request for partition creation. The size may be specified either as a count of base
                     partitions or specific dimensions in the X, Y  and  Z  directions  separated  by  "x",  for
                     example  "2x3x4".  A  variety  of options may be specified. Valid options are listed below.
                     Note that the option and their values are case insensitive  (e.g.  "MESH"  and  "mesh"  are
                     equivalent).

              Start = XxYxZ
                     Identify  where  to  start  the  partition.   This  is primarily for testing purposes.  For
                     convenience one can only put the X coord or XxY will  also  work.   The  default  value  is
                     0x0x0.

              Connection = MESH | TORUS | SMALL
                     Identify how the nodes should be connected in network.  The default value is TORUS.

                     Small  Equivalent  to  "Connection=Small".   If  a  small  connection is specified the base
                            partition chosen will create smaller partitions  based  on  options  32CNBlocks  and
                            128CNBlocks  respectively  for  a  Bluegene  L  system.  16CNBlocks, 64CNBlocks, and
                            256CNBlocks are also available for Bluegene P systems.  Keep in mind you  must  have
                            enough ionodes to make all these configurations possible.
                              These  number  will be altered to take up the entire base partition. Size does not
                            need to be specified with a  small  request,  we  will  always  default  to  1  base
                            partition for allocation.

                     Mesh   Equivalent to "Connection=Mesh".

                     Torus  Equivalent to "Connection=Torus".

              Rotation = TRUE | FALSE
                     Specifies  that  the geometry specified in the size parameter may be rotated in space (e.g.
                     the Y and Z dimensions may be switched).  The default value is FALSE.

              Rotate Equivalent to "Rotation=true".

              Elongation = TRUE | FALSE
                     If TRUE, permit the geometry specified in the size parameter to be altered as needed to fit
                     available resources.  For example, an allocation of "4x2x1" might be used to satisfy a size
                     specification of "2x2x2".  The default value is FALSE.

              Elongate
                     Equivalent to "Elongation=true".

              copy <id> <count>
                     Submit request for partition to be copied.  You may copy a specific partition by specifying
                     its  id, by default the last configured partition is copied.  You may also specify a number
                     of copies to be made.  By default, one copy is made.

              delete <id>
                     Delete the specified block.

              down <node_range>
                     Down a specific node or range of nodes.  i.e. 000, 000-111 [000x111]

              up <node_range>
                     Bring a specific node or range of nodes up.  i.e. 000, 000-111 [000x111]

              alldown
                     Set all nodes to down state.

              allup  Set all nodes to up state.

              save <file_name>
                     Save the current configuration to a file.  If no file_name is specified, the  configuration
                     is written to a file named "bluegene.conf" in the current working directory.

              clear  Clear all partitions created.

NODE STATE CODES

       Node  state  codes  are shortened as required for the field size.  These node states may be followed by a
       special character to identify state flags associated with the node.  The  following  node  sufficies  and
       states are used:

       *   The  node  is  presently  not responding and will not be allocated any new work.  If the node remains
           non-responsive, it will be placed in the DOWN state (except  in  the  case  of  COMPLETING,  DRAINED,
           DRAINING, FAIL, FAILING nodes).

       ~   The node is presently in a power saving mode (typically running at reduced frequency).

       #   The node is presently being powered up or configured.

       $   The node is currently in a reservation with a flag value of "maintenance".

       @   The node is pending reboot.

       ALLOCATED   The node has been allocated to one or more jobs.

       ALLOCATED+  The  node is allocated to one or more active jobs plus one or more jobs are in the process of
                   COMPLETING.

       COMPLETING  All jobs associated with this node are in the process of COMPLETING.  This node state will be
                   removed when all of the job's processes have terminated and the Slurm epilog program (if any)
                   has terminated. See the Epilog parameter description in the  slurm.conf  man  page  for  more
                   information.

       DOWN        The  node  is  unavailable for use. Slurm can automatically place nodes in this state if some
                   failure occurs. System administrators may also explicitly place nodes in  this  state.  If  a
                   node  resumes  normal  operation,  Slurm  can  automatically  return  it  to service. See the
                   ReturnToService and SlurmdTimeout parameter descriptions in the slurm.conf(5)  man  page  for
                   more information.

       DRAINED     The  node  is  unavailable  for  use  per  system administrator request.  See the update node
                   command in the scontrol(1) man page or the slurm.conf(5) man page for more information.

       DRAINING    The node is currently executing a job, but will not be allocated to additional jobs. The node
                   state  will  be  changed to state DRAINED when the last job on it completes. Nodes enter this
                   state per system administrator request. See the update node command in  the  scontrol(1)  man
                   page or the slurm.conf(5) man page for more information.

       FAIL        The  node  is  expected  to  fail  soon  and  is unavailable for use per system administrator
                   request.  See the update node command in the scontrol(1) man page or  the  slurm.conf(5)  man
                   page for more information.

       FAILING     The  node  is  currently executing a job, but is expected to fail soon and is unavailable for
                   use per system administrator request.  See the update node command  in  the  scontrol(1)  man
                   page or the slurm.conf(5) man page for more information.

       IDLE        The node is not allocated to any jobs and is available for use.

       MAINT       The node is currently in a reservation with a flag value of "maintainence".

       REBOOT      The node is currently scheduled to be rebooted.

       UNKNOWN     The Slurm controller has just started and the node's state has not yet been determined.

JOB STATE CODES

       Jobs  typically  pass  through  several  states in the course of their execution.  The typical states are
       PENDING, RUNNING, SUSPENDED, COMPLETING, and COMPLETED.  An explanation of each state follows.

       BF  BOOT_FAIL       Job terminated due to launch failure, typically  due  to  a  hardware  failure  (e.g.
                           unable to boot the node or block and the job can not be requeued).

       CA  CANCELLED       Job was explicitly cancelled by the user or system administrator.  The job may or may
                           not have been initiated.

       CD  COMPLETED       Job has terminated all processes on all nodes with an exit code of zero.

       CG  COMPLETING      Job is in the process of completing. Some  processes  on  some  nodes  may  still  be
                           active.

       CF  CONFIGURING     Job  has  been  allocated resources, but are waiting for them to become ready for use
                           (e.g. booting).

       F   FAILED          Job terminated with non-zero exit code or other failure condition.

       NF  NODE_FAIL       Job terminated due to failure of one or more allocated nodes.

       PD  PENDING         Job is awaiting resource allocation.

       PR  PREEMPTED       Job terminated due to preemption.

       R   RUNNING         Job currently has an allocation.

       SE  SPECIAL_EXIT    The job was requeued in a special state. This state can be set by users, typically in
                           EpilogSlurmctld, if the job has terminated with a particular exit value.

       ST  STOPPED         Job has an allocation, but execution has been stopped with SIGSTOP signal.  CPUS have
                           been retained by this job.

       S   SUSPENDED       Job has an allocation, but execution has been suspended and CPUs have  been  released
                           for other jobs.

       TO  TIMEOUT         Job terminated upon reaching its time limit.

ENVIRONMENT VARIABLES

       The following environment variables can be used to override settings compiled into smap.

       SLURM_CONF          The location of the Slurm configuration file.

COPYING

       Copyright  (C)  2004-2007  The  Regents  of the University of California.  Produced at Lawrence Livermore
       National Laboratory (cf, DISCLAIMER).
       Copyright (C) 2008-2009 Lawrence Livermore National Security.
       Copyright (C) 2010-2013 SchedMD LLC.

       This   file   is   part   of   Slurm,   a   resource    management    program.     For    details,    see
       <https://slurm.schedmd.com/>.

       Slurm  is  free  software;  you  can  redistribute it and/or modify it under the terms of the GNU General
       Public License as published by the Free Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       Slurm  is  distributed  in  the  hope  that it will be useful, but WITHOUT ANY WARRANTY; without even the
       implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.   See  the  GNU  General  Public
       License for more details.

SEE ALSO

       scontrol(1),  sinfo(1),  squeue(1),  slurm_load_ctl_conf  (3),  slurm_load_jobs (3), slurm_load_node (3),
       slurm_load_partitions   (3),   slurm_reconfigure   (3),   slurm_shutdown   (3),   slurm_update_job   (3),
       slurm_update_node (3), slurm_update_partition (3), slurm.conf(5)