Provided by: slurm-client_17.11.2-1build1_amd64

NAME

       sdiag - Scheduling diagnostic tool for Slurm

SYNOPSIS

       sdiag

DESCRIPTION

       sdiag shows information related to slurmctld execution, covering threads, agents, jobs, and
       scheduling algorithms. The goal is to obtain data on slurmctld behaviour that helps in
       adjusting configuration parameters or queue policies, particularly on systems with a high
       job throughput.

       It has two execution modes. The default mode, --all, shows the counters and statistics
       explained below; the other mode, --reset, resets those values.

       Values are reset at midnight UTC by default.

       The first block of information is related to global slurmctld execution:

       Server thread count
              The number of currently active slurmctld threads. A high number indicates a high
              load processing events such as job submissions, job dispatching, and job
              completions. If this is often close to MAX_SERVER_THREADS it could point to a
              potential bottleneck.

       Agent queue size
              Slurm is designed with scalability in mind, and sending messages to thousands of
              nodes is not a trivial task. The agent mechanism helps to control communication
              between the Slurm daemons and the controller on a best-effort basis. If this value
              is close to MAX_AGENT_CNT there could be delays affecting job management.

       DBD Agent queue size
              Slurm  queues  up  the  messages  intended for the SlurmDBD and processes them in a
              separate thread. If the SlurmDBD, or  database,  is  down  then  this  number  will
              increase. The max queue size is calculated as:

              MAX(10000, ((max_job_cnt * 2) + (node_record_count * 4)))

              If  this  number  begins to grow more than half of the max queue size, the slurmdbd
              and the database should be investigated immediately.
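
               For example, a minimal C sketch of this calculation (the max_job_cnt and
               node_record_count figures below are assumed for illustration only; the real values
               come from the controller configuration, e.g. MaxJobCount and the number of
               configured nodes):

                   #include <stdio.h>

                   #define MAX(a, b) ((a) > (b) ? (a) : (b))

                   int main(void)
                   {
                       /* Assumed values for illustration only. */
                       int max_job_cnt       = 10000;   /* e.g. MaxJobCount          */
                       int node_record_count = 500;     /* e.g. configured node count */

                       int max_queue = MAX(10000, (max_job_cnt * 2) + (node_record_count * 4));

                       printf("max DBD agent queue size: %d\n", max_queue);      /* 22000 */
                       printf("investigate above:        %d\n", max_queue / 2);  /* 11000 */
                       return 0;
                   }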

       Jobs submitted
              Number of jobs submitted since last reset.

       Jobs started
              Number of jobs started since last reset. This includes backfilled jobs.

       Jobs completed
              Number of jobs completed since last reset.

       Jobs canceled
              Number of jobs canceled since last reset.

       Jobs failed
              Number of jobs failed due to slurmd or other internal issues since last reset.

       Jobs running
              Number of jobs running at the time given by the time stamp below.

       Jobs running ts
              Time stamp of when the running job count was taken.

       The second block of information is related to the main scheduling algorithm, which is based
       on job priorities. A scheduling cycle acquires the job_write_lock, then tries to allocate
       resources to pending jobs, starting from the highest-priority job and proceeding in
       descending order. Once a job cannot get its resources, the loop keeps going, but only for
       jobs requesting other partitions, as sketched below. Jobs with dependencies or affected by
       account limits are not processed.
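
       The following minimal C sketch illustrates the shape of that pass. The data structures and
       the stubbed resource test are assumptions for illustration; this is not the slurmctld
       implementation:

               #include <stdbool.h>
               #include <stdio.h>

               #define NPART 4

               struct job { int id; int priority; int part; bool held; };

               /* Stand-in for the real resource test: partition 0 is full here. */
               static bool try_allocate(const struct job *j) { return j->part != 0; }

               int main(void)
               {
                   /* Pending queue, already sorted by descending priority. */
                   struct job queue[] = {
                       { 101, 900, 0, false },
                       { 102, 800, 0, false },  /* skipped: partition 0 already blocked   */
                       { 103, 700, 1, false },  /* still considered: different partition  */
                       { 104, 600, 1, true  },  /* skipped: dependency or account limit   */
                   };
                   bool blocked[NPART] = { false };

                   for (unsigned i = 0; i < sizeof(queue) / sizeof(queue[0]); i++) {
                       const struct job *j = &queue[i];
                       if (j->held || blocked[j->part])
                           continue;
                       if (try_allocate(j))
                           printf("started job %d\n", j->id);
                       else
                           blocked[j->part] = true;  /* keep looping for other partitions */
                   }
                   return 0;
               }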

       Last cycle
              Time in microseconds for the last scheduling cycle.

       Max cycle
              Time in microseconds for the maximum scheduling cycle since last reset.

       Total cycles
              Number of scheduling cycles since last reset. Scheduling is done periodically and
              also when a job is submitted or completed.

       Mean cycle
              Mean time, in microseconds, of scheduling cycles since last reset.

       Mean depth cycle
              Mean of cycle depth. Depth means the number of jobs processed in a scheduling cycle.

       Cycles per minute
              Counter of scheduling executions per minute.

       Last queue length
              Length of the pending job queue.

       The third block of information is related to the backfilling scheduling algorithm. A
       backfilling scheduling cycle acquires locks on the job, node and partition objects, then
       tries to allocate resources to pending jobs. Jobs are processed in priority order. If a
       job cannot get resources now, the algorithm calculates when it could get them, obtaining a
       future start time for the job. The next job is then processed, and the algorithm tries to
       get resources for it without affecting the previously planned jobs, again calculating a
       future start time if no resources are currently available. The backfilling algorithm takes
       more time for each new job processed, since higher-priority jobs must not be affected. The
       algorithm itself takes measures to avoid an overly long execution cycle and to avoid
       holding all the locks for too long.
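
       A minimal C sketch of that reservation logic follows. The two-node toy cluster, job sizes
       and helper names are assumptions for illustration only, not the slurmctld implementation:

               #include <stdio.h>

               struct job { int id; int nodes; int runtime; };

               /* Earliest time at which 'nodes' nodes are free in the toy two-node cluster. */
               static int earliest_start(int nodes, int a_free, int b_free)
               {
                   if (nodes == 1)
                       return a_free < b_free ? a_free : b_free;
                   return a_free > b_free ? a_free : b_free;      /* needs both nodes */
               }

               int main(void)
               {
                   /* Toy cluster: node A busy until t=400, node B idle now (t=0). */
                   int now = 0, a_free = 400, b_free = 0;
                   int reserved_at = -1;        /* future start planned for a blocked job */

                   /* Pending jobs in descending priority order (illustrative values). */
                   struct job queue[] = { { 201, 2, 600 }, { 202, 1, 300 }, { 203, 1, 500 } };

                   for (unsigned i = 0; i < sizeof(queue) / sizeof(queue[0]); i++) {
                       const struct job *j = &queue[i];
                       int start = earliest_start(j->nodes, a_free, b_free);

                       if (start <= now &&
                           (reserved_at < 0 || now + j->runtime <= reserved_at)) {
                           /* Starts now; lower-priority jobs are backfilled only if
                            * they do not delay the higher-priority reservation. */
                           printf("job %d starts (backfilled) at t=%d\n", j->id, now);
                           if (b_free <= now) b_free = now + j->runtime;
                           else               a_free = now + j->runtime;
                       } else if (reserved_at < 0) {
                           /* Cannot start now: compute and record its future start time. */
                           reserved_at = start;
                           printf("job %d reserved for t=%d\n", j->id, reserved_at);
                       } else {
                           printf("job %d must keep waiting\n", j->id);
                       }
                   }
                   return 0;
               }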

       Total backfilled jobs (since last slurm start)
              Number of jobs started thanks to backfilling since last slurm start.

       Total backfilled jobs (since last stats cycle start)
              Number of jobs started thanks to backfilling since the last time stats were reset.
              By default these values are reset at midnight UTC.

       Total backfilled heterogeneous job components
              Number of heterogeneous job components started thanks  to  backfilling  since  last
              Slurm start.

       Total cycles
              Number of backfilling scheduling cycles since last reset.

       Last cycle when
              Time when the last execution cycle happened, in the format "weekday Month MonthDay
              hour:minute.seconds year".

       Last cycle
              Time in microseconds of the last backfilling cycle. It counts only execution time,
              excluding the sleep time inserted inside a scheduling cycle when it takes too long.
              Note that locks are released during the sleep time so that other work can proceed.

       Max cycle
              Time in microseconds of the maximum backfilling cycle execution since last reset.
              It counts only execution time, excluding the sleep time inserted inside a
              scheduling cycle when it takes too long. Note that locks are released during the
              sleep time so that other work can proceed.

       Mean cycle
              Mean time, in microseconds, of backfilling scheduling cycles since last reset.

       Last depth cycle
              Number of jobs processed during the last backfilling scheduling cycle. It counts
              every job considered, even those with no chance to execute due to dependencies or
              limits.

       Last depth cycle (try sched)
              Number of jobs processed during the last backfilling scheduling cycle. It counts
              only jobs with a chance to run while waiting for available resources. These are the
              jobs which make the backfilling algorithm heavier.

       Depth Mean
              Mean count of jobs processed during backfilling scheduling cycles since last reset. Jobs
              which are found to be ineligible to run when examined by the backfill scheduler are
              not counted (e.g. jobs submitted to multiple partitions and already  started,  jobs
              which  have  reached  a  QOS  or  account limit such as maximum running jobs for an
              account, etc).

       Depth Mean (try sched)
              Mean count of jobs processed during backfilling scheduling cycles since last reset,
              counting only jobs with a chance to run while waiting for available resources (as
              for "Last depth cycle (try sched)" above).

       Last queue length
              Number of jobs pending to be processed by the backfilling algorithm. A job is
              counted once for each partition it requested. A pending job array will normally be
              counted as one job (tasks of a job array which have already been started/requeued
              or individually modified will already have individual job records and are each
              counted as a separate job).

       Queue length Mean
              Mean count of jobs pending to be processed by the backfilling algorithm. A job is
              counted once for each partition it requested. A pending job array will normally be
              counted as one job (tasks of a job array which have already been started/requeued
              or individually modified will already have individual job records and are each
              counted as a separate job).

       The fourth and fifth blocks of information report the most frequently issued remote
       procedure calls (RPCs), i.e. calls made to the slurmctld daemon to perform some action.
       The fourth block reports the RPCs issued by message type. You will need to look up those
       RPC codes in the Slurm source code, in the file src/common/slurm_protocol_defs.h. The
       report includes the number of times each RPC was invoked, the total time consumed by all
       of those RPCs, and the average time consumed by each RPC, in microseconds. The fifth block
       reports the RPCs issued by user ID, the total number of RPCs they have issued, the total
       time consumed by all of those RPCs, and the average time consumed by each RPC, in
       microseconds.
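
       The reported per-RPC average is simply the total time divided by the count. A minimal C
       sketch with assumed figures (not taken from any real output):

               #include <stdio.h>

               int main(void)
               {
                   /* Assumed figures in the units sdiag reports: a message type
                    * invoked 1500 times for a total of 4500000 microseconds. */
                   unsigned long count      = 1500;
                   unsigned long total_usec = 4500000;

                   printf("average time per RPC: %lu usec\n", total_usec / count); /* 3000 */
                   return 0;
               }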

OPTIONS

       -a, --all
              Get and report information. This is the default mode of operation.

       -h, --help
              Print description of options and exit.

       -i, --sort-by-id
              Sort Remote Procedure Call (RPC) data by message type ID and user ID.

       -r, --reset
              Reset counters. Only supported for Slurm operators and administrators.

       -t, --sort-by-time
              Sort Remote Procedure Call (RPC) data by total run time.

       -T, --sort-by-time2
              Sort Remote Procedure Call (RPC) data by average run time.

       --usage
              Print list of options and exit.

       -V, --version
              Print current version number and exit.

ENVIRONMENT VARIABLES

       Some sdiag options may be set via environment variables. These environment variables,
       along with their corresponding options, are listed below. (Note: command-line options will
       always override these settings.)

       SLURM_CONF          The location of the Slurm configuration file.

COPYING

       Copyright (C) 2010-2011 Barcelona Supercomputing Center.
       Copyright (C) 2010-2017 SchedMD LLC.

       Slurm is free software; you can redistribute it and/or modify it under the  terms  of  the
       GNU  General Public License as published by the Free Software Foundation; either version 2
       of the License, or (at your option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
       even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
       GNU General Public License for more details.

SEE ALSO

       sinfo(1), squeue(1), scontrol(1), slurm.conf(5)