Provided by: slurm-client_19.05.5-1_amd64

NAME

       sdiag - Scheduling diagnostic tool for Slurm

SYNOPSIS

       sdiag

DESCRIPTION

       sdiag shows information related to slurmctld execution: threads, agents, jobs, and
       scheduling algorithms. The goal is to obtain data on slurmctld behaviour that helps to
       adjust configuration parameters or queue policies. The main motivation is to understand
       Slurm behaviour on systems with a high throughput.

       It has two execution modes. The default mode, --all, shows several counters and statistics
       explained later; the other mode, --reset, resets those values.

       Values are reset at midnight UTC time by default.

       The first block of information is related to global slurmctld execution:

       Server thread count
              The number of currently active slurmctld threads. A high number means a high
              load processing events such as job submissions, job dispatching, job completions,
              etc. If this is often close to MAX_SERVER_THREADS it could point to a potential
              bottleneck.

       Agent queue size
              Slurm is designed with scalability in mind, and sending messages to thousands of
              nodes is not a trivial task. The agent mechanism helps to control communication
              between slurmctld and the slurmd daemons on a best-effort basis. This value denotes
              the count of enqueued outgoing RPC requests in an internal retry list.

       Agent count
              Number of agent threads. Each of these agent threads can in turn create a group of
              up to 2 + AGENT_THREAD_COUNT active threads at a time.

       DBD Agent queue size
              Slurm queues up the messages intended for the SlurmDBD  and  processes  them  in  a
              separate  thread.  If  the  SlurmDBD,  or  database,  is down then this number will
              increase. The max queue size is calculated as:

              MAX(10000, ((max_job_cnt * 2) + (node_record_count * 4)))

              If this number grows beyond half of the max queue size, the slurmdbd and the
              database should be investigated immediately.

       Jobs submitted
              Number of jobs submitted since last reset.

       Jobs started
              Number of jobs started since last reset. This includes backfilled jobs.

       Jobs completed
              Number of jobs completed since last reset.

       Jobs canceled
              Number of jobs canceled since last reset.

       Jobs failed
              Number of jobs failed due to slurmd or other internal issues since last reset.

       Job states ts:
              Lists the timestamp of when the following job state counts were gathered.

       Jobs pending:
              Number of jobs pending at the time of the time stamp above.

       Jobs running:
              Number of jobs running at the time of the time stamp above.

       Jobs running ts:
              Time stamp of when the running job count was taken.
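
       As a rough illustration of how the counters in this first block might be consumed by a
       monitoring script, the sketch below parses "label: value" lines and applies the DBD
       Agent queue size rule described above. The sample text and its values are invented for
       the example, the function names are hypothetical, and the layout of real sdiag output
       may differ; treating max_job_cnt and node_record_count as plain integers supplied by
       the caller is also an assumption.

```python
# Illustrative sample only: values are made up, and real sdiag output may be laid out differently.
SAMPLE_OUTPUT = """\
Server thread count:  3
Agent queue size:     0
Agent count:          0
DBD Agent queue size: 12500
Jobs submitted: 4200
Jobs started:   3900
"""


def parse_counters(text):
    """Collect 'label: integer' lines into a dict, skipping everything else.

    If a label appears more than once, the last occurrence wins.
    """
    counters = {}
    for line in text.splitlines():
        label, sep, value = line.rpartition(":")
        value = value.strip()
        if sep and value.isdigit():
            counters[label.strip()] = int(value)
    return counters


def dbd_max_queue_size(max_job_cnt, node_record_count):
    """MAX(10000, ((max_job_cnt * 2) + (node_record_count * 4))) from the text above."""
    return max(10000, max_job_cnt * 2 + node_record_count * 4)


def dbd_backlog_is_high(counters, max_job_cnt, node_record_count):
    """True when the DBD agent queue exceeds half of the maximum queue size."""
    limit = dbd_max_queue_size(max_job_cnt, node_record_count)
    return counters.get("DBD Agent queue size", 0) > limit / 2


if __name__ == "__main__":
    counters = parse_counters(SAMPLE_OUTPUT)
    print(dbd_backlog_is_high(counters, max_job_cnt=10000, node_record_count=500))
```

       With the assumed inputs the maximum queue size is 22000, so a queue of 12500 entries
       is past the halfway mark and would be flagged.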

       The next block of information is related to the main scheduling algorithm, which is based
       on job priorities. A scheduling cycle acquires the job_write_lock lock, then tries to get
       resources for pending jobs, starting from the highest-priority job and proceeding in
       descending order. Once a job cannot get resources, the loop keeps going but only for jobs
       requesting other partitions. Jobs with dependencies or affected by account limits are
       not processed.
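
       The loop described above can be pictured with the toy model below. It is a sketch under
       strong assumptions, not Slurm's implementation: resources are reduced to a free node
       count per partition, locking is left out, and the PendingJob type, its fields, and the
       eligible flag (standing in for dependencies and account limits) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PendingJob:
    name: str
    priority: int
    partition: str
    nodes: int
    eligible: bool = True  # False models a dependency or an account limit


def schedule_cycle(pending, free_nodes):
    """Run one toy scheduling cycle.

    pending:    list of PendingJob
    free_nodes: dict mapping partition name to free node count (updated in place)
    Returns the names of the jobs started, in the order they were started.
    """
    started = []
    blocked_partitions = set()
    # Highest priority first, then descending.
    for job in sorted(pending, key=lambda j: j.priority, reverse=True):
        if not job.eligible:
            continue  # not processed at all
        if job.partition in blocked_partitions:
            continue  # a higher-priority job is already waiting in this partition
        if job.nodes <= free_nodes.get(job.partition, 0):
            free_nodes[job.partition] -= job.nodes
            started.append(job.name)
        else:
            # From now on only jobs requesting other partitions are considered.
            blocked_partitions.add(job.partition)
    return started


if __name__ == "__main__":
    queue = [
        PendingJob("a", 100, "batch", 4),
        PendingJob("b", 90, "batch", 2),
        PendingJob("c", 80, "gpu", 1),
        PendingJob("d", 70, "batch", 1),
    ]
    print(schedule_cycle(queue, {"batch": 3, "gpu": 2}))
```

       In this model job "a" does not fit, so the lower-priority "batch" jobs are skipped even
       though they would fit, while the "gpu" job still starts. Starting such skipped jobs
       without delaying "a" is the role of the backfilling algorithm.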

       Last cycle
              Time in microseconds for last scheduling cycle.

       Max cycle
              Maximum time in microseconds for any scheduling cycle since last reset.

       Total cycles
              Number of scheduling cycles since last reset. Scheduling is performed
              periodically and (depending upon configuration) when a job is submitted or a job
              is completed.

       Mean cycle
              Mean time in microseconds for all scheduling cycles since last reset.

       Mean depth cycle
              Mean cycle depth. Depth is the number of jobs processed in a scheduling cycle.

       Cycles per minute
              Number of scheduling cycles executed per minute.

       Last queue length
              Length of the pending job queue.

       The next block of information is related to the backfilling scheduling algorithm. A
       backfilling scheduling cycle acquires locks on the job, node and partition objects,
       then tries to get resources for pending jobs. Jobs are processed in priority order. If
       a job cannot get resources, the algorithm calculates when it could get them, obtaining a
       future start time for the job. Then the next job is processed, and the algorithm tries to
       get resources for that job without affecting the previous ones; again it calculates a
       future start time if resources are not currently available. The backfilling algorithm
       takes more time for each new job it processes, since higher-priority jobs must not be
       affected. The algorithm itself takes measures to avoid a long execution cycle and to
       avoid holding all the locks for too long.
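
       A minimal sketch of the idea, assuming a cluster reduced to a single pool of identical
       nodes and jobs described only by a node count and a run time: each job, taken in
       priority order, is given the earliest start time that does not disturb the jobs placed
       before it. The Job type and the backfill function are hypothetical names for this toy
       model; Slurm's backfill scheduler considers far more than this.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int    # nodes requested
    runtime: int  # time limit in arbitrary units, must be > 0


def used_at(reservations, t):
    """Nodes held at instant t by jobs that already have a start time."""
    return sum(n for (start, end, n) in reservations if start <= t < end)


def fits(reservations, total_nodes, job, start):
    """True if the job can hold its nodes during [start, start + runtime)."""
    end = start + job.runtime
    # Usage only rises at reservation starts, so those instants are enough to check.
    checkpoints = [start] + [s for (s, _, _) in reservations if start < s < end]
    return all(used_at(reservations, t) + job.nodes <= total_nodes for t in checkpoints)


def backfill(jobs, total_nodes, now=0):
    """Assign start times to jobs given in descending priority order.

    Returns a dict mapping job name to start time. A start time equal to
    `now` means the job can start immediately.
    """
    reservations = []  # (start, end, nodes) for every job placed so far
    starts = {}
    for job in jobs:
        if job.nodes > total_nodes:
            raise ValueError(f"job {job.name} can never fit")
        # Free capacity only grows when a placed job ends, so try those instants.
        candidates = sorted({now} | {end for (_, end, _) in reservations if end > now})
        start = next(t for t in candidates if fits(reservations, total_nodes, job, t))
        reservations.append((start, start + job.runtime, job.nodes))
        starts[job.name] = start
    return starts


if __name__ == "__main__":
    queue = [Job("A", 3, 10), Job("B", 4, 5), Job("C", 1, 10), Job("D", 1, 12)]
    print(backfill(queue, total_nodes=4))
```

       In this example job C is backfilled next to A because it finishes before B's reserved
       start, while D would overlap B's reservation and is pushed after it. Each new job is
       checked against every earlier reservation, which mirrors why the cost per job grows as
       the cycle proceeds.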

       Total backfilled jobs (since last slurm start)
              Number of jobs started thanks to backfilling since last slurm start.

       Total backfilled jobs (since last stats cycle start)
              Number of jobs started thanks to backfilling since the last time stats were reset.
              By default these values are reset at midnight UTC time.

       Total backfilled heterogeneous job components
              Number of heterogeneous job components started thanks  to  backfilling  since  last
              Slurm start.

       Total cycles
              Number of backfill scheduling cycles since last reset.

       Last cycle when
              Time  when  last  backfill  scheduling  cycle happened in the format "weekday Month
              MonthDay hour:minute.seconds year"

       Last cycle
              Time in microseconds of last backfill scheduling cycle. It counts only execution
              time, removing sleep time inside a scheduling cycle when it executes for an
              extended period of time. Note that locks are released during the sleep time so
              that other work can proceed.

       Max cycle
              Time in microseconds of the longest backfill scheduling cycle since last reset.
              It counts only execution time, removing sleep time inside a scheduling cycle
              when it executes for an extended period of time. Note that locks are released
              during the sleep time so that other work can proceed.

       Mean cycle
              Mean time in microseconds of backfilling scheduling cycles since last reset.

       Last depth cycle
              Number of processed jobs during last backfilling scheduling cycle. It counts  every
              job even if that job can not be started due to dependencies or limits.

       Last depth cycle (try sched)
              Number of processed jobs during last backfilling scheduling cycle. It counts only
              jobs with a chance to start using available resources. These jobs consume more
              scheduling time than jobs which are found to be unable to start due to
              dependencies or limits.

       Depth Mean
              Mean count of jobs processed during all backfilling scheduling  cycles  since  last
              reset.   Jobs which are found to be ineligible to run when examined by the backfill
              scheduler are not counted (e.g. jobs submitted to multiple partitions  and  already
              started,  jobs  which  have  reached a QOS or account limit such as maximum running
              jobs for an account, etc).

       Depth Mean (try sched)
              The subset of Depth Mean that the backfill scheduler attempted to schedule.

       Last queue length
              Number of jobs pending to be processed by backfilling algorithm.  A job is  counted
              once  for each partition it is queued to use.  A pending job array will normally be
              counted as one job (tasks of a job array which have already  been  started/requeued
              or  individually  modified  will  already  have individual job records and are each
              counted as a separate job).

       Queue length Mean
              Mean count of jobs pending to be processed by backfilling algorithm. A job is
              counted once for each partition it requested. A pending job array will normally
              be counted as one job (tasks of a job array which have already been
              started/requeued or individually modified will already have individual job
              records and are each counted as a separate job).

       Latency for 1000 calls to gettimeofday()
              Latency of 1000 calls to the gettimeofday() syscall in microseconds, as measured at
              controller startup.

       The next blocks of information report the most frequently issued remote procedure calls
       (RPCs), calls made to the slurmctld daemon to perform some action. The fourth block
       reports the RPCs issued by message type. The RPC codes can be looked up in the Slurm
       source code in the file src/common/slurm_protocol_defs.h. The report includes the number
       of times each RPC is invoked, the total time consumed by all of those RPCs, and the
       average time consumed by each RPC in microseconds. The fifth block reports the RPCs
       issued by user ID: the total number of RPCs each user has issued, the total time
       consumed by all of those RPCs, and the average time consumed by each RPC in
       microseconds. RPC statistics are collected for the life of the slurmctld process unless
       explicitly reset with --reset.
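
       The relationship between the three figures reported per RPC (count, total time, average
       time) and the two time-based orderings can be sketched as follows. The RpcStat type and
       the numbers are invented for illustration; only the arithmetic (average = total time
       divided by count) is taken from the description above.

```python
from typing import NamedTuple


class RpcStat(NamedTuple):
    name: str      # message type
    count: int     # number of times the RPC was invoked
    total_us: int  # total time consumed by all invocations, in microseconds

    @property
    def average_us(self):
        """Average time per invocation in microseconds (0.0 when never invoked)."""
        return self.total_us / self.count if self.count else 0.0


def sort_by_total_time(stats):
    """Order RPCs by total run time, largest first."""
    return sorted(stats, key=lambda s: s.total_us, reverse=True)


def sort_by_average_time(stats):
    """Order RPCs by average run time, largest first."""
    return sorted(stats, key=lambda s: s.average_us, reverse=True)


if __name__ == "__main__":
    # Made-up figures for the example.
    stats = [
        RpcStat("REQUEST_JOB_INFO", 1000, 500000),
        RpcStat("REQUEST_SUBMIT_BATCH_JOB", 10, 200000),
        RpcStat("REQUEST_NODE_INFO", 200, 40000),
    ]
    for s in sort_by_average_time(stats):
        print(f"{s.name:28s} count:{s.count:<6d} ave_time:{s.average_us:.0f} total_time:{s.total_us}")
```

       The two orderings answer different questions: a frequently issued but cheap RPC can
       dominate total time, while a rare but expensive one stands out only when sorted by
       average time.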

       The sixth block of information, labeled Pending RPC Statistics,  shows  information  about
       pending outgoing RPCs on the slurmctld agent queue.  The first section of this block shows
       types of RPCs on the queue and the count of each. The second section shows up to the first
       25 individual RPCs pending on the agent queue, including the type and the destination host
       list.  This information is cached and only refreshed at 30 second intervals.

OPTIONS

       -a, --all
              Get and report information. This is the default mode of operation.

       -h, --help
              Print description of options and exit.

       -i, --sort-by-id
              Sort Remote Procedure Call (RPC) data by message type ID and user ID.

       -r, --reset
              Reset scheduler and RPC counters to 0.  Only  supported  for  Slurm  operators  and
              administrators.

       -t, --sort-by-time
              Sort Remote Procedure Call (RPC) data by total run time.

       -T, --sort-by-time2
              Sort Remote Procedure Call (RPC) data by average run time.

       --usage
              Print list of options and exit.

       -V, --version
              Print current version number and exit.

ENVIRONMENT VARIABLES

       Some  sdiag  options  may  be  set via environment variables. These environment variables,
       along with their corresponding options, are listed below. (Note: command line options
       will always override these settings.)

       SLURM_CONF          The location of the Slurm configuration file.

COPYING

       Copyright (C) 2010-2011 Barcelona Supercomputing Center.
       Copyright (C) 2010-2019 SchedMD LLC.

       Slurm  is  free  software; you can redistribute it and/or modify it under the terms of the
       GNU General Public License as published by the Free Software Foundation; either version  2
       of the License, or (at your option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
       even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
       GNU General Public License for more details.

SEE ALSO

       sinfo(1), squeue(1), scontrol(1), slurm.conf(5)