bionic (1) sdiag.1.gz

Provided by: slurm-client_17.11.2-1build1_amd64

NAME

       sdiag - Scheduling diagnostic tool for Slurm

SYNOPSIS

       sdiag

DESCRIPTION

       sdiag shows information related to slurmctld execution, covering threads, agents, jobs, and
       scheduling algorithms. The goal is to obtain data on slurmctld behaviour that helps in tuning
       configuration parameters or queue policies. It is primarily intended for understanding Slurm
       behaviour on systems with high throughput.

        It has two execution modes. The default mode, --all, reports the counters and statistics
        explained below; the other mode, --reset, resets those values.

       Values are reset at midnight UTC time by default.

       The first block of information is related to global slurmctld execution:

       Server thread count
               The number of currently active slurmctld threads. A high number indicates a heavy
               load from processing events such as job submission, dispatch, and completion. If
               this value is often close to MAX_SERVER_THREADS, it may point to a bottleneck.

       Agent queue size
               Slurm is designed with scalability in mind, and sending messages to thousands of
               nodes is not a trivial task. The agent mechanism controls communication between the
               Slurm daemons and the controller on a best-effort basis. If this value is close to
               MAX_AGENT_CNT, there could be delays affecting job management.

       DBD Agent queue size
              Slurm queues up the messages intended for the SlurmDBD and processes them in a separate thread. If
              the  SlurmDBD,  or  database,  is  down  then  this  number  will  increase. The max queue size is
              calculated as:

              MAX(10000, ((max_job_cnt * 2) + (node_record_count * 4)))

               If this number grows beyond half of the max queue size, the slurmdbd and the
               database should be investigated immediately.
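
               As a sketch, the threshold above can be computed for a given cluster size. The
               values below are hypothetical; substitute your own max_job_cnt and
               node_record_count:

```shell
# Sketch of the DBD max queue size formula from this page:
#   MAX(10000, ((max_job_cnt * 2) + (node_record_count * 4)))
# The two inputs below are hypothetical; substitute the real values
# for your cluster.
max_job_cnt=10000
node_record_count=500

derived=$(( max_job_cnt * 2 + node_record_count * 4 ))
if [ "$derived" -gt 10000 ]; then max_queue=$derived; else max_queue=10000; fi

echo "max queue size:    $max_queue"
echo "investigate above: $(( max_queue / 2 ))"
```

               With these example inputs, the derived size (22000) exceeds the 10000 floor, so the
               half-size threshold to watch for would be 11000.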

       Jobs submitted
               Number of jobs submitted since last reset.

       Jobs started
              Number of jobs started since last reset. This includes backfilled jobs.

       Jobs completed
              Number of jobs completed since last reset.

       Jobs canceled
              Number of jobs canceled since last reset.

       Jobs failed
              Number of jobs failed due to slurmd or other internal issues since last reset.

        Jobs running
               Number of jobs running at the time indicated by the time stamp below.

        Jobs running ts
               Time stamp of when the running job count was taken.
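
        The counters above are printed as simple name/value lines, so they can be scraped with
        standard tools. A minimal sketch, assuming a hypothetical captured excerpt of sdiag output
        (the field names follow this page; the values are invented):

```shell
# Hypothetical excerpt of `sdiag` output; the values are invented.
sample='Server thread count: 3
Agent queue size:    0
Jobs submitted: 120
Jobs started:   118'

# Extract one counter by name, splitting on the colon.
threads=$(printf '%s\n' "$sample" | awk -F': *' '/^Server thread count/ {print $2}')
echo "server threads: $threads"
```

        In a real deployment the excerpt would come from running sdiag directly; a periodic job
        could log such counters over time to spot trends before a reset.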

        The second block of information relates to the main scheduling algorithm, which is based
        on job priorities. A scheduling cycle acquires the job_write_lock, then tries to allocate
        resources to pending jobs, starting from the highest-priority job and proceeding in
        descending order. Once a job cannot obtain resources, the loop continues, but only for
        jobs requesting other partitions. Jobs with dependencies or affected by account limits are
        not processed.

       Last cycle
              Time in microseconds for last scheduling cycle.

       Max cycle
              Time in microseconds for the maximum scheduling cycle since last reset.

       Total cycles
               Number of scheduling cycles since last reset. Scheduling is done periodically and
               also when a job is submitted or completed.

       Mean cycle
               Mean time of scheduling cycles since last reset.

       Mean depth cycle
              Mean of cycle depth. Depth means number of jobs processed in a scheduling cycle.

       Cycles per minute
               Counter of scheduling executions per minute.

       Last queue length
               Length of the pending-job queue.

        The third block of information relates to the backfilling scheduling algorithm. A
        backfilling scheduling cycle acquires locks on the job, node, and partition objects, then
        tries to allocate resources to pending jobs. Jobs are processed in priority order. If a
        job cannot obtain resources, the algorithm calculates when it could obtain them, yielding
        an expected future start time for that job. The next job is then processed, and the
        algorithm tries to allocate resources to it without delaying any of the previous jobs,
        again calculating a future start time if resources are not currently available. The
        backfilling algorithm takes more time for each additional job it processes, since
        higher-priority jobs must not be delayed. The algorithm itself takes measures to avoid
        overly long execution cycles and to avoid holding all the locks for too long.

       Total backfilled jobs (since last slurm start)
              Number of jobs started thanks to backfilling since last slurm start.

       Total backfilled jobs (since last stats cycle start)
               Number of jobs started thanks to backfilling since the last time stats were reset.
               By default these values are reset at midnight UTC.

       Total backfilled heterogeneous job components
              Number of heterogeneous job components started thanks to backfilling since last Slurm start.

       Total cycles
               Number of scheduling cycles since last reset.

       Last cycle when
              Time when last execution cycle happened in  format  "weekday  Month  MonthDay  hour:minute.seconds
              year"

       Last cycle
               Time in microseconds of the last backfilling cycle. It counts only execution time,
               excluding sleep time inside a scheduling cycle when the cycle takes too long. Note
               that locks are released during the sleep time so that other work can proceed.

       Max cycle
               Time in microseconds of the maximum backfilling cycle execution since last reset.
               It counts only execution time, excluding sleep time inside a scheduling cycle when
               the cycle takes too long. Note that locks are released during the sleep time so
               that other work can proceed.

       Mean cycle
               Mean time of backfilling scheduling cycles, in microseconds, since last reset.

       Last depth cycle
               Number of jobs processed during the last backfilling scheduling cycle. It counts
               every job processed, even those with no chance to start due to dependencies or
               limits.

       Last depth cycle (try sched)
               Number of jobs processed during the last backfilling scheduling cycle. It counts
               only jobs with a chance to start while waiting for available resources. These are
               the jobs that make the backfilling algorithm heavier.

       Depth Mean
              Mean of processed jobs during backfilling scheduling cycles since  last  reset.   Jobs  which  are
              found  to  be ineligible to run when examined by the backfill scheduler are not counted (e.g. jobs
              submitted to multiple partitions and already started, jobs which have reached  a  QOS  or  account
              limit such as maximum running jobs for an account, etc).

       Depth Mean (try sched)
              Mean  of  processed  jobs  during  backfilling scheduling cycles since last reset.  Jobs which are
              found to be ineligible to run when examined by the backfill scheduler are not counted  (e.g.  jobs
              submitted  to  multiple  partitions  and already started, jobs which have reached a QOS or account
              limit such as maximum running jobs for an account, etc).

       Last queue length
               Number of jobs pending to be processed by the backfilling algorithm. A job is
               counted once for each partition it requested. A pending job array will normally be
               counted as one job (tasks of a job array which have already been started/requeued
               or individually modified will already have individual job records and are each
               counted as a separate job).

       Queue length Mean
               Mean number of jobs pending to be processed by the backfilling algorithm. A job is
               counted once for each partition it requested. A pending job array will normally be
               counted as one job (tasks of a job array which have already been started/requeued
               or individually modified will already have individual job records and are each
               counted as a separate job).

        The fourth and fifth blocks of information report the most frequently issued remote
        procedure calls (RPCs), the calls made to the slurmctld daemon to perform some action.
        The fourth block reports the RPCs issued by message type; the RPC codes can be looked up
        in the Slurm source file src/common/slurm_protocol_defs.h. The report includes the number
        of times each RPC was invoked, the total time consumed by all calls of that RPC, and the
        average time per call in microseconds. The fifth block reports the RPCs issued by user
        ID: the total number of RPCs each user has issued, the total time consumed by all of
        those RPCs, and the average time per RPC in microseconds.
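
        The per-message-type block can also be re-sorted outside of sdiag with standard tools. A
        sketch over an invented excerpt (the REQUEST_* names are real Slurm message types from
        src/common/slurm_protocol_defs.h; the counts and times are hypothetical):

```shell
# Invented excerpt in the style of sdiag's per-message-type report;
# counts and times are hypothetical.
rpcs='REQUEST_PARTITION_INFO ( 2009) count:35 ave_time:352 total_time:12320
REQUEST_JOB_INFO ( 2003) count:120 ave_time:900 total_time:108000
REQUEST_NODE_INFO ( 2007) count:10 ave_time:100 total_time:1000'

# Sort descending by the total_time value (the 4th colon-separated
# field), i.e. the ordering that --sort-by-time produces.
top=$(printf '%s\n' "$rpcs" | sort -t: -k4,4 -rn | head -n 1)
echo "busiest RPC: ${top%% *}"
```

        Here REQUEST_JOB_INFO sorts first because its total_time is the largest, even though its
        per-call average is not the highest; --sort-by-time2 would order by the average instead.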

OPTIONS

       -a, --all
              Get and report information. This is the default mode of operation.

       -h, --help
              Print description of options and exit.

       -i, --sort-by-id
              Sort Remote Procedure Call (RPC) data by message type ID and user ID.

       -r, --reset
              Reset counters. Only supported for Slurm operators and administrators.

       -t, --sort-by-time
              Sort Remote Procedure Call (RPC) data by total run time.

       -T, --sort-by-time2
              Sort Remote Procedure Call (RPC) data by average run time.

       --usage
              Print list of options and exit.

       -V, --version
              Print current version number and exit.

ENVIRONMENT VARIABLES

        Some sdiag options may be set via environment variables. These environment variables,
        along with their corresponding options, are listed below. (Note: command-line options
        always override these settings.)

       SLURM_CONF          The location of the Slurm configuration file.
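
        For example, to point sdiag at a non-default configuration file (the path below is
        hypothetical):

```shell
# Hypothetical config location; adjust for your installation.
export SLURM_CONF=/opt/slurm/etc/slurm.conf

# sdiag now reads this file instead of the default, e.g.:
#   sdiag --all
echo "$SLURM_CONF"
```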

COPYING

       Copyright (C) 2010-2011 Barcelona Supercomputing Center.
       Copyright (C) 2010-2017 SchedMD LLC.

       Slurm is free software; you can redistribute it and/or modify it under  the  terms  of  the  GNU  General
       Public License as published by the Free Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       Slurm is distributed in the hope that it will be useful, but  WITHOUT  ANY  WARRANTY;  without  even  the
       implied  warranty  of  MERCHANTABILITY  or  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
       License for more details.

SEE ALSO

       sinfo(1), squeue(1), scontrol(1), slurm.conf(5)