Ubuntu Manpage: gres.conf - Slurm configuration file for Generic RESource (GRES) management.

Provided by: slurm-client_21.08.5-2ubuntu1_amd64

NAME

       gres.conf - Slurm configuration file for Generic RESource (GRES) management.

DESCRIPTION

gres.conf is an ASCII file which describes the configuration of Generic RESources (GRES) on each compute
node. If the GRES information in the slurm.conf file does not fully describe those resources, then a
gres.conf file should be included on each compute node. The file location can be modified at system
build time using the DEFAULT_SLURM_CONF parameter or at execution time by setting the SLURM_CONF
environment variable; in either case the file is always located in the same directory as the slurm.conf
file.

If the GRES information in the slurm.conf file fully describes those resources (i.e. no "Cores", "File"
or "Links" specification is required for that GRES type or that information is automatically detected),
that information may be omitted from the gres.conf file and only the configuration information in the
slurm.conf file will be used. The gres.conf file may be omitted completely if the configuration
information in the slurm.conf file fully describes all GRES.

If using the gres.conf file to describe the resources available to nodes, the first parameter on the line
should be NodeName. If configuring Generic Resources without specifying nodes, the first parameter on the
line should be Name.

Parameter names are case insensitive. Any text following a "#" in the configuration file is treated as a
comment through the end of that line. Changes to the configuration file take effect upon restart of
Slurm daemons, daemon receipt of the SIGHUP signal, or execution of the command "scontrol reconfigure"
unless otherwise noted.

NOTE: Slurm support for gres/mps requires the use of the select/cons_tres plugin. For more information on
how to configure MPS, see https://slurm.schedmd.com/gres.html#MPS_Management.

For more information on GRES scheduling in general, see https://slurm.schedmd.com/gres.html.

The overall configuration parameters available include:

AutoDetect
The hardware detection mechanisms to enable for automatic GRES configuration. Currently, the
options are:

nvml Automatically detect NVIDIA GPUs

off Do not automatically detect any GPUs. Used to override other options.

rsmi Automatically detect AMD GPUs

AutoDetect can be on a line by itself, in which case it applies globally to all lines in
gres.conf by default. AutoDetect can also be combined with NodeName so that it applies only to
certain nodes. A node-specific AutoDetect overrides the global AutoDetect and only needs to be
specified once per node. If it is specified multiple times for the same nodes, every occurrence
must have the same value. To unset AutoDetect for a node when a global AutoDetect is set, set it
to "off" in a node-specific GRES line, for example:
NodeName=tux3 AutoDetect=off Name=gpu File=/dev/nvidia[0-3]

Count Number of resources of this type available on this node. The default value is set to the number
of File values specified (if any), otherwise the default value is one. A suffix of "K", "M", "G",
"T" or "P" may be used to multiply the number by 1024, 1048576, 1073741824, etc. respectively.
For example: "Count=10G".
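
As a worked illustration of the suffix arithmetic, the hypothetical line below defines a count-only
resource (the "bandwidth" name is borrowed from the EXAMPLES section and is not a built-in GRES). With
the "G" multiplier, Count=10G is read as 10 x 1073741824 = 10737418240:

       # Count=10G -> 10 * 1073741824 = 10737418240 units
       NodeName=tux[0-7] Name=bandwidth Count=10G Flags=CountOnly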

Cores Optionally specify the core index numbers for the specific cores which can use this resource. For
example, it may be strongly preferable to use specific cores with specific GRES devices (e.g. on a
NUMA architecture). While Slurm can track and assign resources at the CPU or thread level, the
scheduling algorithms it uses to co-allocate GRES devices with CPUs operate at a socket or NUMA
level. Therefore it is not possible to preferentially assign GRES to different specific CPUs within
the same NUMA node or socket, and this option should be used to identify all cores on some socket.

Multiple cores may be specified using a comma-delimited list or a range may be specified using a
"-" separator (e.g. "0,1,2,3" or "0-3"). If a job specifies --gres-flags=enforce-binding, then
only the identified cores can be allocated with each generic resource. This will tend to improve
performance of jobs, but delay the allocation of resources to them. If specified and a job is not
submitted with the --gres-flags=enforce-binding option the identified cores will be preferred for
scheduling with each generic resource.
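
A minimal sketch of socket-level binding, assuming a hypothetical node with two sockets of eight cores
each (logical cores 0-7 on the first socket, 8-15 on the second) and two GPUs attached to each socket.
The core counts and GPU placement are illustrative assumptions, not values taken from a real system:

       # GPUs 0-1 are assumed local to socket 0, GPUs 2-3 to socket 1
       Name=gpu File=/dev/nvidia[0-1] Cores=0-7
       Name=gpu File=/dev/nvidia[2-3] Cores=8-15

With these lines, a job submitted with --gres-flags=enforce-binding would only be allocated cores from
the socket listed for its GPU, while a job submitted without that option would merely prefer them.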

If --gres-flags=disable-binding is specified, then any core can be used with the resources, which
also increases the speed of Slurm's scheduling algorithm but can degrade the application
performance. The --gres-flags=disable-binding option is currently required to use more CPUs than
are bound to a GRES (i.e. if a GPU is bound to the CPUs on one socket, but resources on more than
one socket are required to run the job). If any core can be effectively used with the resources,
then do not specify the Cores option, which improves the speed of the Slurm scheduling logic. A
restart of the slurmctld is needed for changes to the Cores option to take effect.

NOTE: Since Slurm must be able to perform resource management on heterogeneous clusters having
various processing unit numbering schemes, a logical core index must be specified instead of the
physical core index. That logical core index might not correspond to your physical core index
number. Core 0 will be the first core on the first socket, while core 1 will be the second core
on the first socket. This numbering coincides with the logical core number (Core L#) seen in
"lstopo -l" command output.

File Fully qualified pathname of the device files associated with a resource. The name can include a
numeric range suffix to be interpreted by Slurm (e.g. File=/dev/nvidia[0-3]).

This field is generally required if enforcement of generic resource allocations is to be supported
(i.e. to prevent users from making use of resources allocated to a different user). Enforcement of
the file allocation relies upon Linux Control Groups (cgroups) and Slurm's task/cgroup plugin,
which will place the allocated files into the job's cgroup and prevent use of other files. Please
see Slurm's Cgroups Guide for more information: https://slurm.schedmd.com/cgroups.html.

If File is specified then Count must be either set to the number of file names specified or not
set (the default value is the number of files specified). The exception to this is MPS. For MPS,
each GPU would be identified by device file using the File parameter and Count would specify the
number of MPS entries that would correspond to that GPU (typically 100 or some multiple of 100).
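
The Count rule can be sketched as follows. The three lines are separate alternatives rather than the
contents of one file, and the MPS value of 200 is only an illustrative multiple of 100:

       # Count omitted: defaults to the four device files named by File
       Name=gpu File=/dev/nvidia[0-3]
       # Count given explicitly: must equal the number of device files
       Name=gpu File=/dev/nvidia[0-3] Count=4
       # MPS exception: one device file, Count is the number of MPS entries
       Name=mps File=/dev/nvidia0 Count=200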

NOTE: If you specify the File parameter for a resource on some node, the option must be specified
on all nodes and Slurm will track the assignment of each specific resource on each node. Otherwise
Slurm will only track a count of allocated resources rather than the state of each individual
device file.

NOTE: Drain a node before changing the count of records with File parameters (i.e. if you want to
add or remove GPUs from a node's configuration). Failure to do so will result in any job using
those GRES being aborted.

Flags Optional flags that can be specified to change configured behavior of the GRES.

Allowed values at present are:

CountOnly Do not attempt to load a plugin, as this GRES will only be used to track counts
of GRES used. This avoids attempting to load a non-existent plugin, which can
affect filesystems with high latency metadata operations for non-existent
files.

nvidia_gpu_env Set environment variable CUDA_VISIBLE_DEVICES for all GPUs on the specified
node(s).

amd_gpu_env Set environment variable ROCR_VISIBLE_DEVICES for all GPUs on the specified
node(s).

opencl_env Set environment variable GPU_DEVICE_ORDINAL for all GPUs on the specified
node(s).

no_gpu_env Set no GPU-specific environment variables. This is mutually exclusive with all
other environment-related flags.

If no environment-related flags are specified, then nvidia_gpu_env, amd_gpu_env, and opencl_env
will be implicitly set by default. If AutoDetect is used and environment-related flags are not
specified, then AutoDetect=nvml will set nvidia_gpu_env and AutoDetect=rsmi will set amd_gpu_env.
Conversely, specified environment-related flags will always override AutoDetect.

Environment-related flags set on one GRES line will be inherited by the GRES line directly below
it if no environment-related flags are specified on that line and if it is of the same node, name,
and type. Environment-related flags must be the same for GRES of the same node, name, and type.

Note that there is a known issue with the AMD ROCm runtime where ROCR_VISIBLE_DEVICES is processed
first, and then CUDA_VISIBLE_DEVICES is processed. To avoid the issues caused by this, set
Flags=amd_gpu_env for AMD GPUs so only ROCR_VISIBLE_DEVICES is set.
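
A hedged sketch of the flag behavior described above. The node names follow the EXAMPLES section, and
the AMD device file paths are an assumption about how the devices appear on a given system:

       # AMD nodes: set only ROCR_VISIBLE_DEVICES, avoiding the ROCm ordering issue
       NodeName=tux[8-11] Name=gpu File=/dev/dri/renderD[128-131] Flags=amd_gpu_env
       # NVIDIA node: the second line has no environment-related flag, so it
       # inherits nvidia_gpu_env from the line directly above (same node, name, type)
       NodeName=tux0 Name=gpu Type=tesla File=/dev/nvidia0 Flags=nvidia_gpu_env
       NodeName=tux0 Name=gpu Type=tesla File=/dev/nvidia1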

Links A comma-delimited list of numbers identifying the number of connections between this device and
other devices to allow coscheduling of better connected devices. This is an ordered list in which
the number of connections this specific device has to device number 0 would be in the first
position, the number of connections it has to device number 1 in the second position, etc. A -1
indicates the device itself and a 0 indicates no connection. If specified, then this line can
only contain a single GRES device (i.e. can only contain a single file via File).

This is an optional value and is usually automatically determined if AutoDetect is enabled. A
typical use case would be to identify GPUs having NVLink connectivity. Note that for GPUs, the
minor number assigned by the OS and used in the device file (i.e. the X in /dev/nvidiaX) is not
necessarily the same as the device number/index. The device number is created by sorting the GPUs
by PCI bus ID and then numbering them starting from the smallest bus ID. See
https://slurm.schedmd.com/gres.html#GPU_Management
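
As an illustration, assume a hypothetical node with four GPUs in which devices 0 and 1 share two
NVLink connections, devices 2 and 3 share two, and the two pairs are not linked to each other. The
sketch also assumes that the device numbers happen to match the /dev/nvidiaX minor numbers, which, as
noted above, is not guaranteed:

       # Position N in each list is the number of connections to device N; -1 marks the device itself
       Name=gpu File=/dev/nvidia0 Links=-1,2,0,0
       Name=gpu File=/dev/nvidia1 Links=2,-1,0,0
       Name=gpu File=/dev/nvidia2 Links=0,0,-1,2
       Name=gpu File=/dev/nvidia3 Links=0,0,2,-1

Each line names a single device file, as required when Links is specified.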

Name Name of the generic resource. Any desired name may be used. The name must match a value in
GresTypes in slurm.conf. Each generic resource has an optional plugin which can provide
resource-specific functionality. Generic resources that currently include an optional plugin are:

gpu Graphics Processing Unit

mps CUDA Multi-Process Service (MPS)

nic Network Interface Card
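
For example, a gres.conf line using Name=gpu relies on a matching GresTypes entry in slurm.conf. A
minimal sketch of the pairing, in which the node names and GPU count are illustrative assumptions and
other node parameters are omitted:

       # slurm.conf
       GresTypes=gpu
       NodeName=tux[0-15] Gres=gpu:4

       # gres.conf
       NodeName=tux[0-15] Name=gpu File=/dev/nvidia[0-3]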

NodeName
An optional NodeName specification can be used to permit one gres.conf file to be used for all
compute nodes in a cluster by specifying the node(s) that each line should apply to. The NodeName
specification can use a Slurm hostlist specification as shown in the example below.

Type An optional arbitrary string identifying the type of device. For example, this might be used to
identify a specific model of GPU, which users can then specify in a job request. If Type is
specified, then Count is limited in size (currently 1024). A restart of the slurmctld and slurmd
daemons is required for changes to the Type option to take effect.

EXAMPLES

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Define GPU devices with MPS support, with AutoDetect sanity checking
       ##################################################################
       AutoDetect=nvml
       Name=gpu Type=gtx560 File=/dev/nvidia0 COREs=0,1
       Name=gpu Type=tesla  File=/dev/nvidia1 COREs=2,3
       Name=mps Count=100   File=/dev/nvidia0 COREs=0,1
       Name=mps Count=100   File=/dev/nvidia1 COREs=2,3

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Overwrite system defaults and explicitly configure three GPUs
       ##################################################################
       Name=gpu Type=tesla File=/dev/nvidia[0-1] COREs=0,1
       # Name=gpu Type=tesla File=/dev/nvidia[2-3] COREs=2,3
       # NOTE: nvidia2 device is out of service
       Name=gpu Type=tesla File=/dev/nvidia3 COREs=2,3

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Use a single gres.conf file for all compute nodes - positive method
       ##################################################################
       ## Explicitly specify devices on nodes tux0-tux15
       # NodeName=tux[0-15]  Name=gpu File=/dev/nvidia[0-3]
       # NOTE: tux3 nvidia1 device is out of service
       NodeName=tux[0-2]  Name=gpu File=/dev/nvidia[0-3]
       NodeName=tux3  Name=gpu File=/dev/nvidia[0,2-3]
       NodeName=tux[4-15]  Name=gpu File=/dev/nvidia[0-3]

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Use NVML to gather GPU configuration information
       # for all nodes except one
       ##################################################################
       AutoDetect=nvml
       NodeName=tux3 AutoDetect=off Name=gpu File=/dev/nvidia[0-3]

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Specify some nodes with NVML, some with RSMI, and some with no AutoDetect
       ##################################################################
       NodeName=tux[0-7] AutoDetect=nvml
       NodeName=tux[8-11] AutoDetect=rsmi
       NodeName=tux[12-15] Name=gpu File=/dev/nvidia[0-3]

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Define 'bandwidth' GRES to use as a way to limit the
       # resource use on these nodes for workflow purposes
       ##################################################################
       NodeName=tux[0-7] Name=bandwidth Type=lustre Count=4G Flags=CountOnly

COPYING

       Copyright (C) 2010 The Regents of the University of California. Produced at Lawrence Livermore National
       Laboratory (cf, DISCLAIMER).
       Copyright (C) 2010-2021 SchedMD LLC.

       This file is part of Slurm, a resource management program. For details, see
       <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it under the terms of the GNU General
       Public License as published by the Free Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the
       implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
       License for more details.
