Provided by: slurm-client_24.05.4-1_amd64

NAME

       gres.conf - Slurm configuration file for Generic RESource (GRES) management.

DESCRIPTION

       gres.conf is an ASCII file which describes the configuration of Generic RESource(s) (GRES)
       on each compute node.  If the GRES information in  the  slurm.conf  file  does  not  fully
       describe  those  resources, then a gres.conf file should be included on each compute node.
       For cloud nodes, a gres.conf file that includes all the cloud nodes must be on  all  cloud
       nodes  and  the  controller.  The  file  will  always  be located in the same directory as
       slurm.conf.

       If the GRES information in the slurm.conf file fully describes those  resources  (i.e.  no
       "Cores",  "File"  or  "Links"  specification  is  required  for  that  GRES  type  or that
       information is automatically detected), that information may be omitted from the gres.conf
       file  and  only  the  configuration  information in the slurm.conf file will be used.  The
       gres.conf file  may  be  omitted  completely  if  the  configuration  information  in  the
       slurm.conf file fully describes all GRES.

       If  using  the  gres.conf  file  to  describe  the resources available to nodes, the first
       parameter on the line  should  be  NodeName.  If  configuring  Generic  Resources  without
       specifying nodes, the first parameter on the line should be Name.

       Parameter  names are case insensitive.  Any text following a "#" in the configuration file
       is treated as a comment through the end of that line.  Changes to the  configuration  file
       take  effect  upon  restart  of  Slurm  daemons,  daemon  receipt of the SIGHUP signal, or
       execution of the command "scontrol reconfigure" unless otherwise noted.
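
       As an illustrative sketch of the two line forms described above (the node names and device
       paths are hypothetical), a gres.conf line might take either of these shapes:

           # Applies only to the listed nodes: NodeName comes first
           NodeName=tux[0-3] Name=gpu File=/dev/nvidia[0-1]

           # Written without a node specification: Name comes first
           name=gpu file=/dev/nvidia[0-1]   # parameter names are case insensitive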

       NOTE: Slurm support for gres/[mps|shard] requires the use of the select/cons_tres  plugin.
       For       more      information      on      how      to      configure      MPS,      see
       https://slurm.schedmd.com/gres.html#MPS_Management.   For  more  information  on  how   to
       configure Sharding, see https://slurm.schedmd.com/gres.html#Sharding.

       For      more      information      on     GRES     scheduling     in     general,     see
       https://slurm.schedmd.com/gres.html.

       The overall configuration parameters available include:

       AutoDetect
              The hardware detection mechanisms  to  enable  for  automatic  GRES  configuration.
              Currently, the options are:

              nrt    Automatically detect AWS Trainium/Inferentia devices.

              nvml   Automatically  detect  NVIDIA  GPUs.  Requires the NVIDIA Management Library
                     (NVML).

              off    Do not automatically detect any GPUs. Used to override other options.

              oneapi Automatically detect Intel GPUs. Requires the Intel Graphics Compute Runtime
                     for oneAPI Level Zero and OpenCL Driver (oneapi).

              rsmi   Automatically detect AMD GPUs. Requires the ROCm System Management Interface
                     (ROCm SMI) Library.

              AutoDetect can be on a line by itself, in which case it will globally apply to  all
              lines  in  gres.conf  by  default.  In  addition,  AutoDetect  can be combined with
              NodeName to only apply to certain nodes. Node-specific AutoDetects will  trump  the
              global  AutoDetect.  A node-specific AutoDetect only needs to be specified once per
              node. If specified multiple times for the same nodes, they must  all  be  the  same
              value.  To  unset AutoDetect for a node when a global AutoDetect is set, simply set
              it to "off" in a  node-specific  GRES  line.   E.g.:  NodeName=tux3  AutoDetect=off
              Name=gpu File=/dev/nvidia[0-3].  AutoDetect cannot be used with cloud nodes.

              AutoDetect  will  automatically detect files, cores, links, and any other hardware.
              If a parameter such as File, Cores, or Links is specified when AutoDetect is used,
              then the specified values are used to sanity check the auto-detected values.  If
              there is a mismatch, then the node's state is  set  to  invalid  and  the  node  is
              drained.
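
              For instance, the following sketch (hypothetical node names, device paths, and core
              ranges) enables NVML detection on four nodes and also states the expected layout, so
              that a mismatch with the detected hardware would drain the node:

                  NodeName=tux[0-3] AutoDetect=nvml
                  NodeName=tux[0-3] Name=gpu File=/dev/nvidia0 Cores=0-7
                  NodeName=tux[0-3] Name=gpu File=/dev/nvidia1 Cores=8-15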

       Count  Number of resources of this name/type available on this node.  The default value is
              set to the number of File values specified (if any), otherwise the default value is
              one.  A  suffix  of "K", "M", "G", "T" or "P" may be used to multiply the number by
              1024, 1048576, 1073741824, etc. respectively.  For example: "Count=10G".
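
              As a worked example of the suffix arithmetic, the number is multiplied by the
              factor that corresponds to the suffix:

                  Count=10K  ->  10 * 1024        = 10240
                  Count=10M  ->  10 * 1048576     = 10485760
                  Count=10G  ->  10 * 1073741824  = 10737418240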

       Cores  Optionally specify the core index numbers for the specific cores which can use this
              resource.   For  example,  it may be strongly preferable to use specific cores with
              specific GRES devices (e.g. on a NUMA architecture).  While Slurm can track and
              assign resources at the CPU or thread level, the scheduling algorithms it uses to
              co-allocate GRES devices with CPUs operate at the socket level (or NUMA level with
              numa_node_as_socket) for job allocations.  Therefore it is not possible to
              preferentially assign GRES to different specific CPUs on the same socket (or NUMA
              node with numa_node_as_socket), and this option should generally be used to identify
              all cores on some socket.  However, job step allocations that request a portion of
              the job's resources with --exact, and task binding through --gpu-d, both look at
              cores directly, so more specific core identification may be useful for them.

              Multiple cores may be specified using a comma-delimited list  or  a  range  may  be
              specified  using  a  "-"  separator  (e.g. "0,1,2,3" or "0-3").  If a job specifies
              --gres-flags=enforce-binding, then only the identified cores can be allocated  with
              each generic resource. This will tend to improve performance of jobs, but delay the
              allocation of resources to them.  If Cores is specified and a job is not submitted
              with the --gres-flags=enforce-binding option, the identified cores will be preferred
              for scheduling with each generic resource.

              If --gres-flags=disable-binding is specified, then any core can be  used  with  the
              resources,  which  also increases the speed of Slurm's scheduling algorithm but can
              degrade the application performance.  The  --gres-flags=disable-binding  option  is
              currently  required  to  use  more  CPUs than are bound to a GRES (e.g. if a GPU is
              bound to the CPUs on one socket, but resources on more than one socket are required
              to run the job).  If any core can be used effectively with the resources, do not
              specify the Cores option; omitting it speeds up the Slurm scheduling logic.  A
              restart of the slurmctld is needed for changes to the Cores option to take effect.
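
              As a sketch, assuming a hypothetical node with two sockets of eight cores each and
              one GPU attached to each socket, the Cores option would list every core on the
              corresponding socket, using logical core indexes:

                  Name=gpu File=/dev/nvidia0 Cores=0-7
                  Name=gpu File=/dev/nvidia1 Cores=8-15

              With this layout, a job submitted with --gres-flags=enforce-binding and given the
              first GPU could only be allocated cores 0-7; without that flag, those cores would
              merely be preferred.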

              NOTE:  Since  Slurm  must  be  able to perform resource management on heterogeneous
              clusters having various processing unit numbering schemes,  a  logical  core  index
              must  be  specified  instead  of  the physical core index.  That logical core index
              might not correspond to your physical core index number.  Core 0 will be the  first
              core on the first socket, while core 1 will be the second core on the first socket.
              This numbering coincides with the logical core number (Core L#) seen in "lstopo -l"
              command output.

       File   Fully  qualified pathname of the device files associated with a resource.  The name
              can  include  a  numeric  range  suffix  to   be   interpreted   by   Slurm   (e.g.
              File=/dev/nvidia[0-3]).

              This  field is generally required if enforcement of generic resource allocations is
              to be supported (i.e. prevents users from making use of resources  allocated  to  a
              different  user).   Enforcement  of  the  file allocation relies upon Linux Control
              Groups (cgroups) and Slurm's task/cgroup plugin, which  will  place  the  allocated
              files  into  the  job's  cgroup and prevent use of other files.  Please see Slurm's
              Cgroups Guide for more information: https://slurm.schedmd.com/cgroups.html.

              If File is specified then Count must be either set to  the  number  of  file  names
              specified  or  not  set  (the default value is the number of files specified).  The
              exception to this is MPS/Sharding.  For either of these GRES, each GPU would be
              identified by device file using the File parameter and Count would specify the
              number of entries that would correspond to that GPU.  For MPS, Count is typically
              100 or some multiple of 100.  For Sharding, it is typically the maximum number of
              jobs that could simultaneously share that GPU.
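
              A minimal sketch of the Sharding case (the device path and the Count value are
              illustrative assumptions): the GPU is identified by its device file, and the shard
              line states how many entries correspond to that GPU.

                  Name=gpu File=/dev/nvidia0
                  Name=shard Count=8 File=/dev/nvidia0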

              If using a card with Multi-Instance GPU functionality, use  MultipleFiles  instead.
              File and MultipleFiles are mutually exclusive.

              NOTE: File is required for all gpu typed GRES.

              NOTE:  If  you  specify  the File parameter for a resource on some node, the option
              must be specified on all nodes and Slurm will track the assignment of each specific
              resource  on  each  node.  Otherwise  Slurm  will  only  track a count of allocated
              resources rather than the state of each individual device file.

              NOTE: Drain a node before changing the count of records with File parameters  (e.g.
              if  you  want to add or remove GPUs from a node's configuration).  Failure to do so
              will result in any job using those GRES being aborted.

              NOTE: When specifying File, Count is limited in  size  (currently  1024)  for  each
              node.

       Flags  Optional flags that can be specified to change configured behavior of the GRES.

              Allowed values at present are:

              CountOnly           Do not attempt to load a plugin of the GRES type, as this GRES
                                  will only be used to track counts of GRES used.  This avoids
                                  an attempt to load a non-existent plugin, which can affect
                                  filesystems with high-latency metadata operations for
                                  non-existent files.

                                  NOTE:  If  a gres has this flag configured it is global, so all
                                  other nodes with that gres will have this flag implied.

              explicit            If the flag is set, GRES is not allocated to the job as part of
                                  whole  node  allocation (--exclusive or OverSubscribe=EXCLUSIVE
                                  set on partition) unless it was  explicitly  requested  by  the
                                  job.

                                  NOTE:  If  a gres has this flag configured it is global, so all
                                  other nodes with that gres will have this flag implied.

              one_sharing         To be used on a shared gres.  If using a shared gres (mps) on
                                  top of a sharing gres (gpu), only allow one of the sharing gres
                                  to be used by the shared gres.  This is the default for MPS.

                                  NOTE: If a gres has this flag configured it is global,  so  all
                                  other  nodes  with  that gres will have this flag implied. This
                                  flag is not compatible with all_sharing for a specific gres.

              all_sharing         To  be  used  on  a  shared  gres.  This  is  the  opposite  of
                                  one_sharing  and can be used to allow all sharing gres (gpu) on
                                  a node to be used for shared gres (mps).

                                  NOTE: If a gres has this flag configured it is global,  so  all
                                  other  nodes  with  that gres will have this flag implied. This
                                  flag is not compatible with one_sharing for a specific gres.

              nvidia_gpu_env      Set environment variable CUDA_VISIBLE_DEVICES for all  GPUs  on
                                  the specified node(s).

              amd_gpu_env         Set  environment  variable ROCR_VISIBLE_DEVICES for all GPUs on
                                  the specified node(s).

              intel_gpu_env       Set environment variable ZE_AFFINITY_MASK for all GPUs  on  the
                                  specified node(s).

              opencl_env          Set environment variable GPU_DEVICE_ORDINAL for all GPUs on the
                                  specified node(s).

              no_gpu_env          Set no GPU-specific environment variables.  This is mutually
                                  exclusive with all other environment-related flags.

              If  no  environment-related  flags are specified, then nvidia_gpu_env, amd_gpu_env,
              intel_gpu_env, and opencl_env will be implicitly set by default.  If AutoDetect  is
              used and environment-related flags are not specified, then AutoDetect=nvml will set
              nvidia_gpu_env, AutoDetect=rsmi will set amd_gpu_env,  and  AutoDetect=oneapi  will
              set  intel_gpu_env.   Conversely,  specified  environment-related flags will always
              override AutoDetect.

              Environment-related flags set on one GRES line will be inherited by the  GRES  line
              directly below it if no environment-related flags are specified on that line and if
              it is of the same node, name, and type. Environment-related flags must be the  same
              for GRES of the same node, name, and type.

              Note   that   there   is   a   known   issue   with  the  AMD  ROCm  runtime  where
              ROCR_VISIBLE_DEVICES  is  processed  first,  and   then   CUDA_VISIBLE_DEVICES   is
              processed.  To  avoid the issues caused by this, set Flags=amd_gpu_env for AMD GPUs
              so only ROCR_VISIBLE_DEVICES is set.
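
              A sketch of the inheritance rule above, with a hypothetical node, Type value, and
              device paths: the second line specifies no environment-related flags, so it
              inherits amd_gpu_env from the line directly above it, because node, name, and type
              all match.

                  NodeName=tux8 Name=gpu Type=mi100 File=/dev/dri/renderD128 Flags=amd_gpu_env
                  NodeName=tux8 Name=gpu Type=mi100 File=/dev/dri/renderD129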

       Links  A comma-delimited list of numbers identifying the  number  of  connections  between
              this  device  and  other devices to allow coscheduling of better connected devices.
              This is an ordered list in which the number of connections this specific device has
              to device number 0 would be in the first position, the number of connections it has
              to device number 1 in the second position, etc.  A -1 indicates the  device  itself
              and  a  0 indicates no connection.  If specified, then this line can only contain a
              single GRES device (i.e. can only contain a single file via File).

              This is an optional value and is usually automatically determined if AutoDetect  is
              enabled.   A typical use case would be to identify GPUs having NVLink connectivity.
              Note that for GPUs, the minor number assigned by the OS and used in the device file
              (i.e.   the  X  in  /dev/nvidiaX)  is  not  necessarily  the  same  as  the  device
              number/index. The device number is created by sorting the GPUs by PCI  bus  ID  and
              then numbering them starting from the smallest bus ID.  See
              https://slurm.schedmd.com/gres.html#GPU_Management.
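
              As a sketch, assume a hypothetical node with two GPUs joined by two connections,
              where the device file minor numbers happen to match the device numbers.  Each line
              describes a single device, with -1 in that device's own position:

                  Name=gpu File=/dev/nvidia0 Links=-1,2
                  Name=gpu File=/dev/nvidia1 Links=2,-1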

       MultipleFiles
              Fully qualified pathname of the device files associated with a resource.  Graphics
              cards using Multi-Instance GPU (MIG) technology will present multiple device files
              that should be managed as a single generic resource.  The file names can be given
              as a comma-separated list or can include a numeric range suffix (e.g.
              MultipleFiles=/dev/nvidia[0-3]).

              Drain a node before changing the count of records with the MultipleFiles parameter,
              such as when adding or removing GPUs from a node's configuration.  Failure to do so
              will result in any job using those GRES being aborted.

              When not using GPUs with MIG functionality, use File  instead.   MultipleFiles  and
              File are mutually exclusive.
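
              A sketch for a single MIG instance managed as one generic resource.  The
              capability device file names below are hypothetical; the actual files depend on
              the system and its MIG layout.

                  Name=gpu MultipleFiles=/dev/nvidia0,/dev/nvidia-caps/nvidia-cap21,/dev/nvidia-caps/nvidia-cap22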

       Name   Name  of the generic resource. Any desired name may be used.  The name must match a
              value in GresTypes in slurm.conf.  Each generic resource  has  an  optional  plugin
              which   can   provide  resource-specific  functionality.   Generic  resources  that
              currently include an optional plugin are:

              gpu    Graphics Processing Unit

              mps    CUDA Multi-Process Service (MPS)

              nic    Network Interface Card

              shard  Shards of a gpu
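
              For example, assuming a hypothetical cluster that uses the gpu and mps plugins,
              every Name used in gres.conf would also be listed in GresTypes in slurm.conf:

                  # slurm.conf
                  GresTypes=gpu,mps

                  # gres.conf
                  Name=gpu File=/dev/nvidia0
                  Name=mps Count=100 File=/dev/nvidia0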

       NodeName
              An optional NodeName specification can be used to permit one gres.conf file  to  be
              used  for  all  compute nodes in a cluster by specifying the node(s) that each line
              should apply to.  The NodeName specification can use a Slurm hostlist specification
              as shown in the example below.

       Type   An  optional  arbitrary  string  identifying  the  type  of  generic resource.  For
              example, this might be used to identify a specific model of GPU,  which  users  can
              then specify in a job request.  For changes to the Type option to take effect with
              "scontrol reconfigure", all affected slurmd daemons must be responding to the
              slurmctld.  Otherwise a restart of the slurmctld and slurmd daemons is required.

              NOTE:  If  using  autodetect  functionality and defining the Type in your gres.conf
              file, the Type specified should match or be  a  substring  of  the  value  that  is
              detected, using an underscore in lieu of any spaces.
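
              As a sketch, suppose AutoDetect reports a GPU whose name, with underscores in place
              of spaces, is tesla_v100-pcie-16gb (a hypothetical value).  A shorter Type that is
              a substring of that name would satisfy the rule above:

                  AutoDetect=nvml
                  Name=gpu Type=tesla_v100 File=/dev/nvidia0

              Users could then request that model in a job with, for example,
              --gres=gpu:tesla_v100:1.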

EXAMPLES

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Define GPU devices with MPS support, with AutoDetect sanity checking
       ##################################################################
       AutoDetect=nvml
       Name=gpu Type=gtx560 File=/dev/nvidia0 COREs=0,1
       Name=gpu Type=tesla  File=/dev/nvidia1 COREs=2,3
       Name=mps Count=100 File=/dev/nvidia0 COREs=0,1
       Name=mps Count=100  File=/dev/nvidia1 COREs=2,3

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Overwrite system defaults and explicitly configure three GPUs
       ##################################################################
       Name=gpu Type=tesla File=/dev/nvidia[0-1] COREs=0,1
       # Name=gpu Type=tesla  File=/dev/nvidia[2-3] COREs=2,3
       # NOTE: nvidia2 device is out of service
       Name=gpu Type=tesla  File=/dev/nvidia3 COREs=2,3

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Use a single gres.conf file for all compute nodes - positive method
       ##################################################################
       ## Explicitly specify devices on nodes tux0-tux15
       # NodeName=tux[0-15]  Name=gpu File=/dev/nvidia[0-3]
       # NOTE: tux3 nvidia1 device is out of service
       NodeName=tux[0-2]  Name=gpu File=/dev/nvidia[0-3]
       NodeName=tux3  Name=gpu File=/dev/nvidia[0,2-3]
       NodeName=tux[4-15]  Name=gpu File=/dev/nvidia[0-3]

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Use NVML to gather GPU configuration information
       # for all nodes except one
       ##################################################################
       AutoDetect=nvml
       NodeName=tux3 AutoDetect=off Name=gpu File=/dev/nvidia[0-3]

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Specify some nodes with NVML, some with RSMI, and some with no AutoDetect
       ##################################################################
       NodeName=tux[0-7] AutoDetect=nvml
       NodeName=tux[8-11] AutoDetect=rsmi
       NodeName=tux[12-15] Name=gpu File=/dev/nvidia[0-3]

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Define 'bandwidth' GRES to use as a way to limit the
       # resource use on these nodes for workflow purposes
       ##################################################################
       NodeName=tux[0-7] Name=bandwidth Type=lustre Count=4G Flags=CountOnly

COPYING

       Copyright  (C)  2010  The  Regents  of the University of California.  Produced at Lawrence
       Livermore National Laboratory (cf, DISCLAIMER).
       Copyright (C) 2010-2022 SchedMD LLC.

       This  file  is  part  of  Slurm,  a  resource  management  program.   For   details,   see
       <https://slurm.schedmd.com/>.

       Slurm  is  free  software; you can redistribute it and/or modify it under the terms of the
       GNU General Public License as published by the Free Software Foundation; either version  2
       of the License, or (at your option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
       even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See  the
       GNU General Public License for more details.

SEE ALSO

       slurm.conf(5)