Provided by: slurm-client_24.05.2-1_amd64

NAME

       cgroup.conf - Slurm configuration file for the cgroup support

DESCRIPTION

       cgroup.conf is an ASCII file which defines parameters used by Slurm's Linux cgroup related
       plugins.  The file will always be located in the same directory as the slurm.conf.

       Parameter names are case insensitive.  Any text following a "#" in the configuration  file
       is  treated  as a comment through the end of that line.  Changes to the configuration file
       take effect upon restart of Slurm  daemons,  daemon  receipt  of  the  SIGHUP  signal,  or
       execution of the command "scontrol reconfigure" unless otherwise noted.

       For general Slurm cgroups information, see the Cgroups Guide at
       <https://slurm.schedmd.com/cgroups.html>.

       The following cgroup.conf parameters are defined to control the general behavior of  Slurm
       cgroup plugins.

       CgroupMountpoint=PATH
              Only  intended  for  development and testing. Specifies the PATH under which cgroup
              controllers should be mounted. The default PATH is /sys/fs/cgroup.

       CgroupPlugin=<cgroup/v1|cgroup/v2|autodetect|disabled>
              Specify the  plugin  to  be  used  when  interacting  with  the  cgroup  subsystem.
              Supported values are "disabled", which disables all interaction with cgroups;
              "cgroup/v1", which supports the legacy interface of cgroup v1; "cgroup/v2", for
              the unified cgroup2 architecture; and "autodetect", which tries to determine which
              cgroup version the system provides. "autodetect" makes it possible to use a single
              configuration in clusters where nodes can have different cgroup versions.
              The default value is "autodetect".
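
              As a minimal sketch, a cluster whose nodes may run different cgroup versions could
              state the default explicitly, so that the intent is visible in the file:

                     # Let each slurmd determine the cgroup version of its own node
                     CgroupPlugin=autodetect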

       SystemdTimeout=<number>
              On slow systems like virtual machines or when systemd is busy, it can take a lot of
              time  to  initialize  and  prepare the scope for slurmd during startup.  Slurm will
              wait a maximum of this amount of time (in milliseconds) for the scope to  be  ready
              before failing. Only applies to cgroup/v2.  The default is 1000 ms.
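
              As a sketch only, a site running slurmd on slow virtual machines might raise the
              timeout. The value 5000 is an arbitrary illustration, not a recommendation:

                     CgroupPlugin=cgroup/v2
                     # Wait up to 5000 ms (5 seconds) for the slurmd scope to be ready
                     SystemdTimeout=5000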

       IgnoreSystemd=<yes|no>
              Only for cgroup/v2, and only for development and testing. Slurm avoids all calls
              to dbus and any contact with systemd, and instead prepares the entire cgroup
              hierarchy manually. This option is dangerous on systems running systemd, since
              systemd can modify the cgroup and cause problems for jobs.

       IgnoreSystemdOnFailure=<yes|no>
              Only for cgroup/v2, and only for development and testing. It behaves like
              IgnoreSystemd, but only when a dbus call does not succeed.

       EnableControllers=<yes|no>
              Only for cgroup/v2. Generally intended for development and testing, or for running
              on old kernels and/or old systemd versions (e.g. RHEL8, systemd < 244) where not
              all the controllers are enabled in the cgroup tree. With this parameter, slurmd
              reads the available controllers from the root cgroup.controllers file (under
              CgroupMountpoint, by default /sys/fs/cgroup/cgroup.controllers) and makes them
              available at every level of the cgroup tree down to Slurm's leaf cgroup.
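
              A minimal sketch for a hypothetical node with an old systemd version, where the
              controllers are not enabled throughout the cgroup tree:

                     CgroupPlugin=cgroup/v2
                     # Propagate the controllers listed in the root cgroup.controllers file
                     EnableControllers=yes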

TASK/CGROUP PLUGIN

       The following  cgroup.conf  parameters  are  defined  to  control  the  behavior  of  this
       particular plugin:

       AllowedRAMSpace=<number>
              Constrain the job/step cgroup RAM to this percentage of the allocated memory. The
              percentage may be expressed as a floating point number, e.g. 101.5. Slurm sets the
              cgroup soft memory limit to the allocated memory size and the job/step hard memory
              limit to (AllowedRAMSpace/100) * allocated memory. If the job/step exceeds the
              hard limit, it might trigger Out Of Memory (OOM) events (including oom-kill),
              which are logged to the kernel log ring buffer (dmesg in Linux). Setting
              AllowedRAMSpace above 100 may cause system OOM events, as it allows a job/step to
              allocate more memory than is configured on the node; reducing the configured
              available memory of the node is suggested to avoid them. Setting AllowedRAMSpace
              below 100 results in jobs receiving less memory than allocated, and the soft
              memory limit is set to the same value as the hard limit. Also see
              ConstrainRAMSpace. The default value is 100.
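
              As an illustration of the arithmetic, assume a hypothetical job that is allocated
              4000 MB on a node with the following entries (the values are arbitrary):

                     ConstrainRAMSpace=yes
                     # soft limit = 4000 MB (the allocated memory)
                     # hard limit = (110/100) * 4000 MB = 4400 MB
                     AllowedRAMSpace=110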

       AllowedSwapSpace=<number>
              Constrain the job cgroup swap space to this percentage of the allocated memory. The
              default  value  is 0, which means that RAM+Swap will be limited to AllowedRAMSpace.
              The supplied percentage may be expressed as a floating point number, e.g. 50.5.  If
              the  limit  is exceeded, the job steps will be killed and a warning message will be
              written  to  standard  error.   Also   see   ConstrainSwapSpace.    NOTE:   Setting
              AllowedSwapSpace  to 0 does not restrict the Linux kernel from using swap space. To
              control how the kernel uses swap space, see MemorySwappiness.
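
              As a sketch of the arithmetic, assume a hypothetical job that is allocated 4000 MB,
              and assume that the RAM+Swap limit is the RAM limit plus AllowedSwapSpace percent
              of the allocated memory, which is how the descriptions of AllowedSwapSpace and
              ConstrainSwapSpace read:

                     ConstrainRAMSpace=yes
                     ConstrainSwapSpace=yes
                     # RAM limit      = (100/100) * 4000 MB           = 4000 MB
                     # RAM+Swap limit = 4000 MB + (25/100) * 4000 MB  = 5000 MB
                     AllowedRAMSpace=100
                     AllowedSwapSpace=25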

       ConstrainCores=<yes|no>
              If configured to "yes" then constrain allowed cores  to  the  subset  of  allocated
              resources.  This  functionality  makes  use  of the cpuset subsystem.  Due to a bug
              fixed in version 1.11.5 of HWLOC, the  task/affinity  plugin  may  be  required  in
              addition to task/cgroup for this to function properly.  The default value is "no".

       ConstrainDevices=<yes|no>
              If configured to "yes" then constrain the job's allowed devices based on GRES
              allocated resources. This functionality makes use of the devices subsystem. The
              default value is "no".

       ConstrainRAMSpace=<yes|no>
              If configured to "yes" then constrain the job's RAM usage by setting the memory
              soft limit to the allocated memory and the hard limit to (AllowedRAMSpace/100) *
              allocated memory. The default value is "no", in which case the job's RAM limit
              will be set to its swap space limit if ConstrainSwapSpace is set to "yes". A
              CR_*_Memory option must be set in slurm.conf for this parameter to take any
              effect. Also see AllowedSwapSpace, AllowedRAMSpace and ConstrainSwapSpace.

              NOTE: When using ConstrainRAMSpace, if the combined memory used by all processes in
              a  step  is  greater  than  the  limit,  then the kernel will trigger an OOM event,
              killing one or more of the processes in the step. The step state will be marked  as
              OOM,  but  the  step  itself  will keep running and other processes in the step may
              continue to run as well.  This differs from the behavior of  OverMemoryKill,  where
              the whole step will be killed/cancelled.

              NOTE: When enabled, ConstrainRAMSpace can lead to a noticeable decline in per-node
              job throughput. Sites with high-throughput requirements should carefully weigh the
              tradeoff between per-node throughput and the potential problems that can arise
              from unconstrained memory usage on the node. See
              <https://slurm.schedmd.com/high_throughput.html> for further discussion.

       ConstrainSwapSpace=<yes|no>
              If configured to "yes" then constrain the job's swap space usage. The default
              value is "no". Note that when this is set to "yes" and ConstrainRAMSpace is set to
              "no", AllowedRAMSpace is automatically set to 100% in order to limit the RAM+Swap
              amount to 100% of the job's requirement plus the percentage of allowed swap space.
              This amount is then applied as both the RAM limit and the RAM+Swap limit, so in
              that particular case ConstrainRAMSpace is effectively enabled with the same limit
              as the one used to constrain swap space. A CR_*_Memory option must be set in
              slurm.conf for this parameter to take any effect. Also see AllowedSwapSpace.
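
              Both ConstrainRAMSpace and ConstrainSwapSpace depend on memory being tracked as a
              consumable resource. A minimal slurm.conf sketch, in which CR_Core_Memory is only
              one of several CR_*_Memory values (see slurm.conf(5) for the others):

                     # slurm.conf: allocate cores and memory as consumable resources
                     SelectType=select/cons_tres
                     SelectTypeParameters=CR_Core_Memory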

       MaxRAMPercent=PERCENT
              Set an upper bound in percent of total RAM (configured RealMemory of the  node)  on
              the  RAM  constraint  for a job. This will be the memory constraint applied to jobs
              that are not explicitly allocated memory by Slurm (i.e. Slurm's  select  plugin  is
              not  configured  to  manage  memory  allocations).  The PERCENT may be an arbitrary
              floating point number. The default value is 100.

       MaxSwapPercent=PERCENT
              Set an upper bound (in percent of total RAM, configured RealMemory of the node) on
              the amount of RAM+Swap that may be used for a job. This will be the swap limit
              applied to jobs on systems where memory is not being explicitly allocated to jobs.
              The PERCENT may be an arbitrary floating point number between 0 and 100. The
              default value is 100.
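
              As an illustration of the arithmetic for MaxRAMPercent and MaxSwapPercent, assume a
              hypothetical node configured with RealMemory=64000 (MB) in slurm.conf and the
              following cgroup.conf entries (the values are arbitrary):

                     MaxRAMPercent=90     # RAM bound:      (90/100) * 64000 MB = 57600 MB
                     MaxSwapPercent=95    # RAM+Swap bound: (95/100) * 64000 MB = 60800 MB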

       MemorySwappiness=<number>
              Only for cgroup/v1.  Configure the kernel's priority  for  swapping  out  anonymous
              pages  (such  as  program  data)  versus file cache pages for the job cgroup. Valid
              values are between 0 and 100, inclusive. A value of  0  prevents  the  kernel  from
              swapping out program data. A value of 100 gives equal priority to swapping out file
              cache or anonymous pages. If not set, then the kernel's  default  swappiness  value
              will  be used. ConstrainSwapSpace must be set to yes in order for this parameter to
              be applied.
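
              A minimal sketch of a configuration that keeps the kernel from swapping out the
              program data of jobs, using the value 0 described above:

                     CgroupPlugin=cgroup/v1
                     # MemorySwappiness is only applied when ConstrainSwapSpace=yes
                     ConstrainSwapSpace=yes
                     MemorySwappiness=0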

       MinRAMSpace=<number>
              Set a lower bound (in MB) on the  memory  limits  defined  by  AllowedRAMSpace  and
              AllowedSwapSpace.  This  prevents accidentally creating a memory cgroup with such a
              low limit that slurmstepd is immediately killed due to lack of RAM. The default
              limit is 30 MB.

PROCTRACK/CGROUP PLUGIN

       The  following  cgroup.conf  parameters  are  defined  to  control  the  behavior  of this
       particular plugin:

       SignalChildrenProcesses=<yes|no>
              If configured to "yes", then send signals (for cancelling, suspending, resuming,
              etc.) to all child processes in a job/step. Otherwise, only send signals to the
              parent process of a job/step. The default setting is "no".

DISTRIBUTION-SPECIFIC NOTES

       Debian and derivatives (e.g. Ubuntu) usually exclude the memory and memsw  (swap)  cgroups
       by  default.  To  include  them,  add the following parameters to the kernel command line:
       cgroup_enable=memory swapaccount=1

       This can usually be placed in /etc/default/grub inside the GRUB_CMDLINE_LINUX variable.  A
       command such as update-grub must be run after updating the file.
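
       As a minimal sketch, assuming no other options are already present in the variable (any
       existing options should be kept), the entry in /etc/default/grub could look like this:

              GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"

       After running update-grub and rebooting, the active kernel command line can be checked in
       /proc/cmdline.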

EXAMPLE

       /etc/slurm/cgroup.conf:
              This  example cgroup.conf file shows a configuration that enables the more commonly
              used cgroup enforcement mechanisms.

              ###
              # Slurm cgroup support configuration file.
              ###
              ConstrainCores=yes
              ConstrainDevices=yes
              ConstrainRAMSpace=yes
              ConstrainSwapSpace=yes

       /etc/slurm/slurm.conf:
              These are the entries required in slurm.conf to  activate  the  cgroup  enforcement
              mechanisms.  Make  sure  that the node definitions in your slurm.conf closely match
              the configuration as shown by "slurmd -C".  Either MemSpecLimit should  be  set  or
              RealMemory  should be defined with less than the actual amount of memory for a node
              to ensure that all system/non-job processes will have sufficient memory at all
              times. Sites should also configure pam_slurm_adopt to ensure users cannot escape
              the cgroups via ssh.

              ###
              # Slurm configuration entries for cgroups
              ###
              ProctrackType=proctrack/cgroup
              TaskPlugin=task/cgroup,task/affinity
              JobAcctGatherType=jobacct_gather/cgroup #optional for gathering metrics
              PrologFlags=Contain                     #X11 flag is also suggested

COPYING

       Copyright (C) 2010-2012  Lawrence  Livermore  National  Security.   Produced  at  Lawrence
       Livermore National Laboratory (cf, DISCLAIMER).
       Copyright (C) 2010-2022 SchedMD LLC.

       This   file   is  part  of  Slurm,  a  resource  management  program.   For  details,  see
       <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it under the  terms  of  the
       GNU  General Public License as published by the Free Software Foundation; either version 2
       of the License, or (at your option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
       even  the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
       GNU General Public License for more details.

SEE ALSO

       slurm.conf(5)