Provided by: slurm-client_21.08.5-2ubuntu2_amd64

NAME

       cgroup.conf - Slurm configuration file for the cgroup support

DESCRIPTION

       cgroup.conf is an ASCII file which defines parameters used by Slurm's Linux cgroup-related
       plugins. The file is always located in the same directory as the slurm.conf file, so its
       location follows that of slurm.conf, which can be modified at system build time using the
       DEFAULT_SLURM_CONF parameter or at execution time by setting the SLURM_CONF environment
       variable.

       Parameter names are case insensitive.  Any text following a "#" in the configuration  file
       is  treated  as a comment through the end of that line.  Changes to the configuration file
       take effect upon restart of Slurm  daemons,  daemon  receipt  of  the  SIGHUP  signal,  or
       execution of the command "scontrol reconfigure" unless otherwise noted.
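
       The two rules above (case-insensitive parameter names, and "#" comments running to the
       end of the line) can be illustrated with a short Python sketch. It is not Slurm's own
       parser: parse_cgroup_conf is a hypothetical name, and the sketch assumes one Key=Value
       pair per line, lower-cases the keys and leaves the values untouched.

```python
def parse_cgroup_conf(text):
    """Parse cgroup.conf-style text into a dict of lower-cased keys to values."""
    params = {}
    for raw in text.splitlines():
        # Everything from the first "#" to the end of the line is a comment.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected Key=Value, got {raw!r}")
        # Parameter names are case insensitive; values are left untouched.
        params[key.strip().lower()] = value.strip()
    return params


example = """\
###
# Slurm cgroup support configuration file.
###
CgroupAutomount=yes
constrainramspace=yes     # same parameter as ConstrainRAMSpace
AllowedRAMSpace=101.5
"""
print(parse_cgroup_conf(example))
```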

       For    general    Slurm    cgroups    information,    see    the    Cgroups    Guide    at
       <https://slurm.schedmd.com/cgroups.html>.

       The following cgroup.conf parameters are defined to control the general behavior of  Slurm
       cgroup plugins.

       CgroupAutomount=<yes|no>
              Slurm cgroup plugins require a valid and functional cgroup subsystem to be mounted
              under /sys/fs/cgroup/<subsystem_name>. When launched, plugins check the availability
              of their subsystem. If it is not available, the plugin launch fails unless
              CgroupAutomount is set to yes, in which case the plugin will first try to mount the
              required subsystems.

       CgroupMountpoint=PATH
              Specify  the  PATH under which cgroups should be mounted. This should be a writable
              directory which will contain cgroups mounted one per subsystem. The default PATH is
              /sys/fs/cgroup.
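
              The availability check described under CgroupAutomount can be approximated by
              reading /proc/mounts, as in the Python sketch below. This is an illustration, not
              Slurm's code: the function names are hypothetical, and the REQUIRED set (cpuset,
              memory and devices, the cgroup v1 subsystems behind ConstrainCores,
              ConstrainRAMSpace and ConstrainDevices) is an assumption chosen for the example.

```python
# Assumed subsystems of interest: cpuset (ConstrainCores), memory (ConstrainRAMSpace)
# and devices (ConstrainDevices).
REQUIRED = {"cpuset", "memory", "devices"}


def v1_mount_options(mounts_text):
    """Map every option of each cgroup v1 mount (subsystem names included) to its mount point."""
    found = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass.
        if len(fields) < 4 or fields[2] != "cgroup":
            continue
        for option in fields[3].split(","):
            found.setdefault(option, fields[1])
    return found


def missing_subsystems(mounts_text, mountpoint="/sys/fs/cgroup", required=REQUIRED):
    """Return the required subsystems that are not mounted below mountpoint."""
    mounts = v1_mount_options(mounts_text)
    prefix = mountpoint.rstrip("/") + "/"
    return {s for s in required if not mounts.get(s, "").startswith(prefix)}


sample = """\
cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
"""
print(missing_subsystems(sample))  # {'devices'}
```

              On a real node the text would come from open("/proc/mounts").read().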

       CgroupPlugin=<cgroup/v1|autodetect>
              Specify the plugin to be used when interacting with the cgroup subsystem. The only
              supported values at the moment are "cgroup/v1", which supports the legacy interface
              of cgroup v1, and "autodetect", which tries to determine which cgroup version your
              system provides. The latter is useful if nodes have support for different cgroup
              versions. The default value is "autodetect".
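
              One way an autodetect step could work is to look at the filesystem type mounted at
              the cgroup mount point: a cgroup2 filesystem mounted directly on it indicates the
              unified (v2) hierarchy, while cgroup filesystems below it indicate v1. The Python
              sketch below assumes that heuristic and /proc/mounts-style input;
              detect_cgroup_version is a hypothetical name and the logic is not taken from Slurm.
              A hybrid layout (v1 controllers plus a cgroup2 mount in a subdirectory such as
              unified) is reported as v1.

```python
def detect_cgroup_version(mounts_text, mountpoint="/sys/fs/cgroup"):
    """Guess "v2", "v1" or None from /proc/mounts-style text (heuristic sketch)."""
    root = mountpoint.rstrip("/")
    has_v1 = False
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        target, fstype = fields[1], fields[2]
        # Unified hierarchy: cgroup2 mounted directly on the mount point.
        if fstype == "cgroup2" and target.rstrip("/") == root:
            return "v2"
        # Legacy hierarchy: cgroup (v1) mounts, one per subsystem, below it.
        if fstype == "cgroup" and target.startswith(root + "/"):
            has_v1 = True
    return "v1" if has_v1 else None


unified = "cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0\n"
hybrid = (
    "tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0\n"
    "cgroup2 /sys/fs/cgroup/unified cgroup2 rw,nosuid,nodev,noexec,relatime 0 0\n"
    "cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0\n"
)
print(detect_cgroup_version(unified))  # v2
print(detect_cgroup_version(hybrid))   # v1
```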

TASK/CGROUP PLUGIN

       The  following  cgroup.conf  parameters  are  defined  to  control  the  behavior  of this
       particular plugin:

       AllowedKmemSpace=<number>
              Constrain the job cgroup kernel memory to this  amount  of  the  allocated  memory,
              specified in bytes. The AllowedKmemSpace must be between the upper and lower memory
              limits,  specified   by   MaxKmemPercent   and   MinKmemSpace,   respectively.   If
              AllowedKmemSpace  goes  beyond  the  upper or lower limit, it will be reset to that
              upper or lower limit, whichever has been exceeded.

       AllowedRAMSpace=<number>
              Constrain the job/step cgroup RAM to this percentage of the allocated memory. The
              percentage supplied may be expressed as a floating point number, e.g. 101.5. Sets
              the cgroup soft memory limit at the allocated memory size and then sets the job/step
              hard memory limit at (AllowedRAMSpace/100) * allocated memory. If the job/step
              exceeds the hard limit, then it might trigger Out Of Memory (OOM) events (including
              oom-kill) which will be logged to the kernel log ring buffer (dmesg in Linux).
              Setting AllowedRAMSpace above 100 may cause system Out of Memory (OOM) events, as it
              allows a job/step to allocate more memory than is configured on the nodes. In that
              case, reducing the amount of memory configured as available on the nodes is
              suggested, to avoid system OOM events. Setting AllowedRAMSpace below 100 will result
              in jobs receiving less memory than allocated, and the soft memory limit will be set
              to the same value as the hard limit. Also see ConstrainRAMSpace. The default value
              is 100.
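
              The soft and hard limit arithmetic above can be sketched in Python, with memory in
              MB. ram_limits is a hypothetical helper, and the MinRAMSpace and MaxRAMPercent
              bounds described below are deliberately left out.

```python
def ram_limits(allocated_mb, allowed_ram_space=100.0):
    """Return (soft_limit_mb, hard_limit_mb) for a job/step, per AllowedRAMSpace."""
    # Hard limit: (AllowedRAMSpace / 100) * allocated memory.
    hard = allocated_mb * allowed_ram_space / 100.0
    # Soft limit: the allocated memory, or the hard limit when AllowedRAMSpace < 100.
    soft = hard if allowed_ram_space < 100 else float(allocated_mb)
    return soft, hard


print(ram_limits(4096))         # (4096.0, 4096.0)
print(ram_limits(4096, 101.5))  # soft stays at 4096.0, hard rises above it
print(ram_limits(4096, 50))     # (2048.0, 2048.0)
```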

       AllowedSwapSpace=<number>
              Constrain  the  job  cgroup  swap space to this percentage of the allocated memory.
              The  default  value  is  0,  which  means  that  RAM+Swap  will   be   limited   to
              AllowedRAMSpace.  The  supplied  percentage  may  be  expressed as a floating point
              number, e.g. 50.5.  If the limit is exceeded, the job steps will be  killed  and  a
              warning  message  will  be written to standard error.  Also see ConstrainSwapSpace.
              NOTE: Setting AllowedSwapSpace to 0 does not restrict the Linux kernel  from  using
              swap space. To control how the kernel uses swap space, see MemorySwappiness.
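
              Assuming the swap allowance is added on top of the RAM limit, with both expressed
              as percentages of the allocated memory (so that the default AllowedSwapSpace of 0
              gives a RAM+Swap limit equal to the AllowedRAMSpace limit, as stated above), the
              RAM+Swap limit can be sketched as follows. ram_swap_limit and swap_headroom are
              hypothetical names; the MaxSwapPercent and MinRAMSpace bounds are not applied.

```python
def ram_swap_limit(allocated_mb, allowed_ram_space=100.0, allowed_swap_space=0.0):
    """RAM+Swap limit in MB: (AllowedRAMSpace + AllowedSwapSpace)% of allocated memory."""
    return allocated_mb * (allowed_ram_space + allowed_swap_space) / 100.0


def swap_headroom(allocated_mb, allowed_ram_space=100.0, allowed_swap_space=0.0):
    """Swap the job may use once its RAM limit is reached (RAM+Swap minus RAM)."""
    ram = allocated_mb * allowed_ram_space / 100.0
    return ram_swap_limit(allocated_mb, allowed_ram_space, allowed_swap_space) - ram


print(ram_swap_limit(1000))             # 1000.0 (default: same as the RAM limit)
print(ram_swap_limit(1000, 100, 50.5))  # 1505.0
print(swap_headroom(1000, 100, 50.5))   # 505.0
```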

       ConstrainCores=<yes|no>
              If configured to "yes" then constrain allowed cores to the subset of allocated
              resources. This functionality makes use of the cpuset subsystem. Due to a bug in
              HWLOC versions prior to 1.11.5, the task/affinity plugin may be required in addition
              to task/cgroup for this to function properly. The default value is "no".
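
              The cpuset subsystem expresses the allowed CPUs (the cgroup v1 file cpuset.cpus) as
              a comma-separated list of ranges such as "0-3,8". The Python sketch below formats a
              set of allocated CPU IDs in that syntax; to_cpuset_list is a hypothetical helper
              for illustration, not part of Slurm.

```python
def to_cpuset_list(cpu_ids):
    """Format CPU IDs in the kernel's cpuset list syntax, e.g. [0, 1, 2, 3, 8] -> "0-3,8"."""
    ranges = []
    start = prev = None
    for cpu in sorted(set(cpu_ids)):
        if start is None:
            start = prev = cpu
        elif cpu == prev + 1:
            # Still inside a run of consecutive IDs.
            prev = cpu
        else:
            ranges.append((start, prev))
            start = prev = cpu
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


print(to_cpuset_list([8, 0, 1, 2, 3]))  # 0-3,8
```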

       ConstrainDevices=<yes|no>
              If configured to "yes" then constrain the job's allowed devices based on GRES
              allocated resources. This functionality makes use of the devices subsystem. The
              default value is "no".

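              The cgroup v1 devices subsystem takes entries of the form "c MAJOR:MINOR rwm"
              (character device, read/write/mknod access) in its devices.allow and devices.deny
              files. The Python sketch below splits a node's GPU devices into allow and deny
              entries according to which ones were allocated. It assumes GPUs exposed as
              /dev/nvidiaN with character major number 195 and takes minor numbers directly;
              device_rules and NVIDIA_MAJOR are hypothetical names, not Slurm's.

```python
NVIDIA_MAJOR = 195  # assumption: character major number of /dev/nvidiaN on this node


def device_rules(all_minors, allocated_minors, major=NVIDIA_MAJOR):
    """Return (allow, deny) lists of cgroup v1 devices entries for the given minors."""
    allocated = set(allocated_minors)
    allow, deny = [], []
    for minor in sorted(set(all_minors)):
        entry = f"c {major}:{minor} rwm"
        # Allocated devices stay accessible; every other device is denied.
        (allow if minor in allocated else deny).append(entry)
    return allow, deny


print(device_rules(range(4), [1]))
```
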
       ConstrainKmemSpace=<yes|no>
              If configured to "yes" then constrain the job's Kmem RAM usage in addition  to  RAM
              usage.  Only  takes  effect  if  ConstrainRAMSpace is set to "yes". If enabled, the
              job's Kmem limit will be assigned the value of AllowedKmemSpace or the value coming
              from  MaxKmemPercent.   The  default  value  is  "no" which will leave Kmem setting
              untouched by Slurm.  Also see AllowedKmemSpace, MaxKmemPercent.
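
              Putting the AllowedKmemSpace, MaxKmemPercent and MinKmemSpace descriptions
              together, the Kmem limit could be computed as in the Python sketch below. The
              precedence (AllowedKmemSpace when set, otherwise the MaxKmemPercent value) follows
              the sentence above, and MaxKmemPercent is read as a percentage of the job's
              requested memory, as in the second sentence of its description. The exact order
              of operations inside Slurm is an assumption here, as is taking MB as 1024*1024
              bytes; kmem_limit_bytes is a hypothetical name.

```python
MB = 1024 * 1024  # assumption: "MB" taken as 2**20 bytes


def kmem_limit_bytes(job_mem_bytes, allowed_kmem_space=None,
                     max_kmem_percent=100.0, min_kmem_space_mb=30):
    """Kernel memory limit for a job, in bytes (sketch of the documented bounds)."""
    from_percent = job_mem_bytes * max_kmem_percent / 100.0
    # Upper bound: MaxKmemPercent of the job's memory, never above the job's memory.
    upper = min(from_percent, job_mem_bytes)
    # Lower bound: MinKmemSpace.
    lower = min_kmem_space_mb * MB
    # AllowedKmemSpace wins when set; otherwise use the MaxKmemPercent value.
    limit = from_percent if allowed_kmem_space is None else allowed_kmem_space
    # Reset to whichever bound was exceeded (the lower bound wins if the bounds cross).
    return max(lower, min(limit, upper))


job = 4096 * MB
print(kmem_limit_bytes(job) / MB)                                # 4096.0
print(kmem_limit_bytes(job, max_kmem_percent=50) / MB)           # 2048.0
print(kmem_limit_bytes(job, allowed_kmem_space=10 * MB) / MB)    # 30.0
print(kmem_limit_bytes(job, allowed_kmem_space=8192 * MB) / MB)  # 4096.0
```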

       ConstrainRAMSpace=<yes|no>
              If configured to "yes" then constrain the job's RAM usage by setting the memory
              soft limit to the allocated memory and the hard limit to the allocated memory *
              (AllowedRAMSpace/100). The default value is "no", in which case the job's RAM limit
              will be set to its swap space limit if ConstrainSwapSpace is set to "yes". Also see
              AllowedSwapSpace, AllowedRAMSpace and ConstrainSwapSpace.

              NOTE: When using ConstrainRAMSpace, if the combined memory used by all processes in
              a  step  is  greater  than  the  limit,  then the kernel will trigger an OOM event,
              killing one or more of the processes in the step. The step state will be marked  as
              OOM,  but  the  step  itself  will keep running and other processes in the step may
              continue to run as well.  This differs from the behavior of  OverMemoryKill,  where
              the whole step will be killed/cancelled.

              NOTE: When enabled, ConstrainRAMSpace can lead to a noticeable decline in per-node
              job throughput. Sites with high-throughput requirements should carefully weigh the
              tradeoff between per-node throughput and the potential problems that can arise from
              unconstrained memory usage on the node. See
              <https://slurm.schedmd.com/high_throughput.html> for further discussion.

       ConstrainSwapSpace=<yes|no>
              If configured to "yes" then constrain the job's swap space usage. The default value
              is "no". Note that when set to "yes" and ConstrainRAMSpace is set to "no",
              AllowedRAMSpace is automatically set to 100% in order to limit the RAM+Swap amount
              to 100% of the job's requirement plus the percent of allowed swap space. This amount
              is then used for both the RAM and the RAM+Swap limits, which means that in this
              particular case ConstrainRAMSpace is effectively enabled, with the same limit as the
              one used to constrain swap space. Also see AllowedSwapSpace.
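
              The interaction between ConstrainRAMSpace and ConstrainSwapSpace can be summarized
              by the Python sketch below, which returns the RAM and RAM+Swap limits in MB, with
              None meaning the sketch sets no limit. It assumes the RAM+Swap limit is the sum of
              the AllowedRAMSpace and AllowedSwapSpace percentages of the allocated memory, and
              that no RAM+Swap limit is set when only ConstrainRAMSpace is enabled; memory_limits
              is a hypothetical name and bounds such as MinRAMSpace are ignored.

```python
def memory_limits(alloc_mb, constrain_ram, constrain_swap,
                  allowed_ram_space=100.0, allowed_swap_space=0.0):
    """Return (ram_limit_mb, ram_swap_limit_mb); None means not constrained."""
    if not constrain_ram and not constrain_swap:
        return None, None
    if constrain_swap and not constrain_ram:
        # AllowedRAMSpace is treated as 100%, and one value is used for both limits.
        limit = alloc_mb * (100.0 + allowed_swap_space) / 100.0
        return limit, limit
    ram = alloc_mb * allowed_ram_space / 100.0
    if not constrain_swap:
        return ram, None
    return ram, alloc_mb * (allowed_ram_space + allowed_swap_space) / 100.0


print(memory_limits(1000, True, True, 100, 50))  # (1000.0, 1500.0)
print(memory_limits(1000, False, True, 80, 20))  # (1200.0, 1200.0)
```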

       MaxRAMPercent=PERCENT
              Set  an  upper bound in percent of total RAM on the RAM constraint for a job.  This
              will be the memory constraint applied to jobs that  are  not  explicitly  allocated
              memory  by  Slurm  (i.e.  Slurm's  select plugin is not configured to manage memory
              allocations). The PERCENT may be an arbitrary floating point  number.  The  default
              value is 100.

       MaxSwapPercent=PERCENT
              Set  an upper bound (in percent of total RAM) on the amount of RAM+Swap that may be
              used for a job. This will be the swap limit applied to jobs on systems where memory
              is not being explicitly allocated to jobs. The PERCENT may be an arbitrary floating
              point number between 0 and 100.  The default value is 100.

       MaxKmemPercent=PERCENT
              Set an upper bound in percent of total RAM as the  maximum  Kmem  for  a  job.  The
              PERCENT  may  be  an  arbitrary  floating  point  number,  however,  the product of
              MaxKmemPercent and job requested memory has to fall between  MinKmemSpace  and  job
              requested memory, otherwise the boundary value is used. The default value is 100.

       MemorySwappiness=<number>
              Configure  the  kernel's priority for swapping out anonymous pages (such as program
              data) versus file cache pages for the job cgroup. Valid values are between 0 and
              100,  inclusive. A value of 0 prevents the kernel from swapping out program data. A
              value of 100 gives equal priority to swapping out file cache or anonymous pages. If
              not   set,   then   the   kernel's   default   swappiness   value   will  be  used.
              ConstrainSwapSpace must be set to yes in order for this parameter to be applied.

       MinKmemSpace=<number>
              Set a lower bound (in MB) on the memory limits  defined  by  AllowedKmemSpace.  The
              default limit is 30M.

       MinRAMSpace=<number>
              Set  a  lower  bound  (in  MB)  on the memory limits defined by AllowedRAMSpace and
              AllowedSwapSpace. This prevents accidentally creating a memory cgroup with  such  a
              low  limit  that  slurmstepd  is immediately killed due to lack of RAM. The default
              limit is 30M.
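
              The lower bound from MinRAMSpace and the upper bounds from MaxRAMPercent (for the
              RAM limit) or MaxSwapPercent (for the RAM+Swap limit) can be applied to an already
              computed limit as in the Python sketch below, with memory in MB. The order of the
              two checks (upper bound last, so it wins if the bounds cross) is an assumption, and
              clamp_limit and unallocated_ram_limit are hypothetical names; the second function
              shows the MaxRAMPercent limit used for jobs that are not explicitly allocated memory.

```python
def clamp_limit(limit_mb, total_ram_mb, max_percent=100.0, min_space_mb=30):
    """Clamp a RAM or RAM+Swap limit to [MinRAMSpace, max_percent of total RAM]."""
    upper = total_ram_mb * max_percent / 100.0
    return min(max(limit_mb, min_space_mb), upper)


def unallocated_ram_limit(total_ram_mb, max_ram_percent=100.0):
    """RAM limit for a job with no explicit memory allocation (MaxRAMPercent)."""
    return total_ram_mb * max_ram_percent / 100.0


total = 64000  # hypothetical node with 64000 MB of RAM
print(clamp_limit(10, total))                     # 30
print(clamp_limit(70000, total))                  # 64000.0
print(clamp_limit(40000, total, max_percent=50))  # 32000.0
print(unallocated_ram_limit(total, 90))           # 57600.0
```

              For a RAM+Swap limit, pass MaxSwapPercent as max_percent.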

DISTRIBUTION-SPECIFIC NOTES

       Debian and derivatives (e.g. Ubuntu) usually exclude the memory and memsw  (swap)  cgroups
       by  default.  To  include  them,  add the following parameters to the kernel command line:
       cgroup_enable=memory swapaccount=1

       This can usually be placed in /etc/default/grub inside the GRUB_CMDLINE_LINUX variable.  A
       command such as update-grub must be run after updating the file.
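
       The edit to /etc/default/grub can be scripted; the Python sketch below appends the two
       parameters to the GRUB_CMDLINE_LINUX="..." line if they are missing and is safe to run
       twice. It works on the file's text only (reading and writing the file, and running
       update-grub, are left to the administrator), assumes the value is double-quoted, and
       add_kernel_params is a hypothetical name.

```python
import re

PARAMS = ["cgroup_enable=memory", "swapaccount=1"]


def add_kernel_params(grub_text, params=PARAMS):
    """Append missing params to GRUB_CMDLINE_LINUX="..." in /etc/default/grub text."""
    # Matches GRUB_CMDLINE_LINUX only, not GRUB_CMDLINE_LINUX_DEFAULT.
    pattern = re.compile(r'^(GRUB_CMDLINE_LINUX=")([^"]*)(")', re.MULTILINE)

    def update(match):
        current = match.group(2).split()
        current += [p for p in params if p not in current]
        return match.group(1) + " ".join(current) + match.group(3)

    return pattern.sub(update, grub_text, count=1)


before = 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\nGRUB_CMDLINE_LINUX=""\n'
after = add_kernel_params(before)
print(after)
```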

EXAMPLE

       /etc/slurm/cgroup.conf:
              This  example cgroup.conf file shows a configuration that enables the more commonly
              used cgroup enforcement mechanisms.

              ###
              # Slurm cgroup support configuration file.
              ###
              CgroupAutomount=yes
              CgroupMountpoint=/sys/fs/cgroup
              ConstrainCores=yes
              ConstrainDevices=yes
              ConstrainKmemSpace=no        #avoid known Kernel issues
              ConstrainRAMSpace=yes
              ConstrainSwapSpace=yes

       /etc/slurm/slurm.conf:
              These are the entries required in slurm.conf to  activate  the  cgroup  enforcement
              mechanisms.  Make  sure  that the node definitions in your slurm.conf closely match
              the configuration as shown by "slurmd -C".  Either MemSpecLimit should  be  set  or
              RealMemory  should be defined with less than the actual amount of memory for a node
              to ensure that all system/non-job processes will  have  sufficient  memory  at  all
              times. Sites should also configure pam_slurm_adopt to ensure users cannot escape
              the cgroups via ssh.

              ###
              # Slurm configuration entries for cgroups
              ###
              ProctrackType=proctrack/cgroup
              TaskPlugin=task/cgroup,task/affinity
              JobAcctGatherType=jobacct_gather/cgroup #optional for gathering metrics
              PrologFlags=Contain                     #X11 flag is also suggested

COPYING

       Copyright (C) 2010-2012  Lawrence  Livermore  National  Security.   Produced  at  Lawrence
       Livermore National Laboratory (cf, DISCLAIMER).
       Copyright (C) 2010-2021 SchedMD LLC.

       This   file   is  part  of  Slurm,  a  resource  management  program.   For  details,  see
       <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it under the  terms  of  the
       GNU  General Public License as published by the Free Software Foundation; either version 2
       of the License, or (at your option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
       even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
       GNU General Public License for more details.

SEE ALSO

       slurm.conf(5)