Provided by: slurm-client_21.08.5-2ubuntu1_amd64

NAME

       acct_gather.conf - Slurm configuration file for the acct_gather plugins

DESCRIPTION

       acct_gather.conf  is  a  UTF8  formatted  file  which  defines  parameters used by Slurm's
       acct_gather related plugins.  The file location can be modified at system build time using
       the   DEFAULT_SLURM_CONF  parameter  or  at  execution  time  by  setting  the  SLURM_CONF
       environment variable. The file will always  be  located  in  the  same  directory  as  the
       slurm.conf file.
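       The lookup rule above can be sketched in a few lines of Python. This is an illustration
       only, not Slurm's own code: the function name is hypothetical, and the fallback path
       /etc/slurm/slurm.conf is an assumption standing in for whatever DEFAULT_SLURM_CONF was
       set to at build time.

```python
import os
from pathlib import Path

# Assumption: a commonly packaged location. The real default is whatever
# DEFAULT_SLURM_CONF was set to when Slurm was built.
ASSUMED_DEFAULT_SLURM_CONF = "/etc/slurm/slurm.conf"


def acct_gather_conf_path(environ=None, default_slurm_conf=ASSUMED_DEFAULT_SLURM_CONF):
    """Return the expected acct_gather.conf path, next to slurm.conf."""
    env = os.environ if environ is None else environ
    # SLURM_CONF, when set, takes precedence over the build-time default.
    slurm_conf = Path(env.get("SLURM_CONF") or default_slurm_conf)
    # acct_gather.conf always lives in the same directory as slurm.conf.
    return slurm_conf.parent / "acct_gather.conf"
```

       Calling acct_gather_conf_path() with no arguments reads the current process
       environment.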

       Parameter  names  are  case insensitive but parameter values are case sensitive.  Any text
       following a "#" in the configuration file is treated as a comment through the end of  that
       line.  The size of each line in the file is limited to 1024 characters.
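       A minimal Python sketch of these parsing rules follows. It is a simplification, not
       Slurm's parser: it assumes one Name=Value pair per line, does not handle an escaped
       "#", and assumes the 1024-character limit excludes the newline. The function name is
       hypothetical.

```python
MAX_LINE_LENGTH = 1024  # per-line limit stated in the text above


def parse_acct_gather_conf(text):
    """Parse Name=Value lines into a dict keyed by lower-cased name."""
    options = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if len(raw) > MAX_LINE_LENGTH:
            raise ValueError(f"line {lineno} exceeds {MAX_LINE_LENGTH} characters")
        # Everything after "#" is a comment through the end of the line.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected Name=Value")
        # Names are case insensitive; values keep their case.
        options[name.strip().lower()] = value.strip()
    return options
```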

       Changes to the configuration file take effect upon restart of the Slurm daemons.

       The  following  acct_gather.conf parameters are defined to control the general behavior of
       various plugins in Slurm.

       The acct_gather.conf file differs from other Slurm .conf files: each plugin defines
       which options are available. Each plugin to be loaded must be specified in slurm.conf
       under the following configuration entries:

       • AcctGatherEnergyType (plugin type=acct_gather_energy)
       • AcctGatherInterconnectType (plugin type=acct_gather_interconnect)
       • AcctGatherFilesystemType (plugin type=acct_gather_filesystem)
       • AcctGatherProfileType (plugin type=acct_gather_profile)

       If the plugin that defines an option is not loaded, Slurm treats the option as unknown
       and silently ignores it. If you change plugin types, you may also have to change the
       related options.
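       Because ignored options produce no warning, a site may want to check its files before
       restarting the daemons. The sketch below is one hypothetical way to do so in Python. The
       table covers only the options described on this page, grouped as this page groups them.
       The function and variable names are illustrative and not part of Slurm.

```python
# Options described on this page, mapped to the plugin(s) that define them.
_IPMI = "acct_gather_energy/ipmi"
_XCC = "acct_gather_energy/xcc"
_HDF5 = "acct_gather_profile/hdf5"
_INFLUX = "acct_gather_profile/influxdb"
_OFED = "acct_gather_interconnect/ofed"

OPTION_PLUGINS = {
    "energyipmifrequency": {_IPMI, _XCC},
    "energyipmicalcadjustment": {_IPMI},
    "energyipmipowersensors": {_IPMI},
    "energyipmiusername": {_IPMI},
    "energyipmipassword": {_IPMI},
    "energyipmitimeout": {_XCC},
    "profilehdf5dir": {_HDF5},
    "profilehdf5default": {_HDF5},
    "profileinfluxdbdatabase": {_INFLUX},
    "profileinfluxdbdefault": {_INFLUX},
    "profileinfluxdbhost": {_INFLUX},
    "profileinfluxdbpass": {_INFLUX},
    "profileinfluxdbrtpolicy": {_INFLUX},
    "profileinfluxdbuser": {_INFLUX},
    "infinibandofedport": {_OFED},
}


def ignored_options(option_names, loaded_plugins):
    """Return the option names that no loaded plugin (per the table) defines."""
    loaded = set(loaded_plugins)
    ignored = []
    for name in option_names:
        # Parameter names are case insensitive, so look them up lower-cased.
        owners = OPTION_PLUGINS.get(name.lower(), set())
        if not owners & loaded:
            ignored.append(name)
    return ignored
```

       The loaded_plugins argument is the set of values given to the AcctGather*Type entries in
       slurm.conf.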

acct_gather_energy/IPMI

       Required entry in slurm.conf:
              AcctGatherEnergyType=acct_gather_energy/ipmi

       Options used for acct_gather_energy/ipmi are as follows:

              EnergyIPMIFrequency=<number>
                        This parameter is the number of seconds between BMC access samples.

              EnergyIPMICalcAdjustment=<yes|no>
                        If set to "yes", the consumption between the last BMC access sample and a
                        step  consumption  update  is  approximated  to  get  more  accurate task
                        consumption.  The adjustment is made at the step start and each time  the
                        consumption  is  updated,  including the step end. The approximations are
                        not accumulated; only the first and last adjustments are used to
                        calculate the consumption. The default is "no".

              EnergyIPMIPowerSensors=<key=values>
                        Optionally specify the ids of the sensors to use.  Multiple <key=values>
                        can be set with ";" separators.  The key "Node" is mandatory and is used
                        to report the consumed energy for nodes (scontrol show node) and jobs
                        (sacct).  Other keys are optional and are named by the administrator.
                        These keys are useful only when profiling is activated for energy, to
                        store the power (in watts) of each key.  <values> are integers; multiple
                        values can be set with "," separators.  The sum of the listed sensors is
                        used for each key.  EnergyIPMIPowerSensors is optional; the default value
                        is "Node=number", where "number" is the id of the first power sensor
                        returned by ipmi-sensors.
                        Examples:
                        EnergyIPMIPowerSensors=Node=16,19,23,26;Socket0=16,23;Socket1=19,26;SSUP=23,26;KNC=16,19
                        EnergyIPMIPowerSensors=Node=29,32;SSUP0=29;SSUP1=32
                        EnergyIPMIPowerSensors=Node=1280
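              The <key=values> syntax can be illustrated with a short Python sketch. It is not
              the plugin's parser. The function names and the sample wattage readings are
              hypothetical, and only the rules stated above are applied (";" between keys, ","
              between sensor ids, "Node" mandatory, per-key sum).

```python
def parse_power_sensors(value):
    """Parse 'Node=16,19;Socket0=16' into {'Node': [16, 19], 'Socket0': [16]}."""
    sensors = {}
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        key, sep, ids = entry.partition("=")
        if not sep or not ids:
            raise ValueError(f"expected key=values, got {entry!r}")
        # Sensor ids are integers separated by ",".
        sensors[key.strip()] = [int(i) for i in ids.split(",")]
    if "Node" not in sensors:
        raise ValueError('the "Node" key is mandatory')
    return sensors


def power_per_key(sensors, readings):
    """Sum the per-sensor readings (in watts) listed under each key."""
    return {key: sum(readings[i] for i in ids) for key, ids in sensors.items()}
```

              With the second example above, parse_power_sensors("Node=29,32;SSUP0=29;SSUP1=32")
              yields one list of sensor ids per key, and power_per_key() adds up whatever
              readings the caller supplies for those ids.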

              The following acct_gather.conf parameters are defined to control  the  IPMI  config
              default values for libipmiconsole.

              EnergyIPMIUsername=USERNAME
                        Specify BMC Username.

              EnergyIPMIPassword=PASSWORD
                        Specify BMC Password.

acct_gather_energy/XCC

       Required entry in slurm.conf:
              AcctGatherEnergyType=acct_gather_energy/xcc

       The acct_gather_energy/xcc plugin uses only in-band communication with the XClarity
       Controller (XCC), so a reduced set of options is supported:

              EnergyIPMIFrequency=<number>
                        This parameter is the number  of  seconds  between  XCC  access  samples.
                        Default is 30 seconds.

              EnergyIPMITimeout=<number>
                        Timeout,  in  seconds,  for  initializing  the IPMI XCC context for a new
                        gathering thread. Default is 10 seconds.

acct_gather_profile/HDF5

       Required entry in slurm.conf:
              AcctGatherProfileType=acct_gather_profile/hdf5

       Options used for acct_gather_profile/hdf5 are as follows:

              ProfileHDF5Dir=<path>
                     This  parameter  is  the  path  to  the  shared  folder   into   which   the
                     acct_gather_profile  plugin  will  write  detailed  data (usually as an HDF5
                     file).  The directory is assumed to be  on  a  file  system  shared  by  the
                     controller and all compute nodes. This is a required parameter.

              ProfileHDF5Default
                     A  comma-delimited  list  of  data  types  to  be  collected  for  each  job
                     submission.  Allowed values are:

                     All     All data  types  are  collected.  (Cannot  be  combined  with  other
                             values.)

                     None    No  data  types  are  collected.  This  is  the default.  (Cannot be
                             combined with other values.)

                     Energy  Energy data is collected.

                     Filesystem
                             File system (Lustre) data is collected.

                     Network Network (InfiniBand) data is collected.

                     Task    Task (I/O, Memory, ...) data is collected.
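              The combination rules for this list can be checked with a small Python sketch.
              It is illustrative only: the function name is hypothetical, an empty value is
              assumed to mean the documented default (None), and values are matched with the
              exact spelling shown above, since whether Slurm accepts other capitalizations is
              not stated here.

```python
ALLOWED_TYPES = ("All", "None", "Energy", "Filesystem", "Network", "Task")
EXCLUSIVE_TYPES = {"All", "None"}  # cannot be combined with other values


def parse_profile_default(value):
    """Validate a comma-delimited list of profile data types."""
    types = [t.strip() for t in value.split(",") if t.strip()]
    if not types:
        return ["None"]  # assumption: empty means the documented default
    for t in types:
        if t not in ALLOWED_TYPES:
            raise ValueError(f"unknown data type {t!r}")
    if len(types) > 1 and EXCLUSIVE_TYPES & set(types):
        raise ValueError("All and None cannot be combined with other values")
    return types
```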

acct_gather_profile/InfluxDB

       Required entry in slurm.conf:
              AcctGatherProfileType=acct_gather_profile/influxdb

       The InfluxDB plugin provides the same information as the HDF5 plugin but will instead send
       information to the configured InfluxDB server.

       The InfluxDB plugin is designed against the 1.x protocol of InfluxDB. Any site running a
       v2.x InfluxDB server will need to configure a v1.x compatibility endpoint along with the
       correct user and password authorization. Token authentication is not currently supported.

   Options:
       ProfileInfluxDBDatabase
              The InfluxDB v1.x database name, or the InfluxDB v2.x bucket name, where profiling
              information is to be written.

       ProfileInfluxDBDefault
              A comma-delimited list of data types to  be  collected  for  each  job  submission.
              Allowed values are:

              All       All data types are collected. Cannot be combined with other values.

              None      No  data  types  are  collected. This is the default.  Cannot be combined
                        with other values.

              Energy    Energy data is collected.

              Filesystem
                        File system (Lustre) data is collected.

              Network   Network (InfiniBand) data is collected.

              Task      Task (I/O, Memory, ...) data is collected.

       ProfileInfluxDBHost=<hostname>:<port>
              The hostname of the machine where the InfluxDB instance is executed  and  the  port
              used  by  the HTTP API. The port used by the HTTP API is the one configured through
              the bind-address influxdb.conf option in the [http] section.   Example:
              ProfileInfluxDBHost=myinfluxhost:8086

       ProfileInfluxDBPass
              Password for the username configured in ProfileInfluxDBUser. Required in v2.x and
              optional in v1.x InfluxDB.

       ProfileInfluxDBRTPolicy
              The InfluxDB v1.x retention policy name, or the InfluxDB v2.x retention policy
              bucket name, for the database configured in the ProfileInfluxDBDatabase option.

       ProfileInfluxDBUser
              InfluxDB  username that should be used to gain access to the database configured in
              ProfileInfluxDBDatabase. Required in v2.x and optional in v1.x InfluxDB.   This  is
              only  needed  if  InfluxDB  v1.x  is  configured with authentication enabled in the
              [http] config section and a user has been granted at  least  WRITE  access  to  the
              database. See also ProfileInfluxDBPass.
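       To make the role of each option concrete, the Python sketch below maps them onto an
       InfluxDB 1.x HTTP write URL (the /write endpoint with its db, rp, u and p query
       parameters). This page does not describe how the plugin itself builds its request, so
       the function name, the http scheme, and the choice to pass credentials in the query
       string are all assumptions made for illustration.

```python
from urllib.parse import urlencode


def influxdb_write_url(host, database, rt_policy=None, user=None, password=None):
    """Build an InfluxDB 1.x /write URL from ProfileInfluxDB*-style values."""
    params = {"db": database}          # ProfileInfluxDBDatabase
    if rt_policy:
        params["rp"] = rt_policy       # ProfileInfluxDBRTPolicy
    if user:
        params["u"] = user             # ProfileInfluxDBUser
        params["p"] = password or ""   # ProfileInfluxDBPass
    # host is the <hostname>:<port> value of ProfileInfluxDBHost.
    # Assumption: plain http; a real deployment may require https.
    return f"http://{host}/write?{urlencode(params)}"
```

       For example, influxdb_write_url("myinfluxhost:8086", "slurm") targets the database named
       slurm (a hypothetical name) on the host from the ProfileInfluxDBHost example.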

   NOTES:
       This  plugin  requires  the  libcurl  development  files  to  be installed and linkable at
       configure time. The plugin will not build otherwise.

       Information on how to install and configure InfluxDB and manage databases and retention
       policies is available on the official InfluxDB website.

       Collected  information is written from every compute node where a job runs to the InfluxDB
       instance listening on the ProfileInfluxDBHost. In order to avoid overloading the  InfluxDB
       instance  with  incoming  connection requests, the plugin uses an internal buffer which is
       filled with samples. Once the buffer is full, an HTTP API write request is performed and
       the buffer is emptied to hold subsequent samples. A final request is also performed when a
       task ends even if the buffer isn't full.

       Failed HTTP API write requests are silently discarded. This means that  collected  profile
       information  in  the plugin buffer is lost if it can't be written to the InfluxDB database
       for any reason.
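       The buffering behavior described above can be modeled with the following Python sketch.
       It is a model for discussion, not the plugin's implementation: the class name is
       hypothetical, the capacity is counted in samples and set to an arbitrary value (the
       plugin's real buffer size and unit are not stated here), and the send callable stands in
       for the HTTP API write request.

```python
class SampleBuffer:
    """Collect samples and hand them to `send` in batches."""

    def __init__(self, send, capacity=4):
        self._send = send          # callable taking one str payload
        self._capacity = capacity  # hypothetical size, counted in samples
        self._samples = []

    def pending(self):
        """Number of samples waiting to be written."""
        return len(self._samples)

    def add(self, sample):
        self._samples.append(sample)
        # Once the buffer is full, perform a write and start over.
        if len(self._samples) >= self._capacity:
            self.flush()

    def flush(self):
        """Write whatever is buffered; also called when a task ends."""
        if not self._samples:
            return
        payload = "\n".join(self._samples)
        # The buffer is emptied whether or not the write succeeds.
        self._samples = []
        try:
            self._send(payload)
        except Exception:
            # Failed writes are silently discarded: these samples are lost.
            pass
```

       Calling flush() at task end models the final request that is sent even when the buffer
       is not full.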

       Plugin messages are logged along with the slurmstepd logs to SlurmdLogFile.  In  order  to
       troubleshoot  any issues, it is recommended to temporarily increase the slurmd debug level
       to debug3 and add Profile to the debug flags. This can  be  accomplished  by  setting  the
       slurm.conf  SlurmdDebug  and  DebugFlags  respectively  or  dynamically  through  scontrol
       setdebug and setdebugflags.

       Grafana can be used to create charts based on the data held by  InfluxDB.   This  kind  of
       tool  permits  one  to  create dashboards, tables and other graphics using the stored time
       series.

acct_gather_interconnect/OFED

       Required entry in slurm.conf:
              AcctGatherInterconnectType=acct_gather_interconnect/ofed

       Options used for acct_gather_interconnect/ofed are as follows:

              InfinibandOFEDPort=<number>
                        The port number of the local InfiniBand card to monitor.  The default
                        port is 1.

EXAMPLE

       ###
       # Slurm acct_gather configuration file
       ###
       # Parameters for acct_gather_energy/ipmi plugin
       EnergyIPMIFrequency=10
       EnergyIPMICalcAdjustment=yes
       #
       # Parameters for acct_gather_profile/hdf5 plugin
       ProfileHDF5Dir=/app/slurm/profile_data
       # Parameters for acct_gather_interconnect/ofed plugin
       InfinibandOFEDPort=1

COPYING

       Copyright (C) 2012-2013 Bull.  Copyright (C) 2012-2021 SchedMD LLC.  Produced at Bull (cf,
       DISCLAIMER).

       This  file  is  part  of  Slurm,  a  resource  management  program.   For   details,   see
       <https://slurm.schedmd.com/>.

       Slurm  is  free  software; you can redistribute it and/or modify it under the terms of the
       GNU General Public License as published by the Free Software Foundation; either version  2
       of the License, or (at your option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
       even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
       GNU General Public License for more details.

SEE ALSO

       slurm.conf(5)