Provided by: slurm-client_21.08.5-2ubuntu1_amd64

NAME

       acct_gather.conf - Slurm configuration file for the acct_gather plugins

DESCRIPTION

       acct_gather.conf  is  a  UTF8 formatted file which defines parameters used by Slurm's acct_gather related
       plugins.  The file location can be modified at system build time using the  DEFAULT_SLURM_CONF  parameter
       or  at  execution time by setting the SLURM_CONF environment variable. The file will always be located in
       the same directory as the slurm.conf file.

       Parameter names are case insensitive but parameter values are case sensitive.  Any text following  a  "#"
       in the configuration file is treated as a comment through the end of that line.  The size of each line in
       the file is limited to 1024 characters.
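
       The syntax rules above can be illustrated with a short Python sketch. This is not Slurm's parser: the
       function name parse_acct_gather_conf is hypothetical, and rejecting over-long lines with an error is an
       assumption, since the text only states the limit.

```python
MAX_LINE_LENGTH = 1024  # per-line limit stated in the text


def parse_acct_gather_conf(text):
    """Parse 'Name=Value' lines into a dict keyed by lower-cased name."""
    params = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if len(raw) > MAX_LINE_LENGTH:
            # Assumption: this sketch rejects over-long lines outright.
            raise ValueError(f"line {lineno} exceeds {MAX_LINE_LENGTH} characters")
        # Everything after '#' is a comment through the end of the line.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected Name=Value")
        # Names are case insensitive; values keep their case.
        params[name.strip().lower()] = value.strip()
    return params
```

       Lower-casing the names is one simple way to make lookups case insensitive while leaving values untouched.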

       Changes to the configuration file take effect upon restart of the Slurm daemons.

       The following acct_gather.conf parameters are defined to control the general behavior of various  plugins
       in Slurm.

       The acct_gather.conf file differs from other Slurm .conf files: each plugin defines which options are
       available. Each plugin to be loaded must be specified in slurm.conf under the following
       configuration entries:

       • AcctGatherEnergyType (plugin type=acct_gather_energy)
       • AcctGatherInterconnectType (plugin type=acct_gather_interconnect)
       • AcctGatherFilesystemType (plugin type=acct_gather_filesystem)
       • AcctGatherProfileType (plugin type=acct_gather_profile)

       If the respective plugin for an option is not loaded, that option will appear unknown to Slurm and
       will be silently ignored. If you change plugin types, you may also have to change the related
       options.
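
       Because such options are ignored silently, a site may want to check its file by hand. The Python sketch
       below is one way to do that. The table PLUGIN_OPTIONS is transcribed from the plugin sections of this
       page and is not Slurm's own option table, which may differ; the function name ignored_options is
       hypothetical.

```python
# Options per plugin, as listed in the plugin sections of this page.
# Assumption: this table mirrors the page, not Slurm's internal tables.
PLUGIN_OPTIONS = {
    "acct_gather_energy/ipmi": {
        "EnergyIPMIFrequency", "EnergyIPMICalcAdjustment",
        "EnergyIPMIPowerSensors", "EnergyIPMIUsername", "EnergyIPMIPassword",
    },
    "acct_gather_energy/xcc": {"EnergyIPMIFrequency", "EnergyIPMITimeout"},
    "acct_gather_profile/hdf5": {"ProfileHDF5Dir", "ProfileHDF5Default"},
    "acct_gather_profile/influxdb": {
        "ProfileInfluxDBDatabase", "ProfileInfluxDBDefault", "ProfileInfluxDBHost",
        "ProfileInfluxDBPass", "ProfileInfluxDBRTPolicy", "ProfileInfluxDBUser",
    },
    "acct_gather_interconnect/ofed": {"InfinibandOFEDPort"},
}


def ignored_options(loaded_plugins, option_names):
    """Return the options that no loaded plugin defines, in input order."""
    known = set()
    for plugin in loaded_plugins:
        # Parameter names are case insensitive, so compare lower-cased.
        known |= {opt.lower() for opt in PLUGIN_OPTIONS.get(plugin, set())}
    return [opt for opt in option_names if opt.lower() not in known]
```

       For example, with only acct_gather_profile/hdf5 loaded, an EnergyIPMIFrequency line would be reported.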

acct_gather_energy/IPMI

       Required entry in slurm.conf:
              AcctGatherEnergyType=acct_gather_energy/ipmi

       Options used for acct_gather_energy/ipmi are as follows:

              EnergyIPMIFrequency=<number>
                        This parameter is the number of seconds between BMC access samples.

              EnergyIPMICalcAdjustment=<yes|no>
                        If  set  to  "yes",  the  consumption  between  the  last  BMC  access sample and a step
                        consumption  update  is  approximated  to  get  more  accurate  task  consumption.   The
                        adjustment is made at the step start and each time the consumption is updated, including
                        the step  end.  The  approximations  are  not  accumulated,  only  the  first  and  last
                        adjustments are used to calculate the consumption. The default is "no".

              EnergyIPMIPowerSensors=<key=values>
                        Optionally specify the ids of the sensors to use. Multiple <key=values> can be set
                        with ";" separators. The key "Node" is mandatory and is used to determine the consumed
                        energy for nodes (scontrol show node) and jobs (sacct). Other keys are optional and are
                        named by the administrator. These keys are useful only when profiling is activated for
                        energy, to store the power (in watts) of each key. <values> are integers; multiple
                        values can be set with "," separators. The sum of the listed sensors is used for each
                        key. EnergyIPMIPowerSensors is optional; the default value is "Node=number", where
                        "number" is the id of the first power sensor returned by ipmi-sensors.
                        For example:
                        EnergyIPMIPowerSensors=Node=16,19,23,26;Socket0=16,23;Socket1=19,26;SSUP=23,26;KNC=16,19
                        EnergyIPMIPowerSensors=Node=29,32;SSUP0=29;SSUP1=32
                        EnergyIPMIPowerSensors=Node=1280
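
       The <key=values> format described above can be sketched in Python as follows. This is an illustration
       only, not the plugin's code: the function names parse_power_sensors and power_per_key are hypothetical,
       and the readings passed to power_per_key are made-up watt values keyed by sensor id.

```python
def parse_power_sensors(value):
    """Split 'Key=id,id;Key=id' into {key: [ids]} and require the Node key."""
    sensors = {}
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        key, sep, ids = entry.partition("=")
        if not sep or not ids:
            raise ValueError(f"malformed entry: {entry!r}")
        # Sensor ids are integers separated by ','.
        sensors[key.strip()] = [int(i) for i in ids.split(",")]
    if "Node" not in sensors:
        raise ValueError('the "Node" key is mandatory')
    return sensors


def power_per_key(sensors, readings):
    """Sum the readings (watts per sensor id) of the sensors listed for each key."""
    return {key: sum(readings[i] for i in ids) for key, ids in sensors.items()}
```

       With the second example value above and hypothetical readings of 100 W for sensor 29 and 150 W for
       sensor 32, "Node" sums to 250 W while "SSUP0" and "SSUP1" report 100 W and 150 W respectively.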

              The following acct_gather.conf parameters are defined to control the IPMI  config  default  values
              for libipmiconsole.

              EnergyIPMIUsername=USERNAME
                        Specify BMC Username.

              EnergyIPMIPassword=PASSWORD
                        Specify BMC Password.

acct_gather_energy/XCC

       Required entry in slurm.conf:
              AcctGatherEnergyType=acct_gather_energy/xcc

       The acct_gather_energy/xcc plugin uses only in-band communication with the XClarity Controller, so a
       reduced set of options is supported:

              EnergyIPMIFrequency=<number>
                        This parameter is the number of seconds between  XCC  access  samples.   Default  is  30
                        seconds.

              EnergyIPMITimeout=<number>
                        Timeout,  in  seconds, for initializing the IPMI XCC context for a new gathering thread.
                        Default is 10 seconds.

acct_gather_profile/HDF5

       Required entry in slurm.conf:
              AcctGatherProfileType=acct_gather_profile/hdf5

       Options used for acct_gather_profile/hdf5 are as follows:

              ProfileHDF5Dir=<path>
                     This parameter is the path to the shared folder into which the  acct_gather_profile  plugin
                     will  write  detailed  data (usually as an HDF5 file).  The directory is assumed to be on a
                     file system shared by the controller and all compute nodes. This is a required parameter.

              ProfileHDF5Default
                     A comma-delimited list of data types to be collected  for  each  job  submission.   Allowed
                     values are:

                     All     All data types are collected. (Cannot be combined with other values.)

                     None    No  data  types are collected. This is the default.  (Cannot be combined with other
                             values.)

                     Energy  Energy data is collected.

                     Filesystem
                             File system (Lustre) data is collected.

                     Network Network (InfiniBand) data is collected.

                     Task    Task (I/O, Memory, ...) data is collected.
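
       The rule that "All" and "None" cannot be combined with other values can be checked with a Python sketch
       like the one below. The function name parse_profile_default is hypothetical. Matching is case
       insensitive and an empty value falls back to "None"; both are assumptions of this sketch rather than
       behavior stated on this page.

```python
ALLOWED = {"all", "none", "energy", "filesystem", "network", "task"}
EXCLUSIVE = {"all", "none"}  # cannot be combined with other values


def parse_profile_default(value):
    """Validate a comma-delimited list of data types and return it lower-cased."""
    types = [t.strip().lower() for t in value.split(",") if t.strip()]
    if not types:
        return ["none"]  # "None" is the documented default
    unknown = [t for t in types if t not in ALLOWED]
    if unknown:
        raise ValueError(f"unknown data type(s): {', '.join(unknown)}")
    if len(types) > 1 and EXCLUSIVE.intersection(types):
        raise ValueError('"All" and "None" cannot be combined with other values')
    return types
```

       A value such as "Energy,Task" passes the check, while "All,Task" is rejected.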

acct_gather_profile/InfluxDB

       Required entry in slurm.conf:
              AcctGatherProfileType=acct_gather_profile/influxdb

       The InfluxDB plugin provides the same information as the HDF5 plugin but will instead send information to
       the configured InfluxDB server.

       The InfluxDB plugin is designed against 1.x protocol of InfluxDB. Any site running a v2.x InfluxDB server
       will need to  configure  a  v1.x  compatibility  endpoint  along  with  the  correct  user  and  password
       authorization. Token authentication is not currently supported.

   Options:
       ProfileInfluxDBDatabase
              The InfluxDB v1.x database name, or the InfluxDB v2.x bucket name, where profiling information is
              to be written.

       ProfileInfluxDBDefault
              A comma-delimited list of data types to be collected for each job submission.  Allowed values are:

              All       All data types are collected. Cannot be combined with other values.

              None      No data types are collected. This is the default.  Cannot be combined with other values.

              Energy    Energy data is collected.

              Filesystem
                        File system (Lustre) data is collected.

              Network   Network (InfiniBand) data is collected.

              Task      Task (I/O, Memory, ...) data is collected.

       ProfileInfluxDBHost=<hostname>:<port>
              The hostname of the machine where the InfluxDB instance is executed and the port used by the  HTTP
              API.  The  port  used by the HTTP API is the one configured through the bind-address influxdb.conf
              option in the [http] section.   Example:
              ProfileInfluxDBHost=myinfluxhost:8086

       ProfileInfluxDBPass
              Password for username configured in ProfileInfluxDBUser. Required in v2.x  and  optional  in  v1.x
              InfluxDB.

       ProfileInfluxDBRTPolicy
              The InfluxDB v1.x retention policy name, or the InfluxDB v2.x retention policy bucket name, for
              the database configured in the ProfileInfluxDBDatabase option.

       ProfileInfluxDBUser
              InfluxDB   username   that   should  be  used  to  gain  access  to  the  database  configured  in
              ProfileInfluxDBDatabase. Required in v2.x and optional in v1.x InfluxDB.  This is only  needed  if
              InfluxDB  v1.x  is  configured with authentication enabled in the [http] config section and a user
              has been granted at least WRITE access to the database. See also ProfileInfluxDBPass.

   NOTES:
       This plugin requires the libcurl development files to be installed and linkable at  configure  time.  The
       plugin will not build otherwise.

       Information on how to install and configure InfluxDB, and on how to manage databases, retention policies
       and the like, is available on the official InfluxDB webpage.

       Collected information is written from every compute node where  a  job  runs  to  the  InfluxDB  instance
       listening  on  the ProfileInfluxDBHost. In order to avoid overloading the InfluxDB instance with incoming
       connection requests, the plugin uses an internal buffer which is filled with samples. Once the buffer  is
       full, an HTTP API write request is performed and the buffer is emptied to hold subsequent samples. A final
       request is also performed when a task ends even if the buffer isn't full.

       Failed HTTP API write requests are silently discarded. This means that collected profile  information  in
       the plugin buffer is lost if it can't be written to the InfluxDB database for any reason.
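
       The buffering behavior described in the two paragraphs above can be modeled with a small Python sketch.
       It is a model of the described behavior, not the plugin's implementation: the class SampleBuffer, its
       capacity counted in samples, and the send callable standing in for the HTTP write request are all
       hypothetical.

```python
class SampleBuffer:
    """Collect samples and hand them to `send` in one batch when full."""

    def __init__(self, capacity, send):
        self.capacity = capacity  # assumption: size counted in samples
        self.send = send          # stand-in for the HTTP write; returns True on success
        self.samples = []

    def add(self, sample):
        self.samples.append(sample)
        if len(self.samples) >= self.capacity:
            self.flush()

    def flush(self):
        if not self.samples:
            return
        # Empty the buffer before sending so it can hold subsequent samples.
        batch, self.samples = self.samples, []
        # The result is deliberately ignored: a failed write is not retried,
        # so the samples in that batch are lost.
        self.send(batch)

    def task_end(self):
        # A final request is performed even if the buffer is not full.
        self.flush()
```

       Batching trades a small delay for far fewer connections to the InfluxDB instance; the cost, as noted
       above, is that a failed request loses the whole batch.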

       Plugin  messages are logged along with the slurmstepd logs to SlurmdLogFile. In order to troubleshoot any
       issues, it is recommended to temporarily increase the slurmd debug level to debug3 and add Profile to the
       debug  flags.  This can be accomplished by setting the slurm.conf SlurmdDebug and DebugFlags respectively
       or dynamically through scontrol setdebug and setdebugflags.

       Grafana can be used to create charts based on the data held by InfluxDB.  This kind of tool  permits  one
       to create dashboards, tables and other graphics using the stored time series.

acct_gather_interconnect/OFED

       Required entry in slurm.conf:
              AcctGatherInterconnectType=acct_gather_interconnect/ofed

       Options used for acct_gather_interconnect/ofed are as follows:

              InfinibandOFEDPort=<number>
                        The port number of the local InfiniBand card to monitor. The default port is 1.

EXAMPLE

       ###
       # Slurm acct_gather configuration file
       ###
       # Parameters for acct_gather_energy/ipmi plugin
       EnergyIPMIFrequency=10
       EnergyIPMICalcAdjustment=yes
       #
       # Parameters for acct_gather_profile/hdf5 plugin
       ProfileHDF5Dir=/app/slurm/profile_data
       # Parameters for acct_gather_interconnect/ofed plugin
       InfinibandOFEDPort=1

COPYING

       Copyright (C) 2012-2013 Bull.  Copyright (C) 2012-2021 SchedMD LLC.  Produced at Bull (cf, DISCLAIMER).

       This   file   is   part   of   Slurm,   a   resource    management    program.     For    details,    see
       <https://slurm.schedmd.com/>.

       Slurm  is  free  software;  you  can  redistribute it and/or modify it under the terms of the GNU General
       Public License as published by the Free Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       Slurm  is  distributed  in  the  hope  that it will be useful, but WITHOUT ANY WARRANTY; without even the
       implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.   See  the  GNU  General  Public
       License for more details.

SEE ALSO

       slurm.conf(5)