Provided by: slurm-client_19.05.5-1_amd64

NAME

       acct_gather.conf - Slurm configuration file for the acct_gather plugins

DESCRIPTION

       acct_gather.conf  is  an ASCII file which defines parameters used by Slurm's acct_gather related plugins.
       The file location can be modified at system build time  using  the  DEFAULT_SLURM_CONF  parameter  or  at
       execution  time  by  setting  the SLURM_CONF environment variable. The file will always be located in the
       same directory as the slurm.conf file.

       Parameter names are case insensitive.  Any text following a "#" in the configuration file is treated as a
       comment through the end of that line.  The size of each line in the file is limited to  1024  characters.
       Changes to the configuration file take effect upon restart of Slurm daemons, daemon receipt of the SIGHUP
       signal, or execution of the command "scontrol reconfigure" unless otherwise noted.
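
       The line rules above can be sketched in Python as follows. This is an illustrative sketch only, not
       Slurm's parser: the function name parse_line is hypothetical, and escaping of "#" characters is not
       handled.

```python
MAX_LINE_LENGTH = 1024  # line-size limit stated in this page


def parse_line(line):
    """Return (name, value) for one configuration line, or None if it sets nothing."""
    if len(line) > MAX_LINE_LENGTH:
        raise ValueError("line exceeds 1024 characters")
    # Everything from the first "#" to the end of the line is a comment.
    text = line.split("#", 1)[0].strip()
    if not text:
        return None
    # Split on the first "=" only, so values such as "Node=16,19" stay intact.
    name, sep, value = text.partition("=")
    if not sep:
        raise ValueError(f"expected Name=Value, got {text!r}")
    # Parameter names are case insensitive, so normalise them for lookup.
    return name.strip().lower(), value.strip()
```

       For example, parse_line("EnergyIPMIFrequency=10  # every 10 s") returns
       ("energyipmifrequency", "10"), and a line holding only a comment returns None.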

       The  following acct_gather.conf parameters are defined to control the general behavior of various plugins
       in Slurm.

       The acct_gather.conf file differs from other Slurm .conf files in that each plugin defines which options
       are available.  If the plugin for an option is not loaded, Slurm treats that option as unknown, which can
       prevent Slurm from loading.  If you change plugin types, you may also have to change the related
       options.

       EnergyIPMI
              Options used for AcctGatherEnergyType/ipmi are as follows:

              EnergyIPMIFrequency=<number>
                        This parameter is the number of seconds between BMC access samples.

              EnergyIPMICalcAdjustment=<yes|no>
                        If  set  to  "yes",  the  consumption  between  the  last  BMC  access sample and a step
                        consumption  update  is  approximated  to  get  more  accurate  task  consumption.   The
                        adjustment is made at the step start and each time the consumption is updated, including
                        the  step  end.  The  approximations  are  not  accumulated,  only  the  first  and last
                        adjustments are used to calculate the consumption. The default is "no".

              EnergyIPMIPowerSensors=<key=values>
                        Optionally specify the ids of the sensors to use.  Multiple <key=values> can be set
                        with ";" separators.  The key "Node" is mandatory and is used to report the consumed
                        energy for nodes (scontrol show node) and jobs (sacct).  Other keys are optional and are
                        named by the administrator.  These keys are useful only when profiling is activated for
                        energy, to store the power (in watts) of each key.  <values> are integers; multiple
                        values can be set with "," separators.  The sum of the listed sensors is used for each
                        key.  EnergyIPMIPowerSensors is optional; the default value is "Node=number", where
                        "number" is the id of the first power sensor returned by ipmi-sensors.
                        Examples:
                        EnergyIPMIPowerSensors=Node=16,19,23,26;Socket0=16,23;Socket1=19,26;SSUP=23,26;KNC=16,19
                        EnergyIPMIPowerSensors=Node=29,32;SSUP0=29;SSUP1=32
                        EnergyIPMIPowerSensors=Node=1280
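
                        The <key=values> syntax and the per-key sum can be illustrated with a short Python
                        sketch. It is not the plugin's code: the function names and the sample readings are
                        hypothetical, and matching the "Node" key literally is an assumption.

```python
def parse_power_sensors(value):
    """Split an EnergyIPMIPowerSensors value into {key: [sensor ids]}."""
    sensors = {}
    for item in value.split(";"):  # ";" separates key=values groups
        key, sep, ids = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=values, got {item!r}")
        # "," separates the integer sensor ids of one key.
        sensors[key.strip()] = [int(i) for i in ids.split(",")]
    if "Node" not in sensors:
        raise ValueError('the "Node" key is mandatory')
    return sensors


def power_per_key(sensors, readings):
    """Sum the readings (watts, keyed by sensor id) of the sensors listed for each key."""
    return {key: sum(readings[i] for i in ids) for key, ids in sensors.items()}
```

                        With the second example above, parse_power_sensors("Node=29,32;SSUP0=29;SSUP1=32")
                        gives {"Node": [29, 32], "SSUP0": [29], "SSUP1": [32]}. Given made-up readings of
                        100 W for sensor 29 and 150 W for sensor 32, power_per_key reports 250 for "Node".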

              The  following  acct_gather.conf  parameters are defined to control the IPMI config default values
              for libipmiconsole.

              EnergyIPMIUsername=USERNAME
                        Specify BMC Username.

              EnergyIPMIPassword=PASSWORD
                        Specify BMC Password.

       ProfileHDF5
              Options used for AcctGatherProfileType/hdf5 are as follows:

              ProfileHDF5Dir=<path>
                        This parameter is the path to the  shared  folder  into  which  the  acct_gather_profile
                        plugin  will write detailed data (usually as an HDF5 file).  The directory is assumed to
                        be on a file system shared by the controller and all compute nodes. This is  a  required
                        parameter.

              ProfileHDF5Default
                        A  comma  delimited list of data types to be collected for each job submission.  Allowed
                        values are:

                        All     All data types are collected. (Cannot be combined with other values.)

                        None    No data types are collected. This is the  default.   (Cannot  be  combined  with
                                other values.)

                        Energy  Energy data is collected.

                        Filesystem
                                File system (Lustre) data is collected.

                        Network Network (InfiniBand) data is collected.

                        Task    Task (I/O, Memory, ...) data is collected.
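
                        The combination rules for this list (shared by ProfileInfluxDBDefault) can be
                        checked with a sketch like the one below. The function name is hypothetical, and
                        treating the values as case insensitive is an assumption, not something this page
                        states.

```python
ALLOWED = {"all", "none", "energy", "filesystem", "network", "task"}
EXCLUSIVE = {"all", "none"}  # these cannot be combined with other values


def parse_profile_default(value):
    """Validate a comma delimited list of data types and return it as a set."""
    names = [n.strip().lower() for n in value.split(",") if n.strip()]
    unknown = [n for n in names if n not in ALLOWED]
    if unknown:
        raise ValueError(f"unknown data type(s): {unknown}")
    if len(names) > 1 and EXCLUSIVE.intersection(names):
        raise ValueError("All and None cannot be combined with other values")
    # An empty list falls back to the documented default, None.
    return set(names) or {"none"}
```

                        Here parse_profile_default("Energy,Task") returns {"energy", "task"}, while
                        "All,Task" raises ValueError.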

       ProfileInfluxDB
              Options used for AcctGatherProfileType/influxdb are as follows:

              ProfileInfluxDBDatabase
                        InfluxDB database name where profiling information is to be written.

              ProfileInfluxDBDefault
                        A  comma  delimited list of data types to be collected for each job submission.  Allowed
                        values are:

                        All     All data types are collected. (Cannot be combined with other values.)

                        None    No data types are collected. This is the  default.   (Cannot  be  combined  with
                                other values.)

                        Energy  Energy data is collected.

                        Filesystem
                                File system (Lustre) data is collected.

                        Network Network (InfiniBand) data is collected.

                        Task    Task (I/O, Memory, ...) data is collected.

              ProfileInfluxDBHost=<hostname>:<port>
                        The  hostname of the machine where the influxd instance is executed and the port used by
                        the HTTP API. The port used by the HTTP API is the  one  configured  through  the  bind-
                        address influxdb.conf option in the [http] section. Example:

                        ProfileInfluxDBHost=myinfluxhost:8086
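
                        As a sketch of how a <hostname>:<port> value breaks down (the function name is
                        hypothetical, and IPv6 literals are deliberately not handled):

```python
def split_influxdb_host(value):
    """Split a <hostname>:<port> string into (hostname, port)."""
    # Split on the last ":" so the port is always the final component.
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected <hostname>:<port>, got {value!r}")
    return host, int(port)
```

                        For the example above, split_influxdb_host("myinfluxhost:8086") returns
                        ("myinfluxhost", 8086).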

              ProfileInfluxDBPass
                        Optional password for username configured in ProfileInfluxDBUser.

              ProfileInfluxDBRTPolicy
                        The    InfluxDB    retention    policy    name    for   the   database   configured   in
                        ProfileInfluxDBDatabase option.

              ProfileInfluxDBUser
                        Optional InfluxDB username that should be used to gain access to the database configured
                        in ProfileInfluxDBDatabase.  This is only needed if InfluxDB is configured with
                        authentication enabled in the [http] config section and a user has been granted at least
                        WRITE access to the database. See also ProfileInfluxDBPass.

       NOTE:  This plugin requires the libcurl development files to be installed.

       NOTE:  Information  on how to install and configure InfluxDB and manage databases, retention policies and
              such is available on the official webpage.

       NOTE:  Collected information is written from every compute node where a job runs to the influxd  instance
              listening  on  the  ProfileInfluxDBHost.  In  order to avoid overloading the influxd instance with
              incoming connection requests, the plugin uses an internal buffer which  is  filled  with  samples.
              Once the buffer is full, an HTTP API write request is performed and the buffer is emptied to hold
              subsequent samples. A final request is also performed when a task ends even if  the  buffer  isn't
              full.

       NOTE:  Failed HTTP API write requests are discarded. This means that collected profile information in the
              plugin buffer is lost if it can't be written to the influxd database for any reason.
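
              The buffering behaviour described in the two notes above can be modelled roughly as follows.
              This is a simplified sketch, not the plugin's implementation: the class name, the
              sample-count capacity, and the send callable are all hypothetical, and this page does not
              state the real buffer size.

```python
class SampleBuffer:
    """Collect samples and hand them to `send` in batches."""

    def __init__(self, capacity, send):
        self.capacity = capacity  # hypothetical limit, counted in samples
        self.send = send          # callable that performs one write request
        self.samples = []

    def add(self, sample):
        self.samples.append(sample)
        if len(self.samples) >= self.capacity:
            self.flush()  # buffer full: write it out and start over

    def flush(self):
        """Write whatever is buffered; also called once when the task ends."""
        if not self.samples:
            return
        batch, self.samples = self.samples, []
        # The result is ignored: a failed write is discarded, not retried,
        # so the samples in that batch are lost.
        self.send(batch)
```

              With a capacity of 3, adding four samples triggers one write of the first three, and the
              final flush() at task end writes the remaining one.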

       NOTE:  Plugin  messages  are  logged  along  with  the  slurmstepd  logs  to  SlurmdLogFile.  In order to
              troubleshoot any issues, it is recommended to temporarily  increase  the  slurmd  debug  level  to
              debug3  and  add  Profile  to  the debug flags. This can be accomplished by setting the slurm.conf
              SlurmdDebug  and  DebugFlags  respectively  or   dynamically   through   scontrol   setdebug   and
              setdebugflags.

       NOTE:  Consider using a monitoring and analytics tool such as Grafana on top of InfluxDB. Such tools
              let you create dashboards, tables, and other graphics from the stored time series, which makes
              it easier to correlate resource usage peaks reported by other node monitoring tools such as
              Ganglia with specific job step tasks.

       InfinibandOFED
              Options used for AcctGatherInfinibandType/ofed are as follows:

              InfinibandOFEDPort=<number>
                        This parameter is the port number of the local InfiniBand card to monitor.  The
                        default port is 1.

EXAMPLE

       ###
       # Slurm acct_gather configuration file
       ###
       # Parameters for AcctGatherEnergy/ipmi plugin
       EnergyIPMIFrequency=10
       EnergyIPMICalcAdjustment=yes
       #
       # Parameters for AcctGatherProfileType/hdf5 plugin
       ProfileHDF5Dir=/app/slurm/profile_data
       # Parameters for AcctGatherInfiniband/ofed plugin
       InfinibandOFEDPort=1

COPYING

       Copyright (C) 2012-2013 Bull.  Produced at Bull (cf, DISCLAIMER).

       This   file   is   part   of   Slurm,   a   resource    management    program.     For    details,    see
       <https://slurm.schedmd.com/>.

       Slurm  is  free  software;  you  can  redistribute it and/or modify it under the terms of the GNU General
       Public License as published by the Free Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       Slurm is distributed in the hope that it will be useful, but  WITHOUT  ANY  WARRANTY;  without  even  the
       implied  warranty  of  MERCHANTABILITY  or  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
       License for more details.

SEE ALSO

       slurm.conf(5)

April 2015                                  Slurm Configuration File                         acct_gather.conf(5)