Provided by: slurm-client_19.05.5-1_amd64

NAME

       acct_gather.conf - Slurm configuration file for the acct_gather plugins

DESCRIPTION

       acct_gather.conf  is  an  ASCII  file which defines parameters used by Slurm's acct_gather
       related plugins.  The file location can  be  modified  at  system  build  time  using  the
       DEFAULT_SLURM_CONF  parameter  or  at execution time by setting the SLURM_CONF environment
       variable. The file will always be located in the same directory as the slurm.conf file.

       Parameter names are case insensitive.  Any text following a "#" in the configuration  file
       is  treated  as a comment through the end of that line.  The size of each line in the file
       is limited to 1024 characters.  Changes to the configuration file take effect upon restart
       of  Slurm  daemons,  daemon  receipt  of  the  SIGHUP  signal, or execution of the command
       "scontrol reconfigure" unless otherwise noted.
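
       As a minimal sketch of the syntax rules above, the fragment below shows a full-line
       comment, a trailing comment, and an alternative spelling of the same parameter name. The
       value 10 is only illustrative.

       ```
       # A full-line comment.
       EnergyIPMIFrequency=10   # text after "#" is ignored through the end of the line
       # Parameter names are case insensitive, so the line above could equally be written:
       #   energyipmifrequency=10
       ```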

       The following acct_gather.conf parameters are defined to control the general  behavior  of
       various plugins in Slurm.

       The acct_gather.conf file differs from other Slurm .conf files in that each plugin defines
       which options are available.  If the plugin for an option is not loaded, Slurm treats that
       option as unknown, which could prevent Slurm from loading.  If you change plugin types,
       you might also have to change the related options.
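
       The coupling between loaded plugins and accepted options can be sketched as follows. The
       slurm.conf plugin values shown are assumptions taken from slurm.conf(5), not from this
       page, and the directory is the one used in the EXAMPLE section; adjust both to your site.

       ```
       # slurm.conf (plugin selection; values assumed from slurm.conf(5))
       AcctGatherEnergyType=acct_gather_energy/ipmi
       AcctGatherProfileType=acct_gather_profile/hdf5

       # acct_gather.conf (only options belonging to the plugins loaded above)
       EnergyIPMIFrequency=10
       ProfileHDF5Dir=/app/slurm/profile_data
       # InfinibandOFEDPort=1   # left out: no infiniband plugin is loaded in this sketch
       ```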

       EnergyIPMI
              Options used for AcctGatherEnergyType/ipmi are as follows:

              EnergyIPMIFrequency=<number>
                        This parameter is the number of seconds between BMC access samples.

              EnergyIPMICalcAdjustment=<yes|no>
                        If set to "yes", the consumption between the last BMC access sample and a
                        step  consumption  update  is  approximated  to  get  more  accurate task
                        consumption.  The adjustment is made at the step start and each time  the
                        consumption is updated, including the step end. The approximations are
                        not accumulated; only the first and last adjustments are used to
                        calculate the consumption. The default is "no".

              EnergyIPMIPowerSensors=<key=values>
                        Optionally specify the ids of the sensors to use.  Multiple <key=values>
                        can be set with ";" separators.  The key "Node" is mandatory and is used
                        to determine the consumed energy for nodes (scontrol show node) and jobs
                        (sacct).  Other keys are optional and are named by the administrator.
                        These keys are useful only when profiling is activated for energy, to
                        store the power (in watts) of each key.  <values> are integers; multiple
                        values can be set with "," separators.  The sum of the listed sensors is
                        used for each key.  EnergyIPMIPowerSensors is optional; the default value
                        is "Node=number" where "number" is the id of the first power sensor
                        returned by ipmi-sensors.  For example:
                        EnergyIPMIPowerSensors=Node=16,19,23,26;Socket0=16,23;Socket1=19,26;SSUP=23,26;KNC=16,19
                        EnergyIPMIPowerSensors=Node=29,32;SSUP0=29;SSUP1=32
                        EnergyIPMIPowerSensors=Node=1280

              The following acct_gather.conf parameters are defined to control  the  IPMI  config
              default values for libipmiconsole.

              EnergyIPMIUsername=USERNAME
                        Specify BMC Username.

              EnergyIPMIPassword=PASSWORD
                        Specify BMC Password.
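
              Taken together, the EnergyIPMI options might be combined as in the sketch below.
              The sensor ids are copied from one of the examples above and are hardware
              specific; the account name and password are hypothetical placeholders. With this
              setting, the "Node" power is the sum of sensors 29 and 32, while SSUP0 and SSUP1
              each report a single sensor.

              ```
              EnergyIPMIFrequency=30
              EnergyIPMICalcAdjustment=yes
              EnergyIPMIPowerSensors=Node=29,32;SSUP0=29;SSUP1=32
              EnergyIPMIUsername=bmcuser      # hypothetical BMC account
              EnergyIPMIPassword=CHANGE_ME    # placeholder, replace with the real password
              ```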

       ProfileHDF5
              Options used for AcctGatherProfileType/hdf5 are as follows:

              ProfileHDF5Dir=<path>
                        This  parameter  is  the  path  to  the  shared  folder  into  which  the
                        acct_gather_profile plugin will write detailed data (usually as  an  HDF5
                        file).   The  directory  is  assumed to be on a file system shared by the
                        controller and all compute nodes. This is a required parameter.

              ProfileHDF5Default
                        A comma delimited list of  data  types  to  be  collected  for  each  job
                        submission.  Allowed values are:

                        All     All  data  types  are  collected.  (Cannot be combined with other
                                values.)

                        None    No data types are collected. This is  the  default.   (Cannot  be
                                combined with other values.)

                        Energy  Energy data is collected.

                        Filesystem
                                File system (Lustre) data is collected.

                        Network Network (InfiniBand) data is collected.

                        Task    Task (I/O, Memory, ...) data is collected.
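
              A minimal sketch of an HDF5 profile setup, assuming the shared directory from the
              EXAMPLE section and an illustrative choice of data types. Individual types are
              combined with commas, whereas All and None must appear alone.

              ```
              ProfileHDF5Dir=/app/slurm/profile_data
              ProfileHDF5Default=Energy,Task
              ```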

       ProfileInfluxDB
              Options used for AcctGatherProfileType/influxdb are as follows:

              ProfileInfluxDBDatabase
                        InfluxDB database name where profiling information is to be written.

              ProfileInfluxDBDefault
                        A  comma  delimited  list  of  data  types  to  be collected for each job
                        submission.  Allowed values are:

                        All     All data types are collected.  (Cannot  be  combined  with  other
                                values.)

                        None    No  data  types  are  collected. This is the default.  (Cannot be
                                combined with other values.)

                        Energy  Energy data is collected.

                        Filesystem
                                File system (Lustre) data is collected.

                        Network Network (InfiniBand) data is collected.

                        Task    Task (I/O, Memory, ...) data is collected.

              ProfileInfluxDBHost=<hostname>:<port>
                        The hostname of the machine where the influxd instance  is  executed  and
                        the  port  used by the HTTP API. The port used by the HTTP API is the one
                        configured through the bind-address influxdb.conf option  in  the  [http]
                        section. Example:

                        ProfileInfluxDBHost=myinfluxhost:8086

              ProfileInfluxDBPass
                        Optional password for username configured in ProfileInfluxDBUser.

              ProfileInfluxDBRTPolicy
                        The  InfluxDB  retention  policy  name  for  the  database  configured in
                        ProfileInfluxDBDatabase option.

              ProfileInfluxDBUser
                        Optional InfluxDB username that should be used to gain access to the
                        database configured in ProfileInfluxDBDatabase. This is only needed if
                        InfluxDB is configured with authentication enabled in the [http] config
                        section and a user has been granted at least WRITE access to the
                        database. See also ProfileInfluxDBPass.
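
              The ProfileInfluxDB options might be combined as in the sketch below. The host
              line is the example given above; the database name, user name, and password are
              hypothetical, and "autogen" is only assumed here as the retention policy name, so
              use whatever policy actually exists for your database.

              ```
              ProfileInfluxDBHost=myinfluxhost:8086
              ProfileInfluxDBDatabase=slurm_profile   # hypothetical database name
              ProfileInfluxDBRTPolicy=autogen         # assumed retention policy name
              ProfileInfluxDBDefault=Task
              ProfileInfluxDBUser=slurm               # hypothetical; only with authentication
              ProfileInfluxDBPass=CHANGE_ME           # placeholder
              ```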

       NOTE:  This plugin requires the libcurl development files to be installed.

       NOTE:  Information on  how  to  install  and  configure  InfluxDB  and  manage  databases,
              retention policies and such is available on the official webpage.

       NOTE:  Collected  information  is  written from every compute node where a job runs to the
              influxd  instance  listening  on  the  ProfileInfluxDBHost.  In  order   to   avoid
              overloading the influxd instance with incoming connection requests, the plugin uses
              an internal buffer which is filled with samples. Once the buffer is full, an HTTP
              API  write  request  is  performed  and  the  buffer  is emptied to hold subsequent
              samples. A final request is also performed when a task  ends  even  if  the  buffer
              isn't full.

       NOTE:  Failed  HTTP  API  write  requests are discarded. This means that collected profile
              information in the plugin buffer is lost if it can't  be  written  to  the  influxd
              database for any reason.

       NOTE:  Plugin  messages  are  logged  along  with the slurmstepd logs to SlurmdLogFile. In
              order to troubleshoot any issues, it is recommended  to  temporarily  increase  the
              slurmd  debug  level  to  debug3  and  add  Profile to the debug flags. This can be
              accomplished by setting the slurm.conf SlurmdDebug and DebugFlags  respectively  or
              dynamically through scontrol setdebug and setdebugflags.
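
              For the static variant, a slurm.conf fragment along these lines would apply the
              settings named in the note above. It is intended as a temporary troubleshooting
              sketch, since debug3 logging is verbose.

              ```
              # slurm.conf (temporary, for troubleshooting the profile plugin)
              SlurmdDebug=debug3
              DebugFlags=Profile
              ```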

       NOTE:  Consider using a monitoring and analytics tool such as Grafana on top of InfluxDB.
              Such tools let you create dashboards, tables, and other graphics from the stored
              time series, which makes it easier to correlate resource usage peaks reported by
              other node monitoring tools, such as Ganglia, with specific job step tasks.

       InfinibandOFED
              Options used for AcctGatherInfinibandType/ofed are as follows:

              InfinibandOFEDPort=<number>
                        This parameter is the port number of the local InfiniBand card to be
                        monitored.  The default port is 1.

EXAMPLE

       ###
       # Slurm acct_gather configuration file
       ###
       # Parameters for AcctGatherEnergyType/ipmi plugin
       EnergyIPMIFrequency=10
       EnergyIPMICalcAdjustment=yes
       #
       # Parameters for AcctGatherProfileType/hdf5 plugin
       ProfileHDF5Dir=/app/slurm/profile_data
       # Parameters for AcctGatherInfinibandType/ofed plugin
       InfinibandOFEDPort=1

COPYING

       Copyright (C) 2012-2013 Bull.  Produced at Bull (cf, DISCLAIMER).

       This  file  is  part  of  Slurm,  a  resource  management  program.   For   details,   see
       <https://slurm.schedmd.com/>.

       Slurm  is  free  software; you can redistribute it and/or modify it under the terms of the
       GNU General Public License as published by the Free Software Foundation; either version  2
       of the License, or (at your option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
       even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
       GNU General Public License for more details.

SEE ALSO

       slurm.conf(5)