Ubuntu Manpage: slurmdbd.conf - Slurm Database Daemon (SlurmDBD) configuration file

NAME

       slurmdbd.conf - Slurm Database Daemon (SlurmDBD) configuration file

DESCRIPTION

slurmdbd.conf is an ASCII file which describes Slurm Database Daemon (SlurmDBD)
configuration information. The file will always be located in the same directory as the
slurm.conf.

The contents of the file are case insensitive except for the names of nodes and files. Any
text following a "#" in the configuration file is treated as a comment through the end of
that line. Changes to the configuration file take effect upon restart of SlurmDBD or
daemon receipt of the SIGHUP signal unless otherwise noted.
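
The parsing rules above can be sketched in a few lines of Python. This is only an illustration, not the parser SlurmDBD itself uses: the function name parse_conf and the sample text are hypothetical, and the sketch assumes one "Name=Value" pair per line.

```python
def parse_conf(text):
    """Parse slurmdbd.conf-style text into a dict keyed by lower-cased name.

    Hypothetical helper: assumes one Name=Value pair per line.
    """
    params = {}
    for raw in text.splitlines():
        # Everything after "#" is a comment through the end of the line.
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Parameter names are case insensitive, so normalise them.
        # Values are kept verbatim because file names are case sensitive.
        params[key.strip().lower()] = value.strip()
    return params


sample = """
# Sample fragment (made up for this sketch)
DbdHost=db_host      # trailing comment
LOGFILE=/var/log/slurmdbd.log
debuglevel=info
"""
conf = parse_conf(sample)
```

Looking up conf["logfile"] here returns the path with its case preserved, while the differently-cased parameter names all resolve to lower-case keys.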

This file should exist only on the computer where SlurmDBD executes and should be
readable only by the user who executes SlurmDBD (e.g. "slurm"). If the slurmdbd daemon is
started as user root and changes to another user ID, the configuration file will initially
be read as user root, but will be read as the other user ID in response to a SIGHUP
signal. This file should be protected from unauthorized access since it contains a
database password. The overall configuration parameters available include:

AllowNoDefAcct
Remove requirement for users to have a default account. Boolean, yes to turn on,
no (default) to enforce default accounts.

ArchiveDir
If ArchiveScript is not set, the slurmdbd will generate a file that can be read in
at any time with sacctmgr load filename. This directory is where the file will be
placed after a purge event has happened and archiving for that element is set to
true. Default is /tmp. The format for this file's name is
$ArchiveDir/$ClusterName_$ArchiveObject_archive_$BeginTimeStamp_$endTimeStamp
Archive files are limited to 50000 records per file. If more than 50000 records exist
during that time period, the remainder will be written to a new file. Subsequent archive
files during the same time period will have ".<number>" appended to the file name, for
example .2, with the number increasing by one for each file in the same time
period.
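
As a rough illustration of the naming scheme, the following Python sketch lists the file names that would result for a given record count. It is not taken from the Slurm source: the function name and the timestamp strings are placeholders, and it assumes that the first file carries no suffix and that numbering of the additional files starts at .2, as the example above suggests.

```python
RECORDS_PER_FILE = 50000  # per-file limit stated in the text above


def archive_file_names(archive_dir, cluster, obj, begin, end, n_records):
    """Return the archive file names needed to hold n_records records.

    Hypothetical helper. Assumption: the first file has no suffix and the
    additional files are numbered starting at ".2".
    """
    if n_records <= 0:
        return []
    base = f"{archive_dir}/{cluster}_{obj}_archive_{begin}_{end}"
    # Ceiling division: number of files of at most RECORDS_PER_FILE records.
    n_files = -(-n_records // RECORDS_PER_FILE)
    return [base] + [f"{base}.{i}" for i in range(2, n_files + 1)]


# "BEGIN" and "END" stand in for timestamps whose format is not given here.
names = archive_file_names("/tmp", "cluster1", "job", "BEGIN", "END", 120000)
```

With 120000 records the sketch yields three names, since two full files of 50000 records leave 20000 records for a third.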

ArchiveEvents
When purging events also archive them. Boolean, yes to archive event data, no
otherwise. Default is no.

ArchiveJobs
When purging jobs also archive them. Boolean, yes to archive job data, no
otherwise. Default is no.

ArchiveResvs
When purging reservations also archive them. Boolean, yes to archive reservation
data, no otherwise. Default is no.

ArchiveScript
This script can be executed every time a rollup happens (every hour, day and
month), depending on the Purge*After options. This script is used to transfer
accounting records out of the database into an archive. It is used in place of the
internal process used to archive objects. The script is executed with no
arguments, and the following environment variables are set.

SLURM_ARCHIVE_EVENTS
1 to archive events, 0 otherwise.

SLURM_ARCHIVE_LAST_EVENT
Time of last event start to archive.

SLURM_ARCHIVE_JOBS
1 to archive jobs, 0 otherwise.

SLURM_ARCHIVE_LAST_JOB
Time of last job submit to archive.

SLURM_ARCHIVE_STEPS
1 to archive steps, 0 otherwise.

SLURM_ARCHIVE_LAST_STEP
Time of last step start to archive.

SLURM_ARCHIVE_SUSPEND
1 to archive suspend data, 0 otherwise.

SLURM_ARCHIVE_LAST_SUSPEND
Time of last suspend start to archive.

SLURM_ARCHIVE_TXN
1 to archive transaction data, 0 otherwise.

SLURM_ARCHIVE_USAGE
1 to archive usage data, 0 otherwise.
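
An archive script could begin by working out which record types it has been asked to archive. The Python sketch below does only that step, using the variables listed above. The function name archive_plan and the demo values are hypothetical, and the format of the time values is not specified here, so they are passed through as opaque strings.

```python
import os

# (flag variable, companion "last time" variable or None), as listed above.
ARCHIVE_VARS = [
    ("SLURM_ARCHIVE_EVENTS", "SLURM_ARCHIVE_LAST_EVENT"),
    ("SLURM_ARCHIVE_JOBS", "SLURM_ARCHIVE_LAST_JOB"),
    ("SLURM_ARCHIVE_STEPS", "SLURM_ARCHIVE_LAST_STEP"),
    ("SLURM_ARCHIVE_SUSPEND", "SLURM_ARCHIVE_LAST_SUSPEND"),
    ("SLURM_ARCHIVE_TXN", None),
    ("SLURM_ARCHIVE_USAGE", None),
]


def archive_plan(env=None):
    """Map each requested flag variable to its "last time" value (or None).

    Hypothetical helper: reads os.environ unless a dict is supplied.
    """
    env = os.environ if env is None else env
    plan = {}
    for flag, last in ARCHIVE_VARS:
        if env.get(flag) == "1":
            # Time values are kept as opaque strings in this sketch.
            plan[flag] = env.get(last) if last else None
    return plan


# Demo environment with made-up values; a real script would call
# archive_plan() with no argument to read its own environment.
demo_env = {
    "SLURM_ARCHIVE_EVENTS": "0",
    "SLURM_ARCHIVE_JOBS": "1",
    "SLURM_ARCHIVE_LAST_JOB": "PLACEHOLDER_TIME",
    "SLURM_ARCHIVE_TXN": "1",
}
plan = archive_plan(demo_env)
```

The script would then export the selected record types from the database by whatever means the site prefers; that part is deliberately left out.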

ArchiveSteps
When purging steps also archive them. Boolean, yes to archive step data, no
otherwise. Default is no.

ArchiveSuspend
When purging suspend data also archive it. Boolean, yes to archive suspend data,
no otherwise. Default is no.

ArchiveTXN
When purging transaction data also archive it. Boolean, yes to archive transaction
data, no otherwise. Default is no.

ArchiveUsage
When purging usage data (Cluster, Association and WCKey) also archive it. Boolean,
yes to archive usage data, no otherwise. Default is no.

AuthInfo
Additional information to be used for authentication of communications with the
Slurm control daemon (slurmctld) on each cluster. The interpretation of this
option is specific to the configured AuthType. In the case of auth/munge, this can
be configured to use a Munge daemon specifically configured to provide
authentication between clusters while the default Munge daemon provides
authentication within a cluster. In that case, this will specify the pathname of
the socket to use. By default this value is left unspecified, which results in the
default authentication mechanism being used.

AuthAltTypes
Comma separated list of alternative authentication plugins that the slurmdbd will
permit for communication.

AuthAltParameters
Used to define options for the alternative authentication plugins. Multiple options
may be comma separated.

jwks=
Absolute path to JWKS file. Only RS256 keys are supported, although other
key types may be listed in the file. If set, no HS256 key will be loaded by
default (and token generation is disabled), although the jwt_key setting may
be used to explicitly re-enable HS256 key use (and token generation).

jwt_key=
Absolute path to JWT key file. Key must be HS256, and should only be
accessible by SlurmUser.

AuthType
Define the authentication method for communications between Slurm components.
Acceptable values at present include "auth/munge", which is the default.
"auth/munge" indicates that LLNL's MUNGE system is to be used (this is the
supported authentication mechanism for Slurm; see "https://dun.github.io/munge/"
for more information). SlurmDBD must be terminated prior to changing the value of
AuthType and later restarted.

CommitDelay
Number of seconds between commits on a connection from a slurmctld. This speeds up
inserts into the database dramatically. If you are running a very high throughput
of jobs you should consider setting this. In testing, 1 second improves the
slurmdbd performance dramatically and reduces overhead. The trade-off is a small
risk of data loss: if the slurmdbd seg faults or exits abnormally for any reason
during the delay window, the data not yet committed could be lost. This situation
should be very rare, and setting CommitDelay may be the only way to run in extremely
heavy environments.

CommunicationParameters
Comma separated options identifying communication options.

DisableIPv4
Disable IPv4 only operation for the slurmdbd. This should also be
set in your slurm.conf file.

EnableIPv6
Enable using IPv6 addresses for the slurmdbd. When using both IPv4
and IPv6, address family preferences will be based on your
/etc/gai.conf file. This should also be set in your slurm.conf file.

keepaliveinterval=#
Specifies the interval between keepalive probes on the socket
communications between the backup and primary slurmdbd. The default
value is 30 seconds.

keepaliveprobes=#
Specifies the number of keepalive probes sent on the socket
communications between the backup and primary slurmdbd. The default
value is 3.

keepalivetime=#
Specifies how long to wait before sending keepalive probes between
the primary and backup slurmdbd processes. The default value is 30
seconds.

DbdBackupHost
The short, or long, name of the machine where the backup Slurm Database Daemon is
executed (i.e. the name returned by the command "hostname -s"). This host must
have access to the same underlying database specified by the 'Storage' options
mentioned below.

DbdAddr
Name by which DbdHost should be referred to when establishing a communications path.
This name will be used as an argument to the getaddrinfo() function for identification.
For example, "elx0000" might be used to designate the Ethernet address for node
"lx0000". By default the DbdAddr will be identical in value to DbdHost.

DbdHost
The short, or long, name of the machine where the Slurm Database Daemon is executed
(i.e. the name returned by the command "hostname -s"). This value must be
specified.

DbdPort
The port number on which the Slurm Database Daemon (slurmdbd) listens for work. The
default value is SLURMDBD_PORT as established at system build time. If no value is
explicitly specified, it will be set to 6819. This value must be equal to the
AccountingStoragePort parameter in the slurm.conf file.
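
The requirement that the two ports agree can be checked mechanically. The Python sketch below is one hedged way to do so: the helper names are hypothetical, the lookup assumes one "Name=Value" pair per line, and an unset AccountingStoragePort is reported as a mismatch rather than guessing at the slurm.conf default.

```python
DEFAULT_DBD_PORT = 6819  # value used when DbdPort is not specified


def get_param(text, name):
    """Return the value of a parameter (case-insensitive name) or None.

    Hypothetical helper: assumes one Name=Value pair per line.
    """
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if "=" in line:
            key, value = line.split("=", 1)
            if key.strip().lower() == name.lower():
                return value.strip()
    return None


def ports_match(slurmdbd_conf_text, slurm_conf_text):
    """True if DbdPort equals AccountingStoragePort.

    Assumption of this sketch: a missing AccountingStoragePort counts as
    a mismatch, because its default is not described in this page.
    """
    dbd = get_param(slurmdbd_conf_text, "DbdPort")
    acct = get_param(slurm_conf_text, "AccountingStoragePort")
    if acct is None:
        return False
    dbd_port = int(dbd) if dbd is not None else DEFAULT_DBD_PORT
    return dbd_port == int(acct)


# Made-up file contents for demonstration.
same = ports_match("DbdHost=db_host\nDbdPort=7031  # custom\n",
                   "AccountingStoragePort=7031\n")
defaulted = ports_match("DbdHost=db_host\n", "accountingstorageport=6819\n")
differ = ports_match("DbdPort=7031\n", "AccountingStoragePort=6819\n")
```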

DebugFlags
Defines specific subsystems which should provide more detailed event logging.
Multiple subsystems can be specified with comma separators. Most DebugFlags will
result in verbose logging for the identified subsystems and could impact
performance. Valid subsystems available today (with more to come) include:

DB_ARCHIVE
SQL statements/queries when dealing with archiving and purging the database.

DB_ASSOC
SQL statements/queries when dealing with associations in the database.

DB_EVENT
SQL statements/queries when dealing with (node) events in the database.

DB_JOB
SQL statements/queries when dealing with jobs in the database.

DB_QOS
SQL statements/queries when dealing with QOS in the database.

DB_QUERY
SQL statements/queries when dealing with transactions and such in the
database.

DB_RESERVATION
SQL statements/queries when dealing with reservations in the database.

DB_RESOURCE
SQL statements/queries when dealing with resources like licenses in the
database.

DB_STEP
SQL statements/queries when dealing with steps in the database.

DB_TRES
SQL statements/queries when dealing with trackable resources in the
database.

DB_USAGE
SQL statements/queries when dealing with usage queries and inserts in the
database.

DB_WCKEY
SQL statements/queries when dealing with wckeys in the database.

FEDERATION
SQL statements/queries when dealing with federations in the database.

DebugLevel
The level of detail to provide the Slurm Database Daemon's logs. The default value
is info.

quiet    Log nothing
fatal    Log only fatal errors
error    Log only errors
info     Log errors and general informational messages
verbose  Log errors and verbose informational messages
debug    Log errors and verbose informational messages and debugging messages
debug2   Log errors and verbose informational messages and more debugging messages
debug3   Log errors and verbose informational messages and even more debugging messages
debug4   Log errors and verbose informational messages and even more debugging messages
debug5   Log errors and verbose informational messages and even more debugging messages

DebugLevelSyslog
The slurmdbd daemon will log events to the syslog file at the specified level of
detail. If not set, the slurmdbd daemon will log to syslog at level fatal, with two
exceptions. If there is no LogFile and it is running in the background, it will log
to syslog at the level specified by DebugLevel (or at fatal if DebugLevel is set to
quiet). If it is run in the foreground, the syslog level will be set to quiet.

quiet    Log nothing
fatal    Log only fatal errors
error    Log only errors
info     Log errors and general informational messages
verbose  Log errors and verbose informational messages
debug    Log errors and verbose informational messages and debugging messages
debug2   Log errors and verbose informational messages and more debugging messages
debug3   Log errors and verbose informational messages and even more debugging messages
debug4   Log errors and verbose informational messages and even more debugging messages
debug5   Log errors and verbose informational messages and even more debugging messages

NOTE: By default, Slurm's systemd service files start daemons in the foreground
with the -D option. This means that systemd will capture stdout/stderr output and
print that to syslog, independent of Slurm printing to syslog directly. To prevent
systemd from doing this, add "StandardOutput=null" and "StandardError=null" to the
respective service files or override files.
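
The rules for the syslog level when DebugLevelSyslog is not set can be restated as a small decision function. The Python sketch below is only a reading of the description above, not code from slurmdbd. The function and argument names are hypothetical, and it assumes that an explicitly set DebugLevelSyslog always takes effect.

```python
def effective_syslog_level(debug_level_syslog=None, debug_level="info",
                           log_file=None, foreground=False):
    """Return the syslog level implied by the DebugLevelSyslog description.

    Hypothetical helper; None means "parameter not set".
    """
    if debug_level_syslog is not None:
        # Assumption: an explicit setting is used as given.
        return debug_level_syslog
    if foreground:
        # Run in the foreground: syslog level is quiet.
        return "quiet"
    if log_file is None:
        # Background with no LogFile: follow DebugLevel, but never
        # drop below fatal.
        return "fatal" if debug_level == "quiet" else debug_level
    # Background with a LogFile: only fatal errors go to syslog.
    return "fatal"


with_logfile = effective_syslog_level(log_file="/var/log/slurmdbd.log")
no_logfile = effective_syslog_level(debug_level="verbose")
no_logfile_quiet = effective_syslog_level(debug_level="quiet")
in_foreground = effective_syslog_level(foreground=True)
```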

DefaultQOS
When adding a new cluster, this will be used as the QOS for the cluster unless
the admin explicitly sets one when creating the cluster.

LogFile
Fully qualified pathname of a file into which the Slurm Database Daemon's logs are
written. The default value is none (performs logging via syslog).
See the section LOGGING in the slurm.conf man page if a pathname is specified.

LogTimeFormat
Format of the timestamp in slurmdbd log files. Accepted values are "iso8601",
"iso8601_ms", "rfc5424", "rfc5424_ms", "clock", "short", and "thread_id". The values
ending in "_ms" differ from the ones without in that fractional seconds with
millisecond precision are printed. The default value is "iso8601_ms". The "rfc5424"
formats are the same as the "iso8601" formats except that the timezone value is also
shown. The "clock" format shows a timestamp in microseconds retrieved with the C
standard clock() function. The "short" format is a short date and time format. The
"thread_id" format shows the timestamp in the C standard ctime() function form
without the year but including the microseconds, the daemon's process ID and the
current thread ID.

MaxQueryTimeRange
Return an error if a query covers too large a time span, to prevent
ill-formed queries from causing performance problems within SlurmDBD. Default
value is INFINITE which allows any queries to proceed. Accepted time formats are
the same as the MaxTime option in slurm.conf. Operator and higher privileged users
are exempt from this restriction. Note that queries which attempt to return over
3GB of data will still fail to complete with ESLURM_RESULT_TOO_LARGE.
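
To make the check concrete, the Python sketch below converts a MaxTime-style value to seconds and compares it with the span of a query. It relies on the time formats documented for MaxTime in slurm.conf (minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes, days-hours:minutes:seconds, or INFINITE/UNLIMITED). The function names are hypothetical, and treating a span exactly equal to the limit as allowed is an assumption of this sketch.

```python
def parse_time_limit(value):
    """Convert a MaxTime-style string to seconds; None means no limit."""
    text = value.strip()
    if text.upper() in ("INFINITE", "UNLIMITED"):
        return None
    days = 0
    if "-" in text:
        # days-hours[:minutes[:seconds]]
        day_part, text = text.split("-", 1)
        days = int(day_part)
        fields = [int(f) for f in text.split(":")]
        if len(fields) > 3:
            raise ValueError(f"bad time value: {value!r}")
        hours, minutes, seconds = fields + [0] * (3 - len(fields))
    else:
        fields = [int(f) for f in text.split(":")]
        if len(fields) == 1:      # minutes
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:    # minutes:seconds
            hours, minutes, seconds = 0, fields[0], fields[1]
        elif len(fields) == 3:    # hours:minutes:seconds
            hours, minutes, seconds = fields
        else:
            raise ValueError(f"bad time value: {value!r}")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def query_allowed(span_seconds, max_range, is_operator=False):
    """True if a query spanning span_seconds passes MaxQueryTimeRange."""
    if is_operator:
        # Operator and higher privileged users are exempt.
        return True
    limit = parse_time_limit(max_range)
    # Assumption: a span equal to the limit is still allowed.
    return limit is None or span_seconds <= limit


week = 7 * 24 * 3600
user_ok = query_allowed(week, "1-0")
operator_ok = query_allowed(week, "1-0", is_operator=True)
```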

MessageTimeout
Time permitted for a round-trip communication to complete in seconds. Default value
is 10 seconds.

Parameters
Contains arbitrary comma separated parameters used to alter the behavior of the
slurmdbd.

PreserveCaseUser
When defining users, do not force names to lower case (forcing lower case is the
default behavior).

PidFile
Fully qualified pathname of a file into which the Slurm Database Daemon may write
its process ID. This may be used for automated signal processing. The default
value is "/var/run/slurmdbd.pid".

PluginDir
Identifies the places in which to look for Slurm plugins. This is a
colon-separated list of directories, like the PATH environment variable. The
default value is the prefix given at configure time + "/lib/slurm".

PrivateData
This controls what type of information is hidden from regular users. By default,
all information is visible to all users. User SlurmUser, root, and users with
AdminLevel=Admin can always view all information. Multiple values may be specified
with a comma separator. Acceptable values include:

accounts
prevents users from viewing any account definitions unless they are
coordinators of them.

events prevents users from viewing event information unless they have operator
status or above.

jobs prevents users from viewing job records belonging to other users unless they
are coordinators of the account running the job when using sacct.

reservations
restricts getting reservation information to users with operator status and
above.

usage prevents users from viewing usage of any other user. This applies to
sreport.

users prevents users from viewing information of any user other than themselves.
This also makes it so users can only see associations they deal with.
Coordinators can see associations of all users in the account they are
coordinator of, but can only see themselves when listing users.
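
A site tool that reads PrivateData might validate the value against the list above before acting on it. The following Python sketch shows one way to do that. The function name is hypothetical, and accepting values in any letter case follows the general statement that file contents are case insensitive.

```python
VALID_PRIVATE_DATA = {
    "accounts", "events", "jobs", "reservations", "usage", "users",
}


def parse_private_data(value):
    """Split a PrivateData value on commas and validate each entry.

    Hypothetical helper; raises ValueError on an unknown entry.
    """
    items = [item.strip().lower() for item in value.split(",") if item.strip()]
    unknown = [item for item in items if item not in VALID_PRIVATE_DATA]
    if unknown:
        raise ValueError(f"unknown PrivateData value(s): {unknown}")
    return set(items)


hidden = parse_private_data("jobs,Usage, users")
```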

PurgeEventAfter
Events happening on the cluster over this age are purged from the database. This
includes node down times and such. The time is a numeric value and is a number of
months. To purge more often, append "hours" or "days" to the numeric value
(e.g. a value of "12hours" would purge everything older than 12 hours). The purge
takes place at the start of each purge interval. For example, if the purge time is
2 months, the purge
would happen at the beginning of each month. If not set (default), then event
records are never purged.

PurgeJobAfter
Individual job records over this age are purged from the database. Aggregated
information will be preserved to "PurgeUsageAfter". The time is a numeric value
and is a number of months. To purge more often, append "hours" or "days" to the
numeric value (e.g. a value of "12hours" would purge everything older than 12
hours). The purge takes place at the start of each purge interval. For example, if
the purge time is 2
months, the purge would happen at the beginning of each month. If not set
(default), then job records are never purged.

PurgeResvAfter
Individual reservation records over this age are purged from the database.
Aggregated information will be preserved to "PurgeUsageAfter". The time is a
numeric value and is a number of months. To purge more often, append "hours" or
"days" to the numeric value (e.g. a value of "12hours" would purge everything
older than 12 hours). The purge takes place at the start of each purge interval.
For example, if the
purge time is 2 months, the purge would happen at the beginning of each month. If
not set (default), then reservation records are never purged.

PurgeStepAfter
Individual job step records over this age are purged from the database. Aggregated
information will be preserved to "PurgeUsageAfter". The time is a numeric value
and is a number of months. To purge more often, append "hours" or "days" to the
numeric value (e.g. a value of "12hours" would purge everything older than 12
hours). The purge takes place at the start of each purge interval. For example, if
the purge time is 2
months, the purge would happen at the beginning of each month. If not set
(default), then job step records are never purged.

PurgeSuspendAfter
Records of individual suspend times for jobs over this age are purged from the
database. Aggregated information will be preserved to "PurgeUsageAfter". The time
is a numeric value and is a number of months. To purge more often, append "hours"
or "days" to the numeric value (e.g. a value of "12hours" would purge everything
older than 12 hours). The purge takes place at the start of each purge interval.
For example, if the
purge time is 2 months, the purge would happen at the beginning of each month. If
not set (default), then suspend records are never purged.

PurgeTXNAfter
Records of individual transaction times for transactions over this age are purged
from the database. The time is a numeric value and is a number of months. To purge
more often, append "hours" or "days" to the numeric value (e.g. a value of
"12hours" would purge everything older than 12 hours). The purge takes place at the
start of each
purge interval. For example, if the purge time is 2 months, the purge would happen
at the beginning of each month. If not set (default), then transaction records are
never purged.

PurgeUsageAfter
Usage Records (Cluster, Association and WCKey) over this age are purged from the
database. The time is a numeric value and is a number of months. To purge more
often, append "hours" or "days" to the numeric value (e.g. a value of "12hours"
would purge everything older than 12 hours). The purge takes place at the start of
each purge interval.
For example, if the purge time is 2 months, the purge would happen at the beginning
of each month. If not set (default), then usage records are never purged.
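
The value syntax shared by the Purge*After parameters can be illustrated with a small parser. The Python sketch below is not the parser used by slurmdbd. The function name is hypothetical, the "month" suffix is accepted because the EXAMPLE section of this page uses it, and accepting both singular and plural unit spellings is an assumption.

```python
import re

# A number, optionally followed by a unit. Assumption: singular and
# plural spellings are both accepted.
_PURGE_RE = re.compile(r"^(\d+)(hours?|days?|months?)?$", re.IGNORECASE)


def parse_purge_after(value):
    """Return (number, unit) with unit in {"hour", "day", "month"}.

    Hypothetical helper. A bare number is a number of months.
    """
    match = _PURGE_RE.match(value.strip())
    if match is None:
        raise ValueError(f"bad purge value: {value!r}")
    number = int(match.group(1))
    unit = (match.group(2) or "month").lower().rstrip("s")
    return number, unit


twelve_hours = parse_purge_after("12hours")
two_months = parse_purge_after("2")
from_example = parse_purge_after("24month")
```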

SlurmUser
The name of the user that the slurmdbd daemon executes as. This user should match
the SlurmUser used for all instances of slurmctld that report to slurmdbd. It must
exist on the machine executing the Slurm Database Daemon and have the same UID as
on the hosts on which slurmctld executes. For security purposes, a user other than
"root" is recommended. The default value is "root".

NOTE: If the SlurmUser for slurmctld is root you can still use a non-root SlurmUser
for slurmdbd (in any other case, both SlurmUsers should match) by explicitly
setting the user's AdminLevel to Admin. After adding a user in this way, you must
restart slurmctld.

StorageHost
Define the name of the host where the database that stores the data is running.
Ideally this should be the host on which slurmdbd executes.

StorageBackupHost
Define the name of the backup host on which the database that stores the data is
running. This can be viewed as a backup solution when the StorageHost is
not responding. It is up to the backup solution to enforce the coherency of the
accounting information between the two hosts. With clustered database solutions
(active/passive HA), you would not need to use this feature. Default is none.

StorageLoc
Specify the name of the database as the location where accounting records are
written. Defaults to "slurm_acct_db".

StorageParameters
Comma separated list of key-value pair parameters. Currently supported values
include options to establish a secure connection to the database:

SSL_CERT
The path name of the client public key certificate file.

SSL_CA
The path name of the Certificate Authority (CA) certificate file.

SSL_CAPATH
The path name of the directory that contains trusted SSL CA certificate files.

SSL_KEY
The path name of the client private key file.

SSL_CIPHER
The list of permissible ciphers for SSL encryption.

StoragePass
Define the password used to gain access to the database to store the job accounting
data. The '#' character is not permitted in a password.

StoragePort
The port number that the Slurm Database Daemon (slurmdbd) uses to communicate with
the database. Default is 3306.

StorageType
Define the accounting storage mechanism type. Acceptable values at present include
"accounting_storage/mysql". The value "accounting_storage/mysql" indicates that
accounting records should be written to a MySQL or MariaDB database specified by
the StorageLoc parameter. This value must be specified.

StorageUser
Define the name of the user with which to connect to the database to store
the job accounting data.

TCPTimeout
Time permitted for TCP connection to be established. Default value is 2 seconds.

TrackSlurmctldDown
Boolean yes or no. If set, the slurmdbd will mark all idle resources on the cluster
as down when a slurmctld disconnects or is no longer reachable. The default is no.

TrackWCKey
Boolean yes or no. Enables display and tracking of the Workload Characterization
Key. Must be set to track wckey usage and to generate rolled up
usage tables from WCKeys. NOTE: If TrackWCKey is set here and not in your various
slurm.conf files, all jobs will be attributed to their default WCKey.

EXAMPLE

       #
       # Sample /etc/slurmdbd.conf
       #
       ArchiveEvents=yes
       ArchiveJobs=yes
       ArchiveResvs=yes
       ArchiveSteps=no
       ArchiveSuspend=no
       ArchiveTXN=no
       ArchiveUsage=no
       #ArchiveScript=/usr/sbin/slurm.dbd.archive
       AuthInfo=/var/run/munge/munge.socket.2
       AuthType=auth/munge
       DbdHost=db_host
       DebugLevel=info
       PurgeEventAfter=1month
       PurgeJobAfter=12month
       PurgeResvAfter=1month
       PurgeStepAfter=1month
       PurgeSuspendAfter=1month
       PurgeTXNAfter=12month
       PurgeUsageAfter=24month
       LogFile=/var/log/slurmdbd.log
       PidFile=/var/run/slurmdbd.pid
       SlurmUser=slurm_mgr
       StoragePass=password_to_database
       StorageType=accounting_storage/mysql
       StorageUser=database_mgr

COPYING

       Copyright (C) 2008-2010 Lawrence Livermore National Security. Produced at Lawrence
       Livermore National Laboratory (cf, DISCLAIMER).
       Copyright (C) 2010-2022 SchedMD LLC.

       This file is part of Slurm, a resource management program. For details, see
       <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it under the terms of the
       GNU General Public License as published by the Free Software Foundation; either version 2
       of the License, or (at your option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
       even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
       GNU General Public License for more details.

FILES

       /etc/slurmdbd.conf
