oracular (8) slurmd.8.gz

Provided by: slurmd_24.05.2-1_amd64 bug

NAME

       slurmd - The compute node daemon for Slurm.

SYNOPSIS

       slurmd [OPTIONS...]

DESCRIPTION

       slurmd  is  the compute node daemon of Slurm. It monitors all tasks running on the compute
       node , accepts work (tasks), launches tasks, and kills running tasks upon request.

OPTIONS

       --authinfo
              Used with configless to set an alternate AuthInfo parameter to be used to establish
              communication  with  slurmctld  before  the  configuration file has been retrieved.
              (E.g., to specify an alternate MUNGE socket location.)

       -b     Report node rebooted when daemon restarted. Used for testing purposes.

       -c     Clear system locks as needed. This may be required if slurmd terminated abnormally.

       -C     Print the actual hardware configuration (not the configuration from the  slurm.conf
              file) and exit.  The format of output is the same as used in slurm.conf to describe
              a node's configuration plus its uptime.

       --conf <node parameters>
              Used in conjunction with the -Z option.  Used  to  override  or  define  additional
              parameters  of  a  dynamic node using the same syntax and parameters used to define
              nodes  in  the  slurm.conf.  Specifying  any  of  CPUs,  Boards,   SocketsPerBoard,
              CoresPerSocket  or  ThreadsPerCore  will  override  the  defaults defined by the -C
              option. NodeName and Port are not supported.

              For example if slurmd -C reports
              NodeName=node1 CPUs=16 Boards=1 SocketsPerBoard=1 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=31848

              the  following  --conf  specifications  will  generate   the   corresponding   node
              definitions:
              --conf "Gres=gpu:2"
              NodeName=node1 CPUs=16 Boards=1 SocketsPerBoard=1 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=31848 Gres=gpu:2

              --conf "RealMemory=30000"
              NodeName=node1 CPUs=16 Boards=1 SocketsPerBoard=1 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=30000

              --conf "CPUs=16"
              NodeName=node1 CPUs=16 RealMemory=331848

              --conf "CPUs=16 RealMemory=30000 Gres=gpu:2"
              NodeName=node1 CPUs=16 RealMemory=30000 Gres=gpu:2"

       --conf-server <host>[:<port>]
              Comma-separated  list of controllers, the first being the primary slurmctld. A port
              can (optionally) be specified for each controller. These hosts are where the slurmd
              will fetch the configuration from when running in "configless" mode.

       -d <file>
              Specify  the  fully  qualified  pathname  to  the slurmstepd program to be used for
              shepherding user job steps. This can be useful for testing purposes.

       -D     Run slurmd in the foreground. Error and debug messages will be copied to stderr.

       --extra <arbitrary string>
              Set  "extra"  data  on   node   startup.   If   this   is   a   json   string   and
              SchedulerParameters=extra_constraints  is  set in slurm.conf, then jobs may use the
              --extra option to filter based on this "extra" data.

       -f <file>
              Read configuration from the specified file. See NOTES below.

       -F[feature]
              Start this node as a Dynamic Future node. It will try to match  a  node  definition
              with  a  state  of FUTURE, optionally using the specified feature to match the node
              definition.

       -G     Print Generic RESource (GRES) configuration (based upon slurm.conf GRES merged with
              gres.conf contents for this node) and exit.

       -h     Help; print a brief summary of command options.

       --instance-id <cloud instance id>
              Set cloud instance ID on node startup.

       --instance-type <cloud instance type>
              Set cloud instance type on node startup.

       -L <file>
              Write log messages to the specified file.

       -M     Lock  slurmd  pages  into system memory using mlockall (2) to disable paging of the
              slurmd process. This may help in cases where nodes are marked DOWN  during  periods
              of  heavy swap activity. If the mlockall (2) system call is not available, an error
              will be printed to the log and slurmd will continue as normal.

              It is suggested to set LaunchParameters=slurmstepd_memlock  in  slurm.conf(5)  when
              setting -M.

       -n <value>
              Set  the  daemon's  nice value to the specified value, typically a negative number.
              Also note the PropagatePrioProcess configuration parameter.

       -N <nodename>
              Run the daemon with the given nodename. Used to emulate a larger system  with  more
              than  one  slurmd  daemon  per  node.  Requires  that  Slurm  be  built  using  the
              --enable-multiple-slurmd configure option.

       -s     Change working directory of  slurmd  to  SlurmdLogFile  path  if  possible,  or  to
              SlurmdSpoolDir otherwise. If both of them fail it will fallback to /var/tmp.

       --systemd
              Use when starting the daemon with systemd. This will allow slurmd to notify systemd
              of the new PID when using 'scontrol reconfigure'.

       -v     Verbose operation. Multiple -v's increase verbosity.

       -V, --version
              Print version information and exit.

       -Z     Start this node as a Dynamic Normal node. If  no  --conf  is  specified,  then  the
              slurmd  will  register  with  the  same hardware configuration as defined by the -C
              option.

ENVIRONMENT VARIABLES

       The following environment variables can be used to override settings compiled into slurmd.

       SLURM_CONF          The location of the Slurm configuration file. This  is  overridden  by
                           explicitly naming a configuration file on the command line.

       SLURM_DEBUG_FLAGS   Specify  debug  flags  for  slurmd  to  use.  See  DebugFlags  in  the
                           slurm.conf(5) man page for a  full  list  of  flags.  The  environment
                           variable takes precedence over the setting in the slurm.conf.

SIGNALS

       SIGTERM SIGINT
              slurmd will shutdown cleanly, waiting for in-progress rollups to finish.

       SIGHUP Reloads the slurm configuration files, similar to 'scontrol reconfigure'.

       SIGUSR2
              Reread the log level from the configs, and then reopen the log file. This should be
              used when setting up logrotate(8).

       SIGPIPE
              This signal is explicitly ignored.

CORE FILE LOCATION

       If slurmd is started with the -D option then the core file will be written to the  current
       working  directory.   Otherwise  if SlurmdLogFile is a fully qualified path name (starting
       with a slash), the core file will be written to  the  same  directory  as  the  log  file.
       Otherwise the core file will be written to the SlurmdSpoolDir directory, or "/var/tmp/" as
       a last resort. If none of the above directories can be  written,  no  core  file  will  be
       produced.

NOTES

       It  may  be  useful  to experiment with different slurmd specific configuration parameters
       using a distinct configuration file (e.g. timeouts). However, this  special  configuration
       file  will  not  be  used  by  the  slurmctld  daemon  or  the  Slurm programs, unless you
       specifically tell each of them to use it. If you desire changing communication ports,  the
       location of the temporary file system, or other parameters used by other Slurm components,
       change the common configuration file, slurm.conf.

       If you are using configless mode with a login node that runs a lot of client commands, you
       may  consider  running  slurmd  on  that  machine so it can manage a cached version of the
       configuration files. Otherwise, each client command will use the DNS record to contact the
       controller and get the configuration information, which could place additional load on the
       controller.

COPYING

       Copyright (C) 2002-2007 The Regents  of  the  University  of  California.   Copyright  (C)
       2008-2010  Lawrence  Livermore  National  Security.   Copyright (C) 2010-2022 SchedMD LLC.
       Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).

       This  file  is  part  of  Slurm,  a  resource  management  program.   For   details,   see
       <https://slurm.schedmd.com/>.

       Slurm  is  free  software; you can redistribute it and/or modify it under the terms of the
       GNU General Public License as published by the Free Software Foundation; either version  2
       of the License, or (at your option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
       even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See  the
       GNU General Public License for more details.

FILES

       /etc/slurm.conf

SEE ALSO

       slurm.conf(5), slurmctld(8)