NAME

       sbd - STONITH Block Device daemon

SYNOPSIS

       sbd <-d /dev/...> [options] "command"

SUMMARY

       SBD provides a node fencing mechanism (Shoot The Other Node In The Head, STONITH) for Pacemaker-based
       clusters through the exchange of messages via shared block storage such as a SAN, iSCSI, or FCoE. This
       isolates the fencing mechanism from changes in firmware version or dependencies on specific firmware
       controllers, and it can be used as a STONITH mechanism in all configurations that have reliable shared
       storage.

       SBD can also be used without any shared storage. In this mode, the watchdog device will be used to reset
       the node if it loses quorum, if any monitored daemon is lost and not recovered, or if Pacemaker decides
       that the node requires fencing.

       The sbd binary implements both the daemon that watches the message slots and the management tool for
       interacting with the block storage device(s). The mode of operation is selected via the "command"
       parameter; some of these modes take additional parameters.

       To use SBD with shared storage, you must first "create" the messaging layout on one to three block
       devices. Second, configure /etc/sysconfig/sbd to list those devices (and possibly adjust other options),
       and restart the cluster stack on each node to ensure that "sbd" is started. Third, configure the
       "external/sbd" fencing resource in the Pacemaker CIB.

       Each of these steps is documented in more detail below the description of the command options.

       "sbd" can only be used as root.

   GENERAL OPTIONS
       -d /dev/...
           Specify  the  block  device(s) to be used. If you have more than one, specify this option up to three
           times. This parameter is mandatory for all modes, since SBD always needs a block device  to  interact
           with.

           This  man page uses /dev/sda1, /dev/sdb1, and /dev/sdc1 as example device names for brevity. However,
           in your production environment, you should instead always refer to them by  using  the  long,  stable
           device name (e.g., /dev/disk/by-id/dm-uuid-part1-mpath-3600508b400105b5a0001500000250000).
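
           For instance, a dump invocation against two multipath devices might look as follows (the by-id
           names shown are illustrative placeholders, not real devices):

                   sbd -d /dev/disk/by-id/dm-uuid-mpath-A -d /dev/disk/by-id/dm-uuid-mpath-B dump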

       -v  Enable some verbose debug logging.

       -h  Display a concise summary of "sbd" options.

       -n node
           Set local node name; defaults to "uname -n". This should not need to be set.

       -R  Do  not  enable  realtime  priority.  By  default, "sbd" runs at realtime priority, locks itself into
           memory, and also acquires highest IO priority to  protect  itself  against  interference  from  other
           processes on the system. This is a debugging-only option.

       -I N
           Async IO timeout (defaults to 3 seconds, optional). You should not need to adjust this unless your IO
           setup is really very slow.

           (In  daemon  mode,  the  watchdog is refreshed when the majority of devices could be read within this
           time.)

   create
       Example usage:

               sbd -d /dev/sdc2 -d /dev/sdd3 create

       If you specify the create command, sbd will write a metadata header to the device(s) specified  and  also
       initialize the messaging slots for up to 255 nodes.

       Warning: This command will not prompt for confirmation. Roughly the first megabyte of the specified block
       device(s) will be overwritten immediately and without backup.

       This  command  accepts  a  few options to adjust the default timings that are written to the metadata (to
       ensure they are identical across all nodes accessing the device).

       -1 N
           Set watchdog timeout to N seconds. This depends mostly on  your  storage  latency;  the  majority  of
           devices must be successfully read within this time, or else the node will self-fence.

           If  your  sbd  device(s)  reside  on  a multipath setup or iSCSI, this should be the time required to
           detect a path failure. You may be able to reduce this if your device outages are independent,  or  if
           you are using the Pacemaker integration.

       -2 N
           Set slot allocation timeout to N seconds. You should not need to tune this.

       -3 N
           Set daemon loop timeout to N seconds. You should not need to tune this.

       -4 N
           Set  msgwait  timeout to N seconds. This should be twice the watchdog timeout. This is the time after
           which a message written to a node's slot will be considered delivered. (Or long enough for  the  node
           to detect that it needed to self-fence.)

           This also affects the stonith-timeout in Pacemaker's CIB; see below.
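
           For example, to tolerate a 30-second multipath failover, one might create the layout with a longer
           watchdog timeout and the corresponding doubled msgwait (the values are illustrative):

                   sbd -d /dev/sda1 -d /dev/sdb1 -d /dev/sdc1 -1 30 -4 60 create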

   list
       Example usage:

               # sbd -d /dev/sda1 list
               0       hex-0   clear
               1       hex-7   clear
               2       hex-9   clear

       List all allocated slots on the device, and the messages they contain. You should see all cluster nodes
       that have ever been started against this device. Nodes that are currently running should have a clear
       state; nodes that have been fenced, but not yet restarted, will show the appropriate fencing message.

   dump
       Example usage:

               # sbd -d /dev/sda1 dump
               ==Dumping header on disk /dev/sda1
               Header version     : 2
               Number of slots    : 255
               Sector size        : 512
               Timeout (watchdog) : 15
               Timeout (allocate) : 2
               Timeout (loop)     : 1
               Timeout (msgwait)  : 30
               ==Header on disk /dev/sda1 is dumped

       Dump meta-data header from device.
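
       Since the timeouts are part of the metadata and must be identical on all devices of one cluster, a
       simple sanity check is to dump each header in turn and compare (device names are examples):

               for dev in /dev/sda1 /dev/sdb1 /dev/sdc1; do sbd -d "$dev" dump; done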

   watch
       Example usage:

               sbd -d /dev/sdc2 -d /dev/sdd3 -P watch

       This command will make "sbd" start in daemon mode. It constantly monitors the message slot of the local
       node for incoming messages and device reachability, and can optionally take Pacemaker's state into
       account.

       "sbd" must be started on boot before the cluster stack! See below for enabling  this  according  to  your
       boot environment.

       The options for this mode are rarely specified directly on the command line; they are most frequently
       set via /etc/sysconfig/sbd.

       It also constantly monitors connectivity to the storage device, and self-fences  in  case  the  partition
       becomes unreachable, guaranteeing that it does not disconnect from fencing messages.

       A  node  slot  is  automatically allocated on the device(s) the first time the daemon starts watching the
       device; hence, manual allocation is not usually required.

       If a watchdog is used together with "sbd", as is strongly recommended, the watchdog is activated at the
       initial start of the sbd daemon. The watchdog is refreshed every time the majority of SBD devices have
       been successfully read. Using a watchdog provides additional protection against "sbd" crashing.

       If the Pacemaker integration is activated, "sbd" will not self-fence on loss of device majority,
       provided that:

       1.  the partition the node is in is still quorate according to the CIB;

       2.  it is still quorate according to Corosync's node count;

       3.  the node itself is considered online and healthy by Pacemaker.

       This allows "sbd" to survive temporary outages of the majority of devices. However, while the cluster is
       in such a degraded state, it can neither successfully fence nor be shut down cleanly (as taking the
       cluster below the quorum threshold will immediately cause all remaining nodes to self-fence). In short,
       it will not tolerate any further faults. Please repair the system before continuing.

       There  is one "sbd" process that acts as a master to which all watchers report; one per device to monitor
       the node's slot; and, optionally, one that handles the Pacemaker integration.

       -W  Enable or disable use of the system watchdog to protect against the sbd  processes  failing  and  the
           node being left in an undefined state. Specify this once to enable, twice to disable.

           Defaults to enabled.

       -w /dev/watchdog
           This can be used to override the default watchdog device used and should not usually be necessary.

       -p /var/run/sbd.pid
           This option can be used to specify a pidfile for the main sbd process.

       -F N
           Number  of  failures  before  a  failing  servant process will not be restarted immediately until the
           dampening delay has expired. If set to zero, servants will be restarted immediately and indefinitely.
           If set to one, a failed servant will be restarted once every -t seconds. If set to a different value,
           the servant will be restarted that many times within the dampening period and then delay.

           Defaults to 1.

       -t N
            Dampening delay before faulty servants are restarted. Combined with "-F 1", this is the most
            logical way to tune the restart frequency of servant processes. Default is 5 seconds.

           If set to zero, processes will be restarted indefinitely and immediately.

       -P  Enable  Pacemaker  integration  which  checks Pacemaker quorum and node health.  Specify this once to
           enable, twice to disable.

           Defaults to enabled.

       -S N
           Set the start mode. (Defaults to 0.)

           If this is set to zero, sbd will always start up unconditionally, regardless of whether the node  was
           previously fenced or not.

            If set to one, sbd will only start if the node was previously shut down cleanly (as indicated by an
            exit request message in the slot), or if the slot is empty. A reset, crashdump, or power-off request
            in any slot will halt start-up.

           This  is  useful  to  prevent  nodes  from  rejoining  if they were faulty. The node must be manually
           "unfenced" by sending an empty message to it:

                   sbd -d /dev/sda1 message node1 clear

       -s N
           Set the start-up wait time for devices. (Defaults to 120.)

            Dynamic block devices such as iSCSI might not be fully initialized and present yet. This option
            sets how long to wait for devices to appear on start-up. If set to 0, start-up will be aborted
            immediately if no devices are available.

       -Z  Enable trace mode. Warning: this is unsafe for production, use at your own risk! Specifying this once
           will turn all reboots or power-offs, be they caused by  self-fence  decisions  or  messages,  into  a
           crashdump.  Specifying this twice will just log them but not continue running.

       -T  By  default,  the  daemon will set the watchdog timeout as specified in the device metadata. However,
           this does not work for every watchdog device.  In this  case,  you  must  manually  ensure  that  the
           watchdog  timeout used by the system correctly matches the SBD settings, and then specify this option
           to allow "sbd" to continue with start-up.

       -5 N
           Warn if the time interval for tickling the watchdog exceeds this many seconds.   Since  the  node  is
           unable  to  log  the  watchdog  expiry  (it reboots immediately without a chance to write its logs to
           disk), this is very useful for getting an indication that the watchdog timeout is too short  for  the
           IO load of the system.

           Default is 3 seconds, set to zero to disable.

       -C N
            Watchdog timeout to set before crashdumping. If SBD is set to crashdump instead of reboot (either
            via the trace mode settings or the external/sbd fencing agent's parameter), SBD will adjust the
            watchdog timeout to this setting before triggering the dump. Otherwise, the watchdog might trigger
            and prevent a successful crashdump from ever being written.

           Defaults to 240 seconds. Set to zero to disable.
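
       Putting several of the above options together, a manually started daemon could be invoked as follows
       (normally the init system assembles this from /etc/sysconfig/sbd; the values are illustrative, not
       recommendations):

               sbd -d /dev/sda1 -d /dev/sdb1 -d /dev/sdc1 -W -P -F 1 -t 5 watch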

   allocate
       Example usage:

               sbd -d /dev/sda1 allocate node1

       Explicitly allocates a slot for the specified node name. This should rarely be necessary, as every node
       automatically allocates itself a slot the first time it starts up in watch mode.

   message
       Example usage:

               sbd -d /dev/sda1 message node1 test

       Writes  the specified message to node's slot. This is rarely done directly, but rather abstracted via the
       "external/sbd" fencing agent configured as a cluster resource.

       Supported message types are:

       test
            This only generates a log message on the receiving node and can be used to check if SBD is seeing
            the device. Note that this could overwrite a fencing request sent by the cluster, so it should not
            be used during production.

       reset
           Reset the target upon receipt of this message.

       off Power-off the target.

       crashdump
           Cause the target node to crashdump.

       exit
           This  will  make  the  "sbd"  daemon  exit  cleanly  on  the target. You should not send this message
           manually; this is handled properly during shutdown of the cluster stack. Manually stopping the daemon
           means the node is unprotected!

       clear
           This message indicates that no real message has been sent to the  node.   You  should  not  set  this
           manually;  "sbd" will clear the message slot automatically during start-up, and setting this manually
           could overwrite a fencing message by the cluster.

Base system configuration

   Configure a watchdog
       It is highly recommended that you configure your Linux system to load a  watchdog  driver  with  hardware
       assistance  (as is available on most modern systems), such as hpwdt, iTCO_wdt, or others. As a fall-back,
       you can use the softdog module.

       No other software must access the watchdog timer; it can only be accessed by one process at any given
       time. Some hardware vendors ship systems management software that uses the watchdog for system resets
       (e.g., the HP ASR daemon). Such software must be disabled if the watchdog is to be used by SBD.
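
       As a minimal sketch for a machine without a usable hardware watchdog, you can load the softdog fallback
       manually and, on systemd-based systems, persist it across reboots:

               # Load the softdog module now
               modprobe softdog
               # Load it automatically on every boot
               echo softdog > /etc/modules-load.d/watchdog.conf
               # Verify that the watchdog device node has appeared
               ls -l /dev/watchdog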

   Choosing and initializing the block device(s)
       First, you have to decide if you want to use one, two, or three devices.

       If you are using multiple devices, they should reside on independent storage setups. Putting all three
       of them on the same logical unit, for example, would not provide any additional redundancy.

       The  SBD  device can be connected via Fibre Channel, Fibre Channel over Ethernet, or even iSCSI. Thus, an
       iSCSI target can become a sort-of network-based quorum server; the advantage is that it does not  require
       a smart host at your third location, just block storage.

       The  SBD  partitions  themselves  must not be mirrored (via MD, DRBD, or the storage layer itself), since
       this could result in a split-mirror scenario. Nor can they reside on cLVM2 volume groups, since they must
       be accessed by the cluster stack before it has started the cLVM2 daemons; hence, these should  be  either
       raw partitions or logical units on (multipath) storage.

       The  block  device(s)  must  be accessible from all nodes. (While it is not necessary that they share the
       same path name on all nodes, this is considered a very good idea.)

       SBD will only use about one megabyte per device, so you can easily create  a  small  partition,  or  very
       small  logical  units.   (The  size of the SBD device depends on the block size of the underlying device.
       Thus, 1MB is fine on plain SCSI devices  and  SAN  storage  with  512  byte  blocks.  On  the  IBM  s390x
       architecture in particular, disks default to 4k blocks, and thus require roughly 4MB.)

       The number of devices will affect the operation of SBD as follows:

       One device
           In its simplest implementation, you use one device only. This is appropriate for clusters where all
           your data is on the same shared storage (with internal redundancy) anyway; the SBD device does not
           introduce an additional single point of failure then.

           If the SBD device is not accessible, the daemon will fail to start and inhibit openais startup.

       Two devices
           This  configuration  is  a  trade-off,  primarily aimed at environments where host-based mirroring is
           used, but no third storage device is available.

           SBD will not commit suicide if it loses access to one mirror leg; this allows the cluster to continue
           to function even in the face of one outage.

           However, SBD will not fence the other side while only one mirror leg is available, since it does  not
           have  enough  knowledge  to  detect  an  asymmetric  split  of the storage. So it will not be able to
           automatically tolerate a second failure while one of the storage arrays is down. (Though you can  use
           the appropriate crm command to acknowledge the fence manually.)

           It will not start unless both devices are accessible on boot.

       Three devices
           In this most reliable and recommended configuration, SBD will only self-fence if more than one device
           is lost; hence, this configuration is resilient against temporary single device outages (be it due to
           failures or maintenance).  Fencing messages can still be successfully relayed if at least two devices
           remain accessible.

           This  configuration  is  appropriate  for  more  complex scenarios where storage is not confined to a
           single array. For example, host-based mirroring solutions could have one  SBD  per  mirror  leg  (not
           mirrored itself), and an additional tie-breaker on iSCSI.

           It will only start if at least two devices are accessible on boot.

       After  you  have  chosen  the  devices and created the appropriate partitions and perhaps multipath alias
       names to ease management, use the "sbd create" command described above to initialize the SBD metadata  on
       them.
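
       As a sketch, creating a suitably small partition and initializing it could look like this (device name
       and sizes are illustrative; see the sizing notes above for 4k-block devices):

               parted -s /dev/sdd mklabel gpt
               parted -s /dev/sdd mkpart sbd 1MiB 9MiB
               sbd -d /dev/sdd1 create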

   Sharing the block device(s) between multiple clusters

       It is possible to share the block devices between multiple clusters, provided the total number of nodes
       accessing them does not exceed 255, and provided they all share the same SBD timeouts (since these are
       part of the metadata).

       If you are using multiple devices, this can reduce the setup overhead required. However, you should not
       share devices between clusters in different security domains.

   Configure SBD to start on boot
       On systems using "sysvinit", the "openais" or "corosync" system start-up scripts must handle starting  or
       stopping "sbd" as required before starting the rest of the cluster stack.

       For "systemd", sbd simply has to be enabled using

               systemctl enable sbd.service

       The daemon is brought online on each node before corosync and Pacemaker are started, and terminated only
       after all other cluster components have been shut down, ensuring that cluster resources are never
       activated without SBD supervision.

   Configuration via sysconfig
       The  system  instance  of "sbd" is configured via /etc/sysconfig/sbd.  In this file, you must specify the
       device(s) used, as well as any options to pass to the daemon:

               SBD_DEVICE="/dev/sda1;/dev/sdb1;/dev/sdc1"
               SBD_PACEMAKER="true"

       "sbd" will fail to start if no "SBD_DEVICE" is specified. See the installed  template  for  more  options
       that can be configured here.
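
       As a slightly fuller sketch: besides the two variables above, the shipped template also allows passing
       additional daemon options through. (The exact set of variables differs between versions, so treat the
       names below as an assumption and check your local template.)

               SBD_DEVICE="/dev/sda1;/dev/sdb1;/dev/sdc1"
               SBD_PACEMAKER="true"
               # Extra options handed to the daemon on start-up (see the watch command)
               SBD_OPTS="-F 1 -t 5"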

   Testing the sbd installation
       After  a restart of the cluster stack on this node, you can now try sending a test message to it as root,
       from this or any other node:

               sbd -d /dev/sda1 message node1 test

       The node will acknowledge the receipt of the message in the system logs:

               Aug 29 14:10:00 node1 sbd: [13412]: info: Received command test from node2

       This confirms that SBD is indeed up and running on the node, and that it is ready to receive messages.

       Make sure that /etc/sysconfig/sbd is identical on all cluster nodes,  and  that  all  cluster  nodes  are
       running the daemon.
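
       To exercise the whole cluster, you can send a test message to every node in turn and then check each
       node's system log for the corresponding receipt (node names are examples):

               for n in node1 node2 node3; do sbd -d /dev/sda1 message "$n" test; done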

Pacemaker CIB integration

   Fencing resource
       Pacemaker can only interact with SBD to issue a node fence if there is a configured fencing resource.
       This should be a primitive, not a clone, as follows:

               primitive fencing-sbd stonith:external/sbd \
                       params pcmk_delay_max=30

       This will automatically use the same devices as configured in /etc/sysconfig/sbd.

       While you should not configure this as a clone (as Pacemaker will register the fencing device on each
       node automatically), the pcmk_delay_max setting introduces a random fencing delay. Should a split-brain
       scenario occur in a two-node cluster, this gives one of the nodes a better chance to survive and avoids
       double fencing.

       SBD also supports turning the reset request into a crash request, which may be helpful for debugging if
       you have kernel crashdumping configured; then, every fence request will cause the node to dump core.
       You can enable this via the crashdump="true" parameter on the fencing resource. This is not recommended
       for production use, only for debugging.
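
       As a sketch, such a debugging-only resource definition could look like:

               primitive fencing-sbd stonith:external/sbd \
                       params crashdump="true"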

   General cluster properties
       You  must  also  enable  STONITH in general, and set the STONITH timeout to be at least twice the msgwait
       timeout you have configured, to allow enough time for the  fencing  message  to  be  delivered.  If  your
       msgwait timeout is 60 seconds, this is a possible configuration:

               property stonith-enabled="true"
               property stonith-timeout="120s"

       Caution:  if  stonith-timeout  is  too low for msgwait and the system overhead, sbd will never be able to
       successfully complete a fence request. This will create a fencing loop.

       Note that the sbd fencing agent will try to detect this  and  automatically  extend  the  stonith-timeout
       setting  to  a reasonable value, on the assumption that sbd modifying your configuration is preferable to
       not fencing.

Management tasks

   Recovering from temporary SBD device outage
       If you have multiple devices, failure of a single device is not immediately fatal. "sbd" will attempt to
       restart the monitor for the device every 5 seconds by default. However, you can tune this via the
       dampening options (-F and -t) of the watch command.

       If you wish to immediately force a restart of all currently disabled monitor processes, you can send
       SIGUSR1 to the SBD inquisitor process.
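
       For example, assuming the default pidfile of the main process (see the -p option above):

               kill -USR1 "$(cat /var/run/sbd.pid)"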

LICENSE

       Copyright (C) 2008-2013 Lars Marowsky-Bree

       This program is free software; you can redistribute it and/or modify  it  under  the  terms  of  the  GNU
       General  Public License as published by the Free Software Foundation; either version 2 of the License, or
       (at your option) any later version.

       This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;  without  even
       the  implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
       License for more details.

       For details see the GNU General Public License at  http://www.gnu.org/licenses/gpl-2.0.html  (version  2)
       and/or http://www.gnu.org/licenses/gpl.html (the newest as per "any later").

SBD                                                2017-11-25                                             SBD(8)