Provided by: drbd-utils_8.9.10-2ubuntu0.1_amd64

NAME

       drbd.conf - DRBD Configuration Files

INTRODUCTION

       DRBD implements block devices which replicate their data to all nodes of a cluster. The
       actual data and associated metadata are usually stored redundantly on "ordinary" block
       devices on each cluster node.

       Replicated block devices are called /dev/drbdminor by default, where minor is the
       device's minor number (for example, /dev/drbd1). They are grouped into resources, with
       one or more devices per resource. Replication among the devices in a resource takes place
       in chronological order. With DRBD, we refer to the devices inside a resource as volumes.

       In DRBD 9, a resource can be replicated between two or more cluster nodes. The connections
       between cluster nodes are point-to-point links, and use TCP or a TCP-like protocol. All
       nodes must be directly connected.

       DRBD consists of low-level user-space components which interact with the kernel and
       perform basic operations (drbdsetup, drbdmeta), a high-level user-space component which
       understands and processes the DRBD configuration and translates it into basic operations
       of the low-level components (drbdadm), and a kernel component.

       The default DRBD configuration consists of /etc/drbd.conf and of additional files included
       from there, usually global_common.conf and all *.res files inside /etc/drbd.d/. It has
       turned out to be useful to define each resource in a separate *.res file.

       The configuration files are designed so that each cluster node can contain an identical
       copy of the entire cluster configuration. The host name of each node determines which
       parts of the configuration apply (uname -n). It is highly recommended to keep the cluster
       configuration on all nodes in sync by manually copying it to all nodes, or by automating
       the process with csync2 or a similar tool.

EXAMPLE CONFIGURATION FILE

           resource r0 {
                 net {
                      cram-hmac-alg sha1;
                      shared-secret "FooFunFactory";
                 }
                 volume 0 {
                      device    /dev/drbd1;
                      disk      /dev/sda7;
                      meta-disk internal;
                 }
                 on alice {
                      node-id   0;
                      address   10.1.1.31:7000;
                 }
                 on bob {
                      node-id   1;
                      address   10.1.1.32:7000;
                 }
                 connection {
                      host      alice  port 7000;
                      host      bob    port 7000;
                      net {
                          protocol C;
                      }
                 }
           }

       This example defines a resource r0 which contains a single replicated device with volume
       number 0. The resource is replicated among hosts alice and bob, which have the IPv4
       addresses 10.1.1.31 and 10.1.1.32 and the node identifiers 0 and 1, respectively. On both
       hosts, the replicated device is called /dev/drbd1, and the actual data and metadata are
       stored on the lower-level device /dev/sda7. The connection between the hosts uses protocol
       C.

       Please refer to the DRBD User's Guide[1] for more examples.

FILE FORMAT

       DRBD configuration files consist of sections, which contain other sections and parameters
       depending on the section types. Each section consists of one or more keywords, sometimes a
       section name, an opening brace (“{”), the section's contents, and a closing brace (“}”).
       Parameters inside a section consist of a keyword, followed by one or more keywords or
       values, and a semicolon (“;”).

       Some parameter values have a default scale which applies when a plain number is specified
       (for example Kilo, or 1024 times the numeric value). Such default scales can be overridden
       by using a suffix (for example, M for Mega). The common suffixes K = 2^10 = 1024, M = 1024
       K, and G = 1024 M are supported.

       Comments start with a hash sign (“#”) and extend to the end of the line. In addition, any
       section can be prefixed with the keyword skip, which causes the section and any
       sub-sections to be ignored.
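       For illustration only, the following fragment shows a comment, a section disabled with
       skip, and a value whose default scale is overridden by a suffix. The resource r9 is
       hypothetical, and the comment on resync-rate assumes that a plain number for that
       parameter is interpreted in KiB/s.

```
# A comment extends from the hash sign to the end of the line.
skip resource r9 {          # skipped: this section and its sub-sections are ignored
    on alice {
        node-id 0;
    }
}

resource r0 {
    disk {
        resync-rate 40M;    # suffix M overrides the default scale: 40 * 1024 KiB/s
    }
    # volume, on, and connection sections as in the example configuration file
}
```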

       Additional files can be included with the include file-pattern statement (see glob(7) for
       the expressions supported in file-pattern). Include statements are only allowed outside of
       sections.
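       A sketch of a top-level /etc/drbd.conf that pulls in the files mentioned in the
       introduction. It assumes the usual /etc/drbd.d/ layout. Both include statements sit
       outside of any section, as required.

```
# /etc/drbd.conf
include "/etc/drbd.d/global_common.conf";
include "/etc/drbd.d/*.res";
```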

       The following sections are defined (indentation indicates in which context):

           common
              [disk]
              [handlers]
              [net]
              [options]
              [startup]
           global
           resource
              connection
                 path
                 net
              connection-mesh
                 net
              [disk]
              floating
              handlers
              [net]
              on
                 volume
                    disk
                  [disk]
              options
              stacked-on-top-of
              startup

       Sections in brackets affect other parts of the configuration: inside the common section,
       they apply to all resources. A disk section inside a resource or on section applies to all
       volumes of that resource, and a net section inside a resource section applies to all
       connections of that resource. This avoids repeating identical options for each resource,
       connection, or volume. Options can be overridden in a more specific resource, connection,
       on, or volume section.
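       In the following sketch, the common section sets defaults for all resources, and one of
       them is overridden for a single resource. The resource name r1 is hypothetical.

```
common {
    disk {
        on-io-error detach;   # inherited by every volume of every resource
    }
    net {
        protocol C;           # inherited by every connection of every resource
    }
}

resource r1 {
    net {
        protocol A;           # overrides the common default for the connections of r1 only
    }
    # volume, on, and connection sections as in the example configuration file
}
```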

   Sections
       common

           This section can contain one each of a disk, handlers, net, options, and startup
           section. All resources inherit the parameters in these sections as their default
           values.

       connection [name]

           Define a connection between two hosts. This section must contain two host parameters
           or multiple path sections. The optional name is used to refer to the connection in the
           system log and in other messages. If no name is specified, the peer's host name is
           used instead.

       path

           Define a path between two hosts. This section must contain two host parameters.

       connection-mesh

           Define a connection mesh between multiple hosts. This section must contain a hosts
           parameter, which has the host names as arguments. This section is a shortcut to define
           many connections which share the same network options.

       disk

           Define parameters for a volume. All parameters in this section are optional.

       floating [address-family] addr:port

           Like the on section, except that a network address is used instead of a host name to
           determine whether a floating section applies to the local node.

           The node-id parameter in this section is required. If the address parameter is not
           provided, no connections to peers will be created by default. The device, disk, and
           meta-disk parameters must be defined in, or inherited by, this section.
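           A sketch of the example configuration file rewritten with floating sections. Here the
           local node is identified by its IP address instead of by its host name. The
           connection section is omitted for brevity.

```
resource r0 {
    volume 0 {
        device    /dev/drbd1;
        disk      /dev/sda7;
        meta-disk internal;   # inherited by both floating sections
    }
    floating 10.1.1.31:7000 {
        node-id 0;            # required in every floating section
    }
    floating 10.1.1.32:7000 {
        node-id 1;
    }
    # connection section omitted for brevity
}
```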

       global

           Define some global parameters. All parameters in this section are optional. Only one
           global section is allowed in the configuration.

       handlers

           Define handlers to be invoked when certain events occur. The kernel passes the
           resource name in the first command-line argument and sets the following environment
           variables depending on the event's context:

           •   For events related to a particular device: the device's minor number in
               DRBD_MINOR, the device's volume number in DRBD_VOLUME.

           •   For events related to a particular device on a particular peer: the connection
               endpoints in DRBD_MY_ADDRESS, DRBD_MY_AF, DRBD_PEER_ADDRESS, and DRBD_PEER_AF; the
               device's local minor number in DRBD_MINOR, and the device's volume number in
               DRBD_VOLUME.

           •   For events related to a particular connection: the connection endpoints in
               DRBD_MY_ADDRESS, DRBD_MY_AF, DRBD_PEER_ADDRESS, and DRBD_PEER_AF; and, for each
               device defined for that connection: the device's minor number in
               DRBD_MINOR_volume-number.

           •   For events that identify a device, if a lower-level device is attached, the
               lower-level device's device name is passed in DRBD_BACKING_DEV (or
               DRBD_BACKING_DEV_volume-number).

           All parameters in this section are optional. Only a single handler can be defined for
           each event; if no handler is defined, nothing will happen.

       net

           Define parameters for a connection. All parameters in this section are optional.

       on host-name [...]

           Define the properties of a resource on a particular host or set of hosts. Specifying
           more than one host name can make sense in a setup with IP address failover, for
           example. The host-name argument must match the Linux host name (uname -n).

           Usually contains or inherits at least one volume section. The node-id and address
           parameters must be defined in this section. The device, disk, and meta-disk parameters
           must be defined in, or inherited by, this section.

           A normal configuration file contains two or more on sections for each resource. Also
           see the floating section.

       options

           Define parameters for a resource. All parameters in this section are optional.

       resource name

           Define a resource. Usually contains at least two on sections and at least one
           connection section.

       stacked-on-top-of resource

           Used instead of an on section for configuring a stacked resource with three to four
           nodes.

           Starting with DRBD 9, stacking is deprecated. It is advised to use resources which are
           replicated among more than two nodes instead.

       startup

           The parameters in this section determine the behavior of a resource at startup time.

       volume volume-number

           Define a volume within a resource. The volume numbers in the various volume sections
           of a resource define which devices on which hosts form a replicated device.
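           Devices are paired across hosts by their volume number, not by their device or disk
           name. In this sketch, volume 0 on alice and volume 0 on bob form one replicated
           device even though their lower-level devices differ. The disk /dev/sdb3 is
           hypothetical.

```
resource r0 {
    on alice {
        node-id   0;
        address   10.1.1.31:7000;
        volume 0 {
            device    /dev/drbd1;
            disk      /dev/sda7;
            meta-disk internal;
        }
    }
    on bob {
        node-id   1;
        address   10.1.1.32:7000;
        volume 0 {
            device    /dev/drbd1;
            disk      /dev/sdb3;    # a different lower-level device on bob
            meta-disk internal;
        }
    }
    connection {
        host      alice  port 7000;
        host      bob    port 7000;
    }
}
```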

   Section connection Parameters
       host name [address [address-family] address] [port port-number]

           Defines an endpoint for a connection. Each host statement refers to an on section in a
           resource. If a port number is defined, this endpoint will use the specified port
           instead of the port defined in the on section. Each connection section must contain
           exactly two host parameters. Instead of two host parameters, the connection may
           contain multiple path sections.

   Section path Parameters
       host name [address [address-family] address] [port port-number]

           Defines an endpoint for a connection. Each host statement refers to an on section in a
           resource. If a port number is defined, this endpoint will use the specified port
           instead of the port defined in the on section. Each path section must contain exactly
           two host parameters.
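           A sketch of a connection between alice and bob that uses two paths. Each path has
           exactly two host parameters. The second network (192.168.2.0/24) is hypothetical.

```
connection {
    path {
        host alice address 10.1.1.31:7000;
        host bob   address 10.1.1.32:7000;
    }
    path {
        host alice address 192.168.2.31:7000;
        host bob   address 192.168.2.32:7000;
    }
}
```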

   Section connection-mesh Parameters
       hosts name...

           Defines all nodes of a mesh. Each name refers to an on section in a resource. The port
           that is defined in the on section will be used.
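           For a resource replicated among three hosts, a single connection-mesh section can
           replace the three connection sections. In this sketch, the host charlie and its
           address are hypothetical. Each host uses the port from its on section.

```
resource r0 {
    volume 0 {
        device    /dev/drbd1;
        disk      /dev/sda7;
        meta-disk internal;
    }
    on alice   { node-id 0; address 10.1.1.31:7000; }
    on bob     { node-id 1; address 10.1.1.32:7000; }
    on charlie { node-id 2; address 10.1.1.33:7000; }
    connection-mesh {
        hosts alice bob charlie;   # connections alice-bob, alice-charlie, bob-charlie
        net {
            protocol C;            # shared by all connections of the mesh
        }
    }
}
```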

   Section disk Parameters
       al-extents extents

           Based on recent write activity, DRBD automatically maintains a "hot" or "active" disk
           area that is likely to be written to again soon. The "active" disk area can be written
           to immediately, while "inactive" disk areas must be "activated" first, which requires
           a meta-data write. We also refer to this active disk area as the "activity log".

           The activity log saves meta-data writes, but the whole log must be resynced upon
           recovery of a failed node. The size of the activity log is a major factor in how long
           a resync will take and how fast a replicated disk will become consistent after a
           crash.

           The activity log consists of a number of 4-Megabyte segments; the al-extents parameter
           determines how many of those segments can be active at the same time. The default
           value for al-extents is 1237, with a minimum of 7 and a maximum of 65536.

           Note that the effective maximum may be smaller, depending on how you created the
           device meta data; see also drbdmeta(8). The effective maximum is 919 * (available
           on-disk activity-log ring-buffer area / 4kB - 1). The default 32kB ring buffer results
           in a maximum of 6433, which covers more than 25 GiB of data. We recommend keeping this
           value well within the amount your backend storage and replication link are able to
           resync within about 5 minutes.
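           The following applies the formula above to the default 32 kB ring buffer. Each
           extent is taken to be one of the 4-Megabyte (4 MiB) segments described above. The
           result reproduces the figures quoted in the text, and also shows how much data the
           default al-extents value covers.

```latex
\begin{aligned}
\text{max.\ al-extents} &= 919 \times \left(\frac{32\,\mathrm{kB}}{4\,\mathrm{kB}} - 1\right)
                         = 919 \times 7 = 6433 \\
\text{data covered}     &= 6433 \times 4\,\mathrm{MiB} = 25732\,\mathrm{MiB} \approx 25.1\,\mathrm{GiB} \\
\text{default (1237)}   &= 1237 \times 4\,\mathrm{MiB} = 4948\,\mathrm{MiB} \approx 4.8\,\mathrm{GiB}
\end{aligned}
```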

       al-updates {yes | no}

           With this parameter, the activity log can be turned off entirely (see the al-extents
           parameter). This will speed up writes because fewer meta-data writes will be
           necessary, but the entire device needs to be resynchronized upon recovery of a failed
           primary node. The default value for al-updates is yes.

       c-delay-target delay_target,
       c-fill-target fill_target,
       c-max-rate max_rate,
       c-plan-ahead plan_time
           Dynamically control the resync speed. This mechanism is enabled by setting the
           c-plan-ahead parameter to a positive value. The goal is to either fill the buffers
           along the data path with a defined amount of data if c-fill-target is defined, or to
           have a defined delay along the path if c-delay-target is defined. The maximum
           bandwidth is limited by the c-max-rate parameter.

           The c-plan-ahead parameter defines how fast drbd adapts to changes in the resync
           speed. It should be set to five times the network round-trip time or more. Common
           values for c-fill-target for "normal" data paths range from 4K to 100K. If drbd-proxy
           is used, it is advised to use c-delay-target instead of c-fill-target. The
           c-delay-target parameter is used if the c-fill-target parameter is undefined or set to
           0. The c-delay-target parameter should be set to five times the network round-trip
           time or more. The c-max-rate option should be set to either the bandwidth available
           between the DRBD-hosts and the machines hosting DRBD-proxy, or to the available disk
           bandwidth.

           The default values of these parameters are: c-plan-ahead = 20 (in units of 0.1
           seconds), c-fill-target = 0 (in units of sectors), c-delay-target = 1 (in units of 0.1
           seconds), and c-max-rate = 102400 (in units of KiB/s).

           Dynamic resync speed control is available since DRBD 8.3.9.

       c-min-rate min_rate
           A node which is primary and sync-source has to schedule application I/O requests and
           resync I/O requests. The c-min-rate parameter limits how much bandwidth is available
           for resync I/O; the remaining bandwidth is used for application I/O.

           A c-min-rate value of 0 means that there is no limit on the resync I/O bandwidth. This
           can slow down application I/O significantly. Use a value of 1 (1 KiB/s) for the lowest
           possible resync rate.

           The default value of c-min-rate is 4096, in units of KiB/s.
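           The following sketch configures the dynamic resync controller for a link through
           drbd-proxy. It assumes a hypothetical network round-trip time of 200 ms and about
           100 MiB/s of usable bandwidth between the DRBD hosts and the proxy machines. Under
           the five-times-RTT guideline, c-plan-ahead and c-delay-target must each be at least
           10 (1 second). All values are illustrative.

```
disk {
    c-plan-ahead    20;      # 2 s (>= 5 * 200 ms); a positive value enables the controller
    c-fill-target   0;       # 0: c-delay-target is used instead, as advised for drbd-proxy
    c-delay-target  10;      # 1 s (= 5 * 200 ms), in units of 0.1 s
    c-max-rate      102400;  # 100 MiB/s in units of KiB/s (assumed available bandwidth)
    c-min-rate      4096;    # 4 MiB/s in units of KiB/s (the documented default)
}
```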

       disk-barrier,
       disk-flushes,
       disk-drain
           DRBD has three methods of handling the ordering of dependent write requests:

           disk-barrier
               Use disk barriers to make sure that requests are written to disk in the right
               order. Barriers ensure that all requests submitted before a barrier make it to the
               disk before any requests submitted after the barrier. This is implemented using
               'tagged command queuing' on SCSI devices and 'native command queuing' on SATA
               devices. Only some devices and device stacks support this method. The device
               mapper (LVM) only supports barriers in some configurations.

               Note that on systems which do not support disk barriers, enabling this option can
               lead to data loss or corruption. Until DRBD 8.4.1, disk-barrier was turned on if
               the I/O stack below DRBD supported barriers. Kernels since linux-2.6.36 (or 2.6.32
               RHEL6) no longer make it possible to detect whether barriers are supported. Since
               drbd-8.4.2, this option is off by default and needs to be enabled explicitly.

           disk-flushes
               Use disk flushes between dependent write requests, also referred to as 'force unit
               access' by drive vendors. This forces all data to disk. This option is enabled by
               default.

           disk-drain
               Wait for the request queue to "drain" (that is, wait for the requests to finish)
               before submitting a dependent write request. This method requires that requests
               are stable on disk when they finish. Before DRBD 8.0.9, this was the only method
               implemented. This option is enabled by default. Do not disable in production
               environments.

           Of these three methods, DRBD uses the first one that is enabled and supported by the
           backing storage device. If all three of these options are turned off, DRBD will submit
           write requests without regard to dependencies. Depending on the I/O stack, write
           requests can be reordered, and they can be submitted in a different order on different
           cluster nodes. This can result in data loss or corruption. Therefore, turning off all
           three methods of controlling write ordering is strongly discouraged.

           A general guideline for configuring write ordering is to use disk barriers or disk
           flushes when using ordinary disks (or an ordinary disk array) with a volatile write
           cache. On storage without cache or with a battery-backed write cache, disk draining
           can be a reasonable choice.
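           The following sketch applies the guideline above to a lower-level device behind a
           battery-backed write cache. This is an assumption you must verify for your own
           hardware. Flushes are disabled, while draining stays enabled and becomes the method
           in use.

```
disk {
    disk-barrier no;    # off by default since drbd-8.4.2; stated here for clarity
    disk-flushes no;    # assumption: battery-backed write cache on the lower-level device
    md-flushes   no;    # same assumption for the meta-data device
    disk-drain   yes;   # the remaining ordering method; do not disable in production
}
```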

       disk-timeout
           If the lower-level device on which a DRBD device stores its data does not finish an
           I/O request within the defined disk-timeout, DRBD treats this as a failure. The
           lower-level device is detached, and the device's disk state advances to Diskless. If
           DRBD is connected to one or more peers, the failed request is passed on to one of
           them.

           This option is dangerous and may lead to kernel panic!

           "Aborting" requests, or force-detaching the disk, is intended for completely
           blocked/hung local backing devices which no longer complete requests at all, not even
           with an error. In this situation, a hard reset and failover is usually the only way
           out.

           By "aborting", which basically means faking a local error completion, we allow for a
           more graceful switchover by cleanly migrating services. Still, the affected node has
           to be rebooted "soon".

           By completing these requests, we allow the upper layers to re-use the associated data
           pages.

           If the local backing device later "recovers" and DMAs data from disk into the original
           request pages, in the best case it will just put random data into unused pages.
           Typically, however, it will corrupt what has meanwhile become completely unrelated
           data, causing all sorts of damage.

           This means that a delayed successful completion, especially of a READ request, is a
           reason to panic(). We assume that a delayed *error* completion is OK, though we will
           still complain noisily about it.

           The default value of disk-timeout is 0, which stands for an infinite timeout. Timeouts
           are specified in units of 0.1 seconds. This option is available since DRBD 8.3.12.
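           Because timeouts are specified in units of 0.1 seconds, a value of 600 corresponds to
           60 seconds. This is a sketch only. Keep the dangers described above in mind before
           enabling this option.

```
disk {
    disk-timeout 600;   # 600 * 0.1 s = 60 s; 0 (the default) means no timeout
}
```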

       md-flushes
           Enable disk flushes and disk barriers on the meta-data device. This option is enabled
           by default. See the disk-flushes parameter.

       on-io-error handler

           Configure how DRBD reacts to I/O errors on a lower-level device. The following
           policies are defined:

           pass_on
               Change the disk status to Inconsistent, mark the failed block as inconsistent in
               the bitmap, and retry the I/O operation on a remote cluster node.

           call-local-io-error
               Call the local-io-error handler (see the handlers section).

           detach
               Detach the lower-level device and continue in diskless mode.
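           The following sketch combines the call-local-io-error policy with the matching
           handler. The script path is hypothetical. As described for the handlers section, the
           resource name is passed as the first argument.

```
resource r0 {
    disk {
        on-io-error call-local-io-error;
    }
    handlers {
        # Hypothetical script; receives the resource name as its first argument.
        local-io-error "/usr/local/sbin/drbd-io-error-alert";
    }
    # volume, on, and connection sections as in the example configuration file
}
```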

       read-balancing policy
           Distribute read requests among cluster nodes as defined by policy. The supported
           policies are prefer-local (the default), prefer-remote, round-robin, least-pending,
           when-congested-remote, 32K-striping, 64K-striping, 128K-striping, 256K-striping,
           512K-striping and 1M-striping.

           This option is available since DRBD 8.4.1.
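           For example, the following selects the least-pending policy instead of the default
           prefer-local. The choice of policy is only an illustration.

```
disk {
    read-balancing least-pending;   # default: prefer-local
}
```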

       resync-after res-name/volume

           Define that a device should only resynchronize after the specified other device. By
           default, no order between devices is defined, and all devices will resynchronize in
           parallel. Depending on the configuration of the lower-level devices, and the available
           network and disk bandwidth, this can slow down the overall resync process. This option
           can be used to form a chain or tree of dependencies among devices.
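           The following sketch sets up a simple dependency chain. It assumes a second,
           hypothetical resource r1 whose volume 0 should only resynchronize after volume 0 of
           resource r0 from the example configuration file. The device and disk names for r1 are
           hypothetical. The disk section is placed inside the volume section, as shown in the
           section overview.

```
resource r1 {
    volume 0 {
        device    /dev/drbd2;
        disk      /dev/sda8;
        meta-disk internal;
        disk {
            resync-after r0/0;   # wait until r0, volume 0 has finished resyncing
        }
    }
    # on and connection sections omitted for brevity
}
```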

       resync-rate rate

           Define how much bandwidth DRBD may use for resynchronizing. DRBD allows "normal"
           application I/O even during a resync. If the resync takes up too much bandwidth,
           application I/O can become very slow. This parameter helps to avoid that. Note that
           this option only takes effect when the dynamic resync controller is disabled.
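           resync-rate only takes effect when the dynamic resync controller is disabled. The
           following fixed-rate sketch therefore also sets c-plan-ahead to 0 (the controller is
           enabled only by a positive value). The rate of 33M (33 MiB/s) is an arbitrary
           illustrative value.

```
disk {
    c-plan-ahead 0;     # disable the dynamic resync controller
    resync-rate  33M;   # fixed resync bandwidth
}
```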

       rs-discard-granularity byte
           When rs-discard-granularity is set to a non-zero, positive value, DRBD tries to
           perform resync operations in requests of this size. If such a block contains only
           zero bytes on the sync source node, the sync target node issues a discard/trim/unmap
           command for the area.

           The value is constrained by the discard granularity of the backing block device. If
           rs-discard-granularity is not a multiple of the discard granularity of the backing
           block device, DRBD rounds it up. The feature only becomes active if the backing block
           device reads back zeroes after a discard command.

           The default value is 0. This option is available since DRBD 8.4.7.

       discard-zeroes-if-aligned {yes | no}

           There are several aspects to discard/trim/unmap support on linux block devices. Even
           if discard is supported in general, it may fail silently, or may partially ignore
           discard requests. Devices also announce whether reading from unmapped blocks returns
           defined data (usually zeroes), or undefined data (possibly old data, possibly
           garbage).

           If DRBD is backed by devices with differing discard characteristics on different
           nodes, discards may lead to data divergence (old data or garbage left over on one
           backend, zeroes due to unmapped areas on the other backend). Online verify could then
           report large numbers of spurious differences. While this is probably harmless for
           most use cases (fstrim on a file system), DRBD cannot accept it.

           To be safe, we have to disable discard support if our local backend (on a Primary)
           does not support "discard_zeroes_data=true". We also have to translate discards to
           explicit zero-out on the receiving side, unless the receiving side (Secondary)
           supports "discard_zeroes_data=true", thereby allocating areas that were supposed to be
           unmapped.

           There are some devices (notably the LVM/DM thin provisioning) that are capable of
           discard, but announce discard_zeroes_data=false. In the case of DM-thin, discards
           aligned to the chunk size will be unmapped, and reading from unmapped sectors will
           return zeroes. However, unaligned partial head or tail areas of discard requests will
           be silently ignored.

           If we now add a helper to explicitly zero-out these unaligned partial areas, while
           passing on the discard of the aligned full chunks, we effectively achieve
           discard_zeroes_data=true on such devices.

           Setting discard-zeroes-if-aligned to yes will allow DRBD to use discards, and to
           announce discard_zeroes_data=true, even on backends that announce
           discard_zeroes_data=false.

           Setting discard-zeroes-if-aligned to no will cause DRBD to always fall-back to
           zero-out on the receiving side, and to not even announce discard capabilities on the
           Primary, if the respective backend announces discard_zeroes_data=false.

           We used to ignore the discard_zeroes_data setting completely. The default value is
           yes, so as not to break established and expected behaviour. Otherwise, fstrim on
           thin-provisioned LVs could suddenly run out of space instead of freeing up space.

           This option is available since DRBD 8.4.7.
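           The following sketch targets a thin-provisioned backend. It assumes a hypothetical
           discard granularity of 64 KiB on the backing devices. rs-discard-granularity is given
           in bytes.

```
disk {
    rs-discard-granularity    65536;   # 64 KiB; rounded up if not a multiple of the backend's granularity
    discard-zeroes-if-aligned yes;     # the default, stated explicitly
}
```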

   Section global Parameters
       dialog-refresh time

           The DRBD init script can be used to configure and start DRBD devices, which can
           involve waiting for other cluster nodes. While waiting, the init script shows the
           remaining waiting time. The dialog-refresh parameter defines the number of seconds
           between updates of that countdown. The default value is 1; a value of 0 turns off the
           countdown.

       disable-ip-verification
           Normally, DRBD verifies that the IP addresses in the configuration match the host
           names. Use the disable-ip-verification parameter to disable these checks.

       usage-count {yes | no | ask}
           As explained on DRBD's Online Usage Counter[2] web page, DRBD includes a mechanism for
           anonymously counting how many installations are using which versions of DRBD. The
           results are available on the web page for anyone to see.

           This parameter defines whether a cluster node participates in the usage counter; the
           supported values are yes, no, and ask (ask the user, the default).

           We would like to ask users to participate in the online usage counter, as this
           provides us with valuable feedback for steering the development of DRBD.
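           The following global section opts out of the usage counter and updates the init
           script countdown less often. The values are illustrative.

```
global {
    usage-count    no;   # do not take part in the online usage counter
    dialog-refresh 5;    # update the countdown every 5 seconds
}
```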

   Section handlers Parameters
       after-resync-target cmd

           Called on a resync target when a resync finishes and the node's disk state changes
           from Inconsistent to Consistent. This handler can be used for removing the snapshot
           created in the before-resync-target handler.

       before-resync-target cmd

           Called on a resync target before a resync begins. This handler can be used for
           creating a snapshot of the lower-level device for the duration of the resync: if the
           resync source becomes unavailable during a resync, reverting to the snapshot can
           restore a consistent state.

       fence-peer cmd

           Called when a node should fence a resource on a particular peer. The handler should
           not use the same communication path that DRBD uses for talking to the peer.

       unfence-peer cmd

           Called when a node should remove fencing constraints from other nodes.

       initial-split-brain cmd

           Called when DRBD connects to a peer and detects that the peer is in a split-brain
           state with the local node. This handler is also called for split-brain scenarios which
           will be resolved automatically.

       local-io-error cmd

           Called when an I/O error occurs on a lower-level device.

       pri-lost cmd

           The local node is currently primary, but DRBD believes that it should become a sync
           target. The node should give up its primary role.

       pri-lost-after-sb cmd

           The local node is currently primary, but it has lost the after-split-brain auto
           recovery procedure. The node should be abandoned.

       pri-on-incon-degr cmd

           The local node is primary, and neither the local lower-level device nor a lower-level
           device on a peer is up to date. (The primary has no device to read from or to write
           to.)

       split-brain cmd

           DRBD has detected a split-brain situation which could not be resolved automatically.
           Manual recovery is necessary. This handler can be used to call for administrator
           attention.
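           Handlers are often connected to helper scripts shipped with drbd-utils. The paths
           below assume a typical installation under /usr/lib/drbd. Check which scripts your
           package actually provides before using them.

```
handlers {
    before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh";
    after-resync-target  "/usr/lib/drbd/unsnapshot-resync-target-lvm.sh";
    split-brain          "/usr/lib/drbd/notify-split-brain.sh root";   # notify the root user
}
```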

   Section net Parameters
       after-sb-0pri policy
           Define how to react if a split-brain scenario is detected and neither of the two
           nodes is in the primary role. (We detect split-brain scenarios when two nodes connect;
           split-brain decisions are always between two nodes.) The defined policies are:

           disconnect
               No automatic resynchronization; simply disconnect.

           discard-younger-primary,
           discard-older-primary
               Resynchronize from the node which became primary first (discard-younger-primary)
               or last (discard-older-primary). If both nodes became primary independently, the
               discard-least-changes policy is used.

           discard-zero-changes
               If only one of the nodes wrote data since the split-brain situation was detected,
               resynchronize from this node to the other. If both nodes wrote data, disconnect.

           discard-least-changes
               Resynchronize from the node with more modified blocks.

           discard-node-nodename
               Always resynchronize to the named node.

       after-sb-1pri policy
           Define how to react if a split-brain scenario is detected, with one node in primary
           role and one node in secondary role. (We detect split-brain scenarios when two nodes
           connect, so split-brain decisions are always between two nodes.) The defined policies
           are:

           disconnect
               No automatic resynchronization, simply disconnect.

           consensus
               Discard the data on the secondary node if the after-sb-0pri algorithm would also
               discard the data on the secondary node. Otherwise, disconnect.

           violently-as0p
               Always take the decision of the after-sb-0pri algorithm, even if it causes an
               erratic change of the primary's view of the data. This is only useful if a
               single-node file system (i.e., not OCFS2 or GFS) with the allow-two-primaries flag
               is used. This option can cause the primary node to crash, and should not be used.

           discard-secondary
               Discard the data on the secondary node.

           call-pri-lost-after-sb
               Always take the decision of the after-sb-0pri algorithm. If the decision is to
               discard the data on the primary node, call the pri-lost-after-sb handler on the
               primary node.

       after-sb-2pri policy
           Define how to react if a split-brain scenario is detected and both nodes are in
           primary role. (We detect split-brain scenarios when two nodes connect, so split-brain
           decisions are always between two nodes.) The defined policies are:

           disconnect
               No automatic resynchronization, simply disconnect.

           violently-as0p
               See the violently-as0p policy for after-sb-1pri.

           call-pri-lost-after-sb
               Call the pri-lost-after-sb helper program on one of the machines unless that
               machine can demote to secondary. The helper program is expected to reboot the
               machine, which brings the node into a secondary role. Which machine runs the
               helper program is determined by the after-sb-0pri strategy.
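
           As a sketch, the three after-split-brain policies might be combined in a net section
           as follows; the policy choices are only an illustration, not a recommendation for
           every workload:

               net {
                   after-sb-0pri discard-zero-changes;   # if only one node wrote, keep its data
                   after-sb-1pri discard-secondary;      # keep the primary's data
                   after-sb-2pri disconnect;             # leave the decision to the admin
               }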

       allow-two-primaries

           The most common way to configure DRBD devices is to allow only one node to be primary
           (and thus writable) at a time.

           In some scenarios it is preferable to allow two nodes to be primary at once; a
           mechanism outside of DRBD then must make sure that writes to the shared, replicated
           device happen in a coordinated way. This can be done with a shared-storage cluster
           file system like OCFS2 or GFS, or with virtual machine images and a virtual machine
           manager that can migrate virtual machines between physical machines.

           The allow-two-primaries parameter tells DRBD to allow two nodes to be primary at the
           same time. Never enable this option when using a non-distributed file system;
           otherwise, data corruption and node crashes will result!
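
           A minimal sketch of enabling dual-primary mode, assuming a cluster file system such
           as OCFS2 or GFS coordinates access; protocol C is included because synchronous
           replication is the usual companion of dual-primary setups:

               net {
                   protocol C;
                   allow-two-primaries yes;   # only with a cluster file system or VM manager
               }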

       always-asbp
           Normally the automatic after-split-brain policies are only used if the current states
           of the UUIDs do not indicate the presence of a third node.

           With this option you request that the automatic after-split-brain policies be used as
           long as the data sets of the nodes are somehow related. This might cause a full sync
           if the UUIDs indicate the presence of a third node (or if double faults have led to
           unusual UUID sets).

       connect-int time

           As soon as a connection between two nodes is configured with drbdsetup connect, DRBD
           immediately tries to establish the connection. If this fails, DRBD waits for
           connect-int seconds and then repeats. The default value of connect-int is 10 seconds.

       cram-hmac-alg hash-algorithm

           Configure the hash-based message authentication code (HMAC) or secure hash algorithm
           to use for peer authentication. The kernel supports a number of different algorithms,
           some of which may be loadable as kernel modules. See the shash algorithms listed in
           /proc/crypto. By default, cram-hmac-alg is unset. Peer authentication also requires a
           shared-secret to be configured.

       csums-alg hash-algorithm

           Normally, when two nodes resynchronize, the sync target requests a piece of
           out-of-sync data from the sync source, and the sync source sends the data. With many
           usage patterns, a significant number of those blocks will actually be identical.

           When a csums-alg algorithm is specified, the sync target also sends along a hash of
           the data it currently has whenever it requests a piece of out-of-sync data. The sync
           source compares this hash with its own version of the data. If the hashes differ, it
           sends the new data to the sync target; otherwise, it tells the sync target that the
           data are the same. This reduces the network bandwidth required, at the cost of higher
           CPU utilization and possibly increased I/O on the sync target.

           The csums-alg can be set to one of the secure hash algorithms supported by the kernel;
           see the shash algorithms listed in /proc/crypto. By default, csums-alg is unset.

       csums-after-crash-only

           Enabling this option (together with csums-alg, above) makes DRBD use checksum-based
           resync only for the first resync after a primary crash, but not for resyncs after
           later network hiccups.

           In most cases, blocks that are marked as need-to-be-resynced have in fact changed, so
           calculating checksums, and both reading and writing the blocks on the resync target,
           is pure overhead.

           Checksum-based resync pays off mostly after a primary crash, where recovery marks
           larger areas (those covered by the activity log) as need-to-be-resynced, just in
           case. Introduced in 8.4.5.
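
           A sketch combining csums-alg and csums-after-crash-only, assuming the sha256 shash
           algorithm is listed in /proc/crypto on all nodes:

               net {
                   csums-alg sha256;             # assumption: available in /proc/crypto
                   csums-after-crash-only yes;   # checksum-based resync only after a crash
               }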

       data-integrity-alg alg
           DRBD normally relies on the data integrity checks built into the TCP/IP protocol, but
           if a data integrity algorithm is configured, it will additionally use this algorithm
           to make sure that the data received over the network match what the sender has sent.
           If a data integrity error is detected, DRBD will close the network connection and
           reconnect, which will trigger a resync.

           The data-integrity-alg can be set to one of the secure hash algorithms supported by
           the kernel; see the shash algorithms listed in /proc/crypto. By default, this
           mechanism is turned off.

           Because of the CPU overhead involved, we recommend not using this option in
           production environments. Also see the notes on data integrity below.
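
           For a test or troubleshooting setup, the option might be enabled as sketched below,
           assuming sha256 is listed in /proc/crypto on all nodes:

               net {
                   data-integrity-alg sha256;   # diagnostic use; costs CPU on both nodes
               }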

       fencing fencing_policy

           Fencing is a preventive measure to avoid situations where both nodes are primary and
           disconnected. This is also known as a split-brain situation. DRBD supports the
           following fencing policies:

           dont-care
               No fencing actions are taken. This is the default policy.

           resource-only
               If a node becomes a disconnected primary, it tries to fence the peer. This is done
               by calling the fence-peer handler. The handler is supposed to reach the peer over
               an alternative communication path and call 'drbdadm outdate minor' there.

           resource-and-stonith
               If a node becomes a disconnected primary, it freezes all its I/O operations and
               calls its fence-peer handler. The fence-peer handler is supposed to reach the peer
               over an alternative communication path and call 'drbdadm outdate minor' there. If
               it cannot do that, it should stonith the peer. I/O is resumed as soon as the
               situation is resolved. If the fence-peer handler fails, I/O can be resumed
               manually with 'drbdadm resume-io'.
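
           A sketch of a resource using the resource-only policy. The handler path is
           hypothetical; the script behind it must reach the peer over a separate path and
           outdate it there, as described above:

               resource r0 {
                   net {
                       fencing resource-only;
                   }
                   handlers {
                       # hypothetical script path, not shipped with DRBD
                       fence-peer "/usr/local/sbin/example-fence-peer.sh";
                   }
               }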

       ko-count number

           If a secondary node fails to complete a write request in ko-count times the timeout
           parameter, it is excluded from the cluster. The primary node then sets the connection
           to this secondary node to Standalone. To disable this feature, you should explicitly
           set it to 0; defaults may change between versions.
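
           Because ko-count is a multiplier of the timeout parameter (described below, in tenths
           of a second), the effective limit is their product. A sketch with hypothetical values:

               net {
                   timeout 60;    # 60 tenths of a second = 6 seconds
                   ko-count 7;    # peer is excluded after 7 x 6 s = 42 seconds
               }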

       max-buffers number

           Limits the memory usage per DRBD minor device on the receiving side, or for internal
           buffers during resync or online-verify. Unit is PAGE_SIZE, which is 4 KiB on most
           systems. The minimum possible setting is hard-coded to 32 (=128 KiB). These buffers
           are used to hold data blocks while they are written to/read from disk. To avoid
           possible distributed deadlocks on congestion, this setting is used as a throttle
           threshold rather than a hard limit. Once more than max-buffers pages are in use,
           further allocation from this pool is throttled. Increase max-buffers if you cannot
           saturate the I/O backend on the receiving side.
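
           Since the unit is PAGE_SIZE, the memory limit follows from a simple product. A sketch
           with a hypothetical value, assuming 4 KiB pages:

               net {
                   max-buffers 8000;   # 8000 x 4 KiB = 32000 KiB, roughly 31 MiB per device
               }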

       max-epoch-size number

           Define the maximum number of write requests DRBD may issue before issuing a write
           barrier. The default value is 2048, with a minimum of 1 and a maximum of 20000.
           Setting this parameter to a value below 10 is likely to decrease performance.

       on-congestion policy,
       congestion-fill threshold,
       congestion-extents threshold
           By default, DRBD blocks when the TCP send queue is full. This prevents applications
           from generating further write requests until more buffer space becomes available
           again.

           When DRBD is used together with DRBD-proxy, it can be better to use the pull-ahead
           on-congestion policy, which can switch DRBD into ahead/behind mode before the send
           queue is full. DRBD then records the differences between itself and the peer in its
           bitmap, but it no longer replicates them to the peer. When enough buffer space becomes
           available again, the node resynchronizes with the peer and switches back to normal
           replication.

           This has the advantage of not blocking application I/O even when the queues fill up,
           and the disadvantage that peer nodes can fall behind much further. Also, while
           resynchronizing, peer nodes will become inconsistent.

           The available congestion policies are block (the default) and pull-ahead. The
           congestion-fill parameter defines how much data is allowed to be "in flight" in this
           connection. The default value is 0, which disables this mechanism of congestion
           control; the maximum is 10 GiB. The congestion-extents parameter defines how
           many bitmap extents may be active before switching into ahead/behind mode, with the
           same default and limits as the al-extents parameter. The congestion-extents parameter
           is effective only when set to a value smaller than al-extents.

           Ahead/behind mode is available since DRBD 8.3.10.
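
           A sketch of a DRBD-proxy style configuration with hypothetical values. The
           congestion-extents value only takes effect if al-extents is set higher than 2000:

               net {
                   on-congestion pull-ahead;
                   congestion-fill 2G;          # go ahead/behind with 2 GiB in flight
                   congestion-extents 2000;     # or with 2000 active bitmap extents
               }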

       ping-int interval

           When the TCP/IP connection to a peer is idle for more than ping-int seconds, DRBD will
           send a keep-alive packet to make sure that a failed peer or network connection is
           detected reasonably soon. The default value is 10 seconds, with a minimum of 1 and a
           maximum of 120 seconds. The unit is seconds.

       ping-timeout timeout

           Define the timeout for replies to keep-alive packets. If the peer does not reply
           within ping-timeout, DRBD will close and try to reestablish the connection. The
           default value is 0.5 seconds, with a minimum of 0.1 seconds and a maximum of 3
           seconds. The unit is tenths of a second.

       socket-check-timeout timeout
           In setups involving a DRBD-proxy and connections that experience a lot of buffer-bloat,
           it might be necessary to set ping-timeout to an unusually high value. By default, DRBD
           uses the same value to check whether a newly established TCP connection is stable.
           Since the DRBD-proxy is usually located in the same data center, such a long wait
           time may hinder DRBD's connect process.

           In such setups, socket-check-timeout should be set to at least the round-trip time
           between DRBD and DRBD-proxy, which in most cases means a value of 1.

           The default unit is tenths of a second, the default value is 0 (which causes DRBD to
           use the value of ping-timeout instead). Introduced in 8.4.5.
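
           A sketch for a DRBD-proxy setup with a high ping-timeout, using hypothetical values;
           both parameters are given in tenths of a second:

               net {
                   ping-timeout 30;          # 3 seconds, the documented maximum
                   socket-check-timeout 1;   # 0.1 seconds, about the RTT to the local proxy
               }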

       protocol name
           Use the specified protocol on this connection. The supported protocols are:

           A
               Writes to the DRBD device complete as soon as they have reached the local disk and
               the TCP/IP send buffer.

           B
               Writes to the DRBD device complete as soon as they have reached the local disk,
               and all peers have acknowledged the receipt of the write requests.

           C
               Writes to the DRBD device complete as soon as they have reached the local and all
               remote disks.
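
           A minimal fragment selecting synchronous replication:

               net {
                   protocol C;   # complete only when on the local and all remote disks
               }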

       rcvbuf-size size

           Configure the size of the TCP/IP receive buffer. A value of 0 (the default) causes the
           buffer size to adjust dynamically. This parameter usually does not need to be set, but
           it can be set to a value up to 10 MiB. The default unit is bytes.

       rr-conflict policy
           This option helps to solve the cases when the outcome of the resync decision is
           incompatible with the current role assignment in the cluster. The defined policies
           are:

           disconnect
               No automatic resynchronization, simply disconnect.

           violently
               Resync to the primary node is allowed, violating the assumption that data on a
               block device are stable for one of the nodes. Do not use this option; it is
               dangerous.

           call-pri-lost
               Call the pri-lost handler on one of the machines. The handler is expected to
               reboot the machine, which puts it into secondary role.

       shared-secret secret

           Configure the shared secret used for peer authentication. The secret is a string of up
           to 64 characters. Peer authentication also requires the cram-hmac-alg parameter to be
           set.
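
           Peer authentication needs both cram-hmac-alg and shared-secret. A sketch, assuming
           sha256 is listed in /proc/crypto; the secret shown is a placeholder and must be the
           same on all nodes:

               net {
                   cram-hmac-alg sha256;            # assumption: available in /proc/crypto
                   shared-secret "example-secret";  # placeholder, at most 64 characters
               }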

       sndbuf-size size

           Configure the size of the TCP/IP send buffer. Since DRBD 8.0.13 / 8.2.7, a value of 0
           (the default) causes the buffer size to adjust dynamically. Values below 32 KiB are
           harmful to the throughput on this connection. Large buffer sizes can be useful
           especially when protocol A is used over high-latency networks; the maximum value
           supported is 10 MiB.

       tcp-cork
           By default, DRBD uses the TCP_CORK socket option to prevent the kernel from sending
           partial messages; this results in fewer and bigger packets on the network. Some
           network stacks can perform worse with this optimization. On these, the tcp-cork
           parameter can be used to turn this optimization off.

       timeout time

           Define the timeout for replies over the network: if a peer node does not send an
           expected reply within the specified timeout, it is considered dead and the TCP/IP
           connection is closed. The timeout value must be lower than connect-int and lower than
           ping-int. The default is 6 seconds; the value is specified in tenths of a second.
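
           The network timers use different units, which matters when checking that timeout
           stays below connect-int and ping-int. The following sketch spells out the documented
           defaults:

               net {
                   connect-int 10;    # seconds
                   ping-int 10;       # seconds
                   ping-timeout 5;    # tenths of a second: 0.5 seconds
                   timeout 60;        # tenths of a second: 6 seconds, below both 10 s values
               }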

       use-rle

           Each replicated device on a cluster node has a separate bitmap for each of its peer
           devices. The bitmaps are used for tracking the differences between the local and peer
           device: depending on the cluster state, a disk range can be marked as different from
           the peer in the device's bitmap, in the peer device's bitmap, or in both bitmaps. When
           two cluster nodes connect, they exchange each other's bitmaps, and they each compute
           the union of the local and peer bitmap to determine the overall differences.

           Bitmaps of very large devices are also relatively large, but they usually compress
           very well using run-length encoding. This can save time and bandwidth for the bitmap
           transfers.

           The use-rle parameter determines if run-length encoding should be used. It is on by
           default since DRBD 8.4.0.

       verify-alg hash-algorithm
           Online verification (drbdadm verify) computes and compares checksums of disk blocks
           (i.e., hash values) in order to detect if they differ. The verify-alg parameter
           determines which algorithm to use for these checksums. It must be set to one of the
           secure hash algorithms supported by the kernel before online verify can be used; see
           the shash algorithms listed in /proc/crypto.

           We recommend scheduling online verifications regularly during low-load periods, for
           example once a month. Also see the notes on data integrity below.
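
           A sketch, assuming sha256 is listed in /proc/crypto on all nodes and the resource is
           named r0 (a hypothetical name):

               net {
                   verify-alg sha256;   # assumption: available in /proc/crypto
               }

           A verification run can then be started with drbdadm verify r0, for example from a
           monthly cron job during a low-load period.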

   Section on Parameters
       address [address-family] address:port

           Defines the address family, address, and port of a connection endpoint.

           The address families ipv4, ipv6, ssocks (Dolphin Interconnect Solutions' "super
           sockets"), sdp (Infiniband Sockets Direct Protocol), and sci are supported (sci is an
           alias for ssocks). If no address family is specified, ipv4 is assumed. For all address
           families except ipv6, the address is specified in IPv4 address notation (for example,
           1.2.3.4). For ipv6, the address is enclosed in brackets and uses IPv6 address notation
           (for example, [fd01:2345:6789:abcd::1]). The port is always specified as a decimal
           number from 1 to 65535.

           On each host, the port numbers must be unique for each address; ports cannot be
           shared.
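
           A sketch of two connection endpoints using ipv6, with hypothetical host names alice
           and bob and hypothetical addresses:

               on alice {
                   address ipv6 [fd01:2345:6789:abcd::1]:7789;
                   # ipv4 equivalent (family may be omitted): address 192.168.1.10:7789;
               }
               on bob {
                   address ipv6 [fd01:2345:6789:abcd::2]:7789;
               }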

       node-id value

           Defines the unique node identifier for a node in the cluster. Node identifiers are
           used to identify individual nodes in the network protocol, and to assign bitmap slots
           to nodes in the metadata.

           Node identifiers can only be reassigned in a cluster when the cluster is down. It is
           essential that the node identifiers in the configuration and in the device metadata
           are changed consistently on all hosts. To change the metadata, dump the current state
           with drbdmeta dump-md, adjust the bitmap slot assignment, and update the metadata with
           drbdmeta restore-md.

           The node-id parameter exists since DRBD 9. Its value ranges from 0 to 16; there is no
           default.
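
           A sketch of a two-node resource assigning node identifiers, with hypothetical host
           names and addresses:

               resource r0 {
                   on alice {
                       address 192.168.1.10:7789;
                       node-id 0;
                   }
                   on bob {
                       address 192.168.1.11:7789;
                       node-id 1;
                   }
               }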

   Section options Parameters (Resource Options)
       auto-promote bool-value
           A resource must be promoted to primary role before any of its devices can be mounted
           or opened for writing.

           Before DRBD 9, this could only be done explicitly ("drbdadm primary"). Since DRBD 9,
           the auto-promote parameter allows a resource to be promoted to primary role
           automatically when one of its devices is mounted or opened for writing. As soon as all
           devices are unmounted or closed with no more remaining users, the role of the resource
           changes back to secondary.

           Automatic promotion only succeeds if the cluster state allows it (that is, if an
           explicit drbdadm primary command would succeed). Otherwise, mounting or opening the
           device fails as it already did before DRBD 9: the mount(2) system call fails with
           errno set to EROFS (Read-only file system); the open(2) system call fails with errno
           set to EMEDIUMTYPE (wrong medium type).

           Irrespective of the auto-promote parameter, if a device is promoted explicitly
           (drbdadm primary), it also needs to be demoted explicitly (drbdadm secondary).

           The auto-promote parameter is available since DRBD 9.0.0, and defaults to yes.
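
           A fragment that turns automatic promotion off, so that the resource must be promoted
           with drbdadm primary before its devices can be mounted or opened for writing:

               options {
                   auto-promote no;   # the default is yes since DRBD 9.0.0
               }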

       cpu-mask cpu-mask

           Set the CPU affinity mask for DRBD kernel threads. The CPU mask is specified as a
           hexadecimal number. The default value is 0, which lets the scheduler decide which
           kernel threads run on which CPUs. CPU numbers in cpu-mask which do not exist in the
           system are ignored.
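
           For example, to restrict DRBD's kernel threads to CPUs 0 and 1 (assuming bit n of the
           mask selects CPU n, as in the usual Linux affinity masks):

               options {
                   cpu-mask 3;   # hexadecimal 3 = binary 11: CPUs 0 and 1
               }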

       on-no-data-accessible policy
           Determine how to deal with I/O requests when the requested data is not available
           locally or remotely (for example, when all disks have failed). The defined policies
           are:

           io-error
               System calls fail with errno set to EIO.

           suspend-io
               The resource suspends I/O. I/O can be resumed by (re)attaching the lower-level
               device, by connecting to a peer which has access to the data, or by forcing DRBD
               to resume I/O with drbdadm resume-io res. When no data is available, forcing I/O
               to resume will result in the same behavior as the io-error policy.

           This setting is available since DRBD 8.3.9; the default policy is io-error.
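
           A fragment selecting the suspend-io policy; for a hypothetical resource r0, I/O could
           later be forced to resume with drbdadm resume-io r0:

               options {
                   on-no-data-accessible suspend-io;
               }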

       peer-ack-window value

           On each node and for each device, DRBD maintains a bitmap of the differences between
           the local and remote data for each peer device. For example, in a three-node setup
           (nodes A, B, C) each with a single device, every node maintains one bitmap for each of
           its peers.

           When nodes receive write requests, they know how to update the bitmaps for the writing
           node, but not how to update the bitmaps between themselves. In this example, when a
           write request propagates from node A to B and C, nodes B and C know that they have the
           same data as node A, but not whether or not they both have the same data.

           As a remedy, the writing node occasionally sends peer-ack packets to its peers which
           tell them which state they are in relative to each other.

           The peer-ack-window parameter specifies how much data a primary node may send before
           sending a peer-ack packet. A low value causes increased network traffic; a high value
           causes less network traffic but higher memory consumption on secondary nodes and
           higher resync times between the secondary nodes after primary node failures. (Note:
           peer-ack packets may be sent due to other reasons as well, e.g. membership changes or
           expiry of the peer-ack-delay timer.)

           The default value for peer-ack-window is 2 MiB; the default unit is sectors. This
           option is available since 9.0.0.

       peer-ack-delay expiry-time

           If after the last finished write request no new write request gets issued for
           expiry-time, then a peer-ack packet is sent. If a new write request is issued before
           the timer expires, the timer gets reset to expiry-time. (Note: peer-ack packets may be
           sent due to other reasons as well, e.g. membership changes or the peer-ack-window
           option.)

           This parameter may influence resync behavior on remote nodes. Peer nodes need to wait
           until they receive a peer-ack before releasing a lock on an AL-extent. Resync
           operations between peers may need to wait for these locks.

           The default value for peer-ack-delay is 100 milliseconds; the default unit is
           milliseconds. This option is available since 9.0.0.
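
           A sketch that spells out the documented defaults of both peer-ack parameters,
           assuming 512-byte sectors for the conversion:

               options {
                   peer-ack-window 4096;   # 4096 sectors x 512 bytes = 2 MiB
                   peer-ack-delay 100;     # milliseconds
               }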

   Section startup Parameters
       The parameters in this section define the behavior of DRBD at system startup time, in the
       DRBD init script. They have no effect once the system is up and running.

       degr-wfc-timeout timeout

           Define how long to wait until all peers are connected in case the cluster consisted of
           a single node only when the system went down. This parameter is usually set to a value
           smaller than wfc-timeout. The assumption here is that peers which were unreachable
           before a reboot are less likely to be reachable after the reboot, so waiting is
           less likely to help.

           The timeout is specified in seconds. The default value is 0, which stands for an
           infinite timeout. Also see the wfc-timeout parameter.

       outdated-wfc-timeout timeout

           Define how long to wait until all peers are connected if all peers were outdated when
           the system went down. This parameter is usually set to a value smaller than
           wfc-timeout. The assumption here is that an outdated peer cannot have become primary
           in the meantime, so we don't need to wait for it as long as for a node which was alive
           before.

           The timeout is specified in seconds. The default value is 0, which stands for an
           infinite timeout. Also see the wfc-timeout parameter.

       stacked-timeouts
           On stacked devices, the wfc-timeout and degr-wfc-timeout parameters in the
           configuration are usually ignored, and both timeouts are set to twice the connect-int
           timeout. The stacked-timeouts parameter tells DRBD to use the wfc-timeout and
           degr-wfc-timeout parameters as defined in the configuration, even on stacked devices.
           Only use this parameter if the peer of the stacked resource is usually not available,
           or will not become primary. Incorrect use of this parameter can lead to unexpected
           split-brain scenarios.

       wait-after-sb
           This parameter causes DRBD to continue waiting in the init script even when a
           split-brain situation has been detected, and the nodes therefore refuse to connect to
           each other.

       wfc-timeout timeout

           Define how long the init script waits until all peers are connected. This can be
           useful in combination with a cluster manager which cannot manage DRBD resources: when
           the cluster manager starts, the DRBD resources will already be up and running. With a
           more capable cluster manager such as Pacemaker, it makes more sense to let the cluster
           manager control DRBD resources. The timeout is specified in seconds. The default value
           is 0, which stands for an infinite timeout. Also see the degr-wfc-timeout parameter.
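
           A sketch of a startup section with hypothetical values, keeping degr-wfc-timeout and
           outdated-wfc-timeout smaller than wfc-timeout as suggested above; all values are in
           seconds:

               startup {
                   wfc-timeout 120;           # wait up to 2 minutes for all peers
                   degr-wfc-timeout 60;       # if the cluster was degraded at shutdown
                   outdated-wfc-timeout 30;   # if all peers were outdated at shutdown
               }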

   Section volume Parameters
       device /dev/drbdminor-number

           Define the device name and minor number of a replicated block device. This is the
           device that applications are supposed to access; in most cases, the device is not used
           directly, but as a file system. This parameter is required and the standard device
           naming convention is assumed.

           In addition to this device, udev will create /dev/drbd/by-res/resource/volume and
           /dev/drbd/by-disk/lower-level-device symlinks to the device.

       disk {[disk] | none}

           Define the lower-level block device that DRBD will use for storing the actual data.
           While the replicated drbd device is configured, the lower-level device must not be
           used directly. Even read-only access with tools like dumpe2fs(8) and similar is not
           allowed. The keyword none specifies that no lower-level block device is configured;
           this also overrides inheritance of the lower-level device.

       meta-disk internal,
       meta-disk device,
       meta-disk device [index]

           Define where the metadata of a replicated block device resides: it can be internal,
           meaning that the lower-level device contains both the data and the metadata, or on a
           separate device.

           When the index form of this parameter is used, multiple replicated devices can share
           the same metadata device, each using a separate index. Each index occupies 128 MiB of
           data, which corresponds to a replicated device size of at most 4 TiB with two cluster
           nodes. We no longer recommend sharing metadata devices; instead, use the LVM volume
           manager to create metadata devices as needed.

           When the index form of this parameter is not used, the size of the lower-level device
           determines the size of the metadata. The size needed is 36 KiB + (size of lower-level
           device) / 32K * (number of nodes - 1). If the metadata device is bigger than that, the
           extra space is not used.
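
           As a worked example of this formula, a 1 TiB lower-level device (1 TiB / 32K =
           32 MiB) needs:

               two nodes:    36 KiB + 32 MiB * (2 - 1) = 32804 KiB (about 32 MiB)
               three nodes:  36 KiB + 32 MiB * (3 - 1) = 65572 KiB (about 64 MiB)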

           This parameter is required if a disk other than none is specified, and ignored if disk
           is set to none. A meta-disk parameter without a disk parameter is not allowed.
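
           A sketch of a volume definition using device, disk, and internal metadata; the
           resource name r0 and the logical volume path are hypothetical:

               resource r0 {
                   volume 0 {
                       device /dev/drbd1;
                       disk /dev/vg0/r0-data;     # hypothetical lower-level device
                       meta-disk internal;
                   }
               }

           With this definition, udev would also create the symlink /dev/drbd/by-res/r0/0.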

NOTES ON DATA INTEGRITY

       DRBD supports two different mechanisms for data integrity checking: first, the
       data-integrity-alg network parameter allows adding a checksum to the data sent over the
       network. Second, the online verification mechanism (drbdadm verify and the verify-alg
       parameter) allows to check for differences in the on-disk data.

       Both mechanisms can produce false positives if the data is modified during I/O (i.e.,
       while it is being sent over the network or written to disk). This does not always indicate
       a problem: for example, some file systems and applications do modify data under I/O for
       certain operations. Swap space can also undergo changes while under I/O.

       Network data integrity checking tries to identify data modification during I/O by
       verifying the checksums on the sender side after sending the data. If it detects a
       mismatch, it logs an error. The receiver also logs an error when it detects a mismatch.
       Thus, an error logged only on the receiver side indicates an error on the network, and an
       error logged on both sides indicates data modification under I/O.

       One example of systematic data corruption was traced to a bug in the TCP offloading
       engine and driver of a certain type of GBit NIC in 2007: the data corruption happened on
       the DMA transfer from core memory to the card. Because the TCP checksums were calculated
       on the card, the TCP/IP protocol checksums did not reveal this problem.

VERSION

       This document was revised for version 9.0.0 of the DRBD distribution.

AUTHOR

       Written by Philipp Reisner <philipp.reisner@linbit.com> and Lars Ellenberg
       <lars.ellenberg@linbit.com>.

REPORTING BUGS

       Report bugs to <drbd-user@lists.linbit.com>.

COPYRIGHT

       Copyright 2001-2012 LINBIT Information Technologies, Philipp Reisner, Lars Ellenberg. This
       is free software; see the source for copying conditions. There is NO warranty; not even
       for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

SEE ALSO

       drbd(8), drbddisk(8), drbdsetup(8), drbdadm(8), DRBD User's Guide[1], DRBD Web Site[3]

NOTES

        1. DRBD User's Guide
           http://www.drbd.org/users-guide/

        2. Online Usage Counter
           http://usage.drbd.org

        3. DRBD Web Site
           http://www.drbd.org/