Provided by: ocfs2-tools_1.8.8-2_amd64

NAME

       o2cb - Default cluster stack of the OCFS2 file system.

DESCRIPTION

       o2cb is the default cluster stack of the OCFS2 file system. It is an in-kernel cluster
       stack that includes a node manager (o2nm) to keep track of the nodes in the cluster, a
       disk heartbeat agent (o2hb) to detect node liveness, a network agent (o2net) for
       intra-cluster node communication, and a distributed lock manager (o2dlm) to keep track
       of lock resources. It also includes a synthetic file system, dlmfs, that allows
       applications to access the in-kernel dlm.

CONFIGURATION

       The stack is configured using the  o2cb(8)  cluster  configuration  utility  and  operated
       (online/offline/status) using the o2cb init service.

       CLUSTER CONFIGURATION

              The stack has two configuration files: one for the cluster layout
              (/etc/ocfs2/cluster.conf) and the other for the cluster timeouts and related
              settings (/etc/sysconfig/o2cb). More information about these two files can be
              found in ocfs2.cluster.conf(5) and o2cb.sysconfig(5).

              The o2cb cluster stack supports two heartbeat  modes,  namely,  local  and  global.
              Only one heartbeat mode can be active at any one time.

              Local heartbeat refers to disk heartbeating on all shared devices. In this mode,
              the heartbeat is started during mount and stopped during umount. This mode is easy
              to set up as it does not require configuring heartbeat devices. Its one drawback
              is the overhead on servers that have a large number of OCFS2 mounts. For example,
              a server with 50 mounts will have 50 heartbeat threads. This is the default
              heartbeat mode.

              Global heartbeat, on the other hand, refers  to  heartbeating  on  specific  shared
              devices.   These  devices  are  normal  OCFS2  formatted volumes that could also be
              mounted and used as clustered file systems. In this mode, the heartbeat is  started
              during  cluster  online  and stopped during cluster offline. While this mode can be
              used for all clusters, it is strongly  recommended  for  clusters  having  a  large
              number of mounts.

              More information on disk heartbeat is provided below.

       KERNEL CONFIGURATION

              Two  sysctl  values  need  to  be  set  for  o2cb  to function properly. The first,
              panic_on_oops, must be enabled to turn a kernel oops into  a  panic.  If  a  kernel
              thread required for o2cb to function crashes, the system must be reset to prevent a
              cluster hang. If it is not set, another node may not be able to distinguish whether
              a node is unable to respond or slow to respond.

              The  other related sysctl parameter is panic, which specifies the number of seconds
              after a panic that the system will be auto-reset. Setting this  parameter  to  zero
              disables  autoreset;  the  cluster  will  require  manual intervention. This is not
              preferred in a cluster environment.

              To manually enable panic on oops and set a 30-second timeout for reboot on panic, do:

              # echo 1 > /proc/sys/kernel/panic_on_oops
              # echo 30 > /proc/sys/kernel/panic

              To enable the above on every boot, add the following to /etc/sysctl.conf:

              kernel.panic_on_oops = 1
              kernel.panic = 30
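
              The settings above can be verified with a small check. The following is a
              minimal sketch, not part of ocfs2-tools: the function name check_o2cb_sysctls
              and its optional directory argument are hypothetical, and it assumes the two
              values are readable as plain files under /proc/sys/kernel. It flags anything
              other than a positive integer for panic.

```shell
# check_o2cb_sysctls [DIR]: verify the two sysctl values o2cb relies on.
# DIR defaults to /proc/sys/kernel; the argument exists only to ease testing.
check_o2cb_sysctls() {
    sysctl_dir="${1:-/proc/sys/kernel}"
    oops_val=$(cat "$sysctl_dir/panic_on_oops" 2>/dev/null) || oops_val=""
    panic_val=$(cat "$sysctl_dir/panic" 2>/dev/null) || panic_val=""
    check_status=0

    # panic_on_oops must be 1 so that a kernel oops becomes a panic.
    if [ "$oops_val" != "1" ]; then
        echo "panic_on_oops is '$oops_val', expected 1"
        check_status=1
    fi

    # panic must be a positive number of seconds; 0 disables auto-reset.
    case "$panic_val" in
        ''|*[!0-9]*)
            echo "panic is '$panic_val', expected a positive number of seconds"
            check_status=1
            ;;
        0)
            echo "panic is 0, auto-reset is disabled"
            check_status=1
            ;;
    esac

    if [ "$check_status" -eq 0 ]; then
        echo "ok"
    fi
    return "$check_status"
}
```

              Calling check_o2cb_sysctls with no argument inspects the running kernel. It
              prints "ok" only when both values are set as described above.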

       OS CONFIGURATION

              The o2cb cluster stack also requires iptables (firewalling) to be  either  disabled
              or  modified  to  allow  network traffic on the private network interface. The port
              used by o2cb is specified in /etc/ocfs2/cluster.conf.
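
              To find the port that the firewall has to allow, the value can be read from
              the configuration file. This is a minimal sketch: the function name o2cb_ports
              is hypothetical, and it assumes the node stanzas use "ip_port = <number>"
              lines in the layout described in ocfs2.cluster.conf(5).

```shell
# o2cb_ports [CONF]: print each distinct ip_port value in a cluster.conf-style file.
# CONF defaults to /etc/ocfs2/cluster.conf.
o2cb_ports() {
    conf_file="${1:-/etc/ocfs2/cluster.conf}"
    # Keep only lines of the form "ip_port = 7777", reduce them to the number,
    # then sort numerically and drop duplicates.
    sed -n 's/^[[:space:]]*ip_port[[:space:]]*=[[:space:]]*\([0-9][0-9]*\)[[:space:]]*$/\1/p' \
        "$conf_file" | sort -un
}
```

              Each port printed would then need to be reachable over TCP on the private
              network interface; the exact firewall rule depends on the site's setup.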

DISK HEARTBEAT

       O2CB uses disk heartbeat to detect node liveness. The disk heartbeat thread, o2hb,
       periodically reads and writes to a heartbeat file in an OCFS2 file system. Its write
       payload contains a sequence number that it increments with each write. This allows other
       nodes reading the same heartbeat file to detect the change and associate it with a live
       node. Conversely, a node whose sequence number has stopped changing is marked as a
       possibly dead node. It is only possibly dead, not confirmed dead, because the cause
       could simply be slow I/O.

       To differentiate between a dead node and one that has slow I/Os, O2CB has a disk heartbeat
       threshold (timeout). Only nodes  whose  sequence  number  has  not  incremented  for  that
       duration are marked dead.
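
       The threshold is reported as a count of heartbeat iterations rather than in seconds
       (for example, "Heartbeat dead threshold: 62" in the status output shown under GETTING
       STARTED). The sketch below converts between the two, assuming the formula given in
       o2cb.sysconfig(5), threshold = (timeout in seconds / 2) + 1, which corresponds to one
       iteration every two seconds. The function names are hypothetical.

```shell
# hb_threshold_to_seconds THRESHOLD: timeout in seconds for a given threshold.
hb_threshold_to_seconds() {
    echo $(( ($1 - 1) * 2 ))
}

# hb_seconds_to_threshold SECONDS: threshold for a desired timeout in seconds.
hb_seconds_to_threshold() {
    echo $(( $1 / 2 + 1 ))
}
```

       Under that assumption, a threshold of 62 corresponds to a 122-second timeout, and a
       60-second timeout corresponds to a threshold of 31.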

       However, a node marked dead this way may still be alive and merely experiencing slow
       I/O. To guard against that, the heartbeat thread keeps track of the time elapsed since
       its last completed write. If that time exceeds the timeout, the node forces a
       self-fence, so that other nodes never mark it as dead while it is still running.

       This self-fencing scheme has proven to be very reliable as it relies on kernel timers and
       PCI bus reset. External fencing, while attractive, is rarely as reliable because it
       depends on external hardware and software that are prone to failure due to
       misconfiguration, etc.

       Having said that, O2CB disk heartbeat has had its share of problems with self-fencing.
       A node experiencing slow I/O on only one of multiple devices still has to initiate a
       self-fence.

       This  is  because  in  the  default  local heartbeat scheme, nodes in a cluster may not be
       heartbeating on the same set of devices.

       The global heartbeat mode addresses this shortcoming by introducing a scheme  that  forces
       all  nodes  to heartbeat on the same set of devices. In this scheme, a node experiencing a
       slowdown in I/O on a device may not need to initiate self-fence. It will only have  to  do
       so if it encounters slowdown on 50% or more of the heartbeat devices.  In a cluster with 3
       heartbeat regions, a slowdown in 1 region will be tolerated. In a cluster with 5  regions,
       a slowdown in 2 will be tolerated.
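
       The 50% rule above reduces to simple integer arithmetic: a node survives as long as
       strictly fewer than half of the heartbeat regions are slow. The following sketch
       computes that number; the function name tolerated_slow_regions is hypothetical and the
       code only illustrates the arithmetic, not how o2hb implements it.

```shell
# tolerated_slow_regions COUNT: largest number of slow heartbeat regions
# that stays strictly below 50% of COUNT configured regions.
tolerated_slow_regions() {
    region_count="$1"
    echo $(( (region_count - 1) / 2 ))
}
```

       For 3 regions this yields 1 and for 5 regions it yields 2, matching the examples
       above. For 4 regions it also yields 1, since a slowdown in 2 of 4 regions already
       reaches 50%.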

       For this reason, this mode is recommended for users that have 3 or more OCFS2 mounts.

       O2CB allows up to 32 heartbeat regions to be configured in the global heartbeat mode.

ONLINE CLUSTER MODIFICATION

       The O2CB cluster stack allows adding and removing nodes in an online cluster when  run  in
       the  global  heartbeat  mode.  Use  the  o2cb(8)  utility  to  make  the  changes  in  the
       configuration and (re)online the cluster using the o2cb init script. The user must do  the
       same on all nodes in the cluster. The cluster will not allow any new cluster mounts if the
       node configuration on all nodes is not the same.
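
       Because every node must end up with the same configuration, it can help to compare the
       files before bringing the cluster back online. The following is a minimal sketch that
       assumes copies of each node's /etc/ocfs2/cluster.conf have already been collected into
       local files; the function name configs_match is hypothetical. A byte-for-byte
       comparison is stricter than what the cluster requires, but it is a simple way to catch
       differences.

```shell
# configs_match FILE FILE...: succeed only if every file is identical to the first.
configs_match() {
    if [ "$#" -lt 2 ]; then
        echo "usage: configs_match FILE FILE..." >&2
        return 2
    fi
    ref_file="$1"
    shift
    for other_file in "$@"; do
        # cmp -s is silent and returns non-zero when the files differ.
        if ! cmp -s "$ref_file" "$other_file"; then
            echo "mismatch: $other_file differs from $ref_file"
            return 1
        fi
    done
    echo "all copies match"
}
```

       The function stops at the first copy that differs and names it, so that node can be
       corrected before the cluster is brought online again.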

       Removing a node will only succeed if that node is no longer in use. If the user
       removes an active node from the configuration, the re-online will fail.

       The  cluster stack also allows adding and removing heartbeat regions in an online cluster.
       Use the o2cb(8) utility to make the changes in the configuration file and  (re)online  the
       cluster using the o2cb init script. The user must do the same on all nodes in the cluster.
       The cluster will not allow any new cluster mounts if the heartbeat region configuration on
       all nodes is not the same.

       The removal of heartbeat regions will only succeed if the active heartbeat region count is
       greater than 3. This is to protect  against  edge  conditions  that  can  destabilize  the
       cluster.
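
       The region count can be checked before attempting a removal. This sketch assumes the
       "service o2cb status" output lists each active region as an indented line holding a
       32-character UUID followed by a device path, as in the status example under GETTING
       STARTED. Both function names are hypothetical.

```shell
# count_hb_regions: read "service o2cb status" output on stdin and count the
# indented "<UUID> <device>" lines that describe active heartbeat regions.
count_hb_regions() {
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    grep -Ec '^[[:space:]]+[0-9A-Fa-f]{32}[[:space:]]+/dev/' || true
}

# can_remove_hb_region COUNT: succeed only if COUNT is greater than 3.
can_remove_hb_region() {
    [ "$1" -gt 3 ]
}
```

       A possible use is to pipe the status output into count_hb_regions and pass the result
       to can_remove_hb_region before editing the configuration.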

GETTING STARTED

       The first step in configuring o2cb is deciding whether to set up local or global
       heartbeat. For global heartbeat, one has to format at least one heartbeat device.

       To format an OCFS2 volume with global heartbeat enabled, do:

       # mkfs.ocfs2 --cluster-stack=o2cb --cluster-name=webcluster --global-heartbeat -L "hbvol1" /dev/sdb1

       Once formatted, set up /etc/ocfs2/cluster.conf following the example provided in
       ocfs2.cluster.conf(5).

       For local heartbeat, one can set up cluster.conf without any heartbeat devices. The
       next step is starting the cluster.

       To online the cluster stack, do:

       # service o2cb online
       Loading stack plugin "o2cb": OK
       Loading filesystem "ocfs2_dlmfs": OK
       Mounting ocfs2_dlmfs filesystem at /dlm: OK
       Setting cluster stack "o2cb": OK
       Registering O2CB cluster "webcluster": OK
       Setting O2CB cluster timeouts : OK
       Starting global heartbeat for cluster "webcluster": OK

       Once the cluster stack is online, new OCFS2 volumes  can  be  formatted  normally  without
       specifying  the  cluster  stack  information.  mkfs.ocfs2(8) will pick up that information
       automatically.

       # mkfs.ocfs2 -L "datavol" /dev/sdc1

       Meanwhile, existing volumes can be converted to the new cluster stack using the
       tunefs.ocfs2(8) utility.

       # tunefs.ocfs2 --update-cluster-stack /dev/sdd1
       Updating on-disk cluster information to match the running cluster.
       DANGER: YOU MUST BE ABSOLUTELY SURE THAT NO OTHER NODE IS USING THIS FILESYSTEM
       BEFORE MODIFYING ITS CLUSTER CONFIGURATION.
       Update the on-disk cluster information? y

       Another utility, mounted.ocfs2(8), is useful for listing all the OCFS2 volumes along
       with the cluster stack information.

       To get a list of OCFS2 volumes, do:

       # mounted.ocfs2 -d
       Device     Stack  Cluster     F  UUID                              Label
       /dev/sdb1  o2cb   webcluster  G  DCDA2845177F4D59A0F2DCD8DE507CC3  hbvol1
       /dev/sdc1  None                  23878C320CF3478095D1318CB5C99EED  localmount
       /dev/sdd1  o2cb   webcluster  G  8AB016CD59FC4327A2CDAB69F08518E3  webvol
       /dev/sdg1  o2cb   webcluster  G  77D95EF51C0149D2823674FCC162CF8B  logsvol
       /dev/sdh1  o2cb   webcluster  G  BBA1DBD0F73F449384CE75197D9B7098  scratch
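
       The listing above can be filtered with standard tools. The sketch below prints the
       devices of volumes that use the o2cb stack and carry a G in the F column. It assumes
       the whitespace-separated column layout shown above and that G marks a volume formatted
       with global heartbeat enabled; the function name global_hb_volumes is hypothetical.

```shell
# global_hb_volumes: read "mounted.ocfs2 -d" output on stdin and print the
# device of every volume whose stack is o2cb and whose F column is G.
global_hb_volumes() {
    # NR > 1 skips the header line; fields are device, stack, cluster, flag.
    awk 'NR > 1 && $2 == "o2cb" && $4 == "G" { print $1 }'
}
```

       For example:

       # mounted.ocfs2 -d | global_hb_volumes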

       The o2cb init script can also be used to check the status  of  the  cluster,  offline  the
       cluster, etc.

       To check the status of the cluster stack, do:

       # service o2cb status
       Driver for "configfs": Loaded
       Filesystem "configfs": Mounted
       Stack glue driver: Loaded
       Stack plugin "o2cb": Loaded
       Driver for "ocfs2_dlmfs": Loaded
       Filesystem "ocfs2_dlmfs": Mounted
       Checking O2CB cluster "webcluster": Online
         Heartbeat dead threshold: 62
         Network idle timeout: 60000
         Network keepalive delay: 2000
         Network reconnect delay: 2000
         Heartbeat mode: Global
       Checking O2CB heartbeat: Active
         77D95EF51C0149D2823674FCC162CF8B /dev/sdg1
         DCDA2845177F4D59A0F2DCD8DE507CC3 /dev/sdk1
         BBA1DBD0F73F449384CE75197D9B7098 /dev/sdh1
       Nodes in O2CB cluster: 6 7 10
       Active userdlm domains:  ovm

       To offline and unload the cluster stack, do:

       # service o2cb offline
       Clean userdlm domains: OK
       Stopping global heartbeat on cluster "webcluster": OK
       Stopping O2CB cluster webcluster: OK
       Unregistering O2CB cluster "webcluster": OK

       # service o2cb unload
       Clean userdlm domains: OK
       Unmounting ocfs2_dlmfs filesystem: OK
       Unloading module "ocfs2_dlmfs": OK
       Unloading module "ocfs2_stack_o2cb": OK

SEE ALSO

       o2cb(8) o2cb.sysconfig(5) ocfs2.cluster.conf(5) o2hbmonitor(8)

AUTHORS

       Oracle Corporation

       Copyright © 2004, 2011 Oracle. All rights reserved.