Provided by: cman_2.20080826-0ubuntu1_i386

NAME

       cman_tool - Cluster Management Tool

SYNOPSIS

       cman_tool  join  |  leave  | kill | expected | votes | version | wait |
       status | nodes | services | debug [options]

DESCRIPTION

       cman_tool is a program that manages the  cluster  management  subsystem
       CMAN.  cman_tool  can  be used to join the node to a cluster, leave the
       cluster, kill another cluster node or  change  the  value  of  expected
       votes of a cluster.
       Be  careful that you understand the consequences of the commands issued
       via cman_tool as they can affect all nodes in your cluster. Most of the
       time  the cman_tool will only be invoked from your startup and shutdown
       scripts.

SUBCOMMANDS

       join   This is the main use of cman_tool. It instructs the cluster
              manager to attempt to join an existing cluster or (if none
              exists) to form a new one on its own.
              If no options are given to this command then it  will  take  the
              cluster configuration information from cluster.conf. However, it
              is possible to provide all the information on  the  command-line
              or to override cluster.conf values by using the command line.

       leave  Tells CMAN to leave the cluster. You cannot do this if there are
              subsystems (e.g. DLM, GFS) active. You should unmount all GFS
              filesystems, shut down CLVM, fenced and anything else using the
              cluster manager before using cman_tool leave. Look at
              ’cman_tool status’ and group_tool to see how many (and which)
              subsystems are active.
              When a node leaves the cluster, the remaining nodes  recalculate
              quorum  and  this  may  block  cluster  activity if the required
              number of votes is not present.  If this node is to be down  for
              an  extended  period  of  time  and you need to keep the cluster
              running, add the remove option, and  the  remaining  nodes  will
              recalculate quorum such that activity can continue.

       kill   Tells  CMAN to kill another node in the cluster. This will cause
              the local node to send a "KILL" message to that node and it will
              shut down.  Recovery will occur for the killed node as if it had
              failed.  This is a sort of remote version of "leave force", so
              only use it if you really know what you are doing.

       expected
              Tells  CMAN  a  new  value of expected votes and instructs it to
              recalculate quorum based on this value.
              Use this option if your cluster has lost  quorum  due  to  nodes
              failing and you need to get it running again in a hurry.

       version
              Used  alone  this will report the major, minor, patch and config
              versions used by CMAN (also displayed in ’cman_tool status’). It
              can  also  be  used  with  -r to set a new config version on all
              cluster members.

       wait   Waits until the node  is  a  member  of  the  cluster  and  then
              returns.

       status Displays the local view of the cluster status.

       nodes  Displays the local view of the cluster nodes.

       services
              Displays  the  local  view of subsystems using cman (deprecated,
              group_tool should be used instead).

       debug  Sets the debug level of the running cman daemon. Debug output
              will be sent to syslog level LOG_DEBUG. The -d switch specifies
              the new logging level. This is the same bitmask used for
              cman_tool join -d.

LEAVE OPTIONS

       -w     Normally,  "cman_tool  leave"  will  fail  if  the cluster is in
              transition (i.e. another node is joining or leaving the cluster).
              By  adding  the -w flag, cman_tool will wait and retry the leave
              operation repeatedly until it succeeds or a more  serious  error
              occurs.

       -t <seconds>
              If  -w  is also specified then -t dictates the maximum amount of
              time cman_tool is prepared to wait. If the operation  times  out
              then a status of 2 is returned.

       force  Shuts  down the cluster manager without first telling any of the
              subsystems to close down. Use this option with extreme  care  as
              it could easily cause data loss.

       remove Tells  the  rest  of the cluster to recalculate quorum such that
              activity can continue without this node.
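
       As a sketch of how -w and -t combine, the wrapper below waits a
       bounded time for a clean leave and reports a timeout separately. The
       function name leave_cluster is hypothetical and not part of
       cman_tool; only the -w and -t flags and exit status 2 come from the
       text above.

```shell
# Hypothetical helper: leave the cluster, retrying while it is in
# transition. $1 is the maximum number of seconds to wait (-t).
leave_cluster() {
    rc=0
    cman_tool leave -w -t "$1" || rc=$?
    if [ "$rc" -eq 2 ]; then
        # Status 2 is the documented timeout status.
        echo "leave timed out after $1 seconds" >&2
    fi
    return "$rc"
}
```

       A shutdown script could call "leave_cluster 30" and decide for itself
       whether a non-zero status should block the rest of the shutdown.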

EXPECTED OPTIONS

       -e <expected-votes>
              The new value of expected votes to use.  This  will  usually  be
              enough  to  bring  the  cluster  back to life. Values that would
              cause incorrect quorum will be rejected.
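
       The arithmetic behind expected votes can be sketched as follows. It
       assumes quorum is the smallest whole number of votes that is more
       than half of expected votes, as described for the join -e option;
       the function name quorum_for is hypothetical.

```shell
# Hypothetical helper: smallest vote count that is more than half of
# the expected votes given in $1. Shell integer division rounds down.
quorum_for() {
    echo $(( $1 / 2 + 1 ))
}

quorum_for 5   # five expected votes
quorum_for 3   # expected votes lowered to three
```

       Under this assumption five expected votes need three votes present,
       while lowering expected votes to three means two are enough.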

KILL OPTIONS

       -n <nodename>
              The node name of the node to  be  killed.  This  should  be  the
              unqualified node name as it appears in ’cman_tool nodes’.

VERSION OPTIONS

       -r <config_version>
              The new config version. You don’t need to use this when adding a
              new node; the new cman node will tell the rest of the cluster to
              get their latest version of the config file automatically.

WAIT OPTIONS

       -q     Waits until the cluster is quorate before returning.

       -t <seconds>
              Dictates the maximum amount of time cman_tool is prepared to
              wait. If the operation times out then a status of 2 is
              returned.

JOIN OPTIONS

       -c <clustername>
              Provides a text name for  the  cluster.  You  can  have  several
              clusters  on  one  LAN  and they are distinguished by this name.
              Note that the name is hashed to provide a unique number which is
              what  actually distinguishes the cluster, so it is possible that
              two different names can clash. If this happens,  the  node  will
              not  be  allowed  into the existing cluster and you will have to
              pick another name or use a different port number for cluster
              communication.

       -p <port>
              UDP port number used for cluster communication. This defaults to
              5405.

       -v <votes>
              Number of votes this node has in the cluster. Defaults to 1.

       -e <expected votes>
              Number of expected votes for the whole cluster. If different
              nodes provide different values then the highest is used. The
              cluster will only operate when quorum is reached, that is, when
              more than half of the expected votes are available to the
              cluster. The default for this value is the total number of
              votes for all nodes in the configuration file.

       -2     Sets the cluster up for a special "two node only" mode. Because
              of the quorum requirements mentioned above, an ordinary two-node
              cluster would lose quorum as soon as either node failed. This
              option tells the cluster manager that there will only ever be
              two nodes in the cluster and relies on fencing to ensure cluster
              integrity. If you specify this you cannot add more nodes
              without taking down the existing cluster and reconfiguring it.
              Expected votes should be set to 1 for a two-node cluster.

       -n <nodename>
              Overrides the node name. By default the unqualified hostname  is
              used.  This  option  is  also used to specify which interface is
              used for cluster communication.

       -N <nodeid>
              Overrides the  node  ID  for  this  node.  Normally,  nodes  are
              assigned  a node id in cluster.conf. If you specify an incorrect
              node ID here, the node might not be allowed to join the cluster.
              Setting  node IDs in the configuration is a far better way to do
              this.   Note that the node’s application to join the cluster may
              be rejected if you try to set the nodeid to one that has already
              been used, or if the node was previously a member of the cluster
              but with a different nodeid.

       -o <nodename>
              Override  the name this node will have in the cluster. This will
              normally be the hostname or the  first  name  specified  by  -n.
              Note  how  this  differs from -n: -n tells cman_tool how to find
              the host address and/or the entry in the configuration file.  -o
              simply  changes  the  name the node will have in the cluster and
              has no bearing on the actual name of the machine. Use this
              option with extreme caution.

       -m <multicast-address>
              Specifies  a multicast address to use for cluster communication.
              This is required for IPv6 operation. You should also specify  an
              ethernet  interface  to bind to this multicast address using the
              -i option.

       -w     Join and wait until the node is a cluster member.

       -q     Join and wait until the cluster is quorate.  If the cluster join
              fails and -w (or -q) is specified, then it will be retried. Note
              that cman_tool cannot tell whether the cluster join was rejected
              by  another node for a good reason or that it timed out for some
              benign reason; so it is strongly recommended that a  timeout  is
              also given with the wait options to join. If you don’t want join
              to retry on failure but do want to wait, use the cman_tool  join
              command without -w followed by cman_tool wait.

       -k <keyfile>
              All  traffic  sent  out by cman/openais is encrypted. By default
              the security key used is simply the cluster name.  If  you  need
              more  security  you can specify a key file that contains the key
              used to encrypt cluster communications.  Of course, the contents
              of the key file must be the same on all nodes in the cluster. It
              is up to you to securely copy the file to the nodes.

       -t <seconds>
              If -w or -q is also  specified  then  -t  dictates  the  maximum
              amount  of  time cman_tool is prepared to wait. If the operation
              times out then a status  of  2  is  returned.   Note  that  just
              because  cman_tool  has given up, does not mean that cman itself
              has stopped trying to join a cluster.

       -X     Tells cman not to use the  configuration  file  to  get  cluster
              information. If you use this option then cman will apply several
              defaults to the cluster to get it going. The cluster  name  will
              be  "RHCluster",  node IDs will default to the IP address of the
              node and remote node names will show up as Node<nodeid>. All  of
              these,  apart  from  the  node  names  can  be overridden on the
              cman_tool command-line if required.
              If you have to set up fence devices, services or  anything  else
              in  cluster.conf  then this option is probably not worthwhile to
              you - the extra readability of sensible node names  and  numbers
              will  make  it worth using cluster.conf for the cluster too. But
              for a simple failover cluster this might save you some effort.
              On each node using this configuration you will need to have  the
              same authorization key installed. To create this key run
              mkdir /etc/ais
              ais-keygen
              mv /etc/ais/authkey /etc/cluster/cman_authkey
              then copy that file to all nodes you want to join the cluster.

       -C     Overrides  the  default  configuration module. Usually cman uses
              ccsd to load its configuration. If you have  your  configuration
              database  held  elsewhere  (eg  LDAP)  and  have a configuration
              plugin for it, then you should specify the name  of  the  module
              (see  the documentation for the module for the name of it - it’s
              not necessarily the same as the filename) here.
              It is possible to chain configuration modules by separating them
              with  colons.  So  to  add  two  modules  (eg)  ’ldapconfig’ and
              ’ldappreproc’   to   the    chain    start    cman    with    -C
              ldapconfig:ldappreproc
              The default value for this is ’ccsconfig’. Note that if -X is
              given on the command line then -C will be ignored.

       -A     Don’t load openais services. Normally cman_tool join  will  load
              the  configuration module ’openaisserviceenable’ which will load
              the services installed by openais.  If you  don’t  want  to  use
              these  services  or  have not installed openais then this switch
              will disable them.
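
       The advice given under -q (join without -w, then use cman_tool wait)
       can be sketched as a startup helper. The function name join_and_wait
       is hypothetical, and the sketch assumes cman_tool returns 0 on
       success; the flags and exit status 2 are those documented above.

```shell
# Hypothetical startup helper: join once (no automatic retry), then
# wait up to $1 seconds for the cluster to become quorate.
join_and_wait() {
    cman_tool join || return $?
    rc=0
    cman_tool wait -q -t "$1" || rc=$?
    if [ "$rc" -eq 2 ]; then
        # cman_tool gave up waiting; cman itself may still be trying.
        echo "not quorate after $1 seconds" >&2
    fi
    return "$rc"
}
```

       Because the join is issued only once, a rejected join surfaces as an
       error straight away rather than being retried until the timeout.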

NODES OPTIONS

       -f     Shows the date/time the node was last fenced (if it has been
              fenced), and also the fence system that was used.

       -a     Shows the IP address(es) the nodes are communicating on.

       -n <nodename>
              Shows  node  information for a specific node. This should be the
              unqualified node name as it appears in ’cman_tool nodes’.

       -F <format>
              Specify the format of the output. The format string may  contain
              one  or  more  format  options, each separated by a comma. Valid
              format options include: id, name, type, and addr.

DEBUG OPTIONS

       -d <value>
              The value is a bitmask of
              2 Barriers
              4 Membership messages
              8 Daemon operation, including command-line interaction
              16 Interaction with OpenAIS
              32 Startup debugging (cman_tool join operations only)

NOTES

       The nodes subcommand shows a list of nodes known to cman. The state is
       one of the following:
       M    The node is a member of the cluster
       X    The node is not a member of the cluster
       d    The node is known to the cluster but disallowed access to it.

ENVIRONMENT VARIABLES

       cman_tool removes most environment variables before forking and running
       OpenAIS, and adds some of its own to pass on configuration parameters
       that were overridden on the command line. The exception is that
       variables with names starting with COROSYNC_ are passed down intact, as
       they are assumed to be used for configuring the daemon.

DISALLOWED NODES

       Occasionally (but very infrequently I hope) you may see nodes marked as
       "Disallowed" in cman_tool status or "d" in cman_tool nodes.  This is  a
       bit of a nasty hack to get around a mismatch between what the upper
       layers expect of the cluster manager and what OpenAIS provides.

       If a node experiences a momentary lack of connectivity, but one that is
       long enough to trigger the token timeouts, then it will be removed from
       the cluster. When connectivity is restored OpenAIS will happily let  it
       rejoin the cluster with no fuss. Sadly the upper layers don’t like this
       very much. They may have (indeed probably will have) changed their
       internal  state  while  the  other  node  was  away  and  there  is  no
       straightforward way to bring the rejoined  node  up-to-date  with  that
       state.  When  this  happens  the node is marked "Disallowed" and is not
       permitted to take part in cman operations.

       If the remainder of the cluster is quorate then the node will be sent a
       kill  message and it will be forced to leave the cluster that way. Note
       that fencing should kick in to remove the node permanently anyway,  but
       it may take longer than the network outage for this to complete.

       If  the  remainder  of the cluster is inquorate then we have a problem.
       The likelihood is that we will have two (or more) partitioned  clusters
       and  we cannot decide which is the "right" one. In this case we need to
       defer to the system administrator to kill an appropriate  selection  of
       nodes to restore the cluster to sensible operation.

       The  latter  scenario  should  be  very  rare  and  may  indicate a bug
       somewhere in the code. If the local network is very flaky  or  busy  it
       may be necessary to increase some of the protocol timeouts for OpenAIS.
       We are trying to think of better solutions to this problem.

       Recovering  from  this  state  can,  unfortunately,   be   complicated.
       Fortunately, in the majority of cases, fencing will do the job for you,
       and the disallowed state will only be temporary. If it persists, the
       recommended approach is to run cman_tool nodes on all systems in
       the cluster and determine the largest common subset of nodes that are
       valid members to each other. Then reboot the others and let them rejoin
       correctly. In the case of a single-node disconnection this should be
       straightforward; with a large cluster that has experienced a network
       partition it could get very complicated!

       Example:

       In this example we have a five node  cluster  that  has  experienced  a
       network  partition.  Here  is  the  output  of cman_tool nodes from all
       systems:
       Node  Sts   Inc   Joined               Name
          1   M   2372   2007-11-05 02:58:55  node-01.example.com
          2   d   2376   2007-11-05 02:58:56  node-02.example.com
          3   d   2376   2007-11-05 02:58:56  node-03.example.com
          4   M   2376   2007-11-05 02:58:56  node-04.example.com
          5   M   2376   2007-11-05 02:58:56  node-05.example.com

       Node  Sts   Inc   Joined               Name
          1   d   2372   2007-11-05 02:58:55  node-01.example.com
          2   M   2376   2007-11-05 02:58:56  node-02.example.com
          3   M   2376   2007-11-05 02:58:56  node-03.example.com
          4   d   2376   2007-11-05 02:58:56  node-04.example.com
          5   d   2376   2007-11-05 02:58:56  node-05.example.com

       Node  Sts   Inc   Joined               Name
          1   d   2372   2007-11-05 02:58:55  node-01.example.com
          2   M   2376   2007-11-05 02:58:56  node-02.example.com
          3   M   2376   2007-11-05 02:58:56  node-03.example.com
          4   d   2376   2007-11-05 02:58:56  node-04.example.com
          5   d   2376   2007-11-05 02:58:56  node-05.example.com

       Node  Sts   Inc   Joined               Name
          1   M   2372   2007-11-05 02:58:55  node-01.example.com
          2   d   2376   2007-11-05 02:58:56  node-02.example.com
          3   d   2376   2007-11-05 02:58:56  node-03.example.com
          4   M   2376   2007-11-05 02:58:56  node-04.example.com
          5   M   2376   2007-11-05 02:58:56  node-05.example.com

       Node  Sts   Inc   Joined               Name
          1   M   2372   2007-11-05 02:58:55  node-01.example.com
          2   d   2376   2007-11-05 02:58:56  node-02.example.com
          3   d   2376   2007-11-05 02:58:56  node-03.example.com
          4   M   2376   2007-11-05 02:58:56  node-04.example.com
          5   M   2376   2007-11-05 02:58:56  node-05.example.com

       In this scenario we should kill nodes node-02 and node-03. Of
       course, the 3 node cluster of node-01, node-04 & node-05 should remain
       quorate and be able to fence the two rejoined nodes anyway, but it is
       possible that the cluster has a qdisk setup that precludes this.
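
       The comparison of views can be partly automated. The sketch below is
       a simplification that assumes a clean partition, where every node in
       a group reports exactly the same set of members; it does not handle
       ties or overlapping views. The function name largest_view and the
       idea of saving each node's output to a file are assumptions made for
       illustration.

```shell
# Hypothetical helper: each argument is a file holding one node's
# "cman_tool nodes" output. Prints the membership view reported by the
# most nodes, preceded by the number of nodes that reported it.
largest_view() {
    for f in "$@"; do
        # Reduce each file to one line: the sorted names it marks "M".
        awk '$2 == "M" { print $NF }' "$f" | sort | tr '\n' ' '
        echo
    done | sort | uniq -c | sort -rn | head -n 1
}
```

       Run against the five outputs above, the line printed would carry a
       count of 3 followed by node-01, node-04 and node-05, which matches
       the conclusion that node-02 and node-03 are the ones to kill.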

CONFIGURATION SYSTEMS

       This  section  details  how the configuration systems work in cman. You
       might need to know this if you are using the -C option to cman_tool, or
       writing your own configuration subsystem.
       By  default  cman uses two configuration plugins to OpenAIS. The first,
       ’ccsconfig’, reads the configuration information stored in cluster.conf
       and  stores  it in an internal database, in the same schema as it finds
       in cluster.conf.  The second plugin, ’cmanpreconfig’, takes the
       information in that database, adds several cman defaults,
       determines the OpenAIS node name and nodeID and formats the information
       in  a  similar manner to openais.conf(5). OpenAIS then reads those keys
       to start  the  cluster  protocol.   cmanpreconfig  also  reads  several
       environment variables that might be set by cman_tool which can override
       information in the configuration.
       In the absence of ccsconfig, i.e. when ’cman_tool join’ is run with the
       -X switch (this removes ccsconfig from the module list), cmanpreconfig
       also generates several defaults so that the cluster can be started
       without any configuration information; see the -X entry under JOIN
       OPTIONS for the details.
       Note  that  cmanpreconfig  will  not  overwrite  OpenAIS  keys that are
       explicitly set in the  configuration  file,  allowing  you  to  provide
       custom  values  for  token  timeouts  etc, even though cman has its own
       defaults for some of those values. The exception to this  is  the  node
       name/address and multicast values, which are always taken from the cman
       configuration keys.