Provided by: corosync_1.3.0-3ubuntu1_i386


       corosync.conf - corosync executive configuration file




       The  corosync.conf  instructs  the  corosync  executive  about  various
       parameters needed to control the corosync executive.  Empty  lines  and
       lines starting with the # character are ignored.  The configuration file
       consists of bracketed top level  directives.   The  possible  directive
       choices are:

       totem { }
              This  top level directive contains configuration options for the
              totem protocol.

       logging { }
              This top level  directive  contains  configuration  options  for
              logging.

       event { }
              This  top level directive contains configuration options for the
              event service.

       It is also possible to specify the top level  parameter  compatibility.
       This  directive  indicates  the level of compatibility requested by the
       user.  The  option  whitetank  can  be  specified  to  remain  backward
       compatible  with  openais-0.80.z.   The option none can be specified to
       only  be  compatible  with  corosync-1.Y.Z.   Extra  processing  during
       configuration changes is required to remain backward compatible.

       The default is whitetank. (backwards compatibility)
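
       As an illustration only, the overall shape of a configuration file  might
       look  like  the  sketch  below.   It assumes the "name: value" syntax and
       brace nesting used by the example  configurations  shipped  with  corosync;
       the values shown are placeholders, not recommendations.

       ```
       # Lines starting with # are ignored.
       compatibility: whitetank

       totem {
               version: 2
               interface {
                       # required interface parameters go here
               }
       }

       logging {
               to_syslog: yes
       }
       ```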

       Within  the totem directive, an interface directive is required.  There
       is also one configuration option which is required, the version of  the
       totem configuration, which is described with the other totem options.

       Within the interface sub-directive of totem there are  four  parameters
       which are required.  There is one parameter which is optional.

       ringnumber
              This  specifies  the  ring number for the interface.  When using
              the redundant  ring  protocol,  each  interface  should  specify
              separate  ring  numbers  to  uniquely identify to the membership
              protocol which interface to use for which  redundant  ring.  The
              ringnumber must start at 0.

       bindnetaddr
              This specifies the network address the corosync executive should
              bind to.  Set bindnetaddr to the network address  of  the  local
              interface,  that  is,  the  interface address with the host bits
              cleared according to its netmask.

              This  may also be an IPv6 address, in which case IPv6 networking
              will be used.  In this case, the full address must be  specified
              and  there  is  no  automatic selection of the network interface
              within a specific subnet as with IPv4.

              If IPv6 networking is used, the nodeid field must be specified.
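
              For the IPv4 case, a hedged illustration:  suppose  the  local
              interface  address  is  192.168.5.92 with netmask 255.255.255.0.
              These addresses, the multicast address, and the  port  are  all
              assumptions  chosen  for  this  sketch.  Clearing the host bits
              gives a bindnetaddr of 192.168.5.0:

              ```
              totem {
                      version: 2
                      interface {
                              ringnumber: 0
                              # 192.168.5.92 AND 255.255.255.0 = 192.168.5.0
                              bindnetaddr: 192.168.5.0
                              mcastaddr: 226.94.1.1
                              mcastport: 5405
                      }
              }
              ```

              With a netmask of 255.255.255.192, the same  host  would  instead
              use 192.168.5.64 (92 AND 192 = 64).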

       broadcast
              This is optional and can be set to yes.  If it is  set  to  yes,
              the  broadcast  address will be used for communication.  If this
              option is set, mcastaddr should not be set.

       mcastaddr
              This is the multicast address used by the  corosync  executive.
              The  default  should  work  for  most networks, but the network
              administrator should be queried about  a  multicast  address  to
              use.   Avoid  224.x.x.x  because  this  is  a "config" multicast
              address.

              This may also be an IPv6 multicast address, in which  case  IPv6
              networking will be used.  If IPv6 networking is used, the nodeid
              field must be specified.

       mcastport
              This specifies the UDP port number.  It is possible to  use  the
              same  multicast  address on a network with the corosync services
              configured for different UDP ports.  Please note  that  corosync
              uses  two UDP ports: mcastport (for multicast receives) and
              mcastport - 1 (for multicast sends).   If  you  have  multiple
              clusters  on  the same network using the same mcastaddr, please
              configure the mcastports with a gap.
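
              A minimal sketch of the port gap, assuming two clusters  sharing
              one  multicast  address.   The  cluster  names, addresses, and
              port numbers are hypothetical:

              ```
              # corosync.conf on the nodes of cluster A
              # uses UDP ports 5405 (receives) and 5404 (sends)
              totem {
                      version: 2
                      interface {
                              ringnumber: 0
                              bindnetaddr: 10.0.0.0
                              mcastaddr: 226.94.1.1
                              mcastport: 5405
                      }
              }

              # corosync.conf on the nodes of cluster B
              # uses UDP ports 5407 (receives) and 5406 (sends)
              totem {
                      version: 2
                      interface {
                              ringnumber: 0
                              bindnetaddr: 10.0.0.0
                              mcastaddr: 226.94.1.1
                              mcastport: 5407
                      }
              }
              ```

              Choosing 5406 for cluster B would collide  with  nothing  on  the
              receive  side, but its send port (5405) would overlap cluster A,
              which is why a gap of at least two is used here.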

       member This specifies a member on the interface and is used  with  the
              udpu  transport  only.   Every node that should be a member of
              the  membership  should  be  specified  as  a  separate  member
              directive.   Within  the  member  directive  there is a parameter
              memberaddr which specifies the IP address of one of the nodes.
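
              As a sketch only, a two node udpu configuration might  look  as
              follows.  The node addresses and port are assumptions made  for
              this example:

              ```
              totem {
                      version: 2
                      transport: udpu
                      interface {
                              ringnumber: 0
                              bindnetaddr: 10.16.35.0
                              mcastport: 5405
                              # one member directive per potential node
                              member {
                                      memberaddr: 10.16.35.115
                              }
                              member {
                                      memberaddr: 10.16.35.116
                              }
                      }
              }
              ```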

       Within the totem directive, there are several configuration options, of
       which one is required, one is required only when IPv6 is configured  in
       the  interface  subdirective, and the rest are optional.  The required
       option controls the version of the totem configuration.   The  nodeid
       option,  which  is  optional  unless  IPv6  is  used,  controls   the
       identification  of  the  processor.   The  remaining  optional  options
       control secrecy  and  authentication,  the  redundant  ring  mode  of
       operation,  the  maximum  network  MTU,  and  the  number  of sending
       threads.

       version
              This specifies the version of the configuration file.  Currently
              the only valid version for this directive is 2.

       nodeid This configuration  option  is  optional  when  using  IPv4  and
              required when using IPv6.  This is a 32 bit value specifying the
              node identifier delivered to the cluster membership service.  If
              this  is not specified with IPv4, the node id will be determined
              from the 32 bit IP address to which  the  system  is  bound  on
              the  ring  with  identifier  0.   The  node identifier value of
              zero is reserved and should not be used.

       clear_node_high_bit
              This configuration option is optional and is only relevant  when
              no  nodeid  is specified.  Some openais clients require a signed
              32 bit nodeid that is greater than zero; however,  by  default
              openais  uses  all  32  bits  of  the  IPv4  address  space when
              generating a nodeid.  Set this option to yes to force  the  high
              bit  to  be  zero  and therefore ensure the nodeid is a positive
              signed 32 bit integer.

              WARNING: The cluster's behavior is undefined if this  option  is
              enabled  on  only  a subset of the cluster (for example during a
              rolling upgrade).

       secauth
              This specifies that HMAC/SHA1 authentication should be  used  to
              authenticate  all  messages.  It further specifies that all data
              should be encrypted with the sober128  encryption  algorithm  to
              protect data from eavesdropping.

              Enabling this option adds a 36 byte header to every message sent
              by  totem  which  reduces  total  throughput.   Encryption   and
              authentication  consume 75% of CPU cycles in aisexec as measured
              with gprof when enabled.

              For 100 Mbit networks with  1500  MTU  frame  transmissions:  A
              throughput  of  9 MB/sec  is  possible with 100% CPU utilization
              when this option is enabled on 3 GHz CPUs.   A  throughput  of
              10 MB/sec  is  possible  with  20%  CPU  utilization when this
              option is disabled on 3 GHz CPUs.

              For gig-e networks with large frame transmissions: A  throughput
              of  20 MB/sec  is  possible when this option is enabled on 3 GHz
              CPUs.  A throughput of 60 MB/sec is possible when this option is
              disabled on 3 GHz CPUs.

              The default is on.

       rrp_mode
              This  specifies  the  mode of redundant ring, which may be none,
              active, or passive.  Active replication  offers  slightly  lower
              latency from transmit to delivery in faulty network environments
              but with  less  performance.   Passive  replication  may  nearly
              double  the  speed of the totem protocol if the protocol doesn't
              become CPU bound.  The final option is none, in which case  only
              one  network  interface  will  be  used  to  operate  the  totem
              protocol.
              If  only  one  interface  directive  is   specified,   none   is
              automatically  chosen.   If  multiple  interface  directives are
              specified, only active or passive may be chosen.
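
              A hedged sketch of a redundant ring setup with  two  interface
              directives.   The  subnets, multicast addresses, and port are
              assumptions for illustration; the choice  of  passive  is  also
              arbitrary:

              ```
              totem {
                      version: 2
                      # two interface directives, so none is not allowed
                      rrp_mode: passive
                      interface {
                              ringnumber: 0
                              bindnetaddr: 192.168.1.0
                              mcastaddr: 226.94.1.1
                              mcastport: 5405
                      }
                      interface {
                              ringnumber: 1
                              bindnetaddr: 192.168.2.0
                              mcastaddr: 226.94.2.1
                              mcastport: 5405
                      }
              }
              ```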

       netmtu This specifies the network maximum transmit unit.  To  set  this
              value  beyond  1500,  the  regular  frame MTU, requires ethernet
              devices that support large, or also called  jumbo,  frames.   If
              any  device  in  the  network  doesn't support large frames, the
              protocol will not operate properly.  The hosts  must  also  have
              their mtu size set from 1500 to whatever frame size is specified
              here.

              Please note while  some  NICs  or  switches  claim  large  frame
              support,  they  support  9000  MTU  as  the  maximum  frame size
              including the IP header.  Setting the netmtu and  host  MTUs  to
              9000  will  cause totem to use the full 9000 bytes of the frame.
              Then Linux will add an 18 byte header moving the full frame size
              to  9018.   As  a result some hardware will not operate properly
              with this size of data.  A netmtu of 8982 seems to work for  the
              few   large   frame   devices   that  have  been  tested.   Some
              manufacturers claim  large  frame  support  when  in  fact  they
              support frame sizes of 4500 bytes.

              Increasing   the  MTU  from  1500  to  8982  doubles  throughput
              performance from 30MB/sec to 60MB/sec as measured with  evsbench
              with 175000 byte messages with the secauth directive set to off.

              When  sending  multicast  traffic,  if  the  network  frequently
              reconfigures, chances  are  that  some  device  in  the  network
              doesn't support large frames.

              Choose  hardware  carefully  if  intending  to  use  large frame
              support.

              The default is 1500.

       threads
              This directive controls how many threads are used to encrypt and
              send  multicast  messages.  If secauth is off, the protocol will
              never use threaded sending.  If secauth is  on,  this  directive
              allows  systems  to  be  configured  to  use multiple threads to
              encrypt and send multicast messages.

              A thread directive of 0 indicates that no threaded  send  should
              be used.  This mode offers best performance for non-SMP systems.

              The default is 0.

       vsftype
              This  directive  controls the virtual synchrony filter type used
              to identify a primary component.  The preferred  choice  is  YKD
              dynamic  linear  voting;  however,  for  clusters larger than 32
              nodes YKD consumes a lot of memory.  For  large  scale  clusters
              that are created by changing the MAX_PROCESSORS_COUNT #define in
              the C code totem.h file, the virtual synchrony filter "none"  is
              recommended, but then AMF and DLCK services (which are currently
              experimental) are not safe for use.

              The default is ykd.  The vsftype can also be set to none.

       transport
              This directive controls the transport mechanism  used.   If  the
              interface to which corosync is binding is an RDMA interface such
              as RoCEE or Infiniband, the "iba" parameter  may  be  specified.
              To  avoid  the  use  of  multicast entirely, a unicast transport
              parameter "udpu" can be specified.  This requires specifying the
              list  of  members  that could potentially make up the membership
              before deployment.

              The default is udp.  The transport type can also be set to  udpu
              or iba.

              Within  the  totem  directive,  there  are several configuration
              options which are used to control the operation of the protocol.
              It  is  generally  not recommended to change any of these values
              without proper guidance and sufficient testing.   Some  networks
              may   require   larger   values   if   suffering  from  frequent
              reconfigurations.  Some applications may require faster  failure
              detection  times  which  can  be  achieved by reducing the token
              timeout.

       token  This timeout specifies in milliseconds how long to wait  without
              receiving  a  token  before  a  token loss is declared.  This is
              the time spent
              detecting a failure of a processor in the current configuration.
              Reforming  a  new  configuration  takes about 50 milliseconds in
              addition to this timeout.

              The default is 1000 milliseconds.

       token_retransmit
              This timeout specifies in milliseconds  how  long  to  wait
              without receiving a token before the token is retransmitted.
              This will be
              automatically calculated  if  token  is  modified.   It  is  not
              recommended  to  alter  this  value  without  guidance  from the
              corosync community.

              The default is 238 milliseconds.

       hold   This timeout specifies in milliseconds how long the token should
              be  held  by  the  representative when the protocol is under low
              utilization.   It is not recommended to alter this value without
              guidance from the corosync community.

              The default is 180 milliseconds.

       token_retransmits_before_loss_const
              This  value  identifies  how  many  token  retransmits should be
              attempted before forming a new configuration.  If this value  is
              set,  retransmit  and hold will be automatically calculated from
              retransmits_before_loss and token.

              The default is 4 retransmissions.

       join   This timeout specifies in milliseconds how long to wait for join
              messages in the membership protocol.

              The default is 50 milliseconds.

       send_join
              This  timeout specifies in milliseconds an upper range between 0
              and send_join to  wait  before  sending  a  join  message.   For
              configurations  with  less  than 32 nodes, this parameter is not
              necessary.  For larger rings, this  parameter  is  necessary  to
              ensure the NIC is not overflowed with join messages on formation
              of a new ring.  A reasonable value for large rings  (128  nodes)
              would  be  80msec.   Other timer values must also change if this
              value is changed.  Seek advice from the corosync mailing list if
              trying to run larger configurations.

              The default is 0 milliseconds.

       consensus
              This  timeout  specifies  in  milliseconds  how long to wait for
              consensus  to  be  achieved  before  starting  a  new  round  of
              membership  configuration.  The minimum value for consensus must
              be 1.2 * token.  This value will be automatically calculated  at
              1.2 * token if the user doesn't specify a consensus value.

              For  two node clusters, a consensus larger than the join timeout
              but less than token is safe.  For three node or larger clusters,
              consensus  should  be larger than token.  There is an increasing
              risk of odd membership changes, which  still  guarantee  virtual
              synchrony,  as node count grows if consensus is less than token.

              The default is 1200 milliseconds.
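
              As a sketch of the relationship  between  token  and  consensus,
              assume an administrator raises token to 3000 milliseconds  (a
              hypothetical value).  The minimum consensus is then 1.2 * 3000
              = 3600 milliseconds.  This is a fragment;  the  interface
              directive is omitted:

              ```
              totem {
                      version: 2
                      # declare token loss after 3 seconds without a token
                      token: 3000
                      # 1.2 * token; the same value would be calculated
                      # automatically if consensus were left unspecified
                      consensus: 3600
              }
              ```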

       merge  This  timeout  specifies in milliseconds how long to wait before
              checking for a partition when  no  multicast  traffic  is  being
              sent.   If  multicast traffic is being sent, the merge detection
              happens automatically as a function of the protocol.

              The default is 200 milliseconds.

       downcheck
              This timeout specifies in milliseconds how long to  wait  before
              checking  that  a network interface is back up after it has been
              downed.

              The default is 1000 milliseconds.

       fail_recv_const
              This constant specifies how many rotations of the token  without
              receiving  any  messages,  when  messages  should  be  received,
              may occur before a new configuration is formed.

              The default is 50 failures to receive a message.

       seqno_unchanged_const
              This constant specifies how many rotations of the token  without
              any  multicast  traffic  should occur before the merge detection
              timeout is started.

              The default is 30 rotations.

       heartbeat_failures_allowed
              [HeartBeating mechanism] Configures  the  optional  HeartBeating
              mechanism  for  faster  failure  detection.  Keep  in  mind that
              engaging this mechanism in lossy  networks  could  cause  faulty
              loss  declaration  as  the  mechanism  relies on the network for
              heartbeating.

              As a rule of thumb, use  this  mechanism  only  if  you  require
              faster  failure  detection  on  networks  with  low  to  medium
              utilization.

              This  constant  specifies  the  number of heartbeat failures the
              system should tolerate before declaring  heartbeat  failure,  for
              example  3.   If  this  value is not set or is 0, the heartbeat
              mechanism is not engaged in the system and token rotation is the
              method of failure detection.

              The default is 0 (disabled).

       max_network_delay
              [HeartBeating mechanism] This constant specifies in milliseconds
              the approximate delay that your network takes to  transport  one
              packet  from one machine to another.  This value is to be set by
              system engineers; do not change it if unsure, as  it  affects
              the failure detection mechanism using heartbeat.

              The default is 50 milliseconds.
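
              A minimal sketch of enabling the heartbeat mechanism.  The value
              3  is  the  example  given  above and 50 is the stated default;
              whether either suits a given network  is  an  assumption  to  be
              tested.  This is a fragment of the totem directive:

              ```
              totem {
                      version: 2
                      # tolerate 3 missed heartbeats before declaring failure;
                      # 0 or unset leaves the mechanism disabled
                      heartbeat_failures_allowed: 3
                      # approximate one-way packet delay in milliseconds
                      max_network_delay: 50
              }
              ```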

       window_size
              This  constant specifies the maximum number of messages that may
              be sent on  one  token  rotation.   If  all  processors  perform
              equally  well,  this  value  could  be  large (300), which would
              introduce higher latency from origination to delivery  for  very
              large  rings.   To  reduce  latency  in  large  rings (16+), the
              defaults are a safe compromise.  If one or more slow  processors
              are  present  among  fast  processors,  window_size should be no
              larger than 256000 / netmtu to  avoid  overflow  of  the  kernel
              receive buffers.  The user is notified of this by the display of
              a retransmit list in the notification logs.  There is no loss of
              data, but performance is reduced when these errors occur.

              The default is 50 messages.

       max_messages
              This  constant specifies the maximum number of messages that may
              be  sent  by  one  processor  on  receipt  of  the  token.   The
              max_messages  parameter is limited to 256000 / netmtu to prevent
              overflow of the kernel transmit buffers.

              The default is 17 messages.
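
              As a worked illustration of the 256000 / netmtu bound,  assume  a
              cluster  with  some  slow processors and a netmtu of 8982 (the
              large frame value mentioned under netmtu).  256000 / 8982 is
              about  28.5,  so  the  default window_size of 50 would exceed the
              bound.  This is a fragment of the totem directive:

              ```
              totem {
                      version: 2
                      netmtu: 8982
                      # 256000 / 8982 = 28.5, rounded down
                      window_size: 28
                      # the default of 17 is already below the bound of 28
                      max_messages: 17
              }
              ```

              With the default netmtu of 1500 the bound is 256000 / 1500, or
              about 170, so the defaults of 50 and 17 fit comfortably.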

       rrp_problem_count_timeout
              This  specifies  the  time  in  milliseconds  to   wait   before
              decrementing  the  problem  count  by 1 for a particular ring to
              ensure a  link  is  not  marked  faulty  for  transient  network
              failures.

              The default is 2000 milliseconds.

       rrp_problem_count_threshold
              This  specifies the number of times a problem is detected with a
              link before setting the link faulty.  Once a link is set faulty,
              no  more data is transmitted upon it.  Also, the problem counter
              is no longer decremented when the problem count timeout expires.

              A problem is detected whenever all tokens  from  the  preceding
              processor     have     not     been    received    within    the
              rrp_token_expired_timeout.   The  rrp_problem_count_threshold  *
              rrp_token_expired_timeout  should  be  at least 50 milliseconds
              less than the token timeout, or a complete  reconfiguration  may
              occur.

              The default is 10 problem counts.

       rrp_token_expired_timeout
              This specifies the time in milliseconds to increment the problem
              counter  for  the  redundant  ring  protocol  after  not  having
              received a token from all rings for a particular processor.

              This value will  automatically  be  calculated  from  the  token
              timeout  and  problem_count_threshold but may be overridden.  It
              is not recommended to override this value without guidance  from
              the corosync community.

              The default is 47 milliseconds.
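
              The constraint between these values can  be  checked  by  hand.
              The sketch below simply spells out the stated defaults with the
              arithmetic  as  comments;  setting  them  explicitly  is   not
              required  and  is  shown  only  to  illustrate  the rule.  The
              interface directives are omitted:

              ```
              totem {
                      version: 2
                      token: 1000
                      rrp_problem_count_timeout: 2000
                      rrp_problem_count_threshold: 10
                      rrp_token_expired_timeout: 47
                      # 10 * 47 = 470 milliseconds
                      # 1000 - 470 = 530, which is at least 50 less than token
              }
              ```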

       Within  the  logging directive, there are several configuration options
       which are all optional.

       The following 3 options are  valid  only  for  the  top  level  logging
       directive:

       timestamp
              This specifies that a timestamp is placed on all log messages.

              The default is off.

       fileline
              This specifies that file and line should be printed.

              The default is off.

       function_name
              This specifies that the code function name should be printed.

              The default is off.

       The following options are valid for the top  level  logging  directive
       and can be overridden in logger_subsys entries.



       to_stderr
       to_logfile
       to_syslog
              These specify the destination of logging output. Any combination
              of these options may be specified. Valid options are yes and no.

              The default is syslog and stderr.

              Please  note, if you are using to_logfile and want to rotate the
              file, use logrotate(8) with the option copytruncate, e.g.

              /var/log/corosync.log {
                  rotate 7
                  copytruncate
              }

       logfile
              If  the  to_logfile  directive  is  set  to  yes,  this   option
              specifies the pathname of the log file.

              No default.

       logfile_priority
              This   specifies   the  logfile  priority  for  this  particular
              subsystem. Ignored if debug is on.  Possible values are:  alert,
              crit,  debug  (same  as  debug  = on), emerg, err, info, notice,
              warning.

              The default is: info.

       syslog_facility
              This specifies the syslog facility type that will  be  used  for
              any  messages  sent  to  syslog.   Options  are  daemon, local0,
              local1, local2, local3, local4, local5, local6, and local7.

              The default is daemon.

       syslog_priority
              This specifies the syslog level for this  particular  subsystem.
              Ignored if debug is on.  Possible values are: alert, crit, debug
              (same as debug = on), emerg, err, info, notice, warning.

              The default is: info.

       debug  This  specifies  whether  debug  output  is  logged   for   this
              particular logger.

              The default is off.

       tags   This  specifies  which tags should be traced for this particular
              logger.  Set the debug directive to on  in  order  to  enable
              tracing  using  tags.   Multiple  values  are separated with a
              vertical bar, which acts as a logical OR.


              The default is none.

       Within the logging directive, logger_subsys directives are optional.

       Within the  logger_subsys  sub-directive,  all  of  the  above  logging
       configuration options are valid and can be used to override the default
       settings.  The subsys entry, described below, is mandatory to  identify
       the subsystem.

       subsys This  specifies  the subsystem identity (name) for which logging
              is specified. This is the name used by a service in the
              log_init() call, e.g. 'CKPT'.  This directive is required.
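
              Putting the logging options together, a hedged sketch follows.
              The  log  file  path  is  the  one  used in the logrotate example
              above and the CKPT subsystem name is the one mentioned  in  this
              entry;  the  particular combination of settings is an assumption
              for illustration:

              ```
              logging {
                      # valid only at the top level
                      timestamp: on
                      fileline: off

                      # destinations; any combination may be enabled
                      to_stderr: no
                      to_syslog: yes
                      to_logfile: yes
                      logfile: /var/log/corosync.log
                      syslog_facility: daemon
                      debug: off

                      # override the settings above for one subsystem
                      logger_subsys {
                              subsys: CKPT
                              debug: on
                      }
              }
              ```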


              The corosync executive configuration file.


       corosync_overview(8), logrotate(8)