Provided by: corosync_1.4.2-2_amd64 bug

NAME

       corosync.conf - corosync executive configuration file

SYNOPSIS

       /etc/corosync.conf

DESCRIPTION

       The  corosync.conf  instructs  the  corosync  executive about various parameters needed to
       control the corosync executive.  Empty lines and  lines  starting  with  #  character  are
       ignored.  The configuration file consists of bracketed top level directives.  The possible
       directive choices are:

       totem { }
              This top level directive contains configuration options for the totem protocol.

       logging { }
              This top level directive contains configuration options for logging.

       event { }
              This top level directive contains configuration options for the event service.

       It is also possible to specify the top  level  parameter  compatibility.   This  directive
       indicates  the  level of compatibility requested by the user.  The option whitetank can be
       specified to remain backward compatable with  openais-0.80.z.   The  option  none  can  be
       specified   to   only   be   compatable  with  corosync-1.Y.Z.   Extra  processing  during
       configuration changes is required to remain backward compatable.

       The default is whitetank. (backwards compatibility)

       Within the totem directive, an  interface  directive  is  required.   There  is  also  one
       configuration option which is required:

       Within  the interface sub-directive of totem there are four parameters which are required.
       There is one parameter which is optional.

       ringnumber
              This specifies the ring number for the interface.  When using  the  redundant  ring
              protocol,  each interface should specify separate ring numbers to uniquely identify
              to the membership protocol which interface to use for  which  redundant  ring.  The
              ringnumber must start at 0.

       bindnetaddr
              This  specifies  the  network  address  the corosync executive should bind to.  For
              example, if the local interface is 192.168.5.92  with  netmask  255.255.255.0,  set
              bindnetaddr  to  192.168.5.0.   If the local interface is 192.168.5.92 with netmask
              255.255.255.192, set bindnetaddr to 192.168.5.64, and so forth.

              This may also be an IPV6 address, in which case IPV6 networking will be  used.   In
              this  case,  the full address must be specified and there is no automatic selection
              of the network interface within a specific subnet as with IPv4.

              If IPv6 networking is used, the nodeid field must be specified.

       broadcast
              This is optional and can be set to yes.  If it is set to yes, the broadcast address
              will  be  used  for  communication.  If this option is set, mcastaddr should not be
              set.

       mcastaddr
              This is the multicast address used by corosync executive.  The default should  work
              for  most  networks,  but  the  network  administrator  should  be  queried about a
              multicast address to use.  Avoid 224.x.x.x because this  is  a  "config"  multicast
              address.

              This  may  also be an IPV6 multicast address, in which case IPV6 networking will be
              used.  If IPv6 networking is used, the nodeid field must be specified.

       mcastport
              This specifies the UDP port number.  It is  possible  to  use  the  same  multicast
              address on a network with the corosync services configured for different UDP ports.
              Please note corosync  uses  two  UDP  ports  mcastport  (for  mcast  receives)  and
              mcastport - 1 (for mcast sends).  If you have multiple clusters on the same network
              using the same mcastaddr please configure the mcastports with a gap.

       ttl    This specifies the Time To Live (TTL). If you run your cluster on a routed  network
              then  the  default of "1" will be too small. This option provides a way to increase
              this up to 255. The valid range is  0..255.   Note  that  this  is  only  valid  on
              multicast transport types.

       member This  specifies  a  member  on the interface and used with the udpu transport only.
              Every node that should be a member of the  membership  should  be  specified  as  a
              separate  member  directive.   Within  the  member  directive  there is a parameter
              memberaddr which specifies the ip address of one of the nodes.

       Within the totem directive,  there  are  seven  configuration  options  of  which  one  is
       required,  five are optional, and one is required when IPV6 is configured in the interface
       subdirective.  The required directive controls the version  of  the  totem  configuration.
       The  optional option unless using IPV6 directive controls identification of the processor.
       The optional options control secrecy  and  authentication,  the  redundant  ring  mode  of
       operation, maximum network MTU, and number of sending threads, and the nodeid field.

       version
              This  specifies  the  version  of the configuration file.  Currently the only valid
              version for this directive is 2.

       nodeid This configuration option is optional when using IPv4 and required when using IPv6.
              This  is  a  32  bit  value specifying the node identifier delivered to the cluster
              membership service.  If this is not specified  with  IPv4,  the  node  id  will  be
              determined  from the 32 bit IP address the system to which the system is bound with
              ring identifier of 0.  The node identifier value of zero is reserved and should not
              be used.

       clear_node_high_bit
              This  configuration  option  is  optional  and  is  only relevant when no nodeid is
              specified.  Some openais clients require a signed 32 bit  nodeid  that  is  greater
              than  zero  however  by  default openais uses all 32 bits of the IPv4 address space
              when generating a nodeid.  Set this option to yes to force the high bit to be  zero
              and therefor ensure the nodeid is a positive signed 32 bit integer.

              WARNING:  The  clusters  behavior  is undefined if this option is enabled on only a
              subset of the cluster (for example during a rolling upgrade).

       secauth
              This specifies that HMAC/SHA1 authentication should be  used  to  authenticate  all
              messages.  It further specifies that all data should be encrypted with the sober128
              encryption algorithm to protect data from eavesdropping.

              Enabling this option adds a 36 byte header to every message  sent  by  totem  which
              reduces  total throughput.  Encryption and authentication consume 75% of CPU cycles
              in aisexec as measured with gprof when enabled.

              For 100mbit networks with 1500 MTU frame transmissions: A throughput of 9mb/sec  is
              possible  with  100%  cpu  utilization when this option is enabled on 3ghz cpus.  A
              throughput of 10mb/sec is possible wth 20%  cpu  utilization  when  this  optin  is
              disabled on 3ghz cpus.

              For  gig-e  networks  with  large  frame transmissions: A throughput of 20mb/sec is
              possible when this option is enabled on 3ghz cpus.  A  throughput  of  60mb/sec  is
              possible when this option is disabled on 3ghz cpus.

              The default is on.

       rrp_mode
              This  specifies  the mode of redundant ring, which may be none, active, or passive.
              Active replication offers slightly lower  latency  from  transmit  to  delivery  in
              faulty  network  environments  but  with less performance.  Passive replication may
              nearly double the speed of the totem protocol if the protocol  doesn't  become  cpu
              bound.   The final option is none, in which case only one network interface will be
              used to operate the totem protocol.

              If only one interface directive is specified, none  is  automatically  chosen.   If
              multiple interface directives are specified, only active or passive may be chosen.

       netmtu This  specifies  the network maximum transmit unit.  To set this value beyond 1500,
              the regular frame MTU, requires ethernet devices that support large, or also called
              jumbo,  frames.   If  any  device  in the network doesn't support large frames, the
              protocol will not operate properly.  The hosts must also have their  mtu  size  set
              from 1500 to whatever frame size is specified here.

              Please  note  while  some  NICs or switches claim large frame support, they support
              9000 MTU as the maximum frame size including the IP header.  Setting the netmtu and
              host  MTUs  to 9000 will cause totem to use the full 9000 bytes of the frame.  Then
              Linux will add a 18 byte header moving the full frame size to 9018.   As  a  result
              some  hardware  will not operate properly with this size of data.  A netmtu of 8982
              seems to work for the  few  large  frame  devices  that  have  been  tested.   Some
              manufacturers  claim  large  frame support when in fact they support frame sizes of
              4500 bytes.

              Increasing the MTU from 1500 to 8982 doubles throughput performance  from  30MB/sec
              to  60MB/sec  as  measured with evsbench with 175000 byte messages with the secauth
              directive set to off.

              When sending multicast traffic, if the network frequently reconfigures, chances are
              that some device in the network doesn't support large frames.

              Choose hardware carefully if intending to use large frame support.

              The default is 1500.

       threads
              This  directive  controls  how  many threads are used to encrypt and send multicast
              messages.  If secauth is off, the protocol will never  use  threaded  sending.   If
              secauth  is  on,  this  directive  allows  systems to be configured to use multiple
              threads to encrypt and send multicast messages.

              A thread directive of 0 indicates that no threaded send should be used.  This  mode
              offers best performance for non-SMP systems.

              The default is 0.

       vsftype
              This  directive  controls  the  virtual  synchrony  filter  type used to identify a
              primary component.  The preferred choice is YKD dynamic linear voting, however, for
              clusters  larger  then  32  nodes  YKD  consumes  alot  of memory.  For large scale
              clusters that are created by changing the MAX_PROCESSORS_COUNT  #define  in  the  C
              code  totem.h file, the virtual synchrony filter "none" is recommended but then AMF
              and DLCK services (which are currently experimental) are not safe for use.

              The default is ykd.  The vsftype can also be set to none.

       transport
              This directive controls the transport mechanism used.  If the  interface  to  which
              corosync  is  binding  is  an RDMA interface such as RoCEE or Infiniband, the "iba"
              parameter may be specified.  To avoid the use  of  multicast  entirely,  a  unicast
              transport  parameter "udpu" can be specified.  This requires specifying the list of
              members that could potentially make up the membership before deployment.

              The default is udp.  The transport type can also be set to udpu or iba.

              Within the totem directive, there are several configuration options which are  used
              to  control  the  operation  of  the  protocol.  It is generally not recommended to
              change any of these values without proper guidance and  sufficient  testing.   Some
              networks  may  require  larger  values if suffering from frequent reconfigurations.
              Some applications may require faster failure detection times which can be  achieved
              by reducing the token timeout.

       token  This  timeout  specifies  in  milliseconds until a token loss is declared after not
              receiving a token.  This is the time spent detecting a failure of  a  processor  in
              the   current   configuration.   Reforming  a  new  configuration  takes  about  50
              milliseconds in addition to this timeout.

              The default is 1000 milliseconds.

       token_retransmit
              This timeout specifies in milliseconds after how long before receiving a token  the
              token  is  retransmitted.   This  will  be  automatically  calculated  if  token is
              modified.  It is not recommended to alter this  value  without  guidance  from  the
              corosync community.

              The default is 238 milliseconds.

       hold   This  timeout  specifies  in  milliseconds how long the token should be held by the
              representative when the protocol is under low utilization.   It is not  recommended
              to alter this value without guidance from the corosync community.

              The default is 180 milliseconds.

       token_retransmits_before_loss_const
              This value identifies how many token retransmits should be attempted before forming
              a  new  configuration.   If  this  value  is  set,  retransmit  and  hold  will  be
              automatically calculated from retransmits_before_loss and token.

              The default is 4 retransmissions.

       join   This  timeout  specifies  in milliseconds how long to wait for join messages in the
              membership protocol.

              The default is 50 milliseconds.

       send_join
              This timeout specifies in milliseconds an upper range between 0  and  send_join  to
              wait  before  sending  a join message.  For configurations with less then 32 nodes,
              this parameter is not necessary.  For larger rings, this parameter is necessary  to
              ensure  the NIC is not overflowed with join messages on formation of a new ring.  A
              reasonable value for large rings (128 nodes) would be 80msec.  Other  timer  values
              must  also  change if this value is changed.  Seek advice from the corosync mailing
              list if trying to run larger configurations.

              The default is 0 milliseconds.

       consensus
              This timeout specifies in milliseconds  how  long  to  wait  for  consensus  to  be
              achieved  before  starting  a  new  round of membership configuration.  The minimum
              value for consensus must  be  1.2  *  token.   This  value  will  be  automatically
              calculated at 1.2 * token if the user doesn't specify a consensus value.

              For two node clusters, a consensus larger then the join timeout but less then token
              is safe.  For three node or larger clusters, consensus should be larger then token.
              There is an increasing risk of odd membership changes, which stil guarantee virtual
              synchrony,  as node count grows if consensus is less than token.

              The default is 1200 milliseconds.

       merge  This timeout specifies in milliseconds how long  to  wait  before  checking  for  a
              partition  when  no multicast traffic is being sent.  If multicast traffic is being
              sent, the merge detection happens automatically as a function of the protocol.

              The default is 200 milliseconds.

       downcheck
              This timeout specifies in milliseconds how long to  wait  before  checking  that  a
              network interface is back up after it has been downed.

              The default is 1000 millseconds.

       fail_recv_const
              This  constant  specifies  how many rotations of the token without receiving any of
              the messages when messages should be received may occur before a new  configuration
              is formed.

              The default is 2500 failures to receive a message.

       seqno_unchanged_const
              This  constant  specifies  how  many  rotations  of the token without any multicast
              traffic should occur before the merge detection timeout is started.

              The default is 30 rotations.

       heartbeat_failures_allowed
              [HeartBeating mechanism] Configures the optional HeartBeating mechanism for  faster
              failure  detection.  Keep  in  mind  that engaging this mechanism in lossy networks
              could cause faulty loss declaration as the mechanism  relies  on  the  network  for
              heartbeating.

              So  as a rule of thumb use this mechanism if you require improved failure in low to
              medium utilized networks.

              This constant specifies the number of heartbeat failures the system should tolerate
              before  declaring  heartbeat  failure  e.g 3. Also if this value is not set or is 0
              then the heartbeat mechanism is not engaged in the system and token rotation is the
              method of failure detection

              The default is 0 (disabled).

       max_network_delay
              [HeartBeating  mechanism]  This  constant specifies in milliseconds the approximate
              delay that your network takes to transport one packet from one machine to  another.
              This  value  is to be set by system engineers and please dont change if not sure as
              this effects the failure detection mechanism using heartbeat.

              The default is 50 milliseconds.

       window_size
              This constant specifies the maximum number of messages that  may  be  sent  on  one
              token  rotation.  If all processors perform equally well, this value could be large
              (300), which would introduce higher latency from origination to delivery  for  very
              large  rings.   To  reduce  latency  in  large  rings(16+), the defaults are a safe
              compromise.  If 1 or more slow processor(s)  are  present  among  fast  processors,
              window_size  should  be  no  larger  then  256000 / netmtu to avoid overflow of the
              kernel receive buffers.  The  user  is  notified  of  this  by  the  display  of  a
              retransmit  list  in  the  notification  logs.   There  is  no  loss  of  data, but
              performance is reduced when these errors occur.

              The default is 50 messages.

       max_messages
              This constant specifies the maximum number of messages that  may  be  sent  by  one
              processor on receipt of the token.  The max_messages parameter is limited to 256000
              / netmtu to prevent overflow of the kernel transmit buffers.

              The default is 17 messages.

       miss_count_const
              This constant defines the maximum number of times on receipt of a token  a  message
              is  checked  for  retransmission before a retransmission occurs.  This parameter is
              useful to modify for switches that delay  multicast  packets  compared  to  unicast
              packets.  The default setting works well for nearly all modern switches.

              The default is 5 messages.

       rrp_problem_count_timeout
              This  specifies  the  time  in milliseconds to wait before decrementing the problem
              count by 1 for a particular ring  to  ensure  a  link  is  not  marked  faulty  for
              transient network failures.

              The default is 2000 milliseconds.

       rrp_problem_count_threshold
              This specifies the number of times a problem is detected with a link before setting
              the link faulty.  Once a link is set faulty, no more data is transmitted  upon  it.
              Also,  the  problem counter is no longer decremented when the problem count timeout
              expires.

              A problem is detected whenever all tokens from the proceeding  processor  have  not
              been       received       within      the      rrp_token_expired_timeout.       The
              rrp_problem_count_threshold  *  rrp_token_expired_timeout  should  be  atleast   50
              milliseconds less then the token timeout, or a complete reconfiguration may occur.

              The default is 10 problem counts.

       rrp_problem_count_mcast_threshold
              This  specifies  the  number  of  times a problem is detected with multicast before
              setting the link faulty for passive rrp mode. This variable is unused in active rrp
              mode.

              The default is 10 times rrp_problem_count_threshold.

       rrp_token_expired_timeout
              This  specifies  the  time in milliseconds to increment the problem counter for the
              redundant ring protocol after not having received a token  from  all  rings  for  a
              particular processor.

              This   value   will   automatically  be  calculated  from  the  token  timeout  and
              problem_count_threshold but may be overridden.  It is not recommended  to  override
              this value without guidance from the corosync community.

              The default is 47 milliseconds.

       rrp_autorecovery_check_timeout
              This  specifies  the  time in milliseconds to check if the failed ring can be auto-
              recovered.

              The default is 1000 milliseconds.

       Within the logging directive, there  are  several  configuration  options  which  are  all
       optional.

       The following 3 options are valid only for the top level logging directive:

       timestamp
              This specifies that a timestamp is placed on all log messages.

              The default is off.

       fileline
              This specifies that file and line should be printed.

              The default is off.

       function_name
              This specifies that the code function name should be printed.

              The default is off.

       The  following  options  are  valid  both  for top level logging directive and they can be
       overriden in logger_subsys entries.

       to_stderr

       to_logfile

       to_syslog
              These specify the destination of logging output. Any combination of  these  options
              may be specified. Valid options are yes and no.

              The default is syslog and stderr.

              Please  note,  if  you  are  using  to_logfile  and  want  to  rotate the file, use
              logrotate(8) with the option copytruncate.  eg.

              /var/log/corosync.log {
                  missingok
                  compress
                  notifempty
                  daily
                  rotate 7
                  copytruncate
              }

       logfile
              If the to_logfile directive is set to yes , this option specifies the  pathname  of
              the log file.

              No default.

       logfile_priority
              This specifies the logfile priority for this particular subsystem. Ignored if debug
              is on.  Possible values are: alert, crit, debug (same as debug = on),  emerg,  err,
              info, notice, warning.

              The default is: info.

       syslog_facility
              This  specifies the syslog facility type that will be used for any messages sent to
              syslog. options are daemon, local0, local1, local2, local3, local4, local5,  local6
              & local7.

              The default is daemon.

       syslog_priority
              This  specifies the syslog level for this particular subsystem. Ignored if debug is
              on.  Possible values are: alert, crit, debug (same as  debug  =  on),  emerg,  err,
              info, notice, warning.

              The default is: info.

       debug  This specifies whether debug output is logged for this particular logger.

              The default is off.

       tags   This  specifies  which tags should be traced for this particular logger.  Set debug
              directive to on in order to enable tracing using tags.  Values are specified  using
              a vertical bar as a logical OR separator:

              enter|leave|trace1|trace2|trace3|...

              The default is none.

       Within the logging directive, logger_subsys directives are optional.

       Within the logger_subsys sub-directive, all of the above logging configuration options are
       valid and can be used to override the  default  settings.   The  subsys  entry,  described
       below, is mandatory to identify the subsystem.

       subsys This  specifies  the subsystem identity (name) for which logging is specified. This
              is the name used by a service in the log_init () call. E.g. 'CKPT'. This  directive
              is required.

FILES

       /etc/corosync.conf
              The corosync executive configuration file.

SEE ALSO

       corosync_overview(8), logrotate(8)