Provided by: corosync-qdevice_3.0.3-2_amd64

NAME

       corosync-qdevice - QDevice daemon

SYNOPSIS

       corosync-qdevice [-dfh] [-S option=value[,option2=value2,...]]

DESCRIPTION

       corosync-qdevice is a daemon running on each node of a cluster. It provides a configured number of votes
       to the quorum subsystem based on a third-party arbitrator's decision. Its primary use is to allow a
       cluster to sustain more node failures than standard quorum rules allow. It is recommended for clusters
       with an even number of nodes and highly recommended for two-node clusters.

OPTIONS

       -d     Forcefully turn on debug information without the need to change corosync.conf. To bump the syslog
              priority of these messages to info, use this option twice.

       -f     Do not daemonize, run in the foreground.

       -h     Show short help text.

       -S     Set advanced settings, described in the ADVANCED SETTINGS section below. This option should
              generally not be used, because most of these settings are not safe to change.

CONFIGURATION

       corosync-qdevice reads its configuration from the corosync.conf file.

       The main configuration is within the quorum.device sub-key. Each model also has its own configuration
       within a sub-key named after the model.

       model  Specifies  the  model  to be used. This parameter is required.  corosync-qdevice is modular and is
              able to support multiple different models. The model basically defines what type of arbitrator  is
              used. Currently only net is supported.

       timeout
              Specifies how often, in milliseconds, corosync-qdevice should call the votequorum_qdevice_poll
              function. It is also used by the net model to adjust its heartbeat timeout. It is recommended that
              you don't change this value. Default is 10000.

       sync_timeout
              Specifies  how  often  corosync-qdevice  should call the votequorum_qdevice_poll function during a
              sync phase. It is recommended that you don't change this value.  Default is 30000.

       votes  The number of votes provided to the cluster by  qdevice.  Default  is  (number_of_nodes  -  1)  or
              generally sum(votes_per_node) - 1.
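       As a worked example of this default, the Python sketch below computes the qdevice votes for a four-node
       cluster with one vote per node. The helper names are hypothetical, and the majority threshold
       (total // 2 + 1) is an assumption about plain votequorum behavior without two_node or wait_for_all
       adjustments.

```python
# Hypothetical helpers illustrating the documented default for quorum.device.votes.


def default_qdevice_votes(votes_per_node: dict) -> int:
    """Documented default: sum(votes_per_node) - 1."""
    if not votes_per_node:
        raise ValueError("the nodelist must not be empty")
    return sum(votes_per_node.values()) - 1


def majority_threshold(total_votes: int) -> int:
    """Assumption: plain majority, i.e. more than half of all votes."""
    return total_votes // 2 + 1


if __name__ == "__main__":
    nodes = {1: 1, 2: 1, 3: 1, 4: 1}            # node id -> votes
    qdevice = default_qdevice_votes(nodes)      # 3
    total = sum(nodes.values()) + qdevice       # 7
    needed = majority_threshold(total)          # 4
    # A single surviving node (1 vote) plus qdevice (3 votes) reaches 4.
    print(qdevice, total, needed)
```

       With these numbers, one surviving node plus the qdevice votes still reaches the threshold. This default
       suits the lms algorithm; ffsplit requires quorum.device.votes to be 1 (see MODEL NET ALGORITHMS).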

       The quorum.device.heuristics subkey holds the configuration of the heuristics. Heuristics are a set of
       commands executed locally on startup, on cluster membership change, on successful connection to
       corosync-qnetd and, optionally, at regular intervals. Commands are executed in parallel. When all commands
       finish successfully (their exit status is zero) on time, the heuristics have passed; otherwise they have
       failed. The heuristics result is sent to corosync-qnetd, where it is used in calculations to determine
       which partition should be quorate.
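       The pass/fail rule above can be sketched as follows. This is a minimal illustration, not the daemon's
       implementation: commands are assumed to be already split into argv lists, a Python thread pool stands in
       for the daemon's heuristics worker, and the function names are hypothetical.

```python
# Minimal sketch of the pass/fail rule: run all commands in parallel and
# pass only if every one exits with status 0 within the timeout.
import subprocess
from concurrent.futures import ThreadPoolExecutor


def _run_one(argv: list, timeout_ms: int) -> bool:
    try:
        # subprocess.run() kills the child if the timeout expires.
        completed = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_ms / 1000,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False  # killed on timeout, or could not be executed at all
    return completed.returncode == 0


def heuristics_passed(commands: dict, timeout_ms: int) -> bool:
    """commands maps an exec_NAME key to an argv list (hypothetical layout)."""
    if not commands:
        raise ValueError("no heuristics commands configured")
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        results = list(pool.map(lambda argv: _run_one(argv, timeout_ms),
                                commands.values()))
    return all(results)
```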

       timeout
              Specifies the maximum time, in milliseconds, that corosync-qdevice waits for the heuristics
              commands to finish. If a command doesn't finish before the timeout, it is killed and the heuristics
              fail. This timeout is used for heuristics executed at regular intervals. Default value is half of
              quorum.device.timeout, so 5000.

       sync_timeout
              Similar  to  quorum.device.heuristics.timeout but used during membership changes. Default value is
              half of the quorum.device.sync_timeout, so 15000.

       interval
              Specifies the interval between two regular heuristics executions. Default value is
              3 * quorum.device.timeout, so 30000.

       mode   Can be one of on, sync or off and specifies the mode of operation of the heuristics. Default is off,
              which means heuristics are disabled. When sync is set, heuristics are executed only during startup,
              on membership change and when the connection to corosync-qnetd is established. To also run
              heuristics at regular intervals, set this option to on.

       exec_NAME
              Defines executables. NAME can be any valid cmap key name string and has no special meaning. The
              value of this variable must contain a command to execute. The value is parsed (split) into
              arguments similarly to how a Bourne shell would do it. Quoting is possible by using backslash and
              double quotes.
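       The documented defaults, the effect of mode, and exec_NAME splitting can be modeled with the hypothetical
       helpers below. shlex.split() only approximates the daemon's Bourne-shell-like parser (it also accepts
       single quotes), the integer arithmetic is an assumption, and the absolute-path check reflects that
       execv() does not search PATH when heuristics_use_execvp is off.

```python
# Hypothetical helpers interpreting quorum.device.heuristics settings.
import os
import shlex
from enum import Enum


def heuristics_defaults(device_timeout: int = 10000,
                        device_sync_timeout: int = 30000) -> dict:
    """Documented defaults in ms; integer arithmetic is an assumption."""
    return {
        "timeout": device_timeout // 2,
        "sync_timeout": device_sync_timeout // 2,
        "interval": 3 * device_timeout,
    }


class Trigger(Enum):
    STARTUP = "startup"
    MEMBERSHIP_CHANGE = "membership change"
    QNETD_CONNECTED = "connection to corosync-qnetd established"
    INTERVAL_TIMER = "regular interval elapsed"


SYNC_TRIGGERS = {Trigger.STARTUP, Trigger.MEMBERSHIP_CHANGE, Trigger.QNETD_CONNECTED}


def heuristics_run(mode: str, trigger: Trigger) -> bool:
    """Whether heuristics run for a trigger under the given mode."""
    if mode == "off":
        return False                      # heuristics disabled
    if mode == "sync":
        return trigger in SYNC_TRIGGERS   # no regular runs
    if mode == "on":
        return True                       # sync triggers plus the interval timer
    raise ValueError(f"invalid heuristics mode: {mode!r}")


def parse_exec_value(value: str, use_execvp: bool = False) -> list:
    """Split an exec_NAME value; shlex approximates Bourne shell splitting."""
    argv = shlex.split(value)   # note: shlex also honours single quotes
    if not argv:
        raise ValueError("empty exec_ command")
    # With heuristics_use_execvp off, execv() is used and PATH is not searched,
    # so this sketch insists on an absolute program path.
    if not use_execvp and not os.path.isabs(argv[0]):
        raise ValueError(f"{argv[0]!r} is not an absolute path")
    return argv


if __name__ == "__main__":
    print(heuristics_defaults())
    # {'timeout': 5000, 'sync_timeout': 15000, 'interval': 30000}
    print(heuristics_run("sync", Trigger.INTERVAL_TIMER))   # False
    print(parse_exec_value('/bin/ping -q -c 1 "www.example.org"'))
    # ['/bin/ping', '-q', '-c', '1', 'www.example.org']
```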

       The quorum.device.net subkey holds the configuration for the net model.

       tls    Can be one of on, off or required and specifies whether TLS should be used. on means a connection
              with TLS is attempted first, but if the server doesn't advertise TLS support, non-TLS is used.
              off means TLS is not required and is not even attempted; this is the only mode that doesn't need a
              properly initialized NSS database. required means TLS is required, and if the server doesn't
              support TLS, qdevice exits with an error message. Default is on.

       host   Specifies the IP address or host name of the qnetd server to be used. This parameter is required.

       port   Specifies TCP port of qnetd server. Default is 5403.

       algorithm
              Decision algorithm. Can be either ffsplit or lms (there are also test and 2nodelms, both of which
              are mainly for developers and shouldn't be used in production clusters). For a description of what
              each algorithm means and how the algorithms differ, see their individual sections. Default value
              is ffsplit.

       tie_breaker
              Can be one of lowest, highest or valid_node_id (a number). It is used as a fallback if qdevice has
              to decide between two or more equal partitions. lowest means the partition with the lowest node id
              is chosen. highest means the partition with the highest node id is chosen. valid_node_id means the
              partition containing the node with the given node id is chosen. Default is lowest.

       connect_timeout
              Timeout used when corosync-qdevice is trying to connect to the corosync-qnetd host. Default is
              0.8 * quorum.device.timeout.

       force_ip_version
              Can be one of 0, 4 or 6 and forces the software to use the given IP version. 0 (default value)
              means IPv6 is preferred and IPv4 is used as a fallback.

       keep_active_partition_tie_breaker
              Can be one of on or off and specifies whether the keep active partition tie breaker is used. When
              this option is enabled and a tie occurs, QNetd prefers the partition with members of the
              previously active (quorate) partition. This is hard-coded behavior of the LMS algorithm, so this
              setting affects only the FFSplit algorithm. Default is on.
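       The following sketch models three of the settings above: the tls decision, the default
       connect_timeout, and the address-family order implied by force_ip_version. All names are hypothetical;
       the range check uses the net_min_connect_timeout and net_max_connect_timeout defaults from ADVANCED
       SETTINGS and assumes inclusive bounds, and only the ordering of candidate addresses is shown, not the
       connection itself.

```python
# Hypothetical helpers interpreting quorum.device.net connection settings.
import socket

NET_MIN_CONNECT_TIMEOUT = 1000     # net_min_connect_timeout, ms
NET_MAX_CONNECT_TIMEOUT = 120000   # net_max_connect_timeout, ms

_FAMILIES = {
    0: (socket.AF_INET6, socket.AF_INET),   # prefer IPv6, fall back to IPv4
    4: (socket.AF_INET,),
    6: (socket.AF_INET6,),
}


def use_tls(tls: str, server_advertises_tls: bool) -> bool:
    """Return True if TLS is used; raise where qdevice would exit."""
    if tls == "off":
        return False                   # never attempted, no NSS database needed
    if tls == "on":
        return server_advertises_tls   # otherwise fall back to non-TLS
    if tls == "required":
        if not server_advertises_tls:
            raise ConnectionError("qnetd server does not support TLS")
        return True
    raise ValueError(f"invalid tls value: {tls!r}")


def connect_timeout(device_timeout: int = 10000) -> int:
    """Default 0.8 * quorum.device.timeout, with an assumed range check."""
    value = device_timeout * 8 // 10   # integer math avoids float rounding
    if not NET_MIN_CONNECT_TIMEOUT <= value <= NET_MAX_CONNECT_TIMEOUT:
        raise ValueError(f"connect_timeout {value} ms is out of range")
    return value


def qnetd_addresses(host: str, port: int = 5403, force_ip_version: int = 0) -> list:
    """Resolve host into sockaddr tuples in the order they would be tried."""
    if force_ip_version not in _FAMILIES:
        raise ValueError("force_ip_version must be 0, 4 or 6")
    addresses = []
    for family in _FAMILIES[force_ip_version]:
        try:
            infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
        except socket.gaierror:
            continue                   # no address of this family
        addresses.extend(info[4] for info in infos)
    return addresses


if __name__ == "__main__":
    print(use_tls("on", server_advertises_tls=False))        # False
    print(connect_timeout())                                 # 8000
    print(qnetd_addresses("127.0.0.1", force_ip_version=4))  # [('127.0.0.1', 5403)]
```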

       Logging configuration is within the logging directive. corosync-qdevice parses and supports only the
       debug option. The logger_subsys sub-directive can also be used if subsys is set to QDEVICE.

       For corosync-qdevice to work correctly, the nodelist directive has to be used and properly configured.
       The net model also requires that the totem.cluster_name option is set.

MODEL NET TLS CONFIGURATION

       For the net model to work using TLS, it's necessary to create the NSS database, import the QNetd CA
       certificate, and obtain and distribute a valid client certificate.

       If pcs is used (recommended), the following steps are not needed because pcs performs them automatically.

       corosync-qdevice-net-certutil is the tool for performing the required actions semi-automatically.
       Consult its help output and its man page. For a first-time configuration, it may make sense to start
       with the -Q option.

       If TLS is not required, just edit the corosync.conf file and set quorum.device.net.tls to off.

       Depending on the NSS configuration (stored in the nss.config file, usually in the
       /etc/crypto-policies/back-ends/ directory), disabled ciphers or keys that are too short may be rejected.
       The proper solution is to regenerate the NSS databases for both the corosync-qnetd and corosync-qdevice
       daemons. As a quick workaround, it's also possible to set the environment variable
       NSS_IGNORE_SYSTEM_POLICY=1 before running the corosync-qdevice daemon.
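       For a manual test run, the workaround can be applied only to the daemon's own environment instead of the
       whole shell. The sketch below is hypothetical (function names are illustrative) and assumes
       corosync-qdevice is on PATH; it does not replace regenerating the NSS databases.

```python
# Hypothetical launcher applying the NSS_IGNORE_SYSTEM_POLICY workaround.
import os
import subprocess
from typing import Optional


def qdevice_env(ignore_system_policy: bool, base: Optional[dict] = None) -> dict:
    """Copy of the environment, optionally with NSS_IGNORE_SYSTEM_POLICY=1."""
    env = dict(os.environ if base is None else base)
    if ignore_system_policy:
        env["NSS_IGNORE_SYSTEM_POLICY"] = "1"
    else:
        env.pop("NSS_IGNORE_SYSTEM_POLICY", None)
    return env


def start_qdevice_foreground(ignore_system_policy: bool = True) -> subprocess.Popen:
    """Start corosync-qdevice in the foreground (-f) with the chosen environment."""
    return subprocess.Popen(["corosync-qdevice", "-f"],
                            env=qdevice_env(ignore_system_policy))
```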

       When NSS is updated, it may also be necessary to upgrade the database to the new format. There is no
       consensus on the recommended way, but the following command seems to work well (if the qdevice
       sysconfdir is set to /etc):

       # certutil -N -d /etc/corosync/qdevice/net/nssdb -f /etc/corosync/qdevice/net/nssdb/pwdfile.txt

MODEL NET ALGORITHMS

       Algorithms determine how corosync-qnetd provides votes to a given node or partition. Currently, two
       algorithms are supported.

       ffsplit
              This one makes sense only for clusters with an even number of nodes. It provides exactly one vote
              to the partition with the highest number of active nodes. If there are two partitions of equal
              size, it provides its vote to the partition with the higher score. The score is computed as

                  number_of_connected_nodes + number_of_connected_nodes_with_passed_heuristics
                      - number_of_connected_nodes_with_failed_heuristics

              If the scores are equal, the vote is provided to the partition with the most clients connected to
              the qnetd server. If this number is also equal, the tie_breaker is used. It is able to transition
              its vote if the currently active partition becomes partitioned and a non-active partition still
              has at least 50% of the active nodes. Because of this, a vote is not provided if the qnetd
              connection is not active.

              To use this algorithm, the number of votes per node has to be set to 1 (the default) and the
              qdevice number of votes also has to be 1. This is achieved by setting the quorum.device.votes key
              in the corosync.conf file to 1.

       lms    Last-man-standing. If the node is the only one left in the cluster that can see the qnetd server,
              qnetd returns a vote to it.

              If more than one node can see the qnetd server but some nodes can't see each other, the cluster is
              divided into 'partitions' based on their ring_id. This algorithm then returns a vote to the
              partition with the highest heuristics score (computed the same way as for the ffsplit algorithm);
              if more than one partition has the same score, to the largest active partition; and if more than
              one partition is still equal, to the partition that contains the tie_breaker node (lowest,
              highest, etc.). For LMS to work, the number of qdevice votes has to be left at the default (so
              just delete the quorum.device.votes key from corosync.conf).
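       The selection order of both algorithms can be compared in a small Python model. It is a sketch of the
       rules as described on this page, not qnetd's implementation: the Partition type and function names are
       hypothetical, a partition with no node connected to qnetd is assumed never to get a vote, and
       keep_active_partition_tie_breaker is not modeled.

```python
# Sketch of the ffsplit and lms selection order (not qnetd's implementation).
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass(frozen=True)
class Partition:
    node_ids: frozenset   # active nodes (nodes sharing one ring_id)
    connected: int        # nodes connected to the qnetd server
    passed: int           # connected nodes whose heuristics passed
    failed: int           # connected nodes whose heuristics failed

    @property
    def score(self) -> int:
        return self.connected + self.passed - self.failed


def _narrow(candidates: List[Partition], key: Callable[[Partition], int]) -> List[Partition]:
    """Keep only the candidates with the maximal key value."""
    best = max(key(p) for p in candidates)
    return [p for p in candidates if key(p) == best]


def _tie_breaker(candidates: List[Partition], tie_breaker: Union[str, int]) -> Optional[Partition]:
    if tie_breaker == "lowest":
        return min(candidates, key=lambda p: min(p.node_ids))
    if tie_breaker == "highest":
        return max(candidates, key=lambda p: max(p.node_ids))
    node_id = int(tie_breaker)   # valid_node_id
    return next((p for p in candidates if node_id in p.node_ids), None)


def _select(partitions, criteria, tie_breaker) -> Optional[Partition]:
    # Assumption: partitions that cannot see qnetd are never candidates.
    candidates = [p for p in partitions if p.connected > 0]
    for key in criteria:
        if len(candidates) <= 1:
            break
        candidates = _narrow(candidates, key)
    if len(candidates) <= 1:
        return candidates[0] if candidates else None
    return _tie_breaker(candidates, tie_breaker)


def ffsplit_winner(partitions, tie_breaker="lowest") -> Optional[Partition]:
    # Size first, then heuristics score, then clients connected to qnetd.
    criteria = (lambda p: len(p.node_ids), lambda p: p.score, lambda p: p.connected)
    return _select(partitions, criteria, tie_breaker)


def lms_winner(partitions, tie_breaker="lowest") -> Optional[Partition]:
    # A single partition still seeing qnetd wins outright (last man standing);
    # otherwise heuristics score first, then size.
    criteria = (lambda p: p.score, lambda p: len(p.node_ids))
    return _select(partitions, criteria, tie_breaker)
```

       For example, with a three-node partition whose heuristics all failed and a single node whose
       heuristics passed, ffsplit_winner() picks the larger partition (size is compared first) while
       lms_winner() picks the single node (score is compared first).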

ADVANCED SETTINGS

       Set by using the -S option. The default value is shown in parentheses. Options beginning with the net_
       prefix are specific to the net model.
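       For illustration, the hypothetical helper below builds a corosync-qdevice argument list with advanced
       settings packed into a single -S argument. The setting values are arbitrary examples, not
       recommendations, and the rejection of commas inside values is an assumption about the parser.

```python
# Hypothetical helper building a corosync-qdevice command line.
import shlex


def qdevice_command(settings: dict, foreground: bool = True, debug_level: int = 0) -> list:
    """Return an argv list; settings become one -S option=value,... argument."""
    argv = ["corosync-qdevice"]
    if foreground:
        argv.append("-f")                  # do not daemonize
    argv.extend(["-d"] * debug_level)      # -d given twice bumps priority to info
    if settings:
        pairs = []
        for name, value in settings.items():
            text = str(value)
            if "," in text:                # assumption: commas cannot be escaped
                raise ValueError(f"value for {name} must not contain ','")
            pairs.append(f"{name}={text}")
        argv.extend(["-S", ",".join(pairs)])
    return argv


if __name__ == "__main__":
    cmd = qdevice_command({"heuristics_use_execvp": "on", "net_max_send_buffers": 20})
    print(shlex.join(cmd))
    # corosync-qdevice -f -S heuristics_use_execvp=on,net_max_send_buffers=20
```

       The resulting list could be passed to subprocess.run() for a foreground test run.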

       lock_file
              Lock file location. (/var/run/corosync-qdevice/corosync-qdevice.pid)

       local_socket_file
              Internal IPC socket file location. (/var/run/corosync-qdevice/corosync-qdevice.sock)

       local_socket_backlog
              Parameter passed to the listen(2) syscall. (10)

       max_cs_try_again
              How many times to retry the call to a corosync function which has returned CS_ERR_TRY_AGAIN. (10)

       votequorum_device_name
              Name used for qdevice registration. (Qdevice)

       ipc_max_clients
              Maximum allowed simultaneous IPC clients. (10)

       ipc_max_receive_size
              Maximum size of a message received from an IPC client. (4096)

       ipc_max_send_size
              Maximum size of a message allowed to be sent to an IPC client. (65536)

       master_wins
              Force enable/disable master wins. (default is determined by the model)

       heuristics_ipc_max_send_buffers
              Maximum number of heuristics worker send buffers. (128)

       heuristics_ipc_max_send_receive_size
              Maximum size of a message allowed to be sent to, or received from, the heuristics worker. (4096)

       heuristics_min_timeout
              Minimum heuristics timeout accepted by client in ms. (1000)

       heuristics_max_timeout
              Maximum heuristics timeout accepted by client in ms. (120000)

       heuristics_min_interval
              Minimum heuristics interval accepted by client in ms. (1000)

       heuristics_max_interval
              Maximum heuristics interval accepted by client in ms. (3600000)

       heuristics_max_execs
              Maximum number of exec_ commands. (32)

       heuristics_use_execvp
              Use execvp instead of execv for executing commands. (off)

       heuristics_max_processes
              Maximum number of processes running at one time. (160)

       heuristics_kill_list_interval
              Interval, in ms, at which the status of processes that didn't finish on time is gathered and, if
              necessary, a signal is sent to them. (5000)

       net_nss_db_dir
              NSS database directory. (/etc/corosync/qdevice/net/nssdb)

       net_initial_msg_receive_size
              Initial (used during connection parameter negotiation) maximum size of the receive buffer for a
              message (maximum allowed message size received from qnetd). (32768)

       net_initial_msg_send_size
              Initial (used during connection parameter negotiation) maximum size of one send  buffer  (message)
              to be sent to server. (32768)

       net_min_msg_send_size
              Minimum required size of one send buffer (message) to be sent to server. (32768)

       net_max_msg_receive_size
              Maximum allowed size of receive buffer for a message sent by server. (16777216)

       net_max_send_buffers
              Maximum number of send buffers. (10)

       net_nss_qnetd_cn
              Common name (CN) of the qnetd server certificate. (Qnetd Server)

       net_nss_client_cert_nickname
              NSS nickname of qdevice client certificate. (Cluster Cert)

       net_heartbeat_interval_min
              Minimum heartbeat timeout accepted by client in ms. (1000)

       net_heartbeat_interval_max
              Maximum heartbeat timeout accepted by client in ms. (120000)

       net_min_connect_timeout
              Minimum connection timeout accepted by client in ms. (1000)

       net_max_connect_timeout
              Maximum connection timeout accepted by client in ms. (120000)

       net_test_algorithm_enabled
              Enable test algorithm. (if built with --enable-debug on, otherwise off)

EXAMPLE

       Define qdevice with the net model connecting to qnetd running on the qnetd.example.org host, using the
       ffsplit algorithm. Heuristics are set to sync mode and execute two commands.

       quorum {
         provider: corosync_votequorum
         device {
           votes: 1
           model: net
           net {
             tls: on
             host: qnetd.example.org
             algorithm: ffsplit
           }
           heuristics {
             mode: sync
             exec_ping: /bin/ping -q -c 1 "www.example.org"
             exec_test_txt_exists: /usr/bin/test -f /tmp/test.txt
           }
         }
       }
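       Unbalanced braces are an easy mistake when writing this section by hand, so it can help to generate it.
       The hypothetical Python helper below renders a nested dict into the same quorum section with two-space
       indentation; it does not validate option names or values.

```python
# Hypothetical renderer for corosync.conf-style nested sections.


def render_section(name: str, body: dict, depth: int = 0) -> list:
    """Render a nested dict as corosync.conf-style lines (two-space indent)."""
    pad = "  " * depth
    lines = [f"{pad}{name} {{"]
    for key, value in body.items():
        if isinstance(value, dict):
            lines.extend(render_section(key, value, depth + 1))   # nested section
        else:
            lines.append(f"{pad}  {key}: {value}")                # key: value pair
    lines.append(f"{pad}}}")
    return lines


QUORUM = {
    "provider": "corosync_votequorum",
    "device": {
        "votes": 1,
        "model": "net",
        "net": {"tls": "on", "host": "qnetd.example.org", "algorithm": "ffsplit"},
        "heuristics": {
            "mode": "sync",
            "exec_ping": '/bin/ping -q -c 1 "www.example.org"',
            "exec_test_txt_exists": "/usr/bin/test -f /tmp/test.txt",
        },
    },
}

if __name__ == "__main__":
    print("\n".join(render_section("quorum", QUORUM)))
```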

SEE ALSO

       corosync-qdevice-tool(8)     corosync-qdevice-net-certutil(8)     corosync-qnetd(8)      corosync.conf(5)
       votequorum_qdevice_poll(3)

AUTHOR

       Jan Friesse

                                                   2020-10-27                                COROSYNC-QDEVICE(8)