For an introduction to the pg_autoctl commands relevant to
the pg_auto_failover Keeper configuration, please see pg_autoctl
config.
An example configuration file looks like the following:
[pg_autoctl]
role = keeper
monitor = postgres://autoctl_node@192.168.1.34:6000/pg_auto_failover
formation = default
group = 0
hostname = node1.db
nodekind = standalone
[postgresql]
pgdata = /data/pgsql/
pg_ctl = /usr/pgsql-10/bin/pg_ctl
dbname = postgres
host = /tmp
port = 5000
[replication]
slot = pgautofailover_standby
maximum_backup_rate = 100M
backup_directory = /data/backup/node1.db
[timeout]
network_partition_timeout = 20
postgresql_restart_failure_timeout = 20
postgresql_restart_failure_max_retries = 3
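Since the file uses an INI layout, it can be read with a generic INI parser when a script needs to inspect it. The following is a minimal sketch in Python, assuming the file is plain INI exactly as in the example above; the load_keeper_config helper and the abridged EXAMPLE text are illustrative and are not part of pg_auto_failover.

```python
import configparser

# Abridged copy of the example configuration shown above.
EXAMPLE = """
[pg_autoctl]
role = keeper
monitor = postgres://autoctl_node@192.168.1.34:6000/pg_auto_failover
formation = default
group = 0
hostname = node1.db

[postgresql]
pgdata = /data/pgsql/
port = 5000

[timeout]
network_partition_timeout = 20
postgresql_restart_failure_timeout = 20
postgresql_restart_failure_max_retries = 3
"""


def load_keeper_config(text):
    """Parse keeper configuration text (hypothetical helper)."""
    # interpolation=None keeps a '%' in a value, such as a URL-encoded
    # monitor URI, from being treated as configparser interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


config = load_keeper_config(EXAMPLE)
port = config.getint("postgresql", "port")
max_retries = config.getint("timeout", "postgresql_restart_failure_max_retries")
print(config["pg_autoctl"]["hostname"], port, max_retries)
```

Reading the file this way is only meant for inspection; changes are better made through the pg_autoctl config set command described below.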
To output, edit and check entries of the configuration, the
following commands are provided:
pg_autoctl config check [--pgdata <pgdata>]
pg_autoctl config get [--pgdata <pgdata>] section.option
pg_autoctl config set [--pgdata <pgdata>] section.option value
The [postgresql] section is discovered automatically by the
pg_autoctl command and is not intended to be changed manually.
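When these commands are driven from a script, it can help to build the argument list in one place. The sketch below follows the three synopsis lines above and only assembles the arguments; it does not run anything. The config_command helper is hypothetical, and the option and value used in the example call are illustrative.

```python
import shlex


def config_command(action, option=None, value=None, pgdata=None):
    """Return the argv list for one `pg_autoctl config` invocation."""
    if action not in ("check", "get", "set"):
        raise ValueError(f"unknown action: {action}")
    argv = ["pg_autoctl", "config", action]
    if pgdata is not None:
        argv += ["--pgdata", pgdata]
    if action in ("get", "set"):
        if option is None:
            raise ValueError("get and set need a section.option name")
        argv.append(option)
    if action == "set":
        if value is None:
            raise ValueError("set needs a value")
        argv.append(str(value))
    return argv


cmd = config_command(
    "set", "replication.maximum_backup_rate", "50M", pgdata="/data/pgsql/"
)
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run on a host where pg_autoctl is installed.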
pg_autoctl.monitor
PostgreSQL service URL of the pg_auto_failover monitor, as given
in the output of the pg_autoctl show uri command.
pg_autoctl.formation
A single pg_auto_failover monitor may handle several PostgreSQL
formations. The default formation name, default, is usually fine.
pg_autoctl.group
This information is retrieved by the pg_auto_failover keeper when
registering a node to the monitor, and should not be changed afterwards. Use
at your own risk.
pg_autoctl.hostname
Node hostname used by all the other nodes in the cluster to
contact this node. In particular, if this node is a primary then its standby
uses that address to set up streaming replication.
replication.slot
Name of the PostgreSQL replication slot used in the streaming
replication setup automatically deployed by pg_auto_failover. Replication
slots can't be renamed in PostgreSQL.
replication.maximum_backup_rate
When pg_auto_failover (re-)builds a standby node using the
pg_basebackup command, this parameter is given to
pg_basebackup to throttle the network bandwidth used. Defaults to
100M, which pg_basebackup reads as 100 megabytes per second.
replication.backup_directory
When pg_auto_failover (re-)builds a standby node using the
pg_basebackup command, this parameter is the target directory that
receives the copy of the data from the primary server. Once the copy has
succeeded, the directory is renamed to the value of postgresql.pgdata.
The default value is computed from
${PGDATA}/../backup/${hostname} and can be set to any value of your
preference. Remember that renaming the directory is an atomic operation only
when both the source and the target of the rename are on the same filesystem,
at least on Unix systems.
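The default path and the same-filesystem condition can both be checked from a script. The following is a minimal sketch in Python; the two helper names are hypothetical, the path arithmetic is purely lexical (symbolic links are not resolved), and the device comparison assumes a Unix system where both paths already exist.

```python
import os
import posixpath


def default_backup_directory(pgdata, hostname):
    """Compute ${PGDATA}/../backup/${hostname} as a normalized path."""
    # normpath collapses the ".." component without touching the disk.
    return posixpath.normpath(posixpath.join(pgdata, "..", "backup", hostname))


def on_same_filesystem(path_a, path_b):
    """Return True when both existing paths live on the same device."""
    # st_dev identifies the device holding each path; a rename between
    # two different devices cannot be atomic.
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


backup_dir = default_backup_directory("/data/pgsql/", "node1.db")
print(backup_dir)
```

With the values from the example configuration, the computed path matches the backup_directory entry shown there. To check a custom backup_directory, on_same_filesystem can be called with the parent directory of pgdata and the parent directory of the backup directory.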
timeout
This section configures how the pg_auto_failover keeper behaves in
failure scenarios.
timeout.network_partition_timeout
Time in seconds after which a failure to communicate with the
other nodes is considered to indicate a network partition. This check is only
done on a PRIMARY server, so the other nodes are both the monitor and the
standby.
When a PRIMARY node is detected to be on the losing side of a
network partition, the pg_auto_failover keeper enters the DEMOTE state and
stops the PostgreSQL instance in order to protect against split brain
situations.
The default is 20s.
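The rule can be summarized as a small predicate. This is a sketch of the behavior as described here, not the keeper's actual implementation: it assumes that a partition is declared only when both the monitor and the standby have been unreachable for longer than the timeout, and the function and parameter names are hypothetical.

```python
def is_network_partition(seconds_since_monitor, seconds_since_standby, timeout=20):
    """Decide, on a PRIMARY node, whether to treat silence as a partition.

    Each argument is the time in seconds since the last successful contact
    with that node. Assumption: both the monitor and the standby must have
    been out of reach for longer than the timeout.
    """
    return seconds_since_monitor > timeout and seconds_since_standby > timeout


# The standby answered 5 seconds ago, so the primary keeps running.
print(is_network_partition(25, 5))
# Neither node has answered for more than 20 seconds: demote.
print(is_network_partition(25, 30))
```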
timeout.postgresql_restart_failure_timeout
timeout.postgresql_restart_failure_max_retries
When PostgreSQL is not running, the first thing the
pg_auto_failover keeper does is try to restart it. In case of a transient
failure (e.g. file system is full, or other dynamic OS resource constraint),
the best course of action is to try again for a little while before reaching
out to the monitor and asking for a failover.
The pg_auto_failover keeper tries to restart PostgreSQL
timeout.postgresql_restart_failure_max_retries times in a row
(default 3) or for up to timeout.postgresql_restart_failure_timeout
(default 20s) from the moment it detected that PostgreSQL is not running,
whichever comes first.
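The "whichever comes first" rule can be sketched as a bounded retry loop. The code below is an illustration under stated assumptions, not the keeper's implementation: restart_with_retries, try_restart, pause, clock and sleep are hypothetical names, and the one-second pause between attempts is an arbitrary choice.

```python
import time


def restart_with_retries(try_restart, max_retries=3, timeout=20.0,
                         pause=1.0, clock=time.monotonic, sleep=time.sleep):
    """Call try_restart() until it succeeds or a limit is reached.

    Returns True when a restart succeeded, and False when either limit
    (attempt count or elapsed time) was hit first, which is the point
    where the keeper would turn to the monitor.
    """
    detected_at = clock()
    for _ in range(max_retries):
        # Stop as soon as the time budget is spent, even with attempts left.
        if clock() - detected_at >= timeout:
            break
        if try_restart():
            return True
        sleep(pause)
    return False


# Example: the restart fails once, then succeeds on the second attempt.
outcomes = iter([False, True])
recovered = restart_with_retries(lambda: next(outcomes), sleep=lambda _: None)
print(recovered)
```

Passing the clock and sleep functions as parameters keeps the loop easy to exercise without waiting for real time to pass.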