Ubuntu Manpage: pg_auto_failover - pg_auto

Provided by: pg-auto-failover-cli_1.6.4-1_amd64

NAME

       pg_auto_failover - pg_auto_failover Documentation

INTRODUCTION TO PG_AUTO_FAILOVER

pg_auto_failover is an extension for PostgreSQL that monitors and manages failover for
Postgres clusters. It is optimised for simplicity and correctness.

Single Standby Architecture
[image: pg_auto_failover architecture with a primary and a standby node]

pg_auto_failover implements Business Continuity for your PostgreSQL services.
pg_auto_failover implements a single PostgreSQL service using multiple nodes with
automated failover, and automates PostgreSQL maintenance operations in a way that
guarantees availability of the service to its users and applications.

To that end, pg_auto_failover uses three nodes (machines, servers) per PostgreSQL
service:

• a PostgreSQL primary node,

• a PostgreSQL secondary node, using Synchronous Hot Standby,

• a pg_auto_failover Monitor node that acts both as a witness and an orchestrator.

The pg_auto_failover Monitor implements a state machine and relies on in-core PostgreSQL
facilities to deliver HA. For example, when the secondary node is detected to be
unavailable, or when its lag is reported above a defined threshold (the default is 1 WAL
file, or 16MB; see the pgautofailover.promote_wal_log_threshold GUC on the
pg_auto_failover monitor), then the Monitor removes it from the synchronous_standby_names
setting on the primary node. Until the secondary is back to being monitored healthy,
failover and switchover operations are not allowed, preventing data loss.

Multiple Standby Architecture
[image: pg_auto_failover architecture with a primary and two standby nodes]

In the pictured architecture, pg_auto_failover implements Business Continuity and data
availability by implementing a single PostgreSQL service using multiple nodes with automated
failover and data redundancy. Even after losing any Postgres node in a production
system, this architecture maintains two copies of the data on two different nodes.

When using more than one standby, different architectures can be achieved with
pg_auto_failover, depending on the objectives and trade-offs needed for your production
setup.

Multiple Standbys Architecture with 3 standby nodes, one async
[image: pg_auto_failover architecture with a primary and three standby nodes]

When setting the three replication parameters (number_sync_standbys, replication quorum,
and candidate priority), it's possible to design very different Postgres architectures for
your production needs.

In this case, the system is set up with two standby nodes participating in the
replication quorum, allowing for number_sync_standbys = 1. The system always maintains a
minimum of two copies of the data set: one on the primary, and another on either
node B or node D. Whenever we lose one of those nodes, we can still hold to this guarantee
of two copies of the data set.

Adding to that, we have the standby server C which has been set up to not participate in
the replication quorum. Node C will not be found in the synchronous_standby_names list
of nodes. Also, node C is set up in a way to never be a candidate for failover, with
candidate-priority = 0.

This architecture would fit a situation where nodes A, B, and D are deployed in the same
data center or availability zone, and node C in another. Those three nodes are set up
to support the main production traffic and implement high availability of both the
Postgres service and the data set.

Node C might be set up for Business Continuity in case the first data center is lost, or
maybe to serve reporting needs in another application domain.

MAIN PG_AUTOCTL COMMANDS

       pg_auto_failover includes the command line tool pg_autoctl that implements  many  commands
       to  manage  your Postgres nodes. To implement the Postgres architectures described in this
       documentation, and more, it is generally possible to use only some of the many  pg_autoctl
       commands.

       This  section  of  the documentation is a short introduction to the main commands that are
       useful when getting started with pg_auto_failover. More commands are  available  and  help
       deal with a variety of situations; see the Manual Pages for the whole list.

       To understand which replication settings to use in your case, see the Architecture Basics
       section and then the Multi-node Architectures section.

       To follow a step by step guide that you can reproduce on your own Azure  subscription  and
       create a production Postgres setup from VMs, see the pg_auto_failover Tutorial section.

       To understand how to set up pg_auto_failover in a way that is compliant with your internal
       security guidelines, read the Security settings for pg_auto_failover section.

   Command line environment, configuration files, etc
       As a command line tool pg_autoctl depends on some environment variables.  Mostly, the tool
       re-uses the Postgres environment variables that you might already know.

       To  manage  a  Postgres  node  pg_auto_failover  needs to know its data directory location
       on-disk. For that, some users will find it easier to export the PGDATA variable  in  their
       environment.  The  alternative  consists  of  always  using  the  --pgdata  option that is
       available to all the pg_autoctl commands.
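
       As a minimal sketch of the two approaches, the following snippet exports PGDATA once and
       then shows the equivalent --pgdata form. The /tmp/pgaf/node1 path is an assumption made
       for illustration, and the pg_autoctl calls only run when the tool is installed.

```shell
# Hypothetical data directory, used here only for illustration.
export PGDATA="/tmp/pgaf/node1"

if command -v pg_autoctl >/dev/null 2>&1; then
  # With PGDATA exported, no --pgdata option is needed ...
  pg_autoctl show state
  # ... and this form is equivalent, with the location given explicitly.
  pg_autoctl show state --pgdata /tmp/pgaf/node1
fi
```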

   Creating Postgres Nodes
       To get started with the  simplest  Postgres  failover  setup,  3  nodes  are  needed:  the
       pg_auto_failover  monitor,  and  2  Postgres  nodes  that  will  get assigned roles by the
       monitor. One Postgres node will be assigned the primary  role,  the  other  one  will  get
       assigned the secondary role.

       To create the monitor use the command:

          $ pg_autoctl create monitor

       To create the Postgres nodes, use the following command on each node you want to create:

          $ pg_autoctl create postgres

       While those create commands initialize your nodes, you still have to actually run the
       Postgres services that are expected to be running. For that you can manually run the
       following command on every node:

          $ pg_autoctl run

       It  is  also  possible (and recommended) to integrate the pg_auto_failover service in your
       usual service management facility. When using systemd the following commands can  be  used
       to produce the unit file configuration required:

          $ pg_autoctl show systemd
          INFO  HINT: to complete a systemd integration, run the following commands:
          INFO  pg_autoctl -q show systemd --pgdata "/tmp/pgaf/m" | sudo tee /etc/systemd/system/pgautofailover.service
          INFO  sudo systemctl daemon-reload
          INFO  sudo systemctl enable pgautofailover
          INFO  sudo systemctl start pgautofailover
          [Unit]
          ...

       While  it  is  expected  that for a production deployment each node actually is a separate
       machine (virtual or physical, or even a container), it is also  possible  to  run  several
       Postgres nodes all on the same machine for testing or development purposes.

       TIP:
          When  running  several pg_autoctl nodes on the same machine for testing or contributing
          to pg_auto_failover, each Postgres instance needs to run on its own port, and with  its
          own  data  directory.  It  can make things easier to then set the environment variables
          PGDATA and PGPORT in each terminal, shell, or tab where each instance is started.
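
       One possible way to follow this tip is a small helper that derives both variables from a
       node number. This is only a sketch: the use_node name, the /tmp/pgaf base directory, and
       the port numbering are assumptions, not pg_auto_failover defaults.

```shell
# Hypothetical helper: give each local node its own data directory and port.
use_node () {
  export PGDATA="/tmp/pgaf/node$1"
  export PGPORT=$((5500 + $1))
}

# In the terminal dedicated to the first node:
use_node 1
echo "$PGDATA $PGPORT"
```

       Running use_node with a different number in each terminal keeps the instances from
       colliding on either the port or the data directory.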

   Inspecting nodes
       Once your Postgres nodes have been created, and once each pg_autoctl service  is  running,
       it is possible to inspect the current state of the formation with the following command:

          $ pg_autoctl show state

       The pg_autoctl show state command outputs the current state of the system only once.
       Sometimes it would be nice to have an auto-updated display like the one provided by
       common tools such as watch(1) or top(1). For that, the following commands are
       available (see also pg_autoctl watch):

          $ pg_autoctl watch
          $ pg_autoctl show state --watch

       To analyze what's been happening to get to the current state, it is possible to review the
       past events generated by the pg_auto_failover monitor with the following command:

          $ pg_autoctl show events

       HINT:
          The pg_autoctl show commands can be run from any node in your system. Those commands
          need to connect to the monitor and print the current state or the current known list of
          events as per the monitor view of the system.

          Use  pg_autoctl  show  state  --local to have a view of the local state of a given node
          without connecting to the monitor Postgres instance.

          The option --json is available in most pg_autoctl  commands  and  switches  the  output
          format  from  a  human  readable  table  form to a program friendly JSON pretty-printed
          output.

   Inspecting and Editing Replication Settings
       When  creating  a  node  it  is  possible  to  use  the   --candidate-priority   and   the
       --replication-quorum  options to set the replication properties as required by your choice
       of Postgres architecture.

       To review the current replication settings of a formation, use one of  the  two  following
       commands, which are convenient aliases (the same command with two ways to invoke it):

          $ pg_autoctl show settings
          $ pg_autoctl get formation settings

       It is also possible to edit those replication settings at any time while your nodes are in
       production: you can change your mind or adjust to new elements without having to re-deploy
       everything. Just use the following commands to adjust the replication settings on the fly:

          $ pg_autoctl set formation number-sync-standbys
          $ pg_autoctl set node replication-quorum
          $ pg_autoctl set node candidate-priority
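
       As a sketch of how these commands combine, the helper below takes one standby out of the
       replication quorum and out of the failover candidates, which is the setup described for
       node C in the introduction. The function name and the PG_AUTOCTL override variable are
       assumptions for illustration; each command takes the new value as its last argument, and
       --name selects the node to change.

```shell
# PG_AUTOCTL may point at another binary; it defaults to pg_autoctl.
make_async_standby () {
  node_name=$1
  # Stop waiting for this node when committing transactions.
  "${PG_AUTOCTL:-pg_autoctl}" set node replication-quorum --name "$node_name" false
  # Never consider this node as a failover candidate.
  "${PG_AUTOCTL:-pg_autoctl}" set node candidate-priority --name "$node_name" 0
}

# Usage, from a node that can reach the monitor (the node name is an example):
#   make_async_standby node_3
```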

       IMPORTANT:
          The  pg_autoctl  get and pg_autoctl set commands always connect to the monitor Postgres
          instance.

          The  pg_autoctl  set  command  then  changes  the  replication  settings  on  the  node
          registration  on  the monitor. Then the monitor assigns the APPLY_SETTINGS state to the
          current primary node in the system for it to apply the new replication settings to  its
          Postgres streaming replication setup.

          As a result, the pg_autoctl set commands require a stable state in the system to be
          allowed to proceed. Namely, the current primary node in the system must have  both  its
          Current  State  and its Assigned State set to primary, as per the pg_autoctl show state
          output.
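
       The stable-state condition can be checked from a script before changing settings. The
       following is a minimal sketch, assuming the table layout that pg_autoctl show state
       prints in this document, with the two state columns last; the has_stable_primary
       function name is hypothetical.

```shell
# Reads `pg_autoctl show state` table output on stdin.
# Succeeds when one row has "primary" in both of its last two columns.
has_stable_primary () {
  awk -F'|' '
    NF >= 3 {
      current = $(NF - 1)
      assigned = $NF
      gsub(/[ \t]+/, "", current)
      gsub(/[ \t]+/, "", assigned)
      if (current == "primary" && assigned == "primary") found = 1
    }
    END { exit(found ? 0 : 1) }
  '
}

# Usage:
#   pg_autoctl show state | has_stable_primary && echo "safe to change settings"
```

       The header and separator rows never match, because neither carries the word primary
       alone in a state column.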

   Implementing Maintenance Operations
       When a Postgres node must be taken offline for a maintenance operation, such as a
       kernel security upgrade or a minor Postgres update, it is best to make sure that the
       pg_auto_failover monitor knows about it.

          • For one thing, a node that is known to be in  maintenance  does  not  participate  in
            failovers.  If  you are running with two Postgres nodes, then failover operations are
            entirely prevented while the standby node is in maintenance.

          • Moreover, depending on your replication settings, enabling maintenance on your
            standby ensures that the primary node switches to async replication before Postgres
            is shut down on the secondary, so that write queries are not blocked.

       To implement maintenance operations, use the following commands:

          $ pg_autoctl enable maintenance
          $ pg_autoctl disable maintenance

       The main pg_autoctl run service that is expected to be running in  the  background  should
       continue to run during the whole maintenance operation.  When a node is in the maintenance
       state, the pg_autoctl service is not controlling the Postgres service anymore.
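
       A maintenance window can be scripted so that the node always leaves the maintenance
       state afterwards. The wrapper below is a sketch under assumptions: with_maintenance and
       the PG_AUTOCTL override variable are hypothetical names, and PGDATA is expected to be
       set for the local node.

```shell
# Enable maintenance, run the given command, then disable maintenance
# again whatever the command's exit status was.
with_maintenance () {
  "${PG_AUTOCTL:-pg_autoctl}" enable maintenance || return 1
  "$@"
  status=$?
  "${PG_AUTOCTL:-pg_autoctl}" disable maintenance
  return "$status"
}

# Usage (the wrapped command is only an example):
#   with_maintenance sudo apt-get upgrade -y
```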

       Note that it is possible to enable maintenance  on  a  primary  Postgres  node,  and  that
       operation   then   requires   a   failover  to  happen  first.  It  is  possible  to  have
       pg_auto_failover orchestrate that for you when using the command:

          $ pg_autoctl enable maintenance --allow-failover

       IMPORTANT:
          The pg_autoctl enable and pg_autoctl disable commands require a stable state in the
          system  to  be  allowed to proceed. Namely, the current primary node in the system must
          have both its Current State  and  its  Assigned  State  set  to  primary,  as  per  the
          pg_autoctl show state output.

   Manual failover, switchover, and promotions
       In  the  cases  when  a  failover  is  needed  without  having an actual node failure, the
       pg_auto_failover monitor can be  used  to  orchestrate  the  operation.  Use  one  of  the
       following commands, which are synonyms in the pg_auto_failover design:

          $ pg_autoctl perform failover
          $ pg_autoctl perform switchover

       Finally,  it  is  also  possible  to “elect” a new primary node in your formation with the
       command:

          $ pg_autoctl perform promotion
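
       The promotion command targets a specific node, selected with the --name option. As a
       hedged sketch, a thin wrapper can refuse to run without a node name; promote_node and
       the PG_AUTOCTL override variable are hypothetical names used for illustration.

```shell
# Promote the named node, or print a usage message when no name is given.
promote_node () {
  if [ -z "${1:-}" ]; then
    echo "usage: promote_node <node name>" >&2
    return 2
  fi
  "${PG_AUTOCTL:-pg_autoctl}" perform promotion --name "$1"
}

# Usage (the node name is an example taken from pg_autoctl show state):
#   promote_node node_2
```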

       IMPORTANT:
          The pg_autoctl perform commands require a stable state in the system to be allowed to
          proceed.  Namely,  the  current  primary  node in the system must have both its Current
          State and its Assigned State set to primary, as per the pg_autoctl show state output.

   What's next?
       This section of the documentation is meant to help users get started by  focusing  on  the
       main  commands  of  the  pg_autoctl tool. Each command has many options that can have very
       small impact, or pretty big impact in terms of security or architecture. Read the rest  of
       the  manual  to  understand  how to best use the many pg_autoctl options to implement your
       specific Postgres production architecture.

PG_AUTO_FAILOVER TUTORIAL

       In  this  guide  we’ll  create  a  primary  and  secondary  Postgres  node  and   set   up
       pg_auto_failover  to  replicate  data  between them. We’ll simulate failure in the primary
       node and see how the system smoothly switches (fails over) to the secondary.

       For illustration, we'll run our databases on virtual machines in the Azure  platform,  but
       the  techniques  here  are relevant to any cloud provider or on-premise network. We'll use
       four virtual machines: a primary  database,  a  secondary  database,  a  monitor,  and  an
       "application."  The  monitor  watches  the  other nodes’ health, manages global state, and
       assigns nodes their roles.

   Create virtual network
       Our database machines need to talk to each other and to the monitor node, so let's  create
       a virtual network.

          az group create \
              --name ha-demo \
              --location eastus

          az network vnet create \
              --resource-group ha-demo \
              --name ha-demo-net \
              --address-prefix 10.0.0.0/16

       We  need  to  open  ports 5432 (Postgres) and 22 (SSH) between the machines, and also give
       ourselves access from our remote IP. We'll do this with a network  security  group  and  a
       subnet.

          az network nsg create \
              --resource-group ha-demo \
              --name ha-demo-nsg

          az network nsg rule create \
              --resource-group ha-demo \
              --nsg-name ha-demo-nsg \
              --name ha-demo-ssh-and-pg \
              --access allow \
              --protocol Tcp \
              --direction Inbound \
              --priority 100 \
              --source-address-prefixes `curl ifconfig.me` 10.0.1.0/24 \
              --source-port-range "*" \
              --destination-address-prefix "*" \
              --destination-port-ranges 22 5432

          az network vnet subnet create \
              --resource-group ha-demo \
              --vnet-name ha-demo-net \
              --name ha-demo-subnet \
              --address-prefixes 10.0.1.0/24 \
              --network-security-group ha-demo-nsg

       Finally   add   four   virtual   machines   (ha-demo-a,  ha-demo-b,  ha-demo-monitor,  and
       ha-demo-app). For speed we background the az vm create processes and run them in parallel:

          # create VMs in parallel
          for node in monitor a b app
          do
          az vm create \
              --resource-group ha-demo \
              --name ha-demo-${node} \
              --vnet-name ha-demo-net \
              --subnet ha-demo-subnet \
              --nsg ha-demo-nsg \
              --public-ip-address ha-demo-${node}-ip \
              --image debian \
              --admin-username ha-admin \
              --generate-ssh-keys &
          done
          wait
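
       A bare wait returns success even when some of the background jobs failed. As a hedged
       sketch of a stricter variant, the loop below records the PID of each job and waits for
       them one by one; create_node is a stand-in for the real az vm create call.

```shell
# Stand-in for the real work (az vm create ... in the loop above).
create_node () {
  echo "creating ha-demo-$1"
}

# Start one background job per node and remember its PID.
pids=""
for node in monitor a b app
do
  create_node "$node" &
  pids="$pids $!"
done

# Wait for each job in turn and count the ones that exited non-zero.
failed=0
for pid in $pids
do
  wait "$pid" || failed=$((failed + 1))
done
echo "failed jobs: $failed"
```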

       To make it easier to SSH into these VMs in future steps, let's make a  shell  function  to
       retrieve their IP addresses:

          # run this in your local shell as well

          vm_ip () {
            az vm list-ip-addresses -g ha-demo -n ha-demo-$1 -o tsv \
              --query '[] [] .virtualMachine.network.publicIpAddresses[0].ipAddress'
          }

          # for convenience with ssh

          for node in monitor a b app
          do
          ssh-keyscan -H `vm_ip $node` >> ~/.ssh/known_hosts
          done

       Let's review what we created so far.

          az resource list --output table --query \
            "[?resourceGroup=='ha-demo'].{ name: name, flavor: kind, resourceType: type, region: location }"

       This shows the following resources:

          Name                             ResourceType                                           Region
          -------------------------------  -----------------------------------------------------  --------
          ha-demo-a                        Microsoft.Compute/virtualMachines                      eastus
          ha-demo-app                      Microsoft.Compute/virtualMachines                      eastus
          ha-demo-b                        Microsoft.Compute/virtualMachines                      eastus
          ha-demo-monitor                  Microsoft.Compute/virtualMachines                      eastus
          ha-demo-appVMNic                 Microsoft.Network/networkInterfaces                    eastus
          ha-demo-aVMNic                   Microsoft.Network/networkInterfaces                    eastus
          ha-demo-bVMNic                   Microsoft.Network/networkInterfaces                    eastus
          ha-demo-monitorVMNic             Microsoft.Network/networkInterfaces                    eastus
          ha-demo-nsg                      Microsoft.Network/networkSecurityGroups                eastus
          ha-demo-a-ip                     Microsoft.Network/publicIPAddresses                    eastus
          ha-demo-app-ip                   Microsoft.Network/publicIPAddresses                    eastus
          ha-demo-b-ip                     Microsoft.Network/publicIPAddresses                    eastus
          ha-demo-monitor-ip               Microsoft.Network/publicIPAddresses                    eastus
          ha-demo-net                      Microsoft.Network/virtualNetworks                      eastus

   Install the pg_autoctl executable
       This guide uses Debian Linux, but similar steps will work on other distributions. All that
       differs are the packages and paths. See Installing pg_auto_failover.

       The pg_auto_failover system is distributed as a single pg_autoctl binary with  subcommands
       to  initialize  and manage a replicated PostgreSQL service.  We’ll install the binary with
       the operating system package manager on all  nodes.  It  will  help  us  run  and  observe
       PostgreSQL.

          for node in monitor a b app
          do
          az vm run-command invoke \
             --resource-group ha-demo \
             --name ha-demo-${node} \
             --command-id RunShellScript \
             --scripts \
                "sudo touch /home/ha-admin/.hushlogin" \
                "curl https://install.citusdata.com/community/deb.sh | sudo bash" \
                "sudo DEBIAN_FRONTEND=noninteractive apt-get install -q -y postgresql-common" \
                "echo 'create_main_cluster = false' | sudo tee -a /etc/postgresql-common/createcluster.conf" \
                "sudo DEBIAN_FRONTEND=noninteractive apt-get install -q -y postgresql-11-auto-failover-1.4" \
                "sudo usermod -a -G postgres ha-admin" &
          done
          wait

   Run a monitor
       The  pg_auto_failover  monitor  is the first component to run. It periodically attempts to
       contact the other nodes and watches their health. It  also  maintains  global  state  that
       “keepers” on each node consult to determine their own roles in the system.

          # on the monitor virtual machine

          ssh -l ha-admin `vm_ip monitor` -- \
            pg_autoctl create monitor \
              --auth trust \
              --ssl-self-signed \
              --pgdata monitor \
              --pgctl /usr/lib/postgresql/11/bin/pg_ctl

       This command initializes a PostgreSQL cluster at the location pointed to by the --pgdata
       option. When --pgdata is omitted, pg_autoctl attempts to use the PGDATA environment
       variable. If a PostgreSQL instance had already existed in the destination directory, this
       command would have configured it to serve as a monitor.

       The command also creates a database named pg_auto_failover, installs the pgautofailover
       Postgres extension, and grants access to a new autoctl_node user.

       In this tutorial we use --auth trust to avoid complex security settings. The Postgres
       trust  authentication  method  is  not  considered  a  reasonable  choice  for  production
       environments.  Consider  either using the --skip-pg-hba option or --auth scram-sha-256 and
       then setting up passwords yourself.

       At this point the monitor is created. Now we'll install it as a service  with  systemd  so
       that it will resume if the VM restarts.

          ssh -T -l ha-admin `vm_ip monitor` << CMD
            pg_autoctl -q show systemd --pgdata ~ha-admin/monitor > pgautofailover.service
            sudo mv pgautofailover.service /etc/systemd/system
            sudo systemctl daemon-reload
            sudo systemctl enable pgautofailover
            sudo systemctl start pgautofailover
          CMD

   Bring up the nodes
       We’ll create the primary database using the pg_autoctl create subcommand.

          ssh -l ha-admin `vm_ip a` -- \
            pg_autoctl create postgres \
              --pgdata ha \
              --auth trust \
              --ssl-self-signed \
              --username ha-admin \
              --dbname appdb \
              --hostname ha-demo-a.internal.cloudapp.net \
              --pgctl /usr/lib/postgresql/11/bin/pg_ctl \
              --monitor 'postgres://autoctl_node@ha-demo-monitor.internal.cloudapp.net/pg_auto_failover?sslmode=require'

       Notice the user and database name in the monitor connection string: these are what
       pg_autoctl create monitor created. We also give it the path to pg_ctl so that the keeper
       will use the correct version of pg_ctl in the future even if other versions of Postgres
       are installed on the system.

       In the example above, the keeper creates a primary database. It chooses to set up  node  A
       as primary because the monitor reports there are no other nodes in the system yet. This is
       one example of how the keeper is state-based: it makes observations and then  adjusts  its
       state, in this case from "init" to "single."

       Also add a setting to trust connections from our "application" VM:

          ssh -T -l ha-admin `vm_ip a` << CMD
            echo 'hostssl "appdb" "ha-admin" ha-demo-app.internal.cloudapp.net trust' \
              >> ~ha-admin/ha/pg_hba.conf
          CMD

       At  this  point  the monitor and primary node are created and running. Next we need to run
       the keeper. It’s an independent process so that it can  continue  operating  even  if  the
       PostgreSQL process terminates on the node. We'll install it as a service with systemd
       so that it will resume if the VM restarts.

          ssh -T -l ha-admin `vm_ip a` << CMD
            pg_autoctl -q show systemd --pgdata ~ha-admin/ha > pgautofailover.service
            sudo mv pgautofailover.service /etc/systemd/system
            sudo systemctl daemon-reload
            sudo systemctl enable pgautofailover
            sudo systemctl start pgautofailover
          CMD

       Next connect to node B and do the same process. We'll do both steps at once:

          ssh -l ha-admin `vm_ip b` -- \
            pg_autoctl create postgres \
              --pgdata ha \
              --auth trust \
              --ssl-self-signed \
              --username ha-admin \
              --dbname appdb \
              --hostname ha-demo-b.internal.cloudapp.net \
              --pgctl /usr/lib/postgresql/11/bin/pg_ctl \
              --monitor 'postgres://autoctl_node@ha-demo-monitor.internal.cloudapp.net/pg_auto_failover?sslmode=require'

          ssh -T -l ha-admin `vm_ip b` << CMD
            pg_autoctl -q show systemd --pgdata ~ha-admin/ha > pgautofailover.service
            sudo mv pgautofailover.service /etc/systemd/system
            sudo systemctl daemon-reload
            sudo systemctl enable pgautofailover
            sudo systemctl start pgautofailover
          CMD

       The keeper on node B discovers from the monitor that a primary exists, and then switches
       its own state to be a hot standby and begins streaming WAL contents from the primary.

   Node communication
       For  convenience,  pg_autoctl  modifies each node's pg_hba.conf file to allow the nodes to
       connect to one another. For instance, pg_autoctl added the following lines to node A:

          # automatically added to node A

          hostssl "appdb" "ha-admin" ha-demo-a.internal.cloudapp.net trust
          hostssl replication "pgautofailover_replicator" ha-demo-b.internal.cloudapp.net trust
          hostssl "appdb" "pgautofailover_replicator" ha-demo-b.internal.cloudapp.net trust

       For pg_hba.conf on the monitor node pg_autoctl inspects the local network  and  makes  its
       best guess about the subnet to allow. In our case it guessed correctly:

          # automatically added to the monitor

          hostssl "pg_auto_failover" "autoctl_node" 10.0.1.0/24 trust

       If  worker nodes have more ad-hoc addresses and are not in the same subnet, it's better to
       disable pg_autoctl's automatic modification of pg_hba using the --skip-pg-hba command line
       option  during  creation.  You will then need to edit the hba file by hand. Another reason
       for manual edits would be to use special authentication methods.
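
       When editing the HBA file by hand from a script, it helps to make the edit idempotent.
       The sketch below appends a rule only when the exact line is missing. The add_hba_rule
       name is hypothetical, and the rule itself (the tutorial subnet with scram-sha-256) is
       just an example; the snippet works on a scratch file rather than a real pg_hba.conf.

```shell
# Append a rule to an HBA file unless the exact line is already there.
add_hba_rule () {
  hba_file=$1
  rule=$2
  grep -qxF -- "$rule" "$hba_file" 2>/dev/null || printf '%s\n' "$rule" >> "$hba_file"
}

# Example against a scratch file. On a real node the target would be
# pg_hba.conf in the data directory, followed by a configuration reload.
hba=$(mktemp)
rule='hostssl "appdb" "ha-admin" 10.0.1.0/24 scram-sha-256'
add_hba_rule "$hba" "$rule"
add_hba_rule "$hba" "$rule"   # the second call changes nothing
wc -l < "$hba"
```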

   Watch the replication
       First let’s verify that the monitor knows about our nodes, and  see  what  states  it  has
       assigned them:

          ssh -l ha-admin `vm_ip monitor` pg_autoctl show state --pgdata monitor

            Name |  Node |                            Host:Port |       LSN | Reachable |       Current State |      Assigned State
          -------+-------+--------------------------------------+-----------+-----------+---------------------+--------------------
          node_1 |     1 | ha-demo-a.internal.cloudapp.net:5432 | 0/3000060 |       yes |             primary |             primary
          node_2 |     2 | ha-demo-b.internal.cloudapp.net:5432 | 0/3000060 |       yes |           secondary |           secondary

       This looks good. We can add data to the primary, and later see it appear in the secondary.
       We'll connect to the database from inside our "app" virtual machine,  using  a  connection
       string obtained from the monitor.

          ssh -l ha-admin `vm_ip monitor` pg_autoctl show uri --pgdata monitor

                Type |    Name | Connection String
          -----------+---------+-------------------------------
             monitor | monitor | postgres://autoctl_node@ha-demo-monitor.internal.cloudapp.net:5432/pg_auto_failover?sslmode=require
           formation | default | postgres://ha-demo-b.internal.cloudapp.net:5432,ha-demo-a.internal.cloudapp.net:5432/appdb?target_session_attrs=read-write&sslmode=require

       Now we'll get the connection string and store it in a local environment variable:

          APP_DB_URI=$( \
            ssh -l ha-admin `vm_ip monitor` \
              pg_autoctl show uri --formation default --pgdata monitor \
          )

       The  connection  string  contains  both  our  nodes, comma separated, and includes the url
       parameter ?target_session_attrs=read-write  telling  psql  that  we  want  to  connect  to
       whichever of these servers supports reads and writes.  That will be the primary server.
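
       To see which hosts a client will try, the host list can be pulled out of such a
       connection string with plain shell parameter expansion. This is a sketch only: uri_hosts
       is a hypothetical helper, and the sample value is the formation URI shown in the
       pg_autoctl show uri output above.

```shell
# Print one host:port entry per line from a multi-host Postgres URI.
uri_hosts () {
  hostpart=${1#*://}        # drop the scheme
  hostpart=${hostpart%%/*}  # drop the database name and parameters
  hostpart=${hostpart#*@}   # drop an optional user@ prefix
  printf '%s\n' "$hostpart" | tr ',' '\n'
}

uri='postgres://ha-demo-b.internal.cloudapp.net:5432,ha-demo-a.internal.cloudapp.net:5432/appdb?target_session_attrs=read-write&sslmode=require'
uri_hosts "$uri"
```

       The order of the hosts does not decide which one is used for writes; with
       target_session_attrs=read-write the client keeps trying hosts until it finds one that
       accepts writes.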

          # connect to database via psql on the app vm and
          # create a table with a million rows
          ssh -l ha-admin -t `vm_ip app` -- \
            psql "'$APP_DB_URI'" \
              -c "'CREATE TABLE foo AS SELECT generate_series(1,1000000) bar;'"

   Cause a failover
       Now  that  we've  added  data  to node A, let's switch which is considered the primary and
       which the secondary. After the switch we'll connect again and query the  data,  this  time
       from node B.

          # initiate failover to node B
          ssh -l ha-admin -t `vm_ip monitor` \
            pg_autoctl perform switchover --pgdata monitor

       Once  node  B  is  marked "primary" (or "wait_primary") we can connect and verify that the
       data is still present:

          # connect to database via psql on the app vm
          ssh -l ha-admin -t `vm_ip app` -- \
            psql "'$APP_DB_URI'" \
              -c "'SELECT count(*) FROM foo;'"

       It shows

            count
          ---------
           1000000

   Cause a node failure
       This plot is too boring, time to introduce a problem. We’ll turn off the VM for node B
       (currently the primary after our previous failover) and watch node A get promoted.

       In one terminal let’s keep an eye on events:

          ssh -t -l ha-admin `vm_ip monitor` -- \
            watch -n 1 -d pg_autoctl show state --pgdata monitor

       In another terminal we’ll turn off the virtual server.

          az vm stop \
            --resource-group ha-demo \
            --name ha-demo-b

       After  a  number  of failed attempts to talk to node B, the monitor determines the node is
       unhealthy and puts it into the "demoted" state.  The monitor promotes node A to be the new
       primary.

            Name |  Node |                            Host:Port |       LSN | Reachable |       Current State |      Assigned State
          -------+-------+--------------------------------------+-----------+-----------+---------------------+--------------------
          node_1 |     1 | ha-demo-a.internal.cloudapp.net:5432 | 0/6D4E068 |       yes |        wait_primary |        wait_primary
          node_2 |     2 | ha-demo-b.internal.cloudapp.net:5432 | 0/6D4E000 |       yes |             demoted |          catchingup

       Node  A  cannot be considered in full "primary" state since there is no secondary present,
       but it can still serve client requests. It is marked as "wait_primary" until  a  secondary
       appears, to indicate that it's running without a backup.

       Let's add some data while B is offline.

          # notice how $APP_DB_URI continues to work no matter which node
          # is serving as primary
          ssh -l ha-admin -t `vm_ip app` -- \
            psql "'$APP_DB_URI'" \
              -c "'INSERT INTO foo SELECT generate_series(1000001, 2000000);'"

   Resurrect node B
       Run this command to bring node B back online:

          az vm start \
            --resource-group ha-demo \
            --name ha-demo-b

       Now  the  next  time the keeper retries its health check, it brings the node back.  Node B
       goes through the state "catchingup" while it updates its data  to  match  A.  Once  that's
       done, B becomes a secondary, and A is now a full primary again.

            Name |  Node |                            Host:Port |        LSN | Reachable |       Current State |      Assigned State
          -------+-------+--------------------------------------+------------+-----------+---------------------+--------------------
          node_1 |     1 | ha-demo-a.internal.cloudapp.net:5432 | 0/12000738 |       yes |             primary |             primary
          node_2 |     2 | ha-demo-b.internal.cloudapp.net:5432 | 0/12000738 |       yes |           secondary |           secondary

       What's  more, if we connect directly to the database again, all two million rows are still
       present.

          ssh -l ha-admin -t `vm_ip app` -- \
            psql "'$APP_DB_URI'" \
              -c "'SELECT count(*) FROM foo;'"

       It shows

            count
          ---------
           2000000

ARCHITECTURE BASICS

pg_auto_failover is designed as a simple and robust way to manage automated Postgres
failover in production. On top of robust operations, pg_auto_failover setup is flexible
and allows either Business Continuity or High Availability configurations. The
pg_auto_failover design also supports configuration changes in a live system without
downtime.

pg_auto_failover is designed to be able to handle a single PostgreSQL service using three
nodes. In this setting, the system is resilient to losing any one of three nodes.

[image: pg_auto_failover architecture for a standalone PostgreSQL service]

It is important to understand that when using only two Postgres nodes then
pg_auto_failover is optimized for Business Continuity. In the event of losing a single
node, pg_auto_failover is capable of continuing the PostgreSQL service, and prevents any
data loss when doing so, thanks to PostgreSQL Synchronous Replication.

That said, there is a trade-off involved in this architecture. The business continuity
bias relaxes replication guarantees, falling back to asynchronous replication in the
event of a standby node failure. This allows the PostgreSQL service to accept writes when
there's a single server available, and opens the service to potential data loss if the
primary server were also to fail.

The pg_auto_failover Monitor
Each PostgreSQL node in pg_auto_failover runs a Keeper process which informs a central
Monitor node about notable local changes. Some changes require the Monitor to orchestrate
a correction across the cluster:

• New nodes

At initialization time, it's necessary to prepare the configuration of each node for
PostgreSQL streaming replication, and get the cluster to converge to the nominal
state with both a primary and a secondary node in each group. The monitor determines
each new node's role.

• Node failure

The monitor orchestrates a failover when it detects an unhealthy node. The design of
pg_auto_failover allows the monitor to shut down service to a previously designated
primary node without causing a "split-brain" situation.

The monitor is the authoritative node that manages global state and makes changes in the
cluster by issuing commands to the nodes' keeper processes. A pg_auto_failover monitor
node failure has limited impact on the system. While it prevents reacting to other nodes'
failures, it does not affect replication. The PostgreSQL streaming replication setup
installed by pg_auto_failover does not depend on having the monitor up and running.

pg_auto_failover Glossary
pg_auto_failover handles a single PostgreSQL service with the following concepts:

Monitor
The pg_auto_failover monitor is a service that keeps track of one or several formations
containing groups of nodes.

The monitor is implemented as a PostgreSQL extension, so when you run the command
pg_autoctl create monitor a PostgreSQL instance is initialized, configured with the
extension, and started. The monitor service embeds a PostgreSQL instance.

Formation
A formation is a logical set of PostgreSQL services that are managed together.

It is possible to operate many formations with a single monitor instance. Each formation
has a group of Postgres nodes and the FSM orchestration implemented by the monitor applies
separately to each group.

Group
A group of two PostgreSQL nodes work together to provide a single PostgreSQL service in a
Highly Available fashion. A group consists of a PostgreSQL primary server and a secondary
server setup with Hot Standby synchronous replication. Note that pg_auto_failover can
orchestrate the whole setting-up of the replication for you.

In pg_auto_failover versions up to 1.3, a single Postgres group can contain only two
Postgres nodes. Starting with pg_auto_failover 1.4, there's no limit to the number of
Postgres nodes in a single group. Note that each Postgres instance that belongs to the
same group serves the same dataset in its data directory (PGDATA).

NOTE:
The notion of a formation that contains multiple groups in pg_auto_failover is useful
when setting up and managing a whole Citus formation, where the coordinator nodes
belong to group zero of the formation, and each Citus worker node becomes its own group
and may have Postgres standby nodes.

Keeper
The pg_auto_failover keeper is an agent that must be running on the same server where your
PostgreSQL nodes are running. The keeper controls the local PostgreSQL instance (using
both the pg_ctl command-line tool and SQL queries), and communicates with the monitor:

• it sends updated data about the local node, such as the WAL delta in between servers,
measured via PostgreSQL statistics views.

• it receives state assignments from the monitor.

The keeper also maintains local state that includes the most recent communication
established with the monitor and the other PostgreSQL node of its group, enabling it to
detect Network Partitions.

NOTE:
In pg_auto_failover versions up to and including 1.3, the keeper process started with
pg_autoctl run manages a separate Postgres instance, running as its own process tree.

Starting in pg_auto_failover version 1.4, the keeper process (started with pg_autoctl
run) runs the Postgres instance as a sub-process of the main pg_autoctl process,
allowing tighter control over the Postgres execution. Running the sub-process also
makes the solution work better both in container environments (because it's now a
single process tree) and with systemd, because it uses a specific cgroup per service
unit.

Node
A node is a server (virtual or physical) that runs PostgreSQL instances and a keeper
service. At any given time, any node might be a primary or a secondary Postgres instance.
The whole point of pg_auto_failover is to decide this state.

As a result, refrain from naming your nodes with the role you intend for them. Their
roles can change. If they didn't, your system wouldn't need pg_auto_failover!

State
A state is the representation of the per-instance and per-group situation. The monitor
and the keeper implement a Finite State Machine to drive operations in the PostgreSQL
groups; allowing pg_auto_failover to implement High Availability with the goal of zero
data loss.

The keeper main loop enforces the current expected state of the local PostgreSQL instance,
and reports the current state and some more information to the monitor. The monitor uses
this set of information and its own health-check information to drive the State Machine
and assign a goal state to the keeper.

The keeper implements the transitions between a current state and a monitor-assigned goal
state.
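
The division of labor described above can be sketched as one pass of a keeper-style loop.
This is a minimal illustration under stated assumptions, not the pg_autoctl
implementation: the names keeper_iteration and report_to_monitor, and the shape of the
transition table, are hypothetical.

```python
from typing import Callable, Dict, Tuple

# Hypothetical names throughout: this models the idea, not pg_autoctl's code.
Transition = Callable[[], None]


def keeper_iteration(
    current_state: str,
    report_to_monitor: Callable[[str], str],
    transitions: Dict[Tuple[str, str], Transition],
) -> str:
    """Run one pass of a keeper-style loop and return the new current state."""
    # Report the current state; the monitor answers with a goal state.
    goal_state = report_to_monitor(current_state)
    if goal_state == current_state:
        return current_state  # already converged, nothing to do
    # The keeper implements the transition from current state to goal state.
    action = transitions.get((current_state, goal_state))
    if action is None:
        return current_state  # unknown transition: stay put and retry later
    action()
    return goal_state


# Example: a stub monitor that asks a "single" node to become "wait_primary".
log = []
new_state = keeper_iteration(
    "single",
    lambda state: "wait_primary",
    {("single", "wait_primary"): lambda: log.append("allow standby in pg_hba.conf")},
)
```

The stub monitor here is a plain function; in a real deployment the goal state comes
from the monitor, as described in the Monitoring protocol section.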

Client-side HA
Implementing client-side High Availability is included in PostgreSQL's driver libpq from
version 10 onward. Using this driver, it is possible to specify multiple host names or IP
addresses in the same connection string:

$ psql -d "postgresql://host1,host2/dbname?target_session_attrs=read-write"
$ psql -d "postgresql://host1:port1,host2:port2/dbname?target_session_attrs=read-write"
$ psql -d "host=host1,host2 port=port1,port2 target_session_attrs=read-write"

When using any of the syntaxes above, the psql application attempts to connect to host1,
and when successfully connected, checks the target_session_attrs as per the PostgreSQL
documentation of it:

    If this parameter is set to read-write, only a connection in which read-write
    transactions are accepted by default is considered acceptable. The query SHOW
    transaction_read_only will be sent upon any successful connection; if it returns on,
    the connection will be closed. If multiple hosts were specified in the connection
    string, any remaining servers will be tried just as if the connection attempt had
    failed. The default value of this parameter, any, regards all connections as
    acceptable.

When the connection attempt to host1 fails, or when the target_session_attrs cannot be
verified, then the psql application attempts to connect to host2.

The behavior is implemented in the connection library libpq, so any application using it
can benefit from this implementation, not just psql.

When using pg_auto_failover, configure your application connection string to use the
primary and the secondary server host names, and set target_session_attrs=read-write too,
so that your application automatically connects to the current primary, even after a
failover has occurred.
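
As an illustration, an application could assemble such a connection string from its list
of nodes. The sketch below only builds the URI text in the multi-host form shown above;
the function name and the host names are placeholders, and no URL-encoding is applied.

```python
from typing import Sequence, Tuple


def multi_host_uri(
    hosts: Sequence[Tuple[str, int]],
    dbname: str,
    target_session_attrs: str = "read-write",
) -> str:
    """Build a libpq multi-host connection URI (hypothetical helper)."""
    if not hosts:
        raise ValueError("at least one host is required")
    # libpq accepts a comma-separated list of host:port pairs.
    host_list = ",".join(f"{host}:{port}" for host, port in hosts)
    return (
        f"postgresql://{host_list}/{dbname}"
        f"?target_session_attrs={target_session_attrs}"
    )


# Placeholder host names; list every Postgres node of the group.
uri = multi_host_uri(
    [("node-a.example.com", 5432), ("node-b.example.com", 5432)],
    "appdb",
)
```

The resulting string can be passed to psql -d or to any driver built on libpq.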

Monitoring protocol
The monitor interacts with the data nodes in 2 ways:

• Data nodes periodically connect and run SELECT pgautofailover.node_active(...) to
communicate their current state and obtain their goal state.

• The monitor periodically connects to all the data nodes to see if they are healthy,
doing the equivalent of pg_isready.

When a data node calls node_active, the state of the node is stored in the
pgautofailover.node table and the state machines of both nodes are progressed. The state
machines are described later in this document. The monitor typically only moves one state
forward and waits for the node(s) to converge except in failure states.

If a node is not communicating with the monitor, this will either cause a failover (if
the node is a primary), disable synchronous replication (if the node is a secondary), or
cause the state machine to pause until the node comes back (other cases). In most cases,
the latter is harmless, though in some cases it may cause downtime to last longer, e.g.
if a standby goes down during a failover.

To simplify operations, a node is only considered unhealthy if the monitor cannot connect
and it hasn't reported its state through node_active for a while. This allows, for
example, PostgreSQL to be restarted without causing a health check failure.
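
The two rules above can be written down as small predicates. This is a sketch of the
logic as described in this section only: the function names are hypothetical, and the
30-second timeout is an arbitrary stand-in for "a while", not a documented default.

```python
def is_unhealthy(
    monitor_can_connect: bool,
    seconds_since_last_report: float,
    report_timeout: float = 30.0,  # assumption: placeholder value for "a while"
) -> bool:
    """A node is unhealthy only when BOTH signals are missing."""
    return (not monitor_can_connect) and seconds_since_last_report > report_timeout


def reaction_to_silent_node(role: str) -> str:
    """What a silent node triggers, depending on its role."""
    if role == "primary":
        return "failover"
    if role == "secondary":
        return "disable synchronous replication"
    return "pause state machine"
```

A Postgres restart makes the health check fail for a moment, but the node keeps
reporting through node_active, so is_unhealthy stays false in that case.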

Synchronous vs. asynchronous replication
By default, pg_auto_failover uses synchronous replication, which means all writes block
until at least one standby node has reported receiving them. To handle cases in which the
standby fails, the primary switches between two states called wait_primary and primary
based on the health of standby nodes, and based on the replication setting
number_sync_standbys.

When in the wait_primary state, synchronous replication is disabled by automatically
setting synchronous_standby_names = '' to allow writes to proceed. However doing so also
disables failover, since the standby might get arbitrarily far behind. If the standby is
responding to health checks and within 1 WAL segment of the primary (by default),
synchronous replication is enabled again on the primary by setting
synchronous_standby_names = '*' which may cause a short latency spike since writes will
then block until the standby has caught up.
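
For the single-standby case, the switch between the two states can be sketched as
follows. The function name is hypothetical; the 16 MB segment size is the PostgreSQL
default, and the one-segment threshold is the default mentioned above.

```python
from typing import Tuple

WAL_SEGMENT_BYTES = 16 * 1024 * 1024  # default PostgreSQL WAL segment size


def primary_settings(
    standby_healthy: bool,
    standby_lag_bytes: int,
    max_lag_segments: int = 1,
) -> Tuple[str, str]:
    """Return (state, synchronous_standby_names) for a primary with one standby."""
    caught_up = standby_lag_bytes <= max_lag_segments * WAL_SEGMENT_BYTES
    if standby_healthy and caught_up:
        # Synchronous replication is enabled: writes wait for the standby.
        return ("primary", "*")
    # Writes proceed without a standby, and failover is disabled.
    return ("wait_primary", "")
```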

When using several standby nodes with replication quorum enabled, the actual setting for
synchronous_standby_names is set to a list of those standby nodes that are set to
participate in the replication quorum.

If you wish to disable synchronous replication, you need to add the following to
postgresql.conf:

synchronous_commit = 'local'

This ensures that writes return as soon as they are committed on the primary -- under all
circumstances. In that case, failover might lead to some data loss, but failover is not
initiated if the secondary is more than 10 WAL segments (by default) behind the
primary. During a manual failover, the standby will continue accepting writes from the old
primary. The standby will stop accepting writes only if it's fully caught up (most
common), the primary fails, or it does not receive writes for 2 minutes.
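
The lag check for the asynchronous case can be illustrated with LSN arithmetic. This is a
sketch under stated assumptions: the function names are hypothetical, the threshold is
the 10-segment default quoted above, and the segment size is the PostgreSQL default of
16 MB.

```python
WAL_SEGMENT_BYTES = 16 * 1024 * 1024  # default PostgreSQL WAL segment size


def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '0/6D4E068' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def failover_allowed(
    primary_lsn: str,
    standby_lsn: str,
    max_lag_segments: int = 10,
) -> bool:
    """True unless the standby is more than max_lag_segments behind."""
    lag_bytes = lsn_to_int(primary_lsn) - lsn_to_int(standby_lsn)
    return lag_bytes <= max_lag_segments * WAL_SEGMENT_BYTES
```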

A note about performance
In some cases the performance impact on write latency when setting synchronous replication
makes the application fail to deliver expected performance. If testing or production
feedback shows this to be the case, it is beneficial to switch to using asynchronous
replication.

The way to use asynchronous replication in pg_auto_failover is to change the
synchronous_commit setting. This setting can be set per transaction, per session, or per
user. It does not have to be set globally on your Postgres instance.

One way to benefit from that would be:

alter role fast_and_loose set synchronous_commit to local;

That way performance-critical parts of the application don't have to wait for the standby
nodes. Only use this when you can also lower your data durability guarantees.
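
The three scopes mentioned above map to different SQL statements. The helper below is a
minimal sketch that only builds the statement text; the function name is hypothetical,
the role name is interpolated without quoting (so only pass trusted identifiers), and
SET LOCAL only takes effect inside a transaction block.

```python
def synchronous_commit_statement(
    scope: str,
    value: str = "local",
    role: str = "",
) -> str:
    """Return the SQL that sets synchronous_commit for the given scope."""
    if scope == "transaction":
        return f"SET LOCAL synchronous_commit TO {value};"
    if scope == "session":
        return f"SET synchronous_commit TO {value};"
    if scope == "role":
        if not role:
            raise ValueError("a role name is required for scope 'role'")
        return f"ALTER ROLE {role} SET synchronous_commit TO {value};"
    raise ValueError(f"unknown scope: {scope}")
```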

Node recovery
When bringing a node back after a failover, the keeper (pg_autoctl run) can simply be
restarted. It will also restart postgres if needed and obtain its goal state from the
monitor. If the failed node was a primary and was demoted, it will learn this from the
monitor. Once the node reports, it is allowed to come back as a standby by running
pg_rewind. If it is too far behind, the node performs a new pg_basebackup.

MULTI-NODE ARCHITECTURES

       pg_auto_failover allows you to have more than one standby node, and offers advanced
       control over your production architecture characteristics.

   Architectures with two standby nodes
       When  adding  your  second  standby  node  with  default  settings,  you get the following
       architecture:
         [image: pg_auto_failover architecture with two standby nodes]

         In  this  case,  three  nodes get set up with the same characteristics, achieving HA for
         both the Postgres service and the production dataset.  An  important  setting  for  this
         architecture is number_sync_standbys.

         The replication setting number_sync_standbys sets how many standby nodes the primary
         should wait for when committing a transaction. In order to have good availability in
         your system, pg_auto_failover requires number_sync_standbys + 1 standby nodes
         participating in the replication quorum: this allows any standby node to fail without
         impact on the system's ability to respect the replication quorum.

         When  only  two  nodes  are registered in a group on the monitor we have a primary and a
         single secondary node. Then number_sync_standbys can only be set to zero. When adding  a
         second  standby  node  to  a  pg_auto_failover  group,  then  the  monitor automatically
         increments number_sync_standbys to one, as we see in the diagram above.

         When number_sync_standbys is set to zero then pg_auto_failover implements  the  Business
         Continuity setup as seen in Architecture Basics: synchronous replication is then used as
         a way to guarantee that failover can be implemented without data loss.

         In more detail:

          1. With number_sync_standbys set to one, this architecture always maintains two  copies
             of  the  dataset:  one on the current primary node (node A in the previous diagram),
             and one on the standby that acknowledges the transaction first  (either  node  B  or
             node C in the diagram).

             When  one  of  the  standby nodes is unavailable, the second copy of the dataset can
             still be maintained thanks to the remaining standby.

             When both the standby nodes  are  unavailable,  then  it's  no  longer  possible  to
             guarantee  the  replication  quorum, and thus writes on the primary are blocked. The
             Postgres primary node waits  until  at  least  one  standby  node  acknowledges  the
             transactions locally committed, thus degrading your Postgres service to read-only.

          0.  It  is possible to manually set number_sync_standbys to zero when having registered
              two standby nodes to the monitor, overriding the default behavior.

              In that case, when the second standby node becomes unhealthy at the same time as
              the first standby node, the primary node is assigned the state wait_primary. In
              that state, synchronous replication is disabled on the primary by setting
              synchronous_standby_names to an empty string. Writes are allowed on the primary,
              even though there's no extra copy of the production dataset available at this time.

              Setting number_sync_standbys to zero allows data  to  be  written  even  when  both
              standby  nodes  are down. In this case, a single copy of the production data set is
              kept and, if the primary was then to fail, some data will be lost. How much depends
              on your backup and recovery mechanisms.
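
       The two cases above differ in whether the primary keeps accepting writes when
       standby nodes are lost. A minimal sketch of that rule, with a hypothetical function
       name, could read:

```python
def writes_allowed(number_sync_standbys: int, healthy_quorum_standbys: int) -> bool:
    """Can the primary commit writes, given how many quorum standbys are healthy?"""
    if number_sync_standbys == 0:
        # The primary falls back to wait_primary and keeps accepting writes.
        return True
    # Otherwise each commit waits for number_sync_standbys acknowledgements.
    return healthy_quorum_standbys >= number_sync_standbys
```

       With number_sync_standbys set to one, losing one of two standby nodes keeps the
       service writable, while losing both blocks writes until a standby returns.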

   Replication Settings and Postgres Architectures
       The  entire  flexibility  of  pg_auto_failover  can  be leveraged with the following three
       replication settings:

          • Number of sync standbys

          • Replication quorum

          • Candidate priority

   Number Sync Standbys
       This parameter is used by Postgres in the synchronous_standby_names parameter:
       number_sync_standbys is the number of synchronous standbys for whose replies transactions
       must wait.

       This parameter can be set at the formation level  in  pg_auto_failover,  meaning  that  it
       applies  to the current primary, and "follows" a failover to apply to any new primary that
       might replace the current one.

       To set this parameter to the value <n>, use the following command:

          pg_autoctl set formation number-sync-standbys <n>

       The default value in pg_auto_failover is zero. When set to zero,  the  Postgres  parameter
       synchronous_standby_names can be set to either '*' or to '':

       • synchronous_standby_names   =  '*'  means  that  any  standby  may  participate  in  the
         replication quorum for transactions with synchronous_commit set to on or higher values.

         pg_auto_failover uses synchronous_standby_names = '*' when there's at least one standby
         that is known to be healthy.

       • synchronous_standby_names = '' (empty string) disables synchronous commit and makes all
         your commits asynchronous, meaning that transaction commits will not wait for
         replication. In other words, a single copy of your production data is maintained when
         synchronous_standby_names is set that way.

         pg_auto_failover uses synchronous_standby_names = '' only when number_sync_standbys is
         set to zero and there's no standby node known healthy by the monitor.

       In  order  to set number_sync_standbys to a non-zero value, pg_auto_failover requires that
       at least number_sync_standbys + 1 standby nodes be registered in the system.

       When the first standby node is added to the pg_auto_failover monitor, the only  acceptable
       value  for  number_sync_standbys is zero. When a second standby is added that participates
       in the replication quorum, then number_sync_standbys is automatically set to one.
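
       The constraint can be expressed as a small validation helper. This is a sketch of
       the rule as stated here, with hypothetical function names; it does not call the
       monitor.

```python
def max_number_sync_standbys(quorum_standbys: int) -> int:
    """Largest value allowed: one less than the standbys in the quorum."""
    return max(0, quorum_standbys - 1)


def check_number_sync_standbys(requested: int, quorum_standbys: int) -> None:
    """Raise ValueError when the requested value breaks the n + 1 rule."""
    if requested < 0:
        raise ValueError("number_sync_standbys cannot be negative")
    allowed = max_number_sync_standbys(quorum_standbys)
    if requested > allowed:
        raise ValueError(
            f"number_sync_standbys={requested} needs {requested + 1} standby "
            f"nodes in the replication quorum, found {quorum_standbys}"
        )
```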

       The command pg_autoctl set formation number-sync-standbys can be used to change the  value
       of  this  parameter  in  a  formation,  even  when  all  the  nodes are already running in
       production. The pg_auto_failover monitor then sets a transition for the primary to  update
       its local value of synchronous_standby_names.

   Replication Quorum
       The replication quorum setting is a boolean and defaults to true, and can be set per-node.
       pg_auto_failover includes a given node in synchronous_standby_names only when the
       replication quorum parameter has been set to true. This means that asynchronous
       replication will be used for nodes where replication-quorum is set to false.

       It is possible to force asynchronous replication globally by setting replication quorum to
       false on all the nodes in a formation. Because failovers will happen, remember to apply
       your replication settings to the current primary node too: it will become a standby
       later.

       To set this parameter to either true or false, use one of the following commands:

          pg_autoctl set node replication-quorum true
          pg_autoctl set node replication-quorum false

   Candidate Priority
       The candidate priority setting is an integer that can be set to any value between 0 (zero)
       and 100 (one hundred). The default value is 50. When the pg_auto_failover monitor  decides
       to  orchestrate a failover, it uses each node's candidate priority to pick the new primary
       node.

       When setting the candidate priority of a node down  to  zero,  this  node  will  never  be
       selected to be promoted as the new primary when a failover is orchestrated by the monitor.
       The monitor will instead wait until another node registered is healthy and in  a  position
       to be promoted.

       To set this parameter to the value <n>, use the following command:

          pg_autoctl set node candidate-priority <n>

       When  nodes  have the same candidate priority, the monitor then picks the standby with the
       most advanced LSN position published to the monitor. When more than one node has published
       the same LSN position, a random one is chosen.
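
       The selection rules described in this section can be sketched as follows. This is an
       illustration of the documented order of criteria only: the function names and the
       dictionary keys are hypothetical, and the input list is assumed to contain only
       healthy standby nodes.

```python
import random
from typing import Dict, List, Optional


def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '0/7002310' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def pick_candidate(
    standbys: List[Dict],
    rng: Optional[random.Random] = None,
) -> Optional[Dict]:
    """Pick the failover candidate among healthy standbys, or None."""
    rng = rng or random.Random()
    # Nodes with candidate priority zero are never promoted.
    eligible = [n for n in standbys if n["candidate_priority"] > 0]
    if not eligible:
        return None
    # Highest candidate priority wins.
    top_priority = max(n["candidate_priority"] for n in eligible)
    top = [n for n in eligible if n["candidate_priority"] == top_priority]
    # Ties are broken by the most advanced published LSN.
    best_lsn = max(lsn_to_int(n["lsn"]) for n in top)
    most_advanced = [n for n in top if lsn_to_int(n["lsn"]) == best_lsn]
    # Remaining ties are broken at random.
    return rng.choice(most_advanced)


standbys = [
    {"name": "node_B", "candidate_priority": 50, "lsn": "0/7002310"},
    {"name": "node_C", "candidate_priority": 50, "lsn": "0/7002400"},
    {"name": "node_D", "candidate_priority": 0, "lsn": "0/8000000"},
]
chosen = pick_candidate(standbys)
```

       In this example node_D is skipped despite its LSN because its candidate priority is
       zero, and node_C is chosen over node_B on LSN position.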

       When the candidate for failover has not published the most advanced LSN position in the
       WAL, pg_auto_failover orchestrates an intermediate step in the failover mechanism. The
       candidate fetches the missing WAL bytes from one of the standbys with the most advanced
       LSN position prior to being promoted. Postgres allows this operation thanks to cascading
       replication: any standby can be the upstream node for another standby.
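
       Whether this intermediate step is needed comes down to comparing LSN positions. The
       sketch below uses hypothetical function names and dictionary keys, and only models
       the decision, not the replication setup itself.

```python
from typing import Dict, List, Optional


def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '0/7002310' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def catchup_source(candidate: Dict, standbys: List[Dict]) -> Optional[Dict]:
    """Return the standby to fetch missing WAL from, or None if not needed."""
    most_advanced = max(standbys, key=lambda n: lsn_to_int(n["lsn"]))
    if lsn_to_int(most_advanced["lsn"]) > lsn_to_int(candidate["lsn"]):
        return most_advanced
    return None  # the candidate already has the most advanced position


node_b = {"name": "node_B", "lsn": "0/7002310"}
node_c = {"name": "node_C", "lsn": "0/7002400"}
source = catchup_source(node_b, [node_b, node_c])
```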

       It  is required at all times that at least two nodes have a non-zero candidate priority in
       any pg_auto_failover formation. Otherwise no failover is possible.

   Auditing replication settings
       The command pg_autoctl get formation settings (also known as pg_autoctl show settings) can
       be  used  to  obtain  a  summary  of all the replication settings currently in effect in a
       formation. Still using the first diagram on this page, we get the following summary:

          $ pg_autoctl get formation settings
            Context |    Name |                   Setting | Value
          ----------+---------+---------------------------+-------------------------------------------------------------
          formation | default |      number_sync_standbys | 1
            primary |  node_A | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_3, pgautofailover_standby_2)'
               node |  node_A |        replication quorum | true
               node |  node_B |        replication quorum | true
               node |  node_C |        replication quorum | true
               node |  node_A |        candidate priority | 50
               node |  node_B |        candidate priority | 50
               node |  node_C |        candidate priority | 50

       We can see that the number_sync_standbys has been used to compute the current value of the
       synchronous_standby_names setting on the primary.

       Because  all the nodes in that example have the same default candidate priority (50), then
       pg_auto_failover is using the form ANY 1 with the list of standby nodes that are currently
       participating in the replication quorum.

       The  entries in the synchronous_standby_names list are meant to match the application_name
       connection setting used in the primary_conninfo, and the format used  by  pg_auto_failover
       there  is  the  format string "pgautofailover_standby_%d" where %d is replaced by the node
       id. This allows keeping the same connection string to the primary when the  node  name  is
       changed (using the command pg_autoctl set metadata --name).
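
       The value shown in the summary can be reproduced from the node ids with a few lines
       of code. This sketch covers only the ANY form with a non-zero number_sync_standbys;
       the function names are hypothetical, and the order of entries is taken as given
       because the ordering rule is not described here.

```python
from typing import Iterable


def standby_application_name(node_id: int) -> str:
    """Apply the documented format string to a node id."""
    return "pgautofailover_standby_%d" % node_id


def any_quorum_setting(number_sync_standbys: int, quorum_node_ids: Iterable[int]) -> str:
    """Build an 'ANY n (...)' value for synchronous_standby_names."""
    names = [standby_application_name(node_id) for node_id in quorum_node_ids]
    if number_sync_standbys < 1 or not names:
        raise ValueError("this sketch only covers the ANY form with n >= 1")
    return f"ANY {number_sync_standbys} ({', '.join(names)})"


setting = any_quorum_setting(1, [3, 2])
```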

       Here we can see the node id of each registered Postgres node with the following command:

          $ pg_autoctl show state
            Name |  Node |      Host:Port |       LSN | Reachable |       Current State |      Assigned State
          -------+-------+----------------+-----------+-----------+---------------------+--------------------
          node_A |     1 | localhost:5001 | 0/7002310 |       yes |             primary |             primary
          node_B |     2 | localhost:5002 | 0/7002310 |       yes |           secondary |           secondary
          node_C |     3 | localhost:5003 | 0/7002310 |       yes |           secondary |           secondary
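
       When scripting against this output, the table can be turned into records with simple
       string handling. The parser below is a sketch that assumes the exact layout printed
       above (a header line, a separator line, then one row per node); the function name is
       hypothetical and the layout may differ between versions.

```python
from typing import Dict, List


def parse_show_state(text: str) -> List[Dict[str, str]]:
    """Parse the table printed by `pg_autoctl show state` into dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = [cell.strip() for cell in lines[0].split("|")]
    rows = []
    for line in lines[1:]:
        if set(line.strip()) <= set("-+"):
            continue  # skip the separator line
        cells = [cell.strip() for cell in line.split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


SAMPLE = """\
  Name |  Node |      Host:Port |       LSN | Reachable |       Current State |      Assigned State
-------+-------+----------------+-----------+-----------+---------------------+--------------------
node_A |     1 | localhost:5001 | 0/7002310 |       yes |             primary |             primary
node_B |     2 | localhost:5002 | 0/7002310 |       yes |           secondary |           secondary
node_C |     3 | localhost:5003 | 0/7002310 |       yes |           secondary |           secondary
"""

rows = parse_show_state(SAMPLE)
node_ids = {row["Name"]: int(row["Node"]) for row in rows}
```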

       When pg_auto_failover is configured with the per-formation number_sync_standbys setting
       and the per-node replication quorum and candidate priority settings, those properties are
       used to compute the synchronous_standby_names value on the primary node. This value is
       automatically maintained on the primary by pg_auto_failover, and is updated either when
       replication settings are changed or when a failover happens.

       The other situation when the pg_auto_failover replication settings are used is a candidate
       election when a failover happens and there are more than two nodes registered in a group.
       Then the node with the highest candidate priority is selected, as detailed above in the
       Candidate Priority section.

   Sample architectures with three standby nodes
       When setting the three parameters above, it's possible to design very  different  Postgres
       architectures for your production needs.
         [image: pg_auto_failover architecture with three standby nodes]

         In this case, the system is set up with three standby nodes all set the same  way,  with
         default  parameters.  The  default  parameters support setting number_sync_standbys = 2.
         This means that Postgres will maintain three copies of the production data  set  at  all
         times.

         On  the other hand, if two standby nodes were to fail at the same time, despite the fact
         that two copies of the data are still maintained, the Postgres service would be degraded
         to read-only.

         With this architecture diagram, here's the summary that we obtain:

          $ pg_autoctl show settings
            Context |    Name |                   Setting | Value
          ----------+---------+---------------------------+---------------------------------------------------------------------------------------
          formation | default |      number_sync_standbys | 2
            primary |  node_A | synchronous_standby_names | 'ANY 2 (pgautofailover_standby_2, pgautofailover_standby_4, pgautofailover_standby_3)'
               node |  node_A |        replication quorum | true
               node |  node_B |        replication quorum | true
               node |  node_C |        replication quorum | true
               node |  node_D |        replication quorum | true
               node |  node_A |        candidate priority | 50
               node |  node_B |        candidate priority | 50
               node |  node_C |        candidate priority | 50
               node |  node_D |        candidate priority | 50

   Sample architecture with three standby nodes, one async
         [image: pg_auto_failover architecture with three standby nodes, one async]

         In this case, the system  is  set  up  with  two  standby  nodes  participating  in  the
         replication  quorum,  allowing for number_sync_standbys = 1. The system always maintains
         at least two copies of the data set, one on the primary, another on  either  node  B  or
         node  D. Whenever we lose one of those nodes, we can hold to the guarantee of having two
         copies of the data set.

         Additionally, we have the standby server C which has been set up to not  participate  in
         the  replication  quorum. Node C will not be found in the synchronous_standby_names list
         of nodes. Also,  node  C  is  set  up  to  never  be  a  candidate  for  failover,  with
         candidate-priority = 0.

         This architecture would fit a situation where nodes A, B, and D are deployed in the same
         data center or availability zone and node C in another one. Those three nodes are set
         up to support the main production traffic and implement high availability of both the
         Postgres service and the data set.

         Node C might be set up for Business Continuity in case the first data center is lost, or
         maybe for reporting needs on another application domain.

         With this architecture diagram, here's the summary that we obtain:

          $ pg_autoctl show settings
            Context |    Name |                   Setting | Value
          ----------+---------+---------------------------+-------------------------------------------------------------
          formation | default |      number_sync_standbys | 1
            primary |  node_A | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_4, pgautofailover_standby_2)'
               node |  node_A |        replication quorum | true
               node |  node_B |        replication quorum | true
               node |  node_C |        replication quorum | false
               node |  node_D |        replication quorum | true
               node |  node_A |        candidate priority | 50
               node |  node_B |        candidate priority | 50
               node |  node_C |        candidate priority | 0
               node |  node_D |        candidate priority | 50
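
       The per-node settings in this summary determine two different sets of nodes. The
       sketch below derives both from the values in the table; the data structure and
       function names are hypothetical.

```python
from typing import Dict, List

# Settings copied from the summary above; node_A is the current primary.
NODES: List[Dict] = [
    {"name": "node_A", "primary": True, "replication_quorum": True, "candidate_priority": 50},
    {"name": "node_B", "primary": False, "replication_quorum": True, "candidate_priority": 50},
    {"name": "node_C", "primary": False, "replication_quorum": False, "candidate_priority": 0},
    {"name": "node_D", "primary": False, "replication_quorum": True, "candidate_priority": 50},
]


def quorum_standbys(nodes: List[Dict]) -> List[str]:
    """Standbys that appear in synchronous_standby_names."""
    return [n["name"] for n in nodes if not n["primary"] and n["replication_quorum"]]


def failover_candidates(nodes: List[Dict]) -> List[str]:
    """Standbys that may be promoted during a failover."""
    return [n["name"] for n in nodes if not n["primary"] and n["candidate_priority"] > 0]


sync_nodes = quorum_standbys(NODES)
candidates = failover_candidates(NODES)
```

       Here both sets contain node_B and node_D, and node_C belongs to neither, which
       matches its role as an asynchronous standby that is never promoted.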

FAILOVER STATE MACHINE

Introduction
pg_auto_failover uses a state machine for highly controlled execution. As keepers inform
the monitor about new events (or fail to contact it at all), the monitor assigns each node
both a current state and a goal state. A node's current state is a strong guarantee of its
capabilities. States themselves do not cause any actions; actions happen during state
transitions. The assigned goal states inform keepers of what transitions to attempt.

Example of state transitions in a new cluster
A good way to get acquainted with the states is by examining the transitions of a cluster
from birth to high availability.

After starting a monitor and running keeper init for the first data node ("node A"), the
monitor registers the state of that node as "init" with a goal state of "single." The init
state means the monitor knows nothing about the node other than its existence because the
keeper is not yet continuously running there to report node health.

Once the keeper runs and reports its health to the monitor, the monitor assigns it the
state "single," meaning it is just an ordinary Postgres server with no failover. Because
there are not yet other nodes in the cluster, the monitor also assigns node A the goal
state of single -- there's nothing that node A's keeper needs to change.

As soon as a new node ("node B") is initialized, the monitor assigns node A the goal state
of "wait_primary." This means the node still has no failover, but there's hope for a
secondary to synchronize with it soon. To accomplish the transition from single to
wait_primary, node A's keeper adds node B's hostname to pg_hba.conf to allow a hot standby
replication connection.

At the same time, node B transitions into wait_standby with the goal initially of staying
in wait_standby. It can do nothing but wait until node A gives it access to connect. Once
node A has transitioned to wait_primary, the monitor assigns B the goal of "catchingup,"
which gives B's keeper the green light to make the transition from wait_standby to
catchingup. This transition involves running pg_basebackup, editing recovery.conf and
restarting PostgreSQL in Hot Standby mode.

Node B reports to the monitor when it's in hot standby mode and able to connect to node A.
The monitor then assigns node B the goal state of "secondary" and A the goal of "primary."
Postgres ships WAL logs from node A and replays them on B. Finally B is caught up and
tells the monitor (specifically B reports its pg_stat_replication.sync_state and WAL
replay lag). At this glorious moment the monitor assigns A the state primary (goal:
primary) and B secondary (goal: secondary).
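
The walk-through above can be summarised as a sequence of (current state, goal state)
pairs per node. The following Python sketch only replays that narrative: the TIMELINE
table and the final_states helper are hypothetical names, not part of pg_auto_failover,
and the real monitor derives these assignments from keeper reports rather than from a
fixed list.

```python
# Illustrative replay of the new-cluster walk-through; not pg_auto_failover code.
# Each step records (node, current state, goal state) as described in the text.
TIMELINE = [
    ("A", "init", "single"),                # keeper init on the first node
    ("A", "single", "single"),              # keeper runs and reports health
    ("B", "init", "wait_standby"),          # node B registers as a second node
    ("A", "single", "wait_primary"),        # A must open pg_hba.conf for B
    ("B", "wait_standby", "wait_standby"),  # B waits for access from A
    ("A", "wait_primary", "wait_primary"),  # A has granted replication access
    ("B", "wait_standby", "catchingup"),    # B may now run pg_basebackup
    ("B", "catchingup", "secondary"),       # B is in hot standby, connected to A
    ("A", "wait_primary", "primary"),       # A is asked to become primary
    ("B", "secondary", "secondary"),        # B has caught up
    ("A", "primary", "primary"),            # synchronous replication is in place
]


def final_states(timeline):
    """Return the last (current, goal) pair recorded for each node."""
    latest = {}
    for node, current, goal in timeline:
        latest[node] = (current, goal)
    return latest


if __name__ == "__main__":
    for node, (current, goal) in sorted(final_states(TIMELINE).items()):
        print(f"node {node}: state={current} goal={goal}")
```

A node is at rest when its current state equals its goal state, which is the case for
both nodes at the end of this timeline.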

State reference
The following diagram shows the pg_auto_failover State Machine. It's missing links to the
single state, which can always be reached when removing all the other nodes.
[image: pg_auto_failover Finite State Machine diagram]

In the previous diagram we can see the states in which the application can connect to a
read-write Postgres service: single, wait_primary, primary, prepare_maintenance, and
apply_settings.

Init
A node is assigned the "init" state when it is first registered with the monitor. Nothing
is known about the node at this point beyond its existence. If no other node has been
registered with the monitor for the same formation and group ID then this node is assigned
a goal state of "single." Otherwise the node has the goal state of "wait_standby."

Single
There is only one node in the group. It behaves as a regular PostgreSQL instance, with no
high availability and no failover. If the administrator removes a node the other node will
revert to the single state.

Wait_primary
Applied to a node intended to be the primary but not yet in that position. The
primary-to-be at this point knows the secondary's node name or IP address, and has granted
the node hot standby access in the pg_hba.conf file.

The wait_primary state may be caused either by a new potential secondary being registered
with the monitor (good), or an existing secondary becoming unhealthy (bad). In the latter
case, during the transition from primary to wait_primary, the primary node's keeper
disables synchronous replication on the node. It also cancels currently blocked queries.

Join_primary
Applied to a primary node when another standby is joining the group. This allows the
primary node to apply necessary changes to its HBA setup before allowing the new node
joining the system to run the pg_basebackup command.

IMPORTANT:
This state has been deprecated, and is no longer assigned to nodes. Any time we would
have used join_primary before, we now use primary instead.

Primary
A healthy secondary node exists and has caught up with WAL replication. Specifically, the
keeper reports the primary state only when it has verified that the secondary is reported
"sync" in pg_stat_replication.sync_state, and with a WAL lag of 0.

The primary state is a strong assurance. It's the only state where we know we can fail
over when required.

During the transition from wait_primary to primary, the keeper also enables synchronous
replication. This means that after a failover the secondary will be fully up to date.

Wait_standby
The monitor has decided that this node is a standby. The node must wait until the primary
has authorized it to connect and set up hot standby replication.

Catchingup
The monitor assigns catchingup to the standby node when the primary is ready for a
replication connection (pg_hba.conf has been properly edited, connection role added, etc).

The standby node keeper runs pg_basebackup, connecting to the primary's hostname and port.
The keeper then edits recovery.conf and starts PostgreSQL in hot standby mode.

Secondary
A node with this state is acting as a hot standby for the primary, and is up to date with
the WAL log there. In particular, it is within 16MB or 1 WAL segment of the primary.

Maintenance
The cluster administrator can manually move a secondary into the maintenance state to
gracefully take it offline. The primary will then transition from state primary to
wait_primary, during which time the secondary will be online to accept writes. When the
old primary reaches the wait_primary state then the secondary is safe to take offline with
minimal consequences.

Prepare_maintenance
The cluster administrator can manually move a primary node into the maintenance state to
gracefully take it offline. The primary then transitions to the prepare_maintenance state
to make sure the secondary is not missing any writes. In the prepare_maintenance state,
the primary shuts down.

Wait_maintenance
The cluster administrator can manually move a secondary into the maintenance state to
gracefully take it offline. Before reaching the maintenance state though, we want to
switch the primary node to asynchronous replication, in order to avoid writes being
blocked. In the state wait_maintenance the standby waits until the primary has reached
wait_primary.

Draining
A state between primary and demoted where replication buffers finish flushing. A draining
node will not accept new client writes, but will continue to send existing data to the
secondary.

To implement that with Postgres we actually stop the service. When stopping, Postgres
ensures that the current replication buffers are flushed correctly to synchronous
standbys.

Demoted
The primary keeper or its database were unresponsive past a certain threshold. The monitor
assigns demoted state to the primary to avoid a split-brain scenario where there might be
two nodes that don't communicate with each other and both accept client writes.

In that state the keeper stops PostgreSQL and prevents it from running.

Demote_timeout
If the monitor assigns the primary a demoted goal state but the primary keeper doesn't
acknowledge transitioning to that state within a timeout window, then the monitor assigns
demote_timeout to the primary.

This most commonly happens when the primary machine goes silent: the keeper is no longer
reporting to the monitor.

Stop_replication
The stop_replication state is meant to ensure that the primary goes to the demoted state
before the standby goes to single and accepts writes (in case the primary can’t contact
the monitor anymore). Before promoting the secondary node, the keeper stops PostgreSQL on
the primary to avoid split-brain situations.

For safety, when the primary fails to contact the monitor and fails to see the
pg_auto_failover connection in pg_stat_replication, then it goes to the demoted state of
its own accord.

Prepare_promotion
The prepare_promotion state is meant to prepare the standby server for promotion. This
state allows synchronisation on the monitor, making sure that the primary has stopped
Postgres before promoting the secondary, hence preventing split brain situations.

Report_LSN
The report_lsn state is assigned to standby nodes when a failover is orchestrated and
there are several standby nodes. In order to pick the most advanced standby,
pg_auto_failover first needs a fresh report of the current LSN position reached on each
standby node.

When a node reaches the report_lsn state, the replication stream is stopped, by restarting
Postgres without a primary_conninfo. This allows the primary node to detect Network
Partitions, i.e. when the primary can't connect to the monitor and there's no standby
listed in pg_stat_replication.

Fast_forward
The fast_forward state is assigned to the selected promotion candidate during a failover
when it won the election thanks to the candidate priority settings, but the selected node
is not the most advanced standby node as reported in the report_lsn state.

Missing WAL bytes are fetched from one of the most advanced standby nodes by using
Postgres cascading replication features: it is possible to use any standby node in the
primary_conninfo.
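
The interplay between report_lsn, candidate priority and fast_forward can be sketched as
follows. This is a simplified model under stated assumptions, not the monitor's actual
algorithm: the function names, the tie-breaking rule (equal priorities are resolved by
LSN) and the dictionary layout are all hypothetical, and the real election also takes
node health and other settings into account. The only Postgres fact relied upon is that
an LSN such as 0/3000060 is a 64-bit position written as two hexadecimal halves.

```python
# Simplified model of candidate selection during a multi-standby failover.
# An assumption-laden sketch, not the monitor's actual algorithm.

def parse_lsn(text):
    """Convert a Postgres LSN such as '0/3000060' into a comparable integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def plan_promotion(standbys):
    """Pick a promotion candidate from the LSN reports.

    standbys: list of dicts with 'name', 'priority' and 'lsn' keys.
    Returns (candidate name, next state, upstream name or None).
    """
    # A candidate priority of zero means the node is never promoted.
    eligible = [s for s in standbys if s["priority"] > 0]
    if not eligible:
        raise ValueError("no standby with a candidate priority above zero")

    # Assumption: highest priority wins, ties go to the most advanced LSN.
    candidate = max(eligible, key=lambda s: (s["priority"], parse_lsn(s["lsn"])))
    most_advanced = max(standbys, key=lambda s: parse_lsn(s["lsn"]))

    if parse_lsn(candidate["lsn"]) < parse_lsn(most_advanced["lsn"]):
        # The winner is missing WAL: fetch it from a more advanced standby.
        return candidate["name"], "fast_forward", most_advanced["name"]
    return candidate["name"], "prepare_promotion", None


if __name__ == "__main__":
    reports = [
        {"name": "node_B", "priority": 90, "lsn": "0/3000060"},
        {"name": "node_C", "priority": 0, "lsn": "0/3000110"},
        {"name": "node_D", "priority": 50, "lsn": "0/30000D8"},
    ]
    print(plan_promotion(reports))
```

In the sample reports, node_B wins on priority but lags behind node_C, so the sketch
sends it through fast_forward with node_C as the WAL source, even though node_C itself
can never be promoted.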

Dropped
The dropped state is assigned to a node when the pg_autoctl drop node command is used.
This allows the node to implement specific local actions before being entirely removed
from the monitor database.

When a node reports reaching the dropped state, the monitor removes its entry. If a node
is not reporting anymore, maybe because it's completely unavailable, then it's possible to
run the pg_autoctl drop node --force command, and then the node entry is removed from the
monitor.

Failover logic
This section needs to be expanded further, but below is the failover state machine for
each node that is implemented by the monitor:
[image: Node state machine]

Since the state machines of the data nodes always move in tandem, a pair (group) of data
nodes also implicitly has the following state machine:
[image: Group state machine]

pg_auto_failover keeper's State Machine
When built in TEST mode, it is then possible to use the following command to get a visual
representation of the Keeper's Finite State Machine:

$ PG_AUTOCTL_DEBUG=1 pg_autoctl do fsm gv | dot -Tsvg > fsm.svg

The dot program is part of the Graphviz suite and produces the following output:
[image: Keeper state machine]

FAILOVER AND FAULT TOLERANCE

At the heart of the pg_auto_failover implementation is a State Machine. The state machine
is driven by the monitor, and its transitions are implemented in the keeper service, which
then reports success to the monitor.

The keeper is allowed to retry transitions as many times as needed until they succeed, and
also reports failures to reach the assigned state to the monitor node. The monitor also
implements frequent health-checks targeting the registered PostgreSQL nodes.

When the monitor detects something is not as expected, it takes action by assigning a new
goal state to the keeper, which is responsible for implementing the transition to this new
state, and then reporting.

Unhealthy Nodes
The pg_auto_failover monitor is responsible for running regular health-checks with every
PostgreSQL node it manages. A health-check is successful when it is able to connect to the
PostgreSQL node using the PostgreSQL protocol (libpq), imitating the pg_isready command.

How frequent those health checks are (20s by default), the PostgreSQL connection timeout
in use (5s by default), and how many times to retry in case of a failure before marking
the node unhealthy (2 by default) are GUC variables that you can set on the Monitor node
itself. Remember, the monitor is implemented as a PostgreSQL extension, so the setup is a
set of PostgreSQL configuration settings:

SELECT name, setting
  FROM pg_settings
 WHERE name ~ 'pgautofailover\.health';
                  name                   | setting
-----------------------------------------+---------
 pgautofailover.health_check_max_retries | 2
 pgautofailover.health_check_period      | 20000
 pgautofailover.health_check_retry_delay | 2000
 pgautofailover.health_check_timeout     | 5000
(4 rows)
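
The timing values in that listing are expressed in milliseconds, which is why the 20s
period shows up as 20000. As a small illustration, the following Python sketch parses
"name | setting" lines copied from such a psql session and converts the timings to
seconds. The helper names are hypothetical, and the sample text is the listing from this
page rather than output from a live monitor.

```python
# Parse "name | setting" lines from psql and convert milliseconds to seconds.
# The sample below is copied from the listing in this page.
PSQL_OUTPUT = """\
name | setting
pgautofailover.health_check_max_retries | 2
pgautofailover.health_check_period | 20000
pgautofailover.health_check_retry_delay | 2000
pgautofailover.health_check_timeout | 5000
"""


def parse_settings(text):
    """Return {setting name: integer value}, skipping headers and rulers."""
    settings = {}
    for line in text.splitlines():
        if "|" not in line:
            continue
        name, value = (part.strip() for part in line.split("|", 1))
        if value.isdigit():
            settings[name] = int(value)
    return settings


def describe(settings):
    """Summarise the health-check settings, with timings in seconds."""
    prefix = "pgautofailover.health_check_"
    return {
        "period_s": settings[prefix + "period"] / 1000,
        "timeout_s": settings[prefix + "timeout"] / 1000,
        "retry_delay_s": settings[prefix + "retry_delay"] / 1000,
        "max_retries": settings[prefix + "max_retries"],
    }


if __name__ == "__main__":
    print(describe(parse_settings(PSQL_OUTPUT)))
```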

The pg_auto_failover keeper also reports if PostgreSQL is running as expected. This is
useful for situations where the PostgreSQL server / OS is running fine and the keeper
(pg_autoctl run) is still active, but PostgreSQL has failed. Situations might include
File System is Full on the WAL disk, some file system level corruption, missing files,
etc.

Here's what happens to your PostgreSQL service in case any single-node failure is
observed:

• Primary node is monitored unhealthy

When the primary node is unhealthy, and only when the secondary node is itself in
good health, then the primary node is asked to transition to the DRAINING state, and
the attached secondary is asked to transition to the state PREPARE_PROMOTION. In this
state, the secondary is asked to catch-up with the WAL traffic from the primary, and
then report success.

The monitor then continues orchestrating the promotion of the standby: it stops the
primary (implementing STONITH in order to prevent any data loss), and promotes the
secondary into being a primary now.

Depending on the exact situation that made the primary unhealthy, it's possible
that the secondary fails to catch up with WAL from it. In that case, after the
PREPARE_PROMOTION_CATCHUP_TIMEOUT has elapsed, the standby reports success anyway, and
the failover sequence continues from the monitor.

• Secondary node is monitored unhealthy

When the secondary node is unhealthy, the monitor assigns to it the state CATCHINGUP,
and assigns the state WAIT_PRIMARY to the primary node. When implementing the
transition from PRIMARY to WAIT_PRIMARY, the keeper disables synchronous replication.

When the keeper reports an acceptable WAL difference in the two nodes again, then the
replication is upgraded back to being synchronous. While a secondary node is not in
the SECONDARY state, secondary promotion is disabled.

• Monitor node has failed

Then the primary and secondary nodes just work as if you had not set up
pg_auto_failover in the first place, as the keeper fails to report local state from
the nodes. Also, health checks are not performed. It means that no automated failover
may happen, even if needed.

Network Partitions
Adding to those simple situations, pg_auto_failover is also resilient to Network
Partitions. Here's the list of situations that have an impact on pg_auto_failover behavior,
and the actions taken to ensure High Availability of your PostgreSQL service:

• Primary can't connect to Monitor

Then it could be that either the primary is alone on its side of a network split, or
that the monitor has failed. The keeper decides what to do depending on whether the
secondary node is still connected to the replication slot: if it is, the primary
continues to serve PostgreSQL queries.

Otherwise, when the secondary isn't connected, and after the
NETWORK_PARTITION_TIMEOUT has elapsed, the primary considers it might be alone in a
network partition. That's a potential split-brain situation, and there is only one way
to prevent it: the primary stops, and reports a new state of DEMOTE_TIMEOUT.

The network_partition_timeout can be set in the keeper's configuration and defaults
to 20s.

• Monitor can't connect to Primary

Once all the retries have been done and the timeouts have elapsed, the primary
node is considered unhealthy, and the monitor begins the failover routine. This
routine has several steps, each of which allows us to check our expectations and step
back if needed.

For the failover to happen, the secondary node needs to be healthy and caught up with
the primary. Only if we time out while waiting for the WAL delta to be resorbed (30s by
default) can the secondary be promoted with uncertainty about the data
durability in the group.

• Monitor can't connect to Secondary

As soon as the secondary is considered unhealthy then the monitor changes the
replication setting to asynchronous on the primary, by assigning it the WAIT_PRIMARY
state. Also the secondary is assigned the state CATCHINGUP, which means it can't be
promoted in case of primary failure.

As the monitor tracks the WAL delta between the two servers, and they both report it
independently, the standby is eligible for promotion again as soon as it has caught up
with the primary. At that time it is assigned the SECONDARY state, and the
replication is switched back to synchronous.

Failure handling and network partition detection
If a node cannot communicate with the monitor, either because the monitor is down or because
there is a problem with the network, it will simply remain in the same state until the
monitor comes back.

If there is a network partition, it might be that the monitor and secondary can still
communicate and the monitor decides to promote the secondary since the primary is no
longer responsive. Meanwhile, the primary is still up-and-running on the other side of the
network partition. If a primary cannot communicate with the monitor, it starts checking
whether the secondary is still connected. In PostgreSQL, the secondary connection
automatically times out after 30 seconds. If last contact with the monitor and the last
time a connection from the secondary was observed are both more than 30 seconds in the
past, the primary concludes it is on the losing side of a network partition and shuts
itself down. It may be that the secondary and the monitor were actually down and the
primary was the only node that was alive, but we currently do not have a way to
distinguish such a situation. As with consensus algorithms, availability can only be
correctly preserved if at least 2 out of 3 nodes are up.

In asymmetric network partitions, the primary might still be able to talk to the
secondary, while unable to talk to the monitor. During failover, the monitor therefore
assigns the secondary the stop_replication state, which will cause it to disconnect from
the primary. After that, the primary is expected to shut down after at least 30 and at
most 60 seconds. To factor in worst-case scenarios, the monitor waits for 90 seconds
before promoting the secondary to become the new primary.
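
The self-demotion rule described above boils down to two "last seen" timestamps compared
against a timeout. The following Python sketch models that rule only: the function and
constant names are hypothetical, the 30-second default is the figure quoted in this
section, and the timeout is left as a parameter because the keeper's
network_partition_timeout setting is configurable.

```python
# Model of the primary's self-demotion rule during a suspected partition.
# The 30-second figure comes from the text; all names are illustrative only.
PARTITION_TIMEOUT_S = 30.0


def primary_should_shut_down(now, last_monitor_contact, last_secondary_seen,
                             timeout=PARTITION_TIMEOUT_S):
    """True when both the monitor and the secondary have been silent too long.

    All time arguments are in seconds on the same clock.
    """
    monitor_silent = (now - last_monitor_contact) > timeout
    secondary_silent = (now - last_secondary_seen) > timeout
    return monitor_silent and secondary_silent


if __name__ == "__main__":
    # Monitor unreachable for 40s, secondary still seen 10s ago: keep running.
    print(primary_should_shut_down(100.0, 60.0, 90.0))
    # Both silent for more than 30s: assume the losing side of a partition.
    print(primary_should_shut_down(100.0, 60.0, 65.0))
```

Requiring both conditions is what lets a primary keep serving queries when only the
monitor is unreachable, as long as its secondary is still streaming.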

INSTALLING PG_AUTO_FAILOVER

       We  provide  native  system  packages  for  pg_auto_failover   on   most   popular   Linux
       distributions.

       Use  the  steps  below  to  install pg_auto_failover on PostgreSQL 11. At the current time
       pg_auto_failover is compatible with both PostgreSQL 10 and PostgreSQL 11.

   Ubuntu or Debian
   Quick install
       The following installation method downloads a bash script that  automates  several  steps.
       The  full  script  is  available for review at our package cloud installation instructions
       page.

          # add the required packages to your system
          curl https://install.citusdata.com/community/deb.sh | sudo bash

          # install pg_auto_failover
          sudo apt-get install postgresql-11-auto-failover

          # confirm installation
          /usr/bin/pg_autoctl --version

   Manual Installation
       If you'd prefer to install your repo on your system manually, follow the instructions from
       package cloud manual installation page. This page will guide you with the specific details
       to achieve the 3 steps:

          1. install CitusData GnuPG key for its package repository

          2. install a new apt source for CitusData packages

          3. update your available package list

       Then when that's done, you can proceed with installing pg_auto_failover itself as  in  the
       previous case:

          # install pg_auto_failover
          sudo apt-get install postgresql-11-auto-failover

          # confirm installation
          /usr/bin/pg_autoctl --version

   Fedora, CentOS, or Red Hat
   Quick install
       The  following  installation  method downloads a bash script that automates several steps.
       The full script is available for review at our  package  cloud  installation  instructions
       page.

          # add the required packages to your system
          curl https://install.citusdata.com/community/rpm.sh | sudo bash

          # install pg_auto_failover
          sudo yum install -y pg-auto-failover14_12

          # confirm installation
          /usr/pgsql-12/bin/pg_autoctl --version

   Manual installation
       If you'd prefer to install your repo on your system manually, follow the instructions from
       package cloud manual installation page. This page will guide you with the specific details
       to achieve the 3 steps:

          1. install the pygpgme yum-utils packages for your distribution

          2. install a new RPM repository for CitusData packages

          3. update your local yum cache

       Then  when  that's done, you can proceed with installing pg_auto_failover itself as in the
       previous case:

          # install pg_auto_failover
          sudo yum install -y pg-auto-failover14_12

          # confirm installation
          /usr/pgsql-12/bin/pg_autoctl --version

   Installing a pgautofailover Systemd unit
       The command pg_autoctl show systemd outputs a systemd unit file that you can use to  setup
       a boot-time registered service for pg_auto_failover on your machine.

       Here's a sample output from the command:

          $ export PGDATA=/var/lib/postgresql/monitor
          $ pg_autoctl show systemd
          13:44:34 INFO  HINT: to complete a systemd integration, run the following commands:
          13:44:34 INFO  pg_autoctl -q show systemd --pgdata "/var/lib/postgresql/monitor" | sudo tee /etc/systemd/system/pgautofailover.service
          13:44:34 INFO  sudo systemctl daemon-reload
          13:44:34 INFO  sudo systemctl start pgautofailover
          [Unit]
          Description = pg_auto_failover

          [Service]
          WorkingDirectory = /var/lib/postgresql
          Environment = 'PGDATA=/var/lib/postgresql/monitor'
          User = postgres
          ExecStart = /usr/lib/postgresql/10/bin/pg_autoctl run
          Restart = always
          StartLimitBurst = 0

          [Install]
          WantedBy = multi-user.target

       Copy/pasting the commands given in the hint output from the command will enable the
       pgautofailover service on your system, when using systemd.

       It is important that PostgreSQL is started by pg_autoctl rather than by systemd itself, as
       it might be that a failover has been done during a reboot, for instance, and that once the
       reboot completes we want the local Postgres to re-join as a secondary node where it used to
       be a primary node.

SECURITY SETTINGS FOR PG_AUTO_FAILOVER

       In order to be able to orchestrate fully automated failovers, pg_auto_failover needs to be
       able to establish the following Postgres connections:

          • from the monitor node to each Postgres node to check the node's “health”

          • from each Postgres node to the monitor to  implement  our  node_active  protocol  and
            fetch the current assigned state for this node

          • from the secondary node to the primary node for Postgres streaming replication.

       Postgres  Client  authentication  is controlled by a configuration file: pg_hba.conf. This
       file contains a list of rules where each rule may allow or reject a connection attempt.

       For pg_auto_failover to work as intended, some HBA rules need to be added to each node
       configuration. You can choose to provision the pg_hba.conf file yourself by using the
       pg_autoctl option --skip-pg-hba, or you can use the following options to control which
       kind of rules are going to be added for you.

   Postgres HBA rules
       For  your  application to be able to connect to the current Postgres primary servers, some
       application specific HBA rules have to be added to pg_hba.conf. There is no provision  for
       doing that in pg_auto_failover.

       In  other  words, it is expected that you have to edit pg_hba.conf to open connections for
       your application needs.

   The trust security model
       As its name suggests, the trust security model does not enable any kind of security
       validation.  This  setting  is popular for testing deployments though, as it makes it very
       easy to verify that everything works as intended before putting security  restrictions  in
       place.

       To enable a “trust” security model with pg_auto_failover, use the pg_autoctl option --auth
       trust when creating nodes:

          $ pg_autoctl create monitor --auth trust ...
          $ pg_autoctl create postgres --auth trust ...
          $ pg_autoctl create postgres --auth trust ...

       When using --auth trust pg_autoctl adds new HBA rules in  the  monitor  and  the  Postgres
       nodes to enable connections as seen above.

   Authentication with passwords
       To set up pg_auto_failover with passwords for connections, you can use one of the password
       based authentication methods supported by Postgres, such as password or scram-sha-256.  We
       recommend the latter, as in the following example:

          $ pg_autoctl create monitor --auth scram-sha-256 ...

       pg_autoctl does not set the password for you. The first step is to set the database
       user password in the monitor database with the following command:

          $ psql postgres://monitor.host/pg_auto_failover
          > alter user autoctl_node password 'h4ckm3';

       Now that the monitor is ready with our password set for the autoctl_node user, we can  use
       the password in the monitor connection string used when creating Postgres nodes.

       On  the  primary  node,  we  can  create  the  Postgres  setup  as usual, and then set our
       replication password, that we will use if we are demoted and then re-join as a standby:

          $ pg_autoctl create postgres       \
                 --auth scram-sha-256        \
                 ...                         \
                 --monitor postgres://autoctl_node:h4ckm3@monitor.host/pg_auto_failover

          $ pg_autoctl config set replication.password h4ckm3m0r3

       The second Postgres node is going to be initialized as a  secondary  and  pg_autoctl  then
       calls  pg_basebackup  at create time. We need to have the replication password already set
       at this time, and we can achieve that the following way:

          $ export PGPASSWORD=h4ckm3m0r3
          $ pg_autoctl create postgres       \
                 --auth scram-sha-256        \
                 ...                         \
                 --monitor postgres://autoctl_node:h4ckm3@monitor.host/pg_auto_failover

          $ pg_autoctl config set replication.password h4ckm3m0r3

       Note that  you  can  use  The  Password  File  mechanism  as  discussed  in  the  Postgres
       documentation  in  order  to  maintain your passwords in a separate file, not in your main
       pg_auto_failover configuration file. This also avoids using passwords in  the  environment
       and in command lines.
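
       As an illustration of the Password File format (one hostname:port:database:username:password
       entry per line, with colons and backslashes escaped by a backslash), the following Python
       sketch formats entries for the two passwords used in this section. The helper names, the
       port 5432 and the host name node-a.host are assumptions made for the example. The special
       database name replication is the one Postgres matches for streaming replication
       connections.

```python
# Build password file entries for the two passwords used in this section.
# Port 5432 and the host name node-a.host are assumptions for this example.

def pgpass_escape(field):
    """Escape backslashes and colons as required by the password file format."""
    return field.replace("\\", "\\\\").replace(":", "\\:")


def pgpass_line(host, port, database, user, password):
    """Format one hostname:port:database:username:password entry."""
    fields = [host, str(port), database, user, password]
    return ":".join(pgpass_escape(f) for f in fields)


ENTRIES = [
    # keeper -> monitor connection
    pgpass_line("monitor.host", 5432, "pg_auto_failover",
                "autoctl_node", "h4ckm3"),
    # standby -> primary streaming replication
    pgpass_line("node-a.host", 5432, "replication",
                "pgautofailover_replicator", "h4ckm3m0r3"),
]

if __name__ == "__main__":
    print("\n".join(ENTRIES))
```

       The resulting lines belong in the ~/.pgpass file of the user that runs pg_autoctl, and
       Postgres ignores that file unless its permissions are 0600 or stricter.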

   Encryption of network communications
       Postgres  knows  how  to  use  SSL  to  enable  network  encryption of all communications,
       including authentication with passwords and the whole data set when streaming  replication
       is used.

       To  enable  SSL  on  the  server  an SSL certificate is needed. It could be as simple as a
       self-signed certificate, and pg_autoctl creates such a  certificate  for  you  when  using
       --ssl-self-signed command line option:

          $ pg_autoctl create monitor --ssl-self-signed ...      \
                                      --auth scram-sha-256 ...   \
                                      --ssl-mode require         \
                                      ...

          $ pg_autoctl create postgres --ssl-self-signed ...      \
                                       --auth scram-sha-256 ...   \
                                       ...

          $ pg_autoctl create postgres --ssl-self-signed ...      \
                                       --auth scram-sha-256 ...   \
                                       ...

       In that example we set up SSL connections to encrypt the network traffic, and we still have
       to set up an authentication mechanism exactly as in the previous sections of this document.
       Here  scram-sha-256  has  been  selected,  and the password will be sent over an encrypted
       channel.

       When using the --ssl-self-signed option, pg_autoctl creates a self-signed certificate,  as
       per the Postgres documentation at the Creating Certificates page.

       The  certificate  subject  CN  defaults  to  the  --hostname parameter, which can be given
       explicitly or computed by pg_autoctl as either your hostname  when  you  have  proper  DNS
       resolution, or your current IP address.

       Self-signed  certificates  provide  protection  against eavesdropping; this setup does NOT
       protect against  Man-In-The-Middle  attacks  nor  Impersonation  attacks.  See  PostgreSQL
       documentation page SSL Support for details.

   Using your own SSL certificates
       In  many  cases  you  will  want  to  install certificates provided by your local security
       department and signed by a trusted Certificate Authority. In that case one solution is  to
       use --skip-pg-hba and do the whole setup yourself.

       It  is  still possible to give the certificates to pg_auto_failover and have it handle the
       Postgres setup for you:

          $ pg_autoctl create monitor --ssl-ca-file root.crt   \
                                      --ssl-crl-file root.crl  \
                                      --server-cert server.crt  \
                                      --server-key server.key  \
                                      --ssl-mode verify-full \
                                      ...

          $ pg_autoctl create postgres --ssl-ca-file root.crt   \
                                       --server-cert server.crt  \
                                       --server-key server.key  \
                                       --ssl-mode verify-full \
                                       ...

          $ pg_autoctl create postgres --ssl-ca-file root.crt   \
                                       --server-cert server.crt  \
                                       --server-key server.key  \
                                       --ssl-mode verify-full \
                                       ...

       The option --ssl-mode can be used to  force  connection  strings  used  by  pg_autoctl  to
       contain  your  preferred ssl mode. It defaults to require when using --ssl-self-signed and
       to allow when --no-ssl is used.  Here, we set --ssl-mode to verify-full which requires SSL
       Certificates Authentication, covered next.

       The default --ssl-mode when providing your own certificates (signed by your trusted CA) is
       then verify-full. This setup applies to the client connection where the server identity is
       going  to  be  checked  against  the  root certificate provided with --ssl-ca-file and the
       revocation list optionally provided with the --ssl-crl-file. Both those files are used  as
       the  respective parameters sslrootcert and sslcrl in pg_autoctl connection strings to both
       the monitor and the streaming replication primary server.
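
       To make the mapping concrete, here is a Python sketch that assembles a libpq connection
       URI carrying the sslmode, sslrootcert and sslcrl parameters named above. It illustrates
       the shape of such a connection string only: the monitor_uri function is hypothetical, and
       the exact string that pg_autoctl generates may differ.

```python
# Assemble a libpq connection URI with the SSL parameters discussed above.
# monitor_uri is a hypothetical helper, not something pg_autoctl provides.
from urllib.parse import quote, urlencode


def monitor_uri(host, dbname="pg_auto_failover", user="autoctl_node",
                sslmode="verify-full", sslrootcert=None, sslcrl=None):
    """Return a postgres:// URI; optional SSL files are added when given."""
    params = {"sslmode": sslmode}
    if sslrootcert:
        params["sslrootcert"] = sslrootcert
    if sslcrl:
        params["sslcrl"] = sslcrl
    # quote_via=quote percent-encodes spaces instead of turning them into '+'.
    query = urlencode(params, quote_via=quote)
    return f"postgres://{user}@{host}/{dbname}?{query}"


if __name__ == "__main__":
    print(monitor_uri("monitor.host", sslrootcert="root.crt", sslcrl="root.crl"))
```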

   SSL Certificates Authentication
       Given those files, it is then possible to use certificate-based authentication of client
       connections. For that, it is necessary to prepare client certificates signed by your root
       certificate private key, each using the target user name as its CN, as per the Postgres
       documentation for Certificate Authentication:
          The  cn  (Common  Name)  attribute of the certificate will be compared to the requested
          database user name, and if they match the login will be allowed

       To enable the cert authentication method with pg_auto_failover, you need to prepare a
       client certificate for the user postgres, which pg_autoctl uses when connecting to the
       monitor. Place it in ~/.postgresql/postgresql.crt along with its key in
       ~/.postgresql/postgresql.key, in the home directory of the user that runs the pg_autoctl
       service (which defaults to postgres).

       Then you need to create a user name map as documented in Postgres page User Name  Maps  so
       that your certificate can be used to authenticate pg_autoctl users.

       The  ident  map  in  pg_ident.conf  on  the  pg_auto_failover monitor should then have the
       following entry, to allow postgres to connect as  the  autoctl_node  user  for  pg_autoctl
       operations:

          # MAPNAME       SYSTEM-USERNAME         PG-USERNAME

          # pg_autoctl runs as postgres and connects to the monitor autoctl_node user
          pgautofailover   postgres               autoctl_node

       To  enable  streaming replication, the pg_ident.conf file on each Postgres node should now
       allow   the   postgres   user   in   the   client   certificate   to   connect   as    the
       pgautofailover_replicator database user:

          # MAPNAME       SYSTEM-USERNAME         PG-USERNAME

          # the postgres system user connects to the primary as the pgautofailover_replicator user
          pgautofailover  postgres                pgautofailover_replicator

       Given  that  user  name  map, you can then use the cert authentication method. As with the
       pg_ident.conf provisioning, it is best to now provision the HBA rules yourself, using  the
       --skip-pg-hba option:

          $ pg_autoctl create postgres --skip-pg-hba --ssl-ca-file ...

       The  HBA  rule  will  use the authentication method cert with a map option, and might then
       look like the following on the monitor:

          # allow certificate based authentication to the monitor
          hostssl pg_auto_failover autoctl_node 10.0.0.0/8 cert map=pgautofailover

       Then your pg_auto_failover nodes on the 10.0.0.0/8 network are allowed to connect to the
       monitor with the user autoctl_node used by pg_autoctl, assuming they have a valid and
       trusted client certificate.

       The HBA rule to use on the Postgres nodes to  allow  for  Postgres  streaming  replication
       connections looks like the following:

          # allow streaming replication for pg_auto_failover nodes
          hostssl replication pgautofailover_replicator 10.0.0.0/8 cert map=pgautofailover

       Because the Postgres server runs as the postgres system user, the connection to the
       primary node can be made with SSL enabled and will then use the client certificate and key
       installed in the postgres home directory at ~/.postgresql/postgresql.{crt,key}.
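
       To make the role of the user name map concrete, the following Python sketch checks
       whether a certificate CN may log in as a given database user under a named map. It is
       a deliberately simplified model and not the Postgres implementation: it only handles
       plain three-field entries (no regular expressions, no quoted names), and the function
       names parse_ident_map and is_allowed are hypothetical.

```python
def parse_ident_map(text):
    """Return (mapname, system_username, pg_username) tuples from ident text."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) == 3:                   # simplified: plain entries only
            entries.append(tuple(fields))
    return entries


def is_allowed(entries, mapname, cert_cn, pg_user):
    """True when the map lets the certificate CN connect as pg_user."""
    return (mapname, cert_cn, pg_user) in entries


# Entries as shown above for the monitor and for the Postgres nodes.
MONITOR_IDENT = """
# MAPNAME       SYSTEM-USERNAME         PG-USERNAME
pgautofailover   postgres               autoctl_node
"""
NODE_IDENT = """
# MAPNAME       SYSTEM-USERNAME         PG-USERNAME
pgautofailover  postgres                pgautofailover_replicator
"""

monitor_map = parse_ident_map(MONITOR_IDENT)
node_map = parse_ident_map(NODE_IDENT)
print(is_allowed(monitor_map, "pgautofailover", "postgres", "autoctl_node"))
print(is_allowed(node_map, "pgautofailover", "postgres", "autoctl_node"))
```

       The second check fails on purpose: the map on a Postgres node only lets the postgres
       certificate connect as pgautofailover_replicator.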

   Postgres HBA provisioning
       While pg_auto_failover knows how to manage the Postgres HBA rules that are necessary for
       your streaming replication needs and for its monitor protocol, it will not manage the
       Postgres HBA rules that are needed for your applications.

       If  you  have  your  own  HBA  provisioning solution, you can include the rules needed for
       pg_auto_failover and then use the --skip-pg-hba option to the pg_autoctl create commands.

   Enable SSL connections on an existing setup
       Whether you are upgrading pg_auto_failover from a previous version that did not support
       the SSL features, or you started with --no-ssl and later changed your mind, it is possible
       with pg_auto_failover to add SSL settings to a system that has already been set up
       without explicit SSL support.

       In this section we detail how to upgrade to SSL settings.

       Installing SSL certificates on top of an already existing pg_auto_failover setup is done
       with one of the following pg_autoctl command variants, depending on whether you want
       self-signed certificates or fully verified SSL certificates:

          $ pg_autoctl enable ssl --ssl-self-signed --ssl-mode require

          $ pg_autoctl enable ssl --ssl-ca-file root.crt   \
                                  --ssl-crl-file root.crl  \
                                  --server-cert server.crt  \
                                  --server-key server.key  \
                                  --ssl-mode verify-full

       The  pg_autoctl  enable  ssl  command  edits  the  postgresql-auto-failover.conf  Postgres
       configuration file to match the command line arguments given and enable SSL as instructed,
       and then updates the pg_autoctl configuration.

       The  connection  string  to  connect  to  the monitor is also automatically updated by the
       pg_autoctl enable ssl command. You can verify your new configuration with:

          $ pg_autoctl config get pg_autoctl.monitor

       Note that an already running pg_autoctl daemon will try to reload its configuration after
       pg_autoctl enable ssl has finished. In some cases this is not possible without a restart,
       so be sure to check the logs of the running daemon to confirm that the reload succeeded.
       If it did not, you may need to restart the daemon to ensure the new connection string is
       used.

       The HBA settings are not edited, irrespective of whether --skip-pg-hba was used at
       creation time. That's because host records match both SSL and non-SSL connection
       attempts in the Postgres HBA file, so the pre-existing setup will continue to work. To
       harden the SSL setup, you can manually edit the HBA files and change the existing lines
       from host to hostssl to disallow unencrypted connections on the server side.

       In summary, to upgrade an existing pg_auto_failover setup to enable SSL:

          1. run the pg_autoctl enable ssl command on your monitor  and  then  all  the  Postgres
             nodes,

          2. on  the  Postgres  nodes,  review  your pg_autoctl logs to make sure that the reload
             operation has been effective, and review your Postgres settings to verify  that  you
             have the expected result,

          3. review  your  HBA  rules  setup  to  change  the pg_auto_failover rules from host to
             hostssl to disallow insecure connections.
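
       The last step can be scripted. The Python sketch below rewrites host records to hostssl
       for the database users that this manual associates with pg_auto_failover (autoctl_node,
       pgautofailover_replicator and pgautofailover_monitor) and leaves every other line
       untouched. The function name require_ssl and the sample rules are hypothetical, and the
       sketch assumes one record per line; review the result before reloading Postgres.

```python
PGAF_USERS = ("autoctl_node", "pgautofailover_replicator",
              "pgautofailover_monitor")


def require_ssl(hba_text, users=PGAF_USERS):
    """Change 'host' to 'hostssl' on records whose USER field is in users."""
    rewritten = []
    for line in hba_text.splitlines():
        fields = line.split()
        # An HBA record reads: TYPE DATABASE USER ADDRESS METHOD [OPTIONS]
        if len(fields) >= 5 and fields[0] == "host" and fields[2] in users:
            line = line.replace("host", "hostssl", 1)
        rewritten.append(line)
    return "\n".join(rewritten)


# Sample rules for illustration only; addresses and methods are placeholders.
SAMPLE_HBA = """\
# rules for pg_auto_failover
host pg_auto_failover autoctl_node 10.0.0.0/8 scram-sha-256
host replication pgautofailover_replicator 10.0.0.0/8 scram-sha-256
host all all 0.0.0.0/0 scram-sha-256"""

print(require_ssl(SAMPLE_HBA))
```

       Application rules, such as the last sample line, are left as they are, in keeping with
       the fact that pg_auto_failover does not manage them.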

MANUAL PAGES

       The pg_autoctl tool hosts many commands and sub-commands. Each of them has its own
       manual page.

   pg_autoctl
       pg_autoctl - control a pg_auto_failover node

   Synopsis
       pg_autoctl provides the following commands:

          + create   Create a pg_auto_failover node, or formation
          + drop     Drop a pg_auto_failover node, or formation
          + config   Manages the pg_autoctl configuration
          + show     Show pg_auto_failover information
          + enable   Enable a feature on a formation
          + disable  Disable a feature on a formation
          + get      Get a pg_auto_failover node, or formation setting
          + set      Set a pg_auto_failover node, or formation setting
          + perform  Perform an action orchestrated by the monitor
            run      Run the pg_autoctl service (monitor or keeper)
            watch    Display a dashboard to watch monitor's events and state
            stop     signal the pg_autoctl service for it to stop
            reload   signal the pg_autoctl for it to reload its configuration
            status   Display the current status of the pg_autoctl service
            help     print help message
            version  print pg_autoctl version

          pg_autoctl create
            monitor    Initialize a pg_auto_failover monitor node
            postgres   Initialize a pg_auto_failover standalone postgres node
            formation  Create a new formation on the pg_auto_failover monitor

          pg_autoctl drop
            monitor    Drop the pg_auto_failover monitor
            node       Drop a node from the pg_auto_failover monitor
            formation  Drop a formation on the pg_auto_failover monitor

          pg_autoctl config
            check  Check pg_autoctl configuration
            get    Get the value of a given pg_autoctl configuration variable
            set    Set the value of a given pg_autoctl configuration variable

          pg_autoctl show
            uri            Show the postgres uri to use to connect to pg_auto_failover nodes
            events         Prints monitor's state of nodes in a given formation and group
            state          Prints monitor's state of nodes in a given formation and group
            settings       Print replication settings for a formation from the monitor
            standby-names  Prints synchronous_standby_names for a given group
            file           List pg_autoctl internal files (config, state, pid)
            systemd        Print systemd service file for this node

          pg_autoctl enable
            secondary    Enable secondary nodes on a formation
            maintenance  Enable Postgres maintenance mode on this node
            ssl          Enable SSL configuration on this node

          pg_autoctl disable
            secondary    Disable secondary nodes on a formation
            maintenance  Disable Postgres maintenance mode on this node
            ssl          Disable SSL configuration on this node

          pg_autoctl get
          + node       get a node property from the pg_auto_failover monitor
          + formation  get a formation property from the pg_auto_failover monitor

          pg_autoctl get node
            replication-quorum  get replication-quorum property from the monitor
            candidate-priority  get candidate property from the monitor

          pg_autoctl get formation
            settings              get replication settings for a formation from the monitor
            number-sync-standbys  get number_sync_standbys for a formation from the monitor

          pg_autoctl set
          + node       set a node property on the monitor
          + formation  set a formation property on the monitor

          pg_autoctl set node
            metadata            set metadata on the monitor
            replication-quorum  set replication-quorum property on the monitor
            candidate-priority  set candidate property on the monitor

          pg_autoctl set formation
            number-sync-standbys  set number-sync-standbys for a formation on the monitor

          pg_autoctl perform
            failover    Perform a failover for given formation and group
            switchover  Perform a switchover for given formation and group
            promotion   Perform a failover that promotes a target node

   Description
       The  pg_autoctl  tool is the client tool provided by pg_auto_failover to create and manage
       Postgres nodes and the pg_auto_failover monitor node.  The  command  is  built  with  many
       sub-commands that each have their own manual page.

   Help
       To get the full recursive list of supported commands, use:

          pg_autoctl help

   Version
       To grab the version of pg_autoctl that you're using, use:

          pg_autoctl --version
          pg_autoctl version

       A typical output would be:

          pg_autoctl version 1.4.2
          pg_autoctl extension version 1.4
          compiled with PostgreSQL 12.3 on x86_64-apple-darwin16.7.0, compiled by Apple LLVM version 8.1.0 (clang-802.0.42), 64-bit
          compatible with Postgres 10, 11, 12, and 13

       The version is also available as a JSON document when using the --json option:

          pg_autoctl --version --json
          pg_autoctl version --json

       A typical JSON output would be:

          {
              "pg_autoctl": "1.4.2",
              "pgautofailover": "1.4",
              "pg_major": "12",
              "pg_version": "12.3",
              "pg_version_str": "PostgreSQL 12.3 on x86_64-apple-darwin16.7.0, compiled by Apple LLVM version 8.1.0 (clang-802.0.42), 64-bit",
              "pg_version_num": 120003
          }

       This  is  for version 1.4.2 of pg_auto_failover. This particular version of the pg_autoctl
       client tool has been compiled using libpq for  PostgreSQL  12.3  and  is  compatible  with
       Postgres 10, 11, 12, and 13.
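
       The JSON form is convenient for scripts. As a minimal sketch, the Python below parses
       such a document and extracts the fields shown above. The helper names read_version_json
       and summarize are hypothetical, and the sample is an abridged copy of the typical
       output (pg_version_str is left out for brevity).

```python
import json
import subprocess

# Abridged copy of the typical JSON output shown above.
SAMPLE = """
{
    "pg_autoctl": "1.4.2",
    "pgautofailover": "1.4",
    "pg_major": "12",
    "pg_version": "12.3",
    "pg_version_num": 120003
}
"""


def read_version_json():
    """Run 'pg_autoctl version --json' and return its raw output."""
    result = subprocess.run(["pg_autoctl", "version", "--json"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def summarize(version_json):
    """Pick the client, extension and Postgres versions out of the document."""
    doc = json.loads(version_json)
    return {
        "client": doc["pg_autoctl"],
        "extension": doc["pgautofailover"],
        "pg_major": int(doc["pg_major"]),
        "pg_version_num": int(doc["pg_version_num"]),
    }


# Uses the sample so that the sketch runs without pg_autoctl installed;
# on a real node, pass read_version_json() instead.
info = summarize(SAMPLE)
print(info)
```

       For Postgres 10 and later, dividing pg_version_num by 10000 yields the major version,
       which here agrees with the pg_major field.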

   pg_autoctl create
       pg_autoctl create - Create a pg_auto_failover node, or formation

   pg_autoctl create monitor
       pg_autoctl create monitor - Initialize a pg_auto_failover monitor node

   Synopsis
       This command initializes a PostgreSQL cluster and installs the pgautofailover extension so
       that it's possible to use the new instance to monitor PostgreSQL services:

          usage: pg_autoctl create monitor  [ --pgdata --pgport --pgctl --hostname ]

          --pgctl           path to pg_ctl
          --pgdata          path to data directory
          --pgport          PostgreSQL's port number
          --hostname        hostname by which postgres is reachable
          --auth            authentication method for connections from data nodes
          --skip-pg-hba     skip editing pg_hba.conf rules
          --run             create node then run pg_autoctl service
          --ssl-self-signed setup network encryption using self signed certificates (does NOT protect against MITM)
          --ssl-mode        use that sslmode in connection strings
          --ssl-ca-file     set the Postgres ssl_ca_file to that file path
          --ssl-crl-file    set the Postgres ssl_crl_file to that file path
          --no-ssl          don't enable network encryption (NOT recommended, prefer --ssl-self-signed)
          --server-key      set the Postgres ssl_key_file to that file path
          --server-cert     set the Postgres ssl_cert_file to that file path

   Description
       The pg_autoctl tool is the client tool provided by pg_auto_failover to create  and  manage
       Postgres  nodes  and  the  pg_auto_failover  monitor  node. The command is built with many
       sub-commands that each have their own manual page.

   Options
       The following options are available to pg_autoctl create monitor:

       --pgctl
              Path to the pg_ctl tool to use for the version of PostgreSQL you want to use.

              Defaults to the pg_ctl found in the PATH when there is a single entry for pg_ctl in
              the PATH. Check your setup using which -a pg_ctl.

              When using an RPM based distribution such as RHEL or CentOS, the path would usually
              be /usr/pgsql-13/bin/pg_ctl for Postgres 13.

              When using a debian based distribution such as debian or  ubuntu,  the  path  would
              usually  be /usr/lib/postgresql/13/bin/pg_ctl for Postgres 13.  Those distributions
              also use the package postgresql-common which provides /usr/bin/pg_config. This tool
              can be automatically used by pg_autoctl to discover the default version of Postgres
              to use on your setup.

       --pgdata
              Location where to initialize a  Postgres  database  cluster,  using  either  pg_ctl
              initdb or pg_basebackup. Defaults to the environment variable PGDATA.

       --pgport
              Postgres port to use, defaults to 5432.

       --hostname
              Hostname or IP address (both v4 and v6 are supported) to use from any other node to
              connect to this node.

              When not provided, a default value is computed by running the following algorithm.

                 1. We get this machine's "public IP" by opening a connection to the 8.8.8.8:53
                    public service. Then we get the TCP/IP client address that has been used to
                    make that connection.

                 2. We then do a reverse DNS lookup on the IP address found in the previous step
                    to fetch a hostname for our local machine.

                 3. If the reverse DNS lookup is successful, then pg_autoctl does a forward DNS
                    lookup of that hostname.

              When the forward DNS lookup response in step 3. is an IP address found  in  one  of
              our local network interfaces, then pg_autoctl uses the hostname found in step 2. as
              the default --hostname. Otherwise it uses the IP address found in step 1.

              You may use the --hostname command line option to bypass the whole DNS lookup based
              process and force the local node name to a fixed value.
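
              The algorithm above can be sketched in Python as follows. This is an
              illustration of the described steps, not the pg_autoctl source: the function
              names are hypothetical, the lookups are passed in as callables so that the
              decision logic can be exercised without network access, and step 1 is assumed
              here to use a connected UDP socket so that no packet is sent.

```python
import socket


def local_ip_towards(host, port):
    """Step 1: local address the system picks to reach host:port.

    Assumption: a connected UDP socket is used, which sends no data.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((host, port))
        return sock.getsockname()[0]


def reverse_dns(ip):
    """Step 2: reverse DNS lookup, returns a host name."""
    return socket.gethostbyaddr(ip)[0]


def forward_dns(name):
    """Step 3: forward DNS lookup, returns a list of IPv4 addresses."""
    return socket.gethostbyname_ex(name)[2]


def default_hostname(public_ip, local_ips, reverse_lookup, forward_lookup):
    """Return the host name when it resolves back to a local address,
    otherwise fall back to the IP address found in step 1."""
    try:
        name = reverse_lookup(public_ip)
        resolved = forward_lookup(name)
    except OSError:              # covers socket.herror and socket.gaierror
        return public_ip
    if any(ip in local_ips for ip in resolved):
        return name
    return public_ip


# Example with stand-in lookups; addresses and names are placeholders.
print(default_hostname("10.0.0.5", {"10.0.0.5"},
                       lambda ip: "node1.example.com",
                       lambda name: ["10.0.0.5"]))
```

              On a real machine the same function would be called with
              local_ip_towards("8.8.8.8", 53), the addresses of the local network
              interfaces, reverse_dns and forward_dns.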

       --auth Authentication method used by pg_autoctl when editing the Postgres HBA file to open
              connections to other nodes. No default value, must be provided by the user. The
              value trust is only a good choice for testing and evaluation of pg_auto_failover,
              see Security settings for pg_auto_failover for more information.

       --skip-pg-hba
              When this option is used then pg_autoctl refrains from any editing of the  Postgres
              HBA file. Please note that editing the HBA file is still needed so that other nodes
              can connect using either read privileges or replication streaming privileges.

              When --skip-pg-hba is used, pg_autoctl still outputs the HBA entries it needs in
              the logs; it only skips editing the HBA file.

       --run  Immediately run the pg_autoctl service after having created this node.

       --ssl-self-signed
              Generate  SSL self-signed certificates to provide network encryption. This does not
              protect against man-in-the-middle kinds  of  attacks.  See  Security  settings  for
              pg_auto_failover for more about our SSL settings.

       --ssl-mode
              SSL  Mode  used  by  pg_autoctl  when  connecting  to  other  nodes, including when
              connecting for streaming replication.

       --ssl-ca-file
              Set the Postgres ssl_ca_file to that file path.

       --ssl-crl-file
              Set the Postgres ssl_crl_file to that file path.

       --no-ssl
              Don't enable network encryption. This is not recommended, prefer --ssl-self-signed.

       --server-key
              Set the Postgres ssl_key_file to that file path.

       --server-cert
              Set the Postgres ssl_cert_file to that file path.

   pg_autoctl create postgres
       pg_autoctl create postgres - Initialize a pg_auto_failover postgres node

   Synopsis
       The command pg_autoctl create postgres initializes a standalone Postgres node and
       registers it to a pg_auto_failover monitor. The monitor then handles auto-failover for
       this Postgres node (as soon as a secondary has been registered too, and is known to be
       healthy).

          usage: pg_autoctl create postgres

            --pgctl           path to pg_ctl
            --pgdata          path to data directory
            --pghost          PostgreSQL's hostname
            --pgport          PostgreSQL's port number
            --listen          PostgreSQL's listen_addresses
            --username        PostgreSQL's username
            --dbname          PostgreSQL's database name
            --name            pg_auto_failover node name
            --hostname        hostname used to connect from the other nodes
            --formation       pg_auto_failover formation
            --monitor         pg_auto_failover Monitor Postgres URL
            --auth            authentication method for connections from monitor
            --skip-pg-hba     skip editing pg_hba.conf rules
            --pg-hba-lan      edit pg_hba.conf rules for --dbname in detected LAN
            --ssl-self-signed setup network encryption using self signed certificates (does NOT protect against MITM)
            --ssl-mode        use that sslmode in connection strings
            --ssl-ca-file     set the Postgres ssl_ca_file to that file path
            --ssl-crl-file    set the Postgres ssl_crl_file to that file path
            --no-ssl          don't enable network encryption (NOT recommended, prefer --ssl-self-signed)
            --server-key      set the Postgres ssl_key_file to that file path
            --server-cert     set the Postgres ssl_cert_file to that file path
            --candidate-priority    priority of the node to be promoted to become primary
            --replication-quorum    true if node participates in write quorum
            --maximum-backup-rate   maximum transfer rate of data transferred from the server during initial sync

   Description
       Four different modes of initialization are supported by this command, corresponding to as
       many implementation strategies.

          1. Initialize a primary node from scratch

             This happens when --pgdata (or the environment variable PGDATA) points to a
             non-existing or empty directory. Then the given --hostname is registered to the
             pg_auto_failover --monitor as a member of the --formation.

             The  monitor  answers  to  the  registration  call with a state to assign to the new
             member of the group, either SINGLE or  WAIT_STANDBY.  When  the  assigned  state  is
             SINGLE,  then  pg_autoctl  create  postgres  proceeds to initialize a new PostgreSQL
             instance.

          2. Initialize an already existing primary server

             This happens when --pgdata (or the environment variable PGDATA) points to an already
             existing  directory  that  belongs to a PostgreSQL instance. The standard PostgreSQL
             tool pg_controldata is  used  to  recognize  whether  the  directory  belongs  to  a
             PostgreSQL instance.

             In that case, the given --hostname is registered to the monitor in the tentative
             SINGLE state. When the given --formation and --group are currently empty, the
             monitor accepts the registration and pg_autoctl create postgres prepares the already
             existing primary server for pg_auto_failover.

          3. Initialize a secondary node from scratch

             This happens when  --pgdata  (or  the  environment  variable  PGDATA)  points  to  a
             non-existing  or empty directory, and when the monitor registration call assigns the
             state WAIT_STANDBY in step 1.

             In that case, the pg_autoctl create command steps  through  the  initial  states  of
             registering  a  secondary  server,  which  includes  preparing  the  primary  server
             PostgreSQL HBA rules and creating a replication slot.

             When the command ends successfully, a PostgreSQL secondary server has  been  created
             with pg_basebackup and is now started, catching-up to the primary server.

          4. Initialize a secondary node from an existing data directory

             When  the  data  directory  pointed  to  by  the  option --pgdata or the environment
             variable PGDATA already exists,  then  pg_auto_failover  verifies  that  the  system
             identifier matches the one of the other nodes already existing in the same group.

             The system identifier can be obtained with the command pg_controldata. All nodes in
             a physical replication setting must have the same system identifier, and so in
             pg_auto_failover all the nodes in the same group have that constraint too.

             When the system identifier matches the already registered system identifier of other
             nodes in the same group, then the node is  set-up  as  a  standby  and  Postgres  is
             started with the primary conninfo pointed at the current primary.
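
       The four modes can be summarized as a small decision function. The Python sketch below
       is only a reading aid built from the description above: the names InitMode and
       choose_init_mode are hypothetical, and the condition used for the fourth mode (an
       existing data directory, a state other than SINGLE, and a matching system identifier)
       is an assumption, since the text does not name the state assigned in that case.

```python
from enum import Enum


class InitMode(Enum):
    PRIMARY_FROM_SCRATCH = 1
    EXISTING_PRIMARY = 2
    SECONDARY_FROM_SCRATCH = 3
    SECONDARY_FROM_EXISTING = 4


def choose_init_mode(pgdata_has_instance, assigned_state,
                     system_id_matches=False):
    """Map the inputs described above to one of the four modes.

    pgdata_has_instance: False for a non-existing or empty directory.
    assigned_state: state returned by the monitor at registration.
    system_id_matches: only meaningful for an existing data directory.
    """
    if not pgdata_has_instance:
        if assigned_state == "SINGLE":
            return InitMode.PRIMARY_FROM_SCRATCH
        if assigned_state == "WAIT_STANDBY":
            return InitMode.SECONDARY_FROM_SCRATCH
    else:
        if assigned_state == "SINGLE":
            return InitMode.EXISTING_PRIMARY
        if system_id_matches:        # assumption, see the text above
            return InitMode.SECONDARY_FROM_EXISTING
    raise ValueError("no initialization mode applies to these inputs")


print(choose_init_mode(False, "SINGLE"))
print(choose_init_mode(False, "WAIT_STANDBY"))
```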

       The --auth option sets the authentication method to be used when the monitor node
       connects to a data node with the pgautofailover_monitor user. As with the pg_autoctl
       create monitor command, you could use --auth trust when first experimenting with
       pg_auto_failover and consider something production grade later. Also, consider using
       --skip-pg-hba if you already have your own provisioning tools with a security compliance
       process.

       See Security settings for pg_auto_failover for notes on .pgpass.

   Options
       The following options are available to pg_autoctl create postgres:

       --pgctl
              Path to the pg_ctl tool to use for the version of PostgreSQL you want to use.

              Defaults to the pg_ctl found in the PATH when there is a single entry for pg_ctl in
              the PATH. Check your setup using which -a pg_ctl.

              When using an RPM based distribution such as RHEL or CentOS, the path would usually
              be /usr/pgsql-13/bin/pg_ctl for Postgres 13.

              When  using  a  debian  based distribution such as debian or ubuntu, the path would
              usually be /usr/lib/postgresql/13/bin/pg_ctl for Postgres 13.  Those  distributions
              also use the package postgresql-common which provides /usr/bin/pg_config. This tool
              can be automatically used by pg_autoctl to discover the default version of Postgres
              to use on your setup.

       --pgdata
              Location  where  to  initialize  a  Postgres  database cluster, using either pg_ctl
              initdb or pg_basebackup. Defaults to the environment variable PGDATA.

       --pghost
              Hostname to use when connecting to the local Postgres instance from the  pg_autoctl
              process. By default, this field is left blank in the connection string, allowing to
              use Unix Domain Sockets with the default  path  compiled  in  your  libpq  version,
              usually  provided  by  the Operating System. That would be /var/run/postgresql when
              using debian or ubuntu.

       --pgport
              Postgres port to use, defaults to 5432.

       --listen
              PostgreSQL's listen_addresses to setup. At the moment only one address is supported
              in this command line option.

       --username
              PostgreSQL's  username  to  use  when  connecting to the local Postgres instance to
              manage it.

       --dbname
              PostgreSQL's database name to use in your application. Defaults to being  the  same
              as the --username, or to postgres when none of those options are used.

       --name Node name used on the monitor to refer to this node. The hostname is a technical
              detail, and given Postgres requirements on the HBA setup and DNS resolution
              (both forward and reverse lookups), IP addresses are often used for the hostname.

              The --name option allows using a user-friendly name for your Postgres nodes.

       --hostname
              Hostname or IP address (both v4 and v6 are supported) to use from any other node to
              connect to this node.

              When not provided, a default value is computed by running the following algorithm.

                 1. We get this machine's "public IP" by opening a connection to the given
                    monitor hostname or IP address. Then we get the TCP/IP client address that
                    has been used to make that connection.

                 2. We then do a reverse DNS lookup on the IP address found in the previous step
                    to fetch a hostname for our local machine.

                 3. If the reverse DNS lookup is successful, then pg_autoctl does a forward DNS
                    lookup of that hostname.

              When the forward DNS lookup response in step 3. is an IP address found  in  one  of
              our local network interfaces, then pg_autoctl uses the hostname found in step 2. as
              the default --hostname. Otherwise it uses the IP address found in step 1.

              You may use the --hostname command line option to bypass the whole DNS lookup based
              process and force the local node name to a fixed value.

       --formation
              Formation  to  register  the  node  into  on  the  monitor. Defaults to the default
              formation, that is automatically created in the monitor in  the  pg_autoctl  create
              monitor command.

       --monitor
              Postgres URI used to connect to the monitor. Must use the autoctl_node username and
              target the pg_auto_failover database name. It is possible to show the Postgres  URI
              from the monitor node using the command pg_autoctl show uri.

       --auth Authentication method used by pg_autoctl when editing the Postgres HBA file to open
              connections to other nodes. No default value, must be provided by the user. The
              value trust is only a good choice for testing and evaluation of pg_auto_failover,
              see Security settings for pg_auto_failover for more information.

       --skip-pg-hba
              When this option is used then pg_autoctl refrains from any editing of the  Postgres
              HBA file. Please note that editing the HBA file is still needed so that other nodes
              can connect using either read privileges or replication streaming privileges.

              When --skip-pg-hba is used, pg_autoctl still outputs the HBA entries it needs in
              the logs; it only skips editing the HBA file.

       --pg-hba-lan
              When this option is used pg_autoctl determines the local IP address used to connect
              to the monitor, and retrieves its netmask, and uses that to compute your local area
              network CIDR. This CIDR is then opened for connections in the Postgres HBA rules.

              For instance, when the monitor resolves to 192.168.0.1 and your local Postgres node
              uses an interface with IP address 192.168.0.2/255.255.255.0 to connect to the
              monitor, then the LAN CIDR is computed to be 192.168.0.0/24.
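
              The CIDR computation in this example can be reproduced with the Python
              standard library. This is only a sketch of the arithmetic, with a hypothetical
              helper name lan_cidr; it says nothing about how pg_autoctl inspects the
              network interfaces.

```python
import ipaddress


def lan_cidr(address, netmask):
    """Combine an interface address and its netmask into a network CIDR."""
    interface = ipaddress.ip_interface(f"{address}/{netmask}")
    return str(interface.network)


cidr = lan_cidr("192.168.0.2", "255.255.255.0")
print(cidr)

# The monitor address from the example belongs to the computed network.
print(ipaddress.ip_address("192.168.0.1") in ipaddress.ip_network(cidr))
```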

       --candidate-priority
              Sets this node's replication setting for candidate priority to the given value
              (between 0 and 100) at node registration on the monitor. Defaults to 50.

       --replication-quorum
              Sets this node's replication setting for replication quorum to the given value
              (either true or false) at node registration on the monitor. Defaults to true,
              which enables synchronous replication.

       --maximum-backup-rate
              Sets the maximum transfer rate of data transferred from the server  during  initial
              sync. This is used by pg_basebackup.  Defaults to 100M.

       --run  Immediately run the pg_autoctl service after having created this node.

       --ssl-self-signed
              Generate  SSL self-signed certificates to provide network encryption. This does not
              protect against man-in-the-middle kinds  of  attacks.  See  Security  settings  for
              pg_auto_failover for more about our SSL settings.

       --ssl-mode
              SSL  Mode  used  by  pg_autoctl  when  connecting  to  other  nodes, including when
              connecting for streaming replication.

       --ssl-ca-file
              Set the Postgres ssl_ca_file to that file path.

       --ssl-crl-file
              Set the Postgres ssl_crl_file to that file path.

       --no-ssl
              Don't enable network encryption. This is not recommended; prefer --ssl-self-signed.

       --server-key
              Set the Postgres ssl_key_file to that file path.

       --server-cert
              Set the Postgres ssl_cert_file to that file path.

   pg_autoctl create formation
       pg_autoctl create formation - Create a new formation on the pg_auto_failover monitor

   Synopsis
       This command registers a new formation on the monitor, with the specified kind:

          usage: pg_autoctl create formation  [ --pgdata --monitor --formation --kind --dbname  --enable-secondary --disable-secondary ]

          --pgdata      path to data directory
          --monitor     pg_auto_failover Monitor Postgres URL
          --formation   name of the formation to create
          --kind        formation kind, either "pgsql" or "citus"
          --dbname      name for postgres database to use in this formation
          --enable-secondary     create a formation that has multiple nodes that can be
                                 used for fail over when others have issues
          --disable-secondary    create a citus formation without nodes to fail over to
          --number-sync-standbys minimum number of standbys to confirm write

   Description
       A single pg_auto_failover monitor may manage any number of formations, each composed of at
       least one Postgres service group. This command creates a new formation so that it is then
       possible to register Postgres nodes in the new formation.

   Options
       The following options are available to pg_autoctl create formation:

       --pgdata
              Location where to initialize a  Postgres  database  cluster,  using  either  pg_ctl
              initdb or pg_basebackup. Defaults to the environment variable PGDATA.

       --monitor
              Postgres URI used to connect to the monitor. Must use the autoctl_node username and
              target the pg_auto_failover database name. It is possible to show the Postgres  URI
              from the monitor node using the command pg_autoctl show uri.

       --formation
              Name of the formation to create.

       --kind A pg_auto_failover formation could be of kind pgsql or of kind citus. At the moment
              citus  formation  kinds  are  not  managed  in   the   Open   Source   version   of
              pg_auto_failover.

       --dbname
              Name of the database to use in the formation. This is mostly useful for formations
              of kind citus, where the Citus extension is only installed in a single target
              database.

       --enable-secondary
              The formation to be created allows using standby nodes. Defaults  to  true.  Mostly
              useful for Citus formations.

       --disable-secondary
              See --enable-secondary above.

       --number-sync-standbys
              Postgres streaming replication uses synchronous_standby_names to set up how many
              standby nodes should have received a copy of the transaction data. When using
              pg_auto_failover this setup is handled at the formation level.

              Defaults to zero when creating the first two Postgres nodes in a formation in the
              same group. When set to zero, pg_auto_failover uses synchronous replication only
              when a standby node is available. The idea is to allow failover; this setting does
              not provide proper HA for Postgres.

              When adding a third  node  that  participates  in  the  quorum  (one  primary,  two
              secondaries), the setting is automatically changed from zero to one.

   pg_autoctl drop
       pg_autoctl drop - Drop a pg_auto_failover node, or formation

   pg_autoctl drop monitor
       pg_autoctl drop monitor - Drop the pg_auto_failover monitor

   Synopsis
       This command drops the pg_auto_failover monitor:

          usage: pg_autoctl drop monitor [ --pgdata --destroy ]

          --pgdata      path to data directory
          --destroy     also destroy Postgres database

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults  to  the  environment
              variable  PGDATA.  Use --monitor to connect to a monitor from anywhere, rather than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --destroy
              By default the pg_autoctl drop monitor command does not remove the Postgres
              database for the monitor. When using --destroy, the Postgres installation is also
              deleted.

   pg_autoctl drop node
       pg_autoctl drop node - Drop a node from the pg_auto_failover monitor

   Synopsis
       This command drops a Postgres node from the pg_auto_failover monitor:

          usage: pg_autoctl drop node [ [ [ --pgdata ] [ --destroy ] ] | [ --monitor [ [ --hostname --pgport ] | [ --formation --name ] ] ] ]

          --pgdata      path to data directory
          --monitor     pg_auto_failover Monitor Postgres URL
          --formation   pg_auto_failover formation
          --name        drop the node with the given node name
          --hostname    drop the node with given hostname and pgport
          --pgport      drop the node with given hostname and pgport
          --destroy     also destroy Postgres database
          --force       force dropping the node from the monitor
          --wait        how many seconds to wait, default to 60

   Description
       Two modes of operations are implemented in the pg_autoctl drop node command.

       When removing a node that still exists,  it  is  possible  to  use  pg_autoctl  drop  node
       --destroy  to  remove  the  node  both from the monitor and also delete the local Postgres
       instance entirely.

       When removing a node that doesn't exist physically anymore, or when the VM  that  used  to
       host  the  node  has  been  lost  entirely,  use either the pair of options --hostname and
       --pgport or the pair of options --formation and --name  to  match  the  node  registration
       record  on  the  monitor  database, and get it removed from the known list of nodes on the
       monitor.

       The option --force can be used when the target node to remove does not exist anymore.
       When a node has been lost entirely, it's not going to be able to finish the procedure
       itself, and it is then possible to instruct the monitor of the situation.

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults  to  the  environment
              variable  PGDATA.  Use --monitor to connect to a monitor from anywhere, rather than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --monitor
              Postgres URI used to connect to the monitor. Must use the autoctl_node username and
              target  the pg_auto_failover database name. It is possible to show the Postgres URI
              from the monitor node using the command pg_autoctl show uri.

       --hostname
              Hostname of the Postgres node to remove from the  monitor.  Use  either  --name  or
              --hostname --pgport, but not both.

       --pgport
              Port  of  the  Postgres  node  to  remove  from  the  monitor. Use either --name or
              --hostname --pgport, but not both.

       --name Name of the node to remove from  the  monitor.  Use  either  --name  or  --hostname
              --pgport, but not both.

       --destroy
              By default the pg_autoctl drop node command does not remove the Postgres database
              for the node. When using --destroy, the Postgres installation is also deleted.

       --force
              By  default  a  node  is  expected  to  reach the assigned state DROPPED when it is
              removed from the monitor, and has the opportunity to  implement  clean-up  actions.
              When  the target node to remove is not available anymore, it is possible to use the
              option --force to immediately remove the node from the monitor.

       --wait How many seconds to wait for the node to be dropped  entirely.  The  command  stops
              when the target node is not to be found on the monitor anymore, or when the timeout
              has elapsed, whichever comes first. The value 0 (zero)  disables  the  timeout  and
              disables waiting entirely, making the command async.
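
       As an illustration of the two remote matching modes, the following Python sketch
       assembles a pg_autoctl drop node command line from the options documented above. The
       helper drop_node_argv is hypothetical and not part of pg_auto_failover; it only builds
       the argument list and enforces the rule that --name and --hostname --pgport are mutually
       exclusive. The formation name default is an assumption taken from the examples.

```python
from typing import List, Optional


def drop_node_argv(
    monitor: str,
    name: Optional[str] = None,
    formation: str = "default",  # assumption: formation name used in the examples
    hostname: Optional[str] = None,
    pgport: Optional[int] = None,
    force: bool = False,
    wait: Optional[int] = None,
) -> List[str]:
    """Build (but do not run) a 'pg_autoctl drop node --monitor ...' command."""
    by_name = name is not None
    by_host = hostname is not None or pgport is not None
    # Exactly one way of matching the node must be chosen.
    if by_name == by_host:
        raise ValueError("use either --name or --hostname with --pgport, but not both")
    if by_host and (hostname is None or pgport is None):
        raise ValueError("--hostname and --pgport must be given together")

    argv = ["pg_autoctl", "drop", "node", "--monitor", monitor]
    if by_name:
        argv += ["--formation", formation, "--name", name]
    else:
        argv += ["--hostname", hostname, "--pgport", str(pgport)]
    if force:
        argv.append("--force")
    if wait is not None:
        argv += ["--wait", str(wait)]
    return argv


print(" ".join(drop_node_argv(
    "postgres://autoctl_node@localhost:5500/pg_auto_failover",
    name="node3",
    force=True,
)))
```

       The resulting list can be handed to subprocess.run on a host where pg_autoctl is
       installed.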

   Examples
          $ pg_autoctl drop node --destroy --pgdata ./node3
          17:52:21 54201 INFO  Reaching assigned state "secondary"
          17:52:21 54201 INFO  Removing node with name "node3" in formation "default" from the monitor
          17:52:21 54201 WARN  Postgres is not running and we are in state secondary
          17:52:21 54201 WARN  Failed to update the keeper's state from the local PostgreSQL instance, see above for details.
          17:52:21 54201 INFO  Calling node_active for node default/4/0 with current state: PostgreSQL is running is false, sync_state is "", latest WAL LSN is 0/0.
          17:52:21 54201 INFO  FSM transition to "dropped": This node is being dropped from the monitor
          17:52:21 54201 INFO  Transition complete: current state is now "dropped"
          17:52:21 54201 INFO  This node with id 4 in formation "default" and group 0 has been dropped from the monitor
          17:52:21 54201 INFO  Stopping PostgreSQL at "/Users/dim/dev/MS/pg_auto_failover/tmux/node3"
          17:52:21 54201 INFO  /Applications/Postgres.app/Contents/Versions/12/bin/pg_ctl --pgdata /Users/dim/dev/MS/pg_auto_failover/tmux/node3 --wait stop --mode fast
          17:52:21 54201 INFO  /Applications/Postgres.app/Contents/Versions/12/bin/pg_ctl status -D /Users/dim/dev/MS/pg_auto_failover/tmux/node3 [3]
          17:52:21 54201 INFO  pg_ctl: no server running
          17:52:21 54201 INFO  pg_ctl stop failed, but PostgreSQL is not running anyway
          17:52:21 54201 INFO  Removing "/Users/dim/dev/MS/pg_auto_failover/tmux/node3"
          17:52:21 54201 INFO  Removing "/Users/dim/dev/MS/pg_auto_failover/tmux/config/pg_autoctl/Users/dim/dev/MS/pg_auto_failover/tmux/node3/pg_autoctl.cfg"

   pg_autoctl drop formation
       pg_autoctl drop formation - Drop a formation on the pg_auto_failover monitor

   Synopsis
       This command drops an existing formation on the monitor:

          usage: pg_autoctl drop formation  [ --pgdata --formation ]

          --pgdata      path to data directory
          --monitor     pg_auto_failover Monitor Postgres URL
          --formation   name of the formation to drop

   Options
       --pgdata
              Location  of  the  Postgres node being managed locally. Defaults to the environment
              variable PGDATA. Use --monitor to connect to a monitor from anywhere,  rather  than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --monitor
              Postgres URI used to connect to the monitor. Must use the autoctl_node username and
              target the pg_auto_failover database name. It is possible to show the Postgres  URI
              from the monitor node using the command pg_autoctl show uri.

       --formation
              Name of the formation to drop from the monitor.

   pg_autoctl config
       pg_autoctl config - Manages the pg_autoctl configuration

   pg_autoctl config get
       pg_autoctl config get - Get the value of a given pg_autoctl configuration variable

   Synopsis
       This command prints a pg_autoctl configuration setting:

          usage: pg_autoctl config get  [ --pgdata ] [ --json ] [ section.option ]

          --pgdata      path to data directory

   Options
       --pgdata
              Location  of  the  Postgres node being managed locally. Defaults to the environment
              variable PGDATA. Use --monitor to connect to a monitor from anywhere,  rather  than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --json Output JSON formatted data.

   Description
       When the argument section.option is used, this is the name of a configuration option. The
       configuration file for pg_autoctl is stored using the INI format.

       When no argument is given to pg_autoctl config get the entire configuration file is  given
       in  the  output. To figure out where the configuration file is stored, see pg_autoctl show
       file and use pg_autoctl show file --config.
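
       Because the file uses the INI format, a section.option name maps directly onto a section
       and a key. The following Python sketch shows that mapping with the standard configparser
       module, on an abridged copy of the configuration. The function config_get is a
       hypothetical helper, not a pg_autoctl API.

```python
import configparser

# Abridged sample of a pg_autoctl configuration file.
SAMPLE = """\
[pg_autoctl]
role = keeper
formation = default
name = node1

[ssl]
active = 1
sslmode = require
"""


def config_get(ini_text: str, path: str) -> str:
    """Look up 'section.option' in INI-formatted text."""
    section, _, option = path.partition(".")
    # Interpolation is disabled so that values are returned verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.get(section, option)


print(config_get(SAMPLE, "ssl.sslmode"))  # require
```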

   Examples
       Without arguments, we get the entire file:

          $ pg_autoctl config get --pgdata node1
          [pg_autoctl]
          role = keeper
          monitor = postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=prefer
          formation = default
          group = 0
          name = node1
          hostname = localhost
          nodekind = standalone

          [postgresql]
          pgdata = /Users/dim/dev/MS/pg_auto_failover/tmux/node1
          pg_ctl = /Applications/Postgres.app/Contents/Versions/12/bin/pg_ctl
          dbname = demo
          host = /tmp
          port = 5501
          proxyport = 0
          listen_addresses = *
          auth_method = trust
          hba_level = app

          [ssl]
          active = 1
          sslmode = require
          cert_file = /Users/dim/dev/MS/pg_auto_failover/tmux/node1/server.crt
          key_file = /Users/dim/dev/MS/pg_auto_failover/tmux/node1/server.key

          [replication]
          maximum_backup_rate = 100M
          backup_directory = /Users/dim/dev/MS/pg_auto_failover/tmux/backup/node_1

          [timeout]
          network_partition_timeout = 20
          prepare_promotion_catchup = 30
          prepare_promotion_walreceiver = 5
          postgresql_restart_failure_timeout = 20
          postgresql_restart_failure_max_retries = 3

       It is possible to pipe JSON formatted output to the jq command line and filter the  result
       down to a specific section of the file:

          $ pg_autoctl config get --pgdata node1 --json | jq .pg_autoctl
          {
            "role": "keeper",
            "monitor": "postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=prefer",
            "formation": "default",
            "group": 0,
            "name": "node1",
            "hostname": "localhost",
            "nodekind": "standalone"
          }

       Finally, a single configuration element can be listed:

          $ pg_autoctl config get --pgdata node1 ssl.sslmode --json
          require

   pg_autoctl config set
       pg_autoctl config set - Set the value of a given pg_autoctl configuration variable

   Synopsis
       This command sets a pg_autoctl configuration setting:

          usage: pg_autoctl config set  [ --pgdata ] [ --json ] section.option [ value ]

          --pgdata      path to data directory

   Options
       --pgdata
              Location  of  the  Postgres node being managed locally. Defaults to the environment
              variable PGDATA. Use --monitor to connect to a monitor from anywhere,  rather  than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --json Output JSON formatted data.

   Description
       This command sets a pg_autoctl configuration setting to a new value. Most settings can be
       changed and reloaded online.

       Some of those changes can then be applied to an already running process with a
       pg_autoctl reload command.

   Settings
       pg_autoctl.role
          This setting can not be changed. It can be either monitor or keeper and the rest of the
          configuration file is read depending on this value.

       pg_autoctl.monitor
          URI of the pg_autoctl monitor Postgres service. Can be changed with a reload.

          To register an existing node to a new monitor, use pg_autoctl disable monitor and  then
          pg_autoctl enable monitor.

       pg_autoctl.formation
          Formation  to  which  this  node  has  been  registered.  Changing  this setting is not
          supported.

       pg_autoctl.group
          Group in which this node has been registered. Changing this setting is not supported.

       pg_autoctl.name
          Name of the node as known to the monitor and listed in pg_autoctl show  state.  Can  be
          changed with a reload.

       pg_autoctl.hostname
          Hostname  or  IP  address  of  the node, as known to the monitor. Can be changed with a
          reload.

       pg_autoctl.nodekind
          This setting can not be changed and depends on the command that has been used to create
          this pg_autoctl node.

       postgresql.pgdata
          Directory  where the managed Postgres instance is to be created (or found) and managed.
          Can't be changed.

       postgresql.pg_ctl
          Path to the pg_ctl tool used to manage this Postgres instance. The absolute path
          depends on the major version of Postgres and looks like
          /usr/lib/postgresql/13/bin/pg_ctl when using a Debian or Ubuntu OS.

          Can be changed after a major upgrade of Postgres.

       postgresql.dbname
          Name of the database that is used to connect to Postgres. Can be changed, but then must
          be changed manually on the monitor's pgautofailover.formation table with a SQL command.

          WARNING:
              When  using  pg_auto_failover  enterprise  edition  with Citus support, this is the
              database where pg_autoctl maintains the list of Citus  nodes  on  the  coordinator.
              Using the same database name as your application that uses Citus is then crucial.

       postgresql.host
          Hostname to use in connection strings when connecting from the local pg_autoctl process
          to the local Postgres database. Defaults to the Operating System default value for
          the Unix Domain Socket directory: either /tmp or, when using Debian or Ubuntu,
          /var/run/postgresql.

          Can be changed with a reload.

       postgresql.port
          Port on which Postgres should be managed. Can be changed offline, between a  pg_autoctl
          stop and a subsequent pg_autoctl start.

       postgresql.listen_addresses
          Value  to  set  to  Postgres  parameter of the same name. At the moment pg_autoctl only
          supports a single address for this parameter.

       postgresql.auth_method
          Authentication method to use when editing HBA rules to allow the Postgres  nodes  of  a
          formation  to  connect  to  each other, and to the monitor, and to allow the monitor to
          connect to the nodes.

          Can be changed online with a reload, but actually  adding  new  HBA  rules  requires  a
          restart of the "node-active" service.

       postgresql.hba_level
          This  setting  reflects  the choice of --skip-pg-hba or --pg-hba-lan that has been used
          when creating this pg_autoctl node. Can be changed with a reload, though the HBA  rules
          that have been previously added will not get removed.

       ssl.active, ssl.sslmode, ssl.cert_file, ssl.key_file, etc
          Please  use  the  command pg_autoctl enable ssl or pg_autoctl disable ssl to manage the
          SSL settings in the ssl  section  of  the  configuration.  Using  those  commands,  the
          settings can be changed online.

       replication.maximum_backup_rate
          Used  as  a parameter to pg_basebackup, defaults to 100M. Can be changed with a reload.
          Changing this value does not affect an already running pg_basebackup command.

          Limiting the bandwidth used by pg_basebackup makes the operation slower, but has the
          advantage of limiting the impact on the disks of the primary server.

       replication.backup_directory
          Target  location  of  the  pg_basebackup  command  used  by  pg_autoctl when creating a
          secondary node. When done with fetching the data over the network, then pg_autoctl uses
          the  rename(2)  system-call  to  rename  the  temporary download location to the target
          PGDATA location.

          The rename(2) system-call is known to be atomic when both the source and the target  of
          the operation are using the same file system / mount point.

          Can  be  changed  online  with  a reload, will not affect already running pg_basebackup
          sub-processes.
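
          Whether two directories share a file system can be checked by comparing the device
          identifiers reported by stat(2). The following Python sketch illustrates that check
          before a rename. The helper same_filesystem and the directory names are hypothetical;
          this is not how pg_autoctl itself validates the setting.

```python
import os
import tempfile


def same_filesystem(path_a: str, path_b: str) -> bool:
    """Return True when both existing paths report the same device id."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


# Demonstration: a directory created inside a temporary directory shares its
# file system, so renaming within it stays on one file system.
with tempfile.TemporaryDirectory() as base:
    download = os.path.join(base, "backup")
    target = os.path.join(base, "pgdata")
    os.mkdir(download)
    if same_filesystem(base, download):
        os.rename(download, target)
    print(os.path.isdir(target))  # True
```

          When the target directory does not exist yet, compare against its parent directory
          instead.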

       replication.password
          Used as a parameter in the  connection  string  to  the  upstream  Postgres  node.  The
          "replication" connection uses the password set-up in the pg_autoctl configuration file.

          Changing  the  replication.password  of a pg_autoctl configuration has no effect on the
          Postgres database itself. The password must  match  what  the  Postgres  upstream  node
          expects,  which  can  be  set with the following SQL command run on the upstream server
          (primary or other standby node):

              alter user pgautofailover_replicator password 'h4ckm3m0r3';

          The replication.password can be changed online with a reload, but requires restarting
          the Postgres service to be activated. Postgres only reads the primary_conninfo
          connection string at start-up, up to and including Postgres 12. With Postgres 13 and
          later, it is possible to reload this Postgres parameter.

       timeout.network_partition_timeout
          Timeout  (in  seconds)  that  pg_autoctl waits before deciding that it is on the losing
          side of a network partition. When pg_autoctl fails to connect to the monitor  and  when
          the  local  Postgres  instance pg_stat_replication system view is empty, and after this
          many seconds have passed, then pg_autoctl demotes itself.

          Can be changed with a reload.
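
          The three conditions above combine into a single decision. The following Python
          sketch is a simplified reading of that rule, not the pg_autoctl source: the function
          and parameter names are hypothetical, the default of 20 seconds is taken from the
          sample configuration shown for pg_autoctl config get, and the use of >= for the
          elapsed time is an assumption.

```python
def should_demote(
    monitor_reachable: bool,
    replication_connections: int,
    seconds_without_monitor: float,
    network_partition_timeout: float = 20,
) -> bool:
    """Decide whether a node should consider itself on the losing side."""
    return (
        not monitor_reachable                      # cannot connect to the monitor
        and replication_connections == 0           # pg_stat_replication is empty
        and seconds_without_monitor >= network_partition_timeout
    )


print(should_demote(False, 0, 25))  # True
print(should_demote(False, 1, 25))  # False: a standby is still connected
```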

       timeout.prepare_promotion_catchup
          Currently not used in the source code. Can be changed with a reload.

       timeout.prepare_promotion_walreceiver
          Currently not used in the source code. Can be changed with a reload.

       timeout.postgresql_restart_failure_timeout
          When pg_autoctl fails to start Postgres for at  least  this  duration  from  the  first
          attempt,  then  it  starts reporting that Postgres is not running to the monitor, which
          might then decide to implement a failover.

          Can be changed with a reload.

       timeout.postgresql_restart_failure_max_retries
          When pg_autoctl fails to start Postgres at least this many times, it starts
          reporting that Postgres is not running to the monitor, which then might decide to
          implement a failover.

          Can be changed with a reload.

   pg_autoctl config check
       pg_autoctl config check - Check pg_autoctl configuration

   Synopsis
       This command implements a very basic list of sanity checks for a pg_autoctl node setup:

          usage: pg_autoctl config check  [ --pgdata ] [ --json ]

          --pgdata      path to data directory
          --json        output data in the JSON format

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults  to  the  environment
              variable  PGDATA.  Use --monitor to connect to a monitor from anywhere, rather than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --json Output JSON formatted data.

   Examples
          $ pg_autoctl config check --pgdata node1
          18:37:27 63749 INFO  Postgres setup for PGDATA "/Users/dim/dev/MS/pg_auto_failover/tmux/node1" is ok, running with PID 5501 and port 99698
          18:37:27 63749 INFO  Connection to local Postgres ok, using "port=5501 dbname=demo host=/tmp"
          18:37:27 63749 INFO  Postgres configuration settings required for pg_auto_failover are ok
          18:37:27 63749 WARN  Postgres 12.1 does not support replication slots on a standby node
          18:37:27 63749 INFO  Connection to monitor ok, using "postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=prefer"
          18:37:27 63749 INFO  Monitor is running version "1.5.0.1", as expected
          pgdata:                /Users/dim/dev/MS/pg_auto_failover/tmux/node1
          pg_ctl:                /Applications/Postgres.app/Contents/Versions/12/bin/pg_ctl
          pg_version:            12.3
          pghost:                /tmp
          pgport:                5501
          proxyport:             0
          pid:                   99698
          is in recovery:        no
          Control Version:       1201
          Catalog Version:       201909212
          System Identifier:     6941034382470571312
          Latest checkpoint LSN: 0/6000098
          Postmaster status:     ready

   pg_autoctl show
       pg_autoctl show - Show pg_auto_failover information

   pg_autoctl show uri
       pg_autoctl show uri - Show the postgres uri to use to connect to pg_auto_failover nodes

   Synopsis
       This command outputs  the  monitor  or  the  coordinator  Postgres  URI  to  use  from  an
       application to connect to Postgres:

          usage: pg_autoctl show uri  [ --pgdata --monitor --formation --json ]

            --pgdata      path to data directory
            --monitor     monitor uri
            --formation   show the coordinator uri of given formation
            --json        output data in the JSON format

   Options
       --pgdata
              Location  of  the  Postgres node being managed locally. Defaults to the environment
              variable PGDATA. Use --monitor to connect to a monitor from anywhere,  rather  than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --monitor
              Postgres URI used to connect to the monitor. Must use the autoctl_node username and
              target the pg_auto_failover database name. It is possible to show the Postgres  URI
              from the monitor node using the command pg_autoctl show uri.

              Defaults to the value of the environment variable PG_AUTOCTL_MONITOR.

       --formation
              When --formation is used, only the Postgres URI of the given formation is printed.
              By default, the URIs of the monitor and of all known formations are listed.

       --json Output JSON formatted data instead of a table formatted list.

   Examples
          $ pg_autoctl show uri
                  Type |    Name | Connection String
          -------------+---------+-------------------------------
               monitor | monitor | postgres://autoctl_node@localhost:5500/pg_auto_failover
             formation | default | postgres://localhost:5502,localhost:5503,localhost:5501/demo?target_session_attrs=read-write&sslmode=prefer

          $ pg_autoctl show uri --formation monitor
          postgres://autoctl_node@localhost:5500/pg_auto_failover

          $ pg_autoctl show uri --formation default
          postgres://localhost:5503,localhost:5502,localhost:5501/demo?target_session_attrs=read-write&sslmode=prefer

          $ pg_autoctl show uri --json
          [
           {
               "uri": "postgres://autoctl_node@localhost:5500/pg_auto_failover",
               "name": "monitor",
               "type": "monitor"
           },
           {
               "uri": "postgres://localhost:5503,localhost:5502,localhost:5501/demo?target_session_attrs=read-write&sslmode=prefer",
               "name": "default",
               "type": "formation"
           }
          ]
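
       The JSON output is convenient for scripts. As a sketch, the following Python code picks
       the connection string of one entry out of the output shown above. The function find_uri
       is a hypothetical helper, and the JSON text is copied from the example rather than
       produced by running the command.

```python
import json
from typing import Optional

# Output of "pg_autoctl show uri --json", copied from the example above.
SHOW_URI_JSON = """
[
 {"uri": "postgres://autoctl_node@localhost:5500/pg_auto_failover",
  "name": "monitor", "type": "monitor"},
 {"uri": "postgres://localhost:5503,localhost:5502,localhost:5501/demo?target_session_attrs=read-write&sslmode=prefer",
  "name": "default", "type": "formation"}
]
"""


def find_uri(json_text: str, kind: str, name: str) -> Optional[str]:
    """Return the uri of the entry with the given type and name, if any."""
    for entry in json.loads(json_text):
        if entry["type"] == kind and entry["name"] == name:
            return entry["uri"]
    return None


print(find_uri(SHOW_URI_JSON, "formation", "default"))
```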

   Multi-hosts Postgres connection strings
       PostgreSQL since version 10 includes support for multiple hosts in its  connection  driver
       libpq, with the special target_session_attrs connection property.

       This  multi-hosts  connection  string  facility allows applications to keep using the same
       stable connection string over server-side failovers. That's why pg_autoctl show  uri  uses
       that format.
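
       To make the structure of such a connection string concrete, the following Python sketch
       splits one into its host list, database name and parameters using only the standard
       library. The function split_multi_host_uri is hypothetical. The sketch does not handle
       bracketed IPv6 addresses, and it falls back to the default Postgres port 5432 when an
       entry has no port.

```python
from urllib.parse import parse_qs, urlsplit


def split_multi_host_uri(uri: str):
    """Return (hosts, dbname, params) for a multi-host Postgres URI."""
    parts = urlsplit(uri)
    # Drop optional "user@" credentials; the rest is a comma-separated host list.
    host_list = parts.netloc.rpartition("@")[2]
    hosts = []
    for entry in host_list.split(","):
        host, _, port = entry.partition(":")
        hosts.append((host, int(port) if port else 5432))
    # Keep the first value of each query parameter.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return hosts, parts.path.lstrip("/"), params


hosts, dbname, params = split_multi_host_uri(
    "postgres://localhost:5503,localhost:5502,localhost:5501/demo"
    "?target_session_attrs=read-write&sslmode=prefer"
)
print(hosts)
print(dbname, params["target_session_attrs"])
```

       With target_session_attrs=read-write, libpq tries the listed hosts until it finds one
       that accepts read-write sessions, which is why the same string remains usable after a
       failover.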

   pg_autoctl show events
       pg_autoctl show events - Prints the monitor's events for nodes in a given formation and group

   Synopsis
       This command outputs the events that the pg_auto_failover monitor records about state
       changes of the pg_auto_failover nodes managed by the monitor:

          usage: pg_autoctl show events  [ --pgdata --formation --group --count ]

          --pgdata      path to data directory
          --monitor     pg_auto_failover Monitor Postgres URL
          --formation   formation to query, defaults to 'default'
          --group       group to query formation, defaults to all
          --count       how many events to fetch, defaults to 10
          --watch       display an auto-updating dashboard
          --json        output data in the JSON format

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults  to  the  environment
              variable  PGDATA.  Use --monitor to connect to a monitor from anywhere, rather than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --monitor
              Postgres URI used to connect to the monitor. Must use the autoctl_node username and
              target  the pg_auto_failover database name. It is possible to show the Postgres URI
              from the monitor node using the command pg_autoctl show uri.

       --formation
              List the events recorded for nodes in the given formation. Defaults to default.

       --count
              By default only the last 10 events are printed.

       --watch
              Take control of the terminal and display the current state of the  system  and  the
              last  events  from  the  monitor.  The  display  is updated automatically every 500
              milliseconds (half a second) and reacts properly to window size change.

              Depending on the terminal window size, a different set of columns is visible in the
              state part of the output. See pg_autoctl watch.

       --json Output JSON formatted data instead of a table formatted list.

   Examples
          $ pg_autoctl show events --count 2 --json
          [
           {
               "nodeid": 1,
               "eventid": 15,
               "groupid": 0,
               "nodehost": "localhost",
               "nodename": "node1",
               "nodeport": 5501,
               "eventtime": "2021-03-18T12:32:36.103467+01:00",
               "goalstate": "primary",
               "description": "Setting goal state of node 1 \"node1\" (localhost:5501) to primary now that at least one secondary candidate node is healthy.",
               "formationid": "default",
               "reportedlsn": "0/4000060",
               "reportedstate": "wait_primary",
               "reportedrepstate": "async",
               "candidatepriority": 50,
               "replicationquorum": true
           },
           {
               "nodeid": 1,
               "eventid": 16,
               "groupid": 0,
               "nodehost": "localhost",
               "nodename": "node1",
               "nodeport": 5501,
               "eventtime": "2021-03-18T12:32:36.215494+01:00",
               "goalstate": "primary",
               "description": "New state is reported by node 1 \"node1\" (localhost:5501): \"primary\"",
               "formationid": "default",
               "reportedlsn": "0/4000110",
               "reportedstate": "primary",
               "reportedrepstate": "quorum",
               "candidatepriority": 50,
               "replicationquorum": true
           }
          ]
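
       The JSON events can be condensed into one line per state change. The following Python
       sketch does so for an abridged copy of the two events above. The function summarize is a
       hypothetical helper, and only a few of the fields shown in the example are used.

```python
import json

# Abridged copy of the two events from the example above.
EVENTS_JSON = """
[
 {"eventid": 15, "nodename": "node1", "goalstate": "primary",
  "reportedstate": "wait_primary", "reportedrepstate": "async"},
 {"eventid": 16, "nodename": "node1", "goalstate": "primary",
  "reportedstate": "primary", "reportedrepstate": "quorum"}
]
"""


def summarize(events_json: str) -> list:
    """Return one 'id node: reported -> goal' line per event."""
    lines = []
    for event in json.loads(events_json):
        lines.append(
            f'{event["eventid"]} {event["nodename"]}: '
            f'{event["reportedstate"]} -> {event["goalstate"]}'
        )
    return lines


for line in summarize(EVENTS_JSON):
    print(line)
# 15 node1: wait_primary -> primary
# 16 node1: primary -> primary
```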

   pg_autoctl show state
       pg_autoctl show state - Prints monitor's state of nodes in a given formation and group

   Synopsis
       This  command  outputs  the  current  state  of the formation and groups registered to the
       pg_auto_failover monitor:

          usage: pg_autoctl show state  [ --pgdata --formation --group ]

          --pgdata      path to data directory
          --monitor     pg_auto_failover Monitor Postgres URL
          --formation   formation to query, defaults to 'default'
          --group       group to query formation, defaults to all
          --local       show local data, do not connect to the monitor
          --watch       display an auto-updating dashboard
          --json        output data in the JSON format

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults  to  the  environment
              variable  PGDATA.  Use --monitor to connect to a monitor from anywhere, rather than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --monitor
              Postgres URI used to connect to the monitor. Must use the autoctl_node username and
              target  the pg_auto_failover database name. It is possible to show the Postgres URI
              from the monitor node using the command pg_autoctl show uri.

       --formation
              Show the state of the nodes registered in the given formation. Defaults to default.

       --group
              Limit output to a single group in the formation. Defaults to including all groups
              registered in the target formation.

       --local
              Print the local state information without connecting to the monitor.

       --watch
              Take  control  of  the terminal and display the current state of the system and the
              last events from the monitor.  The  display  is  updated  automatically  every  500
              milliseconds (half a second) and reacts properly to window size change.

              Depending on the terminal window size, a different set of columns is visible in the
              state part of the output. See pg_autoctl watch.

       --json Output JSON formatted data instead of a table formatted list.
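
       The constraints on the --monitor URI described above (autoctl_node username,
       pg_auto_failover database name) can be checked before calling pg_autoctl. This is a
       minimal sketch using only the Python standard library; the function name
       check_monitor_uri is hypothetical and the check is not something pg_autoctl itself
       exposes.

```python
from urllib.parse import urlsplit


def check_monitor_uri(uri):
    """Return a list of problems found in a monitor URI (empty when none)."""
    parts = urlsplit(uri)
    problems = []
    # libpq accepts both the postgres:// and postgresql:// schemes.
    if parts.scheme not in ("postgres", "postgresql"):
        problems.append(f"unexpected scheme: {parts.scheme!r}")
    if parts.username != "autoctl_node":
        problems.append(f"username should be autoctl_node, got {parts.username!r}")
    if parts.path.lstrip("/") != "pg_auto_failover":
        problems.append(f"database should be pg_auto_failover, got {parts.path!r}")
    return problems


good = "postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=prefer"
bad = "postgres://postgres@localhost:5500/postgres"
print(check_monitor_uri(good))
print(check_monitor_uri(bad))
```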

   Description
       The pg_autoctl show state output includes the following columns:

          • Name
                Name of the node.

          • Node
                Node information. When the formation has a single group (group zero),  then  this
                column only contains the nodeId.

                Only Citus formations allow several groups. When using a Citus formation the Node
                column contains the groupId and the nodeId, separated by a colon, such as 0:1 for
                the first coordinator node.

          • Host:Port
                Hostname and port number used to connect to the node.

          • TLI: LSN
                Timeline identifier (TLI) and Postgres Log Sequence Number (LSN).

                The LSN is the current position in the Postgres WAL stream. This is a hexadecimal
                number. See pg_lsn for more information.

                The current timeline is incremented each time a failover happens, or  when  doing
                Point  In  Time Recovery. A node can only reach the secondary state when it is on
                the same timeline as its primary node.

          • Connection
                This  output  field  contains  two  bits  of  information.  First,  the  Postgres
                connection  type that the node provides, either read-write or read-only. Then the
                mark ! is added when the monitor has failed to connect to this node, and ? when
                the monitor has not connected to the node yet.

          • Reported State
                The  latest  reported  FSM  state,  as  reported to the monitor by the pg_autoctl
                process running on the Postgres node.

          • Assigned State
                The assigned FSM state on the monitor. When the assigned state is not the same as
                the reported state, then the pg_autoctl process running on the Postgres node
                might not have retrieved the assigned state yet, or might still be implementing
                the FSM transition from the current state to the assigned state.
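
       To compare the TLI: LSN column between nodes, the LSN can be converted to a byte
       position. A minimal sketch, assuming the usual Postgres text form of an LSN (two
       hexadecimal numbers separated by a slash, the high and low 32 bits of the position);
       the function names are hypothetical and the second cell value is made up for the
       comparison.

```python
def parse_lsn(lsn):
    """Convert a Postgres LSN such as '0/4000678' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def parse_tli_lsn(cell):
    """Split a 'TLI: LSN' cell such as '1: 0/4000678' into (tli, position)."""
    tli, lsn = cell.split(":")
    return int(tli), parse_lsn(lsn.strip())


primary = parse_tli_lsn("1: 0/4000678")
standby = parse_tli_lsn("1: 0/4000060")  # illustrative value, not from a real run

# A standby can only be compared meaningfully when it is on the same timeline.
same_timeline = primary[0] == standby[0]
lag_bytes = primary[1] - standby[1]
print(same_timeline, lag_bytes)
```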

   Examples
          $ pg_autoctl show state
           Name |  Node |      Host:Port |       TLI: LSN |   Connection |      Reported State |      Assigned State
          ------+-------+----------------+----------------+--------------+---------------------+--------------------
          node1 |     1 | localhost:5501 |   1: 0/4000678 |   read-write |             primary |             primary
          node2 |     2 | localhost:5502 |   1: 0/4000678 |    read-only |           secondary |           secondary
          node3 |     3 | localhost:5503 |   1: 0/4000678 |    read-only |           secondary |           secondary

          $ pg_autoctl show state --local
           Name |  Node |      Host:Port |       TLI: LSN |   Connection |      Reported State |      Assigned State
          ------+-------+----------------+----------------+--------------+---------------------+--------------------
          node1 |     1 | localhost:5501 |   1: 0/4000678 | read-write ? |             primary |             primary

          $ pg_autoctl show state --json
          [
              {
                  "health": 1,
                  "node_id": 1,
                  "group_id": 0,
                  "nodehost": "localhost",
                  "nodename": "node1",
                  "nodeport": 5501,
                  "reported_lsn": "0/4000678",
                  "reported_tli": 1,
                  "formation_kind": "pgsql",
                  "candidate_priority": 50,
                  "replication_quorum": true,
                  "current_group_state": "primary",
                  "assigned_group_state": "primary"
              },
              {
                  "health": 1,
                  "node_id": 2,
                  "group_id": 0,
                  "nodehost": "localhost",
                  "nodename": "node2",
                  "nodeport": 5502,
                  "reported_lsn": "0/4000678",
                  "reported_tli": 1,
                  "formation_kind": "pgsql",
                  "candidate_priority": 50,
                  "replication_quorum": true,
                  "current_group_state": "secondary",
                  "assigned_group_state": "secondary"
              },
              {
                  "health": 1,
                  "node_id": 3,
                  "group_id": 0,
                  "nodehost": "localhost",
                  "nodename": "node3",
                  "nodeport": 5503,
                  "reported_lsn": "0/4000678",
                  "reported_tli": 1,
                  "formation_kind": "pgsql",
                  "candidate_priority": 50,
                  "replication_quorum": true,
                  "current_group_state": "secondary",
                  "assigned_group_state": "secondary"
              }
          ]
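
       The JSON form lends itself to scripted checks. The following is a minimal sketch that
       reports the nodes whose current state differs from their assigned state, using the
       current_group_state and assigned_group_state fields shown above. The sample data is
       hypothetical (node2 is shown as still catching up) and the function names are not
       part of pg_autoctl.

```python
import json

# Hypothetical sample, trimmed to the fields used; node2 has not converged.
STATE_JSON = """
[
  {"node_id": 1, "nodename": "node1",
   "current_group_state": "primary", "assigned_group_state": "primary"},
  {"node_id": 2, "nodename": "node2",
   "current_group_state": "catchingup", "assigned_group_state": "secondary"}
]
"""


def nodes_in_transition(nodes):
    """Return the names of nodes that have not reached their assigned state."""
    return [
        n["nodename"]
        for n in nodes
        if n["current_group_state"] != n["assigned_group_state"]
    ]


def is_converged(nodes):
    """True when every node reports the state the monitor assigned to it."""
    return not nodes_in_transition(nodes)


nodes = json.loads(STATE_JSON)
print(nodes_in_transition(nodes), is_converged(nodes))
```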

   pg_autoctl show settings
       pg_autoctl show settings - Print replication settings for a formation from the monitor

   Synopsis
       This command lets you review all the replication settings of a given formation (defaults
       to 'default' as usual):

          usage: pg_autoctl show settings  [ --pgdata ] [ --json ] [ --formation ]

          --pgdata      path to data directory
          --monitor     pg_auto_failover Monitor Postgres URL
          --json        output data in the JSON format
          --formation   pg_auto_failover formation

   Description
       See also pg_autoctl get formation settings which is a synonym.

       The output contains settings and values that apply in different contexts, as shown here
       with a formation of four nodes, where node_4 is not participating in the replication
       quorum and is also not a candidate for failover:

          $ pg_autoctl show settings
             Context |    Name |                   Setting | Value
           ----------+---------+---------------------------+-------------------------------------------------------------
           formation | default |      number_sync_standbys | 1
             primary |  node_1 | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_3, pgautofailover_standby_2)'
                node |  node_1 |        replication quorum | true
                node |  node_2 |        replication quorum | true
                node |  node_3 |        replication quorum | true
                node |  node_4 |        replication quorum | false
                node |  node_1 |        candidate priority | 50
                node |  node_2 |        candidate priority | 50
                node |  node_3 |        candidate priority | 50
                node |  node_4 |        candidate priority | 0

       Three replication setting contexts are listed:

          1. The "formation" context contains a single entry, the value  of  number_sync_standbys
             for the target formation.

          2. The "primary" context contains one entry per group of Postgres nodes in the
             formation, and shows the current value of the synchronous_standby_names Postgres
             setting as computed by the monitor. It should match what is currently set on the
             primary node, except while a change is being applied, as shown by the primary
             being in the APPLY_SETTING state.

          3. The "node" context contains two entries per node: one line shows the replication
             quorum setting of the node, and another line shows its candidate priority.

       This command gives an overview of all the settings that apply to the current formation.
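
       The three contexts can be separated again when reading the table from a script. A
       minimal sketch, assuming the column layout shown above (four columns separated by
       a | character, with two header lines); the sample is shortened from the example above
       and the function name parse_settings is hypothetical.

```python
from collections import defaultdict

TABLE = """\
  Context |    Name |                   Setting | Value
----------+---------+---------------------------+------
formation | default |      number_sync_standbys | 1
  primary |  node_1 | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_3, pgautofailover_standby_2)'
     node |  node_1 |        replication quorum | true
     node |  node_4 |        replication quorum | false
     node |  node_1 |        candidate priority | 50
     node |  node_4 |        candidate priority | 0
"""


def parse_settings(text):
    """Group the rows of the settings table by their Context column."""
    by_context = defaultdict(list)
    for line in text.splitlines()[2:]:  # skip the header and the separator
        if not line.strip():
            continue
        context, name, setting, value = (c.strip() for c in line.split("|", 3))
        by_context[context].append((name, setting, value))
    return dict(by_context)


settings = parse_settings(TABLE)
for context, rows in settings.items():
    print(context, len(rows))
```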

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults  to  the  environment
              variable  PGDATA.  Use --monitor to connect to a monitor from anywhere, rather than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --monitor
              Postgres URI used to connect to the monitor. Must use the autoctl_node username and
              target  the pg_auto_failover database name. It is possible to show the Postgres URI
              from the monitor node using the command pg_autoctl show uri.

              Defaults to the value of the environment variable PG_AUTOCTL_MONITOR.

       --formation
              Show the current replication settings for the  given  formation.  Defaults  to  the
              default formation.

       --json Output JSON formatted data instead of a table formatted list.

   Examples
          $ pg_autoctl show settings
               Context |    Name |                   Setting | Value
             ----------+---------+---------------------------+-------------------------------------------------------------
             formation | default |      number_sync_standbys | 1
               primary |   node1 | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)'
                  node |   node1 |        candidate priority | 50
                  node |   node2 |        candidate priority | 50
                  node |   node3 |        candidate priority | 50
                  node |   node1 |        replication quorum | true
                  node |   node2 |        replication quorum | true
                  node |   node3 |        replication quorum | true

   pg_autoctl show standby-names
       pg_autoctl show standby-names - Prints synchronous_standby_names for a given group

   Synopsis
       This  command  prints  the  current  value  for  synchronous_standby_names for the primary
       Postgres server of the target group (default 0) in the target formation (default default),
       as computed by the monitor:

          usage: pg_autoctl show standby-names  [ --pgdata ] --formation --group

            --pgdata      path to data directory
            --monitor     pg_auto_failover Monitor Postgres URL
            --formation   formation to query, defaults to 'default'
            --group       group to query formation, defaults to all
            --json        output data in the JSON format

   Options
       --pgdata
              Location  of  the  Postgres node being managed locally. Defaults to the environment
              variable PGDATA. Use --monitor to connect to a monitor from anywhere,  rather  than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --monitor
              Postgres URI used to connect to the monitor. Must use the autoctl_node username and
              target the pg_auto_failover database name. It is possible to show the Postgres  URI
              from the monitor node using the command pg_autoctl show uri.

              Defaults to the value of the environment variable PG_AUTOCTL_MONITOR.

       --formation
              Show  the current synchronous_standby_names value for the given formation. Defaults
              to the default formation.

       --group
              Show the current synchronous_standby_names value for the given group in  the  given
              formation. Defaults to group 0.

       --json Output JSON formatted data instead of a table formatted list.

   Examples
          $ pg_autoctl show standby-names
          'ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)'

          $ pg_autoctl show standby-names --json
          {
              "formation": "default",
              "group": 0,
              "synchronous_standby_names": "ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)"
          }
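
       The value printed above follows the Postgres synchronous_standby_names syntax. A
       minimal sketch that splits it into its parts, handling only the ANY n (...) and
       FIRST n (...) forms; the plain list form that Postgres also accepts is deliberately
       not handled, and the function name parse_standby_names is hypothetical.

```python
import re

# Optional surrounding quotes, a method keyword, a count, then the name list.
_PATTERN = re.compile(
    r"\s*'?\s*(ANY|FIRST)\s+(\d+)\s*\((.*)\)\s*'?\s*", re.IGNORECASE
)


def parse_standby_names(value):
    """Parse "ANY 1 (a, b)" into (method, count, [names])."""
    match = _PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"unsupported synchronous_standby_names: {value!r}")
    method, count, names = match.groups()
    standbys = [name.strip() for name in names.split(",") if name.strip()]
    return method.upper(), int(count), standbys


value = "'ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)'"
method, count, standbys = parse_standby_names(value)
print(method, count, standbys)
```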

   pg_autoctl show file
       pg_autoctl show file - List pg_autoctl internal files (config, state, pid)

   Synopsis
       This command lists the files that pg_autoctl uses internally for its own configuration,
       state, and pid:

          usage: pg_autoctl show file  [ --pgdata --all --config | --state | --init | --pid --contents ]

          --pgdata      path to data directory
          --all         show all pg_autoctl files
          --config      show pg_autoctl configuration file
          --state       show pg_autoctl state file
          --init        show pg_autoctl initialisation state file
          --pid         show pg_autoctl PID file
          --contents    show selected file contents
          --json        output data in the JSON format

   Description
       The pg_autoctl command follows  the  XDG  Base  Directory  Specification  and  places  its
       internal  and  configuration  files  by default in places such as ~/.config/pg_autoctl and
       ~/.local/share/pg_autoctl.

       It is possible to change the default XDG locations  by  using  the  environment  variables
       XDG_CONFIG_HOME, XDG_DATA_HOME, and XDG_RUNTIME_DIR.

       Also, pg_autoctl uses sub-directories that are specific to a given PGDATA, making it
       possible to run several Postgres nodes on the same machine, which is very practical for
       testing and development purposes, though not advised for production setups.

   Configuration File
       The  pg_autoctl  configuration  file  for  an  instance  serving  the  data  directory  at
       /data/pgsql is found at ~/.config/pg_autoctl/data/pgsql/pg_autoctl.cfg, written in the INI
       format.

       It  is  possible  to  get  the  location  of  the  configuration file by using the command
       pg_autoctl show file --config --pgdata /data/pgsql and to output its content by using  the
       command pg_autoctl show file --config --contents --pgdata /data/pgsql.

       See also pg_autoctl config get and pg_autoctl config set.

   State File
       The  pg_autoctl  state  file  for an instance serving the data directory at /data/pgsql is
       found at  ~/.local/share/pg_autoctl/data/pgsql/pg_autoctl.state,  written  in  a  specific
       binary format.

       This file is not intended to be written by anything other than pg_autoctl itself. In case
       of state corruption, see the troubleshooting section of the documentation.

       It is possible to get the location of the state file by using the command pg_autoctl  show
       file  --state  --pgdata  /data/pgsql  and  to  output  its  content  by  using the command
       pg_autoctl show file --state --contents --pgdata /data/pgsql.

   Init State File
       The pg_autoctl init state file for an instance serving the data directory  at  /data/pgsql
       is  found  at  ~/.local/share/pg_autoctl/data/pgsql/pg_autoctl.init, written in a specific
       binary format.

       This file is not intended to be written by anything other than pg_autoctl itself. In case
       of state corruption, see the troubleshooting section of the documentation.

       This initialization state file only exists during the initialization of a pg_auto_failover
       node. In normal operations, this file does not exist.

       It is possible to get the location of the init state file by using the command pg_autoctl
       show file --init --pgdata /data/pgsql and to output its content by using the command
       pg_autoctl show file --init --contents --pgdata /data/pgsql.

   PID File
       The pg_autoctl PID file for an instance serving the data directory at /data/pgsql is found
       at /tmp/pg_autoctl/data/pgsql/pg_autoctl.pid, written in a specific text format.

       The PID file is located in a temporary directory by default, or in the XDG_RUNTIME_DIR
       directory when this is set up.
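
       The file locations described in the sections above follow one pattern: a base
       directory, then pg_autoctl, then the absolute PGDATA path, then the file name. The
       following sketch only mirrors that documented layout; it is not how pg_autoctl
       computes its paths, the function name is hypothetical, and /tmp stands in for the
       temporary directory as in the PID file example. Use pg_autoctl show file for the
       authoritative answer.

```python
from pathlib import PurePosixPath


def pg_autoctl_paths(pgdata, env, home):
    """Return the documented default file locations for an absolute PGDATA."""
    config_home = env.get("XDG_CONFIG_HOME", f"{home}/.config")
    data_home = env.get("XDG_DATA_HOME", f"{home}/.local/share")
    runtime_dir = env.get("XDG_RUNTIME_DIR", "/tmp")  # assumed temp directory

    # Raises ValueError when pgdata is not an absolute path.
    sub = PurePosixPath(pgdata).relative_to("/")

    def under(base, filename):
        return str(PurePosixPath(base) / "pg_autoctl" / sub / filename)

    return {
        "config": under(config_home, "pg_autoctl.cfg"),
        "state": under(data_home, "pg_autoctl.state"),
        "init": under(data_home, "pg_autoctl.init"),
        "pid": under(runtime_dir, "pg_autoctl.pid"),
    }


paths = pg_autoctl_paths("/data/pgsql", env={}, home="/home/postgres")
for kind, path in paths.items():
    print(kind, path)
```

       Passing the environment as a dictionary keeps the sketch deterministic; a caller
       would typically pass os.environ instead.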

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults  to  the  environment
              variable  PGDATA.  Use --monitor to connect to a monitor from anywhere, rather than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --all  List all the files that belong to this pg_autoctl node.

       --config
              Show only the configuration file.

       --state
              Show only the state file.

       --init Show only the init state file, which only exists while the command pg_autoctl
              create postgres or the command pg_autoctl create monitor is running, or when that
              command failed (and can then be retried).

       --pid  Show only the pid file.

       --contents
              When one of the options to show a specific file is in use,  then  --contents  shows
              the contents of the selected file instead of showing its absolute file path.

       --json Output JSON formatted data.

   Examples
       The  following  examples  are taken from a QA environment that has been prepared thanks to
       the make cluster command made available to the pg_auto_failover contributors. As a result,
       the XDG environment variables have been tweaked to obtain a self-contained test:

          $  tmux show-env | grep XDG
          XDG_CONFIG_HOME=/Users/dim/dev/MS/pg_auto_failover/tmux/config
          XDG_DATA_HOME=/Users/dim/dev/MS/pg_auto_failover/tmux/share
          XDG_RUNTIME_DIR=/Users/dim/dev/MS/pg_auto_failover/tmux/run

       Within that self-contained test location, we can see the following examples.

          $ pg_autoctl show file --pgdata ./node1
             File | Path
          --------+----------------
           Config | /Users/dim/dev/MS/pg_auto_failover/tmux/config/pg_autoctl/Users/dim/dev/MS/pg_auto_failover/tmux/node1/pg_autoctl.cfg
            State | /Users/dim/dev/MS/pg_auto_failover/tmux/share/pg_autoctl/Users/dim/dev/MS/pg_auto_failover/tmux/node1/pg_autoctl.state
             Init | /Users/dim/dev/MS/pg_auto_failover/tmux/share/pg_autoctl/Users/dim/dev/MS/pg_auto_failover/tmux/node1/pg_autoctl.init
              Pid | /Users/dim/dev/MS/pg_auto_failover/tmux/run/pg_autoctl/Users/dim/dev/MS/pg_auto_failover/tmux/node1/pg_autoctl.pid

          $ pg_autoctl show file --pgdata node1 --state
          /Users/dim/dev/MS/pg_auto_failover/tmux/share/pg_autoctl/Users/dim/dev/MS/pg_auto_failover/tmux/node1/pg_autoctl.state

          $ pg_autoctl show file --pgdata node1 --state --contents
          Current Role:             primary
          Assigned Role:            primary
          Last Monitor Contact:     Thu Mar 18 17:32:25 2021
          Last Secondary Contact:   0
          pg_autoctl state version: 1
          group:                    0
          node id:                  1
          nodes version:            0
          PostgreSQL Version:       1201
          PostgreSQL CatVersion:    201909212
          PostgreSQL System Id:     6940955496243696337

          $ pg_autoctl show file --pgdata node1 --config --contents --json | jq .pg_autoctl
          {
            "role": "keeper",
            "monitor": "postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=prefer",
            "formation": "default",
            "group": 0,
            "name": "node1",
            "hostname": "localhost",
            "nodekind": "standalone"
          }

   pg_autoctl show systemd
       pg_autoctl show systemd - Print systemd service file for this node

   Synopsis
       This command outputs a configuration unit that is suitable for registering pg_autoctl as a
       systemd service.

   Examples
          $ pg_autoctl show systemd --pgdata node1
          17:38:29 99778 INFO  HINT: to complete a systemd integration, run the following commands:
          17:38:29 99778 INFO  pg_autoctl -q show systemd --pgdata "node1" | sudo tee /etc/systemd/system/pgautofailover.service
          17:38:29 99778 INFO  sudo systemctl daemon-reload
          17:38:29 99778 INFO  sudo systemctl enable pgautofailover
          17:38:29 99778 INFO  sudo systemctl start pgautofailover
          [Unit]
          Description = pg_auto_failover

          [Service]
          WorkingDirectory = /Users/dim
          Environment = 'PGDATA=node1'
          User = dim
          ExecStart = /Applications/Postgres.app/Contents/Versions/12/bin/pg_autoctl run
          Restart = always
          StartLimitBurst = 0
          ExecReload = /Applications/Postgres.app/Contents/Versions/12/bin/pg_autoctl reload

          [Install]
          WantedBy = multi-user.target

       To avoid the logs output, use the -q option:

          $ pg_autoctl show systemd --pgdata node1 -q
          [Unit]
          Description = pg_auto_failover

          [Service]
          WorkingDirectory = /Users/dim
          Environment = 'PGDATA=node1'
          User = dim
          ExecStart = /Applications/Postgres.app/Contents/Versions/12/bin/pg_autoctl run
          Restart = always
          StartLimitBurst = 0
          ExecReload = /Applications/Postgres.app/Contents/Versions/12/bin/pg_autoctl reload

          [Install]
          WantedBy = multi-user.target

   pg_autoctl enable
       pg_autoctl enable - Enable a feature on a formation

   pg_autoctl enable secondary
       pg_autoctl enable secondary - Enable secondary nodes on a formation

   Synopsis
       This feature makes the most sense when using the Enterprise Edition of pg_auto_failover,
       which is fully compatible with Citus formations. When secondaries are enabled, the Citus
       worker creation policy is to assign a primary node and then a standby node to each group.
       When secondaries are disabled, the Citus worker creation policy is to assign only primary
       nodes.

           usage: pg_autoctl enable secondary  [ --pgdata --formation ]

          --pgdata      path to data directory
          --formation   Formation to enable secondary on

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults  to  the  environment
              variable  PGDATA.  Use --monitor to connect to a monitor from anywhere, rather than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --formation
              Target formation where to enable secondary feature.

   pg_autoctl enable maintenance
       pg_autoctl enable maintenance - Enable Postgres maintenance mode on this node

   Synopsis
       A pg_auto_failover node can be put into a maintenance state. The Postgres node is then
       still registered to the monitor, and is known to be unreliable until maintenance is
       disabled. A node in the maintenance state is not a candidate for promotion.

       Typical uses of the maintenance state include Operating System or Postgres reboots, e.g.
       when applying security upgrades.

           usage: pg_autoctl enable maintenance  [ --pgdata --allow-failover ]

          --pgdata      path to data directory

   Options
       --pgdata
              Location  of  the  Postgres node being managed locally. Defaults to the environment
              variable PGDATA. Use --monitor to connect to a monitor from anywhere,  rather  than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --formation
              Target formation where to enable secondary feature.

   Examples
          $ pg_autoctl show state
           Name |  Node |      Host:Port |       LSN |   Connection |       Current State |      Assigned State
          ------+-------+----------------+-----------+--------------+---------------------+--------------------
          node1 |     1 | localhost:5501 | 0/4000760 |   read-write |             primary |             primary
          node2 |     2 | localhost:5502 | 0/4000760 |    read-only |           secondary |           secondary
          node3 |     3 | localhost:5503 | 0/4000760 |    read-only |           secondary |           secondary

          $ pg_autoctl enable maintenance --pgdata node3
          12:06:12 47086 INFO  Listening monitor notifications about state changes in formation "default" and group 0
          12:06:12 47086 INFO  Following table displays times when notifications are received
              Time |  Name |  Node |      Host:Port |       Current State |      Assigned State
          ---------+-------+-------+----------------+---------------------+--------------------
          12:06:12 | node1 |     1 | localhost:5501 |             primary |        join_primary
          12:06:12 | node3 |     3 | localhost:5503 |           secondary |    wait_maintenance
          12:06:12 | node3 |     3 | localhost:5503 |    wait_maintenance |    wait_maintenance
          12:06:12 | node1 |     1 | localhost:5501 |        join_primary |        join_primary
          12:06:12 | node3 |     3 | localhost:5503 |    wait_maintenance |         maintenance
          12:06:12 | node1 |     1 | localhost:5501 |        join_primary |             primary
          12:06:13 | node3 |     3 | localhost:5503 |         maintenance |         maintenance

          $ pg_autoctl show state
           Name |  Node |      Host:Port |       LSN |   Connection |       Current State |      Assigned State
          ------+-------+----------------+-----------+--------------+---------------------+--------------------
          node1 |     1 | localhost:5501 | 0/4000810 |   read-write |             primary |             primary
          node2 |     2 | localhost:5502 | 0/4000810 |    read-only |           secondary |           secondary
          node3 |     3 | localhost:5503 | 0/4000810 |         none |         maintenance |         maintenance
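
       The effect on promotion can be illustrated with a simplified filter over the fields of
       pg_autoctl show state --json. This is only a sketch of the rule stated above (a node
       in maintenance is not a candidate for promotion), combined with the assumption, taken
       from the show settings example earlier in this document, that a candidate priority
       of 0 also excludes a node. The monitor's actual election logic is more involved, and
       the function name and sample data are hypothetical.

```python
def promotion_candidates(nodes):
    """Return the names of standby nodes that could be considered for promotion."""
    candidates = []
    for node in nodes:
        state = node["current_group_state"]
        if state == "primary":
            continue  # already the primary
        if state == "maintenance":
            continue  # nodes in maintenance are not candidates
        if node["candidate_priority"] == 0:
            continue  # assumption: priority 0 opts the node out of failover
        candidates.append(node["nodename"])
    return candidates


# Hypothetical sample resembling the state after enabling maintenance on node3.
nodes = [
    {"nodename": "node1", "current_group_state": "primary", "candidate_priority": 50},
    {"nodename": "node2", "current_group_state": "secondary", "candidate_priority": 50},
    {"nodename": "node3", "current_group_state": "maintenance", "candidate_priority": 50},
    {"nodename": "node4", "current_group_state": "secondary", "candidate_priority": 0},
]
print(promotion_candidates(nodes))
```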

   pg_autoctl enable ssl
       pg_autoctl enable ssl - Enable SSL configuration on this node

   Synopsis
       It is possible to manage Postgres SSL settings with the pg_autoctl command, both at
       pg_autoctl create postgres time and later, when you want to update the SSL settings at
       run-time.

           usage: pg_autoctl enable ssl  [ --pgdata ] [ --json ]

          --pgdata      path to data directory
          --ssl-self-signed setup network encryption using self signed certificates (does NOT protect against MITM)
          --ssl-mode        use that sslmode in connection strings
          --ssl-ca-file     set the Postgres ssl_ca_file to that file path
          --ssl-crl-file    set the Postgres ssl_crl_file to that file path
          --no-ssl          don't enable network encryption (NOT recommended, prefer --ssl-self-signed)
          --server-key      set the Postgres ssl_key_file to that file path
          --server-cert     set the Postgres ssl_cert_file to that file path

   Options
       --pgdata
              Location  of  the  Postgres node being managed locally. Defaults to the environment
              variable PGDATA. Use --monitor to connect to a monitor from anywhere,  rather  than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --ssl-self-signed
              Generate  SSL self-signed certificates to provide network encryption. This does not
              protect against man-in-the-middle kinds  of  attacks.  See  Security  settings  for
              pg_auto_failover for more about our SSL settings.

       --ssl-mode
              SSL  Mode  used  by  pg_autoctl  when  connecting  to  other  nodes, including when
              connecting for streaming replication.

       --ssl-ca-file
              Set the Postgres ssl_ca_file to that file path.

       --ssl-crl-file
              Set the Postgres ssl_crl_file to that file path.

       --no-ssl
              Don't enable network encryption. This is not recommended, prefer --ssl-self-signed.

       --server-key
              Set the Postgres ssl_key_file to that file path.

       --server-cert
              Set the Postgres ssl_cert_file to that file path.

   pg_autoctl enable monitor
       pg_autoctl enable monitor - Enable a monitor for this node to be orchestrated from

   Synopsis
       It is possible to disable the pg_auto_failover monitor and enable it again online in a
       running pg_autoctl Postgres node. The main use case for this operation is when the
       monitor node has to be replaced, either after a full crash of the previous monitor node,
       or when migrating to a new monitor node (hardware replacement, region or zone migration,
       etc).

           usage: pg_autoctl enable monitor  [ --pgdata --allow-failover ] postgres://autoctl_node@new.monitor.add.ress/pg_auto_failover

          --pgdata      path to data directory

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults  to  the  environment
              variable  PGDATA.  Use --monitor to connect to a monitor from anywhere, rather than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

   Examples
          $ pg_autoctl show state
           Name |  Node |      Host:Port |       LSN |   Connection |       Current State |      Assigned State
          ------+-------+----------------+-----------+--------------+---------------------+--------------------
          node1 |     1 | localhost:5501 | 0/4000760 |   read-write |             primary |             primary
          node2 |     2 | localhost:5502 | 0/4000760 |    read-only |           secondary |           secondary

          $ pg_autoctl enable monitor --pgdata node3 'postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=require'
          12:42:07 43834 INFO  Registered node 3 (localhost:5503) with name "node3" in formation "default", group 0, state "wait_standby"
          12:42:07 43834 INFO  Successfully registered to the monitor with nodeId 3
          12:42:08 43834 INFO  Still waiting for the monitor to drive us to state "catchingup"
          12:42:08 43834 WARN  Please make sure that the primary node is currently running `pg_autoctl run` and contacting the monitor.

          $ pg_autoctl show state
           Name |  Node |      Host:Port |       LSN |   Connection |       Current State |      Assigned State
          ------+-------+----------------+-----------+--------------+---------------------+--------------------
          node1 |     1 | localhost:5501 | 0/4000810 |   read-write |             primary |             primary
          node2 |     2 | localhost:5502 | 0/4000810 |    read-only |           secondary |           secondary
          node3 |     3 | localhost:5503 | 0/4000810 |    read-only |           secondary |           secondary

   pg_autoctl disable
       pg_autoctl disable - Disable a feature on a formation

   pg_autoctl disable secondary
       pg_autoctl disable secondary - Disable secondary nodes on a formation

   Synopsis
       This feature makes the most sense when using the Enterprise Edition of pg_auto_failover,
       which is fully compatible with Citus formations. When secondary nodes are enabled, the
       Citus workers creation policy is to assign a primary node then a standby node for each
       group. When secondary nodes are disabled, the Citus workers creation policy is to assign
       only the primary nodes.

           usage: pg_autoctl disable secondary  [ --pgdata --formation ]

          --pgdata      path to data directory
          --formation   Formation to disable secondary on

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults  to  the  environment
              variable  PGDATA.  Use --monitor to connect to a monitor from anywhere, rather than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --formation
              Target formation in which to disable the secondary feature.

   pg_autoctl disable maintenance
       pg_autoctl disable maintenance - Disable Postgres maintenance mode on this node

   Synopsis
       A pg_auto_failover node can be put into a maintenance state. The Postgres node is then
       still registered to the monitor, and is known to be unreliable until maintenance is
       disabled. A node in the maintenance state is not a candidate for promotion.

       Typical uses of the maintenance state include Operating System or Postgres reboots, e.g.
       when applying security upgrades.

           usage: pg_autoctl disable maintenance  [ --pgdata --allow-failover ]

          --pgdata      path to data directory

   Options
       --pgdata
              Location  of  the  Postgres node being managed locally. Defaults to the environment
              variable PGDATA. Use --monitor to connect to a monitor from anywhere,  rather  than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --formation
              Target formation of the node on which to disable maintenance.

   Examples
          $ pg_autoctl show state
           Name |  Node |      Host:Port |       LSN |   Connection |       Current State |      Assigned State
          ------+-------+----------------+-----------+--------------+---------------------+--------------------
          node1 |     1 | localhost:5501 | 0/4000810 |   read-write |             primary |             primary
          node2 |     2 | localhost:5502 | 0/4000810 |    read-only |           secondary |           secondary
          node3 |     3 | localhost:5503 | 0/4000810 |         none |         maintenance |         maintenance

          $ pg_autoctl disable maintenance --pgdata node3
          12:06:37 47542 INFO  Listening monitor notifications about state changes in formation "default" and group 0
          12:06:37 47542 INFO  Following table displays times when notifications are received
              Time |  Name |  Node |      Host:Port |       Current State |      Assigned State
          ---------+-------+-------+----------------+---------------------+--------------------
          12:06:37 | node3 |     3 | localhost:5503 |         maintenance |          catchingup
          12:06:37 | node3 |     3 | localhost:5503 |          catchingup |          catchingup
          12:06:37 | node3 |     3 | localhost:5503 |          catchingup |           secondary
          12:06:37 | node3 |     3 | localhost:5503 |           secondary |           secondary

          $ pg_autoctl show state
           Name |  Node |      Host:Port |       LSN |   Connection |       Current State |      Assigned State
          ------+-------+----------------+-----------+--------------+---------------------+--------------------
          node1 |     1 | localhost:5501 | 0/4000848 |   read-write |             primary |             primary
          node2 |     2 | localhost:5502 | 0/4000848 |    read-only |           secondary |           secondary
          node3 |     3 | localhost:5503 | 0/4000000 |    read-only |           secondary |           secondary

   pg_autoctl disable ssl
       pg_autoctl disable ssl - Disable SSL configuration on this node

   Synopsis
       It is possible to manage Postgres SSL settings with the pg_autoctl command, both at
       pg_autoctl create postgres time and later, to update the SSL settings at run-time.

           usage: pg_autoctl disable ssl  [ --pgdata ] [ --json ]

          --pgdata      path to data directory
          --ssl-self-signed setup network encryption using self signed certificates (does NOT protect against MITM)
          --ssl-mode        use that sslmode in connection strings
          --ssl-ca-file     set the Postgres ssl_ca_file to that file path
          --ssl-crl-file    set the Postgres ssl_crl_file to that file path
          --no-ssl          don't enable network encryption (NOT recommended, prefer --ssl-self-signed)
          --server-key      set the Postgres ssl_key_file to that file path
          --server-cert     set the Postgres ssl_cert_file to that file path

   Options
       --pgdata
              Location  of  the  Postgres node being managed locally. Defaults to the environment
              variable PGDATA. Use --monitor to connect to a monitor from anywhere,  rather  than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --ssl-self-signed
              Generate  SSL self-signed certificates to provide network encryption. This does not
              protect against man-in-the-middle kinds  of  attacks.  See  Security  settings  for
              pg_auto_failover for more about our SSL settings.

       --ssl-mode
              SSL  Mode  used  by  pg_autoctl  when  connecting  to  other  nodes, including when
              connecting for streaming replication.

       --ssl-ca-file
              Set the Postgres ssl_ca_file to that file path.

       --ssl-crl-file
              Set the Postgres ssl_crl_file to that file path.

       --no-ssl
              Don't enable network encryption. This is not recommended; prefer
              --ssl-self-signed.

       --server-key
              Set the Postgres ssl_key_file to that file path.

       --server-cert
              Set the Postgres ssl_cert_file to that file path.

   pg_autoctl disable monitor
       pg_autoctl disable monitor - Disable the monitor for this node

   Synopsis
       It is possible to disable the pg_auto_failover monitor and enable it again online in a
       running pg_autoctl Postgres node. The main use case for this operation is when the
       monitor node has to be replaced, either after a full crash of the previous monitor
       node, or for migrating to a new monitor node (hardware replacement, region or zone
       migration, etc).

           usage: pg_autoctl disable monitor  [ --pgdata --force ]

          --pgdata      path to data directory
          --force       force unregistering from the monitor

   Options
       --pgdata
              Location  of  the  Postgres node being managed locally. Defaults to the environment
              variable PGDATA. Use --monitor to connect to a monitor from anywhere,  rather  than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --force
              The --force option covers the following two situations:

                 1. By default, the command expects to be able to connect to the current monitor.
                    When the current known monitor in the  setup  is  not  running  anymore,  use
                    --force to skip this step.

                 2. When  pg_autoctl  could  connect  to the monitor and the node is found there,
                    this is normally an error that prevents disabling the monitor. Using
                    --force  allows  the  command  to drop the node from the monitor and continue
                    with disabling the monitor.
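
       The two situations above can be read as a small decision table. The following Python
       sketch models it; the function name and the returned strings are hypothetical, and the
       last branch (monitor reachable, node not registered) is an assumption that the text
       does not spell out. It is not how pg_autoctl is implemented.

```python
def disable_monitor_outcome(monitor_reachable, node_registered, force):
    """Model the documented behaviour of `pg_autoctl disable monitor`.

    Returns a short, illustrative description of what the command would do.
    """
    if not monitor_reachable:
        # Situation 1: the current monitor is not running anymore.
        if force:
            return "skip monitor connection, disable monitor"
        return "error: cannot connect to monitor"
    if node_registered:
        # Situation 2: the node is still found on the monitor.
        if force:
            return "drop node from monitor, disable monitor"
        return "error: use --force to remove the node from the monitor"
    # Assumption: a reachable monitor that does not know this node needs no --force.
    return "disable monitor"


# Walk through every combination of the three inputs.
for reachable in (True, False):
    for registered in (True, False):
        for force in (False, True):
            print(reachable, registered, force,
                  disable_monitor_outcome(reachable, registered, force))
```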

   Examples
          $ pg_autoctl show state
           Name |  Node |      Host:Port |       LSN |   Connection |       Current State |      Assigned State
          ------+-------+----------------+-----------+--------------+---------------------+--------------------
          node1 |     1 | localhost:5501 | 0/4000148 |   read-write |             primary |             primary
          node2 |     2 | localhost:5502 | 0/4000148 |    read-only |           secondary |           secondary
          node3 |     3 | localhost:5503 | 0/4000148 |    read-only |           secondary |           secondary

          $ pg_autoctl disable monitor --pgdata node3
          12:41:21 43039 INFO  Found node 3 "node3" (localhost:5503) on the monitor
          12:41:21 43039 FATAL Use --force to remove the node from the monitor

          $ pg_autoctl disable monitor --pgdata node3 --force
          12:41:32 43219 INFO  Removing node 3 "node3" (localhost:5503) from monitor

          $ pg_autoctl show state
           Name |  Node |      Host:Port |       LSN |   Connection |       Current State |      Assigned State
          ------+-------+----------------+-----------+--------------+---------------------+--------------------
          node1 |     1 | localhost:5501 | 0/4000760 |   read-write |             primary |             primary
          node2 |     2 | localhost:5502 | 0/4000760 |    read-only |           secondary |           secondary

   pg_autoctl get
       pg_autoctl get - Get a pg_auto_failover node, or formation setting

   pg_autoctl get formation settings
       pg_autoctl get formation settings - get replication settings  for  a  formation  from  the
       monitor

   Synopsis
       This command prints the pg_autoctl replication settings:

          usage: pg_autoctl get formation settings  [ --pgdata ] [ --json ] [ --formation ]

          --pgdata      path to data directory
          --json        output data in the JSON format
          --formation   pg_auto_failover formation

   Description
       See also pg_autoctl show settings which is a synonym.

   Options
       --pgdata
              Location  of  the  Postgres node being managed locally. Defaults to the environment
              variable PGDATA. Use --monitor to connect to a monitor from anywhere,  rather  than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --json Output JSON formatted data.

       --formation
              Show replication settings for given formation. Defaults to default.

   Examples
          $ pg_autoctl get formation settings
            Context |    Name |                   Setting | Value
          ----------+---------+---------------------------+-------------------------------------------------------------
          formation | default |      number_sync_standbys | 1
            primary |   node1 | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)'
               node |   node1 |        candidate priority | 50
               node |   node2 |        candidate priority | 50
               node |   node3 |        candidate priority | 50
               node |   node1 |        replication quorum | true
               node |   node2 |        replication quorum | true
               node |   node3 |        replication quorum | true

          $ pg_autoctl get formation settings --json
          {
              "nodes": [
                  {
                      "value": "true",
                      "context": "node",
                      "node_id": 1,
                      "setting": "replication quorum",
                      "group_id": 0,
                      "nodename": "node1"
                  },
                  {
                      "value": "true",
                      "context": "node",
                      "node_id": 2,
                      "setting": "replication quorum",
                      "group_id": 0,
                      "nodename": "node2"
                  },
                  {
                      "value": "true",
                      "context": "node",
                      "node_id": 3,
                      "setting": "replication quorum",
                      "group_id": 0,
                      "nodename": "node3"
                  },
                  {
                      "value": "50",
                      "context": "node",
                      "node_id": 1,
                      "setting": "candidate priority",
                      "group_id": 0,
                      "nodename": "node1"
                  },
                  {
                      "value": "50",
                      "context": "node",
                      "node_id": 2,
                      "setting": "candidate priority",
                      "group_id": 0,
                      "nodename": "node2"
                  },
                  {
                      "value": "50",
                      "context": "node",
                      "node_id": 3,
                      "setting": "candidate priority",
                      "group_id": 0,
                      "nodename": "node3"
                  }
              ],
              "primary": [
                  {
                      "value": "'ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)'",
                      "context": "primary",
                      "node_id": 1,
                      "setting": "synchronous_standby_names",
                      "group_id": 0,
                      "nodename": "node1"
                  }
              ],
              "formation": {
                  "value": "1",
                  "context": "formation",
                  "node_id": null,
                  "setting": "number_sync_standbys",
                  "group_id": null,
                  "nodename": "default"
              }
          }
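
       The JSON output lends itself to scripting. A minimal Python sketch, assuming the layout
       shown in the example above (top-level keys nodes, primary and formation); the helper
       name summarize_settings and the abridged sample are illustrative and not part of
       pg_autoctl:

```python
import json

# Abridged copy of the `pg_autoctl get formation settings --json` example output.
SAMPLE = """
{
    "nodes": [
        {"value": "true", "context": "node", "node_id": 1,
         "setting": "replication quorum", "group_id": 0, "nodename": "node1"},
        {"value": "50", "context": "node", "node_id": 1,
         "setting": "candidate priority", "group_id": 0, "nodename": "node1"},
        {"value": "50", "context": "node", "node_id": 2,
         "setting": "candidate priority", "group_id": 0, "nodename": "node2"}
    ],
    "primary": [
        {"value": "'ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)'",
         "context": "primary", "node_id": 1,
         "setting": "synchronous_standby_names", "group_id": 0, "nodename": "node1"}
    ],
    "formation": {"value": "1", "context": "formation", "node_id": null,
                  "setting": "number_sync_standbys", "group_id": null,
                  "nodename": "default"}
}
"""


def summarize_settings(text):
    """Group per-node settings by node name; values stay strings, as in the output."""
    doc = json.loads(text)
    per_node = {}
    for entry in doc["nodes"]:
        per_node.setdefault(entry["nodename"], {})[entry["setting"]] = entry["value"]
    return {
        "number_sync_standbys": int(doc["formation"]["value"]),
        "nodes": per_node,
    }


summary = summarize_settings(SAMPLE)
print(summary["number_sync_standbys"])
print(summary["nodes"]["node1"])
```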

   pg_autoctl get formation number-sync-standbys
       pg_autoctl  get  formation number-sync-standbys - get number_sync_standbys for a formation
       from the monitor

   Synopsis
       This command prints the pg_autoctl replication setting for number sync standbys:

          usage: pg_autoctl get formation number-sync-standbys  [ --pgdata ] [ --json ] [ --formation ]

          --pgdata      path to data directory
          --json        output data in the JSON format
          --formation   pg_auto_failover formation

   Description
       See also pg_autoctl show settings for the full list of replication settings.

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults  to  the  environment
              variable  PGDATA.  Use --monitor to connect to a monitor from anywhere, rather than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --json Output JSON formatted data.

       --formation
              Show replication settings for given formation. Defaults to default.

   Examples
          $ pg_autoctl get formation number-sync-standbys
          1

          $ pg_autoctl get formation number-sync-standbys --json
          {
              "number-sync-standbys": 1
          }

   pg_autoctl get node replication-quorum
       pg_autoctl get node replication-quorum - get replication-quorum property from the monitor

   Synopsis
       This command prints the pg_autoctl replication quorum for a given node:

          usage: pg_autoctl get node replication-quorum  [ --pgdata ] [ --json ] [ --formation ] [ --name ]

          --pgdata      path to data directory
          --formation   pg_auto_failover formation
          --name        pg_auto_failover node name
          --json        output data in the JSON format

   Description
       See also pg_autoctl show settings for the full list of replication settings.

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults  to  the  environment
              variable  PGDATA.  Use --monitor to connect to a monitor from anywhere, rather than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --json Output JSON formatted data.

       --formation
              Show replication settings for given formation. Defaults to default.

       --name Show replication settings for given node, selected by name.

   Examples
          $ pg_autoctl get node replication-quorum --name node1
          true

          $ pg_autoctl get node replication-quorum --name node1 --json
          {
              "name": "node1",
              "replication-quorum": true
          }

   pg_autoctl get node candidate-priority
       pg_autoctl get node candidate-priority - get candidate-priority property from the monitor

   Synopsis
       This command prints the pg_autoctl candidate priority for a given node:

          usage: pg_autoctl get node candidate-priority  [ --pgdata ] [ --json ] [ --formation ] [ --name ]

          --pgdata      path to data directory
          --formation   pg_auto_failover formation
          --name        pg_auto_failover node name
          --json        output data in the JSON format

   Description
       See also pg_autoctl show settings for the full list of replication settings.

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults  to  the  environment
              variable  PGDATA.  Use --monitor to connect to a monitor from anywhere, rather than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --json Output JSON formatted data.

       --formation
              Show replication settings for given formation. Defaults to default.

       --name Show replication settings for given node, selected by name.

   Examples
          $ pg_autoctl get node candidate-priority --name node1
          50

          $ pg_autoctl get node candidate-priority --name node1 --json
          {
              "name": "node1",
              "candidate-priority": 50
          }

   pg_autoctl set
       pg_autoctl set - Set a pg_auto_failover node, or formation setting

   pg_autoctl set formation number-sync-standbys
       pg_autoctl set formation number-sync-standbys - set number_sync_standbys for a formation
       on the monitor

   Synopsis
       This command sets the pg_autoctl replication setting for number sync standbys:

          usage: pg_autoctl set formation number-sync-standbys  [ --pgdata ] [ --json ] [ --formation ] <number_sync_standbys>

          --pgdata      path to data directory
          --formation   pg_auto_failover formation
          --json        output data in the JSON format

   Description
       The  pg_auto_failover  monitor  ensures  that  at  least  N+1  candidate standby nodes are
       registered when number-sync-standbys is N. This means that to be able to run the following
       command, at least 3 standby nodes with a non-zero candidate priority must be registered to
       the monitor:

          $ pg_autoctl set formation number-sync-standbys 2

       See also pg_autoctl show settings for the full list of replication settings.
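
       The N+1 rule described above can be checked by hand before running the command. A small
       Python sketch of that arithmetic follows; the function name is hypothetical, the input
       is assumed to be the candidate priorities of the registered standby nodes only, and the
       sketch covers N >= 1, the case the text describes. It mirrors the stated rule and is
       not the monitor's implementation.

```python
def can_set_number_sync_standbys(n, standby_priorities):
    """Return True when at least n + 1 standbys have a non-zero candidate priority.

    standby_priorities holds one candidate priority per registered standby node
    (the primary is not included).
    """
    if n < 1:
        raise ValueError("this sketch only covers number-sync-standbys >= 1")
    candidates = [p for p in standby_priorities if p > 0]
    return len(candidates) >= n + 1


# number-sync-standbys 2 needs 3 candidate standbys.
print(can_set_number_sync_standbys(2, [50, 50, 50]))
# A standby with candidate priority 0 does not count towards the total.
print(can_set_number_sync_standbys(2, [50, 50, 0]))
```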

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults  to  the  environment
              variable  PGDATA.  Use --monitor to connect to a monitor from anywhere, rather than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --json Output JSON formatted data.

       --formation
              Set replication settings for given formation. Defaults to default.

   pg_autoctl set node replication-quorum
       pg_autoctl set node replication-quorum - set replication-quorum property on the monitor

   Synopsis
       This command sets pg_autoctl replication quorum for a given node:

          usage: pg_autoctl set node replication-quorum  [ --pgdata ] [ --json ] [ --formation ] [ --name ] <true|false>

          --pgdata      path to data directory
          --formation   pg_auto_failover formation
          --name        pg_auto_failover node name
          --json        output data in the JSON format

   Description
       See also pg_autoctl show settings for the full list of replication settings.

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults  to  the  environment
              variable  PGDATA.  Use --monitor to connect to a monitor from anywhere, rather than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --json Output JSON formatted data.

       --formation
              Set replication settings for given formation. Defaults to default.

       --name Set replication settings for given node, selected by name.

   Examples
          $ pg_autoctl set node replication-quorum --name node1 false
          12:49:37 94092 INFO  Waiting for the settings to have been applied to the monitor and primary node
          12:49:37 94092 INFO  New state is reported by node 1 "node1" (localhost:5501): "apply_settings"
          12:49:37 94092 INFO  Setting goal state of node 1 "node1" (localhost:5501) to primary after it applied replication properties change.
          12:49:37 94092 INFO  New state is reported by node 1 "node1" (localhost:5501): "primary"
          false

          $ pg_autoctl set node replication-quorum --name node1 true --json
          12:49:42 94199 INFO  Waiting for the settings to have been applied to the monitor and primary node
          12:49:42 94199 INFO  New state is reported by node 1 "node1" (localhost:5501): "apply_settings"
          12:49:42 94199 INFO  Setting goal state of node 1 "node1" (localhost:5501) to primary after it applied replication properties change.
          12:49:43 94199 INFO  New state is reported by node 1 "node1" (localhost:5501): "primary"
          {
              "replication-quorum": true
          }

   pg_autoctl set node candidate-priority
       pg_autoctl set node candidate-priority - set candidate-priority property on the monitor

   Synopsis
       This command sets the pg_autoctl candidate priority for a given node:

          usage: pg_autoctl set node candidate-priority  [ --pgdata ] [ --json ] [ --formation ] [ --name ] <priority: 0..100>

          --pgdata      path to data directory
          --formation   pg_auto_failover formation
          --name        pg_auto_failover node name
          --json        output data in the JSON format

   Description
       See also pg_autoctl show settings for the full list of replication settings.

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults  to  the  environment
              variable  PGDATA.  Use --monitor to connect to a monitor from anywhere, rather than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --json Output JSON formatted data.

       --formation
              Set replication settings for given formation. Defaults to default.

       --name Set replication settings for given node, selected by name.

   Examples
          $ pg_autoctl set node candidate-priority --name node1 65
          12:47:59 92326 INFO  Waiting for the settings to have been applied to the monitor and primary node
          12:47:59 92326 INFO  New state is reported by node 1 "node1" (localhost:5501): "apply_settings"
          12:47:59 92326 INFO  Setting goal state of node 1 "node1" (localhost:5501) to primary after it applied replication properties change.
          12:47:59 92326 INFO  New state is reported by node 1 "node1" (localhost:5501): "primary"
          65

          $ pg_autoctl set node candidate-priority --name node1 50 --json
          12:48:05 92450 INFO  Waiting for the settings to have been applied to the monitor and primary node
          12:48:05 92450 INFO  New state is reported by node 1 "node1" (localhost:5501): "apply_settings"
          12:48:05 92450 INFO  Setting goal state of node 1 "node1" (localhost:5501) to primary after it applied replication properties change.
          12:48:05 92450 INFO  New state is reported by node 1 "node1" (localhost:5501): "primary"
          {
              "candidate-priority": 50
          }
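
       When the command is driven from a script, the 0..100 range from the usage line can be
       validated before calling pg_autoctl. A minimal Python sketch that only builds the
       argument list; the helper name is hypothetical, and only options shown in the synopsis
       above (--name, --json) are used:

```python
def set_candidate_priority_argv(name, priority, json_output=False):
    """Build the argv for `pg_autoctl set node candidate-priority`.

    Rejects priorities outside the documented 0..100 range.
    """
    if isinstance(priority, bool) or not isinstance(priority, int):
        raise ValueError("priority must be an integer")
    if not 0 <= priority <= 100:
        raise ValueError("priority must be between 0 and 100")
    argv = ["pg_autoctl", "set", "node", "candidate-priority", "--name", name]
    if json_output:
        argv.append("--json")
    argv.append(str(priority))
    return argv


print(" ".join(set_candidate_priority_argv("node1", 65)))
```

       The resulting list can be passed to subprocess.run() on a host where pg_autoctl is
       installed and PGDATA is set.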

   pg_autoctl perform
       pg_autoctl perform - Perform an action orchestrated by the monitor

   pg_autoctl perform failover
       pg_autoctl perform failover - Perform a failover for given formation and group

   Synopsis
       This command starts a Postgres failover orchestration from the pg_auto_failover monitor:

          usage: pg_autoctl perform failover  [ --pgdata --formation --group ]

          --pgdata      path to data directory
          --formation   formation to target, defaults to 'default'
          --group       group to target, defaults to 0
          --wait        how many seconds to wait, default to 60

   Description
       The pg_auto_failover monitor can be used to orchestrate a manual failover, sometimes also
       known as a switchover. When doing so, split-brain situations are prevented thanks to the
       intermediary states used in the Finite State Machine.

       The pg_autoctl perform failover command waits until the failover is known complete on the
       monitor, or until the timeout has passed (60s by default, see --wait).

       The  failover  orchestration  is  done  in  the  background by the monitor, so even if the
       pg_autoctl perform failover stops on the timeout, the failover orchestration continues  at
       the monitor.

   Options
       --pgdata
              Location  of  the  Postgres node being managed locally. Defaults to the environment
              variable PGDATA. Use --monitor to connect to a monitor from anywhere,  rather  than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --formation
              Formation to target for the operation. Defaults to default.

       --group
              Postgres  group  to  target for the operation. Defaults to 0, only Citus formations
              may have more than one group.

       --wait How many seconds to wait for notifications about the promotion. The  command  stops
              when  the  promotion  is  finished  (a  node  is  primary), or when the timeout has
              elapsed, whichever comes first. The value 0 (zero) disables the timeout and  allows
              the command to wait forever.
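
       The --wait semantics described above (stop on success or on timeout, whichever comes
       first, with 0 meaning no timeout) can be sketched as a polling loop. This Python
       example is illustrative only: the poll callable, the interval and the function name
       are assumptions, and pg_autoctl itself listens to monitor notifications rather than
       polling.

```python
import time


def wait_for_primary(poll, timeout=60, interval=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Wait until poll() returns True or the timeout elapses.

    Returns True when the promotion finished, False on timeout.
    A timeout of 0 disables the deadline and waits forever.
    """
    deadline = None if timeout == 0 else clock() + timeout
    while True:
        if poll():
            return True
        if deadline is not None and clock() >= deadline:
            return False
        sleep(interval)


# Example with a stand-in poll function that succeeds on the third call.
answers = iter([False, False, True])
print(wait_for_primary(lambda: next(answers), timeout=0, sleep=lambda s: None))
```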

   Examples
          $ pg_autoctl perform failover
          12:57:30 3635 INFO  Listening monitor notifications about state changes in formation "default" and group 0
          12:57:30 3635 INFO  Following table displays times when notifications are received
              Time |  Name |  Node |      Host:Port |       Current State |      Assigned State
          ---------+-------+-------+----------------+---------------------+--------------------
          12:57:30 | node1 |     1 | localhost:5501 |             primary |            draining
          12:57:30 | node1 |     1 | localhost:5501 |            draining |            draining
          12:57:30 | node2 |     2 | localhost:5502 |           secondary |          report_lsn
          12:57:30 | node3 |     3 | localhost:5503 |           secondary |          report_lsn
          12:57:36 | node3 |     3 | localhost:5503 |          report_lsn |          report_lsn
          12:57:36 | node2 |     2 | localhost:5502 |          report_lsn |          report_lsn
          12:57:36 | node2 |     2 | localhost:5502 |          report_lsn |   prepare_promotion
          12:57:36 | node2 |     2 | localhost:5502 |   prepare_promotion |   prepare_promotion
          12:57:36 | node2 |     2 | localhost:5502 |   prepare_promotion |    stop_replication
          12:57:36 | node1 |     1 | localhost:5501 |            draining |      demote_timeout
          12:57:36 | node3 |     3 | localhost:5503 |          report_lsn |      join_secondary
          12:57:36 | node1 |     1 | localhost:5501 |      demote_timeout |      demote_timeout
          12:57:36 | node3 |     3 | localhost:5503 |      join_secondary |      join_secondary
          12:57:37 | node2 |     2 | localhost:5502 |    stop_replication |    stop_replication
          12:57:37 | node2 |     2 | localhost:5502 |    stop_replication |        wait_primary
          12:57:37 | node1 |     1 | localhost:5501 |      demote_timeout |             demoted
          12:57:37 | node1 |     1 | localhost:5501 |             demoted |             demoted
          12:57:37 | node2 |     2 | localhost:5502 |        wait_primary |        wait_primary
          12:57:37 | node3 |     3 | localhost:5503 |      join_secondary |           secondary
          12:57:37 | node1 |     1 | localhost:5501 |             demoted |          catchingup
          12:57:38 | node3 |     3 | localhost:5503 |           secondary |           secondary
          12:57:38 | node2 |     2 | localhost:5502 |        wait_primary |             primary
          12:57:38 | node1 |     1 | localhost:5501 |          catchingup |          catchingup
          12:57:38 | node2 |     2 | localhost:5502 |             primary |             primary

          $ pg_autoctl show state
           Name |  Node |      Host:Port |       LSN |   Connection |       Current State |      Assigned State
          ------+-------+----------------+-----------+--------------+---------------------+--------------------
          node1 |     1 | localhost:5501 | 0/4000F50 |    read-only |           secondary |           secondary
          node2 |     2 | localhost:5502 | 0/4000F50 |   read-write |             primary |             primary
          node3 |     3 | localhost:5503 | 0/4000F50 |    read-only |           secondary |           secondary
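
       After a failover, a script may want to confirm which node ended up as primary. The
       following Python sketch parses the tabular pg_autoctl show state output shown above;
       it assumes the column names and the "|" separators stay as in the example, and the
       function names are hypothetical.

```python
SAMPLE = """\
 Name |  Node |      Host:Port |       LSN |   Connection |       Current State |      Assigned State
------+-------+----------------+-----------+--------------+---------------------+--------------------
node1 |     1 | localhost:5501 | 0/4000F50 |    read-only |           secondary |           secondary
node2 |     2 | localhost:5502 | 0/4000F50 |   read-write |             primary |             primary
node3 |     3 | localhost:5503 | 0/4000F50 |    read-only |           secondary |           secondary
"""


def parse_state_table(text):
    """Turn the `pg_autoctl show state` table into a list of dicts keyed by column name."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = [cell.strip() for cell in lines[0].split("|")]
    rows = []
    for line in lines[2:]:  # skip the header and the dashed separator
        cells = [cell.strip() for cell in line.split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def current_primary(rows):
    """Return the name of the single node whose current and assigned state are primary."""
    primaries = [row["Name"] for row in rows
                 if row["Current State"] == "primary"
                 and row["Assigned State"] == "primary"]
    return primaries[0] if len(primaries) == 1 else None


rows = parse_state_table(SAMPLE)
print(current_primary(rows))
```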

   pg_autoctl perform switchover
       pg_autoctl perform switchover - Perform a switchover for given formation and group

   Synopsis
       This command starts a Postgres switchover orchestration from the pg_auto_failover
       monitor:

          usage: pg_autoctl perform switchover  [ --pgdata --formation --group ]

          --pgdata      path to data directory
          --formation   formation to target, defaults to 'default'
          --group       group to target, defaults to 0

   Description
       The pg_auto_failover monitor can be used to orchestrate a manual switchover, sometimes
       also known as a manual failover. When doing so, split-brain situations are prevented
       thanks to the intermediary states used in the Finite State Machine.

       The pg_autoctl perform switchover command waits until the switchover is known complete  on
       the monitor, or until the hard-coded 60s timeout has passed.

       The  switchover  orchestration  is  done  in the background by the monitor, so even if the
       pg_autoctl perform switchover stops on the timeout, the switchover orchestration continues
       at the monitor.

       See also pg_autoctl perform failover, a synonym for this command.

   Options
       --pgdata
              Location  of  the  Postgres node being managed locally. Defaults to the environment
              variable PGDATA. Use --monitor to connect to a monitor from anywhere,  rather  than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --formation
              Formation to target for the operation. Defaults to default.

       --group
              Postgres  group  to  target for the operation. Defaults to 0, only Citus formations
              may have more than one group.

   pg_autoctl perform promotion
       pg_autoctl perform promotion - Perform a failover that promotes a target node

   Synopsis
       This command starts a Postgres failover orchestration from the pg_auto_failover monitor
       and targets the given node:

          usage: pg_autoctl perform promotion  [ --pgdata --formation --group ]

          --pgdata      path to data directory
          --formation   formation to target, defaults to 'default'
          --name        node name to target, defaults to current node
          --wait        how many seconds to wait, default to 60

   Description
       The pg_auto_failover monitor can be used to orchestrate a manual promotion, sometimes
       also known as a switchover. When doing so, split-brain situations are prevented thanks
       to intermediary states being used in the Finite State Machine.

       The pg_autoctl perform promotion command waits until the promotion is known complete on
       the monitor, or until the timeout has passed (60s by default, see --wait).

       The promotion orchestration is done in the background by  the  monitor,  so  even  if  the
       pg_autoctl  perform  promotion stops on the timeout, the promotion orchestration continues
       at the monitor.

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults  to  the  environment
              variable  PGDATA.  Use --monitor to connect to a monitor from anywhere, rather than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --formation
              Formation to target for the operation. Defaults to default.

       --name Name of the node that should be elected as the new primary node.

       --wait How many seconds to wait for notifications about the promotion. The  command  stops
              when  the  promotion  is  finished  (a  node  is  primary), or when the timeout has
              elapsed, whichever comes first. The value 0 (zero) disables the timeout and  allows
              the command to wait forever.
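
       The --wait semantics described above can be sketched as follows. This is a minimal
       Python illustration, not the pg_autoctl implementation: the function name
       wait_for_primary, the (name, current state, assigned state) event tuples and the
       injectable clock are all assumptions made for the example.

```python
import time
from typing import Callable, Iterable, Tuple

# Hypothetical event shape: (node name, current state, assigned state).
Event = Tuple[str, str, str]


def wait_for_primary(events: Iterable[Event], target: str, wait: float = 60,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Return True once `target` is primary, False on timeout or end of stream."""
    # A wait of 0 disables the timeout, so no deadline is computed.
    deadline = None if wait == 0 else clock() + wait
    for name, current, assigned in events:
        if deadline is not None and clock() > deadline:
            return False
        # The promotion is finished when the node has reached its assigned
        # primary state.
        if name == target and current == "primary" and assigned == "primary":
            return True
    return False
```

       With wait=0 the loop only ends when the target node reports the primary state or when
       the event stream itself ends, which mirrors the documented "wait forever" behaviour.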

   Examples
          $ pg_autoctl show state
           Name |  Node |      Host:Port |       LSN |   Connection |       Current State |      Assigned State
          ------+-------+----------------+-----------+--------------+---------------------+--------------------
          node1 |     1 | localhost:5501 | 0/4000F88 |    read-only |           secondary |           secondary
          node2 |     2 | localhost:5502 | 0/4000F88 |   read-write |             primary |             primary
          node3 |     3 | localhost:5503 | 0/4000F88 |    read-only |           secondary |           secondary

          $ pg_autoctl perform promotion --name node1
          13:08:13 15297 INFO  Listening monitor notifications about state changes in formation "default" and group 0
          13:08:13 15297 INFO  Following table displays times when notifications are received
              Time |  Name |  Node |      Host:Port |       Current State |      Assigned State
          ---------+-------+-------+----------------+---------------------+--------------------
          13:08:13 | node1 |   0/1 | localhost:5501 |           secondary |           secondary
          13:08:13 | node2 |   0/2 | localhost:5502 |             primary |            draining
          13:08:13 | node2 |   0/2 | localhost:5502 |            draining |            draining
          13:08:13 | node1 |   0/1 | localhost:5501 |           secondary |          report_lsn
          13:08:13 | node3 |   0/3 | localhost:5503 |           secondary |          report_lsn
          13:08:19 | node3 |   0/3 | localhost:5503 |          report_lsn |          report_lsn
          13:08:19 | node1 |   0/1 | localhost:5501 |          report_lsn |          report_lsn
          13:08:19 | node1 |   0/1 | localhost:5501 |          report_lsn |   prepare_promotion
          13:08:19 | node1 |   0/1 | localhost:5501 |   prepare_promotion |   prepare_promotion
          13:08:19 | node1 |   0/1 | localhost:5501 |   prepare_promotion |    stop_replication
          13:08:19 | node2 |   0/2 | localhost:5502 |            draining |      demote_timeout
          13:08:19 | node3 |   0/3 | localhost:5503 |          report_lsn |      join_secondary
          13:08:19 | node2 |   0/2 | localhost:5502 |      demote_timeout |      demote_timeout
          13:08:19 | node3 |   0/3 | localhost:5503 |      join_secondary |      join_secondary
          13:08:20 | node1 |   0/1 | localhost:5501 |    stop_replication |    stop_replication
          13:08:20 | node1 |   0/1 | localhost:5501 |    stop_replication |        wait_primary
          13:08:20 | node2 |   0/2 | localhost:5502 |      demote_timeout |             demoted
          13:08:20 | node1 |   0/1 | localhost:5501 |        wait_primary |        wait_primary
          13:08:20 | node3 |   0/3 | localhost:5503 |      join_secondary |           secondary
          13:08:20 | node2 |   0/2 | localhost:5502 |             demoted |             demoted
          13:08:20 | node2 |   0/2 | localhost:5502 |             demoted |          catchingup
          13:08:21 | node3 |   0/3 | localhost:5503 |           secondary |           secondary
          13:08:21 | node1 |   0/1 | localhost:5501 |        wait_primary |             primary
          13:08:21 | node2 |   0/2 | localhost:5502 |          catchingup |          catchingup
          13:08:21 | node1 |   0/1 | localhost:5501 |             primary |             primary

          $ pg_autoctl show state
           Name |  Node |      Host:Port |       LSN |   Connection |       Current State |      Assigned State
          ------+-------+----------------+-----------+--------------+---------------------+--------------------
          node1 |     1 | localhost:5501 | 0/40012F0 |   read-write |             primary |             primary
          node2 |     2 | localhost:5502 | 0/40012F0 |    read-only |           secondary |           secondary
          node3 |     3 | localhost:5503 | 0/40012F0 |    read-only |           secondary |           secondary
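
       When scripting around the text output of pg_autoctl show state, the table above can be
       parsed with a few lines of Python. The sketch below assumes the exact column layout
       shown in this example; the helper names parse_state and primary_name are hypothetical.

```python
# Sample copied from the `pg_autoctl show state` output shown above.
SAMPLE = """\
 Name |  Node |      Host:Port |       LSN |   Connection |       Current State |      Assigned State
------+-------+----------------+-----------+--------------+---------------------+--------------------
node1 |     1 | localhost:5501 | 0/40012F0 |   read-write |             primary |             primary
node2 |     2 | localhost:5502 | 0/40012F0 |    read-only |           secondary |           secondary
node3 |     3 | localhost:5503 | 0/40012F0 |    read-only |           secondary |           secondary
"""


def parse_state(text):
    """Turn the pipe-separated table into a list of dicts keyed by column name."""
    # The separator row uses '+' rather than '|', so this filter drops it.
    lines = [line for line in text.splitlines() if "|" in line]
    header = [cell.strip() for cell in lines[0].split("|")]
    return [dict(zip(header, (cell.strip() for cell in line.split("|"))))
            for line in lines[1:]]


def primary_name(rows):
    """Return the name of the node whose current and assigned state is primary."""
    for row in rows:
        if row["Current State"] == "primary" and row["Assigned State"] == "primary":
            return row["Name"]
    return None


print(primary_name(parse_state(SAMPLE)))
```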

   pg_autoctl do
       pg_autoctl do - Internal commands and internal QA tooling

       The  debug  commands  for  pg_autoctl  are  only  available  when the environment variable
       PG_AUTOCTL_DEBUG is set (to any value).

       When testing pg_auto_failover, it is helpful to be able to play with the local nodes using
       the same lower-level API as used by the pg_auto_failover Finite State Machine transitions.
       Some commands could be useful in contexts other than pg_auto_failover development  and  QA
       work, so some documentation has been made available.

   pg_autoctl do tmux
       pg_autoctl do tmux - Set of facilities to handle tmux interactive sessions

   Synopsis
       pg_autoctl do tmux provides the following commands:

          pg_autoctl do tmux
           script   Produce a tmux script for a demo or a test case (debug only)
           session  Run a tmux session for a demo or a test case
           stop     Stop pg_autoctl processes that belong to a tmux session
           wait     Wait until a given node has been registered on the monitor
           clean    Clean-up a tmux session processes and root dir

   Description
       An easy way to get started with pg_auto_failover in a localhost-only formation with four
       Postgres nodes is to run the following command:

          $ PG_AUTOCTL_DEBUG=1 pg_autoctl do tmux session \
               --root /tmp/pgaf \
               --first-pgport 9000 \
               --nodes 4 \
               --layout tiled

       This requires the command tmux to be available in your PATH. The pg_autoctl do tmux
       session command prepares a self-contained root directory in which to create
       pg_auto_failover nodes and their configuration, then prepares a tmux script, and then
       runs the script with a command such as:

          /usr/local/bin/tmux -v start-server ; source-file /tmp/pgaf/script-9000.tmux

       The tmux session contains a single tmux window with multiple panes:

          • one pane for the monitor

          • one pane per Postgres node, here 4 of them

          • one pane for running watch pg_autoctl show state

          • one extra pane for an interactive shell.

       Usually  the  first  two  commands  to run in the interactive shell, once the formation is
       stable (one node is primary, the other ones are all secondary), are the following:

          $ pg_autoctl get formation settings
          $ pg_autoctl perform failover

   pg_autoctl do demo
       pg_autoctl do demo - Use a demo application for pg_auto_failover

   Synopsis
       pg_autoctl do demo provides the following commands:

          pg_autoctl do demo
           run      Run the pg_auto_failover demo application
           uri      Grab the application connection string from the monitor
           ping     Attempt to connect to the application URI
           summary  Display a summary of the previous demo app run

       To run a demo, use pg_autoctl do demo run:

          usage: pg_autoctl do demo run [option ...]

          --monitor        Postgres URI of the pg_auto_failover monitor
          --formation      Formation to use (default)
          --group          Group Id to failover (0)
          --username       PostgreSQL's username
          --clients        How many client processes to use (1)
          --duration       Duration of the demo app, in seconds (30)
          --first-failover Timing of the first failover (10)
          --failover-freq  Seconds between subsequent failovers (45)

   Description
       The pg_autoctl debug tooling includes a demo application.

       The demo prepares its Postgres schema on the target database, and then starts several
       clients (see --clients) that concurrently connect to the target application URI and record
       the time it took to establish the Postgres connection to the current read-write node, with
       information about the retry policy metrics.
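
       As an illustration of what each client measures, the following Python sketch times a
       connection attempt under a retry policy. It is not the demo application's code: the
       doubling backoff, the max_retries limit and the connect callable are assumptions, and
       only the names retry_sleep_ms and retry_cap_ms are borrowed from the columns of the
       demo.client table that the demo creates.

```python
import time


def timed_connect(connect, retry_sleep_ms=100, retry_cap_ms=2000, max_retries=10,
                  sleep=time.sleep, clock=time.monotonic):
    """Call connect() until it succeeds; return (retries, elapsed milliseconds)."""
    start = clock()
    retries = 0
    delay_ms = retry_sleep_ms
    while True:
        try:
            connect()
            break
        except ConnectionError:
            if retries == max_retries:
                raise
            retries += 1
            sleep(delay_ms / 1000)
            # Assumed policy: double the delay after each failure, up to the cap.
            delay_ms = min(delay_ms * 2, retry_cap_ms)
    return retries, (clock() - start) * 1000
```

       During a failover the connect callable would fail until a read-write node is available
       again, which is why the maximum connection times in the summary are much larger than
       the p95 and p99 values.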

   Example
          $ pg_autoctl do demo run --monitor 'postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=prefer' --clients 10
          14:43:35 19660 INFO  Using application connection string "postgres://localhost:5502,localhost:5503,localhost:5501/demo?target_session_attrs=read-write&sslmode=prefer"
          14:43:35 19660 INFO  Using Postgres user PGUSER "dim"
          14:43:35 19660 INFO  Preparing demo schema: drop schema if exists demo cascade
          14:43:35 19660 WARN  NOTICE:  schema "demo" does not exist, skipping
          14:43:35 19660 INFO  Preparing demo schema: create schema demo
          14:43:35 19660 INFO  Preparing demo schema: create table demo.tracking(ts timestamptz default now(), client integer, loop integer, retries integer, us bigint, recovery bool)
          14:43:36 19660 INFO  Preparing demo schema: create table demo.client(client integer, pid integer, retry_sleep_ms integer, retry_cap_ms integer, failover_count integer)
          14:43:36 19660 INFO  Starting 10 concurrent clients as sub-processes
          14:43:36 19675 INFO  Failover client is started, will failover in 10s and every 45s after that
          ...

          $ pg_autoctl do demo summary --monitor 'postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=prefer' --clients 10
          14:44:27 22789 INFO  Using application connection string "postgres://localhost:5503,localhost:5501,localhost:5502/demo?target_session_attrs=read-write&sslmode=prefer"
          14:44:27 22789 INFO  Using Postgres user PGUSER "dim"
          14:44:27 22789 INFO  Summary for the demo app running with 10 clients for 30s
                  Client        | Connections | Retries | Min Connect Time (ms) |   max    |   p95   |   p99
          ----------------------+-------------+---------+-----------------------+----------+---------+---------
           Client 1             |         136 |      14 |                58.318 | 2601.165 | 244.443 | 261.809
           Client 2             |         136 |       5 |                55.199 | 2514.968 | 242.362 | 259.282
           Client 3             |         134 |       6 |                55.815 | 2974.247 | 241.740 | 262.908
           Client 4             |         135 |       7 |                56.542 | 2970.922 | 238.995 | 251.177
           Client 5             |         136 |       8 |                58.339 | 2758.106 | 238.720 | 252.439
           Client 6             |         134 |       9 |                58.679 | 2813.653 | 244.696 | 254.674
           Client 7             |         134 |      11 |                58.737 | 2795.974 | 243.202 | 253.745
           Client 8             |         136 |      12 |                52.109 | 2354.952 | 242.664 | 254.233
           Client 9             |         137 |      19 |                59.735 | 2628.496 | 235.668 | 253.582
           Client 10            |         133 |       6 |                57.994 | 3060.489 | 242.156 | 256.085
           All Clients Combined |        1351 |      97 |                52.109 | 3060.489 | 241.848 | 258.450
          (11 rows)

           Min Connect Time (ms) |   max    | freq |                      bar
          -----------------------+----------+------+-----------------------------------------------
                          52.109 |  219.105 | 1093 | ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
                         219.515 |  267.168 |  248 | ▒▒▒▒▒▒▒▒▒▒
                        2354.952 | 2354.952 |    1 |
                        2514.968 | 2514.968 |    1 |
                        2601.165 | 2628.496 |    2 |
                        2758.106 | 2813.653 |    3 |
                        2970.922 | 2974.247 |    2 |
                        3060.489 | 3060.489 |    1 |
          (8 rows)

   pg_autoctl do service restart
       pg_autoctl do service restart - Run pg_autoctl sub-processes (services)

   Synopsis
       pg_autoctl do service restart provides the following commands:

          pg_autoctl do service restart
           postgres     Restart the pg_autoctl postgres controller service
           listener     Restart the pg_autoctl monitor listener service
           node-active  Restart the pg_autoctl keeper node-active service

   Description
       It  is  possible  to  restart the pg_autoctl or the Postgres service without affecting the
       other running service. Typically,  to  restart  the  pg_autoctl  parts  without  impacting
       Postgres:

          $ pg_autoctl do service restart node-active --pgdata node1
          14:52:06 31223 INFO  Sending the TERM signal to service "node-active" with pid 26626
          14:52:06 31223 INFO  Service "node-active" has been restarted with pid 31230
          31230

       The Postgres service has not been impacted by the restart of the pg_autoctl process.

   pg_autoctl do show
       pg_autoctl do show - Show some debug level information

   Synopsis
       The  commands pg_autoctl create monitor and pg_autoctl create postgres both implement some
       level of automated detection of the node network settings when the  option  --hostname  is
       not used.

       Adding  to  those commands, when a new node is registered to the monitor, other nodes also
       edit their Postgres HBA rules to  allow  the  new  node  to  connect,  unless  the  option
       --skip-pg-hba has been used.

       The debug sub-commands for pg_autoctl do show can be used to see in detail the network
       discovery done by pg_autoctl.

       pg_autoctl do show provides the following commands:

          pg_autoctl do show
            ipaddr    Print this node's IP address information
            cidr      Print this node's CIDR information
            lookup    Print this node's DNS lookup information
            hostname  Print this node's default hostname
            reverse   Lookup given hostname and check reverse DNS setup

   pg_autoctl do show ipaddr
       Connects to an external IP address and uses getsockname(2) to retrieve the current address
       to which the socket is bound.

       The  external  IP  address defaults to 8.8.8.8, the IP address of a Google provided public
       DNS server, or to the monitor IP address or hostname in the context of  pg_autoctl  create
       postgres.

          $ pg_autoctl do show ipaddr
          16:42:40 62631 INFO  ipaddr.c:107: Connecting to 8.8.8.8 (port 53)
          192.168.1.156
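
       The getsockname(2) technique can be tried out with Python's socket module. This is a
       sketch under assumptions rather than a copy of what pg_autoctl does: it uses a UDP
       socket over IPv4, and local_address_towards is a hypothetical helper name.

```python
import socket


def local_address_towards(target="8.8.8.8", port=53):
    """Return the local IP address the kernel selects to reach `target`."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Connecting a UDP socket only records the destination and selects a
        # route; no packet is sent.
        sock.connect((target, port))
        # getsockname() reports the (address, port) the socket is bound to.
        return sock.getsockname()[0]
```

       Calling local_address_towards() with its defaults needs a route to 8.8.8.8. Passing
       the monitor's address instead selects the interface used to reach the monitor.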

   pg_autoctl do show cidr
       Connects  to  an external IP address in the same way as the previous command pg_autoctl do
       show ipaddr and then matches the  local  socket  name  with  the  list  of  local  network
       interfaces.  When  a match is found, uses the netmask of the interface to compute the CIDR
       notation from the IP address.

       The computed CIDR notation is then used in HBA rules.

          $ pg_autoctl do show cidr
          16:43:19 63319 INFO  Connecting to 8.8.8.8 (port 53)
          192.168.1.0/24
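
       The netmask-to-CIDR step can be reproduced with Python's ipaddress module. In this
       sketch the netmask 255.255.255.0 is an assumed value chosen to match the /24 result
       above, and cidr_from is a hypothetical helper name.

```python
import ipaddress


def cidr_from(address, netmask):
    """Compute the network in CIDR notation from an interface address and netmask."""
    # ip_interface() accepts the "address/netmask" form; .network masks out
    # the host bits of the address.
    return str(ipaddress.ip_interface(f"{address}/{netmask}").network)


print(cidr_from("192.168.1.156", "255.255.255.0"))
```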

   pg_autoctl do show hostname
       Uses either its first (and only) argument or the result of gethostname(2) as the candidate
       hostname to use in HBA rules, and then checks that the hostname resolves to an IP address
       that belongs to one of the machine's network interfaces.

       When the hostname forward-dns lookup resolves to an IP address that is local to  the  node
       where  the  command is run, then a reverse-lookup from the IP address is made to see if it
       matches with the candidate hostname.

          $ pg_autoctl do show hostname
          DESKTOP-IC01GOOS.europe.corp.microsoft.com

          $ pg_autoctl -vv do show hostname 'postgres://autoctl_node@localhost:5500/pg_auto_failover'
          13:45:00 93122 INFO  cli_do_show.c:256: Using monitor hostname "localhost" and port 5500
          13:45:00 93122 INFO  ipaddr.c:107: Connecting to ::1 (port 5500)
          13:45:00 93122 DEBUG cli_do_show.c:272: cli_show_hostname: ip ::1
          13:45:00 93122 DEBUG cli_do_show.c:283: cli_show_hostname: host localhost
          13:45:00 93122 DEBUG cli_do_show.c:294: cli_show_hostname: ip ::1
          localhost

   pg_autoctl do show lookup
       Checks that the given argument is a hostname that resolves to a local IP address, that is
       an IP address associated with a local network interface.

          $ pg_autoctl do show lookup DESKTOP-IC01GOOS.europe.corp.microsoft.com
          DESKTOP-IC01GOOS.europe.corp.microsoft.com: 192.168.1.156

   pg_autoctl do show reverse
       Implements the same DNS checks as the Postgres HBA matching code: first does a forward DNS
       lookup of the given hostname,  and  then  a  reverse-lookup  from  all  the  IP  addresses
       obtained. Success is reached when at least one of the IP addresses from the forward lookup
       resolves back to the given hostname (as the first answer to the reverse DNS lookup).

          $ pg_autoctl do show reverse DESKTOP-IC01GOOS.europe.corp.microsoft.com
          16:44:49 64910 FATAL Failed to find an IP address for hostname "DESKTOP-IC01GOOS.europe.corp.microsoft.com" that matches hostname again in a reverse-DNS lookup.
          16:44:49 64910 INFO  Continuing with IP address "192.168.1.156"

          $ pg_autoctl -vv do show reverse DESKTOP-IC01GOOS.europe.corp.microsoft.com
          16:44:45 64832 DEBUG ipaddr.c:719: DESKTOP-IC01GOOS.europe.corp.microsoft.com has address 192.168.1.156
          16:44:45 64832 DEBUG ipaddr.c:733: reverse lookup for "192.168.1.156" gives "desktop-ic01goos.europe.corp.microsoft.com" first
          16:44:45 64832 DEBUG ipaddr.c:719: DESKTOP-IC01GOOS.europe.corp.microsoft.com has address 192.168.1.156
          16:44:45 64832 DEBUG ipaddr.c:733: reverse lookup for "192.168.1.156" gives "desktop-ic01goos.europe.corp.microsoft.com" first
          16:44:45 64832 DEBUG ipaddr.c:719: DESKTOP-IC01GOOS.europe.corp.microsoft.com has address 2a01:110:10:40c::2ad
          16:44:45 64832 DEBUG ipaddr.c:728: Failed to resolve hostname from address "192.168.1.156": nodename nor servname provided, or not known
          16:44:45 64832 DEBUG ipaddr.c:719: DESKTOP-IC01GOOS.europe.corp.microsoft.com has address 2a01:110:10:40c::2ad
          16:44:45 64832 DEBUG ipaddr.c:728: Failed to resolve hostname from address "192.168.1.156": nodename nor servname provided, or not known
          16:44:45 64832 DEBUG ipaddr.c:719: DESKTOP-IC01GOOS.europe.corp.microsoft.com has address 100.64.34.213
          16:44:45 64832 DEBUG ipaddr.c:728: Failed to resolve hostname from address "192.168.1.156": nodename nor servname provided, or not known
          16:44:45 64832 DEBUG ipaddr.c:719: DESKTOP-IC01GOOS.europe.corp.microsoft.com has address 100.64.34.213
          16:44:45 64832 DEBUG ipaddr.c:728: Failed to resolve hostname from address "192.168.1.156": nodename nor servname provided, or not known
          16:44:45 64832 FATAL cli_do_show.c:333: Failed to find an IP address for hostname "DESKTOP-IC01GOOS.europe.corp.microsoft.com" that matches hostname again in a reverse-DNS lookup.
          16:44:45 64832 INFO  cli_do_show.c:334: Continuing with IP address "192.168.1.156"
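
       The forward-then-reverse check described above can be sketched in Python as follows.
       The function names and the injectable forward and reverse resolvers are assumptions
       made so that the logic can be exercised without DNS. The exact string comparison is
       also an assumption; a case-insensitive comparison may be more appropriate.

```python
import socket


def forward_lookup(hostname):
    """Return the distinct IP addresses that `hostname` resolves to, in order."""
    addresses = []
    for info in socket.getaddrinfo(hostname, None):
        address = info[4][0]
        if address not in addresses:
            addresses.append(address)
    return addresses


def reverse_lookup(address):
    """Return the first name of the reverse lookup, or None when it fails."""
    try:
        return socket.gethostbyaddr(address)[0]
    except OSError:
        return None


def find_matching_address(hostname, forward=forward_lookup, reverse=reverse_lookup):
    """Return the first address whose reverse lookup gives `hostname` back."""
    for address in forward(hostname):
        if reverse(address) == hostname:
            return address
    return None
```

       A return value of None corresponds to the FATAL message in the example above, where no
       address resolves back to the given hostname.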

   pg_autoctl do pgsetup
       pg_autoctl do pgsetup - Manage a local Postgres setup

   Synopsis
       The main pg_autoctl commands implement low-level management tooling for a  local  Postgres
       instance.  Some  of  the low-level Postgres commands can be used as their own tool in some
       cases.

       pg_autoctl do pgsetup provides the following commands:

          pg_autoctl do pgsetup
            pg_ctl    Find a non-ambiguous pg_ctl program and Postgres version
            discover  Discover local PostgreSQL instance, if any
            ready     Return true is the local Postgres server is ready
            wait      Wait until the local Postgres server is ready
            logs      Outputs the Postgres startup logs
            tune      Compute and log some Postgres tuning options

   pg_autoctl do pgsetup pg_ctl
       In a similar way to which -a, this command scans your PATH for pg_ctl commands. Then it
       runs  the  pg_ctl  --version  command  and  parses  the output to determine the version of
       Postgres that is available in the path.

          $ pg_autoctl do pgsetup pg_ctl --pgdata node1
          16:49:18 69684 INFO  Environment variable PG_CONFIG is set to "/Applications/Postgres.app//Contents/Versions/12/bin/pg_config"
          16:49:18 69684 INFO  `pg_autoctl create postgres` would use "/Applications/Postgres.app/Contents/Versions/12/bin/pg_ctl" for Postgres 12.3
          16:49:18 69684 INFO  `pg_autoctl create monitor` would use "/Applications/Postgres.app/Contents/Versions/12/bin/pg_ctl" for Postgres 12.3

   pg_autoctl do pgsetup discover
       Given a PGDATA or --pgdata option, the command discovers if  a  running  Postgres  service
       matches  the  pg_autoctl setup, and prints the information that pg_autoctl typically needs
       when managing a Postgres instance.

          $ pg_autoctl do pgsetup discover --pgdata node1
          pgdata:                /Users/dim/dev/MS/pg_auto_failover/tmux/node1
          pg_ctl:                /Applications/Postgres.app/Contents/Versions/12/bin/pg_ctl
          pg_version:            12.3
          pghost:                /tmp
          pgport:                5501
          proxyport:             0
          pid:                   21029
          is in recovery:        no
          Control Version:       1201
          Catalog Version:       201909212
          System Identifier:     6942422768095393833
          Latest checkpoint LSN: 0/4059C18
          Postmaster status:     ready

   pg_autoctl do pgsetup ready
       Similar to the pg_isready command, though uses the Postgres specifications  found  in  the
       pg_autoctl node setup.

          $ pg_autoctl do pgsetup ready --pgdata node1
          16:50:08 70582 INFO  Postgres status is: "ready"

   pg_autoctl do pgsetup wait
       When pg_autoctl do pgsetup ready would return false because Postgres is not ready yet,
       this command continues probing every second for 30 seconds, and exits as soon as Postgres
       is ready.

          $ pg_autoctl do pgsetup wait --pgdata node1
          16:50:22 70829 INFO  Postgres is now serving PGDATA "/Users/dim/dev/MS/pg_auto_failover/tmux/node1" on port 5501 with pid 21029
          16:50:22 70829 INFO  Postgres status is: "ready"
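
       The probing behaviour described above amounts to a bounded retry loop. A minimal Python
       sketch, assuming a caller-supplied probe callable (hypothetical, standing in for the
       readiness check) and using the one-second interval and 30 attempts from the
       description as defaults:

```python
import time


def wait_until_ready(probe, attempts=30, interval=1.0, sleep=time.sleep):
    """Call probe() up to `attempts` times; return True as soon as it succeeds."""
    for attempt in range(attempts):
        if probe():
            return True
        # Do not sleep after the last attempt; there is nothing left to wait for.
        if attempt < attempts - 1:
            sleep(interval)
    return False
```

       The sleep parameter is injectable so that the loop can be tested without waiting.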

   pg_autoctl do pgsetup logs
       Outputs the Postgres logs from the most recent log file in the PGDATA/log directory.

          $ pg_autoctl do pgsetup logs --pgdata node1
          16:50:39 71126 WARN  Postgres logs from "/Users/dim/dev/MS/pg_auto_failover/tmux/node1/startup.log":
          16:50:39 71126 INFO  2021-03-22 14:43:48.911 CET [21029] LOG:  starting PostgreSQL 12.3 on x86_64-apple-darwin16.7.0, compiled by Apple LLVM version 8.1.0 (clang-802.0.42), 64-bit
          16:50:39 71126 INFO  2021-03-22 14:43:48.913 CET [21029] LOG:  listening on IPv6 address "::", port 5501
          16:50:39 71126 INFO  2021-03-22 14:43:48.913 CET [21029] LOG:  listening on IPv4 address "0.0.0.0", port 5501
          16:50:39 71126 INFO  2021-03-22 14:43:48.913 CET [21029] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5501"
          16:50:39 71126 INFO  2021-03-22 14:43:48.931 CET [21029] LOG:  redirecting log output to logging collector process
          16:50:39 71126 INFO  2021-03-22 14:43:48.931 CET [21029] HINT:  Future log output will appear in directory "log".
          16:50:39 71126 WARN  Postgres logs from "/Users/dim/dev/MS/pg_auto_failover/tmux/node1/log/postgresql-2021-03-22_144348.log":
          16:50:39 71126 INFO  2021-03-22 14:43:48.937 CET [21033] LOG:  database system was shut down at 2021-03-22 14:43:46 CET
          16:50:39 71126 INFO  2021-03-22 14:43:48.937 CET [21033] LOG:  entering standby mode
          16:50:39 71126 INFO  2021-03-22 14:43:48.942 CET [21033] LOG:  consistent recovery state reached at 0/4022E88
          16:50:39 71126 INFO  2021-03-22 14:43:48.942 CET [21033] LOG:  invalid record length at 0/4022E88: wanted 24, got 0
          16:50:39 71126 INFO  2021-03-22 14:43:48.946 CET [21029] LOG:  database system is ready to accept read only connections
          16:50:39 71126 INFO  2021-03-22 14:43:49.032 CET [21038] LOG:  fetching timeline history file for timeline 4 from primary server
          16:50:39 71126 INFO  2021-03-22 14:43:49.037 CET [21038] LOG:  started streaming WAL from primary at 0/4000000 on timeline 3
          16:50:39 71126 INFO  2021-03-22 14:43:49.046 CET [21038] LOG:  replication terminated by primary server
          16:50:39 71126 INFO  2021-03-22 14:43:49.046 CET [21038] DETAIL:  End of WAL reached on timeline 3 at 0/4022E88.
          16:50:39 71126 INFO  2021-03-22 14:43:49.047 CET [21033] LOG:  new target timeline is 4
          16:50:39 71126 INFO  2021-03-22 14:43:49.049 CET [21038] LOG:  restarted WAL streaming at 0/4000000 on timeline 4
          16:50:39 71126 INFO  2021-03-22 14:43:49.210 CET [21033] LOG:  redo starts at 0/4022E88
          16:50:39 71126 INFO  2021-03-22 14:52:06.692 CET [21029] LOG:  received SIGHUP, reloading configuration files
          16:50:39 71126 INFO  2021-03-22 14:52:06.906 CET [21029] LOG:  received SIGHUP, reloading configuration files
          16:50:39 71126 FATAL 2021-03-22 15:34:24.920 CET [21038] FATAL:  terminating walreceiver due to timeout
          16:50:39 71126 INFO  2021-03-22 15:34:24.973 CET [21033] LOG:  invalid record length at 0/4059CC8: wanted 24, got 0
          16:50:39 71126 INFO  2021-03-22 15:34:25.105 CET [35801] LOG:  started streaming WAL from primary at 0/4000000 on timeline 4
          16:50:39 71126 FATAL 2021-03-22 16:12:56.918 CET [35801] FATAL:  terminating walreceiver due to timeout
          16:50:39 71126 INFO  2021-03-22 16:12:57.086 CET [38741] LOG:  started streaming WAL from primary at 0/4000000 on timeline 4
          16:50:39 71126 FATAL 2021-03-22 16:23:39.349 CET [38741] FATAL:  terminating walreceiver due to timeout
          16:50:39 71126 INFO  2021-03-22 16:23:39.497 CET [41635] LOG:  started streaming WAL from primary at 0/4000000 on timeline 4

   pg_autoctl do pgsetup tune
       Outputs the pg_autoctl automated tuning options. Depending on the number of CPUs and the
       amount of RAM detected in the environment where it is run, pg_autoctl can adjust some very
       basic Postgres tuning knobs to get started.

          $ pg_autoctl do pgsetup tune --pgdata node1 -vv
          13:25:25 77185 DEBUG pgtuning.c:85: Detected 12 CPUs and 16 GB total RAM on this server
          13:25:25 77185 DEBUG pgtuning.c:225: Setting autovacuum_max_workers to 3
          13:25:25 77185 DEBUG pgtuning.c:228: Setting shared_buffers to 4096 MB
          13:25:25 77185 DEBUG pgtuning.c:231: Setting work_mem to 24 MB
          13:25:25 77185 DEBUG pgtuning.c:235: Setting maintenance_work_mem to 512 MB
          13:25:25 77185 DEBUG pgtuning.c:239: Setting effective_cache_size to 12 GB
          # basic tuning computed by pg_auto_failover
          track_functions = pl
          shared_buffers = '4096 MB'
          work_mem = '24 MB'
          maintenance_work_mem = '512 MB'
          effective_cache_size = '12 GB'
          autovacuum_max_workers = 3
          autovacuum_vacuum_scale_factor = 0.08
          autovacuum_analyze_scale_factor = 0.02

       The  low-level  API  is  made available through the following pg_autoctl do commands, only
       available in debug environments:

          pg_autoctl do
          + monitor  Query a pg_auto_failover monitor
          + fsm      Manually manage the keeper's state
          + primary  Manage a PostgreSQL primary server
          + standby  Manage a PostgreSQL standby server
          + show     Show some debug level information
          + pgsetup  Manage a local Postgres setup
          + pgctl    Signal the pg_autoctl postgres service
          + service  Run pg_autoctl sub-processes (services)
          + tmux     Set of facilities to handle tmux interactive sessions
          + azure    Manage a set of Azure resources for a pg_auto_failover demo
          + demo     Use a demo application for pg_auto_failover

          pg_autoctl do monitor
          + get                 Get information from the monitor
            register            Register the current node with the monitor
            active              Call in the pg_auto_failover Node Active protocol
            version             Check that monitor version is 1.5.0.1; alter extension update if not
            parse-notification  parse a raw notification message

          pg_autoctl do monitor get
            primary      Get the primary node from pg_auto_failover in given formation/group
            others       Get the other nodes from the pg_auto_failover group of hostname/port
            coordinator  Get the coordinator node from the pg_auto_failover formation

          pg_autoctl do fsm
            init    Initialize the keeper's state on-disk
            state   Read the keeper's state from disk and display it
            list    List reachable FSM states from current state
            gv      Output the FSM as a .gv program suitable for graphviz/dot
            assign  Assign a new goal state to the keeper
            step    Make a state transition if instructed by the monitor
          + nodes   Manually manage the keeper's nodes list

          pg_autoctl do fsm nodes
            get  Get the list of nodes from file (see --disable-monitor)
            set  Set the list of nodes to file (see --disable-monitor)

          pg_autoctl do primary
          + slot      Manage replication slot on the primary server
          + adduser   Create users on primary
            defaults  Add default settings to postgresql.conf
            identify  Run the IDENTIFY_SYSTEM replication command on given host

          pg_autoctl do primary slot
            create  Create a replication slot on the primary server
            drop    Drop a replication slot on the primary server

          pg_autoctl do primary adduser
            monitor  add a local user for queries from the monitor
            replica  add a local user with replication privileges

          pg_autoctl do standby
            init     Initialize the standby server using pg_basebackup
            rewind   Rewind a demoted primary server using pg_rewind
            promote  Promote a standby server to become writable

          pg_autoctl do show
            ipaddr    Print this node's IP address information
            cidr      Print this node's CIDR information
            lookup    Print this node's DNS lookup information
            hostname  Print this node's default hostname
            reverse   Lookup given hostname and check reverse DNS setup

          pg_autoctl do pgsetup
            pg_ctl    Find a non-ambiguous pg_ctl program and Postgres version
            discover  Discover local PostgreSQL instance, if any
            ready     Return true if the local Postgres server is ready
            wait      Wait until the local Postgres server is ready
            logs      Outputs the Postgres startup logs
            tune      Compute and log some Postgres tuning options

          pg_autoctl do pgctl
            on   Signal pg_autoctl postgres service to ensure Postgres is running
            off  Signal pg_autoctl postgres service to ensure Postgres is stopped

          pg_autoctl do service
          + getpid        Get the pid of pg_autoctl sub-processes (services)
          + restart       Restart pg_autoctl sub-processes (services)
            pgcontroller  pg_autoctl supervised postgres controller
            postgres      pg_autoctl service that start/stop postgres when asked
            listener      pg_autoctl service that listens to the monitor notifications
            node-active   pg_autoctl service that implements the node active protocol

          pg_autoctl do service getpid
            postgres     Get the pid of the pg_autoctl postgres controller service
            listener     Get the pid of the pg_autoctl monitor listener service
            node-active  Get the pid of the pg_autoctl keeper node-active service

          pg_autoctl do service restart
            postgres     Restart the pg_autoctl postgres controller service
            listener     Restart the pg_autoctl monitor listener service
            node-active  Restart the pg_autoctl keeper node-active service

          pg_autoctl do tmux
            script   Produce a tmux script for a demo or a test case (debug only)
            session  Run a tmux session for a demo or a test case
            stop     Stop pg_autoctl processes that belong to a tmux session
            wait     Wait until a given node has been registered on the monitor
            clean    Clean-up a tmux session processes and root dir

          pg_autoctl do azure
          + provision  provision azure resources for a pg_auto_failover demo
          + tmux       Run a tmux session with an Azure setup for QA/testing
          + show       show azure resources for a pg_auto_failover demo
            deploy     Deploy a pg_autoctl VMs, given by name
            create     Create an azure QA environment
            drop       Drop an azure QA environment: resource group, network, VMs
            ls         List resources in a given azure region
            ssh        Runs ssh -l ha-admin <public ip address> for a given VM name
            sync       Rsync pg_auto_failover sources on all the target region VMs

          pg_autoctl do azure provision
            region  Provision an azure region: resource group, network, VMs
            nodes   Provision our pre-created VM with pg_autoctl Postgres nodes

          pg_autoctl do azure tmux
            session  Create or attach a tmux session for the created Azure VMs
            kill     Kill an existing tmux session for Azure VMs

          pg_autoctl do azure show
            ips    Show public and private IP addresses for selected VMs
            state  Connect to the monitor node to show the current state

          pg_autoctl do demo
            run      Run the pg_auto_failover demo application
            uri      Grab the application connection string from the monitor
            ping     Attempt to connect to the application URI
            summary  Display a summary of the previous demo app run

   pg_autoctl run
       pg_autoctl run - Run the pg_autoctl service (monitor or keeper)

   Synopsis
       This command starts the processes needed to run a monitor node or a keeper node,
       depending  on  the  configuration  file  that  belongs  to  the  --pgdata option or PGDATA
       environment variable.

          usage: pg_autoctl run  [ --pgdata --name --hostname --pgport ]

          --pgdata      path to data directory
          --name        pg_auto_failover node name
          --hostname    hostname used to connect from other nodes
          --pgport      PostgreSQL's port number

   Description
       When registering Postgres nodes to  the  pg_auto_failover  monitor  using  the  pg_autoctl
       create  postgres  command, the nodes are registered with metadata: the node name, hostname
       and Postgres port.

       The node name is used mostly in the logs and pg_autoctl  show  state  commands  and  helps
       human administrators of the formation.

       The  node  hostname  and  pgport  are  used by other nodes, including the pg_auto_failover
       monitor, to open a Postgres connection.

       The node name, hostname, and port can all be changed after the node registration by
       using either this command (pg_autoctl run) or the pg_autoctl config set command.

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults  to  the  environment
              variable  PGDATA.  Use --monitor to connect to a monitor from anywhere, rather than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --name Node name used on the monitor to refer to this node. The hostname is a technical
              detail, and given Postgres requirements on the HBA setup and DNS resolution
              (both forward and reverse lookups), IP addresses are often used for the hostname.

              The --name option allows using a user-friendly name for your Postgres nodes.

       --hostname
              Hostname or IP address (both v4 and v6 are supported) to use from any other node to
              connect to this node.

              When not provided, a default value is computed by running the following algorithm.

                 1. We get this machine's "public IP" by opening a connection to the given
                    monitor hostname or IP address. Then we get the TCP/IP client address that
                    has been used to make that connection.

                 2. We then do a reverse DNS lookup on the IP address found in the previous step
                    to fetch a hostname for our local machine.

                 3. If the reverse DNS lookup is successful, then pg_autoctl does a forward DNS
                    lookup of that hostname.

              When the forward DNS lookup response in step 3 is an IP address found in one of
              our local network interfaces, then pg_autoctl uses the hostname found in step 2 as
              the default --hostname. Otherwise it uses the IP address found in step 1.

              You may use the --hostname command line option to bypass the whole DNS lookup based
              process and force the local node name to a fixed value.
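
              As a rough illustration, the default hostname algorithm above can be sketched
              in Python. This is not the pg_autoctl implementation: all function names are
              hypothetical, the list of local interface addresses is assumed to be supplied
              by the caller, and the DNS lookups can be replaced by other callables so that
              the decision logic can be exercised without network access.

```python
import socket


def local_ip_towards(monitor_host, monitor_port):
    """Step 1: connect to the monitor and return the local address used."""
    with socket.create_connection((monitor_host, monitor_port), timeout=5) as conn:
        return conn.getsockname()[0]


def reverse_dns(ip):
    """Step 2: reverse DNS lookup; raises OSError when there is no answer."""
    return socket.gethostbyaddr(ip)[0]


def forward_dns(hostname):
    """Step 3: forward DNS lookup; returns the set of resolved IP addresses."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def default_hostname(local_ip, local_addresses,
                     reverse=reverse_dns, forward=forward_dns):
    """Pick the hostname from step 2 when it resolves back to a local
    address, and fall back to the IP address from step 1 otherwise."""
    try:
        hostname = reverse(local_ip)
        resolved = forward(hostname)
    except OSError:
        # A failed lookup is treated like a mismatch (assumption of this sketch).
        return local_ip
    if any(ip in local_addresses for ip in resolved):
        return hostname
    return local_ip
```

              A caller would first obtain local_ip with local_ip_towards(), using the
              monitor host and port, and then pass it to default_hostname() together with
              the addresses of the local network interfaces.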

       --pgport
              Postgres port to use, defaults to 5432.

   pg_autoctl watch
       pg_autoctl watch - Display an auto-updating dashboard

   Synopsis
       This command outputs the events that the pg_auto_failover monitor records about state
       changes of the pg_auto_failover nodes managed by the monitor:

          usage: pg_autoctl watch  [ --pgdata --formation --group ]

          --pgdata      path to data directory
          --monitor     show the monitor uri
          --formation   formation to query, defaults to 'default'
          --group       group to query formation, defaults to all
          --json        output data in the JSON format

   Options
       --pgdata
              Location  of  the  Postgres node being managed locally. Defaults to the environment
              variable PGDATA. Use --monitor to connect to a monitor from anywhere,  rather  than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --monitor
              Postgres URI used to connect to the monitor. Must use the autoctl_node username and
              target the pg_auto_failover database name. It is possible to show the Postgres  URI
              from the monitor node using the command pg_autoctl show uri.

       --formation
              List the events recorded for nodes in the given formation. Defaults to default.

       --group
              Limit output to a single group in the formation. Defaults to including all groups
              registered in the target formation.

   Description
       The pg_autoctl watch output is divided into three sections.

       The first section is a single header line which includes the name of the currently
       selected formation, the formation replication setting Number Sync Standbys, and then in
       the rightmost position the current time.

       The second section displays one line per node, and each line contains a list of columns
       that describe the current state for the node. This list can include the following
       columns; which columns are part of the output depends on the terminal window size.
       This choice is dynamic and changes if your terminal window size changes:

          • Name
                Name of the node.

          • Node, or Id
                Node  information.  When the formation has a single group (group zero), then this
                column only contains the nodeId.

                Only Citus formations allow several groups. When using a Citus formation the Node
                column contains the groupId and the nodeId, separated by a colon, such as 0:1 for
                the first coordinator node.

          • Last Report, or Report
                Time interval between now and the last known time when a node has reported to the
                monitor, using the node_active protocol.

                This value is expected to stay under about 2s, and is known to increase when
                either the pg_autoctl run service is not running, or when there is a network
                split.

          • Last Check, or Check
                Time interval between now and the last known time when the monitor could connect
                to a node's Postgres instance, via its health check mechanism.

                This value is known to increment when either the Postgres service is not  running
                on the target node, when there is a network split, or when the internal machinery
                (the health check worker background process) implements jitter.

          • Host:Port
                Hostname and port number used to connect to the node.

          • TLI: LSN
                Timeline identifier (TLI) and Postgres Log Sequence Number (LSN).

                The LSN is the current position in the Postgres WAL stream. This is a hexadecimal
                number. See pg_lsn for more information.

                The  current  timeline is incremented each time a failover happens, or when doing
                Point In Time Recovery. A node can only reach the secondary state when it  is  on
                the same timeline as its primary node.

          • Connection
                This  output  field  contains  two  bits  of  information.  First,  the  Postgres
                connection type that the node provides, either read-write or read-only. Then  the
                mark  !  is added when the monitor has failed to connect to this node, and ? when
                the monitor didn't connect to the node yet.

          • Reported State
                The current FSM state as reported  to  the  monitor  by  the  pg_autoctl  process
                running on the Postgres node.

          • Assigned State
                The assigned FSM state on the monitor. When the assigned state is not the same as
                the reported state, then the pg_autoctl process running on the Postgres node
                might not have retrieved the assigned state yet, or might still be implementing
                the FSM transition from the current state to the assigned state.
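
       The LSN shown in the TLI: LSN column is printed by Postgres as two hexadecimal numbers
       separated by a slash, the high and low 32 bits of a 64-bit WAL position. The following
       Python sketch, with hypothetical helper names, converts such a value to a byte position
       so that two positions can be compared:

```python
def parse_lsn(text):
    """Convert a Postgres LSN such as '0/3000060' to a 64-bit byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_distance(ahead, behind):
    """Number of WAL bytes between two LSN values given in text form."""
    return parse_lsn(ahead) - parse_lsn(behind)
```

       For instance, lsn_distance("0/4000000", "0/3000000") is 16777216 bytes, which is 16MB.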

       The third and last section lists the most recent events that the monitor has registered;
       the most recent event is found at the bottom of the screen.

       To quit the command hit either the F1 key or the q key.

   pg_autoctl stop
       pg_autoctl stop - signal the pg_autoctl service for it to stop

   Synopsis
       This command stops the processes needed to run a monitor node or a keeper node, depending
       on the configuration file that belongs  to  the  --pgdata  option  or  PGDATA  environment
       variable.

          usage: pg_autoctl stop  [ --pgdata --fast --immediate ]

          --pgdata      path to data directory
          --fast        fast shutdown mode for the keeper
          --immediate   immediate shutdown mode for the keeper

   Description
       The pg_autoctl stop command finds the PID of the running service for the given --pgdata,
       and if the process is still running, sends a SIGTERM signal to the process.

       When pg_autoctl receives a shutdown signal a shutdown sequence is triggered. Depending on
       the signal received, an operation that has been started (such as a state transition) is
       either run to completion, stopped at the next opportunity, or stopped immediately even
       when in the middle of the transition.

   Options
       --pgdata
              Location  of  the  Postgres node being managed locally. Defaults to the environment
              variable PGDATA. Use --monitor to connect to a monitor from anywhere,  rather  than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --fast Fast  Shutdown mode for pg_autoctl. Sends the SIGINT signal to the running service,
              which is the same as using C-c on an interactive process running  as  a  foreground
              shell job.

       --immediate
              Immediate  Shutdown  mode  for  pg_autoctl. Sends the SIGQUIT signal to the running
              service.
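
       The mapping between the options above and the signals sent can be summarised with a
       short Python sketch. The function names are hypothetical, and rejecting the
       combination of --fast and --immediate is a choice made for this sketch, not documented
       pg_autoctl behavior.

```python
import os
import signal


def shutdown_signal(fast=False, immediate=False):
    """Return the signal matching the shutdown mode described above."""
    if fast and immediate:
        # Sketch-only choice: refuse ambiguous requests.
        raise ValueError("choose at most one of fast and immediate")
    if immediate:
        return signal.SIGQUIT   # --immediate
    if fast:
        return signal.SIGINT    # --fast, same as C-c on a foreground job
    return signal.SIGTERM       # default mode


def stop_service(pid, fast=False, immediate=False):
    """Send the selected shutdown signal to a running service."""
    os.kill(pid, shutdown_signal(fast, immediate))
```

       The pid argument would be the PID of the running pg_autoctl service, as reported for
       example by pg_autoctl status.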

   pg_autoctl reload
       pg_autoctl reload - signal the pg_autoctl for it to reload its configuration

   Synopsis
       This command signals a running pg_autoctl process to reload its configuration from disk,
       and also signals the managed Postgres service to reload its configuration.

          usage: pg_autoctl reload  [ --pgdata ] [ --json ]

          --pgdata      path to data directory

   Description
       The pg_autoctl reload command finds the PID of the running service for the given
       --pgdata, and if the process is still running, sends a SIGHUP signal to the process.

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults  to  the  environment
              variable  PGDATA.  Use --monitor to connect to a monitor from anywhere, rather than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

   pg_autoctl status
       pg_autoctl status - Display the current status of the pg_autoctl service

   Synopsis
       This command outputs the current process status for the pg_autoctl service running for
       the given --pgdata location.

          usage: pg_autoctl status  [ --pgdata ] [ --json ]

          --pgdata      path to data directory
          --json        output data in the JSON format

   Options
       --pgdata
              Location  of  the  Postgres node being managed locally. Defaults to the environment
              variable PGDATA. Use --monitor to connect to a monitor from anywhere,  rather  than
              the monitor URI used by a local Postgres node managed with pg_autoctl.

       --json Output JSON formatted data instead of a table formatted list.

   Example
          $ pg_autoctl status --pgdata node1
          11:26:30 27248 INFO  pg_autoctl is running with pid 26618
          11:26:30 27248 INFO  Postgres is serving PGDATA "/Users/dim/dev/MS/pg_auto_failover/tmux/node1" on port 5501 with pid 26725

          $ pg_autoctl status --pgdata node1 --json
          11:26:37 27385 INFO  pg_autoctl is running with pid 26618
          11:26:37 27385 INFO  Postgres is serving PGDATA "/Users/dim/dev/MS/pg_auto_failover/tmux/node1" on port 5501 with pid 26725
          {
              "postgres": {
                  "pgdata": "\/Users\/dim\/dev\/MS\/pg_auto_failover\/tmux\/node1",
                  "pg_ctl": "\/Applications\/Postgres.app\/Contents\/Versions\/12\/bin\/pg_ctl",
                  "version": "12.3",
                  "host": "\/tmp",
                  "port": 5501,
                  "proxyport": 0,
                  "pid": 26725,
                  "in_recovery": false,
                  "control": {
                      "version": 0,
                      "catalog_version": 0,
                      "system_identifier": "0"
                  },
                  "postmaster": {
                      "status": "ready"
                  }
              },
              "pg_autoctl": {
                  "pid": 26618,
                  "status": "running",
                  "pgdata": "\/Users\/dim\/dev\/MS\/pg_auto_failover\/tmux\/node1",
                  "version": "1.5.0",
                  "semId": 196609,
                  "services": [
                      {
                          "name": "postgres",
                          "pid": 26625,
                          "status": "running",
                          "version": "1.5.0",
                          "pgautofailover": "1.5.0.1"
                      },
                      {
                          "name": "node-active",
                          "pid": 26626,
                          "status": "running",
                          "version": "1.5.0",
                          "pgautofailover": "1.5.0.1"
                      }
                  ]
              }
          }
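
       The JSON output lends itself to scripted checks. Here is a minimal Python sketch that
       inspects a document shaped like the example above; the function names are
       hypothetical and SAMPLE is an abbreviated copy of the example output.

```python
import json

# Abbreviated copy of the example output shown above.
SAMPLE = """
{
    "postgres": {"port": 5501, "pid": 26725, "postmaster": {"status": "ready"}},
    "pg_autoctl": {
        "pid": 26618,
        "status": "running",
        "services": [
            {"name": "postgres", "pid": 26625, "status": "running"},
            {"name": "node-active", "pid": 26626, "status": "running"}
        ]
    }
}
"""


def postgres_is_ready(status_json):
    """True when the postmaster status is reported as ready."""
    return json.loads(status_json)["postgres"]["postmaster"]["status"] == "ready"


def services_not_running(status_json):
    """Names of the pg_autoctl services whose status is not running."""
    services = json.loads(status_json)["pg_autoctl"]["services"]
    return [svc["name"] for svc in services if svc["status"] != "running"]
```

       When feeding these functions from the command itself, the sketch assumes that the
       JSON document can be captured separately from the INFO log lines, which the example
       above does not confirm.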

CONFIGURING PG_AUTO_FAILOVER

       Several default settings of pg_auto_failover can be reviewed and changed depending on the
       trade-offs you want to implement in your own production setup. The settings that you can
       change will have an impact on the following operations:

          • Deciding when to promote the secondary

            pg_auto_failover  decides  to  implement  a  failover  to  the secondary node when it
            detects that the primary node is unhealthy. Changing the following settings will have
            an  impact  on  when  the  pg_auto_failover  monitor decides to promote the secondary
            PostgreSQL node:

                pgautofailover.health_check_max_retries
                pgautofailover.health_check_period
                pgautofailover.health_check_retry_delay
                pgautofailover.health_check_timeout
                pgautofailover.node_considered_unhealthy_timeout

          • Time taken to promote the secondary

            At secondary promotion time, pg_auto_failover waits for the following timeout to make
            sure that all pending writes on the primary server made it to the secondary at
            shutdown time, thus preventing data loss:

                pgautofailover.primary_demote_timeout

          • Preventing promotion of the secondary

            pg_auto_failover implements  a  trade-off  where  data  availability  trumps  service
            availability.  When  the  primary node of a PostgreSQL service is detected unhealthy,
            the secondary is only promoted if it was known to be eligible at the moment when  the
            primary is lost.

            In the case when synchronous replication was in use at the moment when the primary
            node is lost, then we know we can switch to the secondary safely, and the WAL lag is
            0 in that case.

            In  the  case  when the secondary server had been detected unhealthy before, then the
            pg_auto_failover  monitor  switches  it  from  the  state  SECONDARY  to  the   state
            CATCHING-UP and promotion is prevented then.

            The following setting makes it possible to still promote the secondary, allowing for
            a window of data loss:

                pgautofailover.promote_wal_log_threshold
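
       To make the threshold concrete: its default value of 16777216 bytes is 16MB, the size
       of one WAL file. The Python sketch below shows the kind of comparison this setting
       implies. The function name is hypothetical, and treating a lag exactly equal to the
       threshold as acceptable is an assumption of this sketch.

```python
# Default value of pgautofailover.promote_wal_log_threshold, in bytes.
DEFAULT_THRESHOLD = 16 * 1024 * 1024  # 16777216


def promotion_allowed(wal_lag_bytes, threshold_bytes=DEFAULT_THRESHOLD):
    """True when the secondary is close enough to the primary to be promoted.

    wal_lag_bytes is the number of WAL bytes the secondary is missing;
    0 means that no data would be lost.
    """
    return wal_lag_bytes <= threshold_bytes
```

       Any non-zero lag accepted here corresponds to the window of data loss mentioned above.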

   pg_auto_failover Monitor
       The configuration for the behavior of the monitor happens in the PostgreSQL database where
       the extension has been deployed:

          pg_auto_failover=> select name, setting, unit, short_desc from pg_settings where name ~ 'pgautofailover.';
          -[ RECORD 1 ]----------------------------------------------------------------------------------------------------
          name       | pgautofailover.enable_sync_wal_log_threshold
          setting    | 16777216
          unit       |
          short_desc | Don't enable synchronous replication until secondary xlog is within this many bytes of the primary's
          -[ RECORD 2 ]----------------------------------------------------------------------------------------------------
          name       | pgautofailover.health_check_max_retries
          setting    | 2
          unit       |
          short_desc | Maximum number of re-tries before marking a node as failed.
          -[ RECORD 3 ]----------------------------------------------------------------------------------------------------
          name       | pgautofailover.health_check_period
          setting    | 5000
          unit       | ms
          short_desc | Duration between each check (in milliseconds).
          -[ RECORD 4 ]----------------------------------------------------------------------------------------------------
          name       | pgautofailover.health_check_retry_delay
          setting    | 2000
          unit       | ms
          short_desc | Delay between consecutive retries.
          -[ RECORD 5 ]----------------------------------------------------------------------------------------------------
          name       | pgautofailover.health_check_timeout
          setting    | 5000
          unit       | ms
          short_desc | Connect timeout (in milliseconds).
          -[ RECORD 6 ]----------------------------------------------------------------------------------------------------
          name       | pgautofailover.node_considered_unhealthy_timeout
          setting    | 20000
          unit       | ms
          short_desc | Mark node unhealthy if last ping was over this long ago
          -[ RECORD 7 ]----------------------------------------------------------------------------------------------------
          name       | pgautofailover.primary_demote_timeout
          setting    | 30000
          unit       | ms
          short_desc | Give the primary this long to drain before promoting the secondary
          -[ RECORD 8 ]----------------------------------------------------------------------------------------------------
          name       | pgautofailover.promote_wal_log_threshold
          setting    | 16777216
          unit       |
          short_desc | Don't promote secondary unless xlog is with this many bytes of the master
          -[ RECORD 9 ]----------------------------------------------------------------------------------------------------
          name       | pgautofailover.startup_grace_period
          setting    | 10000
          unit       | ms
          short_desc | Wait for at least this much time after startup before initiating a failover.

       You  can  edit the parameters as usual with PostgreSQL, either in the postgresql.conf file
       or using ALTER DATABASE pg_auto_failover SET parameter = value; commands, then  issuing  a
       reload.

   pg_auto_failover Keeper Service
       For  an  introduction  to  the pg_autoctl commands relevant to the pg_auto_failover Keeper
       configuration, please see pg_autoctl config.

       An example configuration file looks like the following:

          [pg_autoctl]
          role = keeper
          monitor = postgres://autoctl_node@192.168.1.34:6000/pg_auto_failover
          formation = default
          group = 0
          hostname = node1.db
          nodekind = standalone

          [postgresql]
          pgdata = /data/pgsql/
          pg_ctl = /usr/pgsql-10/bin/pg_ctl
          dbname = postgres
          host = /tmp
          port = 5000

          [replication]
          slot = pgautofailover_standby
          maximum_backup_rate = 100M
          backup_directory = /data/backup/node1.db

          [timeout]
          network_partition_timeout = 20
          postgresql_restart_failure_timeout = 20
          postgresql_restart_failure_max_retries = 3

       To output, edit and check  entries  of  the  configuration,  the  following  commands  are
       provided:

          pg_autoctl config check [--pgdata <pgdata>]
          pg_autoctl config get [--pgdata <pgdata>] section.option
          pg_autoctl config set [--pgdata <pgdata>] section.option value

       The  [postgresql] section is discovered automatically by the pg_autoctl command and is not
       intended to be changed manually.
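
       The section.option notation maps directly onto the INI layout of the configuration
       file. As an illustration only, the following Python sketch reads a value the way
       pg_autoctl config get addresses it; config_get is a hypothetical helper, SAMPLE is an
       excerpt of the example file above, and pg_autoctl config get remains the supported
       way to query a live configuration.

```python
import configparser

# Excerpt of the example configuration file shown above.
SAMPLE = """
[pg_autoctl]
role = keeper
monitor = postgres://autoctl_node@192.168.1.34:6000/pg_auto_failover
formation = default
group = 0
hostname = node1.db

[timeout]
network_partition_timeout = 20
postgresql_restart_failure_timeout = 20
postgresql_restart_failure_max_retries = 3
"""


def config_get(config_text, path):
    """Return the value stored under 'section.option' in an INI document."""
    section, option = path.split(".", 1)
    # Interpolation is disabled so that '%' characters are kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return parser.get(section, option)
```

       Values are returned as strings, so numeric options such as
       timeout.network_partition_timeout need an explicit int() conversion.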

       pg_autoctl.monitor

       PostgreSQL service URL of the pg_auto_failover monitor, as given  in  the  output  of  the
       pg_autoctl show uri command.

       pg_autoctl.formation

       A  single  pg_auto_failover  monitor  may  handle several postgres formations. The default
       formation name default is usually fine.

       pg_autoctl.group

       This information is retrieved by the pg_auto_failover keeper when registering  a  node  to
       the monitor, and should not be changed afterwards. Use at your own risk.

       pg_autoctl.hostname

       Node hostname used by all the other nodes in the cluster to contact this node. In
       particular, if this node is a primary then its standby uses that address to set up
       streaming replication.

       replication.slot

       Name  of  the  PostgreSQL  replication  slot  used  in  the  streaming  replication  setup
       automatically  deployed  by  pg_auto_failover.  Replication  slots  can't  be  renamed  in
       PostgreSQL.

       replication.maximum_backup_rate

       When pg_auto_failover (re-)builds a standby node using the pg_basebackup command, this
       parameter is given to pg_basebackup to throttle the network bandwidth used. Defaults to
       100M, which pg_basebackup interprets as 100 megabytes per second.

       replication.backup_directory

       When pg_auto_failover (re-)builds a standby node using the pg_basebackup command, this
       parameter is the target directory into which the data from the primary server is copied.
       When the copy has been successful, the directory is renamed to postgresql.pgdata.

       The  default  value is computed from ${PGDATA}/../backup/${hostname} and can be set to any
       value of your preference. Remember that the directory renaming is an atomic operation only
       when  both  the  source and the target of the copy are in the same filesystem, at least in
       Unix systems.
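
       The default value and the same-filesystem condition can be checked with a few lines of
       Python. This is a sketch with hypothetical function names; whether pg_autoctl
       normalizes the computed path in the same way is an assumption.

```python
import os
import posixpath


def default_backup_directory(pgdata, hostname):
    """Compute ${PGDATA}/../backup/${hostname} as a normalized path."""
    return posixpath.normpath(posixpath.join(pgdata, "..", "backup", hostname))


def same_filesystem(path_a, path_b):
    """True when both existing paths live on the same filesystem (device)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

       With the example configuration above, default_backup_directory("/data/pgsql/",
       "node1.db") gives /data/backup/node1.db. Calling same_filesystem() on the parent
       directories of the backup directory and of PGDATA tells whether the final rename can
       be atomic.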

       timeout

       This section configures the behavior of the pg_auto_failover keeper in interesting
       scenarios.

       timeout.network_partition_timeout

       Timeout in seconds after which a failure to communicate with other nodes is considered to
       indicate a network partition. This check is only done on a PRIMARY server, so "other
       nodes" means both the monitor and the standby.

       When  a  PRIMARY  node  is  detected  to be on the losing side of a network partition, the
       pg_auto_failover keeper enters the DEMOTE state and stops the PostgreSQL instance in order
       to protect against split brain situations.

       The default is 20s.
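
       One possible reading of this check is sketched below in Python. The function name and
       its arguments are hypothetical, and the sketch assumes that the primary considers
       itself partitioned only when it has heard from neither the monitor nor the standby
       for longer than the timeout.

```python
def on_losing_side(now, last_monitor_contact, last_standby_contact, timeout=20.0):
    """Decide whether a primary should consider itself partitioned.

    All times are in seconds on the same clock. The most recent successful
    contact with either the monitor or the standby resets the timer.
    """
    last_contact = max(last_monitor_contact, last_standby_contact)
    return now - last_contact > timeout
```

       With the default of 20s, a primary that last reached the monitor 30s ago but whose
       standby was still connected 5s ago is not considered partitioned in this sketch.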

       timeout.postgresql_restart_failure_timeout

       timeout.postgresql_restart_failure_max_retries

       When PostgreSQL is not running, the first thing the pg_auto_failover keeper does is try to
       restart it. In case of a transient failure (e.g. file system is full, or other dynamic  OS
       resource  constraint), the best course of action is to try again for a little while before
       reaching out to the monitor and asking for a failover.

       The      pg_auto_failover       keeper       tries       to       restart       PostgreSQL
       timeout.postgresql_restart_failure_max_retries  times  in  a  row  (default  3)  or  up to
       timeout.postgresql_restart_failure_timeout (default 20s) since it detected that
       PostgreSQL is not running, whichever comes first.
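
       The "whichever comes first" rule can be sketched as a small retry loop in Python. This
       is an illustration rather than the keeper's code: the function name, the injectable
       clock and sleep callables, and the delay between attempts are assumptions of this
       sketch.

```python
import time


def restart_with_retries(restart, max_retries=3, timeout=20.0,
                         clock=time.monotonic, sleep=time.sleep, delay=1.0):
    """Call restart() until it succeeds, the retry budget is used up,
    or the timeout has elapsed, whichever comes first.

    Returns True when a restart succeeded, and False when the caller
    should give up and report the failure.
    """
    started = clock()
    attempts = 0
    while attempts < max_retries and clock() - started < timeout:
        attempts += 1
        if restart():
            return True
        sleep(delay)
    return False
```

       A False result corresponds to the point where the keeper stops retrying locally and
       reaches out to the monitor.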

OPERATING PG_AUTO_FAILOVER

       This section is not yet complete. Please contact us with any questions.

   Deployment
       pg_auto_failover  is a general purpose tool for setting up PostgreSQL replication in order
       to implement High Availability of the PostgreSQL service.

   Provisioning
       It is also possible to register pre-existing PostgreSQL instances with a  pg_auto_failover
       monitor.  The pg_autoctl create command honors the PGDATA environment variable, and checks
       whether PostgreSQL is already running. If Postgres is detected, the new node is registered
       in SINGLE mode, bypassing the monitor's role assignment policy.

   Upgrading pg_auto_failover, from versions 1.4 onward
       When  upgrading a pg_auto_failover setup, the procedure is different on the monitor and on
       the Postgres nodes:

          • on the monitor, the internal pg_auto_failover database schema might have changed  and
            needs  to  be  upgraded  to  its  new definition, porting the existing data over. The
            pg_auto_failover database contains the registration of every node in the  system  and
            their current state.
                It  is  not  possible  to trigger a failover during the monitor update.  Postgres
                operations on the Postgres nodes continue normally.

                During the restart of the monitor, the other nodes might have trouble  connecting
                to  the  monitor.  The  pg_autoctl command is designed to retry connecting to the
                monitor and handle errors gracefully.

          • on the Postgres nodes, the pg_autoctl command connects to the monitor every once in a
            while  (every  second  by default), and then calls the node_active protocol, a stored
            procedure in the monitor database.
                The pg_autoctl command also verifies at each connection to the monitor that the
                monitor is running the expected version of the extension. When that's not the
                case, the "node-active" sub-process quits, to be restarted with the possibly new
                version of the pg_autoctl binary found on-disk.

       As a result, here is the standard upgrade plan for pg_auto_failover:

          1. Upgrade the pg_auto_failover package on all the nodes, monitor included.
                 When using a debian based OS, this looks like the following commands when
                 upgrading from 1.4 to 1.5:

                    sudo apt-get remove pg-auto-failover-cli-1.4 postgresql-11-auto-failover-1.4
                    sudo apt-get install -q -y pg-auto-failover-cli-1.5 postgresql-11-auto-failover-1.5

          2. Restart the pgautofailover service on the monitor.
                 When using the systemd integration, all we need to do is:

                    sudo systemctl restart pgautofailover

                 Then we may use the following commands to make sure that the service is  running
                 as expected:

                    sudo systemctl status pgautofailover
                    sudo journalctl -u pgautofailover

                 At  this  point it is expected that the pg_autoctl logs show that an upgrade has
                 been performed by  using  the  ALTER  EXTENSION  pgautofailover  UPDATE  TO  ...
                 command. The monitor is ready with the new version of pg_auto_failover.

       When the pg_autoctl process on a Postgres node connects to the new monitor version, the
       check for version compatibility fails, and the "node-active" sub-process exits. The main
       pg_autoctl process supervisor then restarts the "node-active" sub-process from its on-disk
       binary executable file, which has been upgraded to the new version. That's  why  we  first
       install  the  new  packages  for pg_auto_failover on every node, and only then restart the
       monitor.

       IMPORTANT:
          Before upgrading the monitor, which is a simple restart of the pg_autoctl  process,  it
          is  important  that  the  OS packages for pgautofailover be updated on all the Postgres
          nodes.

          When that's not the case, pg_autoctl on the Postgres nodes will still detect a  version
          mismatch  with  the monitor extension, and the "node-active" sub-process will exit. And
          when  restarted  automatically,  the  same  version  of  the  local  pg_autoctl  binary
          executable  is  found  on-disk,  leading  to the same version mismatch with the monitor
          extension.

          After restarting the "node-active" process  5  times,  pg_autoctl  quits  retrying  and
          stops.  This  includes  stopping the Postgres service too, and a service downtime might
          then occur.

       And when the upgrade is done we can use pg_autoctl show state on the monitor to see that
       everything is as expected.
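
       The IMPORTANT note above amounts to a precondition: the monitor should only be restarted
       once every Postgres node carries the new packages. The following is a minimal sketch of
       such a preflight check. The function name all_nodes_upgraded is hypothetical, and how the
       installed version is collected from each node (package manager query, inventory tool) is
       left open; the version strings below are illustrative only.

```shell
#!/bin/sh
# Sketch: decide whether it is safe to restart the monitor.
# all_nodes_upgraded is a hypothetical helper, not part of pg_autoctl.

# Return 0 when at least one version is given and all of them equal
# the expected version passed as the first argument.
all_nodes_upgraded() {
    expected="$1"
    shift
    [ "$#" -gt 0 ] || return 1
    for version in "$@"; do
        [ "$version" = "$expected" ] || return 1
    done
    return 0
}

# Illustrative input: the versions reported by two Postgres nodes.
if all_nodes_upgraded "1.5" "1.5" "1.5"; then
    echo "safe to restart the monitor: sudo systemctl restart pgautofailover"
else
    echo "upgrade the remaining nodes first"
fi
```

       With a mismatch in the list, the sketch declines to restart the monitor, which avoids the
       repeated "node-active" restarts described above.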

   Upgrading from previous pg_auto_failover versions
       The  new  upgrade  procedure described in the previous section is part of pg_auto_failover
       since version 1.4. When upgrading from a previous version of pg_auto_failover, up to and
       including version 1.3, all the pg_autoctl processes have to be restarted fully.

       To prevent triggering a failover during the upgrade, it's best to put your secondary nodes
       in maintenance. The procedure then looks like the following:

          1. Enable maintenance on your secondary node(s):

                 pg_autoctl enable maintenance

          2. Upgrade the OS packages for pg_auto_failover on every node, as per previous section.

          3. Restart the monitor to upgrade it to the new pg_auto_failover version:
                 When using the systemd integration, all we need to do is:

                    sudo systemctl restart pgautofailover

                 Then we may use the following commands to make sure that the service is  running
                 as expected:

                    sudo systemctl status pgautofailover
                    sudo journalctl -u pgautofailover

                 At  this  point it is expected that the pg_autoctl logs show that an upgrade has
                 been performed by  using  the  ALTER  EXTENSION  pgautofailover  UPDATE  TO  ...
                 command. The monitor is ready with the new version of pg_auto_failover.

          4. Restart pg_autoctl on all Postgres nodes in the cluster.
                 When using the systemd integration, all we need to do is:

                    sudo systemctl restart pgautofailover

                 As  in  the previous point in this list, make sure the service is now running as
                 expected.

          5. Disable maintenance on your secondary node(s):

                 pg_autoctl disable maintenance
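
       The ordering of the five steps above matters more than the individual commands. As a
       hedged sketch, the following function only prints the sequence for one group so that it
       can be reviewed before anything is run by hand. The function name print_upgrade_plan and
       the host names are hypothetical; the package upgrade itself is left as a reminder line
       because it depends on your OS and versions.

```shell
#!/bin/sh
# Sketch: print the pre-1.4 upgrade sequence for one group.
# Arguments: monitor host, primary host, then the secondary hosts.
# Nothing is executed; every step is only printed for review.
print_upgrade_plan() {
    monitor="$1"
    primary="$2"
    shift 2

    # Step 1: secondaries go to maintenance so no failover is triggered.
    for node in "$@"; do
        echo "on $node: pg_autoctl enable maintenance"
    done
    # Step 2: packages are upgraded everywhere, monitor included.
    for node in "$monitor" "$primary" "$@"; do
        echo "on $node: upgrade the pg_auto_failover OS packages"
    done
    # Step 3: the monitor is restarted first.
    echo "on $monitor: sudo systemctl restart pgautofailover"
    # Step 4: then pg_autoctl is restarted on every Postgres node.
    for node in "$primary" "$@"; do
        echo "on $node: sudo systemctl restart pgautofailover"
    done
    # Step 5: secondaries leave maintenance.
    for node in "$@"; do
        echo "on $node: pg_autoctl disable maintenance"
    done
}

# Hypothetical host names.
print_upgrade_plan monitor.example pg1.example pg2.example
```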

   Extension dependencies when upgrading the monitor
       Since version 1.4.0 the pgautofailover extension requires the Postgres  contrib  extension
       btree_gist.  The  pg_autoctl command arranges for the creation of this dependency, and has
       been buggy in some releases.

       As a result, you might have trouble upgrading the pg_auto_failover monitor to a recent
       version.  It  is  possible to fix the error by connecting to the monitor Postgres database
       and running the create extension command manually:

          # create extension btree_gist;

   Cluster Management and Operations
       It is possible to  operate  pg_auto_failover  formations  and  groups  directly  from  the
       monitor. All that is needed is access to the monitor Postgres database as a client,
       such as psql. It's also possible to add those management SQL function calls  in  your  own
       ops application if you have one.

       For security reasons, the autoctl_node role is not allowed to perform maintenance
       operations. This user is limited to what pg_autoctl needs. You can either create a specific user and
       authentication  rule  to  expose  for  management,  or  edit the default HBA rules for the
       autoctl user. In the following examples we're directly connecting as the autoctl role.

       The main operations with pg_auto_failover are node maintenance and manual  failover,  also
       known as a controlled switchover.

   Maintenance of a secondary node
       It  is  possible  to put a secondary node in any group in a MAINTENANCE state, so that the
       Postgres server is not doing synchronous replication anymore and can  be  taken  down  for
       maintenance purposes, such as security kernel upgrades or the like.

       The  command line tool pg_autoctl exposes an API to schedule maintenance operations on the
       current node, which must be a secondary node at the moment when maintenance is requested.

       Here's an example of using the maintenance commands on a  secondary  node,  including  the
       output.  Of  course,  when you try that on your own nodes, dates and PID information might
       differ:

          $ pg_autoctl enable maintenance
          17:49:19 14377 INFO  Listening monitor notifications about state changes in formation "default" and group 0
          17:49:19 14377 INFO  Following table displays times when notifications are received
              Time |  ID |      Host |   Port |       Current State |      Assigned State
          ---------+-----+-----------+--------+---------------------+--------------------
          17:49:19 |   1 | localhost |   5001 |             primary |        wait_primary
          17:49:19 |   2 | localhost |   5002 |           secondary |    wait_maintenance
          17:49:19 |   2 | localhost |   5002 |    wait_maintenance |    wait_maintenance
          17:49:20 |   1 | localhost |   5001 |        wait_primary |        wait_primary
          17:49:20 |   2 | localhost |   5002 |    wait_maintenance |         maintenance
          17:49:20 |   2 | localhost |   5002 |         maintenance |         maintenance

       The command listens to the state changes in the current node's formation and group on  the
       monitor  and  displays  those  changes as it receives them. The operation is done when the
       node has reached the maintenance state.

       It is now possible to disable maintenance to allow pg_autoctl to manage this standby  node
       again:

          $ pg_autoctl disable maintenance
          17:49:26 14437 INFO  Listening monitor notifications about state changes in formation "default" and group 0
          17:49:26 14437 INFO  Following table displays times when notifications are received
              Time |  ID |      Host |   Port |       Current State |      Assigned State
          ---------+-----+-----------+--------+---------------------+--------------------
          17:49:27 |   2 | localhost |   5002 |         maintenance |          catchingup
          17:49:27 |   2 | localhost |   5002 |          catchingup |          catchingup
          17:49:28 |   2 | localhost |   5002 |          catchingup |           secondary
          17:49:28 |   1 | localhost |   5001 |        wait_primary |             primary
          17:49:28 |   2 | localhost |   5002 |           secondary |           secondary
          17:49:29 |   1 | localhost |   5001 |             primary |             primary

       When a standby node is in maintenance, the monitor sets the primary node replication to
       WAIT_PRIMARY: in this state, the PostgreSQL streaming replication is asynchronous and
       the standby PostgreSQL server may be stopped, rebooted, etc.

   Maintenance of a primary node
       A   primary  node  must  be  available  at  all  times  in  any  formation  and  group  in
       pg_auto_failover, that is the invariant provided by the whole solution. With that in mind,
       the  only way to allow a primary node to go to a maintenance mode is to first failover and
       promote the secondary node.

       The same command pg_autoctl enable maintenance implements that operation  when  run  on  a
       primary node with the option --allow-failover. Here is an example of such an operation:

          $ pg_autoctl enable maintenance
          11:53:03 50526 WARN  Enabling maintenance on a primary causes a failover
          11:53:03 50526 FATAL Please use --allow-failover to allow the command proceed

       As we can see the option --allow-failover is mandatory. In the next example we use it:

          $ pg_autoctl enable maintenance --allow-failover
          13:13:42 1614 INFO  Listening monitor notifications about state changes in formation "default" and group 0
          13:13:42 1614 INFO  Following table displays times when notifications are received
              Time |  ID |      Host |   Port |       Current State |      Assigned State
          ---------+-----+-----------+--------+---------------------+--------------------
          13:13:43 |   2 | localhost |   5002 |             primary | prepare_maintenance
          13:13:43 |   1 | localhost |   5001 |           secondary |   prepare_promotion
          13:13:43 |   1 | localhost |   5001 |   prepare_promotion |   prepare_promotion
          13:13:43 |   2 | localhost |   5002 | prepare_maintenance | prepare_maintenance
          13:13:44 |   1 | localhost |   5001 |   prepare_promotion |    stop_replication
          13:13:45 |   1 | localhost |   5001 |    stop_replication |    stop_replication
          13:13:46 |   1 | localhost |   5001 |    stop_replication |        wait_primary
          13:13:46 |   2 | localhost |   5002 | prepare_maintenance |         maintenance
          13:13:46 |   1 | localhost |   5001 |        wait_primary |        wait_primary
          13:13:47 |   2 | localhost |   5002 |         maintenance |         maintenance

       When  the  operation is done we can have the old primary re-join the group, this time as a
       secondary:

          $ pg_autoctl disable maintenance
          13:14:46 1985 INFO  Listening monitor notifications about state changes in formation "default" and group 0
          13:14:46 1985 INFO  Following table displays times when notifications are received
              Time |  ID |      Host |   Port |       Current State |      Assigned State
          ---------+-----+-----------+--------+---------------------+--------------------
          13:14:47 |   2 | localhost |   5002 |         maintenance |          catchingup
          13:14:47 |   2 | localhost |   5002 |          catchingup |          catchingup
          13:14:52 |   2 | localhost |   5002 |          catchingup |           secondary
          13:14:52 |   1 | localhost |   5001 |        wait_primary |             primary
          13:14:52 |   2 | localhost |   5002 |           secondary |           secondary
          13:14:53 |   1 | localhost |   5001 |             primary |             primary

   Triggering a failover
       It is possible to trigger a manual failover, or a switchover, using the command pg_autoctl
       perform failover. Here's an example of what happens when running the command:

          $ pg_autoctl perform failover
          11:58:00 53224 INFO  Listening monitor notifications about state changes in formation "default" and group 0
          11:58:00 53224 INFO  Following table displays times when notifications are received
              Time |  ID |      Host |   Port |      Current State |     Assigned State
          ---------+-----+-----------+--------+--------------------+-------------------
          11:58:01 |   1 | localhost |   5001 |            primary |           draining
          11:58:01 |   2 | localhost |   5002 |          secondary |  prepare_promotion
          11:58:01 |   1 | localhost |   5001 |           draining |           draining
          11:58:01 |   2 | localhost |   5002 |  prepare_promotion |  prepare_promotion
          11:58:02 |   2 | localhost |   5002 |  prepare_promotion |   stop_replication
          11:58:02 |   1 | localhost |   5001 |           draining |     demote_timeout
          11:58:03 |   1 | localhost |   5001 |     demote_timeout |     demote_timeout
          11:58:04 |   2 | localhost |   5002 |   stop_replication |   stop_replication
          11:58:05 |   2 | localhost |   5002 |   stop_replication |       wait_primary
          11:58:05 |   1 | localhost |   5001 |     demote_timeout |            demoted
          11:58:05 |   2 | localhost |   5002 |       wait_primary |       wait_primary
          11:58:05 |   1 | localhost |   5001 |            demoted |            demoted
          11:58:06 |   1 | localhost |   5001 |            demoted |         catchingup
          11:58:06 |   1 | localhost |   5001 |         catchingup |         catchingup
          11:58:08 |   1 | localhost |   5001 |         catchingup |          secondary
          11:58:08 |   2 | localhost |   5002 |       wait_primary |            primary
          11:58:08 |   1 | localhost |   5001 |          secondary |          secondary
          11:58:08 |   2 | localhost |   5002 |            primary |            primary

       Again, timings and PID numbers are not expected to be the same when you run the command on
       your own setup.

       Also note in the output that the command shows the whole set of transitions including when
       the  old primary is now a secondary node. The database is available for read-write traffic
       as soon as we reach the state wait_primary.
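
       When such a trace is saved to a file, the moment the new primary starts accepting writes
       can be read from it mechanically. The sketch below assumes only the table layout shown in
       the example output above (columns separated by "|", Current State in the fifth column);
       the function name first_time_in_state is hypothetical.

```shell
#!/bin/sh
# Sketch: print the time of the first notification row whose
# "Current State" column equals the requested state.
# Reads the table printed by pg_autoctl on standard input.
first_time_in_state() {
    awk -F'|' -v want="$1" '
        NF >= 6 {
            t = $1
            cur = $5
            gsub(/^[ \t]+|[ \t]+$/, "", t)
            gsub(/^[ \t]+|[ \t]+$/, "", cur)
            if (cur == want) {
                print t
                exit
            }
        }'
}

# A few rows copied from the example output above.
trace='11:58:01 |   1 | localhost |   5001 |            primary |           draining
11:58:05 |   2 | localhost |   5002 |   stop_replication |       wait_primary
11:58:05 |   2 | localhost |   5002 |       wait_primary |       wait_primary
11:58:08 |   2 | localhost |   5002 |            primary |            primary'

printf '%s\n' "$trace" | first_time_in_state wait_primary
```

       Log lines, the header row, and the separator row are skipped because they either lack
       the "|" separators or never match a state name.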

   Implementing a controlled switchover
       It is generally useful to distinguish a controlled switchover from a failover. In a
       controlled switchover situation it is possible to organise the sequence of events in a way
       to avoid data loss and lower downtime to a minimum.

       In the case of pg_auto_failover, because we use synchronous  replication,  we  don't  face
       data loss risks when triggering a manual failover. Moreover, our monitor knows the current
       primary health at the time when  the  failover  is  triggered,  and  drives  the  failover
       accordingly.

       So  to  trigger  a controlled switchover with pg_auto_failover you can use the same API as
       for a manual failover:

          $ pg_autoctl perform switchover

       Because the subtleties of orchestrating either a controlled switchover or an unplanned
       failover are all handled by the monitor, rather than the client side command line, at the
       client level the two commands pg_autoctl perform failover and pg_autoctl perform
       switchover are synonyms, or aliases.

   Current state, last events
       The  following  commands  display  information  from  the  pg_auto_failover monitor tables
       pgautofailover.node and pgautofailover.event:

          $ pg_autoctl show state
          $ pg_autoctl show events

       When run on the monitor, the commands output all the known states and events for the
       whole set of formations handled by the monitor. When run on a PostgreSQL node, the commands
       connect to the monitor and output the information relevant to the service group of the
       local node only.

       For interactive debugging it is helpful to run the following command from the monitor node
       while e.g. initializing a formation from scratch, or performing a manual failover:

          $ watch pg_autoctl show state

   Monitoring pg_auto_failover in Production
       The monitor reports every state change decision to a LISTEN/NOTIFY  channel  named  state.
       PostgreSQL  logs  on  the  monitor  are  also stored in a table, pgautofailover.event, and
       broadcast by NOTIFY in the channel log.

   Replacing the monitor online
       When the monitor node is not available anymore, it is possible to  create  a  new  monitor
       node and then switch existing nodes to a new monitor by using the following commands.

          1. Apply the STONITH approach on the old monitor to make sure this node is not going to
             show up again during the procedure. This step is sometimes referred to as “fencing”.

          2. On every node, ending with the (current)  Postgres  primary  node  for  each  group,
             disable the monitor while pg_autoctl is still running:

                 $ pg_autoctl disable monitor --force

          3. Create a new monitor node:

                 $ pg_autoctl create monitor ...

          4. On  the  current  primary node first, so that it's registered first and as a primary
             still, for each group in your formation(s), enable the monitor online again:

                 $ pg_autoctl enable monitor postgresql://autoctl_node@.../pg_auto_failover

          5. On every other (secondary) node, enable the monitor online again:

                 $ pg_autoctl enable monitor postgresql://autoctl_node@.../pg_auto_failover

       See pg_autoctl disable monitor and pg_autoctl  enable  monitor  for  details  about  those
       commands.

       This operation relies on the fact that pg_autoctl can be operated without a monitor, and
       when reconnecting to a new monitor, this process resets the parts of the node state that
       come from the monitor, such as the node identifier.
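
       The node ordering in steps 2, 4 and 5 is easy to get wrong: the primary is disabled last
       and enabled first. As a hedged sketch, the function below prints that order for a single
       group without running anything. The function name monitor_switch_plan and the node names
       are hypothetical, and the monitor URI is passed in as-is rather than constructed.

```shell
#!/bin/sh
# Sketch: print the order of operations for switching one group
# to a new monitor. Nothing is executed.
# Arguments: new monitor URI, current primary, then the secondaries.
monitor_switch_plan() {
    uri="$1"
    primary="$2"
    shift 2

    # Step 2: disable the old monitor everywhere, ending with the primary.
    for node in "$@" "$primary"; do
        echo "on $node: pg_autoctl disable monitor --force"
    done
    # Step 3: create the new monitor (options elided as in the text above).
    echo "on the new monitor host: pg_autoctl create monitor ..."
    # Steps 4 and 5: enable the new monitor, primary first.
    for node in "$primary" "$@"; do
        echo "on $node: pg_autoctl enable monitor $uri"
    done
}

# Hypothetical node names; NEW_MONITOR_URI must be supplied by the operator.
monitor_switch_plan "${NEW_MONITOR_URI:-NEW_MONITOR_URI}" node-a node-b node-c
```

       With several groups in a formation, the same plan would be produced once per group.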

   Trouble-Shooting Guide
       pg_auto_failover commands can be run repeatedly. If initialization fails the first time --
       for instance because a firewall  rule  hasn't  yet  activated  --  it's  possible  to  try
       pg_autoctl  create  again.  pg_auto_failover  will review its previous progress and repeat
       idempotent operations (create database, create extension etc), gracefully handling errors.

FREQUENTLY ASKED QUESTIONS

These questions have been asked in GitHub issues for the project by several people. If you
have more questions, feel free to open a new issue, and your question and its answer might
make it to this FAQ.

I stopped the primary and no failover is happening for 20s to 30s, why?
In order to avoid spurious failovers when the network connectivity is not stable,
pg_auto_failover implements a timeout of 20s before acting on a node that is known to be
unavailable. This needs to be added to the delay between health checks and the retry
policy.

See the Configuring pg_auto_failover part for more information about how to set up the
different delays and timeouts that are involved in the decision making.
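
As a rough illustration of how these delays add up, the sketch below sums the unavailability
timeout, one health check period, and the retries. The formula is a simplification made for
this example, the function name failover_delay is hypothetical, and the numbers are
illustrative; the actual values are the settings configured on your monitor.

```shell
#!/bin/sh
# Sketch: rough upper bound, in seconds, on the time before the monitor
# acts on an unavailable node. Simplified model, assumed for illustration:
#   unhealthy timeout + one health check period + retries * retry delay
failover_delay() {
    unhealthy_timeout="$1"
    check_period="$2"
    max_retries="$3"
    retry_delay="$4"
    echo $(( unhealthy_timeout + check_period + max_retries * retry_delay ))
}

# Illustrative values only: 20s timeout, 5s period, 2 retries 2s apart.
failover_delay 20 5 2 2
```

With these illustrative inputs the sum is 29 seconds, which falls in the 20s to 30s range
mentioned in the question.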

See also pg_autoctl watch for a dashboard that helps in understanding the system and
what's going on at the moment.

The secondary is blocked in the CATCHING_UP state, what should I do?
In the pg_auto_failover design, the following two things are needed for the monitor to be
able to orchestrate nodes integration completely:

1. Health Checks must be successful

The monitor runs periodic health checks with all the nodes registered in the system.
Those health checks are Postgres connections from the monitor to the registered
Postgres nodes, and use the hostname and port as registered.

The Reachable column of the pg_autoctl show state command output contains "yes" when
the monitor could connect to a specific node, "no" when this connection failed, and
"unknown" when no connection has been attempted yet since the last startup time of the
monitor.

The Reachable column from pg_autoctl show state command output must show a "yes"
entry before a new standby node can be orchestrated up to the "secondary" goal
state.

2. pg_autoctl service must be running

The pg_auto_failover monitor works by assigning goal states to individual Postgres
nodes. The monitor will not assign a new goal state until the current one has been
reached.

To implement a transition from the current state to the goal state assigned by the
monitor, the pg_autoctl service must be running on every node.

When your new standby node stays in the "catchingup" state for a long time, please check
that the node is reachable from the monitor given its hostname and port known on the
monitor, and check that the pg_autoctl run command is running for this node.

When things are not obvious, the next step is to go read the logs. Both the output of the
pg_autoctl command and the Postgres logs are relevant. See the Should I read the logs?
Where are the logs? question for details.

Should I read the logs? Where are the logs?
Yes. If anything seems strange to you, please do read the logs.

As maintainers of the pg_autoctl tool, we can't foresee everything that may happen to your
production environment. Still, a lot of effort is spent on having a meaningful output. So
when you're in a situation that's hard to understand, please make sure to read the
pg_autoctl logs and the Postgres logs.

When using systemd integration, the pg_autoctl logs are then handled entirely by the
journal facility of systemd. Please then refer to journalctl for viewing the logs.

The Postgres logs are to be found in the $PGDATA/log directory with the default
configuration deployed by pg_autoctl create .... When a custom Postgres setup is used,
please refer to your actual setup to find Postgres logs.
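
For the default layout described above, a small helper can locate the most recent Postgres
log file. This is a sketch: the function name latest_postgres_log is hypothetical, and it
assumes log files live directly under $PGDATA/log as stated above.

```shell
#!/bin/sh
# Sketch: print the path of the most recently modified file in PGDATA/log.
# Returns non-zero when the directory is missing or empty.
latest_postgres_log() {
    logdir="${1:?usage: latest_postgres_log PGDATA}/log"
    newest=$(ls -t "$logdir" 2>/dev/null | head -n 1)
    [ -n "$newest" ] || return 1
    printf '%s/%s\n' "$logdir" "$newest"
}

# Typical use on a Postgres node, next to the journal for pg_autoctl:
#   journalctl -u pgautofailover --since "10 minutes ago"
#   tail -n 50 "$(latest_postgres_log "$PGDATA")"
```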

The state of the system is blocked, what should I do?
This question is a general case situation that is similar in nature to the previous
situation, reached when adding a new standby to a group of Postgres nodes. Please check
the same two elements: the monitor health checks are successful, and the pg_autoctl run
command is running.

The monitor is a SPOF in pg_auto_failover design, how should we handle that?
When using pg_auto_failover, the monitor is needed to make decisions and orchestrate
changes in all the registered Postgres groups. Decisions are transmitted to the Postgres
nodes by the monitor assigning nodes a goal state which is different from their current
state.

Consequences of the monitor being unavailable
Nodes contact the monitor each second and call the node_active stored procedure, which
returns a goal state that is possibly different from the current state.

The monitor only assigns Postgres nodes with a new goal state when a cluster wide
operation is needed. In practice, only the following operations require the monitor to
assign a new goal state to a Postgres node:

• a new node is registered

• a failover needs to happen, either triggered automatically or manually

• a node is being put to maintenance

• a node replication setting is being changed.

When the monitor node is not available, the pg_autoctl processes on the Postgres nodes
will fail to contact the monitor every second, and log about this failure. Adding to that,
no orchestration is possible.

The Postgres streaming replication does not need the monitor to be available in order to
deliver its service guarantees to your application, so your Postgres service is still
available when the monitor is not available.

To repair your installation after having lost a monitor, the following scenarios are to be
considered.

The monitor node can be brought up again without data having been lost
This is typically the case in Cloud Native environments such as Kubernetes, where you
could have a service migrated to another pod and re-attached to its disk volume. This
scenario is well supported by pg_auto_failover, and no intervention is needed.

It is also possible to use synchronous archiving with the monitor so that it's possible to
recover from the current archives and continue operating without intervention on the
Postgres nodes, except for updating their monitor URI. This requires an archiving setup
that uses synchronous replication so that any transaction committed on the monitor is
known to have been replicated in your WAL archive.

At the moment, you have to take care of that setup yourself. Here's a quick summary of
what needs to be done:

1. Schedule base backups

Use pg_basebackup every once in a while to have a full copy of the monitor Postgres
database available.

2. Archive WAL files in a synchronous fashion

Use pg_receivewal --synchronous ... as a service to keep a WAL archive in sync with the
monitor Postgres instance at all times.

3. Prepare a recovery tool on top of your archiving strategy

Write a utility that knows how to create a new monitor node from your most recent
pg_basebackup copy and the WAL files copy.

Bonus points if that tool/script is tested at least once a day, so that you avoid
surprises on the unfortunate day that you actually need to use it in production.

A future version of pg_auto_failover will include this facility, but the current versions
don't.
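
Steps 1 and 2 above might be sketched as follows. The connection string, the archive
directory, and the function names are hypothetical, and commands are only printed unless
DRY_RUN=0 is set. For commits on the monitor to actually wait for the archive, the receiver
would also need to be named in synchronous_standby_names on the monitor; that part is not
shown here.

```shell
#!/bin/sh
# Sketch of steps 1 and 2. Paths and connection string are hypothetical.
# Commands are printed unless DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

MONITOR_CONNINFO="${MONITOR_CONNINFO:-host=monitor.example user=replicator}"
ARCHIVE_ROOT="${ARCHIVE_ROOT:-/var/backups/monitor}"

# Step 1: take a base backup into a dated directory (from cron or a timer).
base_backup() {
    run pg_basebackup -d "$MONITOR_CONNINFO" \
        -D "$ARCHIVE_ROOT/base/$(date +%Y%m%d)" -X stream
}

# Step 2: keep receiving WAL, flushing each write and reporting it at once.
receive_wal() {
    run pg_receivewal -d "$MONITOR_CONNINFO" \
        -D "$ARCHIVE_ROOT/wal" --synchronous
}

base_backup
receive_wal
```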

The monitor node can only be built from scratch again
If you don't have synchronous archiving for the monitor set-up, then you might not be able
to restore a monitor database with the expected up-to-date node metadata. Specifically we
need the nodes state to be in sync with what each pg_autoctl process has received the last
time it could contact the monitor, before the monitor became unavailable.

It is possible to register nodes that are currently running to a new monitor without
restarting Postgres on the primary. For that, the procedure mentioned in Replacing the
monitor online must be followed, using the following commands:

$ pg_autoctl disable monitor
$ pg_autoctl enable monitor

AUTHOR

       Microsoft

COPYRIGHT

       Copyright (c) Microsoft Corporation. All rights reserved.