Provided by: ganeti_2.9.3-1_all


       ganeti-watcher - Ganeti cluster watcher


       ganeti-watcher [--debug] [--job-age=age] [--ignore-pause]


       The  ganeti-watcher  is  a  periodically  run  script which is responsible for keeping the
       instances in the correct status.  It has two separate functions, one for the  master  node
       and another one that runs on every node.

       If  the  watcher is disabled at cluster level (via the gnt-cluster watcher pause command),
       it will exit without doing anything.  The cluster-level pause can be  overridden  via  the
       --ignore-pause  option,  for  example  if  during  a  maintenance  the watcher needs to be
       disabled in general, but the administrator wants to run it just once.
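
       The one-off run described above can be sketched as a small shell helper. This is only
       an illustration: WATCHER_BIN is a hypothetical override introduced here so the helper
       can be dry-run, and the pause duration shown in the comments is an example value.

```shell
#!/bin/sh
# Sketch: run the watcher a single time while the cluster-level pause stays in effect.
# WATCHER_BIN is a hypothetical override for this sketch (defaults to ganeti-watcher).
run_watcher_once() {
    "${WATCHER_BIN:-ganeti-watcher}" --ignore-pause "$@"
}

# Possible maintenance flow (left commented out so the sketch has no side effects):
#   gnt-cluster watcher pause 1h     # example duration, adjust as needed
#   run_watcher_once --debug         # single verbose run despite the pause
#   gnt-cluster watcher continue     # lift the pause when maintenance is over
```

       Extra arguments such as --debug are passed through unchanged, so the helper does not
       need to know about other watcher options.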

       The --debug option will increase the verbosity of the watcher and also activate logging to
       the standard error.

   Master operations
       On the master node, the watcher's primary function is to keep running all instances that
       are marked as up in the configuration file, by trying to start them a limited number of
       times.

       Another function is to "repair" DRBD links by reactivating the block devices of  instances
       which have secondaries on nodes that have been rebooted.

       The watcher will also archive old jobs (older than the age given via the --job-age option,
       which defaults to 6 hours), in order to keep the job queue manageable.
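
       As a rough sketch, a non-default archive age could be passed like this. It assumes that
       --job-age accepts a plain number of seconds, which should be checked against the
       installed version; hours_to_seconds, archive_jobs_older_than_hours and WATCHER_BIN are
       hypothetical names used only for this example.

```shell
#!/bin/sh
# Convert a whole number of hours to seconds.
hours_to_seconds() {
    echo $(( $1 * 3600 ))
}

# Run the watcher with a custom job archive age given in hours.
# Assumption: --job-age takes the age as a number of seconds.
# WATCHER_BIN is a hypothetical override (defaults to ganeti-watcher).
archive_jobs_older_than_hours() {
    "${WATCHER_BIN:-ganeti-watcher}" --job-age="$(hours_to_seconds "$1")"
}
```

       For example, archive_jobs_older_than_hours 12 would invoke the watcher with
       --job-age=43200.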

   Node operations
       The watcher will restart any down daemons that are appropriate for the current node.

       In addition, it will execute any scripts which exist under the "watcher" directory in  the
       Ganeti  hooks directory (/etc/ganeti/hooks).  This should be used for lightweight actions,
       like starting any extra daemons.
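
       A lightweight hook of the kind described above might look like the following sketch.
       The daemon name and start command are placeholders and not part of Ganeti; the script
       is assumed to be installed as an executable file in the "watcher" hooks directory.

```shell
#!/bin/sh
# Hypothetical watcher hook: start an extra daemon only if it is not already running.
# DAEMON_NAME and START_CMD are placeholders for this sketch, not Ganeti settings.
DAEMON_NAME="${DAEMON_NAME:-my-extra-daemon}"
START_CMD="${START_CMD:-/etc/init.d/my-extra-daemon start}"

ensure_daemon_running() {
    # pgrep -x matches the exact process name; status 0 means it is running.
    if pgrep -x "$DAEMON_NAME" >/dev/null 2>&1; then
        return 0
    fi
    # Not running (or pgrep unavailable): run the start command.
    $START_CMD
}

# A real hook would end by calling the function:
#   ensure_daemon_running
```

       Keeping the check cheap matters because the hook runs on every watcher invocation.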

       If the cluster parameter maintain_node_health is enabled, then the watcher will also shut
       down instances and DRBD devices if the node is declared as offline by known master
       candidates.

       The watcher does synchronous queries but will submit jobs for executing the changes.   Due
       to locking, it could be that the jobs execute much later than the watcher submits them.


       The    command    has    a   set   of   state   files   (one   per   group)   located   at
       /var/lib/ganeti/ (only used on  the  master)  and  a  log  file  at
       /var/log/ganeti/watcher.log.  Removal of either file(s) will not affect correct operation;
       the removal of the state file will just cause the restart counters for  the  instances  to
       reset  to zero, and mark nodes as freshly rebooted (so for example DRBD minors will be
       re-activated).

       In some cases it is even desirable to reset the watcher state, for example after
       maintenance actions or when you want to simulate the reboot of all nodes. To do so,
       remove all state files:

              rm -f /var/lib/ganeti/watcher.*.data
              rm -f /var/lib/ganeti/watcher.*.instance-status
              rm -f /var/lib/ganeti/instance-status

       And then re-run the watcher.
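
       The reset and re-run can be combined into one helper, sketched below. STATE_DIR and
       WATCHER_BIN are hypothetical overrides added for this example so it can be tried
       against a scratch directory; the file patterns are the ones listed above.

```shell
#!/bin/sh
# Sketch: remove the watcher state files, then run the watcher again.
# STATE_DIR and WATCHER_BIN are hypothetical overrides for this sketch.
reset_watcher_state() {
    state_dir="${STATE_DIR:-/var/lib/ganeti}"
    # -f keeps rm quiet when a pattern matches nothing.
    rm -f "$state_dir"/watcher.*.data \
          "$state_dir"/watcher.*.instance-status \
          "$state_dir"/instance-status
    # Re-run the watcher, passing through any extra options.
    "${WATCHER_BIN:-ganeti-watcher}" "$@"
}
```

       Only the listed state files are removed; other files in the directory are left
       untouched.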


       Report bugs to the project website or contact the developers using the Ganeti mailing
       list.


       Ganeti  overview  and specifications: ganeti(7) (general overview), ganeti-os-interface(7)
       (guest OS definitions), ganeti-extstorage-interface(7) (external storage providers).

       Ganeti  commands:  gnt-cluster(8)   (cluster-wide   commands),   gnt-job(8)   (job-related
       commands),  gnt-node(8) (node-related commands), gnt-instance(8) (instance commands), gnt-
       os(8) (guest OS commands), gnt-storage(8) (storage  commands),  gnt-group(8)  (node  group
       commands), gnt-backup(8) (instance import/export commands), gnt-debug(8) (debug commands).

       Ganeti  daemons:  ganeti-watcher(8) (automatic instance restarter), ganeti-cleaner(8) (job
       queue cleaner), ganeti-noded(8) (node daemon), ganeti-masterd(8) (master daemon),  ganeti-
       rapi(8) (remote API daemon).

       Ganeti htools: htools(1) (generic binary), hbal(1) (cluster balancer), hspace(1) (capacity
       calculation), hail(1) (IAllocator plugin), hscan(1) (data gatherer from remote  clusters),
       hinfo(1) (cluster information printer), mon-collector(7) (data collectors interface).


       Copyright  (C) 2006, 2007, 2008, 2009, 2010, 2011, 2012 Google Inc.  Permission is granted
       to copy, distribute and/or modify under the terms of the GNU  General  Public  License  as
       published  by  the  Free Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       On Debian systems, the complete text of the GNU General Public License  can  be  found  in