
Provided by: ganeti-htools-2.16_2.16.0~rc2-1build1_amd64

NAME

       harep - Ganeti auto-repair tool

SYNOPSIS

       harep [ [-L | --luxi ] = socket ] [ --job-delay = seconds ] [ --dry-run ]

       harep --version

DESCRIPTION

       Harep is the Ganeti auto-repair tool.  It is able to detect that an instance is broken and to generate a
       sequence of jobs that will fix it, in accordance with the policies set by the administrator.  At the
       moment, only repairs for instances using the disk templates plain or drbd are supported.

       Harep is able to recognize what state an instance is in (healthy, suspended, needs repair, repair
       disallowed, pending repair, repair failed) and to lead it through a sequence of steps that will bring the
       instance back to the healthy state.  Therefore, harep is mainly meant to be run regularly and frequently
       via a cron job, so that it can actually follow the instance through the whole process.  At every run,
       harep will update the tags it adds to instances describing their repair status, and will submit the jobs
       that actually perform the required repair operations.
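
       For example, harep could be scheduled with a cron entry along the lines of the following sketch (the
       schedule, binary path and log destination are illustrative and may differ on your system):

               # Hypothetical cron entry: run harep every 30 minutes as root
               */30 * * * * root /usr/bin/harep >> /var/log/ganeti/harep.log 2>&1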

       By default, harep only reports on the health status of instances and doesn't perform any action, as
       repair actions are potentially dangerous.  Therefore, harep will only touch instances that it has been
       explicitly authorized to work on.

       The tags enabling harep can be associated with single instances, with a node group, or with the whole
       cluster, thereby affecting all the instances they contain.  The possible tags share the common
       structure:

              ganeti:watcher:autorepair:<type>

       where <type> can have the following values:

       • fix-storage: allow disk replacement or fix the backend without affecting the  instance  itself  (broken
         DRBD secondary)

       • migrate: allow instance migration.  Note, however, that harep currently does not submit migrate jobs,
         so this permission level is presently equivalent to fix-storage.

       • failover: allow instance reboot on the secondary node; this action is taken if the primary node is
         offline.

       • reinstall: allow disks to be recreated and the instance to be reinstalled

       Each tag in this list includes all the authorizations of the previous one, with fix-storage being the
       least powerful and reinstall being the most powerful.
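
       Authorizations are granted with the standard Ganeti tag commands.  As a sketch (the instance and node
       group names below are placeholders):

               # Allow disk repairs for a single instance
               gnt-instance add-tags instance1.example.com ganeti:watcher:autorepair:fix-storage

               # Allow full reinstallation for all instances in a node group
               gnt-group add-tags group1 ganeti:watcher:autorepair:reinstall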

       In case multiple autorepair tags act on the same instance, only one can actually be active.  The
       conflict is resolved according to the following rules:

       1. if multiple tags are in the same object, the least destructive takes precedence.

       2. if the tags are across objects, the nearest tag wins.

       Example: a cluster has instances I1 and I2, where I1 has the failover tag and the cluster itself has
       both fix-storage and reinstall.  The I1 instance will be allowed to failover, since the nearest tag
       wins; the I2 instance will only be allowed fix-storage, since on the same object the least destructive
       tag takes precedence.
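
       Expressed with tag commands, this example configuration could be created as follows (the instance name
       is a placeholder):

               gnt-cluster add-tags ganeti:watcher:autorepair:fix-storage
               gnt-cluster add-tags ganeti:watcher:autorepair:reinstall
               gnt-instance add-tags I1 ganeti:watcher:autorepair:failover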

LIMITATIONS

       Harep doesn't do any hardware failure detection on its own; it relies on nodes being marked as offline
       by the administrator.

       Also, harep currently works only for instances with the drbd and plain disk templates.

       Using the data model of htools(1), harep cannot distinguish between drained and offline nodes.  In
       particular, it will (permission provided) failover instances even in situations where a migration would
       have been enough.  Hence, handling of node draining is better done using hbal(1), which will always
       submit migration jobs, only falling back to failover when given the permission to do so.

       These issues will be addressed by a  new  maintenance  daemon  in  future  Ganeti  versions,  which  will
       supersede harep.

OPTIONS

       The options that can be passed to the program are as follows:

       -L socket, --luxi=socket
              collect data via Luxi, optionally using the given socket path.

       --job-delay=seconds
              insert this much delay before the execution of repair jobs to allow the tool to continue
              processing instances.

       --dry-run
              only show which operations would be carried out, but do nothing,  even  on  instances  where  tags
              grant  the  appropriate  permissions.   Note  that  harep  keeps the state of repair operations in
              instance tags; therefore, only the operations of the next round of actions can be inspected.
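
       For instance, a dry run inspecting the next round of repairs (the socket path below is merely
       illustrative; by default harep connects to the locally configured Luxi socket) might look like:

               harep --dry-run
               harep --luxi=/var/run/ganeti/socket/ganeti-master --job-delay=60 --dry-run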

REPORTING BUGS

       Report bugs to the project website (http://code.google.com/p/ganeti/) or contact the developers using
       the Ganeti mailing list (ganeti@googlegroups.com).

SEE ALSO

       Ganeti  overview  and  specifications:  ganeti(7)  (general  overview),  ganeti-os-interface(7) (guest OS
       definitions), ganeti-extstorage-interface(7) (external storage providers).

       Ganeti commands: gnt-cluster(8) (cluster-wide commands), gnt-job(8) (job-related  commands),  gnt-node(8)
       (node-related   commands),   gnt-instance(8)   (instance   commands),   gnt-os(8)  (guest  OS  commands),
       gnt-storage(8)  (storage  commands),  gnt-group(8)  (node  group   commands),   gnt-backup(8)   (instance
       import/export commands), gnt-debug(8) (debug commands).

       Ganeti  daemons: ganeti-watcher(8) (automatic instance restarter), ganeti-cleaner(8) (job queue cleaner),
       ganeti-noded(8) (node daemon), ganeti-rapi(8) (remote API daemon).

       Ganeti htools: htools(1) (generic binary), hbal(1) (cluster balancer), hspace(1) (capacity  calculation),
       hail(1) (IAllocator plugin), hscan(1) (data gatherer from remote clusters), hinfo(1) (cluster information
       printer), mon-collector(7) (data collectors interface).

       Copyright (C) 2006-2015 Google Inc.  All rights reserved.

       Redistribution and use in source and binary forms, with or without modification, are  permitted  provided
       that the following conditions are met:

       1.   Redistributions  of  source code must retain the above copyright notice, this list of conditions and
       the following disclaimer.

       2.  Redistributions in binary form must reproduce the above copyright notice, this list of conditions and
       the following disclaimer in the documentation and/or other materials provided with the distribution.

       THIS  SOFTWARE  IS  PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED
       WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND  FITNESS  FOR  A
       PARTICULAR  PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
       ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,  EXEMPLARY,  OR  CONSEQUENTIAL  DAMAGES  (INCLUDING,  BUT  NOT
       LIMITED  TO,  PROCUREMENT  OF  SUBSTITUTE  GOODS  OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
       INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,  STRICT  LIABILITY,  OR
       TORT  (INCLUDING  NEGLIGENCE  OR  OTHERWISE)  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
       ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.