Provided by: sanlock_2.2-2_amd64

NAME

       sanlock - shared storage lock manager

SYNOPSIS

       sanlock [COMMAND] [ACTION] ...

DESCRIPTION

       The  sanlock  daemon  manages  leases for applications running on a cluster of hosts with shared storage.
       All lease management and coordination is done through reading and writing blocks on the  shared  storage.
       Two types of leases are used, each based on a different algorithm:

       "delta  leases" are slow to acquire and require regular i/o to shared storage.  A delta lease exists in a
       single sector of storage.  Acquiring a delta lease involves reads and writes to that sector separated  by
       specific delays.  Once acquired, a lease must be renewed by updating a timestamp in the sector regularly.
       sanlock  uses  a  delta  lease internally to hold a lease on a host_id.  host_id leases prevent two hosts
       from using the same host_id and provide basic host liveness information based on the renewals.

       "paxos leases" are generally fast to acquire and sanlock makes them available to applications as  general
       purpose  resource leases.  A paxos lease exists in 1MB of shared storage (8MB for 4k sectors).  Acquiring
       a paxos lease involves reads and writes to max_hosts (2000) sectors in a specific sequence  specified  by
       the  Disk Paxos algorithm.  paxos leases use host_id's internally to indicate the owner of the lease, and
       the algorithm fails if different hosts use the  same  host_id.   So,  delta  leases  provide  the  unique
       host_id's used in paxos leases.  paxos leases also refer to delta leases to check if a host_id is alive.

       Before  sanlock  can  be  used, the user must assign each host a host_id, which is a number between 1 and
       2000.  Two hosts should not be given the same host_id (even though delta leases attempt  to  detect  this
       mistake.)
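
       A host_id assignment can be checked before any host joins a lockspace.  The following is a minimal
       sketch, not part of sanlock: the function names and the "host:host_id" input format are invented for
       this example, and only the 1-2000 range is taken from the text above.

```shell
# Sketch: validate a planned host_id assignment.
# The 1..2000 range comes from the text; everything else is hypothetical.

MAX_HOSTS=2000

# Succeeds if $1 is an integer in the range 1..MAX_HOSTS.
valid_host_id() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or not a plain non-negative integer
    esac
    [ "$1" -ge 1 ] && [ "$1" -le "$MAX_HOSTS" ]
}

# Takes "host:host_id" pairs as arguments and prints every host_id
# that is assigned to more than one host, one per line.
duplicate_host_ids() {
    for pair in "$@"; do
        echo "${pair#*:}"
    done | sort -n | uniq -d
}

for id in 1 2000 0 2001; do
    if valid_host_id "$id"; then
        echo "host_id $id: ok"
    else
        echo "host_id $id: out of range"
    fi
done

# host1 and host3 were both given host_id 1, so 1 is reported.
duplicate_host_ids host1:1 host2:2 host3:1
```

       A check like this only catches mistakes in the plan; the delta leases remain the runtime protection
       against two hosts using the same host_id.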

       sanlock  views  a  pool  of storage as a "lockspace".  Each distinct pool of storage, e.g. from different
       sources, would typically be defined as a separate lockspace, with a unique lockspace name.

       Part of this storage space must be reserved and initialized for sanlock to store delta leases.  Each host
       that wants to use the lockspace must first acquire a  delta  lease  on  its  host_id  number  within  the
       lockspace.   (See  the  add_lockspace  action/api.)   The  space  required  for  2000 delta leases in the
       lockspace (for 2000 possible host_id's) is 1MB (8MB for 4k sectors).  (This is the same size required for
       a single paxos lease.)
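
       The sizes above can be reproduced with simple arithmetic.  This sketch assumes that one lease area
       spans 2048 sectors, which yields exactly the two quoted sizes (the 2000 sectors in use fit inside
       that area); the constant and the function names are assumptions made for illustration.

```shell
# Sketch: space to reserve for lease areas.
# Assumption: one lease area is 2048 sectors, which matches the two
# sizes quoted in the text (1MB at 512 bytes, 8MB at 4096 bytes).

SECTORS_PER_AREA=2048

# Bytes in one lease area (the delta lease area, or one paxos lease)
# for sector size $1.
lease_area_bytes() {
    echo $(( $1 * SECTORS_PER_AREA ))
}

# Bytes needed for one host_id lease area plus $2 resource leases
# on storage with sector size $1.
lockspace_bytes() {
    sector_size=$1
    resource_count=$2
    area=$(lease_area_bytes "$sector_size")
    echo $(( (1 + resource_count) * area ))
}

echo "512-byte sectors:  $(lease_area_bytes 512) bytes per lease area"
echo "4096-byte sectors: $(lease_area_bytes 4096) bytes per lease area"
echo "host_id leases plus 3 resource leases (512-byte sectors): $(lockspace_bytes 512 3) bytes"
```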

       More storage space must be reserved and initialized for paxos leases,  according  to  the  needs  of  the
       applications using sanlock.

       The  following  steps  illustrate  these  concepts using the command line.  Applications may choose to do
       these same steps through libsanlock.

       1. Create storage pools and reserve and initialize host_id leases
       two different LUNs on a SAN: /dev/sdb, /dev/sdc
       # vgcreate pool1 /dev/sdb
       # vgcreate pool2 /dev/sdc
       # lvcreate -n hostid_leases -L 1MB pool1
       # lvcreate -n hostid_leases -L 1MB pool2
       # sanlock direct init -s LS1:0:/dev/pool1/hostid_leases:0
       # sanlock direct init -s LS2:0:/dev/pool2/hostid_leases:0

       2. Start the sanlock daemon on each host
       # sanlock daemon

       3. Add each lockspace to be used
       host1:
       # sanlock client add_lockspace -s LS1:1:/dev/pool1/hostid_leases:0
       # sanlock client add_lockspace -s LS2:1:/dev/pool2/hostid_leases:0
       host2:
       # sanlock client add_lockspace -s LS1:2:/dev/pool1/hostid_leases:0
       # sanlock client add_lockspace -s LS2:2:/dev/pool2/hostid_leases:0

       4. Applications can now reserve/initialize space for resource leases, and then acquire the leases as they
       need to access the resources.

       The resource leases that are created and how they are used depend on the application.  For example, say
       application  A,  running  on  host1  and  host2,  needs  to  synchronize  access  to  data  it  stores on
       /dev/pool1/Adata.  A could use a resource lease as follows:

       5. Reserve and initialize a single resource lease for Adata
       # lvcreate -n Adata_lease -L 1MB pool1
       # sanlock direct init -r LS1:Adata:/dev/pool1/Adata_lease:0

       6. Acquire the lease from the app using libsanlock (see sanlock_register, sanlock_acquire).  If  the  app
       is  already running as pid 123, and has registered with the sanlock daemon, the lease can be added for it
       manually.
       # sanlock client acquire -r LS1:Adata:/dev/pool1/Adata_lease:0 -p 123

       offsets

       offsets must be 1MB aligned for disks with 512 byte sectors, and 8MB aligned for  disks  with  4096  byte
       sectors.

       offsets may be used to place leases on the same device rather than using separate devices and offset 0 as
       shown in examples above, e.g. these commands above:
       # sanlock direct init -s LS1:0:/dev/pool1/hostid_leases:0
       # sanlock direct init -r LS1:Adata:/dev/pool1/Adata_lease:0
       could be replaced by:
       # sanlock direct init -s LS1:0:/dev/pool1/leases:0
       # sanlock direct init -r LS1:Adata:/dev/pool1/leases:1048576
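
       When several leases share one device, each offset is a multiple of the required alignment.  The
       sketch below computes such offsets and checks alignment.  The alignment values are the ones stated
       above; the 0-based "slot" numbering and the function names are invented for this example.

```shell
# Sketch: place several leases on one device at aligned offsets.

# Prints the required alignment in bytes for sector size $1.
lease_alignment() {
    case "$1" in
        512)  echo 1048576 ;;
        4096) echo 8388608 ;;
        *)    echo "unsupported sector size: $1" >&2; return 1 ;;
    esac
}

# Prints the byte offset of lease slot $1 (0-based) for sector size $2.
slot_offset() {
    align=$(lease_alignment "$2") || return 1
    echo $(( $1 * align ))
}

# Succeeds if offset $1 is correctly aligned for sector size $2.
offset_is_aligned() {
    align=$(lease_alignment "$2") || return 1
    [ $(( $1 % align )) -eq 0 ]
}

# Slot 0 holds the host_id leases, slot 1 the Adata resource lease.
printf '%s\n' "-s LS1:0:/dev/pool1/leases:$(slot_offset 0 512)"
printf '%s\n' "-r LS1:Adata:/dev/pool1/leases:$(slot_offset 1 512)"
```

       The two printed option strings match the ones used in the replacement commands above.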

       failures

       If  a  process  holding resource leases fails or exits without releasing its leases, sanlock will release
       the leases for it automatically.

       If the sanlock daemon cannot renew a lockspace host_id for a specific period  of  time  (usually  because
       storage access is lost), sanlock will kill any process holding a resource lease within the lockspace.

       If  the  sanlock daemon crashes or gets stuck, it will no longer renew the expiry time of its per-host_id
       connections to the wdmd daemon, and the watchdog device will reset the host.

       watchdog

       sanlock uses the wdmd(8) daemon to access /dev/watchdog.  sanlock maintains a separate connection to
       wdmd for each host_id being renewed.  Each host_id connection has an expiry time some seconds in the
       future.  After each successful host_id renewal, sanlock updates the associated expiry time in wdmd.  If
       wdmd finds any connection expired, it will not pet /dev/watchdog.  After enough successive expired/failed
       checks, the watchdog device will fire and reset the host.
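
       The pet/no-pet rule described above can be modeled as a small function.  This is a sketch of the
       rule only, not wdmd's implementation; the function name and the use of plain integer seconds are
       assumptions.

```shell
# Sketch: decide whether the watchdog should be petted.
# $1 is the current time; the remaining arguments are the expiry times
# of the registered connections, all in seconds.
# Succeeds (pet) only if no connection has expired.
should_pet_watchdog() {
    now=$1
    shift
    for expiry in "$@"; do
        if [ "$expiry" -le "$now" ]; then
            return 1    # at least one connection has expired: do not pet
        fi
    done
    return 0
}

# Two connections, expiring at t=130 and t=145.
if should_pet_watchdog 100 130 145; then echo "t=100: pet"; else echo "t=100: skip"; fi
if should_pet_watchdog 140 130 145; then echo "t=140: pet"; else echo "t=140: skip"; fi
```

       A single expired connection is enough to stop petting, which is why one lockspace that cannot be
       renewed can lead to a host reset.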

       After  a  number  of  failed attempts to renew a host_id, sanlock kills any process using that lockspace.
       Once all those processes have exited, sanlock will unregister the associated wdmd connection.  wdmd  will
       no  longer find the expired connection, and will resume petting /dev/watchdog (assuming it finds no other
       failed/expired tests.)  If the killed processes did not exit quickly enough, the expired wdmd  connection
       will not be unregistered, and /dev/watchdog will reset the host.

       Because these timeout values are known, sanlock on another host can calculate from the last host_id
       renewal when the failed host will have been reset by its watchdog (or will have killed all the
       necessary processes).

       If the sanlock daemon itself fails, crashes, or gets stuck, it will no longer update the expiry time for
       its host_id connections to wdmd, which will also lead to the watchdog resetting the host.

       safety

       sanlock leases are meant to guarantee that two processes on two hosts are never allowed to hold the same
       resource lease at once.  If they were, the resource being protected may be corrupted.  There are three
       levels of protection built into sanlock itself:

       1. The paxos leases and delta leases themselves.

       2. If the leases cannot function because storage access  is  lost  (host_id's  cannot  be  renewed),  the
       sanlock daemon kills any pids using resource leases in the lockspace.

       3. If the pids do not exit after being killed, or if the sanlock daemon fails, the watchdog device resets
       the host.

OPTIONS

       COMMAND can be one of three primary top level choices

       sanlock daemon    start daemon
       sanlock client    send request to daemon (default command if none given)
       sanlock direct    access storage directly (no coordination with daemon)

       sanlock daemon [options]

       -D no fork and print all logging to stderr

       -Q 0|1 quiet error messages for common lock contention

       -R 0|1 renewal debugging, log debug info for each renewal

       -L pri write logging at priority level and up to logfile (-1 none)

       -S pri write logging at priority level and up to syslog (-1 none)

       -U uid user id

       -G gid group id

       -t num max worker threads

       -w 0|1 use watchdog through wdmd

       -h 0|1 use high priority features (realtime scheduling, mlockall)

       -a 0|1 use async i/o

       -o sec io timeout in seconds

       sanlock client action [options]

       sanlock client status

       Print processes, lockspaces, and resources being managed by the sanlock daemon.  Add -D to show extra
       internal daemon status for debugging.  Add -o p to show resources by pid, or -o s to  show  resources  by
       lockspace.

       sanlock client host_status -s LOCKSPACE

       Print  state  of host_id delta leases read during the last renewal.  Only lockspace_name is used from the
       LOCKSPACE argument.  Add -D to show extra internal daemon status for debugging.

       sanlock client log_dump

       Print the sanlock daemon internal debug log.

       sanlock client shutdown

       Ask the sanlock daemon to exit.  Without the force option (-f 0), the command  will  be  ignored  if  any
       lockspaces  exist.  With the force option (-f 1), any registered processes will be killed, their resource
       leases released, and lockspaces removed.

       sanlock client init -s LOCKSPACE
       sanlock client init -r RESOURCE

       Tell the sanlock daemon to initialize storage for lease areas.  (See sanlock direct init.)

       sanlock client align -s LOCKSPACE

       Tell the sanlock daemon to report the required lease alignment for a storage path.   Only  path  is  used
       from the LOCKSPACE argument.

       sanlock client add_lockspace -s LOCKSPACE

       Tell  the sanlock daemon to acquire the specified host_id in the lockspace.  This will allow resources to
       be acquired in the lockspace.

       sanlock client inq_lockspace -s LOCKSPACE

       Ask the sanlock daemon whether the lockspace is acquired or not.

       sanlock client rem_lockspace -s LOCKSPACE

       Tell the sanlock daemon to release the specified host_id in the lockspace.  Any processes holding
       resource leases in this lockspace will be killed, and their resource leases will not be released.

       sanlock client command -r RESOURCE -c path args

       Register with the sanlock daemon, acquire the specified resource lease, and exec the command at path with
       args.  When the command exits, the sanlock daemon will release the lease.  -c must be the final option.

       sanlock client acquire -r RESOURCE -p pid
       sanlock client release -r RESOURCE -p pid

       Tell  the  sanlock  daemon to acquire or release the specified resource lease for the given pid.  The pid
       must be registered with the sanlock daemon.  acquire can optionally  take  a  versioned  RESOURCE  string
       RESOURCE:lver, where lver is the version of the lease that must be acquired, or fail.

       sanlock client inquire -p pid

       Print the resource leases held by the given pid.  The format is a versioned RESOURCE string
       "RESOURCE:lver" where lver is the version of the lease held.

       sanlock client request -r RESOURCE -f force_mode

       Request the owner of a resource do something specified by force_mode.  A versioned  RESOURCE:lver  string
       must be used with a greater version than is presently held.  Zero lver and force_mode clears the request.

       sanlock client examine -r RESOURCE

       Examine  the  request  record for the currently held resource lease and carry out the action specified by
       the requested force_mode.

       sanlock client examine -s LOCKSPACE

       Examine requests for all resource leases currently held in the named lockspace.  Only  lockspace_name  is
       used from the LOCKSPACE argument.

       sanlock direct action [options]

       -a 0|1 use async i/o

       -o sec io timeout in seconds

       sanlock direct init -s LOCKSPACE
       sanlock direct init -r RESOURCE

       Initialize storage for 2000 host_id (delta) leases for the given lockspace, or initialize storage for one
       resource  (paxos)  lease.  Both options require 1MB of space.  The host_id in the LOCKSPACE string is not
       relevant to initialization, so the value is ignored.  (The default of 2000 host_ids can  be  changed  for
       special cases using the -n num_hosts and -m max_hosts options.)

       sanlock direct read_leader -s LOCKSPACE
       sanlock direct read_leader -r RESOURCE

       Read  a  leader record from disk and print the fields.  The leader record is the single sector of a delta
       lease, or the first sector of a paxos lease.

       sanlock direct read_id -s LOCKSPACE
       sanlock direct live_id -s LOCKSPACE

       read_id reads a host_id and prints the owner.  live_id reads a host_id once a second until the
       timestamp or owner changes (prints live 1), or until host_dead_seconds have elapsed (prints live 0).
       (host_dead_seconds is derived from the io_timeout option.  The live 0|1 conclusion will not match the
       sanlock daemon's conclusion unless the configured timeouts match.)
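
       The live_id decision can be sketched without touching storage.  In this simplified model the
       timestamps that would be read each second are passed as arguments, only the timestamp is compared
       (the real command also watches the owner), and the value 5 used for host_dead_seconds is
       illustrative, not a sanlock default.

```shell
# Sketch: the live 0|1 decision made by polling a host_id once a second.
# $1 = host_dead_seconds
# $2 = timestamp seen on the first read
# remaining args = timestamps seen on each later read, one per second
# Running out of samples is treated the same as timing out.
live_id_result() {
    dead_seconds=$1
    shift
    first=$1
    shift
    elapsed=0
    for ts in "$@"; do
        elapsed=$(( elapsed + 1 ))
        if [ "$elapsed" -gt "$dead_seconds" ]; then
            break                 # waited host_dead_seconds with no change
        fi
        if [ "$ts" != "$first" ]; then
            echo "live 1"         # the lease was renewed while we watched
            return 0
        fi
    done
    echo "live 0"
}

live_id_result 5 1000 1000 1000 1020                   # renewed on the third re-read
live_id_result 5 1000 1000 1000 1000 1000 1000 1000    # never renewed
```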

       sanlock direct dump path[:offset]

       Read  disk  sectors  and  print  leader records for delta or paxos leases.  Add -f 1 to print the request
       record values for paxos leases, and host_ids set in delta lease bitmaps.

   LOCKSPACE option string
       -s lockspace_name:host_id:path:offset

       lockspace_name    name of lockspace
       host_id           local host identifier in lockspace
       path              path to storage reserved for leases
       offset            offset on path (bytes)

   RESOURCE option string
       -r lockspace_name:resource_name:path:offset

       lockspace_name    name of lockspace
       resource_name     name of resource
       path              path to storage reserved for leases
       offset            offset on path (bytes)

   RESOURCE option string with version
       -r lockspace_name:resource_name:path:offset:lver

       lver leader version or SH for shared lease
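
       Scripts that build or inspect these option strings can split them on the colons.  The sketch below
       parses a RESOURCE string with or without the version field.  It assumes the path itself contains no
       ':' characters; the function name is invented for this example.

```shell
# Sketch: split a RESOURCE option string into its named fields.
# Accepts lockspace_name:resource_name:path:offset[:lver].
# Assumes the path contains no ':' characters.
parse_resource() {
    IFS=: read -r p_lockspace p_resource p_path p_offset p_lver p_extra <<EOF
$1
EOF
    if [ -z "$p_offset" ] || [ -n "$p_extra" ]; then
        echo "expected 4 or 5 colon-separated fields: $1" >&2
        return 1
    fi
    echo "lockspace_name=$p_lockspace"
    echo "resource_name=$p_resource"
    echo "path=$p_path"
    echo "offset=$p_offset"
    if [ -n "$p_lver" ]; then
        echo "lver=$p_lver"
    fi
}

parse_resource "LS1:Adata:/dev/pool1/Adata_lease:0"
parse_resource "LS1:Adata:/dev/pool1/Adata_lease:0:SH"
```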

   Defaults
       sanlock help shows the default values for the options above.

       sanlock version shows the build version.

USAGE

   Request/Examine
       The first part of making a request for a resource is writing the request  record  of  the  resource  (the
       sector following the leader record).  To make a successful request:

       •  RESOURCE:lver must be greater than the lver presently held by the other host.  This implies the leader
          record must be read to discover the lver, prior to making a request.

       •  RESOURCE:lver  must be greater than or equal to the lver presently written to the request record.  Two
          hosts may write a new request at the same time for the same lver, in which case  both  would  succeed,
          but the force_mode from the last would win.

       •  The force_mode must be greater than zero.

       •  To unconditionally clear the request record (set both lver and force_mode to 0), make a request with
          RESOURCE:0 and force_mode 0.
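
       The rules in the list above can be expressed as one decision function.  This is a sketch of the
       listed conditions only, not sanlock's code; the function name and the "accept", "reject" and "clear"
       labels are invented for this example.

```shell
# Sketch: outcome of writing a new request, following the listed rules.
# $1 = lver held by the current owner (read from the leader record)
# $2 = lver currently written in the request record
# $3 = lver in the new request
# $4 = force_mode in the new request
request_outcome() {
    held=$1
    recorded=$2
    new_lver=$3
    force_mode=$4
    if [ "$new_lver" -eq 0 ] && [ "$force_mode" -eq 0 ]; then
        echo "clear"      # RESOURCE:0 with force_mode 0 always clears
    elif [ "$new_lver" -gt "$held" ] &&
         [ "$new_lver" -ge "$recorded" ] &&
         [ "$force_mode" -gt 0 ]; then
        echo "accept"
    else
        echo "reject"
    fi
}

request_outcome 3 0 4 1    # lver 4 is newer than the held lver 3
request_outcome 3 0 3 1    # lver 3 is not greater than the held lver
request_outcome 3 5 4 1    # lver 4 is older than the recorded request
request_outcome 3 5 0 0    # unconditional clear
```

       The four calls print accept, reject, reject and clear, in that order.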

       The owner of the requested resource will not know of the request unless it is explicitly told to examine
       its resources via the "examine" api/command, or otherwise notified.

       The  second  part  of  making  a request is notifying the resource lease owner that it should examine the
       request records of its resource leases.  The notification will cause the lease owner to automatically run
       the equivalent of "sanlock client examine -s LOCKSPACE" for the lockspace of the requested resource.

       The notification is made using a bitmap in each host_id delta lease.  Each bit represents one of the
       possible host_ids (1-2000).  If host A wants to notify host B to examine its resources, A sets the bit in
       its  own  bitmap  that corresponds to the host_id of B.  When B next renews its delta lease, it reads the
       delta leases for all hosts and checks each bitmap to see if its own host_id has been set.  It  finds  the
       bit  for  its own host_id set in A's bitmap, and examines its resource request records.  (The bit remains
       set in A's bitmap for request_finish_seconds.)
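
       The bitmap arithmetic behind this notification can be sketched as follows.  The bit layout used here
       (host_id 1 is the least significant bit of byte 0) is an assumption made for illustration; the
       on-disk layout is not described in this page, and the function names are invented.

```shell
# Sketch: locate, set and test the bit for a host_id in a bitmap.
# Assumed layout: host_id 1 = least significant bit of byte 0.

# Prints "byte_index bit_mask" for host_id $1.
bitmap_locate() {
    index=$(( $1 - 1 ))
    echo "$(( index / 8 )) $(( 1 << (index % 8) ))"
}

# Prints byte value $1 with the bits of mask $2 set.
set_bit() {
    echo $(( $1 | $2 ))
}

# Succeeds if any bit of mask $2 is set in byte value $1.
bit_is_set() {
    [ $(( $1 & $2 )) -ne 0 ]
}

# Host A wants host B (host_id 10) to examine its resources,
# so A sets B's bit in A's own bitmap.
set -- $(bitmap_locate 10)
byte_index=$1
mask=$2
a_byte=$(set_bit 0 "$mask")
echo "A sets byte $byte_index of its bitmap to $a_byte"

# On its next renewal, B reads A's bitmap and tests its own bit.
if bit_is_set "$a_byte" "$mask"; then
    echo "B finds its bit set and examines its resource requests"
fi
```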

       force_mode determines the action the resource lease owner should take:

       1 (KILL_PID): kill the process holding the resource lease.  When the process  has  exited,  the  resource
       lease will be released, and can then be acquired by anyone.

SEE ALSO

       wdmd(8)

                                                   2011-08-05                                         SANLOCK(8)