Provided by: sanlock_2.2-2_amd64

NAME

       sanlock - shared storage lock manager

SYNOPSIS

       sanlock [COMMAND] [ACTION] ...

DESCRIPTION

       The  sanlock  daemon  manages  leases  for applications running on a cluster of hosts with
       shared storage.  All lease management and coordination is done through reading and writing
       blocks  on  the  shared  storage.  Two types of leases are used, each based on a different
       algorithm:

       "delta leases" are slow to acquire and require regular i/o to  shared  storage.   A  delta
       lease  exists  in  a single sector of storage.  Acquiring a delta lease involves reads and
       writes to that sector separated by specific  delays.   Once  acquired,  a  lease  must  be
       renewed  by  updating  a  timestamp  in  the sector regularly.  sanlock uses a delta lease
       internally to hold a lease on a host_id.  host_id leases prevent two hosts from using  the
       same host_id and provide basic host liveness information based on the renewals.

       "paxos  leases"  are  generally  fast  to  acquire  and  sanlock  makes  them available to
       applications as general purpose resource leases.  A paxos lease exists in  1MB  of  shared
       storage  (8MB  for  4k  sectors).   Acquiring  a  paxos lease involves reads and writes to
       max_hosts (2000) sectors in a specific sequence specified by  the  Disk  Paxos  algorithm.
       paxos  leases  use  host_id's  internally  to  indicate  the  owner  of the lease, and the
       algorithm fails if different hosts use the same host_id.  So,  delta  leases  provide  the
       unique  host_id's  used in paxos leases.  paxos leases also refer to delta leases to check
       if a host_id is alive.

       Before sanlock can be used, the user must assign each host a host_id, which  is  a  number
       between  1  and  2000.   Two hosts should not be given the same host_id (even though delta
       leases attempt to detect this mistake.)

       sanlock views a pool of storage as a "lockspace".  Each distinct  pool  of  storage,  e.g.
       from  different sources, would typically be defined as a separate lockspace, with a unique
       lockspace name.

       Part of this storage space must be reserved and initialized for  sanlock  to  store  delta
       leases.  Each host that wants to use the lockspace must first acquire a delta lease on its
       host_id number within the lockspace.   (See  the  add_lockspace  action/api.)   The  space
       required  for 2000 delta leases in the lockspace (for 2000 possible host_id's) is 1MB (8MB
       for 4k sectors).  (This is the same size required for a single paxos lease.)
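
       The sizes above can be checked with simple arithmetic.  The following is a minimal
       sketch, assuming that "1MB" and "8MB" mean 1048576 and 8388608 bytes and that each
       delta lease occupies exactly one sector, as described earlier.  The function name
       lease_bytes_needed is illustrative and not part of sanlock.

```shell
# Bytes occupied by 2000 single-sector delta leases, compared with the
# reserved area size.  Assumption: 1MB = 1048576 bytes, 8MB = 8388608 bytes.
max_hosts=2000

lease_bytes_needed() {
    # $1 = sector size in bytes; one sector per host_id
    echo $(( max_hosts * $1 ))
}

echo "512 byte sectors:  $(lease_bytes_needed 512) of $((1024 * 1024)) bytes used"
echo "4096 byte sectors: $(lease_bytes_needed 4096) of $((8 * 1024 * 1024)) bytes used"
```

       In both cases the 2000 sectors fit inside the reserved area with some space left
       over.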

       More storage space must be reserved and initialized for paxos  leases,  according  to  the
       needs of the applications using sanlock.

       The  following  steps  illustrate these concepts using the command line.  Applications may
       choose to do these same steps through libsanlock.

       1. Create storage pools and reserve and initialize host_id leases
       two different LUNs on a SAN: /dev/sdb, /dev/sdc
       # vgcreate pool1 /dev/sdb
       # vgcreate pool2 /dev/sdc
       # lvcreate -n hostid_leases -L 1MB pool1
       # lvcreate -n hostid_leases -L 1MB pool2
       # sanlock direct init -s LS1:0:/dev/pool1/hostid_leases:0
       # sanlock direct init -s LS2:0:/dev/pool2/hostid_leases:0

       2. Start the sanlock daemon on each host
       # sanlock daemon

       3. Add each lockspace to be used
       host1:
       # sanlock client add_lockspace -s LS1:1:/dev/pool1/hostid_leases:0
       # sanlock client add_lockspace -s LS2:1:/dev/pool2/hostid_leases:0
       host2:
       # sanlock client add_lockspace -s LS1:2:/dev/pool1/hostid_leases:0
       # sanlock client add_lockspace -s LS2:2:/dev/pool2/hostid_leases:0

       4. Applications can now reserve/initialize space for resource leases, and then acquire the
       leases as they need to access the resources.

       The  resource  leases  that  are created and how they are used depends on the application.
       For example, say application A, running on host1 and host2, needs to synchronize access to
       data it stores on /dev/pool1/Adata.  A could use a resource lease as follows:

       5. Reserve and initialize a single resource lease for Adata
       # lvcreate -n Adata_lease -L 1MB pool1
       # sanlock direct init -r LS1:Adata:/dev/pool1/Adata_lease:0

       6.   Acquire   the   lease   from   the   app   using  libsanlock  (see  sanlock_register,
       sanlock_acquire).  If the app is already running as pid 123, and has registered  with  the
       sanlock daemon, the lease can be added for it manually.
       # sanlock client acquire -r LS1:Adata:/dev/pool1/Adata_lease:0 -p 123
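
       Step 3 differs between hosts only in the host_id field, so the commands can be
       generated from the host_id.  The following dry-run sketch prints the commands
       without running them; it assumes the LS1/LS2 lockspaces and pool1/pool2 paths from
       the steps above, and the function name print_add_lockspace_cmds is illustrative.

```shell
# Dry run: print (do not execute) the add_lockspace commands for one host.
# Lockspace names and paths are the ones used in the example steps.
print_add_lockspace_cmds() {
    host_id=$1
    case "$host_id" in
        ''|*[!0-9]*) echo "host_id must be a number" >&2; return 1 ;;
    esac
    if [ "$host_id" -lt 1 ] || [ "$host_id" -gt 2000 ]; then
        echo "host_id must be between 1 and 2000" >&2
        return 1
    fi
    for pair in LS1:pool1 LS2:pool2; do
        name=${pair%%:*}    # text before the colon
        pool=${pair#*:}     # text after the colon
        echo "sanlock client add_lockspace -s $name:$host_id:/dev/$pool/hostid_leases:0"
    done
}

print_add_lockspace_cmds 1
```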

       offsets

       offsets  must  be  1MB  aligned for disks with 512 byte sectors, and 8MB aligned for disks
       with 4096 byte sectors.

       offsets may be used to place leases on the same device rather than using separate  devices
       and offset 0 as shown in examples above, e.g. these commands above:
       # sanlock direct init -s LS1:0:/dev/pool1/hostid_leases:0
       # sanlock direct init -r LS1:Adata:/dev/pool1/Adata_lease:0
       could be replaced by:
       # sanlock direct init -s LS1:0:/dev/pool1/leases:0
       # sanlock direct init -r LS1:Adata:/dev/pool1/leases:1048576
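
       When several lease areas share one device, each offset is a multiple of the
       alignment.  The following is a minimal sketch of that calculation, assuming 1MB =
       1048576 bytes and 8MB = 8388608 bytes.  The helper names lease_align, lease_offset
       and offset_is_aligned are illustrative and not part of sanlock.

```shell
# Required alignment in bytes for a given sector size.
lease_align() {
    case "$1" in
        512)  echo 1048576 ;;   # 1MB
        4096) echo 8388608 ;;   # 8MB
        *)    echo "unsupported sector size: $1" >&2; return 1 ;;
    esac
}

# Byte offset of lease area number $1 (0 = first area) for sector size $2.
lease_offset() {
    align=$(lease_align "$2") || return 1
    echo $(( $1 * align ))
}

# Succeeds only if offset $1 is a multiple of the alignment for sector size $2.
offset_is_aligned() {
    align=$(lease_align "$2") || return 1
    [ $(( $1 % align )) -eq 0 ]
}

# Second area on a 512 byte sector device; printed, not executed.
echo "sanlock direct init -r LS1:Adata:/dev/pool1/leases:$(lease_offset 1 512)"
```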

       failures

       If  a process holding resource leases fails or exits without releasing its leases, sanlock
       will release the leases for it automatically.

       If the sanlock daemon cannot renew a lockspace host_id  for  a  specific  period  of  time
       (usually because storage access is lost), sanlock will kill any process holding a resource
       lease within the lockspace.

       If the sanlock daemon crashes or gets stuck, it will no longer renew the  expiry  time  of
       its  per-host_id  connections  to  the wdmd daemon, and the watchdog device will reset the
       host.

       watchdog

       sanlock uses the wdmd(8) daemon to access /dev/watchdog.  A separate connection to wdmd is
       maintained for each host_id being renewed.  Each host_id connection has an expiry time
       some seconds in the future.  After each successful host_id renewal, sanlock updates the
       associated expiry time in wdmd.  If wdmd finds any connection expired, it will not pet
       /dev/watchdog.  After enough successive expired/failed checks, the watchdog device will
       fire and reset the host.

       After a number of failed attempts to renew a host_id, sanlock kills any process using that
       lockspace.  Once all those processes have exited, sanlock will unregister  the  associated
       wdmd connection.  wdmd will no longer find the expired connection, and will resume petting
       /dev/watchdog (assuming it finds no other failed/expired tests.)  If the killed  processes
       did  not  exit  quickly  enough, the expired wdmd connection will not be unregistered, and
       /dev/watchdog will reset the host.

       Because these timeout values are known, sanlock on another host can calculate, from the
       last host_id renewal, when the failed host will have been reset by its watchdog (or will
       have killed all the necessary processes).

       If the sanlock daemon itself fails, crashes, or gets stuck, it will no longer update the
       expiry time for its host_id connections to wdmd, which will also lead to the watchdog
       resetting the host.

       safety

       sanlock leases are meant to guarantee that two processes on two hosts are never allowed to
       hold the same resource lease at once.  If they were, the resource being protected could be
       corrupted.  There are three levels of protection built into sanlock itself:

       1. The paxos leases and delta leases themselves.

       2. If the leases cannot function because storage  access  is  lost  (host_id's  cannot  be
       renewed), the sanlock daemon kills any pids using resource leases in the lockspace.

       3.  If  the  pids  do  not  exit  after  being killed, or if the sanlock daemon fails, the
       watchdog device resets the host.

OPTIONS

       COMMAND can be one of three primary top level choices

       sanlock daemon start daemon
       sanlock client send request to daemon (default command if none given)
       sanlock direct access storage directly (no coordination with daemon)

       sanlock daemon [options]

       -D no fork and print all logging to stderr

       -Q 0|1 quiet error messages for common lock contention

       -R 0|1 renewal debugging, log debug info for each renewal

       -L pri write logging at priority level and up to logfile (-1 none)

       -S pri write logging at priority level and up to syslog (-1 none)

       -U uid user id

       -G gid group id

       -t num max worker threads

       -w 0|1 use watchdog through wdmd

       -h 0|1 use high priority features (realtime scheduling, mlockall)

       -a 0|1 use async i/o

       -o sec io timeout in seconds

       sanlock client action [options]

       sanlock client status

       Print processes, lockspaces, and resources being managed by the sanlock daemon.  Add -D to
       show extra internal daemon status for debugging.  Add -o p to show resources by pid, or -o
       s to show resources by lockspace.

       sanlock client host_status -s LOCKSPACE

       Print state of host_id delta leases read during the last renewal.  Only lockspace_name  is
       used  from  the  LOCKSPACE  argument.   Add  -D  to  show extra internal daemon status for
       debugging.

       sanlock client log_dump

       Print the sanlock daemon internal debug log.

       sanlock client shutdown

       Ask the sanlock daemon to exit.  Without the force option (-f  0),  the  command  will  be
       ignored  if  any lockspaces exist.  With the force option (-f 1), any registered processes
       will be killed, their resource leases released, and lockspaces removed.

       sanlock client init -s LOCKSPACE
       sanlock client init -r RESOURCE

       Tell the sanlock daemon to initialize storage for lease areas.  (See sanlock direct init.)

       sanlock client align -s LOCKSPACE

       Tell the sanlock daemon to report the required lease alignment for a storage  path.   Only
       path is used from the LOCKSPACE argument.

       sanlock client add_lockspace -s LOCKSPACE

       Tell  the  sanlock  daemon  to  acquire the specified host_id in the lockspace.  This will
       allow resources to be acquired in the lockspace.

       sanlock client inq_lockspace -s LOCKSPACE

       Ask the sanlock daemon whether the lockspace is acquired or not.

       sanlock client rem_lockspace -s LOCKSPACE

       Tell the sanlock daemon to release the specified host_id in the lockspace.  Any processes
       holding resource leases in this lockspace will be killed, and their resource leases will
       not be released.

       sanlock client command -r RESOURCE -c path args

       Register with the sanlock daemon, acquire the  specified  resource  lease,  and  exec  the
       command  at  path  with args.  When the command exits, the sanlock daemon will release the
       lease.  -c must be the final option.

       sanlock client acquire -r RESOURCE -p pid
       sanlock client release -r RESOURCE -p pid

       Tell the sanlock daemon to acquire or release the specified resource lease for  the  given
       pid.   The  pid must be registered with the sanlock daemon.  acquire can optionally take a
       versioned RESOURCE string RESOURCE:lver, where lver is the version of the lease that  must
       be acquired, or fail.

       sanlock client inquire -p pid

       Print the resource leases held by the given pid.  The format is a versioned RESOURCE string
       "RESOURCE:lver" where lver is the version of the lease held.

       sanlock client request -r RESOURCE -f force_mode

       Request the owner of a  resource  do  something  specified  by  force_mode.   A  versioned
       RESOURCE:lver  string  must  be  used with a greater version than is presently held.  Zero
       lver and force_mode clears the request.

       sanlock client examine -r RESOURCE

       Examine the request record for the currently held resource lease and carry out the  action
       specified by the requested force_mode.

       sanlock client examine -s LOCKSPACE

       Examine  requests  for  all  resource  leases currently held in the named lockspace.  Only
       lockspace_name is used from the LOCKSPACE argument.

       sanlock direct action [options]

       -a 0|1 use async i/o

       -o sec io timeout in seconds

       sanlock direct init -s LOCKSPACE
       sanlock direct init -r RESOURCE

       Initialize storage for 2000 host_id (delta) leases for the given lockspace, or initialize
       storage for one resource (paxos) lease.  Both options require 1MB of space (8MB for 4k
       sectors).  The host_id in the LOCKSPACE string is not relevant to initialization, so the
       value is ignored.  (The default of 2000 host_ids can be changed for special cases using
       the -n num_hosts and -m max_hosts options.)

       sanlock direct read_leader -s LOCKSPACE
       sanlock direct read_leader -r RESOURCE

       Read a leader record from disk and print the fields.  The  leader  record  is  the  single
       sector of a delta lease, or the first sector of a paxos lease.

       sanlock direct read_id -s LOCKSPACE
       sanlock direct live_id -s LOCKSPACE

       read_id reads a host_id and prints the owner.  live_id reads a host_id once a second until
       the timestamp or owner changes (prints live 1), or until host_dead_seconds have passed
       (prints live 0).  (host_dead_seconds is derived from the io_timeout option.  The live 0|1
       conclusion will not match the sanlock daemon's conclusion unless the configured timeouts
       match.)

       sanlock direct dump path[:offset]

       Read disk sectors and print leader records for delta or paxos leases.  Add -f 1  to  print
       the request record values for paxos leases, and host_ids set in delta lease bitmaps.

   LOCKSPACE option string
       -s lockspace_name:host_id:path:offset

       lockspace_name name of lockspace
       host_id local host identifier in lockspace
       path path to storage reserved for leases
       offset offset on path (bytes)

   RESOURCE option string
       -r lockspace_name:resource_name:path:offset

       lockspace_name name of lockspace
       resource_name name of resource
       path path to storage reserved for leases
       offset offset on path (bytes)

   RESOURCE option string with version
       -r lockspace_name:resource_name:path:offset:lver

       lver leader version or SH for shared lease
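
       The fields of a RESOURCE string can be split on colons.  The following is a minimal
       sketch, assuming that neither the names nor the path contain a colon or shell
       wildcard characters.  The function name parse_resource is illustrative and not part
       of sanlock.

```shell
# Split lockspace_name:resource_name:path:offset[:lver] into named fields.
# Assumption: no field contains ':' or shell wildcard characters.
parse_resource() {
    old_ifs=$IFS
    IFS=:
    set -- $1            # unquoted on purpose: split the string on ':'
    IFS=$old_ifs
    if [ $# -lt 4 ] || [ $# -gt 5 ]; then
        echo "expected 4 or 5 colon-separated fields" >&2
        return 1
    fi
    echo "lockspace_name=$1"
    echo "resource_name=$2"
    echo "path=$3"
    echo "offset=$4"
    [ $# -eq 5 ] && echo "lver=$5"
    return 0
}

parse_resource LS1:Adata:/dev/pool1/Adata_lease:0:SH
```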

   Defaults
       sanlock help shows the default values for the options above.

       sanlock version shows the build version.

USAGE

   Request/Examine
       The  first  part  of  making a request for a resource is writing the request record of the
       resource (the sector following the leader record).  To make a successful request:

       •  RESOURCE:lver must be greater than the lver presently held by  the  other  host.   This
          implies the leader record must be read to discover the lver, prior to making a request.

       •  RESOURCE:lver  must  be  greater  than  or  equal  to the lver presently written to the
          request record.  Two hosts may write a new request at the same time for the same  lver,
          in which case both would succeed, but the force_mode from the last would win.

       •  The force_mode must be greater than zero.

       •  To unconditionally clear the request record (set both lver and force_mode to 0), make
          a request with RESOURCE:0 and force_mode 0.
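
       The rules above can be summarized as a single check.  The following sketch only
       restates the listed conditions; it is not how sanlock implements them, and the
       function name check_request and its argument order are illustrative.

```shell
# check_request REQ_LVER FORCE_MODE HELD_LVER RECORD_LVER
#   REQ_LVER     lver given in RESOURCE:lver
#   HELD_LVER    lver presently held by the other host (from the leader record)
#   RECORD_LVER  lver presently written to the request record
# Prints "clear", "ok", or "reject".
check_request() {
    req_lver=$1 force_mode=$2 held_lver=$3 record_lver=$4
    if [ "$req_lver" -eq 0 ] && [ "$force_mode" -eq 0 ]; then
        echo clear       # RESOURCE:0 with force_mode 0 clears the record
    elif [ "$req_lver" -gt "$held_lver" ] &&
         [ "$req_lver" -ge "$record_lver" ] &&
         [ "$force_mode" -gt 0 ]; then
        echo ok
    else
        echo reject
    fi
}

check_request 5 1 4 0
```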

       The owner of the requested resource will not know of the request unless it is explicitly
       told to examine its resources via the "examine" api/command, or otherwise notified.

       The  second  part of making a request is notifying the resource lease owner that it should
       examine the request records of its resource leases.  The notification will cause the lease
       owner to automatically run the equivalent of "sanlock client examine -s LOCKSPACE" for the
       lockspace of the requested resource.

       The notification is made using a bitmap in each host_id delta lease.  Each bit  represents
       each  of  the possible host_ids (1-2000).  If host A wants to notify host B to examine its
       resources, A sets the bit in its own bitmap that corresponds to the host_id of B.  When  B
       next  renews  its  delta  lease,  it  reads the delta leases for all hosts and checks each
       bitmap to see if its own host_id has been set.  It finds the bit for its own  host_id  set
       in  A's  bitmap,  and  examines its resource request records.  (The bit remains set in A's
       bitmap for request_finish_seconds.)
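
       A bitmap with one bit per host_id needs 250 bytes for 2000 host_ids.  The following
       sketch shows how a host_id could map to a byte and bit within such a bitmap.  The
       bit layout used here (host_id 1 in the lowest bit of byte 0) is an assumption made
       for illustration; this page does not specify the on-disk layout.  The function name
       bitmap_pos is also illustrative.

```shell
# Assumed layout (not taken from sanlock): host_id 1 is bit 0 of byte 0.
bitmap_bytes=$(( (2000 + 7) / 8 ))    # bytes needed for 2000 bits

# Print "byte_index bit_mask" for a host_id between 1 and 2000.
bitmap_pos() {
    host_id=$1
    if [ "$host_id" -lt 1 ] || [ "$host_id" -gt 2000 ]; then
        echo "host_id must be between 1 and 2000" >&2
        return 1
    fi
    idx=$(( host_id - 1 ))
    echo "$(( idx / 8 )) $(( 1 << (idx % 8) ))"
}

echo "bitmap size: $bitmap_bytes bytes"
echo "host_id 9 -> byte and mask: $(bitmap_pos 9)"
```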

       force_mode determines the action the resource lease owner should take:

       1 (KILL_PID): kill the process holding the resource lease.  When the process  has  exited,
       the resource lease will be released, and can then be acquired by anyone.

SEE ALSO

       wdmd(8)

                                            2011-08-05                                 SANLOCK(8)