Provided by: lam-runtime_7.1.4-7_amd64

NAME

       lamshrink - Shrink a LAM universe.

SYNOPSIS

       lamshrink [-dhv] [-w delay] nodeid

OPTIONS

       -d            Print detailed debugging information.

       -h            Print usage information for this command.

       -v            Be verbose.

       -w delay      Notify  processes  on  the  doomed  node  and pause for delay seconds before
                     proceeding.

       nodeid        Remove the LAM node with this ID.

DESCRIPTION

       An existing LAM session, initiated by lamboot(1), can be shrunk to include fewer nodes with
       lamshrink.  One node is removed per invocation.  The ID of the node to remove must be given
       on the command line.  Once lamshrink completes, that node ID is no longer valid on any of
       the remaining nodes (as can be seen by running lamnodes(1)).

       Existing application processes on the target node can be warned of impending shutdown with
       the -w option.  A LAM signal (SIGFUSE) will be sent to these processes and lamshrink  will
       then  pause  for the given number of seconds before proceeding with removing the node.  By
       default, SIGFUSE is ignored.  A different handler can be installed with ksignal(2).
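       To make use of the -w grace period, an application needs a handler that only records the
       warning.  The real work, such as checkpointing, is then left to the main loop, which must
       finish it before the delay expires.  The sketch below shows this pattern as an analogy
       only.  It uses POSIX sigaction(2), with SIGUSR1 standing in for SIGFUSE, because the exact
       ksignal(2) signature is not given on this page.  The names fuse_pending,
       on_fuse_warning(), install_warning_handler() and take_fuse_warning() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set asynchronously by the handler; polled by the application's main loop. */
static volatile sig_atomic_t fuse_pending = 0;

/* Stand-in for a SIGFUSE handler: it only records that the warning arrived,
 * because very little work is async-signal-safe inside a handler. */
static void on_fuse_warning(int signo)
{
    (void)signo;
    fuse_pending = 1;
}

/* Install the handler.  SIGUSR1 is a POSIX stand-in for LAM's SIGFUSE;
 * a LAM program would use ksignal(2) instead. */
static int install_warning_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fuse_warning;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Called from the main loop.  Returns 1 (and clears the flag) if a warning
 * is pending, so the caller can save state before the -w delay runs out. */
static int take_fuse_warning(void)
{
    if (!fuse_pending)
        return 0;
    fuse_pending = 0;
    return 1;
}
```

       A typical main loop calls take_fuse_warning() once per iteration.  When it returns 1, the
       loop writes out whatever must survive the node's removal.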

       All application processes on all remaining nodes are always informed of the death of a
       node.  This is also done with a signal (SIGSHRINK), which by default causes a process's
       runtime route cache to be flushed (removing any cached information about the dead node).
       If this signal is re-vectored (given a new handler) for fault tolerance, the new handler
       should call the old handler first.  The signal does not, by itself, tell the process which
       node has been removed.  One way to find out is to query the router about every relevant
       node with getroute(2); the call returns an error for the dead node.
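       The two recommendations above can be sketched in C: call the previous handler first, then
       find the removed node by probing routes.  This is an analogy under stated assumptions.  It
       uses POSIX sigaction(2), with SIGUSR2 standing in for SIGSHRINK, because the exact
       ksignal(2) and getroute(2) signatures are not given on this page.
       simulated_getroute() is a hypothetical stand-in for getroute(2) that fails for one chosen
       node.  default_shrink_handler() stands in for LAM's default route-cache flush.  All other
       names are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Disposition that was in place before the fault-tolerance handler. */
static struct sigaction old_shrink_action;

/* Observable effects, inspected outside the handlers. */
static volatile sig_atomic_t default_handler_calls = 0;
static volatile sig_atomic_t shrink_seen = 0;

/* Stand-in for LAM's default SIGSHRINK action (flushing the route cache). */
static void default_shrink_handler(int signo)
{
    (void)signo;
    default_handler_calls++;
}

/* Replacement handler: run the old handler first, then do the
 * fault-tolerance work (here it only sets a flag). */
static void ft_shrink_handler(int signo)
{
    void (*prev)(int) = old_shrink_action.sa_handler;
    if (prev != SIG_DFL && prev != SIG_IGN)
        prev(signo);
    shrink_seen = 1;
}

/* Install handler on signo, saving the previous action in *old if non-NULL. */
static int set_handler(int signo, void (*handler)(int), struct sigaction *old)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, old);
}

/* Hypothetical stand-in for getroute(2): fails only for the removed node. */
static int simulated_dead_node = -1;
static int simulated_getroute(int node)
{
    return node == simulated_dead_node ? -1 : 0;
}

/* Probe each node of interest, collect those whose probe fails in dead[],
 * and return how many were found. */
static int find_dead_nodes(const int *nodes, int n, int (*probe)(int),
                           int *dead)
{
    int ndead = 0;
    for (int i = 0; i < n; i++)
        if (probe(nodes[i]) != 0)
            dead[ndead++] = nodes[i];
    return ndead;
}
```

       In this model, the first set_handler() call plays the role of LAM's default handler already
       being in place.  The second call re-vectors the signal and saves the old action.  Probing
       with find_dead_nodes() belongs in the main loop after shrink_seen is set, not inside the
       handler itself.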

   FAULT TOLERANCE
       If fault tolerance is enabled with lamboot(1), LAM watches for nodes that fail.  A failed
       node is removed by the same procedure lamshrink uses, minus the -w warning step.  In
       particular, the SIGSHRINK signal is delivered to processes on the remaining nodes.

EXAMPLES

       lamshrink -v n1
           Remove LAM on n1.  Report important steps as they are done.

       lamshrink -w 10 n30
           Inform all processes on LAM node 30 that the node will be dead in 10 seconds.  Wait
           10 seconds, then remove the node.  Operate silently.

SEE ALSO

       lamboot(1), lamnodes(1), ksignal(2), getroute(2)