Provided by: blcr-util_0.8.5-2.1_amd64

NAME

       cr_restart - restarts a process, process group, or session from a checkpoint file.

SYNOPSIS

       cr_restart [options] [checkpoint_file]

DESCRIPTION

       cr_restart  restarts  a  process (or set of processes) from a checkpoint file created with
       cr_checkpoint(1).

       A restarted process has all of the attributes it had at checkpoint time, including its
       process id.  If any needed resource cannot be obtained for the processes in a checkpoint
       file (e.g., a pid is in use), cr_restart will fail.  If a process group or session is
       restarted,  all  parent/child  relations  and  pipes,  etc.,  between the processes in the
       checkpoint will be correctly restored.

       If the stdin/stdout/stderr of  any  restarted  process  was  directed  to  a  terminal  at
       checkpoint time, it is redirected to the controlling terminal of the cr_restart program.

       The  current  working  directory  of  a  restarted  process  is  the  same  as when it was
       checkpointed, regardless of where the context file is  located,  or  where  cr_restart  is
       invoked.

       The  cr_restart  process  becomes the parent of the 'eldest' process in any restarted job.
       This means that getppid(2) may return a  different  value  to  the  eldest  process  after
       restart.  When the eldest restarted process exits (or dies from a signal), cr_restart will
       exit with the same error code (or kill itself with the same  signal),  so  it  is  largely
       invisible  (it  is  necessary  to  keep  cr_restart  `in-between' your shell and restarted
       processes, however, as most Unix shells get quite confused if they observe their  children
       changing process ids).
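
       The exit-status mirroring described above can be sketched in Python.  This only
       illustrates the general technique (wait for a child, then end the same way it did) and
       is not cr_restart's actual implementation; the names describe_exit and mirror_child are
       hypothetical.

```python
import os
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Translate a subprocess return code into ('exit', code) or ('signal', signum)."""
    if returncode < 0:
        # subprocess reports death by signal N as the return code -N
        return ("signal", -returncode)
    return ("exit", returncode)


def mirror_child(argv):
    """Run argv, then terminate this process the same way the child ended."""
    kind, value = describe_exit(subprocess.call(argv))
    if kind == "signal":
        try:
            # Restore the default action so the signal is fatal here too.
            signal.signal(value, signal.SIG_DFL)
        except (OSError, ValueError):
            pass  # SIGKILL and SIGSTOP cannot have their disposition changed
        os.kill(os.getpid(), value)
        sys.exit(128 + value)  # fallback if the signal did not terminate us
    sys.exit(value)
```

       Because the wrapper ends the same way as its child, a shell that started the wrapper
       sees the child's own exit status or fatal signal.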

   Signals
       By default restarted processes begin to run after the restart is complete.  Alternatively,
       you may specify that they be  stopped  (via  --stop),  or  terminated/aborted/killed  (via
       --term,  --abort,  or  --kill).   This  is done by sending the appropriate signal to every
       process that is part of the restart.  If the  processes  were  stopped  at  the  time  the
       checkpoint  was  requested, then --cont may be used to send SIGCONT to all processes after
       the restart is completed.

   Error handling
       By default cr_restart will block until the restarted process has completed, and will  exit
       with the same exit value as the restarted process (even if the restarted process died with
       a fatal signal).  This can make it nearly impossible to determine if a non-zero exit  from
       cr_restart  is  due  to a failure to restart, or is the exit code of a correctly restarted
       process.   The  simple  approach  of  looking  for  'Restart  failed:'  is  not  reliable.
       Therefore, the --run-on-* family of flags is available to supply alternative (or
       supplementary) error handling.  When any of the --run-on-* flags  is  passed,  a  hook  is
       installed for the given category of failure (or success), as defined below.  When an error
       (or success) is detected and a corresponding hook is installed, the hook is  run  via  the
       system(3)  function.   If  the  exit code of the hook is non-zero, then cr_restart returns
       this value, suppressing any error message that would otherwise be generated.  If  no  hook
       is  installed,  the  hook is an empty string, or if the hook returns an exit code of zero,
       then an explanatory error message is printed and an exit code related to the  errno  value
       at the time of failure is returned.

       --run-on-success='cmd'
              Runs  the  given  command  as  soon  as  the  restarted process(es) are known to be
              running.  If the return value of 'cmd' is non-zero, this also results in cr_restart
              terminating without waiting on termination of the restarted process(es).

       --run-on-fail-args='cmd'
              Runs  the  given  command  if the arguments are invalid.  This includes the case in
              which the given context file is missing or unreadable.

       --run-on-fail-temp='cmd'
              Runs the given command if a "temporary" failure is  detected.   This  includes  the
              case of a required pid being in use.

       --run-on-fail-perm='cmd'
              Runs the given command if a "permanent" failure is detected.  This is most commonly
              due to a corrupted context file.

       --run-on-fail-env='cmd'
              Runs the given command if an "environmental" failure is  detected.   This  includes
              when files required for restarting are missing or inaccessible.

       --run-on-failure='cmd'
              This installs the given command for all of the --run-on-fail-* hooks.
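
       One way to use these hooks from a wrapper script is to have the failure hook exit with
       a sentinel value.  The Python sketch below assumes that a hook of the form `exit 99'
       is acceptable to the shell started by system(3), and that the restarted job never
       exits with 99 itself; the sentinel value and all function names are hypothetical.

```python
import subprocess

RESTART_FAILED = 99  # hypothetical sentinel; pick a value your job never returns


def build_restart_argv(context_file, sentinel=RESTART_FAILED):
    """Build a cr_restart command line whose failure hook exits with `sentinel`."""
    # The hook runs via system(3), so a bare shell `exit N` yields exit code N.
    return ["cr_restart", f"--run-on-failure=exit {sentinel}", context_file]


def classify(returncode, sentinel=RESTART_FAILED):
    """Interpret the return code of a cr_restart started with the argv above."""
    if returncode == sentinel:
        return "restart failed"
    if returncode < 0:
        # cr_restart kills itself with the signal that killed the restarted process
        return f"restarted process killed by signal {-returncode}"
    return f"restarted process exited with code {returncode}"


def restart(context_file):
    """Run cr_restart and classify the outcome (requires BLCR to be installed)."""
    result = subprocess.run(build_restart_argv(context_file))
    return classify(result.returncode)
```

       A non-zero hook exit code also suppresses the error message cr_restart would otherwise
       print, so a hook like this may want to log the failure on its own.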

   File relocation
       By default, files and directories are saved `by reference', storing their full pathname in
       the context file.  This includes files  associated  with  a  process  via  open(2)  and/or
       mmap(2)  and  directories  associated  via opendir(3) or as the current working directory.
       Use of --relocate oldpath=newpath allows remapping of  such  paths  to  new  locations  at
       restart-time.

       When  parsing  the  --relocate argument the sequences `\=' and `\\' are interpreted as `='
       and `\', respectively, to allow for  paths  that  contain  the  `='  character.   The  `\'
       character  is  not  special  in  any  other  context.  (Note that command shells also have
       special treatment of `\' and you may therefore need quotes or additional `\' characters to
       pass the argument you intend.)
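
       The escaping rule above can be modelled with a short Python sketch.  It is not taken
       from the cr_restart source: the function name parse_relocate is hypothetical, and the
       sketch assumes that the first unescaped `=' separates oldpath from newpath and that
       any later unescaped `=' is kept literally in newpath.

```python
def parse_relocate(arg):
    """Split an OLDPATH=NEWPATH argument, honouring the two escape sequences."""
    old, new = [], []
    target = old
    seen_separator = False
    i = 0
    while i < len(arg):
        ch = arg[i]
        # Only backslash-equals and backslash-backslash are escapes.
        if ch == "\\" and i + 1 < len(arg) and arg[i + 1] in "=\\":
            target.append(arg[i + 1])
            i += 2
            continue
        # Assumption: the first unescaped '=' is the separator.
        if ch == "=" and not seen_separator:
            seen_separator = True
            target = new
            i += 1
            continue
        # Any other character, including a lone backslash, is literal.
        target.append(ch)
        i += 1
    if not seen_separator:
        raise ValueError("expected OLDPATH=NEWPATH")
    return "".join(old), "".join(new)
```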

       When  file  or  directory  associations are restored, the oldpath is compared to the saved
       fullpath of each file or directory.  If it matches the leading components of the path, the
       matching portion is replaced by the value of newpath.  Note that oldpath must match entire
       path components, and only leading components.  Therefore an oldpath of /tmp/foo will match
       /tmp/foo or /tmp/foo/1, but will not match /tmp/fooz (not matching the full component
       fooz) or /var/tmp/foo (not matching the leading component /var).

       It is important to be aware that the saved fullpaths in a context file are the canonical
       paths.   Therefore  the  oldpath  you  provide  must  also be a canonical path, though the
       newpath doesn't need to be.  For instance, if /tmp is a symbolic link to /var/tmp, then if
       your application opens the file /tmp/work/1234 the path stored in the context file will be
       /var/tmp/work/1234.  Therefore,
              --relocate /tmp/work=/tmp/play
       would not work as desired, but either of the following would:
              --relocate /var/tmp/work=/tmp/play
              --relocate /var/tmp/work=/var/tmp/play
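
       A wrapper script can canonicalize oldpath before building the option.  The Python
       sketch below uses os.path.realpath for this and assumes that symbolic links on the
       restart host resolve as they did at checkpoint time; the helper names are
       hypothetical, and escaping both halves of the argument is an assumption based on the
       parsing rule described earlier.

```python
import os


def escape_relocate(path):
    """Escape backslash and '=' so the path survives --relocate parsing."""
    return path.replace("\\", "\\\\").replace("=", "\\=")


def relocate_args(oldpath, newpath):
    """Return ['--relocate', 'OLD=NEW'] with OLD resolved to its canonical path."""
    # realpath resolves symbolic links, e.g. /tmp/work -> /var/tmp/work
    # when /tmp is a link to /var/tmp.
    canonical_old = os.path.realpath(oldpath)
    return ["--relocate", f"{escape_relocate(canonical_old)}={escape_relocate(newpath)}"]
```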

       If the --relocate option is passed multiple times, all are applied  to  restored  file  or
       directory  associations, but only the first match is applied to any given path.  Currently
       a maximum of 16 relocations is supported.
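
       The matching rules (whole leading components only, first match wins, at most 16
       relocations) can be illustrated with the following Python sketch.  It models the
       documented behaviour only and is not derived from the BLCR implementation; the names
       relocate and MAX_RELOCATIONS are hypothetical.

```python
MAX_RELOCATIONS = 16  # limit stated in the text above


def relocate(path, rules):
    """Apply the first matching (oldpath, newpath) rule to a saved canonical path."""
    if len(rules) > MAX_RELOCATIONS:
        raise ValueError("too many relocations")
    for old, new in rules:
        old = old.rstrip("/")
        # Match the whole path, or a prefix that ends on a component boundary.
        if path == old or path.startswith(old + "/"):
            return new + path[len(old):]
    return path  # no rule matched; the path is left unchanged
```

       With the single rule ("/tmp/foo", "/scratch"), /tmp/foo/1 becomes /scratch/1, while
       /tmp/fooz and /var/tmp/foo are returned unchanged.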

   PID and related identifiers
       By default, processes are restarted with the same pid and thread id (as returned by
       getpid(2) and gettid(2), respectively).  This default ensures that processes and threads
       that signal each other and processes that wait  on  children  will  continue  to  function
       correctly.   However,  this  prevents  restarting concurrent instances of the same context
       file.
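
       Because the saved pid must be free at restart time, a script may want to check for a
       conflict first.  The Python sketch below assumes the caller already knows the pid
       recorded at checkpoint time (this page does not describe a way to read it from the
       context file); pid_in_use is a hypothetical helper.  The check is inherently racy and
       does not cover thread ids.

```python
import os


def pid_in_use(pid):
    """Return True if a process with this pid currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 performs error checking only; nothing is delivered
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```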

       By default, the process group and session (as returned by getpgrp(2) and getsid(2)) are
       set to those of the cr_restart program.  This ensures that job control via the requester's
       session leader (typically a login shell) will continue to function correctly.  However,
       this interferes with any job control or process group signaling that may take place
       among the restarted processes.

       There are options to individually control whether the pid, process group and  session  are
       restored  to  their  saved  values  or  assume  new  values (the process group and session
       inherited from cr_restart and a fresh pid obtained from fork(2)).  There  is  no  separate
       control  for  the  thread ids, as they must always follow the same policy as the pid.  The
       following describes each option, along with outlining some of the  risks  associated  with
       the non-default ones:

       --restore-pid
              (default) This causes pid and thread ids to be restored to their saved values.

       --no-restore-pid
              This  causes  pid  and thread ids to assume new values.  Any multi-threaded process
              has the possibility of using functions like  tkill(2)  which  will  not  behave  as
              desired  if  the  thread  ids  are  not  restored.   Similarly,  any  multi-process
              application may make use of kill(2)  or  waitpid(2),  among  others,  that  require
              restored pids for correct operation.  It is also worth noting that many versions of
              glibc will cache the result of getpid(), which may result in  calls  after  restore
              returning the original value, even though the pid was changed by the restart.

       --restore-pgid
              This  causes  the  process group ids to be restored to their saved values.  This is
              required for correct operation of any multi-process application  that  may  perform
              signal  or wait operations on process groups (as by passing a negative pid value to
              kill(2) or waitpid(2), among others), or which uses process groups  for  POSIX  job
              control operations.  This is NOT the default behavior because restoring the process
              group ids will prevent job control by the requester's shell (or  other  controlling
              process).

       --no-restore-pgid
              (default)  This  causes  the  restarted  processes to join the process group of the
              cr_restart process.

       --restore-sid
              This causes the session ids  to  be  restored  to  their  saved  values.   This  is
              required,  for  instance, for systems that are performing batch accounting based on
              the session id.

       --no-restore-sid
              (default) This causes the restarted processes to join the session of the cr_restart
              process.

       Note  that  use  of --restore-pgid or --restore-sid will produce an error in the case that
       the required identifiers are in use in the system.  This  includes  the  possibility  that
       they conflict with the process group or session of cr_restart.

OPTIONS

   General options:
       -?, --help
              print this help message.

       -v, --version
              print version information.

       -q, --quiet
              suppress error/warning messages to stderr.

   Options for source location of the checkpoint:
       -d, --dir DIR
              checkpoint  read  from  directory  DIR,  with  one  'context.ID'  file  per process
              (unimplemented).

       -f, --file FILE
              checkpoint read from FILE.

       -F, --fd FD
              checkpoint read from an open file descriptor.

              Options in this group are mutually exclusive.  If no  option  is  given  from  this
              group, the default is to take the final argument as FILE.
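
       As an illustration of the --fd form, a parent process can open the context file itself
       and hand the descriptor to cr_restart.  The Python sketch below relies on the pass_fds
       argument of the subprocess module, which keeps the descriptor open under the same
       number in the child; restart_from_fd and its argv_prefix parameter are hypothetical.

```python
import subprocess


def restart_from_fd(path, argv_prefix=("cr_restart", "-F")):
    """Open `path` and run cr_restart with the open descriptor passed via -F."""
    with open(path, "rb") as ctx:
        fd = ctx.fileno()
        # pass_fds leaves `fd` open, with the same number, in the child process.
        return subprocess.call([*argv_prefix, str(fd)], pass_fds=(fd,))
```

       The argv_prefix parameter exists only so that the helper can be exercised with a
       stand-in program on a machine without BLCR.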

   Options for signal sent to process(es) after restart:
       --run  no signal sent: continue execution (default).

       -S, --signal NUM
              signal NUM sent to all processes/threads.

       --stop SIGSTOP sent to all processes.

       --term SIGTERM sent to all processes.

       --abort
              SIGABRT sent to all processes.

       --kill SIGKILL sent to all processes.

       --cont SIGCONT sent to all processes.

              Options  in this group are mutually exclusive.  If more than one is given then only
              the last will be honored.
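
       The "last one wins" rule for this group can be modelled as follows.  This Python
       sketch mirrors only the behaviour documented above, not cr_restart's real argument
       parser; resolve_signal is a hypothetical name, and None stands for "no signal sent".

```python
import signal

# Mapping taken from the option descriptions above.
SIGNAL_OPTIONS = {
    "--run": None,
    "--stop": signal.SIGSTOP,
    "--term": signal.SIGTERM,
    "--abort": signal.SIGABRT,
    "--kill": signal.SIGKILL,
    "--cont": signal.SIGCONT,
}


def resolve_signal(argv):
    """Return the signal selected by the last signal option in argv, or None."""
    chosen = None
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in SIGNAL_OPTIONS:
            chosen = SIGNAL_OPTIONS[arg]
        elif arg in ("-S", "--signal") and i + 1 < len(argv):
            chosen = int(argv[i + 1])  # explicit signal number
            i += 1  # skip the NUM operand
        i += 1
    return chosen
```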

   Options for checkpoints of restarted process(es):
       --omit-maybe
              use a heuristic to omit cr_restart from checkpoints (default)

       --omit-always
              always omit cr_restart from checkpoints

       --omit-never
              never omit cr_restart from checkpoints

   Options for alternate error handling:
       --run-on-success='cmd'
              run the given command on success

       --run-on-fail-args='cmd'
              run the given command on invalid arguments

       --run-on-fail-temp='cmd'
              run the given command on 'temporary' failure

       --run-on-fail-env='cmd'
              run the given command on 'environmental' failure

       --run-on-fail-perm='cmd'
              run the given command on 'permanent' failure

       --run-on-failure='cmd'
              run the given command on any failure

   Options for relocation:
       --relocate OLDPATH=NEWPATH
              map paths of files and directories to new locations by prefix replacement.

   Options for restoring pid, process group and session ids:
       --restore-pid
              restore pids to saved values (default).

       --no-restore-pid
              restart with new pids.

       --restore-pgid
              restore pgid to saved values.

       --no-restore-pgid
              restart with new pgids (default).

       --restore-sid
              restore sid to saved values.

       --no-restore-sid
              restart with new sids (default).

              Options in each restore/no-restore pair are mutually exclusive.  If both are  given
              then only the last will be honored.

   Options for kernel log messages (default is --kmsg-error):
       --kmsg-none
              don't report any kernel messages.

       --kmsg-error
              on  restart  failure,  report  on  stderr  any  kernel messages associated with the
              restart request.

       --kmsg-warning
              report  on  stderr  any  kernel  messages  associated  with  the  restart  request,
              regardless of success or failure.  Messages generated in the absence of failure are
              considered to be warnings.

              Options in this group are mutually exclusive.  If more than one is given then  only
              the  last  will  be  honored.   Note  that  --quiet  suppresses  all stderr output,
              including these messages.

AUTHORS

       Jason Duell, Paul Hargrove, and Eric Roman, Lawrence Berkeley National Laboratory.

REPORTING BUGS

       Bug reports may be filed on the web at http://mantis.lbl.gov/bugzilla.

SEE ALSO

       cr_run(1), cr_checkpoint(1),