Provided by: blcr-util_0.8.5-2.1_amd64

NAME

       cr_restart - restarts a process, process group, or session from a checkpoint file.

SYNOPSIS

       cr_restart [options] [checkpoint_file]

DESCRIPTION

       cr_restart restarts a process (or set of processes) from a checkpoint file created with cr_checkpoint(1).

       A restarted process has all of the attributes it had at checkpoint time, including its process id.  If
       any needed resources cannot be obtained for the processes in a checkpoint file (e.g. a pid is in use),
       cr_restart  will fail.  If a process group or session is restarted, all parent/child relations and pipes,
       etc., between the processes in the checkpoint will be correctly restored.

       If the stdin/stdout/stderr of any restarted process was directed to a terminal at checkpoint time, it  is
       redirected to the controlling terminal of the cr_restart program.

       The  current working directory of a restarted process is the same as when it was checkpointed, regardless
       of where the context file is located, or where cr_restart is invoked.

       The cr_restart process becomes the parent of the 'eldest' process in any restarted job.  This means that
       getppid(2) may return a different value to the eldest process after restart.  When the eldest restarted
       process exits (or dies from a signal), cr_restart exits with the same exit code (or kills itself with
       the same signal), so it is largely invisible.  It is nevertheless necessary to keep cr_restart
       `in-between' your shell and the restarted processes, because most Unix shells get quite confused if they
       observe their children changing process ids.
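
       Because cr_restart mirrors the fate of the eldest restarted process, a wrapper script can inspect its
       status much as it would for the original program.  The following is a minimal sketch in Python, not part
       of BLCR: describe_status is a hypothetical helper that decodes the return-code convention of Python's
       subprocess module, in which a negative value means the child was killed by a signal.

```python
import signal


def describe_status(returncode):
    """Describe a subprocess-style return code (negative means killed by a signal)."""
    if returncode < 0:
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal numbers without a symbolic name (e.g. some real-time signals).
            name = "signal %d" % signum
        return "killed by " + name
    return "exited with code %d" % returncode
```

       A caller would pass the returncode attribute of a completed subprocess.run() call.  Note that this alone
       cannot tell a failed restart from a non-zero exit of the restarted program.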

   Signals
       By  default  restarted  processes  begin  to  run  after the restart is complete.  Alternatively, you may
       specify that they be stopped (via --stop), or terminated/aborted/killed (via --term, --abort, or --kill).
       This  is  done  by  sending  the appropriate signal to every process that is part of the restart.  If the
       processes were stopped at the time the checkpoint was requested, then --cont may be used to send  SIGCONT
       to all processes after the restart is completed.

   Error handling
       By  default  cr_restart will block until the restarted process has completed, and will exit with the same
       exit value as the restarted process (even if the restarted process died with a fatal signal).   This  can
       make it nearly impossible to determine if a non-zero exit from cr_restart is due to a failure to restart,
       or is the exit code of a correctly restarted process.   The  simple  approach  of  looking  for  'Restart
       failed:'  is not reliable.  Therefore, the --run-on-* family of flags is available to supply alternative
       (or supplementary) error handling.  When any of the --run-on-* flags is passed, a hook is  installed  for
       the given category of failure (or success), as defined below.  When an error (or success) is detected and
       a corresponding hook is installed, the hook is run via the system(3) function.  If the exit code  of  the
       hook  is non-zero, then cr_restart returns this value, suppressing any error message that would otherwise
       be generated.  If no hook is installed, the hook is an empty string, or if the hook returns an exit  code
       of  zero, then an explanatory error message is printed and an exit code related to the errno value at the
       time of failure is returned.

       --run-on-success='cmd'
              Runs the given command as soon as the restarted process(es) are  known  to  be  running.   If  the
              return  value of 'cmd' is non-zero, this also results in cr_restart terminating without waiting on
              termination of the restarted process(es).

       --run-on-fail-args='cmd'
              Runs the given command if the arguments are invalid.  This includes the case in  which  the  given
              context file is missing or unreadable.

       --run-on-fail-temp='cmd'
              Runs the given command if a "temporary" failure is detected.  This includes the case of a required
              pid being in use.

       --run-on-fail-perm='cmd'
              Runs the given command if a "permanent" failure is detected.  This  is  most  commonly  due  to  a
              corrupted context file.

       --run-on-fail-env='cmd'
              Runs  the  given  command  if  an  "environmental"  failure is detected.  This includes when files
              required for restarting are missing or inaccessible.

       --run-on-failure='cmd'
              This installs the given command for all of the --run-on-fail-* hooks.
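
       One way to use these hooks from a wrapper is to have the failure hook leave a marker behind, so that a
       failed restart can be told apart from a non-zero exit of the restarted program.  The following Python
       sketch illustrates the idea under stated assumptions: the marker-file convention and the helper names
       (build_restart_argv, classify, run_restart) are hypothetical and not part of BLCR; only the
       --run-on-failure flag and the trailing checkpoint file argument come from this page.

```python
import os
import shlex
import subprocess


def build_restart_argv(context_file, marker_path):
    """Build a cr_restart command line whose failure hook creates marker_path."""
    # The hook is run via system(3), so it is a shell command; quote the path.
    hook = "touch " + shlex.quote(marker_path)
    return ["cr_restart", "--run-on-failure=" + hook, context_file]


def classify(marker_exists, returncode):
    """Label a finished cr_restart run using the (assumed) marker convention."""
    if marker_exists:
        return ("restart-failed", returncode)
    return ("application-exit", returncode)


def run_restart(context_file, marker_path):
    """Run cr_restart and report whether the restart itself failed."""
    if os.path.exists(marker_path):
        os.remove(marker_path)  # make sure a stale marker cannot mislead us
    proc = subprocess.run(build_restart_argv(context_file, marker_path))
    return classify(os.path.exists(marker_path), proc.returncode)
```

       When touch succeeds the hook returns zero, so cr_restart still prints its explanatory message and
       returns its usual errno-related exit code; the marker only records that a failure hook fired.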

   File relocation
       By default, files and directories are saved `by reference', storing their full pathname  in  the  context
       file.   This  includes  files  associated  with  a  process  via  open(2)  and/or mmap(2) and directories
       associated via opendir(3) or as the current working directory.  Use of --relocate oldpath=newpath  allows
       remapping of such paths to new locations at restart-time.

       When  parsing  the  --relocate  argument  the  sequences  `\='  and  `\\' are interpreted as `=' and `\',
       respectively, to allow for paths that contain the `=' character.  The `\' character is not special in any
       other  context.   (Note that command shells also have special treatment of `\' and you may therefore need
       quotes or additional `\' characters to pass the argument you intend.)
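
       The escaping rule can be modelled in a few lines.  The Python sketch below is an illustration of the
       rule as described above, not BLCR's actual parser; split_relocate_arg is a hypothetical name, and
       treating the first unescaped `=' as the separator (with any later `=' taken literally) is an assumption.

```python
def split_relocate_arg(arg):
    """Split 'oldpath=newpath', honouring the escapes \\= and \\\\ only."""
    parts = [[], []]  # characters of oldpath, characters of newpath
    side = 0          # 0 while reading oldpath, 1 once the separator is seen
    i = 0
    while i < len(arg):
        ch = arg[i]
        if ch == "\\" and i + 1 < len(arg) and arg[i + 1] in "=\\":
            # '\=' yields a literal '=', and '\\' yields a literal '\'.
            parts[side].append(arg[i + 1])
            i += 2
            continue
        if ch == "=" and side == 0:
            side = 1  # assumed: the first unescaped '=' separates the two paths
            i += 1
            continue
        # Any other character, including a lone backslash, is kept as-is.
        parts[side].append(ch)
        i += 1
    if side == 0:
        raise ValueError("missing '=' separator in relocate argument")
    return "".join(parts[0]), "".join(parts[1])
```

       For example, the argument /tmp/a\=b=/tmp/c splits into the oldpath /tmp/a=b and the newpath /tmp/c.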

       When file or directory associations are restored, the oldpath is compared to the saved fullpath  of  each
       file or directory.  If it matches the leading components of the path, the matching portion is replaced by
       the value of newpath.  Note that oldpath must match entire path components, and only leading  components.
       Therefore  an oldpath of /tmp/foo will match /tmp/foo or /tmp/foo/1, but will not match to /tmp/fooz (not
       matching the full component fooz) or to /var/tmp/foo (not matching the leading component /var.)
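
       The matching rule can be sketched as follows.  This Python function is an illustrative model of the
       behaviour described above, not the BLCR implementation; the name relocate is hypothetical, and an
       oldpath with a trailing slash is not handled.

```python
def relocate(path, oldpath, newpath):
    """Replace oldpath with newpath when it matches whole leading components."""
    if path == oldpath:
        return newpath
    # Requiring the '/' after oldpath enforces a match on entire components.
    if path.startswith(oldpath + "/"):
        return newpath + path[len(oldpath):]
    return path
```

       With an oldpath of /tmp/foo and a newpath of /data, /tmp/foo/1 becomes /data/1, while /tmp/fooz and
       /var/tmp/foo are returned unchanged.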

       It is important to be aware that the saved fullpaths in a context file are the canonical paths.  Therefore
       the  oldpath  you  provide  must  also  be  a canonical path, though the newpath doesn't need to be.  For
       instance, if /tmp is a symbolic link to /var/tmp, then if your application opens the file  /tmp/work/1234
       the path stored in the context file will be /var/tmp/work/1234.  Therefore,
              --relocate /tmp/work=/tmp/play
       would not work as desired, but either of the following would:
              --relocate /var/tmp/work=/tmp/play
              --relocate /var/tmp/work=/var/tmp/play
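
       A wrapper can canonicalize the oldpath before building the option.  The Python sketch below assumes
       that the symbolic links on the restarting machine resolve as they did at checkpoint time, which may not
       hold; relocate_option is a hypothetical helper, not part of BLCR.

```python
import os


def relocate_option(oldpath, newpath):
    """Return ['--relocate', 'OLD=NEW'] with OLD resolved to its canonical path."""
    def escape(path):
        # Escape '\' first, then '=', following the --relocate parsing rules.
        return path.replace("\\", "\\\\").replace("=", "\\=")

    canonical = os.path.realpath(oldpath)  # resolves symbolic links
    return ["--relocate", escape(canonical) + "=" + escape(newpath)]
```

       On a system where /tmp is a symbolic link to /var/tmp, relocate_option("/tmp/work", "/tmp/play") would
       produce the first of the two working forms shown above.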

       If  the  --relocate  option  is  passed  multiple  times,  all  are applied to restored file or directory
       associations, but only the first match is  applied  to  any  given  path.   Currently  a  maximum  of  16
       relocations is supported.
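
       The first-match rule means that a rewritten path is not passed through later relocations.  The Python
       sketch below models this; it assumes that "first" refers to command-line order, which this page does not
       state explicitly, and apply_relocations is a hypothetical name.

```python
MAX_RELOCATIONS = 16  # limit stated in this manual page


def apply_relocations(path, relocations):
    """Apply at most one (oldpath, newpath) rule to path: the first that matches."""
    if len(relocations) > MAX_RELOCATIONS:
        raise ValueError("at most %d relocations are supported" % MAX_RELOCATIONS)
    for oldpath, newpath in relocations:
        if path == oldpath or path.startswith(oldpath + "/"):
            # Stop here: later rules are not applied to the rewritten path.
            return newpath + path[len(oldpath):]
    return path
```

       Given the rules /var/tmp/work=/scratch followed by /var/tmp=/data, the path /var/tmp/work/1234 maps to
       /scratch/1234, so the more specific rule should be listed first.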

   PID and related identifiers
       By default, processes are restarted with the same pid and thread id (as returned by getpid(2) and
       gettid(2), respectively).  This default ensures that processes and threads that signal each other and
       processes  that  wait on children will continue to function correctly.  However, this prevents restarting
       concurrent instances of the same context file.

       By default, the process group and session (as returned by getpgrp(2) and getsid(2)) are set to those of
       the cr_restart program.  This ensures that job control via the requester's session leader (typically a
       login shell) will continue to function correctly.  However, this interferes with any job control or
       process group signaling that may take place among the restarted processes.

       There  are  options  to  individually  control whether the pid, process group and session are restored to
       their saved values or assume new values (the process group and session inherited from  cr_restart  and  a
       fresh  pid  obtained from fork(2)).  There is no separate control for the thread ids, as they must always
       follow the same policy as the pid.  The following describes each option, along with outlining some of the
       risks associated with the non-default ones:

       --restore-pid
              (default) This causes pid and thread ids to be restored to their saved values.

       --no-restore-pid
              This  causes  pid  and  thread  ids  to  assume  new  values.   Any multi-threaded process has the
              possibility of using functions like tkill(2) which will not behave as desired if  the  thread  ids
              are not restored.  Similarly, any multi-process application may make use of kill(2) or waitpid(2),
              among others, that require restored pids for correct operation.  It is also worth noting that many
              versions  of  glibc  will  cache  the  result of getpid(), which may result in calls after restore
              returning the original value, even though the pid was changed by the restart.

       --restore-pgid
              This causes the process group ids to be restored to their saved  values.   This  is  required  for
              correct  operation  of any multi-process application that may perform signal or wait operations on
              process groups (as by passing a negative pid value to kill(2) or  waitpid(2),  among  others),  or
              which  uses  process  groups  for  POSIX job control operations.  This is NOT the default behavior
              because restoring the process group ids will prevent job control  by  the  requester's  shell  (or
              other controlling process).

       --no-restore-pgid
              (default) This causes the restarted processes to join the process group of the cr_restart process.

       --restore-sid
              This causes the session ids to be restored to their saved values.  This is required, for instance,
              for systems that are performing batch accounting based on the session id.

       --no-restore-sid
              (default) This causes the restarted processes to join the session of the cr_restart process.

       Note that use of --restore-pgid or --restore-sid will produce an error in the case that the required
       identifiers are in use in the system.  This includes the possibility that they conflict with the process
       group or session of cr_restart.

OPTIONS

   General options:
       -?, --help
              print this help message.

       -v, --version
              print version information.

       -q, --quiet
              suppress error/warning messages to stderr.

   Options for source location of the checkpoint:
       -d, --dir DIR
              checkpoint read from directory DIR, with one 'context.ID' file per process (unimplemented).

       -f, --file FILE
              checkpoint read from FILE.

       -F, --fd FD
              checkpoint read from an open file descriptor.

              Options in this group are mutually exclusive.  If no option is given from this group, the  default
              is to take the final argument as FILE.

   Options for signal sent to process(es) after restart:
       --run  no signal sent: continue execution (default).

       -S, --signal NUM
              signal NUM sent to all processes/threads.

       --stop SIGSTOP sent to all processes.

       --term SIGTERM sent to all processes.

       --abort
              SIGABRT sent to all processes.

       --kill SIGKILL sent to all processes.

       --cont SIGCONT sent to all processes.

              Options  in  this group are mutually exclusive.  If more than one is given then only the last will
              be honored.
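
       The last-one-wins rule for this group can be modelled as below.  This Python sketch is only an
       illustration and is not how cr_restart parses its arguments; effective_signal is a hypothetical helper,
       and it assumes NUM is given as a separate argument after -S or --signal.

```python
import signal

# Options in the signal group, mapped to the signal they request (None = none).
SIGNAL_OPTIONS = {
    "--run": None,
    "--stop": signal.SIGSTOP,
    "--term": signal.SIGTERM,
    "--abort": signal.SIGABRT,
    "--kill": signal.SIGKILL,
    "--cont": signal.SIGCONT,
}


def effective_signal(argv):
    """Return the signal selected by the last signal-group option, or None."""
    chosen = None  # default is --run: no signal is sent
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in SIGNAL_OPTIONS:
            chosen = SIGNAL_OPTIONS[arg]
        elif arg in ("-S", "--signal") and i + 1 < len(argv):
            chosen = int(argv[i + 1])
            i += 1  # skip the NUM operand
        i += 1
    return chosen
```

       For instance, an argument list containing --stop followed by --cont selects SIGCONT, since only the
       last option of the group is honored.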

   Options for checkpoints of restarted process(es):
       --omit-maybe
              use a heuristic to omit cr_restart from checkpoints (default)

       --omit-always
              always omit cr_restart from checkpoints

       --omit-never
              never omit cr_restart from checkpoints

   Options for alternate error handling:
       --run-on-success='cmd'
              run the given command on success

       --run-on-fail-args='cmd'
              run the given command on invalid arguments

       --run-on-fail-temp='cmd'
              run the given command on 'temporary' failure

       --run-on-fail-env='cmd'
              run the given command on 'environmental' failure

       --run-on-fail-perm='cmd'
              run the given command on 'permanent' failure

       --run-on-failure='cmd'
              run the given command on any failure

   Options for relocation:
       --relocate OLDPATH=NEWPATH
              map paths of files and directories to new locations by prefix replacement.

   Options for restoring pid, process group and session ids:
       --restore-pid
              restore pids to saved values (default).

       --no-restore-pid
              restart with new pids.

       --restore-pgid
              restore pgid to saved values.

       --no-restore-pgid
              restart with new pgids (default).

       --restore-sid
              restore sid to saved values.

       --no-restore-sid
              restart with new sids (default).

              Options in each restore/no-restore pair are mutually exclusive.  If both are given then  only  the
              last will be honored.

   Options for kernel log messages (default is --kmsg-error):
       --kmsg-none
              don't report any kernel messages.

       --kmsg-error
              on restart failure, report on stderr any kernel messages associated with the restart request.

       --kmsg-warning
              report on stderr any kernel messages associated with the restart request, regardless of success or
              failure.  Messages generated in the absence of failure are considered to be warnings.

              Options in this group are mutually exclusive.  If more than one is given then only the  last  will
              be honored.  Note that --quiet suppresses all stderr output, including these messages.

AUTHORS

       Jason Duell, Paul Hargrove, and Eric Roman, Lawrence Berkeley National Laboratory.

REPORTING BUGS

       Bug reports may be filed on the web at http://mantis.lbl.gov/bugzilla.

SEE ALSO

       cr_run(1), cr_checkpoint(1)