Provided by: blcr-util_0.8.5-2.3_amd64

NAME

       cr_checkpoint - checkpoints a process, process group, or session.

SYNOPSIS

       cr_checkpoint [options] ID

DESCRIPTION

       Invoking  cr_checkpoint  causes  a  process  (with or without all of its descendants), all
       processes within a process group, or all processes within a session, to  be  checkpointed.
       The result is a checkpoint file (or a directory with one checkpoint file per process) that
       contains all the state needed to restart the process(es) at a  later  time.   Checkpointed
       processes can be restarted via cr_restart(1).

       To  be  checkpointed by cr_checkpoint, a process must have the libcr.so library (or one of
       its relatives) loaded.  This can be achieved by starting the program with cr_run(1), or by
       linking  your application with -lcr.  Or, the library may be loaded by other libraries you
       have linked with (such as a checkpoint-ready MPI library), or your system's  parallel  job
       startup script, etc.  Check your system documentation for details.
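
       As a minimal sketch of the cr_run route described above, the helper below starts a
       program under cr_run in the background and records the PID of that background job.
       The function name start_checkpointable, the variable CR_APP_PID, and the program
       name ./my_app are hypothetical, and the sketch assumes cr_run takes the command
       line to run as its arguments.

```shell
# Hypothetical helper: start_checkpointable COMMAND [ARGS...]
# Launches COMMAND under cr_run in the background and stores the PID of
# the background job in CR_APP_PID (an illustrative variable name).
start_checkpointable() {
    cr_run "$@" &
    CR_APP_PID=$!
}
```

       A possible use is "start_checkpointable ./my_app" followed later by
       "cr_checkpoint --tree $CR_APP_PID".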

   File creation/replacement
       By default (or if --atomic is passed) cr_checkpoint creates the new context file/directory
       atomically:  either the checkpoint fails  (and  any  existing  context  file/directory  is
       unchanged), or it appears in the directory ready to be used by cr_restart.  If an existing
       checkpoint with the same file name exists, it will either be  unmodified  (if  the  new
       checkpoint   fails   for   any   reason),  or  replaced  atomically  (via  rename(2)).  If
       --backup[=NAME] is passed, any existing checkpoint will be backed up  instead,  either  to
       NAME  or  with a numbered extension (.~1~, .~2~, etc., with more recent checkpoints having
       higher numbers).  If --clobber is passed,  the  checkpoint  will  immediately  remove  any
       existing  checkpoint  files,  and  will  write the checkpoint directly out into the target
       file/directory: this option uses less disk space if an  existing  checkpoint  is  present,
       since  the  old checkpoint is immediately discarded, but if the checkpoint fails, the pre-
       existing checkpoint is lost.  Finally, if --noclobber is passed, then the checkpoint  will
       fail if the target file/directory exists.
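
       The replacement policies above can be sketched as two small wrappers.  The function
       names and the PID/FILE argument order are hypothetical; only the cr_checkpoint
       options come from this page.

```shell
# checkpoint_keep_old PID FILE (hypothetical name)
# Writes FILE atomically; any existing FILE is kept as a numbered backup
# (FILE.~1~, FILE.~2~, ...), per the --backup description.
checkpoint_keep_old() {
    cr_checkpoint --backup --file "$2" "$1"
}

# checkpoint_once PID FILE (hypothetical name)
# Fails instead of replacing FILE if it already exists.
checkpoint_once() {
    cr_checkpoint --noclobber --file "$2" "$1"
}
```

       For example, "checkpoint_keep_old 23452 app.ckpt" would run
       "cr_checkpoint --backup --file app.ckpt 23452".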

   File sync
       By  default  (or  when  --sync  is  passed),  cr_checkpoint  waits until the checkpoint is
       complete in memory, and additionally calls fsync(2) on all files and directories  involved
       in  the  checkpoint  (including  back-up  files)  to flush them to disk before exiting.
       Passing --nosync causes these fsync calls to be skipped.

   Timeout
       A maximum timeout in seconds can be set for a checkpoint  via  the  --time  flag:  if  the
       checkpoint  takes longer than this, cr_checkpoint will print an error message and exit with
       an error.  If a timeout occurs, the state of the process  or  processes  that  were  being
       checkpointed is undefined.
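
       Because a timeout leaves the target in an undefined state, a caller will usually want
       to check the exit status.  The wrapper below is a sketch under that assumption; the
       function name and its PID/SEC/FILE argument order are hypothetical.  It cannot tell
       a timeout apart from other failures, since this page only says that cr_checkpoint
       exits with an error.

```shell
# checkpoint_with_timeout PID SEC FILE (hypothetical name)
# Allows SEC seconds for the checkpoint and reports any failure on stderr.
checkpoint_with_timeout() {
    if ! cr_checkpoint --time "$2" --file "$3" "$1"; then
        echo "checkpoint of $1 failed or timed out; target state may be undefined" >&2
        return 1
    fi
}
```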

   Signals
       By  default  checkpointed  processes  continue  to  run  after  a  checkpoint is complete.
       Alternatively,   you   may   specify   that   they   be   stopped   (via    --stop),    or
       terminated/aborted/killed  (via  --term, --abort, or --kill).  This is done by sending the
       appropriate signal to every process that is part of the checkpoint.  If the processes were
       stopped  at the time the checkpoint was requested, then --cont may be used to send SIGCONT
       to all processes after the checkpoint is completed.
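
       Two common combinations of these signal options are sketched below.  The function
       names are hypothetical; the options are the ones described in this section.

```shell
# checkpoint_and_exit PID FILE (hypothetical name)
# Takes the checkpoint, then sends SIGTERM to every checkpointed process.
checkpoint_and_exit() {
    cr_checkpoint --term --file "$2" "$1"
}

# checkpoint_and_resume PID FILE (hypothetical name)
# For targets that were already stopped: takes the checkpoint, then sends
# SIGCONT so the processes resume.
checkpoint_and_resume() {
    cr_checkpoint --cont --file "$2" "$1"
}
```

       Since the signal options are mutually exclusive, each wrapper passes exactly one.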

   Memory mapped files
       By default, checkpoints do not include any  files  that  are  mmap()ed  into  the  process
       address  space unless they are already unlinked at the time the checkpoint is taken.  This
       is a space/time saving optimization under the assumption  that  the  files  required  will
       still  be  present (and uncorrupted) at restart time.  Typically the largest savings come
       from not saving the executable file or dynamic (a.k.a. shared) libraries.  However, options
       exist to cause the checkpoint to save these files as well.  The flag --save-exe will cause
       the executable file to be included in the context  file.   The  flag  --save-private  will
       include  in  the  context  file any files that are mapped with the MAP_PRIVATE flag, which
       under Linux includes the executable and dynamic/shared libraries.  The flag  --save-shared
       is  for  saving files that are mapped with the MAP_SHARED flag.  Note that this is not the
       flag you want for shared libraries.  At restart any file saved  by  these  flags  will  be
       mapped  into  the  process regardless of whether any file exists at the original location.
       If there is a file at the original location it remains untouched by the restart.   Finally,
       --save-all  and  --save-none will cause all (or none) of these optional mmap()ed files to
       be saved.  The default is --save-none.  When several of these  options  are  passed,  they
       are processed from left to right with all options being additive, except for --save-none,
       which cancels the effects of any of these options appearing earlier.
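
       The left-to-right processing of the save options can be illustrated with a short
       sketch.  The function names below are hypothetical.

```shell
# checkpoint_portable PID FILE (hypothetical name)
# Includes MAP_PRIVATE mappings (under Linux, the executable and shared
# libraries) in the context file.
checkpoint_portable() {
    cr_checkpoint --save-private --file "$2" "$1"
}

# checkpoint_exe_only PID FILE (hypothetical name)
# --save-none cancels the earlier --save-all; --save-exe is then added
# back, so only the executable file is saved.
checkpoint_exe_only() {
    cr_checkpoint --save-all --save-none --save-exe --file "$2" "$1"
}
```

       The second wrapper is deliberately redundant: it is equivalent to passing --save-exe
       alone and exists only to show the cancelling effect of --save-none.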

   Checkpointing ptrace()ed processes
       There is (currently) no way to fully transparently deal with checkpoints of processes that
       are  being  traced  with  ptrace(2).   Therefore, the default behavior (also available via
       --ptraced-error) is to return an error if any of the  processes  to  be  checkpointed  are
       currently being ptraced.  However, there are two other possible behaviors to choose among:

       --ptraced-skip
              Ptraced  processes  will  be  silently  excluded  from the checkpoint.  No error is
              generated unless this results in zero processes checkpointed.

       --ptraced-allow
              Ptraced processes will be checkpointed just like  any  other  processes.   WARNING:
              Because  the  checkpointed  process  and the BLCR kernel module must interact using
              signals and system calls, the debugger (or other tracer) may need to `continue' the
              target process(es), possibly more than once, to allow the checkpoint to complete.

   Checkpointing ptrace()ing processes
       There is (currently) no way to fully transparently deal with checkpoints of processes that
       are tracing other processes  using  ptrace(2).   Therefore,  the  default  behavior  (also
       available  via  --ptracer-error)  is  to  return  an  error  if any of the processes to be
       checkpointed are currently ptracing other processes.  However --ptracer-skip is  available
       to  cause  cr_checkpoint to silently exclude such processes from the checkpoint.  No error
       is generated in that case unless this would result in zero processes checkpointed.
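
       As a sketch of how the two skip options combine, the wrapper below checkpoints a
       session while leaving out both traced processes and their tracers (for example, a
       debugger and its target).  The function name is hypothetical.

```shell
# checkpoint_session_skip_traced SID FILE (hypothetical name)
# Checkpoints session SID, silently excluding processes that are being
# ptraced and processes that are ptracing others.
checkpoint_session_skip_traced() {
    cr_checkpoint --session --ptraced-skip --ptracer-skip --file "$2" "$1"
}
```

       The command still fails if skipping leaves zero processes to checkpoint.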

OPTIONS

   General options:
       -v, --verbose
              print progress messages to stderr.

       -q, --quiet
              suppress error/warning messages to stderr.

       -?, --help
              print this message and exit.

       --version
              print version information and exit.

   Options for scope of the checkpoint:
       -T, --tree
              ID identifies a process id.  It and all of its descendants are to be  checkpointed.
              This is the default.

       -p, --pid, --process
              ID identifies a single process id.

       -g, --pgid, --group
              ID identifies a process group id.

       -s, --sid, --session
              ID identifies a session id.

   Options for destination location of the checkpoint:
       -c, --cwd
              checkpoint saved as a single 'context.ID' file in cr_checkpoint's working directory
              (default).

       -d, --dir DIR
              checkpoint saved in new directory DIR,  with  one  'context.ID'  file  per  process
              (unimplemented).

       -f, --file FILE
              checkpoint saved as FILE.

       -F, --fd FD
              checkpoint written to an open file descriptor.

   Options for creation/replacement policy for checkpoint files:
       --atomic
              checkpoint created/replaced atomically (default).

       --backup[=NAME]
              checkpoint  created  atomically,  and  any existing checkpoint backed up to NAME or
              *.~1~, *.~2~, etc.

       --clobber
              checkpoint  written  incrementally  to   target,   overwriting   any   pre-existing
              checkpoint.

       --noclobber
              checkpoint will fail if the target file exists.

              These options are ignored if the destination is a file descriptor.

   Options for signal sent to process(es) after checkpoint:
       --run  no signal sent: continue execution (default).

       -S, --signal NUM
              signal NUM sent to all processes.

       --stop SIGSTOP sent to all processes.

       --term SIGTERM sent to all processes.

       --abort
              SIGABRT sent to all processes.

       --kill SIGKILL sent to all processes.

       --cont SIGCONT sent to all processes.

              Options  in this group are mutually exclusive.  If more than one is given then only
              the last will be honored.

   Options for file system synchronization (default is --sync):
       --sync fsync checkpoint file(s) to disk (default).

       --nosync
              do not fsync checkpoint file(s) to disk.

   Options to save optional portions of memory:
       --save-exe
              save the executable file.

       --save-private
              save private mapped files.  (executables and libraries are mapped this way)

       --save-shared
              save shared mapped files.  (System V IPC is mapped this way)

       --save-all
              save all of the above.

       --save-none
              save none of the above (the default).

   Options for ptraced processes (default is --ptraced-error):
       --ptraced-error
              return an error if a checkpoint is requested of a process being ptraced.

       --ptraced-skip
              ptraced processes are silently  excluded  from  the  checkpoint  request.   If  the
              checkpoint  scope  is  --tree,  then  this  will  also exclude any children of such
              processes.   No  error  is  produced  unless  this  results   in   zero   processes
              checkpointed.

       --ptraced-allow
              checkpoint  ptraced  processes  normally.   WARNING: This may require the tracer to
              "continue" the target process(es), possibly more than once.

   Options for processes ptracing others (default is --ptracer-error):
       --ptracer-error
              return an error if a checkpoint is requested of a process which is ptracing others.

       --ptracer-skip
              processes ptracing others are silently excluded from the  checkpoint  request.   If
              the  checkpoint  scope  is --tree, then this will also exclude any children of such
              processes.   No  error  is  produced  unless  this  results   in   zero   processes
              checkpointed.

   Options for kernel log messages (default is --kmsg-error):
       --kmsg-none
              don't report any kernel messages.

       --kmsg-error
              on  checkpoint  failure,  report  on stderr any kernel messages associated with the
              checkpoint request.

       --kmsg-warning
              report on stderr any  kernel  messages  associated  with  the  checkpoint  request,
              regardless of success or failure.  Messages generated in the absence of failure are
              considered to be warnings.

              Options in this group are mutually exclusive.  If more than one is given then  only
              the  last  will  be  honored.   Note  that  --quiet  suppresses  all stderr output,
              including these messages.

   Misc Options:
       -t, --time SEC
              allow  only  SEC  seconds  for  target  to  complete  checkpoint   (default:   wait
              indefinitely).

EXAMPLES

       To checkpoint the process with process ID 23452, saving its state to file context.23452:

              cr_checkpoint -p 23452

       To checkpoint all the processes in process group 68473, and save them to file groupie:

              cr_checkpoint -g -f groupie 68473

       To checkpoint all the processes in session 8362, and save separate 'context.PID' files for
       each process in directory 'my_checkpoints' (note that --dir is listed as  unimplemented
       under OPTIONS):

              cr_checkpoint -s -d my_checkpoints 8362

BUGS

       Some features in this manpage may be unimplemented.

AUTHORS

       Jason Duell, Paul Hargrove, and Eric Roman, Lawrence Berkeley National Laboratory.

REPORTING BUGS

       Bug reports may be filed on the web at http://mantis.lbl.gov/bugzilla.

SEE ALSO

       cr_restart(1), cr_run(1)