Provided by: openmpi-checkpoint_1.4.3-2.1ubuntu3_i386 bug


       OPAL_CRS  -  Open PAL MCA Checkpoint/Restart Service (CRS): Overview of
       Open PAL's CRS framework, and selected modules.  Open MPI 1.4.3.


       Open PAL can involuntarily checkpoint and restart sequential  programs.
       Doing  so  requires  that Open PAL was compiled with thread support and
       that the back-end checkpointing systems are available at run-time.

   Phases of Checkpoint / Restart
       Open PAL defines three phases for checkpoint /  restart  support  in  a

           When  the  checkpoint  request arrives, the procress is notified of
           the request before the checkpoint is taken.

           After a checkpoint has successfully completed, the same process  as
           the  checkpoint  is  notified  of  its  successful  continuation of

           After a checkpoint has successfully completed, a  new  /  restarted
           process is notified of its successful restart.

       The Continue and Restart phases are identical except for the process in
       which they are invoked. The Continue  phase  is  invoked  in  the  same
       process  as the Checkpoint phase was invoked. The Restart phase is only
       invoked in newly restarted processes.


       In order for a process to use the  Open  PAL  CRS  components  it  must
       adhear to a few programmatic requirements.

       First,  the  program  must  call OPAL_INIT early in its execution. This
       should only be called once, and it is not possible  to  checkpoint  the
       process without it first having called this function.

       The  program  must  call  OPAL_FINALIZE before termination. This does a
       significant amount of cleanup. If it is not called,  then  it  is  very
       likely that remnants are left in the filesystem.

       To  checkpoint and restart a process you must use the Open PAL tools to
       do so. Using the backend checkpointer's checkpoint  and  restart  tools
       will   lead  to  undefined  behavior.   To  checkpoint  a  process  use
       opal_checkpoint  (opal_checkpoint(1)).   To  restart  a   process   use
       opal_restart (opal_restart(1)).


       Open PAL ships with two CRS components: self and blcr.

       The following MCA parameters apply to all components:

           Set the verbosity level for all components. Default is 0, or silent
           except on error.

           The directory to store the checkpoint snapshots. Default is /tmp.

   self CRS Component
       The self component invokes user-defined functions to save  and  restore
       checkpoints.  It is simply a mechanism for user-defined functions to be
       invoked at Open PAL's Checkpoint, Continue, and Restart phases.  Hence,
       the only data that is saved during the checkpoint is what is written in
       the user's checkpoint function. No libary state is saved at all.

       As such, the model for the self component is slightly differnt than for
       other  components. Specifically, the Restart function is not invoked in
       the same process image  of  the  process  that  was  checkpointed.  The
       Restart  phase  is  invoked during OPAL_INIT of the new instance of the
       applicaiton (i.e., it starts over from main()).

       The self component has the following MCA parameters:

           Speficy a string prefix for the name of the  checkpoint,  continue,
           and  restart  functions  that  Open  PAL  will  invoke  during  the
           respective stages. That is,  by  specifying  "-mca  crs_self_prefix
           foo"  means  that  Open PAL expects to find three functions at run-

              int foo_checkpoint()

              int foo_continue()

              int foo_restart()

           By default, the prefix is set to "opal_crs_self_user".

           Set the self components default priority

           Set the verbosity level. Default is 0, or silent except on error.

           This is mostly internally used. A general user should never need to
           set  this value. This is set to non-0 when a the new process should
           invoke the restart callback in OPAL_INIT. Default is 0,  or  normal

   blcr CRS Component
       The Berkeley Lab Checkpoint/Restart (BLCR) single-process checkpoint is
       a software system developed at Lawrence Berkeley  National  Laboratory.
       See the project website for more details:


       The blcr component has the following MCA parameters:

           Set the blcr components default priority.

           Set the verbosity level. Default is 0, or silent except on error.

   none CRS Component
       The  none  component  simply  selects  no CRS component. All of the CRS
       function calls return immediately with OPAL_SUCCESS.

       This component is the last component to be selected  by  default.  This
       means  that  if  another component is available, and the none component
       was not explicity requested then OPAL will attempt to activate  all  of
       the available components before falling back to this component.


         opal_checkpoint(1), opal_restart(1)