Provided by: dmtcp_1.2.1-2build1_amd64

NAME

       dmtcp - Distributed MultiThreaded Checkpointing

SYNOPSIS

       dmtcp_coordinator [port]

       dmtcp_checkpoint command [args...]

       dmtcp_restart ckpt1.mtcp [ckpt2.mtcp...]

       dmtcp_command coordinatorCommand

DESCRIPTION

       dmtcp is a tool for transparently checkpointing the state of an arbitrary group of
       programs spread across many machines and connected by sockets.  It does not modify the
       user's program or the operating system.

OPTIONS

       Most  options are controlled through environment variables.  These can be set in bash with
       "export NAME=value" or in tcsh with "setenv NAME value".

       DMTCP_CHECKPOINT_INTERVAL=integer
              Time in seconds between automatic checkpoints.  Checkpoints can also  be  initiated
              manually   by   typing   'c'   into   the   coordinator.   (default:  0,  disabled;
              dmtcp_coordinator only)

       DMTCP_HOST=string
              Hostname where  the  cluster-wide  coordinator  is  running.  (default:  localhost;
              dmtcp_checkpoint, dmtcp_restart only)

       DMTCP_PORT=integer
              The port the cluster-wide coordinator listens on. (default: 7779)

       DMTCP_GZIP=(1|0)
              Set to "0" to disable compression of checkpoint images. (default: 1, compression
              enabled; dmtcp_checkpoint only)

       DMTCP_CHECKPOINT_DIR=path
              Directory to store checkpoint images in. (default: ./)

       DMTCP_SIGCKPT=integer
              Internal signal number to use for checkpointing.  Must not  be  used  by  the  user
              program.  (default: SIGUSR2; dmtcp_checkpoint only)
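
       As a rough illustration of the variables above, the following sketch sets them in a
       POSIX shell before any dmtcp command is started.  The checkpoint directory and the
       one-hour interval are assumptions chosen for the example, not recommended values.

```shell
# Coordinator location; both values are the documented defaults.
export DMTCP_HOST=localhost
export DMTCP_PORT=7779

# Hypothetical checkpoint directory (the documented default is ./).
export DMTCP_CHECKPOINT_DIR="${TMPDIR:-/tmp}/dmtcp-ckpt"
mkdir -p "$DMTCP_CHECKPOINT_DIR"

# "0" disables compression of checkpoint images.
export DMTCP_GZIP=0

# Read by dmtcp_coordinator only: checkpoint automatically every hour.
export DMTCP_CHECKPOINT_INTERVAL=3600

# Show what dmtcp commands started from this shell will inherit.
env | grep '^DMTCP_' | sort
```

       Variables exported this way apply to every dmtcp command started from the same shell.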

DMTCP_COORDINATOR

       A  dmtcp_coordinator  process  must  be  started  in  order for either dmtcp_checkpoint or
       dmtcp_restart to operate.  There should be exactly one dmtcp_coordinator for each  network
       of processes.  In the case of multiple hosts, the address of the single global coordinator
       should be communicated to dmtcp_checkpoint and dmtcp_restart through  the  DMTCP_HOST  and
       DMTCP_PORT environment variables.
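
       One way to pass the coordinator address on a per-command basis is sketched below.  The
       helper name with_coordinator and the host name head-node are hypothetical; the demo
       line only echoes the variables a child process would see and does not start dmtcp.

```shell
# Hypothetical helper: run a command with DMTCP_HOST and DMTCP_PORT set
# for that command only, leaving the calling shell's environment untouched.
with_coordinator() {
    host=$1
    port=$2
    shift 2
    DMTCP_HOST="$host" DMTCP_PORT="$port" "$@"
}

# Demo: print the coordinator address the child process receives.
with_coordinator head-node 7779 sh -c 'echo "$DMTCP_HOST:$DMTCP_PORT"'
```

       On a real cluster the same helper could wrap the launcher, for example
       "with_coordinator head-node 7779 dmtcp_checkpoint ./myprogram".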

       The  coordinator  is  stateless  and is not checkpointed.  The dmtcp_coordinator initiates
       checkpoints of all processes in the system.  Checkpoints can be performed automatically on
       an  interval  (see  DMTCP_CHECKPOINT_INTERVAL above), or they can be initiated manually on
       the command line of the coordinator.

       The coordinator accepts the following commands on its standard input.  Each command should
       be followed by the <return> key.  The commands are:
         l : List connected nodes
         s : Print status message
         c : Checkpoint all nodes
         f : Force a restart even if there are missing nodes (debugging)
         k : Kill all nodes
         q : Kill all nodes and quit
         ? : Show this message

       Coordinator commands can also be issued remotely using dmtcp_command.

EXAMPLE USAGE

       1. In a separate terminal window, start the dmtcp_coordinator.  (See previous section.)

               dmtcp_coordinator

       2. In separate terminal(s), replace each command with "dmtcp_checkpoint [command]".
       The checkpointed program will connect to the coordinator specified by DMTCP_HOST and
       DMTCP_PORT.  Child processes will automatically be checkpointed.  Remote processes
       started via ssh will automatically be checkpointed as well. (The ssh command line will
       be modified to call dmtcp_checkpoint on the remote host.)

               dmtcp_checkpoint ./myprogram

       3. To manually initiate a checkpoint, either run the command below or type "c" followed by
       <return> into the coordinator.  Checkpoint files for  each  process  will  be  written  to
       DMTCP_CHECKPOINT_DIR.  The  dmtcp_coordinator  will write "dmtcp_restart_script.sh" to its
       working directory.  This script contains the necessary calls to dmtcp_restart  to  restart
       the entire computation.

                   dmtcp_command -c
              OR:  dmtcp_command --checkpoint

       4.  To  restart,  one should use dmtcp_restart_script.sh created by the dmtcp_coordinator.
       One can optionally edit this script to migrate processes to different hosts.  In order  to
       give  a  restarted  program  standard  input, the script must be edited to run the desired
       process in the foreground of a terminal.

               ./dmtcp_restart_script.sh
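
       The four steps above can be collected into shell functions, as in the sketch below.
       The function names are hypothetical, and nothing runs until a function is called, so
       the file can be sourced safely.  It assumes the dmtcp commands are on PATH when the
       functions are eventually used.

```shell
# Step 1: start the coordinator. It reads commands from standard input,
# so run this in its own terminal.
start_coordinator() {
    dmtcp_coordinator "${DMTCP_PORT:-7779}"
}

# Step 2: run any command under checkpoint control.
launch() {
    dmtcp_checkpoint "$@"
}

# Step 3: request a checkpoint from another terminal.
checkpoint_now() {
    dmtcp_command --checkpoint
}

# Step 4: restart from the script the coordinator wrote to its
# working directory. Fails if no such script exists here.
restart_all() {
    script=./dmtcp_restart_script.sh
    if [ ! -f "$script" ]; then
        echo "no $script in $(pwd); has a checkpoint been taken?" >&2
        return 1
    fi
    "$script"
}
```

       A session might then call start_coordinator in one terminal, "launch ./myprogram" in a
       second, and checkpoint_now followed later by restart_all from the coordinator's
       working directory.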

PROGRAMMING INTERFACE

       DMTCP provides a programming interface to allow checkpointed applications to interact with
       dmtcp.

       The  user  application should link with libdmtcpaware.so (-ldmtcpaware) and use the header
       file dmtcp/dmtcpaware.h.

       For more information see: http://dmtcp.sourceforge.net/

SEE ALSO

       Full documentation is available from http://dmtcp.sourceforge.net/

AUTHORS

       DMTCP and its standalone single-process component MTCP (MultiThreaded CheckPointing) were
       created  and  are  maintained by Jason Ansel, Kapil Arya, Gene Cooperman, Mike Rieker, Ana
       Maria Visan, and Alex Brick.