Provided by: dmtcp_1.2.5-1ubuntu2_amd64 bug

NAME

       dmtcp - Distributed MultiThreaded Checkpointing

SYNOPSIS

       dmtcp_coordinator [port]

       dmtcp_checkpoint command [args...]

       dmtcp_restart ckpt_FILE1.dmtcp [ckpt_FILE2.dmtcp...]

       dmtcp_command coordinatorCommand

       dmtcp_inspector [-o<ofile>] [-t<tool>] [-cdaznh] <ckpt1.dmtcp> [ckpt2.dmtcp...]

       mtcp_restart FILE.mtcp

DESCRIPTION

       DMTCP  is a tool to transparently checkpointing the state of an arbitrary group of programs spread across
       many machines and connected by sockets. It does not modify the user's program nor the  operating  system.
       MTCP is a standalone component of DMTCP available as a checkpointing library for a single process.

OPTIONS

       For  each  command,  the --help or -h flag will show the command-line options.  Most command line options
       can also be controlled through environment variables.  These can be set in bash with "export  NAME=value"
       or in tcsh with "setenv NAME value".

       DMTCP_CHECKPOINT_INTERVAL=integer
              Time  in  seconds  between  automatic  checkpoints.  Checkpoints can also be initiated manually by
              typing 'c' into the coordinator. (default: 0, disabled; dmtcp_coordinator only)

       DMTCP_HOST=string
              Hostname where the cluster-wide coordinator is  running.  (default:  localhost;  dmtcp_checkpoint,
              dmtcp_restart only)

       DMTCP_PORT=integer
              The port the cluster-wide coordinator listens on. (default: 7779)

       DMTCP_GZIP=(1|0)
              Set  to  "0"  to  disable  compression  of  checkpoint  images.  (default: 1, compression enabled;
              dmtcp_checkpoint only) WARNING:  gzip adds seconds.  Without gzip, ckpt/restart is often less than
              1 s

       DMTCP_CHECKPOINT_DIR=path
              Directory to store checkpoint images in. (default: ./)

       DMTCP_SIGCKPT=integer
              Internal signal number to use for checkpointing.  Must not be used by the user program.  (default:
              SIGUSR2; dmtcp_checkpoint only)

DMTCP_COORDINATOR

       Each computation to be checkpointed must include a DMTCP coordinator process.  One can explicitly start a
       coordinator  through  dmtcp_coordinator,  or  allow  one to be started implicitly in background by either
       dmtcp_checkpoint or dmtcp_restart to operate.  The address of the unique coordinator should be  specified
       by  dmtcp_checkpoint,  dmtcp_restart, and dmtcp_command either through the --host and --port command-line
       flags or through the the DMTCP_HOST and DMTCP_PORT environment variables.  If neither is given, the host-
       port  pair  defaults  to  localhost-7779.  The host-port pair associated with a particular coordinator is
       given by the command-line flags used in the dmtcp_coordinator command, or the environment variables  then
       in effect, or the default of localhost-7779.

       The  coordinator  is  stateless  and  is  not checkpointed.  On restart, one can use an existing or a new
       coordinator.  Multiple computations under DMTCP control can coexist by  providing  a  unique  coordinator
       (with a unique host-port pair) for each such computation.

       The  coordinator  initiates a checkpoint for all processes in its computation group.  Checkpoints can be:
       performed automatically on an interval (see DMTCP_CHECKPOINT_INTERVAL above); or  initiated  manually  on
       the  standard  input of the coordinator (see next paragraph); or initiated directly under program control
       by the comptuation through the dmtcpaware API (see below).

       The coordinator accepts the following commands on its standard input.  Each command should be followed by
       the <return> key.  The commands are:
         l : List connected nodes
         s : Print status message
         c : Checkpoint all nodes
         f : Force a restart even if there are missing nodes (debugging)
         k : Kill all nodes
         q : Kill all nodes and quit
         ? : Show this message

       Coordinator commands can also be issued remotely using dmtcp_command.

DMTCP_INSPECTOR

       dmtcp_inspector  is  a  tool  for  offline  checkpoint  analysis.  It  provides  information about socket
       connections and parent-child relations between processes of a  distributed  program.  The  output  is  in
       graphviz  package  format and can be rendered using graphviz tools like dot, neato, twopi, circo, fdp and
       sfdp.  Command line options are following:

       -o, --out <file>
              Write output to <file>

       -t, --tool
              Graphviz tool to use. By default no graphviz and output is in dot-like format

       -c, --cred
              Add information about parent-child relations on the graph

       -d, --no-sock
              Remove information about socket connections

       -a, --sock-all
              Add verbose information about socket connections

       -z, --sock-half
              Also represent half-connections (when some *.dmtcp files are missed

       -n, --node
              Verbose node names indication

       -h, --help
              Display this help

EXAMPLE USAGE

       1. In a separate terminal window, start the dmtcp_coodinator.
              (See previous section.)

               dmtcp_coordinator

       2. In separate terminal(s), replace each command(s) with "dmtcp_checkpoint
              [command]".  The checkpointed program will connect to the coordinator specified by DMTCP_HOST  and
              DMTCP_PORT.   New  threads  will  be  checkpointed  as  part of the process.  Child processes will
              automatically be checkpointed.  Remote processes started via ssh will automatically  checkpointed.
              (Internally, DMTCP modifies the ssh command line to call dmtcp_checkpoint on the remote host.)

               dmtcp_checkpoint ./myprogram

       3. To manually initiate a checkpoint, either run the command below
              or  type "c" followed by <return> into the coordinator.  Checkpoint files for each process will be
              written to DMTCP_CHECKPOINT_DIR. The dmtcp_coordinator will write "dmtcp_restart_script.sh" to its
              working  directory.   This  script  contains  the  necessary calls to dmtcp_restart to restart the
              entire computation, including remote processes created via ssh.

                   dmtcp_command -c
              OR:  dmtcp_command --checkpoint

       4. To restart, one should execute dmtcp_restart_script.sh, which is
              created by the dmtcp_coordinator in its working directory at  the  time  of  checkpoint.  One  can
              optionally  edit  this  script  to  migrate  processes  to  different hosts.  By default, only one
              restarted process will be restarted in the foreground and receive the standard input.  The  script
              may be edited to choose which process will be restarted in the foreground.

               ./dmtcp_restart_script.sh

DMTCPAWARE API

       DMTCP provides a programming interface to allow checkpointed applications to interact with dmtcp.  In the
       source distribution, see dmtcpaware/dmtcpaware.h for the functions available.  See test/dmtcpaware[123].c
       for three example applications.  For an example of its usage, try:

        cd test; rm dmtcpaware1; make dmtcpaware1; ./autotest -v dmtcpaware1

       The  user  application  should  link  with  libdmtcpaware.so  (-ldmtcpaware)  and  use  the  header  file
       dmtcp/dmtcpaware.h.

       The file utils/dmtcp.py in the source distribution provides an example python binding for the  dmtcpaware
       interface.

DMTCP PLUGIN MODULES

       The  source  distribution  includes  a top-level plugin directory, with examples of how to write a plugin
       module for DMTCP.  Further examples are in the test/plugin directory.  The plugin feature adds three  new
       user-programmable  capabilities.  A plugin may: add wrappers around system calls; take special actions at
       during certain events (e.g. pre-checkpoint, resume/post-checkpoint, restart); and  may  insert  key-value
       pairs  into a database at restart time that is then available to be queried by the restarted processes of
       a computation.  (The events available to the plugin feature form a superset of the events available  with
       the  dmtcpaware  interface.)   One  or  more  plugins  are invoked via a list of colon-separated absolute
       pathnames.

         dmtcp_checkpoint --with-plugin PLUGIN1[:PLUGIN2]...

RETURN CODE

       A target program under DMTCP control normally returns the same return code as if executed without  DMTCP.
       However, if DMTCP fails (as opposed to the target program failing), DMTCP returns a DMTCP-specific return
       code, rc (or rc+1, rc+2 for two special cases), where rc is the integer value of the environment variable
       DMTCP_FAIL_RC if set, or else the default value, 99.

SEE ALSO

       Full documentation is available from http://dmtcp.sourceforge.net/

AUTHORS

       DMTCP  and  its  standalone single-process compontent MTCP (MultiThreaded CheckPointing) were created and
       are maintained by Jason Ansel, Kapil Arya, Gene Cooperman, Artem  Y.  Polyakov,  Mike  Rieker,  Ana-Maria
       Visan,  and  a  series  of  newer contributors including Alex Brick, Tyler Denniston, Rohan Garg, Gregory
       Kerr, and others.