Provided by: dmtcp_1.2.5-1ubuntu2_amd64

NAME

       dmtcp - Distributed MultiThreaded Checkpointing

SYNOPSIS

       dmtcp_coordinator [port]

       dmtcp_checkpoint command [args...]

       dmtcp_restart ckpt_FILE1.dmtcp [ckpt_FILE2.dmtcp...]

       dmtcp_command coordinatorCommand

       dmtcp_inspector [-o<ofile>] [-t<tool>] [-cdaznh] <ckpt1.dmtcp> [ckpt2.dmtcp...]

       mtcp_restart FILE.mtcp

DESCRIPTION

       DMTCP is a tool for transparently checkpointing the state of an arbitrary group of programs
       spread across many machines and connected by  sockets.  It  does  not  modify  the  user's
       program  nor the operating system.  MTCP is a standalone component of DMTCP available as a
       checkpointing library for a single process.

OPTIONS

       For each command, the --help or -h flag will show the command-line options.  Most  command
       line  options  can  also be controlled through environment variables.  These can be set in
       bash with "export NAME=value" or in tcsh with "setenv NAME value".

       DMTCP_CHECKPOINT_INTERVAL=integer
              Time in seconds between automatic checkpoints.  Checkpoints can also  be  initiated
              manually   by   typing   'c'   into   the   coordinator.   (default:  0,  disabled;
              dmtcp_coordinator only)

       DMTCP_HOST=string
              Hostname where  the  cluster-wide  coordinator  is  running.  (default:  localhost;
              dmtcp_checkpoint, dmtcp_restart only)

       DMTCP_PORT=integer
              The port the cluster-wide coordinator listens on. (default: 7779)

       DMTCP_GZIP=(1|0)
              Set to "0" to disable compression of checkpoint images.  (default: 1, compression
              enabled; dmtcp_checkpoint only)  WARNING: gzip compression adds seconds to
              checkpoint time.  Without gzip, checkpoint/restart often takes less than 1 s.

       DMTCP_CHECKPOINT_DIR=path
              Directory to store checkpoint images in. (default: ./)

       DMTCP_SIGCKPT=integer
              Internal  signal  number  to  use  for checkpointing.  Must not be used by the user
              program.  (default: SIGUSR2; dmtcp_checkpoint only)
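
       As a minimal sketch of the environment-variable route, the sh fragment below sets a few
       of the variables listed above so that later DMTCP commands inherit them.  The port
       7780, the checkpoint directory path, and the helper name show_dmtcp_env are
       illustrative assumptions, not DMTCP defaults.

```shell
#!/bin/sh
# Sketch only: every value below is an illustrative assumption.
DMTCP_HOST=localhost                                        # host where the coordinator runs
DMTCP_PORT=7780                                             # hypothetical non-default port
DMTCP_CHECKPOINT_DIR="${TMPDIR:-/tmp}/dmtcp_ckpt_example"   # hypothetical path
DMTCP_GZIP=0                                                # disable checkpoint compression
export DMTCP_HOST DMTCP_PORT DMTCP_CHECKPOINT_DIR DMTCP_GZIP

# Create the checkpoint directory up front (a precaution taken by this sketch).
mkdir -p "$DMTCP_CHECKPOINT_DIR"

# Hypothetical helper: print the DMTCP_* settings that child commands inherit.
show_dmtcp_env() {
    env | grep '^DMTCP_' | sort
}
show_dmtcp_env
```

       Commands started from the same shell afterwards, such as dmtcp_checkpoint, see these
       values without needing the corresponding command-line flags.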

DMTCP_COORDINATOR

       Each computation to be checkpointed must include a DMTCP coordinator process.  One can
       explicitly start a coordinator through dmtcp_coordinator, or allow one to be started
       implicitly in the background by dmtcp_checkpoint or dmtcp_restart.  The address of the
       unique coordinator should be specified for dmtcp_checkpoint, dmtcp_restart, and
       dmtcp_command either through the --host and --port command-line flags or through the
       DMTCP_HOST and DMTCP_PORT environment variables.  If neither is given, the host-port
       pair defaults to localhost-7779.  The host-port pair associated with a particular
       coordinator is given by the command-line flags used in the dmtcp_coordinator command, or
       the environment variables then in effect, or the default of localhost-7779.

       The coordinator is stateless and is not checkpointed.  On restart, one can use an existing
       or  a new coordinator.  Multiple computations under DMTCP control can coexist by providing
       a unique coordinator (with a unique host-port pair) for each such computation.
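
       The idea of one coordinator per computation can be sketched with a small sh wrapper
       that passes an explicit host-port pair to dmtcp_checkpoint.  The function name
       run_under and the program names in the usage lines are hypothetical; only the --host
       and --port flags come from the text above.

```shell
#!/bin/sh
# Hypothetical helper: run COMMAND under the coordinator at HOST:PORT.
# Usage: run_under HOST PORT COMMAND [ARGS...]
run_under() {
    if [ "$#" -lt 3 ]; then
        echo "usage: run_under HOST PORT COMMAND [ARGS...]" >&2
        return 2
    fi
    host=$1
    port=$2
    shift 2
    # --host and --port select which coordinator this computation joins.
    dmtcp_checkpoint --host "$host" --port "$port" "$@"
}
```

       With coordinators already listening on two different ports, two independent
       computations could then be started as:

         run_under localhost 7779 ./simulation_a
         run_under localhost 7780 ./simulation_b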

       The coordinator initiates a  checkpoint  for  all  processes  in  its  computation  group.
       Checkpoints can be:  performed automatically on an interval (see DMTCP_CHECKPOINT_INTERVAL
       above); or initiated  manually  on  the  standard  input  of  the  coordinator  (see  next
       paragraph); or initiated directly under program control by the computation through the
       dmtcpaware API (see below).

       The coordinator accepts the following commands on its standard input.  Each command should
       be followed by the <return> key.  The commands are:
         l : List connected nodes
         s : Print status message
         c : Checkpoint all nodes
         f : Force a restart even if there are missing nodes (debugging)
         k : Kill all nodes
         q : Kill all nodes and quit
         ? : Show this message

       Coordinator commands can also be issued remotely using dmtcp_command.

DMTCP_INSPECTOR

       dmtcp_inspector  is  a tool for offline checkpoint analysis. It provides information about
       socket connections and parent-child relations between processes of a distributed  program.
       The  output  is  in  graphviz package format and can be rendered using graphviz tools like
       dot, neato, twopi, circo, fdp and sfdp.  The command-line options are as follows:

       -o, --out <file>
              Write output to <file>

       -t, --tool
              Graphviz tool to use.  By default, no Graphviz tool is run and the output is in a
              dot-like format

       -c, --cred
              Add information about parent-child relations on the graph

       -d, --no-sock
              Remove information about socket connections

       -a, --sock-all
              Add verbose information about socket connections

       -z, --sock-half
              Also represent half-connections (when some *.dmtcp files are missing)

       -n, --node
              Show verbose node names

       -h, --help
              Display this help
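
       A possible way to combine these options is sketched below: a wrapper that writes the
       graph description for a set of checkpoint images to a file, including parent-child
       relations.  The function name inspect_ckpts is hypothetical, and passing the file name
       as a separate word after --out is an assumption based on the option list above.

```shell
#!/bin/sh
# Hypothetical helper: write a graph description for the given checkpoint
# images to OUTFILE, adding parent-child relations (-c).
# Usage: inspect_ckpts OUTFILE CKPT.dmtcp [CKPT.dmtcp...]
inspect_ckpts() {
    if [ "$#" -lt 2 ]; then
        echo "usage: inspect_ckpts OUTFILE CKPT.dmtcp [CKPT.dmtcp...]" >&2
        return 2
    fi
    out=$1
    shift
    # Assumption: --out accepts its file name as the next argument.
    dmtcp_inspector -c --out "$out" "$@"
}
```

       The resulting file could then be rendered with a Graphviz tool, for example:

         inspect_ckpts graph.dot ckpt_*.dmtcp
         dot -Tpng graph.dot -o graph.png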

EXAMPLE USAGE

       1. In a separate terminal window, start the dmtcp_coordinator.
              (See the DMTCP_COORDINATOR section above.)

               dmtcp_coordinator

       2. In separate terminal(s), replace each command with "dmtcp_checkpoint
              [command]".  The checkpointed program will connect to the coordinator specified by
              DMTCP_HOST and DMTCP_PORT.  New threads will be checkpointed as part of the
              process.  Child processes will automatically be checkpointed.  Remote processes
              started via ssh will automatically be checkpointed.  (Internally, DMTCP modifies
              the ssh command line to call dmtcp_checkpoint on the remote host.)

               dmtcp_checkpoint ./myprogram

       3. To manually initiate a checkpoint, either run the command below
              or type "c" followed by <return> into the coordinator.  Checkpoint files  for  each
              process  will  be written to DMTCP_CHECKPOINT_DIR. The dmtcp_coordinator will write
              "dmtcp_restart_script.sh" to its  working  directory.   This  script  contains  the
              necessary  calls  to  dmtcp_restart  to  restart  the entire computation, including
              remote processes created via ssh.

                   dmtcp_command -c
              OR:  dmtcp_command --checkpoint

       4. To restart, one should execute dmtcp_restart_script.sh, which is
              created by the dmtcp_coordinator in its working directory at the time of
              checkpoint.  One can optionally edit this script to migrate processes to different
              hosts.  By default, only one process will be restarted in the foreground and
              receive the standard input.  The script may be edited to choose which process
              will be restarted in the foreground.

               ./dmtcp_restart_script.sh
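
       Steps 3 and 4 can be scripted.  The sketch below requests a checkpoint and then polls
       for dmtcp_restart_script.sh in the coordinator's working directory.  The helper names
       wait_for_file and checkpoint_now and the 30-second default timeout are assumptions of
       this sketch, and the appearance of the script is used only as a rough signal that the
       checkpoint was taken.

```shell
#!/bin/sh
# Hypothetical helper: poll until FILE exists and is non-empty, or give up
# after TIMEOUT seconds.  Returns 0 on success, non-zero on timeout.
# Usage: wait_for_file FILE TIMEOUT
wait_for_file() {
    file=$1
    remaining=$2
    while [ "$remaining" -gt 0 ]; do
        [ -s "$file" ] && return 0
        sleep 1
        remaining=$((remaining - 1))
    done
    [ -s "$file" ]
}

# Hypothetical helper: request a checkpoint, then wait for the restart script
# in DIR, which must be the coordinator's working directory.
# Note: a stale script left by an earlier checkpoint satisfies the wait at once.
# Usage: checkpoint_now DIR [TIMEOUT]
checkpoint_now() {
    dir=$1
    timeout=${2:-30}
    dmtcp_command --checkpoint || return 1
    wait_for_file "$dir/dmtcp_restart_script.sh" "$timeout"
}
```

       A later restart would then follow step 4, for example:

         checkpoint_now /path/to/coordinator/dir && echo "restart script is ready"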

DMTCPAWARE API

       DMTCP provides a programming interface to allow checkpointed applications to interact with
       DMTCP.  In the source distribution, see dmtcpaware/dmtcpaware.h for the functions
       available.  See test/dmtcpaware[123].c for three example applications.  For an example  of
       its usage, try:

        cd test; rm dmtcpaware1; make dmtcpaware1; ./autotest -v dmtcpaware1

       The  user  application should link with libdmtcpaware.so (-ldmtcpaware) and use the header
       file dmtcp/dmtcpaware.h.

       The file utils/dmtcp.py in the source distribution provides an example python binding  for
       the dmtcpaware interface.

DMTCP PLUGIN MODULES

       The  source  distribution  includes  a top-level plugin directory, with examples of how to
       write a plugin module for DMTCP.  Further examples are in the test/plugin directory.   The
       plugin feature adds three new user-programmable capabilities.  A plugin may: add wrappers
       around system calls; take special actions during certain events (e.g. pre-checkpoint,
       resume/post-checkpoint, restart); and insert key-value pairs into a database at
       restart time that is then available to be queried by the restarted processes of a
       computation.   (The  events  available to the plugin feature form a superset of the events
       available with the dmtcpaware interface.)  One or more plugins are invoked via a  list  of
       colon-separated absolute pathnames.

         dmtcp_checkpoint --with-plugin PLUGIN1[:PLUGIN2]...
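
       Because the plugin list must consist of colon-separated absolute pathnames, a small
       helper can build and validate it.  The sketch below is one way to do so; the function
       name join_plugins and the plugin paths in the usage line are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: join absolute plugin paths with ':' for --with-plugin.
# Prints the joined list; returns 1 if any path is not absolute.
# Usage: join_plugins /abs/path/one.so [/abs/path/two.so...]
join_plugins() {
    list=""
    for p in "$@"; do
        case $p in
            /*) ;;   # absolute path: accept
            *)  echo "not an absolute path: $p" >&2
                return 1 ;;
        esac
        # Add a ':' separator only when the list is already non-empty.
        list="${list:+$list:}$p"
    done
    printf '%s\n' "$list"
}
```

       The result can be passed directly on the command line:

         dmtcp_checkpoint --with-plugin "$(join_plugins /opt/plugins/a.so /opt/plugins/b.so)" ./myprogram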

RETURN CODE

       A  target program under DMTCP control normally returns the same return code as if executed
       without DMTCP.  However, if DMTCP fails (as opposed to the target program failing),  DMTCP
       returns  a  DMTCP-specific return code, rc (or rc+1, rc+2 for two special cases), where rc
       is the integer value of the environment variable DMTCP_FAIL_RC if set, or else the default
       value, 99.
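
       A calling script can use this rule to separate DMTCP failures from target-program
       failures.  The sketch below is a hypothetical helper (classify_status is not part of
       DMTCP) that treats rc, rc+1 and rc+2 as DMTCP-specific.  A target program that itself
       exits with one of those values cannot be told apart by this check.

```shell
#!/bin/sh
# Hypothetical helper: classify the exit status of a command run under DMTCP.
# Prints "dmtcp-failure" if STATUS is rc, rc+1 or rc+2, where rc is
# $DMTCP_FAIL_RC if set (default 99); otherwise prints "program-status".
# Usage: classify_status STATUS
classify_status() {
    status=$1
    rc=${DMTCP_FAIL_RC:-99}
    if [ "$status" -ge "$rc" ] && [ "$status" -le $((rc + 2)) ]; then
        echo dmtcp-failure
    else
        echo program-status
    fi
}
```

       For example:

         dmtcp_checkpoint ./myprogram; classify_status $?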

SEE ALSO

       Full documentation is available from http://dmtcp.sourceforge.net/

AUTHORS

       DMTCP and its standalone single-process component MTCP (MultiThreaded CheckPointing) were
       created and are maintained by Jason Ansel, Kapil Arya, Gene Cooperman, Artem Y.  Polyakov,
       Mike  Rieker,  Ana-Maria  Visan,  and a series of newer contributors including Alex Brick,
       Tyler Denniston, Rohan Garg, Gregory Kerr, and others.