Ubuntu Manpage: pdcp - copy files to groups of hosts in parallel

NAME

       pdcp - copy files to groups of hosts in parallel
       rpdcp - (reverse pdcp) copy files from a group of hosts in parallel

SYNOPSIS

       pdcp [options]... src [src2...] dest
       rpdcp [options]... src [src2...] dir

DESCRIPTION

       pdcp is a variant of the rcp(1) command.  Unlike rcp(1), which copies files to a single remote host, pdcp
       can  copy  files  to  multiple  remote  hosts in parallel.  However, pdcp does not recognize files in the
       format ``rname@rhost:path,'' therefore all source files must be on the local host  machine.   Destination
       nodes must be listed on the pdcp command line using a suitable target nodelist option (see the OPTIONS
       section below).  Each destination node listed must have pdcp installed for the copy to succeed.

       When pdcp receives SIGINT (ctrl-C), it lists the status of current threads.  A second SIGINT  within  one
       second  terminates  the  program.  Pending threads may be canceled by issuing ctrl-Z within one second of
       ctrl-C.  Pending threads are those that have not yet been initiated, or  are  still  in  the  process  of
       connecting to the remote host.

       Like pdsh(1), the functionality of pdcp may be supplemented by dynamically loadable modules. In pdcp, the
       modules  may  provide  a new connect protocol (replacing the standard rsh(1) protocol), filtering options
       (e.g. excluding hosts that are down), and/or host selection options (e.g. -a selects  all  nodes  from  a
       local  config  file).   By default, pdcp requires at least one "rcmd" module to be loaded (to provide the
       channel for remote copy).

REVERSE PDCP

       rpdcp performs a reverse parallel copy.  Rather than copying files to remote hosts, files  are  retrieved
       from  remote  hosts  and  stored  locally.   All directories or files retrieved will be stored with their
       remote hostname appended to the filename.  When rpdcp is used, the destination must be a directory.

       In  other respects, rpdcp is exactly like pdcp, and further statements regarding pdcp in this manual also
       apply to rpdcp.
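
       As a rough illustration of the behavior described above, the following sketch retrieves one file from
       three hosts into a local directory.  The host names node1 through node3 and the file path are
       hypothetical, and the guard simply skips the copy on machines where rpdcp is not installed.

```shell
# Hypothetical hosts (node1..node3) and remote file; adjust for your cluster.
dest=$(mktemp -d)    # rpdcp needs a directory as its destination

if command -v rpdcp >/dev/null 2>&1; then
    # Quote the range so the shell does not treat the brackets as a glob.
    rpdcp -w 'node[1-3]' /etc/hosts "$dest"
    # Expect one copy per host, each with the remote host name
    # appended to the file name.
    ls "$dest"
fi
```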

RCMD MODULES

       The method by which pdcp connects to remote hosts may be selected at runtime using the -R option (see
       OPTIONS  below).   This  functionality is ultimately implemented via dynamically loadable modules, and so
       the list of available options may be different from installation to installation.  A  list  of  currently
       available  rcmd  modules  is printed when using any of the -h, -V, or -L options. The default rcmd module
       will also be displayed with the -h and -V options.

       A list of rcmd modules currently distributed with pdcp follows.

       rsh     Uses an internal, thread-safe implementation of BSD rcmd(3) to run commands  using  the  standard
               rsh(1) protocol.

       ssh     Uses a variant of popen(3) to run multiple copies of the ssh(1) command.

       mrsh    This module uses the mrsh(1) protocol to execute jobs on remote hosts.  The mrsh protocol uses
               credential-based authentication, forgoing the need to allocate reserved ports.  In other respects,
               it acts just like rsh.

       krb4    The krb4 module allows users to execute remote commands after authenticating with Kerberos.  Of
               course, the remote rshd daemons must be kerberized.

       xcpu    The xcpu module uses the xcpu service to execute remote commands.

OPTIONS

       The list of available pdcp options is determined at runtime by supplementing the list  of  standard  pdcp
       options  with  any  options provided by loaded rcmd and misc modules.  In some cases, options provided by
       modules may conflict with each other. In these cases, the modules are incompatible and the  first  module
       loaded wins.

Standard target nodelist options

-w TARGETS,...
Target and/or filter the specified list of hosts. Do not use with any other node selection options
(e.g. -a, -g, if they are available). No spaces are allowed in the comma-separated list.
Arguments in the TARGETS list may include normal host names, a range of hosts in hostlist format
(see HOSTLIST EXPRESSIONS), or a single `-' character to read the list of hosts on stdin.

If a host or hostlist is preceded by a `-' character, those hosts are explicitly
excluded. If the argument is preceded by a single `^' character, it is taken to be the path to
a file containing a list of hosts, one per line. If the item begins with a `/' character, it is
taken as a regular expression on which to filter the list of hosts (a regex argument may also
optionally be followed by another `/', e.g. /node.*/). A regex or file name argument may also be
preceded by a minus `-' to exclude instead of include those hosts.

A list of hosts may also be preceded by "user@" to specify a remote username other than the
default, or "rcmd_type:" to specify an alternate rcmd connection type for these hosts. When used
together, the rcmd type must be specified first, e.g. "ssh:user1@host0" would use ssh to connect
to host0 as user "user1."

-x host,host,...
Exclude the specified hosts. May be specified in conjunction with other target node list options
such as -a and -g (when available). Hostlists may also be specified to the -x option (see the
HOSTLIST EXPRESSIONS section below). Arguments to -x may also be preceded by the filename (`^')
and regex (`/') characters as described above, in which case the resulting hosts are excluded as
if they had been given to -w and preceded with the minus `-' character.
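
The target forms described above can be combined in a single -w list. The sketch below is
illustrative only: the host names, the user name, and the host file are hypothetical, and the
pdcp calls are skipped on machines where pdcp is not installed.

```shell
# Write a host file, one host per line (hypothetical host names).
hostfile=$(mktemp)
printf '%s\n' node1 node2 node3 > "$hostfile"

if command -v pdcp >/dev/null 2>&1; then
    # '^' prefix: read the target hosts from a file.
    pdcp -w "^$hostfile" /etc/hosts /etc

    # '-' prefix: node1 through node8, with node3 excluded.
    pdcp -w 'node[1-8],-node3' /etc/hosts /etc

    # '/' prefix: filter the list with a regular expression,
    # here keeping only host names that end in 0.
    pdcp -w 'node[1-20],/0$/' /etc/hosts /etc

    # rcmd type first, then user: connect to node1 over ssh as user1.
    pdcp -w ssh:user1@node1 /etc/hosts /etc

    # -x removes hosts from the set selected by -w.
    pdcp -w 'node[1-8]' -x 'node[5-8]' /etc/hosts /etc
fi
```

Quoting each TARGETS argument keeps the shell from interpreting the brackets, `$', and `^'
characters before pdcp sees them.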

Standard pdcp options

-h Output usage menu and quit. A list of available rcmd modules will be printed at the end of the
usage message.

-q List option values and the target nodelist and exit without action.

-b Disable the ctrl-C status feature so that a single ctrl-C kills the parallel copy. (Batch Mode)

-r Copy directories recursively.

-p Preserve modification time and modes.

-e PATH
Explicitly specify path to remote pdcp binary instead of using the locally executed path. Can also
be set via the environment variable PDSH_REMOTE_PDCP_PATH.

-l user
This option may be used to copy files as another user, subject to authorization. For BSD rcmd,
this means the invoking user and system must be listed in the user's .rhosts file (even for root).

-t seconds
Set the connect timeout. Default is 10 seconds.

-f number
Set the maximum number of simultaneous remote copies to number. The default is 32.

-R name
Set rcmd module to name. This option may also be set via the PDSH_RCMD_TYPE environment variable.
A list of available rcmd modules may be obtained via either the -h or -L options.

-M name,...
When multiple misc modules provide the same options to pdsh, the first module initialized "wins"
and subsequent modules are not loaded. The -M option allows a list of modules to be specified
that will be force-initialized before all others, in effect ensuring that they load without
conflict (unless they conflict with each other). This option may also be set via the
PDSH_MISC_MODULES environment variable.

-L List info on all loaded pdcp modules and quit.

-d Include more complete thread status when SIGINT is received, and display connect and command time
statistics on stderr when done.

-V Output pdcp version information, along with list of currently loaded modules, and exit.
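
The following sketch combines several of the standard options. The host names, the paths, and
the chosen fanout and timeout values are hypothetical, and it assumes that the ssh rcmd module
is available in the local installation. The pdcp calls are skipped where pdcp is not installed.

```shell
# Make ssh the rcmd module for later pdcp calls in this shell,
# as an alternative to passing -R ssh each time.
PDSH_RCMD_TYPE=ssh
export PDSH_RCMD_TYPE

if command -v pdcp >/dev/null 2>&1; then
    # -q: list option values and the target nodelist, then exit
    # without copying anything.
    pdcp -q -w 'node[1-4]' /etc/hosts /etc

    # Recursive copy that preserves modification times and modes,
    # with at most 8 simultaneous copies and a 30 second connect timeout.
    pdcp -r -p -f 8 -t 30 -w 'node[1-4]' ./conf.d /etc
fi
```

Running with -q first is a cheap way to confirm the expanded target list before any files are
copied.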

HOSTLIST EXPRESSIONS

       As  noted  in  sections above, pdcp accepts ranges of hostnames in the general form: prefix[n-m,l-k,...],
       where n < m and l < k, etc., as an alternative to explicit lists of  hosts.   This  form  should  not  be
       confused  with  regular  expression character classes (also denoted by ``[]''). For example, foo[19] does
       not represent foo1 or foo9, but rather represents a degenerate range: foo19.

       This range syntax is meant only as a convenience on  clusters  with  a  prefixNN  naming  convention  and
       specification  of  ranges  should not be considered necessary -- the list foo1,foo9 could be specified as
       such, or by the range foo[1,9].

       Some examples of range usage follow:

       Copy /etc/hosts to foo01,foo02,...,foo05
           pdcp -w foo[01-05] /etc/hosts /etc

       Copy /etc/hosts to foo7,foo9,foo10
           pdcp -w foo[7,9-10] /etc/hosts /etc

       Copy /etc/hosts to foo0,foo4,foo5
           pdcp -w foo[0-5] -x foo[1-3] /etc/hosts /etc

       As a reminder to the reader, some shells will interpret brackets ('['  and  ']')  for  pattern  matching.
       Depending  on  your  shell,  it  may be necessary to enclose ranged lists within quotes.  For example, in
       tcsh, the first example above should be executed as:

           pdcp -w "foo[01-05]" /etc/hosts /etc
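
       To make the range semantics concrete, the following shell function expands a single
       prefix[n-m,l-k,...] expression into one host name per line.  It is a simplified sketch and not
       pdcp's own parser: the name expand_hostlist is hypothetical, only one bracketed group with no
       trailing suffix is handled, and the zero padding is assumed to follow the width of each lower bound.
       It is written for sh or bash and uses global variables for brevity.

```shell
# expand_hostlist EXPR
# Simplified illustration of the prefix[n-m,l-k,...] range syntax.
# Not pdcp's parser: one bracketed group, no suffix after the ']'.
expand_hostlist() {
    expr_in=$1
    case $expr_in in
        *\[*\])
            prefix=${expr_in%%\[*}      # text before the '['
            ranges=${expr_in#*\[}       # text after the '['
            ranges=${ranges%\]}         # drop the closing ']'
            ;;
        *)
            # No brackets: a plain host name expands to itself.
            printf '%s\n' "$expr_in"
            return 0
            ;;
    esac

    # Split the comma-separated items into positional parameters.
    old_ifs=$IFS
    IFS=,
    set -- $ranges
    IFS=$old_ifs

    for item in "$@"; do
        lo=${item%%-*}                  # a single number is its own range
        hi=${item##*-}
        width=${#lo}                    # pad to the width of the lower bound
        # Strip leading zeros so the shell does not read the bounds as octal.
        lo_n=${lo#"${lo%%[!0]*}"}; lo_n=${lo_n:-0}
        hi_n=${hi#"${hi%%[!0]*}"}; hi_n=${hi_n:-0}
        i=$lo_n
        while [ "$i" -le "$hi_n" ]; do
            printf "%s%0${width}d\n" "$prefix" "$i"
            i=$((i + 1))
        done
    done
}

expand_hostlist 'foo[7,9-10]'   # prints foo7, foo9, foo10 on separate lines
expand_hostlist 'foo[19]'       # prints foo19 (a degenerate range)
```

       Note that the bracket contents are treated as a list of numeric ranges, never as a regular
       expression character class, which is why foo[19] yields the single host foo19.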

ORIGIN

       Pdsh/pdcp was originally a rewrite of IBM dsh(1) by Jim Garlick <garlick@llnl.gov> on LLNL's  ASCI  Blue-
       Pacific IBM SP system.  It is now also used on Linux clusters at LLNL.

LIMITATIONS

       When using ssh for remote execution, expect the stderr of ssh to be folded in with that of the remote
       command.  When invoked by pdcp, ssh cannot prompt for confirmation if a host key changes, prompt for
       passwords if RSA keys are not configured properly, etc.  Finally, the connect timeout is only adjustable
       with ssh when the underlying ssh implementation supports it, and pdsh has been built to use the correct
       option.