Ubuntu Manpage: dggath, dgscat, gscat - convert distributed source graphs to or from centralized ones

NAME

       dggath, dgscat, gscat - convert distributed source graphs to or from centralized ones

SYNOPSIS

       dggath [options] [igfile] [ogfile]

       dgscat [options] [igfile] [ogfile]

       gscat [options] [igfile] [ogfile]

DESCRIPTION

The dggath program gathers distributed graphs into centralized graphs. It reads a set of files igfile
representing fragments of a distributed source graph, and writes them back in the form of a single
centralized source graph ogfile.

The dgscat program scatters centralized source graphs into distributed graphs. It reads a centralized
source graph igfile and writes it back in the form of a set of files ogfile representing fragments of the
corresponding distributed source graph.

The gscat program does exactly the same job as dgscat, but does not need to be run in a parallel
environment. Since gscat processes the input centralized graph file as a text stream, it does not need to
load the full graph into memory before building the distributed graph fragment files. It therefore
consumes far fewer resources, but cannot check graph consistency, as it has no global view of the graph
structure.

When file names are not specified, data is read from standard input and written to standard output.
Standard streams can also be explicitly represented by a dash '-'.

When the proper libraries have been included at compile time, dggath and dgscat can directly handle
compressed graphs, both as input and output. A stream is treated as compressed whenever its name ends
with a compressed file extension, such as in 'brol.grf.bz2' or '-.gz'. The compression formats that can
be supported are the bzip2 format ('.bz2'), the gzip format ('.gz'), and the lzma format ('.lzma', on
input only).
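
The suffix rule above can be pictured with a small shell sketch. It only illustrates the naming
convention described in this page and is not taken from the programs themselves; the function name
compression_of is hypothetical.

```sh
# Hypothetical helper: report which compression format a stream name implies,
# based only on its suffix, as described above.
compression_of() {
    case $1 in
        *.bz2)  echo bzip2 ;;   # bzip2 format
        *.gz)   echo gzip ;;    # gzip format
        *.lzma) echo lzma ;;    # lzma format (per this page, input only)
        *)      echo none ;;    # no recognized suffix: uncompressed stream
    esac
}

compression_of brol.grf.bz2
compression_of -.gz
compression_of brol.grf
```

Note that '-.gz' denotes a standard stream carrying gzip-compressed data, since the dash stands for
standard input or output.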

dggath and dgscat rely on an implementation of the MPI interface to spread work across the processing
elements. They are therefore not likely to be run directly, but instead through some launcher command
such as mpirun.

DISTRIBUTED FILE NAMES

In order to tell whether programs should read from, or write to, a single file located on only one
processor, or to multiple instances of the same file on all of the processors, or else to distinct files
on each of the processors, a special grammar has been designed, which is based on the '%' escape
character. Four such escape sequences are defined, which are interpreted independently on every
processor, prior to file opening. By default, when a filename is provided, it is assumed that the file is
to be opened on only one of the processors, called the root processor, which is usually process 0 of the
communicator within which the program is run. The index of the root processor can be changed by means of
the -r option. Using any of the first three escape sequences below will instruct programs to open in
parallel a file of name equal to the interpreted filename, on every processor on which they are run.

%p Replaced by the number of processes in the global communicator in which the program is run. Leads
to parallel opening.

%r Replaced on each process running the program by the rank of this process in the global
communicator. Leads to parallel opening.

%- Discarded, but leads to parallel opening. This sequence is mainly used to instruct programs to
open on every processor a file of identical name. Depending on whether the given path leads to a
shared directory or to directories that are local to each processor, this results either in the
opening of multiple instances of the same file, or in the opening of distinct files which may each
have different contents, respectively (in the latter case, it is strongly recommended to identify
files by means of the '%r' sequence instead).

%% Replaced by a single '%' character. File names using this escape sequence are not considered for
parallel opening, unless one or several of the three other escape sequences are also present.

For instance, filename 'brol' will lead to the opening of file 'brol' on the root processor only, and
filename '%-brol' (or even 'br%-ol') will lead to the parallel opening of files called 'brol' on every
processor. When the program is run on two processors, filename 'brol%p-%r' will lead to the opening of
files 'brol2-0' and 'brol2-1' on the first and second processor, respectively.
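
The grammar above can be sketched as a small POSIX shell function that expands a file name for a
given process. This is only an illustration of the rules stated in this section, not the code used by
the programs. The name expand_name and its output format are hypothetical, and copying a '%' that
starts no defined sequence verbatim is an assumption, as this page does not specify that case.

```sh
# Hypothetical helper: expand_name NAME NPROCS RANK
# Prints the expanded file name, followed by "parallel" if the name leads to
# parallel opening, or "root" if it is opened on the root processor only.
expand_name() {
    name=$1 nprocs=$2 rank=$3
    out='' mode=root
    while [ -n "$name" ]; do
        case $name in
            %p*) out=$out$nprocs; mode=parallel; name=${name#??} ;;  # number of processes
            %r*) out=$out$rank;   mode=parallel; name=${name#??} ;;  # rank of this process
            %-*) mode=parallel;   name=${name#??} ;;                 # discarded
            %%*) out=$out%;       name=${name#??} ;;                 # literal '%'
            *)   rest=${name#?}                                      # copy one character
                 out=$out${name%"$rest"}
                 name=$rest ;;
        esac
    done
    printf '%s %s\n' "$out" "$mode"
}

expand_name 'brol' 2 0
expand_name 'br%-ol' 2 0
expand_name 'brol%p-%r' 2 1
```

The three calls correspond to the cases discussed above: 'brol' is opened on the root processor only,
'br%-ol' expands to 'brol' opened in parallel, and 'brol%p-%r' expands to 'brol2-1' on the process of
rank 1 out of 2.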

OPTIONS

       -c     For dggath and dgscat only. Check the consistency of the input source graph after loading it into
              memory.

       -h     Display some help.

       -rpnum Set the root process for centralized files to rank pnum (default is 0).

       -V     Display program version and copyright.

EXAMPLE

       Run dgscat on 5 processing elements to scatter centralized graph file brol.grf into 5 gzipped file
       fragments brol5-0.dgr.gz to brol5-4.dgr.gz.

           $ mpirun -np 5 dgscat brol.grf brol%p-%r.dgr.gz

AUTHOR

       Francois Pellegrini <francois.pellegrini@labri.fr>

                                                 August 03, 2010                                       dgscat(1)