Provided by: pegasus-wms_4.0.1+dfsg-8_amd64 bug

NAME

       pegasus-plan - runs Pegasus to generate the executable workflow

SYNOPSIS

       pegasus-plan [-v] [-q] [-V] [-h]
                    [-Dprop=value...]] [-b prefix]
                    [--conf propsfile]
                    [-c cachefile[,cachefile...]]
                    [-C style[,style...]]
                    [--dir dir]
                    [--force] [--force-replan]
                    [--inherited-rc-files] [-j prefix]
                    [-n] [-o site]
                    [-s site1[,site2...]]
                    [--staging-site s1=ss1[,s2=ss2[..]]
                    [--randomdir[=dirname]]
                    [--relative-dir dir]
                    [--relative-submit-dir dir]
                    -d daxfile

DESCRIPTION

       The pegasus-plan command takes in as input the DAX and generates an executable workflow
       usually in form of condor submit files, which can be submitted to an execution site for
       execution.

       As part of generating an executable workflow, the planner needs to discover:

       data
           The Pegasus Workflow Planner ensures that all the data required for the execution of
           the executable workflow is transferred to the execution site by adding transfer nodes
           at appropriate points in the DAG. This is done by looking up an appropriate Replica
           Catalog to determine the locations of the input files for the various jobs. At present
           the default replica mechanism used is RLS.

           The Pegasus Workflow Planner also tries to reduce the workflow, unless specified
           otherwise. This is done by deleting the jobs whose output files have been found in
           some location in the Replica Catalog. At present no cost metrics are used. However
           preference is given to a location corresponding to the execution site

           The planner can also add nodes to transfer all the materialized files to an output
           site. The location on the output site is determined by looking up the site catalog
           file, the path to which is picked up from the pegasus.catalog.site.file property
           value.

       executables
           The planner looks up a Transformation Catalog to discover locations of the executables
           referred to in the executable workflow. Users can specify INSTALLED or STAGEABLE
           executables in the catalog. Stageable executables can be used by Pegasus to stage
           executables to resources where they are not pre-installed.

       resources
           The layout of the sites, where Pegasus can schedule jobs of a workflow are described
           in the Site Catalog. The planner looks up the site catalog to determine for a site
           what directories a job can be executed in, what servers to use for staging in and out
           data and what jobmanagers (if applicable) can be used for submitting jobs.

       The data and executable locations can now be specified in DAX’es conforming to DAX schema
       version 3.2 or higher.

OPTIONS

       Any option will be displayed with its long options synonym(s).

       -Dproperty=value
           The -D option allows an experienced user to override certain properties which
           influence the program execution, among them the default location of the user’s
           properties file and the PEGASUS home location. One may set several CLI properties by
           giving this option multiple times. The -D option(s) must be the first option on the
           command line. A CLI property take precedence over the properties file property of the
           same key.

       -d file, --dax file
           The DAX is the XML input file that describes an abstract workflow. This is a mandatory
           option, which has to be used.

       -b prefix, --basename prefix
           The basename prefix to be used while constructing per workflow files like the dagman
           file (.dag file) and other workflow specific files that are created by Condor. Usually
           this prefix, is taken from the name attribute specified in the root element of the dax
           files.

       -c file[,file,...], --cache file[,file,...]
           A comma separated list of paths to replica cache files that override the results from
           the replica catalog for a particular LFN.

           Each entry in the cache file describes a LFN , the corresponding PFN and the
           associated attributes. The pool attribute should be specified for each entry.

               LFN_1 PFN_1 pool=[site handle 1]
               LFN_2 PFN_2 pool=[site handle 2]
                ...
               LFN_N PFN_N [site handle N]

           To treat the cache files as supplemental replica catalogs set the property
           pegasus.catalog.replica.cache.asrc to true. This results in the mapping in the cache
           files to be merged with the mappings in the replica catalog. Thus, for a particular
           LFN both the entries in the cache file and replica catalog are available for replica
           selection.

       -C style[,style,...], --cluster style[,style,...]
           Comma-separated list of clustering styles to apply to the workflow. This mode of
           operation results in clustering of n compute jobs into a larger jobs to reduce remote
           scheduling overhead. You can specify a list of clustering techniques to recursively
           apply them to the workflow. For example, this allows you to cluster some jobs in the
           workflow using horizontal clustering and then use label based clustering on the
           intermediate workflow to do vertical clustering.

           The clustered jobs can be run at the remote site, either sequentially or by using MPI.
           This can be specified by setting the property pegasus.job.aggregator. The property can
           be overridden by associating the PEGASUS profile key collapser either with the
           transformation in the transformation catalog or the execution site in the site
           catalog. The value specified (to the property or the profile), is the logical name of
           the transformation that is to be used for clustering jobs. Note that clustering will
           only happen if the corresponding transformations are catalogued in the transformation
           catalog.

           PEGASUS ships with a clustering executable seqexec that can be found in the
           $PEGASUS_HOME/bin directory. It runs the jobs in the clustered job sequentially on the
           same node at the remote site.

           In addition, an MPI wrapper mpiexec, is distributed as source with PEGASUS. It can be
           found in the $PEGASUS_HOME/src/tools/cluster directory. The wrapper is run on every
           MPI node, with the first one being the master and the rest of the ones as workers. The
           number of instances of mpiexec that are invoked is equal to the value of the Globus
           RSL key nodecount. The master distributes the smaller constituent jobs to the workers.
           For e.g. If there were 10 jobs in the clustered job and nodecount was 5, then one node
           acts as master, and the 10 jobs are distributed amongst the 4 slaves on demand. The
           master hands off a job to the slave node as and when it gets free. So initially all
           the 4 nodes are given a single job each, and then as and when they get done are handed
           more jobs till all the 10 jobs have been executed.

           By default, seqexec is used for clustering jobs unless overridden in the properties or
           by the pegasus profile key collapser.

           The following type of clustering styles are currently supported:

           •    horizontal is the style of clustering in which jobs on the same level are
               aggregated into larger jobs. A level of the workflow is defined as the greatest
               distance of a node, from the root of the workflow. Clustering occurs only on jobs
               of the same type i.e they refer to the same logical transformation in the
               transformation catalog.

               Horizontal Clustering can operate in one of two modes. a. Job count based.

               The granularity of clustering can be specified by associating either the PEGASUS
               profile key clusters.size or the PEGASUS profile key clusters.num with the
               transformation.

               The clusters.size key indicates how many jobs need to be clustered into the larger
               clustered job. The clusters.num key indicates how many clustered jobs are to be
               created for a particular level at a particular execution site. If both keys are
               specified for a particular transformation, then the clusters.num key value is used
               to determine the clustering granularity.

                1. Runtime based.

                   To cluster jobs according to runtimes user needs to set one property and two
                   profile keys. The property pegasus.clusterer.preference must be set to the
                   value runtime. In addition user needs to specify two Pegasus profiles. a.
                   clusters.maxruntime which specifies the maximum duration for which the
                   clustered job should run for. b. job.runtime which specifies the duration for
                   which the job with which the profile key is associated, runs for. Ideally,
                   clusters.maxruntime should be set in transformation catalog and job.runtime
                   should be set for each job individually.

           •    label is the style of clustering in which you can label the jobs in your
               workflow. The jobs with the same level are put in the same clustered job. This
               allows you to aggregate jobs across levels, or in a manner that is best suited to
               your application.

               To label the workflow, you need to associate PEGASUS profiles with the jobs in the
               DAX. The profile key to use for labeling the workflow can be set by the property
               pegasus.clusterer.label.key. It defaults to label, meaning if you have a PEGASUS
               profile key label with jobs, the jobs with the same value for the pegasus profile
               key label will go into the same clustered job.

       --conf propfile
           The path to properties file that contains the properties planner needs to use while
           planning the workflow.

       --dir dir
           The base directory where you want the output of the Pegasus Workflow Planner usually
           condor submit files, to be generated. Pegasus creates a directory structure in this
           base directory on the basis of username, VO Group and the label of the workflow in the
           DAX.

           By default the base directory is the directory from which one runs the pegasus-plan
           command.

       -f, --force
           This bypasses the reduction phase in which the abstract DAG is reduced, on the basis
           of the locations of the output files returned by the replica catalog. This is
           analogous to a make style generation of the executable workflow.

       --force-replan
           By default, for hierarichal workflows if a DAX job fails, then on job retry the rescue
           DAG of the associated workflow is submitted. This option causes Pegasus to replan the
           DAX job in case of failure instead.

       -g, --group
           The VO Group to which the user belongs to.

       -h, --help
           Displays all the options to the pegasus-plan command.

       --inherited-rc-files file[,file,...]
           A comma separated list of paths to replica files. Locations mentioned in these have a
           lower priority than the locations in the DAX file. This option is usually used
           internally for hierarchical workflows, where the file locations mentioned in the
           parent (encompassing) workflow DAX, passed to the sub workflows (corresponding) to the
           DAX jobs.

       -j prefix, --job-prefix prefix
           The job prefix to be applied for constructing the filenames for the job submit files.

       -n, --nocleanup
           This results in the generation of the separate cleanup workflow that removes the
           directories created during the execution of the executable workflow. The cleanup
           workflow is to be submitted after the executable workflow has finished.

           If this option is not specified, then Pegasus adds cleanup nodes to the executable
           workflow itself that cleanup files on the remote sites when they are no longer
           required.

       -o site, --o site
           The output site where all the materialized data is transferred to.

           By default the materialized data remains in the working directory on the execution
           site where it was created. Only those output files are transferred to an output site
           for which transfer attribute is set to true in the DAX.

       -q, --quiet
           Decreases the logging level.

       -r[dirname], --randomdir[=dirname]
           Pegasus Worfklow Planner adds create directory jobs to the executable workflow that
           create a directory in which all jobs for that workflow execute on a particular site.
           The directory created is in the working directory (specified in the site catalog with
           each site).

           By default, Pegasus duplicates the relative directory structure on the submit host on
           the remote site. The user can specify this option without arguments to create a random
           timestamp based name for the execution directory that are created by the create dir
           jobs. The user can can specify the optional argument to this option to specify the
           basename of the directory that is to be created.

           The create dir jobs refer to the dirmanager executable that is shipped as part of the
           PEGASUS worker package. The transformation catalog is searched for the transformation
           named pegasus::dirmanager for all the remote sites where the workflow has been
           scheduled. Pegasus can create a default path for the dirmanager executable, if
           PEGASUS_HOME environment variable is associated with the sites in the site catalog as
           an environment profile.

       --relative-dir dir
           The directory relative to the base directory where the executable workflow it to be
           generated and executed. This overrides the default directory structure that Pegasus
           creates based on username, VO Group and the DAX label.

       --relative-submit-dir dir
           The directory relative to the base directory where the executable workflow it to be
           generated. This overrides the default directory structure that Pegasus creates based
           on username, VO Group and the DAX label. By specifying --relative-dir and
           --relative-submit-dir you can have different relative execution directory on the
           remote site and different relative submit directory on the submit host.

       -s site[,site,...], --sites site[,site,...]
           A comma separated list of execution sites on which the workflow is to be executed.
           Each of the sites should have an entry in the site catalog, that is being used. To run
           on the submit host, specify the execution site as local.

           In case this option is not specified, all the sites in the site catalog are picked up
           as candidates for running the workflow.

       --staging-site s1=ss1[,s2=ss2[..]]
           A comma separated list of key=value pairs , where the key is the execution site and
           value is the staging site for that execution site.

           In case of running on a shared filesystem, the staging site is automatically
           associated by the planner to be the execution site. If only a value is specified, then
           that is taken to be the staging site for all the execution sites. e.g --staging-site
           local means that the planner will use the local site as the staging site for all jobs
           in the workflow.

       -s, --submit
           Submits the generated executable workflow using pegasus-run script in
           $PEGASUS_HOME/bin directory. By default, the Pegasus Workflow Planner only generates
           the Condor submit files and does not submit them.

       -v, --verbose
           Increases the verbosity of messages about what is going on. By default, all FATAL,
           ERROR, CONSOLE and WARN messages are logged. The logging hierarchy is as follows:

            1. FATAL

            2. ERROR

            3. CONSOLE

            4. WARN

            5. INFO

            6. CONFIG

            7. DEBUG

            8. TRACE

           For example, to see the INFO, CONFIG and DEBUG messages additionally, set -vvv.

       -V, --version
           Displays the current version number of the Pegasus Workflow Management System.

RETURN VALUE

       If the Pegasus Workflow Planner is able to generate an executable workflow successfully,
       the exitcode will be 0. All runtime errors result in an exitcode of 1. This is usually in
       the case when you have misconfigured your catalogs etc. In the case of an error occurring
       while loading a specific module implementation at run time, the exitcode will be 2. This
       is usually due to factory methods failing while loading a module. In case of any other
       error occurring during the running of the command, the exitcode will be 1. In most cases,
       the error message logged should give a clear indication as to where things went wrong.

PEGASUS PROPERTIES

       This is not an exhaustive list of properties used. For the complete description and list
       of properties refer to $PEGASUS_HOME/doc/advanced-properties.pdf

       pegasus.selector.site
           Identifies what type of site selector you want to use. If not specified the default
           value of Random is used. Other supported modes are RoundRobin and NonJavaCallout that
           calls out to a external site selector.

       pegasus.catalog.replica
           Specifies the type of replica catalog to be used.

           If not specified, then the value defaults to RLS.

       pegasus.catalog.replica.url
           Contact string to access the replica catalog. In case of RLS it is the RLI url.

       pegasus.dir.exec
           A suffix to the workdir in the site catalog to determine the current working
           directory. If relative, the value will be appended to the working directory from the
           site.config file. If absolute it constitutes the working directory.

       pegasus.catalog.transformation
           Specifies the type of transformation catalog to be used. One can use either a file
           based or a database based transformation catalog. At present the default is Text.

       pegasus.catalog.transformation.file
           The location of file to use as transformation catalog.

           If not specified, then the default location of $PEGASUS_HOME/var/tc.data is used.

       pegasus.catalog.site
           Specifies the type of site catalog to be used. One can use either a text based or an
           xml based site catalog. At present the default is XML3.

       pegasus.catalog.site.file
           The location of file to use as a site catalog. If not specified, then default value of
           $PEGASUS_HOME/etc/sites.xml is used in case of the xml based site catalog and
           $PEGASUS_HOME/etc/sites.txt in case of the text based site catalog.

       pegasus.data.configuration
           This property sets up Pegasus to run in different environments. This can be set to

           sharedfs If this is set, Pegasus will be setup to execute jobs on the shared
           filesystem on the execution site. This assumes, that the head node of a cluster and
           the worker nodes share a filesystem. The staging site in this case is the same as the
           execution site.

           nonsharedfs If this is set, Pegasus will be setup to execute jobs on an execution site
           without relying on a shared filesystem between the head node and the worker nodes.

           condorio If this is set, Pegasus will be setup to run jobs in a pure condor pool, with
           the nodes not sharing a filesystem. Data is staged to the compute nodes from the
           submit host using Condor File IO.

       pegasus.code.generator
           The code generator to use. By default, Condor submit files are generated for the
           executable workflow. Setting to Shell results in Pegasus generating a shell script
           that can be executed on the submit host.

FILES

       $PEGASUS_HOME/etc/dax-3.3.xsd
           is the suggested location of the latest DAX schema to produce DAX output.

       $PEGASUS_HOME/etc/sc-3.0.xsd
           is the suggested location of the latest Site Catalog schema that is used to create the
           XML3 version of the site catalog

       $PEGASUS_HOME/etc/tc.data.text
           is the suggested location for the file corresponding to the Transformation Catalog.

       $PEGASUS_HOME/etc/sites.xml3 | $PEGASUS_HOME/etc/sites.xml
           is the suggested location for the file containing the site information.

       $PEGASUS_HOME/lib/pegasus.jar
           contains all compiled Java bytecode to run the Pegasus Workflow Planner.

SEE ALSO

       pegasus-sc-client(1), pegasus-tc-client(1), pegasus-rc-client(1)

AUTHORS

       Karan Vahi <vahi at isi dot edu>

       Gaurang Mehta <gmehta at isi dot edu>

       Pegasus Team http://pegasus.isi.edu

                                            02/28/2012                            PEGASUS-PLAN(1)