Provided by: pegasus-wms_4.4.0+dfsg-7_amd64

NAME

       pegasus-plan - runs Pegasus to generate the executable workflow

SYNOPSIS

       pegasus-plan [-v] [-q] [-V] [-h]
                    [-Dprop=value...] [-b prefix]
                    [--conf propsfile]
                    [-c cachefile[,cachefile...]] [--cleanup cleanup strategy]
                    [-C style[,style...]]
                    [--dir dir]
                    [--force] [--force-replan]
                    [--inherited-rc-files] [-j prefix]
                    [-n] [-I input-dir] [-O output-dir] [-o site]
                    [-s site1[,site2...]]
                    [--staging-site s1=ss1[,s2=ss2[..]]]
                    [--randomdir[=dirname]]
                    [--relative-dir dir]
                    [--relative-submit-dir dir]
                    -d daxfile

DESCRIPTION

       The pegasus-plan command takes the DAX as input and generates an executable workflow, usually in the
       form of Condor submit files, which can be submitted to an execution site for execution.

       As part of generating an executable workflow, the planner needs to discover:

       data
           The Pegasus Workflow Planner ensures that all the data required for the execution of the executable
           workflow is transferred to the execution site by adding transfer nodes at appropriate points in the
           DAG. This is done by looking up an appropriate Replica Catalog to determine the locations of the
           input files for the various jobs. By default, a file based replica catalog is used.

           The Pegasus Workflow Planner also tries to reduce the workflow, unless specified otherwise. This is
           done by deleting the jobs whose output files have been found in some location in the Replica Catalog.
           At present no cost metrics are used. However, preference is given to a location corresponding to the
           execution site.

           The planner can also add nodes to transfer all the materialized files to an output site. The location
           on the output site is determined by looking up the site catalog file, the path to which is picked up
           from the pegasus.catalog.site.file property value.

       executables
           The planner looks up a Transformation Catalog to discover locations of the executables referred to in
           the executable workflow. Users can specify INSTALLED or STAGEABLE executables in the catalog.
           Stageable executables can be used by Pegasus to stage executables to resources where they are not
           pre-installed.

       resources
           The layout of the sites where Pegasus can schedule the jobs of a workflow is described in the Site
           Catalog. The planner looks up the site catalog to determine, for each site, which directories a job
           can be executed in, which servers to use for staging data in and out, and which jobmanagers (if
           applicable) can be used for submitting jobs.

       The data and executable locations can also be specified in DAXes conforming to DAX schema version 3.2 or
       higher.

OPTIONS

       Each option is shown together with its long option synonym(s).

       -Dproperty=value
           The -D option allows an experienced user to override certain properties which influence the program
           execution, among them the default location of the user’s properties file and the PEGASUS home
           location. One may set several CLI properties by giving this option multiple times. The -D option(s)
           must be the first option on the command line. A CLI property takes precedence over the properties file
           property of the same key.
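
           The ordering rule above can be sketched with a small helper that assembles the argument vector so
           that every -D option comes directly after the command name. This is only an illustration: the
           helper build_plan_argv, the file names workflow.dax and sites.xml, and the choice of options are
           assumptions made for the example, not part of Pegasus.

```python
from typing import Dict, List, Optional


def build_plan_argv(
    dax: str,
    properties: Optional[Dict[str, str]] = None,
    sites: Optional[List[str]] = None,
    output_site: Optional[str] = None,
    submit_dir: Optional[str] = None,
) -> List[str]:
    """Assemble a pegasus-plan argument vector (hypothetical helper)."""
    argv = ["pegasus-plan"]
    # -D options are emitted first, as the manual requires.
    for key, value in sorted((properties or {}).items()):
        argv.append(f"-D{key}={value}")
    if submit_dir:
        argv += ["--dir", submit_dir]
    if sites:
        argv += ["--sites", ",".join(sites)]
    if output_site:
        argv += ["--output-site", output_site]
    # --dax is the one mandatory option.
    argv += ["--dax", dax]
    return argv


# Hypothetical file names, used only to show the resulting order.
argv = build_plan_argv(
    "workflow.dax",
    properties={"pegasus.catalog.site.file": "sites.xml"},
    sites=["local"],
    output_site="local",
)
print(" ".join(argv))
```

           The resulting list can be handed to subprocess.run on a host where pegasus-plan is installed.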

       -d file, --dax file
           The DAX is the XML input file that describes an abstract workflow. This option is mandatory.

       -b prefix, --basename prefix
           The basename prefix to be used while constructing per workflow files like the dagman file (.dag file)
           and other workflow specific files that are created by Condor. Usually this prefix is taken from the
           name attribute specified in the root element of the DAX file.

       -c file[,file,...], --cache file[,file,...]
           A comma separated list of paths to replica cache files that override the results from the replica
           catalog for a particular LFN.

           Each entry in the cache file describes an LFN, the corresponding PFN and the associated attributes.
           The pool attribute should be specified for each entry.

               LFN_1 PFN_1 pool=[site handle 1]
               LFN_2 PFN_2 pool=[site handle 2]
                ...
               LFN_N PFN_N pool=[site handle N]

           To treat the cache files as supplemental replica catalogs, set the property
           pegasus.catalog.replica.cache.asrc to true. The mappings in the cache files are then merged with the
           mappings in the replica catalog. Thus, for a particular LFN, both the entries in the cache file and
           those in the replica catalog are available for replica selection.
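
           The two behaviours described above (override versus merge) can be sketched as follows. This is a
           rough model, not the planner's code: the names Replica, parse_cache and candidates are invented
           for the example, the URLs are placeholders, and skipping lines that start with # is an
           assumption about the file format.

```python
from typing import Dict, List, NamedTuple


class Replica(NamedTuple):
    pfn: str
    attributes: Dict[str, str]


def parse_cache(text: str) -> Dict[str, List[Replica]]:
    """Parse 'LFN PFN key=value ...' lines into an LFN -> replicas map."""
    catalog: Dict[str, List[Replica]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and '#' comment lines are ignored.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            raise ValueError(f"expected 'LFN PFN [key=value ...]': {raw!r}")
        lfn, pfn, *attrs = parts
        attributes = dict(a.split("=", 1) for a in attrs if "=" in a)
        catalog.setdefault(lfn, []).append(Replica(pfn, attributes))
    return catalog


def candidates(cache, replica_catalog, cache_as_rc=False):
    """Return the replicas available for selection, per LFN."""
    result = {}
    for lfn in set(cache) | set(replica_catalog):
        if cache_as_rc:
            # Supplemental mode: cache and catalog entries are merged.
            result[lfn] = cache.get(lfn, []) + replica_catalog.get(lfn, [])
        else:
            # Default mode: a cache entry overrides the catalog for that LFN.
            result[lfn] = cache.get(lfn) or replica_catalog.get(lfn, [])
    return result


cache = parse_cache("""
f.a file:///scratch/run1/f.a pool=local
f.b gsiftp://example.org/data/f.b pool=cluster
""")
replica_catalog = {
    "f.a": [Replica("gsiftp://example.org/data/f.a", {"pool": "cluster"})],
}
print(candidates(cache, replica_catalog)["f.a"])
print(candidates(cache, replica_catalog, cache_as_rc=True)["f.a"])
```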

       -C style[,style,...], --cluster style[,style,...]
           Comma-separated list of clustering styles to apply to the workflow. This mode of operation results in
           clustering of n compute jobs into larger jobs to reduce remote scheduling overhead. You can specify
           a list of clustering techniques to recursively apply them to the workflow. For example, this allows
           you to cluster some jobs in the workflow using horizontal clustering and then use label based
           clustering on the intermediate workflow to do vertical clustering.

           The clustered jobs can be run at the remote site, either sequentially or by using MPI. This can be
           specified by setting the property pegasus.job.aggregator. The property can be overridden by
           associating the PEGASUS profile key collapser either with the transformation in the transformation
           catalog or the execution site in the site catalog. The value specified (to the property or the
           profile) is the logical name of the transformation that is to be used for clustering jobs. Note that
           clustering will only happen if the corresponding transformations are catalogued in the transformation
           catalog.

           PEGASUS ships with a clustering executable pegasus-cluster that can be found in the $PEGASUS_HOME/bin
           directory. It runs the jobs in the clustered job sequentially on the same node at the remote site.

           In addition, an MPI based clustering tool called pegasus-mpi-cluster is also distributed and can be
           found in the bin directory. pegasus-mpi-cluster can also be used in the sharedfs setup and needs to
           be compiled against the MPI installation of the remote site. The wrapper is run on every MPI node,
           with the first one being the master and the rest being workers.

           By default, pegasus-cluster is used for clustering jobs unless overridden in the properties or by the
           pegasus profile key collapser.

           The following clustering styles are currently supported:

           •   horizontal is the style of clustering in which jobs on the same level are aggregated into larger
               jobs. The level of a node is defined as its greatest distance from the root of the workflow.
               Clustering occurs only on jobs of the same type, i.e. jobs that refer to the same logical
               transformation in the transformation catalog.

               Horizontal clustering can operate in one of two modes.

                1. Job count based.

                   The granularity of clustering can be specified by associating either the PEGASUS profile key
                   clusters.size or the PEGASUS profile key clusters.num with the transformation.

                   The clusters.size key indicates how many jobs need to be clustered into the larger clustered
                   job. The clusters.num key indicates how many clustered jobs are to be created for a
                   particular level at a particular execution site. If both keys are specified for a particular
                   transformation, then the clusters.num key value is used to determine the clustering
                   granularity.

                2. Runtime based.

                   To cluster jobs according to runtimes, the user needs to set one property and two profile
                   keys. The property pegasus.clusterer.preference must be set to the value runtime. In
                   addition, the user needs to specify two Pegasus profiles:

                   a. clusters.maxruntime, which specifies the maximum duration for which the clustered job
                      should run.

                   b. job.runtime, which specifies the duration for which the job associated with the profile
                      key runs.

                   Ideally, clusters.maxruntime should be set in the transformation catalog and job.runtime
                   should be set for each job individually.

           •   label is the style of clustering in which you can label the jobs in your workflow. The jobs with
               the same label are put in the same clustered job. This allows you to aggregate jobs across
               levels, or in a manner that is best suited to your application.

               To label the workflow, you need to associate PEGASUS profiles with the jobs in the DAX. The
               profile key to use for labeling the workflow can be set by the property
               pegasus.clusterer.label.key. It defaults to label, meaning that jobs carrying the same value for
               the PEGASUS profile key label go into the same clustered job.
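
           The job count based mode and the label style can be sketched as simple grouping functions. This
           is a minimal sketch under stated assumptions: the function names are invented, jobs are
           represented by plain strings that already share one level, one transformation and one site, and
           the way clusters.num spreads jobs over clusters (near-equal sizes, in input order) is an
           assumption rather than documented planner behaviour.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def cluster_by_size(jobs: Sequence[str], size: int) -> List[List[str]]:
    """clusters.size: put at most `size` jobs into each clustered job."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(jobs[i:i + size]) for i in range(0, len(jobs), size)]


def cluster_by_num(jobs: Sequence[str], num: int) -> List[List[str]]:
    """clusters.num: create at most `num` clustered jobs of near-equal size."""
    if num < 1:
        raise ValueError("num must be at least 1")
    if not jobs:
        return []
    num = min(num, len(jobs))
    base, extra = divmod(len(jobs), num)
    clusters, start = [], 0
    for index in range(num):
        # The first `extra` clusters take one additional job each.
        end = start + base + (1 if index < extra else 0)
        clusters.append(list(jobs[start:end]))
        start = end
    return clusters


def horizontal_clusters(
    jobs: Sequence[str],
    size: Optional[int] = None,
    num: Optional[int] = None,
) -> List[List[str]]:
    # clusters.num takes precedence when both keys are present.
    if num is not None:
        return cluster_by_num(jobs, num)
    if size is not None:
        return cluster_by_size(jobs, size)
    return [[job] for job in jobs]


def label_clusters(labels: Dict[str, str]) -> Dict[str, List[str]]:
    """Label style: jobs sharing a label value form one clustered job."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for job, label in labels.items():
        groups[label].append(job)
    return dict(groups)


jobs = ["j1", "j2", "j3", "j4", "j5"]
print(horizontal_clusters(jobs, size=2))
print(horizontal_clusters(jobs, size=2, num=2))
print(label_clusters({"j1": "a", "j2": "b", "j3": "a"}))
```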

       --cleanup cleanup strategy
           The cleanup strategy to be used for workflows. Pegasus can add cleanup jobs to the executable
           workflow that can remove files and directories during the workflow execution.

           The following cleanup strategies are currently supported:

           •   none disables cleanup altogether. The planner does not add any cleanup jobs in the executable
               workflow whatsoever.

           •   leaf makes the planner add one leaf cleanup node per staging site that removes the directory
               created by the create dir job in the workflow.

           •   inplace makes the planner add, in addition to the leaf cleanup nodes, cleanup nodes per level of
               the workflow that remove files no longer required during execution. For example, an added
               cleanup node will remove input files for a particular compute job after the job has finished
               successfully.

       --conf propfile
           The path to the properties file that contains the properties the planner needs to use while planning
           the workflow.

       --dir dir
           The base directory where you want the output of the Pegasus Workflow Planner, usually Condor submit
           files, to be generated. Pegasus creates a directory structure in this base directory on the basis of
           username, VO Group and the label of the workflow in the DAX.

           By default the base directory is the directory from which one runs the pegasus-plan command.

       -f, --force
           This bypasses the reduction phase in which the abstract DAG is reduced, on the basis of the locations
           of the output files returned by the replica catalog. This is analogous to a make style generation of
           the executable workflow.

       --force-replan
           By default, for hierarchical workflows, if a DAX job fails, then on job retry the rescue DAG of the
           associated workflow is submitted. This option causes Pegasus to replan the DAX job in case of failure
           instead.

       -g, --group
           The VO Group to which the user belongs.

       -h, --help
           Displays all the options to the pegasus-plan command.

       --inherited-rc-files file[,file,...]
           A comma separated list of paths to replica files. Locations mentioned in these have a lower priority
           than the locations in the DAX file. This option is usually used internally for hierarchical
           workflows, where the file locations mentioned in the parent (encompassing) workflow DAX are passed to
           the sub workflows corresponding to the DAX jobs.

       -I, --input-dir
           A path to the input directory where the input files reside. This internally loads a Directory based
           Replica Catalog backend that does a directory listing to create the LFN→PFN mappings for the files
           in the input directory. You can specify additional properties, either on the command line or in the
           properties file, to control the site attribute and URL prefix associated with the mappings.

           pegasus.catalog.replica.directory.site specifies the pool attribute to associate with the mappings.
           Defaults to local.

           pegasus.catalog.replica.directory.url.prefix specifies the URL prefix to use while constructing the
           PFN. Defaults to file://.
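
           The mapping step can be pictured with the following sketch, which lists a directory and builds
           LFN to (PFN, site) pairs using the two defaults named above. It is not the backend's actual
           implementation: the function name directory_replicas is invented, and treating the file name as
           the LFN and listing only the top level of the directory are assumptions.

```python
import os
import tempfile
from typing import Dict, Tuple


def directory_replicas(
    input_dir: str,
    site: str = "local",
    url_prefix: str = "file://",
) -> Dict[str, Tuple[str, str]]:
    """Map each file name (taken as the LFN) to a (PFN, site) pair."""
    base = os.path.abspath(input_dir)
    mappings: Dict[str, Tuple[str, str]] = {}
    for name in sorted(os.listdir(base)):
        path = os.path.join(base, name)
        # Assumption: only regular files at the top level are mapped.
        if os.path.isfile(path):
            mappings[name] = (url_prefix + path, site)
    return mappings


# Demonstration on a throwaway directory holding a single input file.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "f.a"), "w") as handle:
        handle.write("data\n")
    demo = directory_replicas(tmp)
    print(demo)
```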

       -j prefix, --job-prefix prefix
           The job prefix to be applied for constructing the filenames for the job submit files.

       -n, --nocleanup
           This option is deprecated. Use --cleanup none instead.

       -o site, --output-site site
           The output site to which the output files of the DAX are transferred.

           By default the materialized data remains in the working directory on the execution site where it was
           created. Only those output files for which the transfer attribute is set to true in the DAX are
           transferred to the output site.

       -O output directory, --output-dir output directory
           The output directory to which the output files of the DAX are transferred.

           If -o is specified the storage directory of the site specified as the output site is updated to be
           the directory passed. If no output site is specified, then this option internally sets the output
           site to local with the storage directory updated to the directory passed.

       -q, --quiet
           Decreases the logging level.

       -r[dirname], --randomdir[=dirname]
           The Pegasus Workflow Planner adds create directory jobs to the executable workflow that create a
           directory in which all jobs for that workflow execute on a particular site. The directory created is
           in the working directory (specified in the site catalog with each site).

           By default, Pegasus duplicates the relative directory structure on the submit host on the remote
           site. The user can specify this option without arguments to create a random timestamp based name for
           the execution directory that is created by the create dir jobs. The user can specify the optional
           argument to this option to set the basename of the directory that is to be created.

           The create dir jobs refer to the dirmanager executable that is shipped as part of the PEGASUS worker
           package. The transformation catalog is searched for the transformation named pegasus::dirmanager for
           all the remote sites where the workflow has been scheduled. Pegasus can create a default path for the
           dirmanager executable, if PEGASUS_HOME environment variable is associated with the sites in the site
           catalog as an environment profile.

       --relative-dir dir
           The directory relative to the base directory where the executable workflow is to be generated and
           executed. This overrides the default directory structure that Pegasus creates based on username, VO
           Group and the DAX label.

       --relative-submit-dir dir
           The directory relative to the base directory where the executable workflow is to be generated. This
           overrides the default directory structure that Pegasus creates based on username, VO Group and the
           DAX label. By specifying both --relative-dir and --relative-submit-dir you can have a relative
           execution directory on the remote site that differs from the relative submit directory on the submit
           host.

       -s site[,site,...], --sites site[,site,...]
           A comma separated list of execution sites on which the workflow is to be executed. Each of the sites
           should have an entry in the site catalog that is being used. To run on the submit host, specify the
           execution site as local.

           In case this option is not specified, all the sites in the site catalog are picked up as candidates
           for running the workflow.

       --staging-site s1=ss1[,s2=ss2[..]]
           A comma separated list of key=value pairs, where the key is the execution site and the value is the
           staging site for that execution site.

           When running on a shared filesystem, the planner automatically sets the staging site to be the
           execution site. If only a value is specified, then that is taken to be the staging site for all
           the execution sites. For example, --staging-site local means that the planner will use the local
           site as the staging site for all jobs in the workflow.
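
           The accepted forms of the argument can be sketched with a small parser. This is only an
           illustration: the function names and the site names cluster_a and cluster_b are made up, and the
           lookup order (explicit pair, then the bare value, then the execution site itself as in the
           shared filesystem case) is an assumption about precedence.

```python
from typing import Dict, Optional, Tuple


def parse_staging_sites(value: str) -> Tuple[Dict[str, str], Optional[str]]:
    """Split 's1=ss1,s2=ss2' or a bare 'ss' into (mapping, default)."""
    mapping: Dict[str, str] = {}
    default: Optional[str] = None
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        if "=" in item:
            exec_site, staging_site = item.split("=", 1)
            mapping[exec_site] = staging_site
        else:
            # A bare value applies to every execution site.
            default = item
    return mapping, default


def staging_site_for(
    exec_site: str,
    mapping: Dict[str, str],
    default: Optional[str] = None,
) -> str:
    # Assumed order: explicit pair, bare value, then the execution site.
    return mapping.get(exec_site, default or exec_site)


pairs, fallback = parse_staging_sites("cluster_a=local,cluster_b=cluster_b")
print(staging_site_for("cluster_a", pairs, fallback))

pairs_all, fallback_all = parse_staging_sites("local")
print(staging_site_for("cluster_b", pairs_all, fallback_all))
```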

       -S, --submit
           Submits the generated executable workflow using the pegasus-run script in the $PEGASUS_HOME/bin
           directory. By default, the Pegasus Workflow Planner only generates the Condor submit files and does
           not submit them.

       -v, --verbose
           Increases the verbosity of messages about what is going on. By default, all FATAL, ERROR, CONSOLE and
           WARN messages are logged. The logging hierarchy is as follows:

            1. FATAL

            2. ERROR

            3. CONSOLE

            4. WARN

            5. INFO

            6. CONFIG

            7. DEBUG

            8. TRACE

           For example, to see the INFO, CONFIG and DEBUG messages additionally, set -vvv.
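
           The relation between the number of -v flags and the levels that are logged can be written down
           as a short function. It is a sketch of the rule stated above, with invented names; the effect of
           -q is left out because the manual does not quantify it.

```python
from typing import List

# Logging hierarchy, in the order given in the manual.
LEVELS = ["FATAL", "ERROR", "CONSOLE", "WARN", "INFO", "CONFIG", "DEBUG", "TRACE"]
# FATAL, ERROR, CONSOLE and WARN are logged by default.
DEFAULT_COUNT = 4


def enabled_levels(verbose_count: int = 0) -> List[str]:
    """Return the levels logged when -v is given `verbose_count` times."""
    count = min(len(LEVELS), DEFAULT_COUNT + max(0, verbose_count))
    return LEVELS[:count]


print(enabled_levels(0))
print(enabled_levels(3))
```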

       -V, --version
           Displays the current version number of the Pegasus Workflow Management System.

RETURN VALUE

       If the Pegasus Workflow Planner is able to generate an executable workflow successfully, the exit code
       will be 0. All runtime errors result in an exit code of 1; this is usually the case when you have
       misconfigured your catalogs. If an error occurs while loading a specific module implementation at run
       time, the exit code will be 2; this is usually due to factory methods failing while loading a module.
       Any other error occurring while the command runs also results in an exit code of 1. In most cases, the
       error message logged should give a clear indication as to where things went wrong.
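
       A wrapper script might translate these exit codes into messages along the following lines. The
       function name and the message texts are made up for the example; only the three codes come from
       the description above.

```python
def describe_exit_code(code: int) -> str:
    """Translate a pegasus-plan exit code into a short explanation."""
    if code == 0:
        return "executable workflow generated"
    if code == 1:
        return "runtime error, for example misconfigured catalogs"
    if code == 2:
        return "error while loading a module implementation"
    # Codes other than 0, 1 and 2 are not described in the manual.
    return "undocumented exit code"


for code in (0, 1, 2):
    print(code, describe_exit_code(code))
```

       The code itself would typically come from the returncode attribute of the object returned by
       subprocess.run.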

CONTROLLING PEGASUS-PLAN MEMORY CONSUMPTION

       pegasus-plan will try to determine memory limits automatically using factors such as total system memory
       and potential memory limits (ulimits). The automatic limits can be overridden by setting the JAVA_HEAPMIN
       and JAVA_HEAPMAX environment variables before invoking pegasus-plan. The values are in megabytes. As a
       rule of thumb, JAVA_HEAPMIN can be set to half of the value of JAVA_HEAPMAX.
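
       The rule of thumb can be applied when preparing the environment for the planner, as in this
       sketch. The helper name heap_environment and the 4096 MB figure are illustrative assumptions.

```python
import os
from typing import Dict, Optional


def heap_environment(
    heap_max_mb: int,
    base_env: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return an environment with JAVA_HEAPMAX and JAVA_HEAPMIN set (in MB)."""
    if heap_max_mb < 2:
        raise ValueError("heap_max_mb must be at least 2")
    env = dict(os.environ if base_env is None else base_env)
    env["JAVA_HEAPMAX"] = str(heap_max_mb)
    # Rule of thumb from the manual: the minimum is half of the maximum.
    env["JAVA_HEAPMIN"] = str(heap_max_mb // 2)
    return env


env = heap_environment(4096, base_env={})
print(env)
```

       The returned dictionary can be passed as the env argument of subprocess.run when invoking
       pegasus-plan.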

PEGASUS PROPERTIES

       This is not an exhaustive list of properties used. For the complete description and list of properties
       refer to $PEGASUS_HOME/doc/advanced-properties.pdf

       pegasus.selector.site
           Identifies what type of site selector you want to use. If not specified, the default value of Random
           is used. Other supported modes are RoundRobin and NonJavaCallout, which calls out to an external
           site selector.

       pegasus.catalog.replica
           Specifies the type of replica catalog to be used.

           If not specified, then the value defaults to RLS.

       pegasus.catalog.replica.url
           Contact string to access the replica catalog. In case of RLS it is the RLI url.

       pegasus.dir.exec
           A suffix to the workdir in the site catalog to determine the current working directory. If relative,
           the value will be appended to the working directory from the site.config file. If absolute it
           constitutes the working directory.

       pegasus.catalog.transformation
           Specifies the type of transformation catalog to be used. One can use either a file based or a
           database based transformation catalog. At present the default is Text.

       pegasus.catalog.transformation.file
           The location of the file to use as the transformation catalog.

           If not specified, then the default location of $PEGASUS_HOME/var/tc.data is used.

       pegasus.catalog.site
           Specifies the type of site catalog to be used. One can use either a text based or an xml based site
           catalog. At present the default is XML3.

       pegasus.catalog.site.file
           The location of the file to use as the site catalog. If not specified, then the default value of
           $PEGASUS_HOME/etc/sites.xml is used in case of the xml based site catalog and
           $PEGASUS_HOME/etc/sites.txt in case of the text based site catalog.

       pegasus.data.configuration
           This property sets up Pegasus to run in different environments. It can be set to one of the
           following values:

           sharedfs
               Pegasus will be set up to execute jobs on the shared filesystem on the execution site. This
               assumes that the head node of a cluster and the worker nodes share a filesystem. The staging
               site in this case is the same as the execution site.

           nonsharedfs
               Pegasus will be set up to execute jobs on an execution site without relying on a shared
               filesystem between the head node and the worker nodes.

           condorio
               Pegasus will be set up to run jobs in a pure Condor pool, with the nodes not sharing a
               filesystem. Data is staged to the compute nodes from the submit host using Condor File IO.

       pegasus.code.generator
           The code generator to use. By default, Condor submit files are generated for the executable workflow.
           Setting to Shell results in Pegasus generating a shell script that can be executed on the submit
           host.
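
       A properties file combining several of the keys above could be produced as in the following
       sketch. The helper render_properties and the chosen values (including the relative path
       sites.xml) are assumptions for illustration; the output uses the key = value form of Java
       properties files.

```python
from typing import Dict


def render_properties(props: Dict[str, str]) -> str:
    """Render properties as 'key = value' lines, sorted by key."""
    return "".join(f"{key} = {value}\n" for key, value in sorted(props.items()))


# Illustrative values only; the property names are taken from this manual.
text = render_properties({
    "pegasus.data.configuration": "sharedfs",
    "pegasus.selector.site": "RoundRobin",
    "pegasus.catalog.site.file": "sites.xml",
})
print(text, end="")
```

       The resulting text can be written to a file whose path is then passed with --conf.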

FILES

       $PEGASUS_HOME/etc/dax-3.3.xsd
           is the suggested location of the latest DAX schema to produce DAX output.

       $PEGASUS_HOME/etc/sc-4.0.xsd
           is the suggested location of the latest Site Catalog schema that is used to create the XML version of
           the site catalog.

       $PEGASUS_HOME/etc/tc.data.text
           is the suggested location for the file corresponding to the Transformation Catalog.

       $PEGASUS_HOME/etc/sites.xml4 | $PEGASUS_HOME/etc/sites.xml3
           is the suggested location for the file containing the site information.

       $PEGASUS_HOME/lib/pegasus.jar
           contains all compiled Java bytecode to run the Pegasus Workflow Planner.

SEE ALSO

       pegasus-run(1), pegasus-status(1), pegasus-remove(1), pegasus-rc-client(1), pegasus-analyzer(1)

AUTHORS

       Karan Vahi <vahi at isi dot edu>

       Pegasus Team http://pegasus.isi.edu