Provided by: coop-computing-tools_9.9-2ubuntu3_amd64


NAME
       work_queue_factory - maintain a pool of Work Queue workers on a batch system.


SYNOPSIS
       work_queue_factory -M <project-name> -T <batch-type> [options]


DESCRIPTION
       work_queue_factory  submits  and  maintains  a number of work_queue_worker(1) processes on
       various  batch  systems,  such  as  Condor  and  SGE.   All  the  workers  managed  by   a
       work_queue_factory  process will be directed to work for a specific manager, or any set of
       managers matching a given project name.  work_queue_factory will  automatically  determine
       the  correct number of workers to have running, based on criteria set on the command line.
       The decision on how many workers to run is reconsidered once per minute.

       By default, work_queue_factory will run as many workers as  the  indicated  managers  have
       tasks  ready  to run.  If there are multiple managers, then enough workers will be started
       to satisfy their collective needs.  For example, if there are two managers with  the  same
       project name, each with 10 tasks to run, then work_queue_factory will start a total of 20
       workers.

       If the number of needed workers increases, work_queue_factory will submit more workers  to
       meet  the  desired  need.   However,  it  will not run more than a fixed maximum number of
       workers, given by the -W option.

       If the need for workers drops, work_queue_factory does not remove  them  immediately,  but
       waits for them to exit on their own.  (This happens when the worker has been idle for a
       certain time.)  A minimum number of workers will be maintained, given by the -w option.
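
       For example (an illustrative sketch using the minimum and maximum worker options described
       in this page), to keep at least 10 workers submitted but never more than 50:

                work_queue_factory -T condor -M barney -w 10 -W 50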

       If given the -c option, then work_queue_factory will consider  the  capacity  reported  by
       each  manager.  The capacity is the estimated number of workers that the manager thinks it
       can handle, based on the task execution and data transfer times currently observed at  the
       manager.   With  the -c option on, work_queue_factory will consider the manager's capacity
       to be the maximum number of workers to run.
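
       For example (a sketch combining options described in this page), to let the factory follow
       the manager's reported capacity while never exceeding 200 workers:

                work_queue_factory -T condor -M barney -c -W 200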

       If work_queue_factory receives a terminating signal, it will attempt to remove all running
       workers before exiting.
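
       For example, sending SIGTERM to the factory (where <factory-pid> is a placeholder for its
       process id) causes it to remove its submitted workers before exiting:

                kill -TERM <factory-pid>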


OPTIONS
       General options:


       -T, --batch-type=<type>
              Batch system type (required). One of: local, wq, condor, sge, pbs, lsf, torque,
              moab, mpi, slurm, chirp, amazon, amazon-batch, lambda, mesos, k8s, dryrun


       -C, --config-file=<file>
              Use configuration file <file>.


       -M, --manager-name=<project>
              Project name of managers to serve; can be a regular expression.


       -F, --foremen-name=<project>
              Foremen to serve; can be a regular expression.


       --catalog=<host:port>
              Catalog server to query for managers.


       -P, --password=<file>
              Password file for workers to authenticate.


       -S, --scratch-dir=<dir>
              Use this scratch dir for the factory.



       --run-factory-as-manager
              Force the factory to run itself as a manager.


       --parent-death
              Exit if the parent process dies.


       -d, --debug=<subsystem>
              Enable debugging for this subsystem.


       -o, --debug-file=<file>
              Send debugging output to this file.


       -O, --debug-file-size=<mb>
              Specify the size of the debug file.


       -v, --version
              Show the version string.


       -h, --help
              Show this screen.

       Concurrent control options:


       -w, --min-workers=<n>
              Minimum workers running (default=5).


       -W, --max-workers=<n>
              Maximum workers running (default=100).


       --workers-per-cycle=<n>
              Maximum number of new workers per 30s (default=5).


       -t, --timeout=<time>
              Workers abort after this idle time (default=300).


       --factory-timeout=<n>
              Exit after no manager has been seen in <n> seconds.


       --tasks-per-worker=<n>
              Average tasks per worker (default=one per core).


       -c, --capacity
              Use the worker capacity reported by managers.

       Resource management options:


       --cores=<n>
              Set the number of cores requested per worker.


       --gpus=<n>
              Set the number of GPUs requested per worker.


       --memory=<mb>
              Set the amount of memory (in MB) per worker.


       --disk=<mb>
              Set the amount of disk (in MB) per worker.


       --autosize
              Autosize the worker to the slot (Condor, Mesos, K8S).

       Worker environment options:


       --env=<variable=value>
              Environment variable to add to the worker.


       -E, --extra-options=<options>
              Extra options to give to the worker.


       --worker-binary=<file>
              Alternate binary to use instead of work_queue_worker.


       --wrapper=<cmd>
              Wrap the factory with this command prefix.


       --wrapper-input=<file>
              Add this input file needed by the wrapper.


       --runos=<img>
              Use the runos tool to create the environment (ND only).


       --python-package=<file>
              Run each worker inside this python package.

       Options specific to batch systems:


       -B, --batch-options=<options>
              Generic batch system options.


       --amazon-config=<path>
              Specify the Amazon config file.


       --condor-requirements=<reqs>
              Set requirements for the workers as Condor jobs.


       --mesos-master=<hostname>
              Host name of the mesos manager node.


       --mesos-path=<path>
              Path to the mesos python library.


       --mesos-preload=<path>
              Libraries needed for running mesos.


       --k8s-image=<path>
              Container image for Kubernetes.


       --k8s-worker-image=<path>
              Container image with the worker for Kubernetes.


EXIT STATUS
       On success, returns zero. On failure, returns non-zero.


EXAMPLES
       Suppose you have a Work Queue manager with a project name of "barney".  To maintain
       workers for barney, do this:

               work_queue_factory -T condor -M barney

       To maintain a maximum of 100 workers on an SGE batch system, do this:

               work_queue_factory -T sge -M barney -W 100

       To start workers that exit after five minutes (300s) of idleness:

               work_queue_factory -T condor -M barney -t 300
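
       To request workers with specific resources (a sketch assuming the resource management
       options listed above):

                work_queue_factory -T slurm -M barney --cores 4 --memory 8000 --disk 16000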

       If you want to start workers that match any project that begins with barney, use a
       regular expression:

                work_queue_factory -T condor -M "barney.*" -t 300

       If running on condor, you may manually specify condor requirements:

               work_queue_factory -T condor -M barney --condor-requirements 'MachineGroup == "disc"' --condor-requirements 'has_matlab == true'

       Repeated uses of condor-requirements  are  and-ed  together.  The  previous  example  will
       produce a statement equivalent to:

       requirements = ((MachineGroup == "disc") && (has_matlab == true))

       Use the configuration file my_conf:

               work_queue_factory -Cmy_conf

       my_conf should be a proper JSON document, such as:

                {
                        "manager-name": "my_manager.*",
                        "max-workers": 100,
                        "min-workers": 0
                }

       Valid configuration fields are:
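
       As a sketch only (field names beyond those shown above are assumptions that mirror the
       factory's long option names and should be checked against your cctools version), a fuller
       configuration might look like:

                {
                        "manager-name": "my_manager.*",
                        "min-workers": 0,
                        "max-workers": 100,
                        "workers-per-cycle": 10,
                        "cores": 4,
                        "memory": 8000,
                        "disk": 16000,
                        "timeout": 900
                }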



KNOWN BUGS
       The capacity measurement currently assumes single-core tasks running on single-core
       workers, and behaves unexpectedly with multi-core tasks or multi-core workers.


COPYRIGHT
       The Cooperative Computing Tools are Copyright (C) 2005-2019 The University of Notre Dame.
       This  software  is distributed under the GNU General Public License.  See the file COPYING
       for details.


SEE ALSO
       The Cooperative Computing Tools Documentation
       The Work Queue User Manual
       work_queue_worker(1), work_queue_status(1), work_queue_factory(1),
       condor_submit_workers(1), sge_submit_workers(1), torque_submit_workers(1)