Provided by: gridengine-common_8.1.9+dfsg-10build1_all


NAME
       sge_pe - Grid Engine parallel environment configuration file format


DESCRIPTION
       Parallel  environments  are  parallel  programming and runtime environments supporting the
       execution of shared memory  or  distributed  memory  parallelized  applications.  Parallel
       environments usually require some kind of setup to be operational before starting parallel
       applications.  Examples of common  parallel  environments  are  OpenMP  on  shared  memory
       multiprocessor   systems,  and  Message  Passing  Interface  (MPI)  on  shared  memory  or
       distributed systems.

       sge_pe allows for the definition of interfaces to arbitrary parallel environments.  Once a
       parallel environment is defined or modified with the -ap or -mp options to qconf(1), and
       linked with one or more queues via pe_list in queue_conf(5), the environment can be
       requested for a job via the -pe switch to qsub(1), together with a request for a numeric
       range of parallel processes to be allocated to the job.  Additional -l options may be used
       to specify more detailed job requirements.
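
       For illustration, once a parallel environment has been defined and attached to a queue, a
       job could request it as follows; the PE name "mpi", the job script, and the memory request
       are hypothetical, not part of any default installation:

```shell
# Request between 4 and 8 slots in the (hypothetical) PE "mpi",
# with an additional per-slot memory request via -l:
qsub -pe mpi 4-8 -l h_vmem=1G myjob.sh
```

       The scheduler then allocates some number of slots within the requested range, subject to
       availability.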

       Note that Grid Engine allows backslashes (\) to be used to escape newline characters.  The
       backslash and the newline are replaced with a space character before any interpretation.


FORMAT
       The format of an sge_pe file is defined as follows:

    pe_name
       The name of the parallel environment, in the format for pe_name in sge_types(1), to be
       used in the qsub(1) -pe switch.

    slots
       The  total  number  of  slots  (normally one per parallel process or thread) allowed to be
       filled concurrently under the parallel environment.  Type is integer, valid values  are  0
       to 9999999.

    user_lists
       A comma-separated list of user access list names (see access_list(5)).

       Each  user  contained  in  at  least  one of the user_lists access lists has access to the
       parallel environment. If the user_lists parameter is set to NONE (the  default)  any  user
       has access if not explicitly excluded via the xuser_lists parameter.

    xuser_lists
       Each  user  contained  in  at  least one of the xuser_lists access lists is not allowed to
       access the parallel environment. If the xuser_lists parameter is set to NONE (the default)
       any user has access.

       If  a  user  is contained both in an access list in xuser_lists and user_lists the user is
       denied access to the parallel environment.

    start_proc_args and stop_proc_args
       The command line of a startup or shutdown procedure, respectively (an executable command
       plus possible arguments), for the parallel environment, or "none" for no procedure
       (typically for tightly integrated PEs).  The command line is started directly, not in a
       shell.  An optional prefix "user@" specifies the user name under which the procedure is to
       be started; in that case see the SECURITY section below concerning the security issues of
       running as a privileged user.

       The startup procedure is invoked by sge_shepherd(8) on the master node of the job prior to
       executing the job script.  Its purpose is to set up the parallel environment according to
       its needs.  The shutdown procedure is invoked by sge_shepherd(8) after the job script has
       finished.  Its purpose is to stop the parallel environment and to remove it from all
       participating systems.  The standard output of the procedure is redirected to the file
       REQUEST.poJID in the job's working directory (see qsub(1)), with REQUEST being the name of
       the job as displayed by qstat(1), and JID being the job's identification number.
       Likewise, the standard error output is redirected to REQUEST.peJID.  If the -e or -o
       options are given on job submission, the PE error and standard output are merged into the
       paths specified.

       The following special variables, expanded at runtime,  can  be  used  (besides  any  other
       strings  which  have  to  be interpreted by the start and stop procedures) to constitute a
       command line:

       $pe_hostfile
              The pathname of a file containing a detailed  description  of  the  layout  of  the
              parallel  environment  to be setup by the start-up procedure. Each line of the file
              refers to a host on which parallel processes are to be run. The first entry of each
              line  denotes the hostname, the second entry the number of parallel processes to be
              run on the host, the third entry the name of the queue.  The entries are  separated
              by spaces.  If -binding pe is specified on job submission, the fourth column is the
              core binding specification as colon-separated socket-core  pairs,  like  "0,0:0,1",
              meaning  the first core on the first socket and the second core on the first socket
              can be used for binding.  Otherwise it will  be  "UNDEFINED".   With  the  obsolete
              queue  processors  specification  the  fourth  entry  could  be  a  multi-processor
              configuration (or "<NULL>").

       $host  The name of the host on which the startup or stop procedures are run.

       $ja_task_id
              The array job task index (0 if not an array job).

       $job_owner
              The user name of the job owner.

       $job_id
              Grid Engine's unique job identification number.

       $job_name
              The name of the job.

       $pe    The name of the parallel environment in use.

       $pe_slots
              Number of slots granted for the job.

       $processors
              The processors string as contained in the queue configuration  (see  queue_conf(5))
              of the master queue (the queue in which the startup and stop procedures are run).

       $queue The cluster queue of the master queue instance.

       $sge_cell
              The SGE_CELL environment variable (useful for locating files).

       $sge_root
              The SGE_ROOT environment variable (useful for locating files).

       $stdin_path
              The standard input path.

       $stderr_path
              The standard error path.

       $stdout_path
              The standard output path.
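
       The $pe_hostfile layout described above can be processed with standard tools.  For
       example, a startup procedure might expand it into a flat machine list with one entry per
       slot, as many MPI launchers expect; the file contents below are illustrative, not from a
       real cluster:

```shell
# Sketch: expand a pe_hostfile (hostname, slots, queue, binding)
# into one hostname per granted slot.
hostfile=$(mktemp)
cat > "$hostfile" <<'EOF'
node01 4 all.q@node01 UNDEFINED
node02 2 all.q@node02 UNDEFINED
EOF
# Print column 1 (hostname) once per slot granted in column 2.
awk '{ for (i = 0; i < $2; i++) print $1 }' "$hostfile"
rm -f "$hostfile"
```

       In a real PE, the path of this file is passed to the startup procedure via $pe_hostfile
       rather than created by hand.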

       The  start  and stop commands are run with the same environment setting as that of the job
       to be started afterwards (see qsub(1)).

    allocation_rule
       The allocation rule is interpreted by the scheduler thread  and  helps  the  scheduler  to
       decide  how  to  distribute  parallel  processes  among  the  available  machines. If, for
       instance, a parallel environment  is  built  for  shared  memory  applications  only,  all
       parallel  processes  have  to be assigned to a single machine, no matter how many suitable
       machines are available.  If, however, the parallel  environment  follows  the  distributed
       memory paradigm, an even distribution of processes among machines may be favorable, as may
       packing processes onto the minimum number of machines.

       The current version of the scheduler only understands the following allocation rules:

       int    An integer, fixing the number of processes per host. If it is 1, all processes have
              to reside on different hosts. If the special name $pe_slots is used, the full range
              of processes as specified with the qsub(1) -pe switch has  to  be  allocated  on  a
              single  host (no matter what value belonging to the range is finally chosen for the
              job to be allocated).

       $fill_up
              Starting from the best suitable host/queue,  all  available  slots  are  allocated.
              Further  hosts and queues are "filled up" as long as a job still requires slots for
              parallel tasks.

       $round_robin
              From all suitable hosts, a single slot is allocated until all  tasks  requested  by
              the  parallel  job  are dispatched. If more tasks are requested than suitable hosts
              are found, allocation starts again from the  first  host.   The  allocation  scheme
              walks through suitable hosts in a most-suitable-first order.
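
       As a worked example, consider a hypothetical cluster of three hosts with 4 free slots
       each, and a job submitted with "-pe mpi 6".  The rules could then lead to the following
       placements:

```
allocation_rule 2             -> 2 slots on each of three hosts
allocation_rule $pe_slots     -> all 6 slots on one host; with only 4 free
                                 slots per host, the job stays pending
allocation_rule $fill_up      -> 4 slots on the best host, 2 on the next
allocation_rule $round_robin  -> one slot per host per pass: 2 on each host
```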

    control_slaves
       This parameter can be set to TRUE or FALSE (the default). It indicates whether Grid Engine
       is the creator of  the  slave  tasks  of  a  parallel  application  via  sge_execd(8)  and
       sge_shepherd(8)  and  thus  has  full control over all processes in a parallel application
       ("tight integration").  This enables:

       •      resource limits are enforced for all tasks, even on slave hosts;

       •      resource consumption is properly accounted on all hosts;

       •      proper control of tasks, with no need to write a  customized  terminate  method  to
              ensure that whole job is finished on qdel and that tasks are properly reaped in the
              case of abnormal job termination;

       •      all tasks are started with the appropriate  nice  value  which  was  configured  as
              priority in the queue configuration;

       •      propagation of the job environment to slave hosts, e.g. so that they write into the
              appropriate per-job temporary directory specified by TMPDIR, which  is  created  on
              each host and properly cleaned up.

       To  gain  control  over  the  slave  tasks  of  a parallel application, a sophisticated PE
       interface is required, which works closely together with Grid Engine facilities, typically
       interpreting  the  Grid  Engine  hostfile  and  starting remote tasks with qrsh(1) and its
       -inherit option.  See, for instance, the $SGE_ROOT/mpi directory and the howto pages.
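
       As a sketch of such an interface, a tightly integrated job script might start its slave
       tasks roughly as follows; the slave command path is a placeholder, and qrsh -inherit only
       works inside a job whose PE has control_slaves set to TRUE:

```shell
# Start one slave task per hostfile slot via qrsh -inherit, so that
# sge_execd/sge_shepherd create and control the remote processes.
while read host nslots queue binding; do
    for i in $(seq 1 "$nslots"); do
        qrsh -inherit -nostdin "$host" /path/to/slave_task &
    done
done < "$PE_HOSTFILE"
wait
```

       $PE_HOSTFILE is set in the job environment to the path of the hostfile described above.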

       Please set the control_slaves parameter to FALSE for all other PE interfaces.

    job_is_first_task
       The  job_is_first_task  parameter  can  be set to TRUE or FALSE. A value of TRUE indicates
       that the Grid Engine job script  already  contains  one  of  the  tasks  of  the  parallel
       application (and the number of slots reserved for the job is the number of slots requested
       with the -pe switch).  FALSE indicates that the job script (and its  child  processes)  is
       not  part of the parallel program, just being used to kick off the tasks that do the work;
       then the number of slots reserved for the job in the master queue is increased  by  1,  as
       indicated by qstat/qhost.

       This should be TRUE for the common modern MPI implementations with tight integration.
       For instance, if the allocation rule is $fill_up and a job is allocated only a single
       slot on the master host, then one of the MPI processes actually runs in that slot and
       should be accounted as such, so the job is the first task.

       If wallclock accounting is used (execd_params ACCT_RESERVED_USAGE and/or
       SHARETREE_RESERVED_USAGE is TRUE) and control_slaves is set to FALSE, the
       job_is_first_task parameter influences the accounting for the job: a value of TRUE means
       that accounting for CPU and requested memory gets multiplied by the number of slots
       requested with the -pe switch.  FALSE means the accounting information gets multiplied by
       the number of slots + 1.  Otherwise, the only significant effect of the parameter is on
       the display of the job.

    urgency_slots
       For pending jobs with a slot range PE request with different minimum and maximum, the
       number of slots they will actually use is not determined.  This setting specifies the
       method to be used by Grid Engine to assess the number of slots such jobs might finally
       get.

       The assumed slot allocation has a meaning when determining the resource-request-based
       priority contribution for numeric resources, as described in sge_priority(5), and is
       displayed when qstat(1) is run without the -g t option.

       The following methods are supported:

       int    The specified integer number is directly used as prospective slot amount.

       min    The slot range minimum is used as prospective slot amount. If  no  lower  bound  is
              specified with the range, 1 is assumed.

       max    The  slot  range  maximum is used as prospective slot amount.  If no upper bound is
              specified with the range, the absolute maximum  possible  due  to  the  PE's  slots
              setting is assumed.

       avg    The average of all numbers occurring within the job's PE range request is assumed.
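
       For a hypothetical request of "-pe mpi 2-8", the methods would yield the following
       prospective slot counts (assuming the PE's slots limit is at least 8):

```shell
lo=2; hi=8                       # slot range from "-pe mpi 2-8"
echo "min: $lo"                  # min -> 2
echo "max: $hi"                  # max -> 8
echo "avg: $(( (lo + hi) / 2 ))" # avg -> average of 2..8, i.e. 5
```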

    accounting_summary
       This  parameter is only checked if control_slaves (see above) is set to TRUE and thus Grid
       Engine is the creator of the slave tasks of a parallel application  via  sge_execd(8)  and
       sge_shepherd(8).  In this case, accounting information is available for every single slave
       task started by Grid Engine.

       The accounting_summary parameter can be set to TRUE or FALSE. A value  of  TRUE  indicates
       that  only a single accounting record is written to the accounting(5) file, containing the
       accounting summary of the whole job, including all slave tasks, while  a  value  of  FALSE
       indicates  an  individual accounting(5) record is written for every slave task, as well as
       for the master task.

       Note:  When  running  tightly  integrated  jobs  with  SHARETREE_RESERVED_USAGE  set,  and
       accounting_summary  enabled  in  the  parallel  environment,  reserved  usage will only be
       reported by the master task of the parallel job.  No per-parallel task usage records  will
       be  sent  from  execd  to qmaster, which can significantly reduce load on the qmaster when
       running large, tightly integrated parallel jobs.  However, this removes the only  post-hoc
       information about which hosts a job used.

   qsort_args library qsort-function [arg1 ...]
       Specifies  a  method  for  specifying  the  queues/hosts  and order that should be used to
       schedule  a  parallel  job.   For  details,  and  the  API,  consult   the   header   file
       $SGE_ROOT/include/sge_pqs_api.h.  library is the path to the qsort dynamic library, qsort-
       function is the name of the qsort function implemented by the library, and  the  args  are
       arguments  passed  to  qsort.  Substitutions from the hard requested resource list for the
       job are made for any strings of the form $resource, where resource is the full name of the
       resource  as  defined  in the complex(5) list.  If resource is not requested in the job, a
       null string is substituted.
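
       Putting the parameters together, a complete PE definition (in the form printed by
       qconf -sp) might look like the following; the PE name and all values are illustrative:

```
pe_name            mpi
slots              512
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE
qsort_args         NONE
```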


RESTRICTIONS
       Note  that  the  functionality  of  the  start  and  stop  procedures  remains  the   full
       responsibility  of  the  administrator  configuring the parallel environment.  Grid Engine
       will invoke these procedures and evaluate their exit status.  A non-zero exit status  will
       put the queue into an error state.  If the start procedure has a non-zero exit status, the
       job will be re-queued.


SECURITY
       If start_proc_args or stop_proc_args is specified with a user@ prefix, the same
       considerations apply as for the prolog and epilog, as described in the SECURITY section
       of sge_conf(5).


SEE ALSO
       sge_intro(1), sge_types(1), qconf(1), qdel(1), qmod(1), qrsh(1), qsub(1), access_list(5),
       sge_conf(5), sge_qmaster(8), sge_shepherd(8).




COPYRIGHT
       See sge_intro(1) for a full statement of rights and permissions.