Provided by: gridengine-common_6.2u5-4_all

NAME

       sge_pe - Sun Grid Engine parallel environment configuration file format

DESCRIPTION

       Parallel  environments  are parallel programming and runtime environments allowing for the
       execution of shared memory  or  distributed  memory  parallelized  applications.  Parallel
       environments usually require some kind of setup to be operational before starting parallel
       applications.  Examples for  common  parallel  environments  are  shared  memory  parallel
       operating  systems  and the distributed memory environments Parallel Virtual Machine (PVM)
       or Message Passing Interface (MPI).

       sge_pe allows for the definition of interfaces to arbitrary parallel environments. Once a
       parallel environment is defined or modified with the -ap or -mp options to qconf(1) and
       linked with one or more queues via pe_list in queue_conf(5), the environment can be
       requested for a job via the -pe switch to qsub(1), together with a range for the number
       of parallel processes to be allocated to the job. Additional -l options may be used to
       specify the job requirements in further detail.

       Note that Sun Grid Engine allows a backslash (\) to be used to escape a newline
       character. The backslash and the newline are replaced with a single space (" ") character
       before any interpretation.
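
       The continuation rule above can be illustrated with a minimal Python sketch. It assumes
       that each logical line holds an attribute name followed by its value, separated by
       whitespace; that layout, the helper names and the sample values are illustrative
       assumptions, not part of Sun Grid Engine.

```python
def join_continuations(text: str) -> str:
    """Replace every backslash-newline pair with a single space."""
    return text.replace("\\\n", " ")


def parse_pe_config(text: str) -> dict:
    """Split each logical line into an attribute name and its value.

    Assumption: one "name value" pair per logical line.
    """
    config = {}
    for line in join_continuations(text).splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue  # skip empty lines
        name = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        config[name] = value
    return config


# Hypothetical configuration text; attribute names follow the FORMAT section.
sample = (
    "pe_name          mpi\n"
    "slots            64\n"
    "user_lists       deptA,\\\n"
    "deptB\n"
    "allocation_rule  $round_robin\n"
)

config = parse_pe_config(sample)
```

       In this sketch the escaped newline inside the user_lists value becomes a space, so the
       two physical lines are read as one attribute.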

FORMAT

       The format of a sge_pe file is defined as follows:

   pe_name
       The name of the parallel environment as defined for pe_name in sge_types(1). This name is
       used with the qsub(1) -pe switch.

   slots
       The total number of parallel processes allowed to run concurrently under the parallel
       environment. The type is number; valid values are 0 to 9999999.

   user_lists
       A comma separated list of user access list names (see access_list(5)). Each user
       contained in at least one of the listed access lists has access to the parallel
       environment. If the user_lists parameter is set to NONE (the default), any user who is
       not explicitly excluded via the xuser_lists parameter described below has access. If a
       user is contained both in an access list listed in xuser_lists and in one listed in
       user_lists, the user is denied access to the parallel environment.

   xuser_lists
       The xuser_lists parameter contains a comma separated list of user access lists as
       described in access_list(5). Each user contained in at least one of the listed access
       lists is not allowed to access the parallel environment. If the xuser_lists parameter is
       set to NONE (the default), no user is excluded by this parameter. If a user is contained
       both in an access list listed in xuser_lists and in one listed in user_lists, the user is
       denied access to the parallel environment.
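
       The interplay of user_lists and xuser_lists can be sketched as follows. This is an
       illustration of the rules stated above, not the Sun Grid Engine implementation; the
       function name, the use of an empty list for NONE and the dictionary of access lists are
       assumptions made for the example.

```python
def has_pe_access(user, user_lists, xuser_lists, access_lists):
    """Return True if `user` may use the parallel environment.

    user_lists, xuser_lists: lists of access list names (empty list = NONE).
    access_lists: hypothetical mapping of access list name -> set of users.
    """
    def listed(names):
        return any(user in access_lists.get(name, set()) for name in names)

    if listed(xuser_lists):
        return False  # exclusion wins, even if the user is also in user_lists
    if not user_lists:
        return True   # user_lists is NONE: everyone not excluded has access
    return listed(user_lists)


# Hypothetical access lists for demonstration.
acls = {"dev": {"alice", "bob"}, "banned": {"bob"}}
```

       With these sample lists, a user who appears in both "dev" and "banned" is denied access,
       matching the precedence rule described for both parameters.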

   start_proc_args
       The invocation command line of a start-up procedure for the parallel environment. The
       start-up procedure is invoked by sge_shepherd(8) prior to executing the job script. Its
       purpose is to set up the parallel environment according to its needs. An optional
       prefix "user@" specifies the user under which this procedure is to be started. The
       standard output of the start-up procedure is redirected to the file REQNAME.poJID in the
       job's working directory (see qsub(1)), with REQNAME being the name of the job as displayed
       by qstat(1) and JID being the job's identification number. Likewise, the standard error
       output is redirected to REQNAME.peJID.

       The following special variables, which are expanded at runtime, can be used (besides any
       other strings which have to be interpreted by the start and stop procedures) to
       constitute a command line:

       $pe_hostfile
              The pathname of a file containing a detailed description of the layout of the
              parallel environment to be set up by the start-up procedure. Each line of the file
              refers to a host on which parallel processes are to be run. The first entry of each
              line  denotes the hostname, the second entry the number of parallel processes to be
              run on the host, the third entry the name of the queue,  and  the  fourth  entry  a
              processor range to be used in case of a multiprocessor machine.

       $host  The name of the host on which the start-up or stop procedures are started.

       $job_owner
              The user name of the job owner.

       $job_id
              Sun Grid Engine's unique job identification number.

       $job_name
              The name of the job.

       $pe    The name of the parallel environment in use.

       $pe_slots
              Number of slots granted for the job.

       $processors
              The  processors  string as contained in the queue configuration (see queue_conf(5))
              of the master queue (the queue in  which  the  start-up  and  stop  procedures  are
              started).

       $queue The cluster queue of the master queue instance.
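
       A start-up procedure typically needs to read the file named by $pe_hostfile. The
       following Python sketch parses the four entries per line described above. It assumes the
       entries are separated by whitespace; the type name, function name and sample content are
       hypothetical.

```python
from typing import List, NamedTuple


class HostEntry(NamedTuple):
    host: str        # first entry: hostname
    slots: int       # second entry: number of parallel processes on the host
    queue: str       # third entry: name of the queue
    processors: str  # fourth entry: processor range, kept as raw text


def parse_pe_hostfile(text: str) -> List[HostEntry]:
    """Parse the content of a pe_hostfile into a list of entries."""
    entries = []
    for number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 4:
            raise ValueError(f"line {number}: expected 4 entries, got {len(fields)}")
        host, slots, queue, processors = fields
        entries.append(HostEntry(host, int(slots), queue, processors))
    return entries


# Hypothetical file content for demonstration.
sample_hostfile = "node01 4 all.q 0-3\nnode02 2 all.q 0-1\n"
entries = parse_pe_hostfile(sample_hostfile)
total_slots = sum(entry.slots for entry in entries)
```

       The sum of the second column gives the number of processes the start-up procedure has to
       prepare for across all hosts.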

   stop_proc_args
       The  invocation  command  line  of  a shutdown procedure for the parallel environment. The
       shutdown procedure is invoked by sge_shepherd(8) after the job script  has  finished.  Its
       purpose  is  to  stop  the  parallel  environment  and to remove it from all participating
       systems.  An optional prefix "user@" specifies the user under which this procedure  is  to
       be started. The standard output of the stop procedure is also redirected to the file
       REQNAME.poJID in the job's working directory (see qsub(1)), with REQNAME being the name of
       the job as displayed by qstat(1) and JID being the job's identification number. Likewise,
       the standard error output is redirected to REQNAME.peJID.

       The same special variables as for start_proc_args can be used to constitute a command
       line.
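
       To show what runtime expansion of the special variables means for a start_proc_args or
       stop_proc_args command line, the sketch below uses Python's string.Template. It only
       mimics the substitution for illustration; the script path and the values are made up,
       and Sun Grid Engine's own expansion may differ in detail.

```python
from string import Template


def expand_proc_args(command_line: str, values: dict) -> str:
    """Substitute $name variables; unknown names are left untouched."""
    return Template(command_line).safe_substitute(values)


# Hypothetical values as they might look for one job.
values = {
    "pe_hostfile": "/tmp/pe_hostfile",
    "host": "node01",
    "job_owner": "alice",
    "job_id": "4711",
    "job_name": "simulation",
    "pe": "mpi",
    "pe_slots": "8",
}

# Hypothetical stop procedure command line.
expanded = expand_proc_args("/opt/pe/stop.sh $pe_hostfile $job_id $pe $pe_slots", values)
```

       Note that $pe and $pe_slots are distinct variables; the template engine used here matches
       the longest identifier, so $pe_slots is not misread as $pe followed by "_slots".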

   allocation_rule
       The allocation rule is interpreted by the scheduler thread and helps the scheduler
       decide how to distribute parallel processes among the available machines. If, for
       instance, a parallel environment is built for shared memory applications only, all
       parallel processes have to be assigned to a single machine, no matter how many suitable
       machines are available. If, however, the parallel environment follows the distributed
       memory paradigm, an even distribution of processes among machines may be favorable.
       The current version of the scheduler only understands the following allocation rules:

       <int>:    An integer number fixing the number of processes per host. If the number is 1,
                 all processes have to reside on different hosts. If the special value
                 $pe_slots is used instead, the full range of processes as specified with the
                 qsub(1) -pe switch has to be allocated on a single host (no matter which value
                 belonging to the range is finally chosen for the job to be allocated).

       $fill_up: Starting from the best suitable host/queue, all available slots  are  allocated.
                 Further  hosts  and queues are "filled up" as long as a job still requires slots
                 for parallel tasks.

       $round_robin:
                 From all suitable hosts a single slot is allocated until all tasks requested  by
                 the parallel job are dispatched. If more tasks are requested than suitable hosts
                 are found, allocation starts again from the first host.  The  allocation  scheme
                 walks through suitable hosts in a best-suitable-first order.
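
       The difference between the rules is easiest to see on a small example. The following
       Python sketch is a simplified model, not the scheduler's code: hosts are given as
       (name, free slots) pairs already sorted in best-suitable-first order, and all function
       names are hypothetical.

```python
def single_host(hosts, needed):
    """$pe_slots: place all processes on the first host with enough free slots."""
    for name, free in hosts:
        if free >= needed:
            return {name: needed}
    raise ValueError("no single host offers enough free slots")


def fill_up(hosts, needed):
    """$fill_up: take all free slots of each host in turn until satisfied."""
    allocation = {}
    for name, free in hosts:
        if needed == 0:
            break
        take = min(free, needed)
        if take:
            allocation[name] = take
            needed -= take
    if needed:
        raise ValueError("not enough free slots")
    return allocation


def round_robin(hosts, needed):
    """$round_robin: one slot per host per pass, restarting at the first host."""
    remaining = dict(hosts)
    allocation = {name: 0 for name, _ in hosts}
    while needed:
        progressed = False
        for name, _ in hosts:
            if needed == 0:
                break
            if remaining[name] > 0:
                remaining[name] -= 1
                allocation[name] += 1
                needed -= 1
                progressed = True
        if not progressed:
            raise ValueError("not enough free slots")
    return {name: count for name, count in allocation.items() if count}


# Hypothetical hosts, best suitable first.
hosts = [("a", 4), ("b", 4), ("c", 2)]
```

       For five requested processes on these sample hosts, the fill-up model packs host "a"
       completely before touching "b", while the round-robin model spreads the processes over
       all three hosts.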

   control_slaves
       This  parameter  can  be set to TRUE or FALSE (the default). It indicates whether Sun Grid
       Engine is the creator of the slave tasks of a parallel application  via  sge_execd(8)  and
       sge_shepherd(8)  and  thus  has full control over all processes in a parallel application,
       which enables capabilities such as resource limitation and correct accounting. However, to
       gain  control over the slave tasks of a parallel application, a sophisticated PE interface
       is required, which works closely  together  with  Sun  Grid  Engine  facilities.  Such  PE
       interfaces are available through your local Sun Grid Engine support office.

       Please set the control_slaves parameter to FALSE for all other PE interfaces.

   job_is_first_task
       The  job_is_first_task  parameter  can  be set to TRUE or FALSE. A value of TRUE indicates
       that the Sun Grid Engine job script already contains one of  the  tasks  of  the  parallel
       application  (the  number  of  slots reserved for the job is the number of slots requested
       with the -pe switch), while a value of FALSE indicates that the job script (and its  child
       processes)  is  not part of the parallel program (the number of slots reserved for the job
       is the number of slots requested with the -pe switch + 1).

       If wallclock accounting is used (execd_params ACCT_RESERVED_USAGE and/or
       SHARETREE_RESERVED_USAGE set to TRUE) and control_slaves is set to FALSE, the
       job_is_first_task parameter influences the accounting for the job. A value of TRUE means
       that accounting for cpu and requested memory gets multiplied by the number of slots
       requested with the -pe switch. If job_is_first_task is set to FALSE, the accounting
       information gets multiplied by the number of slots + 1.
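
       The slot arithmetic described for job_is_first_task can be written down in a few lines.
       This Python sketch only restates the rule above; the function names are hypothetical and
       the usage value is an arbitrary number chosen for the example.

```python
def slot_multiplier(requested_slots: int, job_is_first_task: bool) -> int:
    """Slots counted for the job: the request itself, or the request + 1."""
    if requested_slots < 1:
        raise ValueError("requested_slots must be at least 1")
    return requested_slots if job_is_first_task else requested_slots + 1


def scaled_usage(usage: float, requested_slots: int, job_is_first_task: bool) -> float:
    """Scale a usage value as done for wallclock accounting with control_slaves FALSE."""
    return usage * slot_multiplier(requested_slots, job_is_first_task)
```

       For a request of 4 slots, a usage value of 10.0 would thus be accounted as 40.0 with
       job_is_first_task set to TRUE and as 50.0 with it set to FALSE.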

   urgency_slots
       For pending jobs with a slot range PE request, the number of slots is not yet determined.
       This setting specifies the method to be used by Sun Grid Engine to assess the number of
       slots such jobs might finally get.

       The assumed slot allocation is relevant when determining the resource-request-based
       priority contribution for numeric resources as described in sge_priority(5), and it is
       displayed when qstat(1) is run without the -g t option.

       The following methods are supported:

       <int>:    The specified integer number is directly used as prospective slot amount.

       min:      The slot range minimum is used as prospective slot amount. If no lower bound is
                 specified with the range, 1 is assumed.

       max:      The slot range maximum is used as prospective slot amount. If no upper bound
                 is specified with the range, the absolute maximum possible due to the PE's
                 slots setting is assumed.

       avg:      The average of all numbers occurring  within  the  job's  PE  range  request  is
                 assumed.
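
       The four methods can be compared with a small Python sketch. It models the request as a
       single contiguous range with optional bounds, and it assumes that the defaults for a
       missing lower or upper bound also apply when computing the average; both points are
       simplifying assumptions, and the function name is hypothetical.

```python
def assumed_slots(method, lower, upper, pe_slots):
    """Prospective slot amount for a pending job with a slot range request.

    method:   "min", "max", "avg" or a string holding an integer.
    lower:    lower bound of the range, or None if not specified.
    upper:    upper bound of the range, or None if not specified.
    pe_slots: the slots setting of the parallel environment.
    """
    low = 1 if lower is None else lower
    high = pe_slots if upper is None else upper
    if low > high:
        raise ValueError("empty slot range")
    if method == "min":
        return low
    if method == "max":
        return high
    if method == "avg":
        numbers = range(low, high + 1)
        return sum(numbers) / len(numbers)
    return int(method)  # <int>: the number is used directly
```

       For a range of 4 to 8 slots in a PE with 64 slots, this model yields 4 for min, 8 for
       max and 6 for avg, while a fixed setting such as 16 is returned unchanged.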

   accounting_summary
       This  parameter  is only checked if control_slaves (see above) is set to TRUE and thus Sun
       Grid Engine is the creator of the slave tasks of a parallel application  via  sge_execd(8)
       and  sge_shepherd(8).   In this case, accounting information is available for every single
       slave task started by Sun Grid Engine.

       The accounting_summary parameter can be set to TRUE or FALSE. A value  of  TRUE  indicates
       that  only a single accounting record is written to the accounting(5) file, containing the
       accounting summary of the whole job including all slave tasks,  while  a  value  of  FALSE
       indicates  an  individual accounting(5) record is written for every slave task, as well as
       for the master task.
       Note: When running tightly integrated jobs with SHARETREE_RESERVED_USAGE set and
       accounting_summary enabled in the parallel environment, reserved usage will only be
       reported by the master task of the parallel job. No per parallel task usage records will
       be sent from execd to qmaster, which can significantly reduce load on qmaster when running
       large tightly integrated parallel jobs.

RESTRICTIONS

       Note that the functionality of the start-up, shutdown and signaling procedures remains
       the full responsibility of the administrator configuring the parallel environment. Sun
       Grid Engine will just invoke these procedures and evaluate their exit status. If the
       procedures do not perform their tasks properly, or if the parallel environment or the
       parallel application behaves unexpectedly, Sun Grid Engine has no means to detect this.

SEE ALSO

       sge_intro(1), sge_types(1), qconf(1), qdel(1), qmod(1), qsub(1), access_list(5),
       sge_qmaster(8), sge_shepherd(8).

COPYRIGHT

       See sge_intro(1) for a full statement of rights and permissions.