Provided by: gridengine-common_6.2u5-7.4_all

NAME

       sge_pe - Sun Grid Engine parallel environment configuration file format

DESCRIPTION

       Parallel  environments  are  parallel  programming and runtime environments allowing for the execution of
       shared memory or distributed memory parallelized applications. Parallel environments usually require some
       kind of setup to be operational before starting parallel  applications.   Examples  for  common  parallel
       environments  are  shared  memory  parallel  operating  systems  and  the distributed memory environments
       Parallel Virtual Machine (PVM) or Message Passing Interface (MPI).

       sge_pe allows for the definition of interfaces to  arbitrary  parallel  environments.   Once  a  parallel
       environment  is  defined  or modified with the -ap or -mp options to qconf(1) and linked with one or more
       queues via pe_list in queue_conf(5) the environment can be requested for a job  via  the  -pe  switch  to
       qsub(1)  together  with  a request of a range for the number of parallel processes to be allocated by the
       job. Additional -l options may be used to specify the job requirements in further detail.

       Note that Sun Grid Engine allows backslashes (\) to be used to escape newline (\newline) characters. The
       backslash and the newline are replaced with a space (" ") character before any interpretation.
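
       The continuation rule above can be sketched in Python as follows. This is an illustration only: the
       helper names are hypothetical, and the one-attribute-per-line "name value" layout is an assumption
       based on the attribute list in FORMAT below, not a statement about Sun Grid Engine's own parser.

```python
def join_continuations(text: str) -> str:
    """Replace every backslash-newline pair with a single space."""
    return text.replace("\\\n", " ")


def parse_pe_config(text: str) -> dict:
    """Parse 'name value' lines into a dict (assumed file layout)."""
    config = {}
    for line in join_continuations(text).splitlines():
        parts = line.split(None, 1)  # attribute name, then the rest of the line
        if not parts:
            continue  # skip blank lines
        config[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return config


# Made-up sample: the user_lists value is continued on a second line.
sample = "pe_name mpi\nuser_lists devel,\\\nresearch\nslots 16\n"
print(parse_pe_config(sample))
```

       Because the backslash and newline become a space before any interpretation, the continued user_lists
       value in the sample is read as one logical line.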

FORMAT

       The format of a sge_pe file is defined as follows:

   pe_name
       The  name  of the parallel environment as defined for pe_name in sge_types(1).  To be used in the qsub(1)
       -pe switch.

   slots
       The number of  parallel  processes  being  allowed  to  run  in  total  under  the  parallel  environment
       concurrently.  Type is number, valid values are 0 to 9999999.

   user_lists
       A comma-separated list of user access list names (see access_list(5)). Each user contained in at least
       one of the listed access lists has access to the parallel environment. If the user_lists parameter is
       set to NONE (the default), any user who is not explicitly excluded via the xuser_lists parameter
       described below has access. If a user is contained both in an access list named in xuser_lists and in
       one named in user_lists, the user is denied access to the parallel environment.

   xuser_lists
       The xuser_lists parameter contains a comma-separated list of user access lists as described in
       access_list(5). Each user contained in at least one of the listed access lists is not allowed to
       access the parallel environment. If the xuser_lists parameter is set to NONE (the default), any user has
       access. If a user is contained both in an access list named in xuser_lists and in one named in
       user_lists, the user is denied access to the parallel environment.
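
       The combined effect of user_lists and xuser_lists can be summarized in a short Python sketch. The
       function name and the acl_members mapping (access list name to set of user names) are hypothetical
       stand-ins for whatever access_list(5) data is configured. An empty list represents NONE.

```python
def has_pe_access(user, user_lists, xuser_lists, acl_members):
    """Return True if `user` may use the parallel environment.

    user_lists / xuser_lists: lists of access list names ([] means NONE).
    acl_members: hypothetical dict mapping access list name -> set of users.
    """
    def in_any(list_names):
        return any(user in acl_members.get(name, set()) for name in list_names)

    if in_any(xuser_lists):
        return False  # exclusion wins, even if the user is also in user_lists
    if not user_lists:
        return True   # NONE: every user not excluded has access
    return in_any(user_lists)


acls = {"devel": {"alice", "bob"}, "banned": {"bob"}}
print(has_pe_access("alice", ["devel"], ["banned"], acls))
print(has_pe_access("bob", ["devel"], ["banned"], acls))
```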

   start_proc_args
       The invocation command line of a start-up procedure for the parallel environment. The start-up  procedure
       is invoked by sge_shepherd(8) prior to executing the job script. Its purpose is to set up the parallel
       environment according to its needs.  An optional prefix "user@" specifies the user under which this
       procedure is to be started.  The standard output of the start-up procedure  is  redirected  to  the  file
       REQNAME.poJID  in  the  job's  working directory (see qsub(1)), with REQNAME being the name of the job as
       displayed by qstat(1) and JID being the job's identification number.  Likewise, the standard error output
       is redirected to REQNAME.peJID.
       The following special variables being expanded at runtime can be used (besides any  other  strings  which
       have to be interpreted by the start and stop procedures) to constitute a command line:

       $pe_hostfile
              The pathname of a file containing a detailed description of the layout of the parallel environment
              to be set up by the start-up procedure. Each line of the file refers to a host on which parallel
              processes are to be run. The first entry of each line denotes the hostname, the second  entry  the
              number of parallel processes to be run on the host, the third entry the name of the queue, and the
              fourth entry a processor range to be used in case of a multiprocessor machine.

       $host  The name of the host on which the start-up or stop procedures are started.

       $job_owner
              The user name of the job owner.

       $job_id
              Sun Grid Engine's unique job identification number.

       $job_name
              The name of the job.

       $pe    The name of the parallel environment in use.

       $pe_slots
              Number of slots granted for the job.

       $processors
              The  processors  string  as contained in the queue configuration (see queue_conf(5)) of the master
              queue (the queue in which the start-up and stop procedures are started).

       $queue The cluster queue of the master queue instance.
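
       A start-up procedure typically reads the file named by $pe_hostfile. The following Python sketch parses
       the four entries per line described above. The type and function names are hypothetical, and the
       sample host, queue and processor range values are made up for illustration.

```python
from typing import List, NamedTuple


class HostEntry(NamedTuple):
    hostname: str
    slots: int             # number of parallel processes on this host
    queue: str
    processor_range: str   # kept as text; its syntax is not interpreted here


def parse_pe_hostfile(text: str) -> List[HostEntry]:
    """Parse the contents of a $pe_hostfile into HostEntry records."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or incomplete lines
        host, slots, queue, procs = fields[:4]
        entries.append(HostEntry(host, int(slots), queue, procs))
    return entries


# Made-up sample contents.
sample = "node01 4 all.q@node01 0-3\nnode02 2 all.q@node02 0-1\n"
entries = parse_pe_hostfile(sample)
print([e.hostname for e in entries], sum(e.slots for e in entries))
```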

   stop_proc_args
       The invocation command line of a shutdown procedure for the parallel environment. The shutdown  procedure
       is  invoked  by  sge_shepherd(8)  after  the job script has finished. Its purpose is to stop the parallel
       environment and to remove it from all participating systems.  An optional prefix  "user@"  specifies  the
       user  under  which  this  procedure  is to be started.  The standard output of the stop procedure is also
       redirected to the file REQNAME.poJID in the job's working directory (see qsub(1)), with REQNAME being the
       name of the job as displayed by qstat(1) and JID being the job's identification  number.   Likewise,  the
       standard error output is redirected to REQNAME.peJID.
       The same special variables as for start_proc_args can be used to constitute a command line.
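
       The output file names and the optional "user@" prefix described for start_proc_args and stop_proc_args
       can be illustrated with a small Python sketch. Both helper names are hypothetical. The prefix
       detection is a simple heuristic assumed for this example (an "@" in the first word, with no "/" before
       it) and is not taken from the Sun Grid Engine sources.

```python
def pe_output_files(job_name: str, job_id: int):
    """Return the (stdout, stderr) file names REQNAME.poJID and REQNAME.peJID."""
    return f"{job_name}.po{job_id}", f"{job_name}.pe{job_id}"


def split_user_prefix(command_line: str):
    """Split an optional leading 'user@' off a procedure command line.

    Returns (user, command); user is None when no prefix is present.
    """
    stripped = command_line.lstrip()
    first_word = stripped.split(None, 1)[0] if stripped else ""
    user, sep, _ = first_word.partition("@")
    if sep and user and "/" not in user:
        return user, stripped[len(user) + 1:]
    return None, command_line


# Made-up job name, job number and script path.
print(pe_output_files("myjob", 4711))
print(split_user_prefix("root@/opt/pe/start.sh $pe_hostfile"))
```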

   allocation_rule
       The  allocation  rule  is  interpreted  by  the scheduler thread and helps the scheduler to decide how to
       distribute parallel processes among the available machines. If, for instance, a parallel  environment  is
       built  for  shared  memory  applications  only,  all  parallel  processes have to be assigned to a single
       machine, no matter how many suitable machines are available.  If, however, the parallel environment
       follows  the  distributed  memory  paradigm,  an  even  distribution  of  processes among machines may be
       favorable.
       The current version of the scheduler only understands the following allocation rules:

       <int>:    An integer number fixing the number of processes per host. If the number is  1,  all  processes
                 have to reside on different hosts. If the special value $pe_slots is used, the full range
                 of  processes as specified with the qsub(1) -pe switch has to be allocated on a single host (no
                 matter which value belonging to the range is finally chosen for the job to be allocated).

       $fill_up: Starting from the best suitable host/queue, all available slots are  allocated.  Further  hosts
                 and queues are "filled up" as long as a job still requires slots for parallel tasks.

       $round_robin:
                 From  all  suitable  hosts a single slot is allocated until all tasks requested by the parallel
                 job are dispatched. If more tasks are requested  than  suitable  hosts  are  found,  allocation
                 starts  again  from  the  first  host.  The allocation scheme walks through suitable hosts in a
                 best-suitable-first order.
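
       The allocation rules can be compared with a simplified Python model. This is a sketch of the behaviour
       described above, not the scheduler's implementation. It assumes the hosts in free_slots are already
       ordered best-suitable-first, and that a fixed <int> rule requires the request to be a multiple of that
       number. The function name and the host names are hypothetical.

```python
from typing import Dict


def allocate(rule: str, requested: int, free_slots: Dict[str, int]) -> Dict[str, int]:
    """Distribute `requested` slots over hosts (given best-suitable-first)."""
    alloc: Dict[str, int] = {}
    remaining = requested
    if rule == "$pe_slots":            # whole request on one host
        for host, free in free_slots.items():
            if free >= requested:
                return {host: requested}
        raise ValueError("no single host can hold the request")
    if rule == "$fill_up":             # exhaust each host before moving on
        for host, free in free_slots.items():
            take = min(free, remaining)
            if take:
                alloc[host] = take
                remaining -= take
            if remaining == 0:
                return alloc
        raise ValueError("not enough free slots")
    if rule == "$round_robin":         # one slot per host per pass
        left = dict(free_slots)
        while remaining:
            progressed = False
            for host in left:
                if remaining and left[host] > 0:
                    left[host] -= 1
                    alloc[host] = alloc.get(host, 0) + 1
                    remaining -= 1
                    progressed = True
            if not progressed:
                raise ValueError("not enough free slots")
        return alloc
    per_host = int(rule)               # fixed number of processes per host
    if per_host < 1 or requested % per_host:
        raise ValueError("request does not fit the fixed per-host count")
    for host, free in free_slots.items():
        if free >= per_host:
            alloc[host] = per_host
            remaining -= per_host
            if remaining == 0:
                return alloc
    raise ValueError("not enough suitable hosts")


free = {"a": 4, "b": 2, "c": 2}
for rule in ("$fill_up", "$round_robin"):
    print(rule, allocate(rule, 5, free))
```

       With the same five-slot request, $fill_up concentrates processes on the first host while $round_robin
       spreads them across all three.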

   control_slaves
       This parameter can be set to TRUE or FALSE (the default). It indicates whether Sun  Grid  Engine  is  the
       creator  of  the  slave tasks of a parallel application via sge_execd(8) and sge_shepherd(8) and thus has
       full control over all processes in a parallel application, which enables capabilities  such  as  resource
       limitation  and  correct  accounting.  However,  to  gain  control  over  the  slave  tasks of a parallel
       application, a sophisticated PE interface is required, which works closely together with Sun Grid  Engine
       facilities. Such PE interfaces are available through your local Sun Grid Engine support office.

       Please set the control_slaves parameter to FALSE for all other PE interfaces.

   job_is_first_task
       The  job_is_first_task parameter can be set to TRUE or FALSE. A value of TRUE indicates that the Sun Grid
       Engine job script already contains one of the tasks of the parallel  application  (the  number  of  slots
       reserved  for  the  job  is  the  number  of slots requested with the -pe switch), while a value of FALSE
       indicates that the job script (and its child processes) is not part of the parallel program  (the  number
       of slots reserved for the job is the number of slots requested with the -pe switch + 1).

       If  wallclock accounting is used (execd_params ACCT_RESERVED_USAGE and/or SHARETREE_RESERVED_USAGE set to
       TRUE) and control_slaves is set to FALSE, the job_is_first_task parameter influences the  accounting  for
       the job: A value of TRUE means that accounting for cpu and requested memory gets multiplied by the number
       of slots requested with the -pe switch; if job_is_first_task is set to FALSE, the accounting information
       gets multiplied by the number of slots + 1.
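
       The slot arithmetic for job_is_first_task can be written out as a short Python sketch. The function
       names are hypothetical, and reserved_usage stands for ACCT_RESERVED_USAGE and/or
       SHARETREE_RESERVED_USAGE being set to TRUE. The sketch returns None for cases this section does not
       describe.

```python
from typing import Optional


def reserved_slots(requested: int, job_is_first_task: bool) -> int:
    """Slots reserved for the job, given the -pe request."""
    return requested if job_is_first_task else requested + 1


def accounting_factor(requested: int, job_is_first_task: bool,
                      control_slaves: bool, reserved_usage: bool) -> Optional[int]:
    """Multiplier for cpu and requested memory, or None if the rule does not apply."""
    if reserved_usage and not control_slaves:
        return reserved_slots(requested, job_is_first_task)
    return None  # case not covered by the description above


print(reserved_slots(8, True), reserved_slots(8, False))
```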

   urgency_slots
       For pending jobs with a slot range PE request, the number of slots is not yet determined. This setting
       specifies the method used by Sun Grid Engine to estimate the number of slots such jobs might finally
       get.

       The assumed slot allocation is relevant when determining the resource-request-based priority
       contribution for numeric resources as described in sge_priority(5) and is displayed when qstat(1) is run
       without the -g t option.

       The following methods are supported:

       <int>:    The specified integer number is directly used as prospective slot amount.

       min:      The slot range minimum is used as prospective slot amount. If no lower bound is specified with
                 the range, 1 is assumed.

       max:      The slot range maximum is used as prospective slot amount. If no upper bound is
                 specified with the range, the absolute maximum possible due to the PE's slots setting is
                 assumed.

       avg:      The average of all numbers occurring within the job's PE range request is assumed.
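
       The four methods can be sketched in Python as follows. The function name and the representation of a
       range request as (low, high) tuples, with None for a missing bound, are hypothetical. Applying the min
       and max defaults for missing bounds to avg as well is an assumption of this sketch.

```python
def urgency_slot_amount(setting, ranges, pe_slots):
    """Prospective slot amount for a pending job with a slot range request.

    setting:  "min", "max", "avg" or an integer given as a string.
    ranges:   list of (low, high) tuples; None marks a missing bound.
    pe_slots: the slots value of the parallel environment.
    """
    # Missing lower bound -> 1, missing upper bound -> the PE's slots setting.
    bounded = [(1 if lo is None else lo, pe_slots if hi is None else hi)
               for lo, hi in ranges]
    if setting == "min":
        return min(lo for lo, _ in bounded)
    if setting == "max":
        return max(hi for _, hi in bounded)
    if setting == "avg":
        numbers = [n for lo, hi in bounded for n in range(lo, hi + 1)]
        return sum(numbers) / len(numbers)
    return int(setting)  # a fixed integer is used directly


for method in ("min", "max", "avg", "4"):
    print(method, urgency_slot_amount(method, [(2, 8)], 16))
```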

   accounting_summary
       This  parameter  is only checked if control_slaves (see above) is set to TRUE and thus Sun Grid Engine is
       the creator of the slave tasks of a parallel application via sge_execd(8) and sge_shepherd(8).   In  this
       case, accounting information is available for every single slave task started by Sun Grid Engine.

       The  accounting_summary  parameter  can  be  set  to TRUE or FALSE. A value of TRUE indicates that only a
       single accounting record is written to the accounting(5) file, containing the accounting summary  of  the
       whole  job including all slave tasks, while a value of FALSE indicates an individual accounting(5) record
       is written for every slave task, as well as for the master task.
       Note: When running tightly integrated jobs with SHARETREE_RESERVED_USAGE set and with
       accounting_summary enabled in the parallel environment, reserved usage will only be reported by the
       master task of the parallel job.  No per parallel task usage records will be sent from execd to  qmaster,
       which can significantly reduce load on qmaster when running large tightly integrated parallel jobs.

RESTRICTIONS

       Note that the functionality of the start-up, shutdown and signaling procedures remains the full
       responsibility of the administrator configuring the parallel environment.   Sun  Grid  Engine  will  just
       invoke  these  procedures  and  evaluate  their exit status. If the procedures do not perform their tasks
       properly or if the parallel environment or the parallel application behave unexpectedly, Sun Grid  Engine
       has no means to detect this.

SEE ALSO

       sge_intro(1),   sge_types(1),  qconf(1),  qdel(1),  qmod(1),  qsub(1),  access_list(5),  sge_qmaster(8),
       sge_shepherd(8).

COPYRIGHT

       See sge_intro(1) for a full statement of rights and permissions.

SGE 6.2u5                                            $Date$                                            SGE_PE(5)