Provided by: gridengine-common_6.2u5-3ubuntu1_all

NAME

       sge_pe - Sun Grid Engine parallel environment configuration file format

DESCRIPTION

       Parallel environments are parallel programming and runtime environments
       allowing for the execution  of  shared  memory  or  distributed  memory
       parallelized  applications.  Parallel environments usually require some
       kind of setup to be operational before starting parallel  applications.
       Examples of common parallel environments are shared memory parallel
       operating systems and  the  distributed  memory  environments  Parallel
       Virtual Machine (PVM) or Message Passing Interface (MPI).

       sge_pe  allows  for  the definition of interfaces to arbitrary parallel
       environments.  Once a parallel environment is defined or modified  with
       the  -ap  or -mp options to qconf(1) and linked with one or more queues
       via pe_list in queue_conf(5) the environment can be requested for a job
       via  the  -pe  switch to qsub(1) together with a request of a range for
       the number of parallel processes to be allocated by the job. Additional
       -l options may be used to specify the job requirements in further
       detail.

       Note that Sun Grid Engine allows a backslash (\) to be used to escape a
       newline character. The backslash and the newline are replaced with
       a space (" ") character before any interpretation.
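       The continuation rule above can be illustrated with a minimal Python
       sketch. The function name join_continuations and the access list
       names deptA and deptB are hypothetical; only the replacement rule
       itself comes from this page.

```python
def join_continuations(text: str) -> str:
    """Replace every backslash-newline pair with a single space.

    This mirrors the rule stated above: the backslash and the newline
    are replaced with one space before any further interpretation.
    """
    return text.replace("\\\n", " ")


# Hypothetical configuration value split over two physical lines.
raw = "user_lists   deptA,\\\ndeptB\n"
print(join_continuations(raw), end="")
```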

FORMAT

       The format of a sge_pe file is defined as follows:

   pe_name
       The name  of  the  parallel  environment  as  defined  for  pe_name  in
       sge_types(1).  To be used in the qsub(1) -pe switch.

   slots
       The  number  of  parallel processes being allowed to run in total under
       the parallel environment concurrently.  Type is  number,  valid  values
       are 0 to 9999999.

   user_lists
       A comma separated list of user access list names (see access_list(5)).
       Each user contained in at least one of the listed access lists has
       access to the parallel environment. If the user_lists parameter is set
       to NONE (the default), any user who is not explicitly excluded via the
       xuser_lists parameter described below has access. If a user is
       contained both in an access list named in xuser_lists and in one named
       in user_lists, the user is denied access to the parallel environment.

   xuser_lists
       The xuser_lists parameter contains a comma separated list of user
       access lists as described in access_list(5). Each user contained in at
       least one of the listed access lists is not allowed to access the
       parallel environment. If the xuser_lists parameter is set to NONE
       (the default), no user is excluded by this parameter. If a user is
       contained both in an access list named in xuser_lists and in one named
       in user_lists, the user is denied access to the parallel environment.
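       The combined effect of user_lists and xuser_lists can be sketched as
       follows. This is an illustration of the rules stated on this page,
       not Sun Grid Engine's actual implementation; the function name
       has_pe_access and the list names are hypothetical, and an empty set
       stands for the value NONE.

```python
def has_pe_access(user_acls, user_lists, xuser_lists):
    """Decide whether a user may use the parallel environment.

    user_acls   -- set of access list names that contain the user
    user_lists  -- set of names from the PE's user_lists (empty = NONE)
    xuser_lists -- set of names from the PE's xuser_lists (empty = NONE)
    """
    # Exclusion wins: a user found in both kinds of list is denied.
    if user_acls & xuser_lists:
        return False
    # user_lists NONE: every user not explicitly excluded has access.
    if not user_lists:
        return True
    # Otherwise the user must be in at least one permitted list.
    return bool(user_acls & user_lists)


print(has_pe_access({"deptA"}, {"deptA"}, set()))
print(has_pe_access({"deptA", "guests"}, {"deptA"}, {"guests"}))
```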

   start_proc_args
       The  invocation  command  line of a start-up procedure for the parallel
       environment. The start-up procedure is invoked by sge_shepherd(8) prior
       to executing the job script. Its purpose is to set up the parallel
       environment according to its needs.  An optional  prefix  "user@"
       specifies  the  user  under which this procedure is to be started.  The
       standard output of the start-up procedure is  redirected  to  the  file
       REQNAME.poJID  in  the  job's  working  directory  (see  qsub(1)), with
       REQNAME being the name of the job as  displayed  by  qstat(1)  and  JID
       being  the  job's  identification number.  Likewise, the standard error
       output is redirected to REQNAME.peJID.
       The following special variables, which are expanded at runtime, can be
       used (besides any other strings which have to be interpreted by the
       start and stop procedures) to constitute a command line:

       $pe_hostfile
              The pathname of a file containing a detailed description of  the
              layout of the parallel environment to be set up by the start-up
              procedure. Each line of the file  refers  to  a  host  on  which
              parallel  processes  are to be run. The first entry of each line
              denotes the hostname, the second entry the  number  of  parallel
              processes to be run on the host, the third entry the name of the
              queue, and the fourth entry a processor range to be used in case
              of a multiprocessor machine.

       $host  The  name  of  the host on which the start-up or stop procedures
              are started.

       $job_owner
              The user name of the job owner.

       $job_id
              Sun Grid Engine's unique job identification number.

       $job_name
              The name of the job.

       $pe    The name of the parallel environment in use.

       $pe_slots
              Number of slots granted for the job.

       $processors
              The processors string as contained in  the  queue  configuration
              (see  queue_conf(5)) of the master queue (the queue in which the
              start-up and stop procedures are started).

       $queue The cluster queue of the master queue instance.
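       As an illustration of the variables listed above, the following Python
       sketch parses the four whitespace-separated entries of a $pe_hostfile
       line and expands the special variables in a command line. The helper
       names, the script path, the sample host and queue names, and the
       sample values are all hypothetical; the real expansion is performed
       by Sun Grid Engine itself.

```python
import re


def parse_pe_hostfile(text):
    """Parse $pe_hostfile content: host, process count, queue, processors."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        host, slots, queue, processors = line.split(None, 3)
        entries.append({
            "host": host,
            "slots": int(slots),
            "queue": queue,
            "processors": processors.strip(),
        })
    return entries


def expand_pe_args(command_line, values):
    """Replace $name with values[name]; unknown names are left untouched."""
    def substitute(match):
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)

    # Matching the whole identifier keeps $pe from clobbering $pe_slots.
    return re.sub(r"\$([A-Za-z_]+)", substitute, command_line)


# Hypothetical sample data for illustration only.
sample_hostfile = "node01 4 all.q@node01 UNDEFINED\nnode02 2 all.q@node02 UNDEFINED\n"
values = {"pe_hostfile": "/tmp/pe_hostfile", "pe": "mpi", "pe_slots": 6}

print(parse_pe_hostfile(sample_hostfile))
print(expand_pe_args("/path/to/start.sh $pe_hostfile $pe $pe_slots", values))
```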

   stop_proc_args
       The invocation command line of a shutdown procedure  for  the  parallel
       environment. The shutdown procedure is invoked by sge_shepherd(8) after
       the job script has finished.  Its  purpose  is  to  stop  the  parallel
       environment  and  to  remove  it  from  all  participating systems.  An
       optional prefix "user@" specifies the user under which  this  procedure
       is  to  be  started.  The standard output of the stop procedure is also
       redirected to the file REQNAME.poJID in  the  job's  working  directory
       (see  qsub(1)),  with REQNAME being the name of the job as displayed by
       qstat(1) and JID being the job's identification number.  Likewise,  the
       standard error output is redirected to REQNAME.peJID.
       The same special variables as for start_proc_args can be used to
       constitute a command line.

   allocation_rule
       The allocation rule is interpreted by the scheduler  thread  and  helps
       the  scheduler to decide how to distribute parallel processes among the
       available machines. If, for instance, a parallel environment  is  built
       for  shared memory applications only, all parallel processes have to be
       assigned to a single machine, no matter how many suitable machines are
       available. If, however, the parallel environment follows the
       distributed memory paradigm, an even distribution  of  processes  among
       machines may be favorable.
       The  current  version  of  the scheduler only understands the following
       allocation rules:

       <int>:    An integer number fixing the number of processes per host. If
                 the  number  is  1, all processes have to reside on different
                 hosts. If the special value $pe_slots is used instead, the full
                 range  of  processes as specified with the qsub(1) -pe switch
                 has to be allocated on a single host (no matter  which  value
                 belonging  to  the  range is finally chosen for the job to be
                 allocated).

       $fill_up: Starting from the best  suitable  host/queue,  all  available
                 slots are allocated. Further hosts and queues are "filled up"
                 as long as a job still requires slots for parallel tasks.

       $round_robin:
                 From all suitable hosts a single slot is allocated until  all
                 tasks  requested  by the parallel job are dispatched. If more
                 tasks are requested than suitable hosts are found, allocation
                 starts  again  from  the  first  host.  The allocation scheme
                 walks through suitable hosts in a best-suitable-first order.
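       The allocation rules above can be compared with a simplified Python
       sketch. It is not the scheduler's algorithm: it assumes that hosts
       are already sorted best-suitable-first, that each host has a known
       number of free slots, and that a fixed integer rule requires the slot
       total to be a multiple of that integer. The function name allocate
       and the host names are hypothetical.

```python
def allocate(rule, needed, hosts):
    """Distribute `needed` slots over `hosts` according to `rule`.

    hosts -- list of (hostname, free_slots), best suitable host first
    Returns a dict hostname -> granted slots, or None if not satisfiable.
    """
    free = dict(hosts)  # insertion order keeps the suitability order
    grant = {}
    if rule == "$pe_slots":
        # All processes must fit on one single host.
        for host, n in free.items():
            if n >= needed:
                return {host: needed}
        return None
    if rule == "$fill_up":
        remaining = needed
        for host, n in free.items():
            take = min(n, remaining)
            if take:
                grant[host] = take
                remaining -= take
            if remaining == 0:
                return grant
        return None
    if rule == "$round_robin":
        remaining = needed
        while remaining > 0:
            progressed = False
            for host in free:
                if remaining == 0:
                    break
                if free[host] > 0:
                    free[host] -= 1
                    grant[host] = grant.get(host, 0) + 1
                    remaining -= 1
                    progressed = True
            if not progressed:
                return None  # every host is exhausted
        return grant
    # Otherwise the rule is an integer: exactly that many slots per host.
    per_host = int(rule)
    if per_host <= 0 or needed % per_host:
        return None
    wanted = needed // per_host
    chosen = [h for h, n in free.items() if n >= per_host][:wanted]
    if len(chosen) < wanted:
        return None
    return {h: per_host for h in chosen}


hosts = [("node01", 4), ("node02", 4), ("node03", 2)]
for rule in ("$fill_up", "$round_robin", "2", "$pe_slots"):
    print(rule, allocate(rule, 6, hosts))
```

       With the sample hosts, $fill_up concentrates the six slots on the
       first two hosts, $round_robin and the fixed rule 2 spread them evenly
       over all three, and $pe_slots cannot be satisfied because no single
       host has six free slots.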

   control_slaves
       This parameter can be set to TRUE or FALSE (the default). It  indicates
       whether Sun Grid Engine is the creator of the slave tasks of a parallel
       application via sge_execd(8) and  sge_shepherd(8)  and  thus  has  full
       control  over  all  processes  in a parallel application, which enables
       capabilities  such  as  resource  limitation  and  correct  accounting.
       However,   to   gain  control  over  the  slave  tasks  of  a  parallel
       application, a sophisticated PE  interface  is  required,  which  works
       closely  together  with  Sun Grid Engine facilities. Such PE interfaces
       are available through your local Sun Grid Engine support office.

       Please set the control_slaves parameter to FALSE for all other PE
       interfaces.

   job_is_first_task
       The job_is_first_task parameter can be set to TRUE or FALSE. A value of
       TRUE indicates that the Sun Grid Engine job script already contains one
       of  the tasks of the parallel application (the number of slots reserved
       for the job is the number of slots  requested  with  the  -pe  switch),
       while  a  value  of  FALSE indicates that the job script (and its child
       processes) is not part of the parallel program  (the  number  of  slots
       reserved  for  the  job  is  the number of slots requested with the -pe
       switch + 1).

       If  wallclock  accounting  is  used  (execd_params  ACCT_RESERVED_USAGE
       and/or  SHARETREE_RESERVED_USAGE set to TRUE) and control_slaves is set
       to FALSE, the job_is_first_task parameter influences the accounting for
       the job: if job_is_first_task is TRUE, accounting for cpu and requested
       memory is multiplied by the number of slots requested with the -pe
       switch; if job_is_first_task is FALSE, the accounting information is
       multiplied by the number of slots + 1.
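       The slot arithmetic described for job_is_first_task can be written
       out as a short Python sketch. The function names are hypothetical,
       and the multiplier is only defined here for the case this page
       describes (wallclock accounting enabled and control_slaves FALSE).

```python
def reserved_slots(requested_slots, job_is_first_task):
    """Slots reserved for the job: the request, plus one if the job
    script itself is not one of the parallel tasks."""
    return requested_slots if job_is_first_task else requested_slots + 1


def reserved_usage_multiplier(requested_slots, job_is_first_task,
                              control_slaves, wallclock_accounting):
    """Factor applied to cpu and requested memory accounting.

    Returns None outside the case covered by the text above.
    """
    if wallclock_accounting and not control_slaves:
        return reserved_slots(requested_slots, job_is_first_task)
    return None


print(reserved_slots(8, True))
print(reserved_slots(8, False))
print(reserved_usage_multiplier(8, False, False, True))
```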

   urgency_slots
       For pending jobs with a slot range PE request the number of slots is
       not  determined.  This  setting  specifies the method to be used by Sun
       Grid Engine to assess the number of slots such jobs might finally get.

       The  assumed  slot  allocation  has  a  meaning  when  determining  the
       resource-request-based  priority  contribution for numeric resources as
       described in sge_priority(5) and is displayed when qstat(1) is run
       without the -g t option.

       The following methods are supported:

       <int>:    The  specified integer number is directly used as prospective
                 slot amount.

       min:      The slot range minimum is used as prospective slot amount. If
                 no lower bound is specified with the range 1 is assumed.

       max:      The slot range maximum is used as prospective slot
                 amount.  If no upper bound is specified with  the  range  the
                 absolute  maximum  possible  due to the PE's slots setting is
                 assumed.

       avg:      The average of all numbers  occurring  within  the  job's  PE
                 range request is assumed.
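       The urgency_slots methods can be sketched in Python as follows. The
       sketch assumes a single contiguous range with a step of one, applies
       the min and max defaults to avg as well, and returns the exact
       average without rounding; none of these details is specified on this
       page. The function name prospective_slots is hypothetical.

```python
def prospective_slots(method, lower, upper, pe_slots):
    """Estimate the slot amount assumed for a pending job.

    method   -- "min", "max", "avg", or an integer given as a string
    lower    -- lower bound of the requested range, or None if omitted
    upper    -- upper bound of the requested range, or None if omitted
    pe_slots -- the PE's slots setting, used when no upper bound is given
    """
    low = 1 if lower is None else lower
    high = pe_slots if upper is None else upper
    if method == "min":
        return low
    if method == "max":
        return high
    if method == "avg":
        # Mean of low, low+1, ..., high equals the midpoint of the range.
        return (low + high) / 2
    return int(method)  # fixed integer method


for method in ("min", "max", "avg", "16"):
    print(method, prospective_slots(method, 4, 8, 64))
```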

   accounting_summary
       This  parameter is only checked if control_slaves (see above) is set to
       TRUE and thus Sun Grid Engine is the creator of the slave  tasks  of  a
       parallel  application  via  sge_execd(8)  and sge_shepherd(8).  In this
       case, accounting information is available for every single  slave  task
       started by Sun Grid Engine.

       The  accounting_summary  parameter can be set to TRUE or FALSE. A value
       of TRUE indicates that only a single accounting record  is  written  to
       the  accounting(5) file, containing the accounting summary of the whole
       job including all slave tasks, while a  value  of  FALSE  indicates  an
       individual  accounting(5)  record  is  written for every slave task, as
       well as for the master task.
       Note: When running tightly integrated jobs with
       SHARETREE_RESERVED_USAGE set and accounting_summary enabled in the
       parallel environment, reserved usage is reported only by the master
       task of the parallel job. No per-task usage records are sent from
       execd to qmaster, which can significantly reduce load on qmaster when
       running large tightly integrated parallel jobs.

RESTRICTIONS

       Note that the functionality of the start-up, shutdown and signaling
       procedures   remains  the  full  responsibility  of  the  administrator
       configuring the parallel environment.  Sun Grid Engine will just invoke
       these  procedures  and evaluate their exit status. If the procedures do
       not perform their tasks properly or if the parallel environment or  the
       parallel  application behave unexpectedly, Sun Grid Engine has no means
       to detect this.

SEE ALSO

       sge_intro(1),  sge_types(1),  qconf(1),  qdel(1),  qmod(1),   qsub(1),
       access_list(5), sge_qmaster(8), sge_shepherd(8).

COPYRIGHT

       See sge_intro(1) for a full statement of rights and permissions.