Provided by: gridengine-exec_6.2u5-7.4_amd64

NAME

       sge_shepherd - Sun Grid Engine single job controlling agent

SYNOPSIS

       sge_shepherd

DESCRIPTION

       sge_shepherd  provides  the  parent  process  functionality for a single Sun Grid Engine job.  The parent
       functionality is necessary on UNIX systems to retrieve  resource  usage  information  (see  getrusage(2))
       after  a job has finished. In addition, the sge_shepherd forwards signals to the job, such as the signals
       for suspension, enabling, termination and the Sun Grid Engine checkpointing signal (see  sge_ckpt(1)  for
       details).
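
       The parent-process role described above can be illustrated with a minimal Python sketch. It is not
       the sge_shepherd implementation: the function name run_and_account and the set of forwarded signals
       are assumptions made for illustration only. The sketch forks a child, forwards a few signals to it,
       and collects the exit status together with the getrusage(2)-style accounting via wait4.

```python
import os
import signal
import sys


def run_and_account(argv):
    """Run argv as a child process; return (exit_code, resource_usage).

    Hypothetical helper for illustration, not part of Sun Grid Engine.
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace the process image with the job command.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed.
            os._exit(127)

    # Parent: forward selected signals to the child (illustrative set).
    def forward(signum, _frame):
        os.kill(pid, signum)

    forwarded = (signal.SIGTERM, signal.SIGUSR1, signal.SIGUSR2)
    previous = {s: signal.signal(s, forward) for s in forwarded}
    try:
        # wait4 returns the child's wait status and its resource usage.
        _, status, usage = os.wait4(pid, 0)
    finally:
        # Restore the handlers that were installed before the call.
        for s, handler in previous.items():
            signal.signal(s, handler)

    if os.WIFEXITED(status):
        code = os.WEXITSTATUS(status)
    else:
        # Killed by a signal: report it as a negative number.
        code = -os.WTERMSIG(status)
    return code, usage


if __name__ == "__main__":
    code, usage = run_and_account([sys.executable, "-c", "print('job ran')"])
    print("exit code:", code, "user CPU seconds:", usage.ru_utime)
```

       Only a parent process can obtain this accounting for a finished child, which is why a dedicated
       parent per job is needed.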

       The sge_shepherd receives information about the job to be started from sge_execd(8). During the
       execution of the job it starts up to five child processes. First a prolog script is run if this
       feature  is  enabled  by  the  prolog  parameter in the cluster configuration. (See sge_conf(5).)  Next a
       parallel environment startup procedure is run if the job is a  parallel  job.  (See  sge_pe(5)  for  more
       information.)   After  that, the job itself is run, followed by a parallel environment shutdown procedure
       for parallel jobs, and finally an epilog script if requested by  the  epilog  parameter  in  the  cluster
       configuration.  The  prolog  and  epilog scripts as well as the parallel environment startup and shutdown
       procedures are to be provided by the Sun Grid Engine administrator and  are  intended  for  site-specific
       actions to be taken before and after execution of the actual user job.
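
       The order of the child processes described above can be sketched as follows. This is a minimal
       illustration only: the function planned_steps and its parameters are hypothetical and do not
       correspond to any Sun Grid Engine interface.

```python
def planned_steps(prolog_configured, is_parallel, epilog_configured):
    """Return the ordered list of methods that would be run for one job.

    Hypothetical helper for illustration, not part of Sun Grid Engine.
    """
    steps = []
    if prolog_configured:
        steps.append("prolog")
    if is_parallel:
        steps.append("parallel start")
    # The job itself is always run.
    steps.append("job")
    if is_parallel:
        steps.append("parallel stop")
    if epilog_configured:
        steps.append("epilog")
    return steps


if __name__ == "__main__":
    # A parallel job on a cluster with prolog and epilog configured
    # yields all five steps.
    print(planned_steps(True, True, True))
```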

       After  the  job  has  finished  and the epilog script is processed, sge_shepherd retrieves resource usage
       statistics about the job, places them in a job specific subdirectory of the sge_execd(8) spool  directory
       for reporting through sge_execd(8) and finishes.

       sge_shepherd  also places an exit status file in the spool directory. This exit status can be viewed with
       qacct -j JobId (see qacct(1)); it is not the exit status of sge_shepherd itself but of one of the methods
       executed by sge_shepherd. The meaning of this exit status depends on the method in which an
       error occurred (if any). The possible methods are: prolog, parallel start, job, parallel stop, epilog,
       suspend, restart, terminate, clean, migrate, and checkpoint.

       The following exit values are returned:

       0      All methods: Operation was executed successfully.

       99     Job script, prolog and epilog: When  FORBID_RESCHEDULE  is  not  set  in  the  configuration  (see
              sge_conf(5)), the job gets re-queued.  Otherwise see "Other".

       100    Job script, prolog and epilog: When FORBID_APPERROR is not set in the configuration (see
              sge_conf(5)), the job is set to error state. Otherwise see "Other".

       Other  Job script: This is the exit status of the job itself. No action is taken upon  this  exit  status
              because the meaning of this exit status is not known.
              Prolog, epilog and parallel start: The queue is set to error state and the job is re-queued.
              Parallel stop: The queue is set to error state, but the job is not re-queued. It is assumed that
              the job itself ran successfully and only the cleanup script failed.
              Suspend, restart, terminate, clean, and migrate: Always successful.
              Checkpoint: Treated as successful, except for kernel checkpointing, where this status means that
              the checkpoint was not successful and did not happen (Sun Grid Engine will still migrate the job).
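
       The "Other" row above can be summarized as a lookup from method to consequence. The following
       Python sketch is only a restatement of that row: the names Outcome and outcome_for_other_status are
       hypothetical, and the special values 0, 99 and 100 are deliberately not handled here.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    queue_error: bool   # is the queue set to error state?
    job_requeued: bool  # is the job re-queued?
    note: str


def outcome_for_other_status(method: str) -> Outcome:
    """Map a method name to the consequence of an 'Other' exit status.

    Hypothetical helper for illustration, not part of Sun Grid Engine.
    """
    if method == "job":
        return Outcome(False, False, "exit status of the job itself; no action")
    if method in ("prolog", "epilog", "parallel start"):
        return Outcome(True, True, "queue in error state, job re-queued")
    if method == "parallel stop":
        return Outcome(True, False, "queue in error state, job not re-queued")
    if method in ("suspend", "restart", "terminate", "clean", "migrate"):
        return Outcome(False, False, "always successful")
    if method == "checkpoint":
        return Outcome(False, False,
                       "success, except for kernel checkpointing")
    raise ValueError("unknown method: " + method)


if __name__ == "__main__":
    for name in ("job", "prolog", "parallel stop", "checkpoint"):
        print(name, "->", outcome_for_other_status(name))
```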

RESTRICTIONS

       sge_shepherd should not be invoked manually, but only by sge_execd(8).

FILES

       sgepasswd
              Contains a list of user names and their corresponding encrypted passwords. If available,
              the password file will be used by sge_shepherd. To change the contents of this file, use
              the sgepasswd command. Changing the file manually is not advised.

       <execd_spool>/job_dir/<job_id>
              Job specific directory.

SEE ALSO

       sge_intro(1), sge_conf(5), sge_execd(8).

COPYRIGHT

       See sge_intro(1) for a full statement of rights and permissions.

SGE 6.2u5                                            $Date$                                      SGE_SHEPHERD(8)