Provided by: gridengine-exec_6.2u5-7.3_amd64

NAME

       sge_shepherd - Sun Grid Engine single job controlling agent

SYNOPSIS

       sge_shepherd

DESCRIPTION

       sge_shepherd  provides  the parent process functionality for a single Sun Grid Engine job.
       The parent  functionality  is  necessary  on  UNIX  systems  to  retrieve  resource  usage
       information  (see  getrusage(2))  after  a job has finished. In addition, the sge_shepherd
       forwards signals to the job, such as the signals for suspension, enabling, termination and
       the Sun Grid Engine checkpointing signal (see sge_ckpt(1) for details).
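
       The parent-process role described above can be illustrated with a small Python sketch.
       It is not the sge_shepherd implementation: the function name run_and_measure is
       hypothetical, and it only shows why a parent must wait for its child before
       getrusage(2) reports the child's resource usage.

```python
import os
import resource
import sys


def run_and_measure(argv):
    """Fork a child, exec argv in it, and return (exit_code, cpu_seconds).

    Hypothetical helper for illustration only; not part of Sun Grid Engine.
    """
    # Usage of already-reaped children, so we can compute a delta afterwards.
    before = resource.getrusage(resource.RUSAGE_CHILDREN)

    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the job command.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed.
            os._exit(127)

    # Parent: the child's usage is accounted for only after it has been waited for.
    _, status = os.waitpid(pid, 0)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)

    cpu_seconds = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    if os.WIFEXITED(status):
        exit_code = os.WEXITSTATUS(status)
    else:
        # Assumption of this sketch: report death by signal as a negative number.
        exit_code = -os.WTERMSIG(status)
    return exit_code, cpu_seconds


if __name__ == "__main__":
    code, cpu = run_and_measure([sys.executable, "-c", "raise SystemExit(3)"])
    print("exit code:", code, "cpu seconds:", cpu)
```

       Because the usage counters are attributed to the process that waits for the child,
       a dedicated parent per job keeps the accounting of different jobs separate.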

       The  sge_shepherd  receives information about the job to be started from the sge_execd(8).
       During the execution of the job it actually starts up to 5 child processes. First a prolog
       script  is  run  if  this  feature  is  enabled  by  the  prolog  parameter in the cluster
       configuration. (See sge_conf(5).)  Next a parallel environment startup procedure is run if
       the  job  is  a  parallel  job. (See sge_pe(5) for more information.)  After that, the job
       itself is run, followed by a parallel environment shutdown procedure  for  parallel  jobs,
       and  finally  an  epilog  script  if  requested  by  the  epilog  parameter in the cluster
       configuration. The prolog and epilog scripts as well as the parallel  environment  startup
       and  shutdown  procedures  are to be provided by the Sun Grid Engine administrator and are
       intended for site-specific actions to be taken before and after execution  of  the  actual
       user job.
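
       The order of the child processes described above can be summarized in a short Python
       sketch. The function planned_stages and its parameters are hypothetical names chosen
       for illustration; the stage labels follow the method names used in this manual page.

```python
def planned_stages(prolog=None, epilog=None, parallel=False):
    """Return the ordered stages (at most five) started for a single job.

    prolog / epilog: script paths from the cluster configuration, or None if unset.
    parallel: True if the job is a parallel job.
    Hypothetical helper; it models the ordering only, not the real shepherd.
    """
    stages = []
    if prolog:
        stages.append("prolog")
    if parallel:
        stages.append("parallel start")
    stages.append("job")  # the job itself is always run
    if parallel:
        stages.append("parallel stop")
    if epilog:
        stages.append("epilog")
    return stages


if __name__ == "__main__":
    print(planned_stages(prolog="/path/to/prolog", epilog="/path/to/epilog", parallel=True))
```

       A serial job with neither prolog nor epilog configured yields only the job stage,
       whereas a parallel job with both scripts configured yields all five stages.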

       After  the  job  has  finished  and the epilog script is processed, sge_shepherd retrieves
       resource usage statistics about the job, places them in a job specific subdirectory of the
       sge_execd(8) spool directory for reporting through sge_execd(8) and finishes.

       sge_shepherd  also places an exit status file in the spool directory. This exit status can
       be viewed with qacct -j JobId (see qacct(1)); it is not the exit  status  of  sge_shepherd
       itself but of one of the methods executed by sge_shepherd. This exit status can have
       several meanings, depending on the method in which an error occurred (if any). The possible
       methods  are:  prolog,  parallel  start,  job,  parallel  stop,  epilog, suspend, restart,
       terminate, clean, migrate, and checkpoint.

       The following exit values are returned:

       0      All methods: Operation was executed successfully.

       99     Job  script,  prolog  and  epilog:  When  FORBID_RESCHEDULE  is  not  set  in   the
              configuration (see sge_conf(5)), the job gets re-queued.  Otherwise see "Other".

       100    Job script, prolog and epilog: When FORBID_APPERROR is not set in the configuration
              (see sge_conf(5)), the job gets re-queued.  Otherwise see "Other".

       Other  Job script: This is the exit status of the job itself. No action is taken upon this
              exit status because the meaning of this exit status is not known.
              Prolog,  epilog  and parallel start: The queue is set to error state and the job is
              re-queued.
              Parallel stop: The queue is set to error state, but the job is not re-queued. It is
              assumed that the job itself ran successfully and only the clean up script failed.
              Suspend, restart, terminate, clean, and migrate: Always successful.
              Checkpoint: Treated as success, except for kernel checkpointing, where it means
              that the checkpoint was not successful and did not happen (Sun Grid Engine will
              still migrate the job).
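
       The exit value table above can be read as a decision procedure. The following Python
       sketch encodes it literally; the function interpret_exit, its parameters, and the
       returned strings are hypothetical and exist only to illustrate the table, they are not
       a Sun Grid Engine interface.

```python
SCRIPT_METHODS = ("job", "prolog", "epilog")
ALWAYS_OK_METHODS = ("suspend", "restart", "terminate", "clean", "migrate")


def interpret_exit(method, status, forbid_reschedule=False, forbid_apperror=False):
    """Return a short description of the action the table assigns to (method, status).

    forbid_reschedule / forbid_apperror model whether FORBID_RESCHEDULE and
    FORBID_APPERROR are set in the configuration (see sge_conf(5)).
    """
    if status == 0:
        return "success"
    if status == 99 and method in SCRIPT_METHODS and not forbid_reschedule:
        return "job re-queued"
    if status == 100 and method in SCRIPT_METHODS and not forbid_apperror:
        return "job re-queued"

    # Everything below corresponds to the "Other" entry of the table.
    if method == "job":
        return "no action"
    if method in ("prolog", "epilog", "parallel start"):
        return "queue set to error state, job re-queued"
    if method == "parallel stop":
        return "queue set to error state, job not re-queued"
    if method in ALWAYS_OK_METHODS:
        return "success"
    if method == "checkpoint":
        return "success (kernel checkpointing: checkpoint did not happen)"
    raise ValueError("unknown method: %r" % (method,))


if __name__ == "__main__":
    print(interpret_exit("job", 99))
    print(interpret_exit("job", 99, forbid_reschedule=True))
    print(interpret_exit("parallel stop", 1))
```

       Note that the values 99 and 100 have a special meaning only for the job script, prolog
       and epilog; for every other method they fall through to the "Other" entry.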

RESTRICTIONS

       sge_shepherd should not be invoked manually, but only by sge_execd(8).

FILES

       sgepasswd
              Contains a list of user names and their corresponding encrypted passwords. If
              available, the password file will be used by sge_shepherd. To change the
              contents of this file, use the sgepasswd command. Changing the file manually
              is not advised.

       <execd_spool>/job_dir/<job_id>
              Job-specific directory.

SEE ALSO

       sge_intro(1), sge_conf(5), sge_execd(8).

COPYRIGHT

       See sge_intro(1) for a full statement of rights and permissions.