Provided by: manpages_4.04-2_all

NAME

       sched - overview of scheduling APIs

DESCRIPTION

   API summary
       The Linux scheduling APIs are as follows:

       sched_setscheduler(2)
              Set the scheduling policy and parameters of a specified thread.

       sched_getscheduler(2)
              Return the scheduling policy of a specified thread.

       sched_setparam(2)
              Set the scheduling parameters of a specified thread.

       sched_getparam(2)
              Fetch the scheduling parameters of a specified thread.

       sched_get_priority_max(2)
              Return the maximum priority available in a specified scheduling policy.

       sched_get_priority_min(2)
              Return the minimum priority available in a specified scheduling policy.

       sched_rr_get_interval(2)
              Fetch the quantum used for threads that are scheduled under the "round-robin" scheduling policy.

       sched_yield(2)
              Cause the caller to relinquish the CPU, so that some other thread can be executed.

       sched_setaffinity(2)
              (Linux-specific) Set the CPU affinity of a specified thread.

       sched_getaffinity(2)
              (Linux-specific) Get the CPU affinity of a specified thread.

       sched_setattr(2)
              Set the scheduling policy and parameters of a specified thread.  This (Linux-specific) system call
              provides a superset of the functionality of sched_setscheduler(2) and sched_setparam(2).

       sched_getattr(2)
              Fetch  the  scheduling  policy and parameters of a specified thread.  This (Linux-specific) system
              call provides a superset of the functionality of sched_getscheduler(2) and sched_getparam(2).
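
       As a minimal sketch of how the query calls above fit together, the following C fragment reads the
       policy and static priority of a thread with sched_getscheduler(2) and sched_getparam(2).  The helper
       names policy_name() and print_sched_info() are hypothetical, and the fallback #define values are an
       assumption for C libraries whose headers do not expose the Linux-specific policies.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

/* Fallbacks (assumed values) in case the headers hide Linux-specific names. */
#ifndef SCHED_BATCH
#define SCHED_BATCH 3
#endif
#ifndef SCHED_IDLE
#define SCHED_IDLE 5
#endif
#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE 6
#endif
#ifndef SCHED_RESET_ON_FORK
#define SCHED_RESET_ON_FORK 0x40000000
#endif

/* Hypothetical helper: map a policy number to a printable name.
 * The reset-on-fork bit is masked off before the comparison. */
static const char *policy_name(int policy)
{
    switch (policy & ~SCHED_RESET_ON_FORK) {
    case SCHED_OTHER:    return "SCHED_OTHER";
    case SCHED_FIFO:     return "SCHED_FIFO";
    case SCHED_RR:       return "SCHED_RR";
    case SCHED_BATCH:    return "SCHED_BATCH";
    case SCHED_IDLE:     return "SCHED_IDLE";
    case SCHED_DEADLINE: return "SCHED_DEADLINE";
    default:             return "unknown";
    }
}

/* Hypothetical helper: print the policy and static priority of the
 * thread identified by pid (0 means the caller). Returns 0 or -1. */
static int print_sched_info(pid_t pid)
{
    struct sched_param sp;
    int policy = sched_getscheduler(pid);

    if (policy == -1 || sched_getparam(pid, &sp) == -1) {
        perror("sched_getscheduler/sched_getparam");
        return -1;
    }
    printf("policy=%s priority=%d\n", policy_name(policy), sp.sched_priority);
    return 0;
}
```

       Calling print_sched_info(0) reports the settings of the calling thread.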

   Scheduling policies
       The scheduler is the kernel component that decides which runnable thread will  be  executed  by  the  CPU
       next.   Each thread has an associated scheduling policy and a static scheduling priority, sched_priority.
       The scheduler makes its decisions based on knowledge of the scheduling policy and static priority of  all
       threads on the system.

       For threads scheduled under one of the normal scheduling policies (SCHED_OTHER, SCHED_IDLE, SCHED_BATCH),
       sched_priority is not used in scheduling decisions (it must be specified as 0).

       Threads scheduled under one of the real-time policies (SCHED_FIFO, SCHED_RR) have a sched_priority
       value in the range 1 (low) to 99 (high).  (As the numbers imply, real-time threads always have higher
       priority than normal threads.)  Note well: POSIX.1 requires an implementation to support only a minimum
       of 32 distinct priority levels for the real-time policies, and some systems supply just this minimum.
       Portable programs should use sched_get_priority_min(2) and sched_get_priority_max(2) to find the range of
       priorities supported for a particular policy.
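
       The portable approach described above might be sketched as follows.  The helpers rt_priority_range()
       and clamp_priority() are hypothetical names introduced for illustration only.

```c
#include <sched.h>

/* Hypothetical helper: fetch the priority range supported for a policy.
 * Returns 0 on success, -1 if either query fails. */
static int rt_priority_range(int policy, int *min, int *max)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo == -1 || hi == -1)
        return -1;
    *min = lo;
    *max = hi;
    return 0;
}

/* Hypothetical helper: force a wanted priority into [min, max]. */
static int clamp_priority(int wanted, int min, int max)
{
    if (wanted < min)
        return min;
    if (wanted > max)
        return max;
    return wanted;
}
```

       A program that wants "priority 50" would first query the range for its policy and then pass the
       clamped value in sched_priority, rather than assuming that 50 is valid on every system.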

       Conceptually,  the scheduler maintains a list of runnable threads for each possible sched_priority value.
       In order to determine which thread runs next, the scheduler looks for the nonempty list with the  highest
       static priority and selects the thread at the head of this list.

       A  thread's  scheduling  policy  determines where it will be inserted into the list of threads with equal
       static priority and how it will move inside this list.

       All scheduling is preemptive: if a thread with a  higher  static  priority  becomes  ready  to  run,  the
       currently  running  thread will be preempted and returned to the wait list for its static priority level.
       The scheduling policy determines the ordering only within the list of runnable threads with equal  static
       priority.

   SCHED_FIFO: First in-first out scheduling
       SCHED_FIFO can be used only with static priorities higher than 0, which means that when a SCHED_FIFO
       thread becomes runnable, it will always immediately preempt any currently running SCHED_OTHER,
       SCHED_BATCH,  or  SCHED_IDLE  thread.   SCHED_FIFO is a simple scheduling algorithm without time slicing.
       For threads scheduled under the SCHED_FIFO policy, the following rules apply:

       *  A SCHED_FIFO thread that has been preempted by another thread of higher priority will stay at the head
          of the list for its priority and will resume execution as soon as all threads of higher  priority  are
          blocked again.

       *  When  a  SCHED_FIFO  thread  becomes  runnable,  it  will  be  inserted at the end of the list for its
          priority.

       *  A call to sched_setscheduler(2), sched_setparam(2), or sched_setattr(2) will put  the  SCHED_FIFO  (or
          SCHED_RR)  thread identified by pid at the start of the list if it was runnable.  As a consequence, it
          may preempt the currently running thread if it has the same priority.   (POSIX.1  specifies  that  the
          thread should go to the end of the list.)

       *  A thread calling sched_yield(2) will be put at the end of the list.

       No  other  events  will  move a thread scheduled under the SCHED_FIFO policy in the wait list of runnable
       threads with equal static priority.

       A SCHED_FIFO thread runs until either it is blocked by an I/O  request,  it  is  preempted  by  a  higher
       priority thread, or it calls sched_yield(2).
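
       A minimal sketch of placing the calling thread under SCHED_FIFO follows.  The function name
       become_fifo() is hypothetical, and the choice of the lowest real-time priority is only an assumption
       made to keep the example conservative.  Without the necessary privilege or resource limit, the call is
       expected to fail (typically with EPERM).

```c
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: switch the calling thread to SCHED_FIFO at the
 * lowest real-time priority. Returns 0 on success, -1 on error with
 * errno left as set by the failing call. */
static int become_fifo(void)
{
    struct sched_param sp;
    int saved;

    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = sched_get_priority_min(SCHED_FIFO);
    if (sp.sched_priority == -1)
        return -1;

    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
        saved = errno;                  /* fprintf may clobber errno */
        fprintf(stderr, "sched_setscheduler: %s\n", strerror(saved));
        errno = saved;
        return -1;
    }
    return 0;
}
```

       Because a SCHED_FIFO thread is not time sliced, code running after a successful call should block or
       call sched_yield(2) regularly so that other threads of the same priority can run.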

   SCHED_RR: Round-robin scheduling
       SCHED_RR  is  a simple enhancement of SCHED_FIFO.  Everything described above for SCHED_FIFO also applies
       to SCHED_RR, except that each thread is allowed to run only for a maximum time quantum.   If  a  SCHED_RR
       thread has been running for a time period equal to or longer than the time quantum, it will be put at the
       end  of the list for its priority.  A SCHED_RR thread that has been preempted by a higher priority thread
       and subsequently resumes execution as a running thread will complete the unexpired portion of its  round-
       robin time quantum.  The length of the time quantum can be retrieved using sched_rr_get_interval(2).
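
       Retrieving the quantum might look like the sketch below.  The helper name rr_quantum_ns() and the
       choice to report the result in nanoseconds are assumptions made for illustration.

```c
#include <sched.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: return the interval reported by
 * sched_rr_get_interval(2) for the thread pid (0 means the caller),
 * converted to nanoseconds, or -1 on error. */
static long long rr_quantum_ns(pid_t pid)
{
    struct timespec ts;

    if (sched_rr_get_interval(pid, &ts) == -1)
        return -1;
    return (long long)ts.tv_sec * 1000000000LL + (long long)ts.tv_nsec;
}
```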

   SCHED_DEADLINE: Sporadic task model deadline scheduling
       Since  version  3.14,  Linux  provides  a  deadline  scheduling  policy (SCHED_DEADLINE).  This policy is
       currently implemented using GEDF (Global Earliest Deadline  First)  in  conjunction  with  CBS  (Constant
       Bandwidth  Server).   To  set  and  fetch  this policy and associated attributes, one must use the Linux-
       specific sched_setattr(2) and sched_getattr(2) system calls.

       A sporadic task is one that has a sequence of jobs, where each job is activated at most once per  period.
       Each  job  also has a relative deadline, before which it should finish execution, and a computation time,
       which is the CPU time necessary for executing the job.  The moment when a task wakes up because a new job
       has to be executed is called the arrival time (also referred to as the request  time  or  release  time).
       The  start time is the time at which a task starts its execution.  The absolute deadline is thus obtained
       by adding the relative deadline to the arrival time.

       The following diagram clarifies these terms:

           arrival/wakeup                    absolute deadline
                |    start time                    |
                |        |                         |
                v        v                         v
           -----x--------xooooooooooooooooo--------x--------x---
                         |<- comp. time ->|
                |<------- relative deadline ------>|
                |<-------------- period ------------------->|

       When setting a SCHED_DEADLINE  policy  for  a  thread  using  sched_setattr(2),  one  can  specify  three
       parameters:  Runtime,  Deadline,  and  Period.   These  parameters  do  not necessarily correspond to the
       aforementioned terms: usual practice is to set Runtime to something bigger than the  average  computation
       time  (or  worst-case  execution  time  for hard real-time tasks), Deadline to the relative deadline, and
       Period to the period of the task.  Thus, for SCHED_DEADLINE scheduling, we have:

           arrival/wakeup                    absolute deadline
                |    start time                    |
                |        |                         |
                v        v                         v
           -----x--------xooooooooooooooooo--------x--------x---
                         |<-- Runtime ------->|
                |<----------- Deadline ----------->|
                |<-------------- Period ------------------->|

       The  three  deadline-scheduling  parameters  correspond  to  the   sched_runtime,   sched_deadline,   and
       sched_period  fields  of  the sched_attr structure; see sched_setattr(2).  These fields express values in
       nanoseconds.  If sched_period is specified as 0, then it is made the same as sched_deadline.

       The kernel requires that:

           sched_runtime <= sched_deadline <= sched_period

       In addition, under the current implementation, all of the parameter values must be at least  1024  (i.e.,
       just over one microsecond, which is the resolution of the implementation), and less than 2^63.  If any of
       these checks fails, sched_setattr(2) fails with the error EINVAL.
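
       The checks listed above can be mirrored in user space before calling sched_setattr(2).  The following
       is a sketch that follows the rules as stated on this page; it is not the kernel's code, and the names
       dl_params_valid(), DL_MIN_NS, and DL_LIMIT_NS are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DL_MIN_NS   1024ULL        /* smallest accepted value, per the text */
#define DL_LIMIT_NS (1ULL << 63)   /* values must be less than 2^63 */

/* Hypothetical helper: return true if the three values (in nanoseconds)
 * satisfy the documented constraints. */
static bool dl_params_valid(uint64_t runtime, uint64_t deadline, uint64_t period)
{
    if (period == 0)
        period = deadline;         /* a period of 0 means "same as deadline" */

    if (runtime < DL_MIN_NS || deadline < DL_MIN_NS || period < DL_MIN_NS)
        return false;
    if (runtime >= DL_LIMIT_NS || deadline >= DL_LIMIT_NS || period >= DL_LIMIT_NS)
        return false;

    return runtime <= deadline && deadline <= period;
}
```

       For instance, a runtime of 10 ms, a deadline of 30 ms, and a period of 100 ms pass these checks,
       whereas a runtime that exceeds the deadline does not.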

       The  CBS  guarantees non-interference between tasks, by throttling threads that attempt to over-run their
       specified Runtime.

       To ensure  deadline  scheduling  guarantees,  the  kernel  must  prevent  situations  where  the  set  of
       SCHED_DEADLINE  threads  is  not  feasible  (schedulable)  within the given constraints.  The kernel thus
       performs an admission test when setting or changing SCHED_DEADLINE policy and attributes.  This
       admission  test  calculates whether the change is feasible; if it is not, sched_setattr(2) fails with the
       error EBUSY.

       For example, it is necessary (but not necessarily sufficient) for the total utilization to be less than or
       equal to the total number of CPUs available, where, since each thread can maximally run for  Runtime  per
       Period, that thread's utilization is its Runtime divided by its Period.
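
       The utilization condition can be illustrated with a short calculation.  This sketch covers only the
       necessary condition described above and makes no claim about the kernel's actual admission test; the
       struct dl_task type and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one deadline thread. */
struct dl_task {
    uint64_t runtime_ns;
    uint64_t period_ns;    /* must be nonzero */
};

/* Sum of Runtime / Period over all tasks. */
static double total_utilization(const struct dl_task *tasks, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
        sum += (double)tasks[i].runtime_ns / (double)tasks[i].period_ns;
    return sum;
}

/* Necessary (not sufficient) condition: utilization <= number of CPUs. */
static bool passes_necessary_test(const struct dl_task *tasks, size_t n,
                                  unsigned int ncpus)
{
    return total_utilization(tasks, n) <= (double)ncpus;
}
```

       Two threads that each need 60 ms of runtime every 100 ms have a total utilization of 1.2, so they
       cannot both be admitted on a single CPU but satisfy the condition on two.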

       In  order  to fulfil the guarantees that are made when a thread is admitted to the SCHED_DEADLINE policy,
       SCHED_DEADLINE threads are the highest priority  (user  controllable)  threads  in  the  system;  if  any
       SCHED_DEADLINE thread is runnable, it will preempt any thread scheduled under one of the other policies.

       A  call to fork(2) by a thread scheduled under the SCHED_DEADLINE policy will fail with the error EAGAIN,
       unless the thread has its reset-on-fork flag set (see below).

       A SCHED_DEADLINE thread that calls sched_yield(2) will yield the current job and wait for a new period to
       begin.
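
       A sketch of requesting the SCHED_DEADLINE policy is shown below.  It invokes the system call through
       syscall(2) because a C library wrapper may not be available.  The structure name dl_attr and the
       function name set_deadline() are hypothetical; the field layout is assumed to match the sched_attr
       structure described in sched_setattr(2), and the fallback value for SCHED_DEADLINE is likewise an
       assumption for headers that do not define it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE 6           /* assumed value if headers lack it */
#endif

/* Local mirror of the sched_attr layout (see sched_setattr(2)).
 * A distinct name avoids clashing with any system definition. */
struct dl_attr {
    uint32_t size;
    uint32_t sched_policy;
    uint64_t sched_flags;
    int32_t  sched_nice;
    uint32_t sched_priority;
    uint64_t sched_runtime;        /* nanoseconds */
    uint64_t sched_deadline;       /* nanoseconds */
    uint64_t sched_period;         /* nanoseconds */
};

/* Hypothetical helper: ask for SCHED_DEADLINE for the calling thread.
 * Returns 0 on success, or -1 with errno set (for example EPERM,
 * EINVAL, or EBUSY, as described in the text). */
static int set_deadline(uint64_t runtime_ns, uint64_t deadline_ns,
                        uint64_t period_ns)
{
    struct dl_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.sched_policy = SCHED_DEADLINE;
    attr.sched_runtime = runtime_ns;
    attr.sched_deadline = deadline_ns;
    attr.sched_period = period_ns;

    /* Arguments: pid (0 = caller), attribute pointer, flags (0). */
    return (int)syscall(SYS_sched_setattr, 0, &attr, 0);
}
```

       A call such as set_deadline(10000000, 30000000, 100000000) asks for 10 ms of runtime within a 30 ms
       deadline every 100 ms.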

   SCHED_OTHER: Default Linux time-sharing scheduling
       SCHED_OTHER can be used only at static priority 0.  SCHED_OTHER is the standard Linux time-sharing
       scheduler  that  is  intended  for all threads that do not require the special real-time mechanisms.  The
       thread to run is chosen from the static priority 0 list based on a dynamic priority  that  is  determined
       only  inside this list.  The dynamic priority is based on the nice value (set by nice(2), setpriority(2),
       or sched_setattr(2)) and increased for each time quantum the thread is ready to run, but denied to run by
       the scheduler.  This ensures fair progress among all SCHED_OTHER threads.

   SCHED_BATCH: Scheduling batch processes
       (Since Linux 2.6.16.)  SCHED_BATCH can be used only at static priority 0.   This  policy  is  similar  to
       SCHED_OTHER  in that it schedules the thread according to its dynamic priority (based on the nice value).
       The difference is that this policy will cause the scheduler to always assume  that  the  thread  is  CPU-
       intensive.   Consequently,  the  scheduler  will  apply a small scheduling penalty with respect to wakeup
       behavior, so that this thread is mildly disfavored in scheduling decisions.

       This policy is useful for workloads that are noninteractive, but do not want to lower their  nice  value,
       and  for  workloads  that  want  a  deterministic  scheduling  policy without interactivity causing extra
       preemptions (between the workload's tasks).

   SCHED_IDLE: Scheduling very low priority jobs
       (Since Linux 2.6.23.)  SCHED_IDLE can be used only at static priority 0; the process nice  value  has  no
       influence for this policy.

       This policy is intended for running jobs at extremely low priority (lower even than a +19 nice value with
       the SCHED_OTHER or SCHED_BATCH policies).
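
       Selecting one of the normal policies might be sketched as follows.  The function name
       set_normal_policy() is hypothetical, and the fallback #define values are an assumption for headers
       that do not expose SCHED_BATCH and SCHED_IDLE.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <string.h>

#ifndef SCHED_BATCH
#define SCHED_BATCH 3              /* assumed value if headers lack it */
#endif
#ifndef SCHED_IDLE
#define SCHED_IDLE 5               /* assumed value if headers lack it */
#endif

/* Hypothetical helper: put the calling thread under SCHED_OTHER,
 * SCHED_BATCH, or SCHED_IDLE. Returns 0 on success, -1 on error. */
static int set_normal_policy(int policy)
{
    struct sched_param sp;

    memset(&sp, 0, sizeof(sp));    /* sched_priority must be 0 */
    return sched_setscheduler(0, policy, &sp);
}
```

       A background job could call set_normal_policy(SCHED_IDLE) at startup so that it runs only when
       nothing else wants the CPU.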

   Resetting scheduling policy for child processes
       Each  thread  has a reset-on-fork scheduling flag.  When this flag is set, children created by fork(2) do
       not inherit privileged scheduling policies.  The reset-on-fork flag can be set by either:

       *  ORing the SCHED_RESET_ON_FORK flag into the policy argument when calling sched_setscheduler(2)  (since
          Linux 2.6.32); or

       *  specifying the SCHED_FLAG_RESET_ON_FORK flag in attr.sched_flags when calling sched_setattr(2).

       Note  that  the  constants used with these two APIs have different names.  The state of the reset-on-fork
       flag can analogously be retrieved using sched_getscheduler(2) and sched_getattr(2).

       The reset-on-fork feature is intended for media-playback applications, and can be used to prevent
       applications from evading the RLIMIT_RTTIME resource limit (see getrlimit(2)) by creating multiple child
       processes.

       More precisely, if the reset-on-fork flag is set, the following  rules  apply  for  subsequently  created
       children:

       *  If  the  calling  thread  has  a  scheduling  policy of SCHED_FIFO or SCHED_RR, the policy is reset to
          SCHED_OTHER in child processes.

       *  If the calling process has a negative nice value, the nice value is reset to zero in child processes.

       After the reset-on-fork flag has been enabled, it can be reset only if the thread  has  the  CAP_SYS_NICE
       capability.  This flag is disabled in child processes created by fork(2).
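
       Setting and reading back the flag through the sched_setscheduler(2) interface might look like the
       sketch below.  Both function names are hypothetical, and the fallback value for SCHED_RESET_ON_FORK is
       an assumption for headers that do not define it.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <string.h>
#include <sys/types.h>

#ifndef SCHED_RESET_ON_FORK
#define SCHED_RESET_ON_FORK 0x40000000   /* assumed value if headers lack it */
#endif

/* Hypothetical helper: set a policy for the calling thread with the
 * reset-on-fork flag ORed in. Returns 0 on success, -1 on error. */
static int set_policy_reset_on_fork(int policy, int priority)
{
    struct sched_param sp;

    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = priority;
    return sched_setscheduler(0, policy | SCHED_RESET_ON_FORK, &sp);
}

/* Hypothetical helper: return 1 if the flag is set for the thread pid
 * (0 means the caller), 0 if it is clear, or -1 on error. */
static int has_reset_on_fork(pid_t pid)
{
    int policy = sched_getscheduler(pid);

    if (policy == -1)
        return -1;
    return (policy & SCHED_RESET_ON_FORK) != 0;
}
```

       Note that once the flag has been set, an unprivileged thread cannot clear it again, so this call is
       best made deliberately and once.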

   Privileges and resource limits
       In  Linux kernels before 2.6.12, only privileged (CAP_SYS_NICE) threads can set a nonzero static priority
       (i.e., set a real-time scheduling policy).  The only change that an unprivileged thread can  make  is  to
       set  the SCHED_OTHER policy, and this can be done only if the effective user ID of the caller matches the
       real or effective user ID of the target thread (i.e., the thread specified by pid) whose policy is  being
       changed.

       A thread must be privileged (CAP_SYS_NICE) in order to set or modify a SCHED_DEADLINE policy.

       Since Linux 2.6.12, the RLIMIT_RTPRIO resource limit defines a ceiling on an unprivileged thread's static
       priority for the SCHED_RR and SCHED_FIFO policies.  The rules for changing scheduling policy and priority
       are as follows:

       *  If  an  unprivileged  thread has a nonzero RLIMIT_RTPRIO soft limit, then it can change its scheduling
          policy and priority, subject to the restriction that the priority cannot be set to a value higher than
          the maximum of its current priority and its RLIMIT_RTPRIO soft limit.

       *  If the RLIMIT_RTPRIO soft limit is 0, then the only permitted changes are to lower the priority, or to
          switch to a non-real-time policy.

       *  Subject to the same rules, another unprivileged thread can also make these changes,  as  long  as  the
          effective  user ID of the thread making the change matches the real or effective user ID of the target
          thread.

       *  Special rules apply for the SCHED_IDLE policy.  In Linux kernels before 2.6.39, an unprivileged thread
          operating under this policy cannot change its policy, regardless of the  value  of  its  RLIMIT_RTPRIO
          resource  limit.   In  Linux  kernels  since  2.6.39,  an unprivileged thread can switch to either the
          SCHED_BATCH or the SCHED_OTHER policy so long as its nice value falls within the  range  permitted  by
          its RLIMIT_NICE resource limit (see getrlimit(2)).

       Privileged  (CAP_SYS_NICE)  threads  ignore the RLIMIT_RTPRIO limit; as with older kernels, they can make
       arbitrary changes to scheduling policy  and  priority.   See  getrlimit(2)  for  further  information  on
       RLIMIT_RTPRIO.
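
       The ceiling rule for unprivileged threads can be expressed as a small calculation.  This is a sketch
       of the rule as stated above, not kernel code; the names rt_ceiling() and my_rt_ceiling() are
       hypothetical.

```c
#include <sched.h>
#include <sys/resource.h>

/* Hypothetical helper: highest priority an unprivileged thread may
 * request, i.e. the maximum of its current priority and its
 * RLIMIT_RTPRIO soft limit, capped at the policy maximum. With a soft
 * limit of 0 the result equals the current priority, so the priority
 * can only be lowered. */
static int rt_ceiling(int current_priority, rlim_t soft_limit, int policy_max)
{
    int limit = (soft_limit > (rlim_t)policy_max) ? policy_max
                                                  : (int)soft_limit;

    return current_priority > limit ? current_priority : limit;
}

/* Hypothetical helper: compute the ceiling for the calling thread.
 * Returns the ceiling, or -1 if any query fails. */
static int my_rt_ceiling(int policy)
{
    struct rlimit rl;
    struct sched_param sp;
    int max = sched_get_priority_max(policy);

    if (max == -1 || getrlimit(RLIMIT_RTPRIO, &rl) == -1 ||
        sched_getparam(0, &sp) == -1)
        return -1;
    return rt_ceiling(sp.sched_priority, rl.rlim_cur, max);
}
```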

   Limiting the CPU usage of real-time and deadline processes
       A  nonblocking  infinite  loop  in  a  thread scheduled under the SCHED_FIFO, SCHED_RR, or SCHED_DEADLINE
       policy will block all threads with lower priority forever.  Prior  to  Linux  2.6.25,  the  only  way  of
       preventing  a  runaway  real-time  process  from  freezing the system was to run (at the console) a shell
       scheduled under a higher static priority than the tested application.  This allows an emergency  kill  of
       tested real-time applications that do not block or terminate as expected.

       Since Linux 2.6.25, there are other techniques for dealing with runaway real-time and deadline processes.
       One of these is to use the RLIMIT_RTTIME resource limit to set a ceiling on the CPU time that a real-time
       process may consume.  See getrlimit(2) for details.

       Since version 2.6.25, Linux also provides two /proc files that can be used to reserve a certain amount of
       CPU  time to be used by non-real-time processes.  Reserving some CPU time in this fashion allows some CPU
       time to be allocated to (say) a root shell that can be used to kill a runaway  process.   Both  of  these
       files specify time values in microseconds:

       /proc/sys/kernel/sched_rt_period_us
              This  file  specifies  a scheduling period that is equivalent to 100% CPU bandwidth.  The value in
              this file can range from 1 to INT_MAX, giving an operating range of 1  microsecond  to  around  35
              minutes.  The default value in this file is 1,000,000 (1 second).

       /proc/sys/kernel/sched_rt_runtime_us
              The  value  in  this file specifies how much of the "period" time can be used by all real-time and
              deadline scheduled processes on the system.   The  value  in  this  file  can  range  from  -1  to
              INT_MAX-1.   Specifying  -1  makes the runtime the same as the period; that is, no CPU time is set
              aside for non-real-time processes (which was  the  Linux  behavior  before  kernel  2.6.25).   The
              default  value in this file is 950,000 (0.95 seconds), meaning that 5% of the CPU time is reserved
              for processes that don't run under a real-time or deadline scheduling policy.
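
       The relationship between the two files can be checked with a short program fragment.  The helpers
       read_long() and reserved_percent() are hypothetical names; the sketch assumes that each file contains
       a single decimal integer.

```c
#include <stdio.h>

/* Hypothetical helper: read one integer from a file.
 * Returns 0 on success, -1 on failure. */
static int read_long(const char *path, long *out)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Hypothetical helper: percentage of CPU time left for threads that do
 * not run under a real-time or deadline policy. A runtime of -1 means
 * nothing is reserved. Returns -1.0 for a nonpositive period. */
static double reserved_percent(long period_us, long runtime_us)
{
    if (period_us <= 0)
        return -1.0;
    if (runtime_us == -1 || runtime_us >= period_us)
        return 0.0;
    return 100.0 * (double)(period_us - runtime_us) / (double)period_us;
}
```

       With the default values of 1,000,000 and 950,000 the reserved share works out to 5%, matching the
       figure given above.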

   Response time
       A blocked high priority thread waiting for I/O has a certain response time before it is scheduled  again.
       The  device  driver  writer  can  greatly reduce this response time by using a "slow interrupt" interrupt
       handler.

   Miscellaneous
       Child processes inherit the scheduling policy and parameters across a fork(2).  The scheduling policy and
       parameters are preserved across execve(2).

       Memory locking is usually needed for real-time processes to avoid paging delays; this can  be  done  with
       mlock(2) or mlockall(2).
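
       A minimal sketch of locking memory at startup follows.  The function name lock_all_memory() is
       hypothetical; the call can fail if the process lacks the necessary privilege or its locked-memory
       resource limit is too small.

```c
#include <stdio.h>
#include <sys/mman.h>

/* Hypothetical helper: lock all current and future pages of the
 * process into RAM. Returns 0 on success, -1 on error. */
static int lock_all_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1) {
        perror("mlockall");
        return -1;
    }
    return 0;
}
```

       Real-time programs typically make this call once during initialization, before entering their
       time-critical loop.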

NOTES

       Originally, standard Linux was intended as a general-purpose operating system able to handle
       background processes, interactive applications, and less demanding real-time applications (applications
       that usually need to meet timing deadlines).  Although the Linux 2.6 kernel allowed kernel preemption,
       and its newly introduced O(1) scheduler ensured that the time needed to schedule was fixed and
       deterministic irrespective of the number of active tasks, true real-time computing was not possible up to
       kernel version 2.6.17.

   Real-time features in the mainline Linux kernel
       From kernel version  2.6.18  onward,  however,  Linux  is  gradually  becoming  equipped  with  real-time
       capabilities,  most  of  which  are  derived  from  the former realtime-preempt patches developed by Ingo
       Molnar, Thomas Gleixner, Steven Rostedt, and others.  Until the patches have been completely merged  into
       the mainline kernel, they must be installed to achieve the best real-time performance.  These patches are
       named:

           patch-kernelversion-rtpatchversion

       and can be downloaded from http://www.kernel.org/pub/linux/kernel/projects/rt/.

       Without  the patches and prior to their full inclusion into the mainline kernel, the kernel configuration
       offers  only  the   three   preemption   classes   CONFIG_PREEMPT_NONE,   CONFIG_PREEMPT_VOLUNTARY,   and
       CONFIG_PREEMPT_DESKTOP  which respectively provide no, some, and considerable reduction of the worst-case
       scheduling latency.

       With the patches applied or  after  their  full  inclusion  into  the  mainline  kernel,  the  additional
       configuration item CONFIG_PREEMPT_RT becomes available.  If this is selected, Linux is transformed into a
       regular  real-time  operating  system.  The FIFO and RR scheduling policies are then used to run a thread
       with true real-time priority and a minimum worst-case scheduling latency.

SEE ALSO

       chrt(1), taskset(1), getpriority(2), mlock(2), mlockall(2), munlock(2), munlockall(2), nice(2),
       sched_get_priority_max(2), sched_get_priority_min(2), sched_getscheduler(2), sched_getaffinity(2),
       sched_getparam(2), sched_rr_get_interval(2), sched_setaffinity(2), sched_setscheduler(2),
       sched_setparam(2), sched_yield(2), setpriority(2), pthread_getaffinity_np(3), pthread_setaffinity_np(3),
       sched_getcpu(3), capabilities(7), cpuset(7)

       Programming for the real world - POSIX.4 by Bill  O.  Gallmeister,  O'Reilly  &  Associates,  Inc.,  ISBN
       1-56592-074-0.

       The Linux kernel source files Documentation/scheduler/sched-deadline.txt,
       Documentation/scheduler/sched-rt-group.txt, Documentation/scheduler/sched-design-CFS.txt, and
       Documentation/scheduler/sched-nice-design.txt

COLOPHON

       This  page  is  part  of  release  4.04  of  the  Linux man-pages project.  A description of the project,
       information  about  reporting  bugs,  and  the  latest  version  of  this   page,   can   be   found   at
       http://www.kernel.org/doc/man-pages/.

Linux                                              2015-07-23                                           SCHED(7)