Ubuntu Manpage: pid_namespaces - overview of Linux PID namespaces

NAME

       pid_namespaces - overview of Linux PID namespaces

DESCRIPTION

       For an overview of namespaces, see namespaces(7).

       PID  namespaces  isolate  the process ID number space, meaning that processes in different PID namespaces
       can  have  the  same  PID.   PID  namespaces  allow  containers  to   provide   functionality   such   as
       suspending/resuming the set of processes in the container and migrating the container to a new host while
       the processes inside the container maintain the same PIDs.

       PIDs  in  a  new  PID  namespace  start  at  1,  somewhat like a standalone system, and calls to fork(2),
       vfork(2), or clone(2) will produce processes with PIDs that are unique within the namespace.

       Use of PID namespaces requires a kernel that is configured with the CONFIG_PID_NS option.
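
       The numbering described above can be observed with a short program.  The following is a minimal sketch
       rather than a reference implementation; the names child_fn and child_sees_pid_one are hypothetical, and
       the call is assumed to fail (typically with EPERM) when the caller lacks CAP_SYS_ADMIN.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

/* Runs as the first process in the new PID namespace. */
static int child_fn(void *arg)
{
    (void)arg;
    /* Exit status 0 reports "getpid() returned 1" to the parent. */
    return getpid() == 1 ? 0 : 1;
}

/*
 * Returns 1 if the child saw itself as PID 1, 0 if it saw another PID,
 * and -1 if the namespace could not be created (errno is set).
 */
int child_sees_pid_one(void)
{
    int status, saved_errno;
    pid_t pid;
    char *stack = malloc(STACK_SIZE);

    if (stack == NULL)
        return -1;

    /* Stacks grow downward on most architectures, so pass the top. */
    pid = clone(child_fn, stack + STACK_SIZE, CLONE_NEWPID | SIGCHLD, NULL);
    saved_errno = errno;

    /* Without CLONE_VM the child has its own copy of the stack. */
    free(stack);

    if (pid == -1) {
        errno = saved_errno;
        return -1;
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

       In the parent, the value returned by clone(2) is the child's PID in the parent's namespace, which in
       general differs from the value 1 that the child itself obtains from getpid(2).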

   The namespace init process
       The first process created in a  new  namespace  (i.e.,  the  process  created  using  clone(2)  with  the
       CLONE_NEWPID  flag,  or  the  first  child  created  by  a  process  after a call to unshare(2) using the
       CLONE_NEWPID flag) has the PID 1, and is the "init" process for the namespace  (see  init(1)).   A  child
       process  that  is  orphaned  within  the namespace will be reparented to this process rather than init(1)
       (unless  one  of  the  ancestors  of  the  child  in  the  same  PID  namespace  employed  the   prctl(2)
       PR_SET_CHILD_SUBREAPER command to mark itself as the reaper of orphaned descendant processes).

       If  the  "init"  process of a PID namespace terminates, the kernel terminates all of the processes in the
       namespace via a SIGKILL signal.  This behavior reflects the fact that the "init" process is essential for
       the correct operation of a PID namespace.  In this case, a subsequent fork(2) into this PID namespace
       fails with the error ENOMEM; it is not possible to create a new process in a PID namespace whose "init"
       process has terminated.  Such scenarios can occur  when,  for  example,  a  process  uses  an  open  file
       descriptor  for  a /proc/[pid]/ns/pid file corresponding to a process that was in a namespace to setns(2)
       into that namespace after the "init" process has terminated.  Another possible scenario can occur after a
       call to unshare(2): if the first child subsequently created by  a  fork(2)  terminates,  then  subsequent
       calls to fork(2) fail with ENOMEM.
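
       The unshare(2) scenario just described can be reproduced as follows.  This is a hedged sketch: the
       names helper and fork_fails_after_init_exit are hypothetical, and the work is done in a throwaway
       helper process on the assumption that the caller does not want its own future children affected.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Body of the helper process.  Exit status: 1 = second fork failed
 * with ENOMEM, 0 = any other outcome, 2 = setup failed.
 */
static void helper(void)
{
    pid_t init, second;

    if (unshare(CLONE_NEWPID) == -1)
        _exit(2);               /* typically EPERM without CAP_SYS_ADMIN */

    init = fork();              /* becomes PID 1 of the new namespace */
    if (init == -1)
        _exit(2);
    if (init == 0)
        _exit(0);               /* the namespace "init" terminates at once */
    if (waitpid(init, NULL, 0) == -1)
        _exit(2);

    second = fork();            /* the namespace no longer has an "init" */
    if (second == 0)
        _exit(0);
    if (second > 0) {
        waitpid(second, NULL, 0);
        _exit(0);               /* fork unexpectedly succeeded */
    }
    _exit(errno == ENOMEM ? 1 : 0);
}

/* Returns 1 if the second fork failed with ENOMEM, 0 if not, -1 on setup failure. */
int fork_fails_after_init_exit(void)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        helper();               /* never returns */
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status) ||
        WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status);
}
```

       A return value of -1 usually means only that the caller was not permitted to create a PID namespace.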

       Only  signals  for  which  the  "init" process has established a signal handler can be sent to the "init"
       process by other members of the PID namespace.  This restriction applies even  to  privileged  processes,
       and prevents other members of the PID namespace from accidentally killing the "init" process.

       Likewise,  a  process  in  an  ancestor namespace can—subject to the usual permission checks described in
       kill(2)—send signals to the "init" process of a child PID  namespace  only  if  the  "init"  process  has
       established  a  handler  for  that  signal.  (Within the handler, the siginfo_t si_pid field described in
       sigaction(2) will be zero.)  SIGKILL and SIGSTOP are treated exceptionally: these signals are forcibly
       delivered when sent from an ancestor PID namespace.  Neither of these signals can be caught by the "init"
       process, and so will result in the usual actions associated with those signals (respectively, terminating
       and stopping the process).

       Starting  with  Linux  3.4,  the reboot(2) system call causes a signal to be sent to the namespace "init"
       process.  See reboot(2) for more details.

   Nesting PID namespaces
       PID namespaces can be nested: each PID namespace has a  parent,  except  for  the  initial  ("root")  PID
       namespace.   The parent of a PID namespace is the PID namespace of the process that created the namespace
       using clone(2) or unshare(2).  PID namespaces thus form a tree, with all  namespaces  ultimately  tracing
       their  ancestry  to the root namespace.  Since Linux 3.7, the kernel limits the maximum nesting depth for
       PID namespaces to 32.

       A process is visible to other processes in its PID  namespace,  and  to  the  processes  in  each  direct
       ancestor  PID  namespace going back to the root PID namespace.  In this context, "visible" means that one
       process can be the target of operations by another process using system calls that specify a process  ID.
       Conversely,  the processes in a child PID namespace can't see processes in the parent and further removed
       ancestor namespaces.  More succinctly: a process can see (e.g.,  send  signals  with  kill(2),  set  nice
       values with setpriority(2), etc.) only processes contained in its own PID namespace and in descendants of
       that namespace.

       A process has one process ID in each of the layers of the PID namespace hierarchy in which it is visible,
       walking back through each direct ancestor namespace to the root PID namespace.  System calls
       that  operate  on process IDs always operate using the process ID that is visible in the PID namespace of
       the caller.  A call to getpid(2) always returns the PID  associated  with  the  namespace  in  which  the
       process was created.

       Some  processes  in a PID namespace may have parents that are outside of the namespace.  For example, the
       parent of the initial process in the namespace (i.e., the init(1) process with PID 1) is  necessarily  in
       another  namespace.   Likewise, the direct children of a process that uses setns(2) to cause its children
       to join a PID namespace are in a  different  PID  namespace  from  the  caller  of  setns(2).   Calls  to
       getppid(2) for such processes return 0.

       While  processes  may freely descend into child PID namespaces (e.g., using setns(2) with a PID namespace
       file descriptor), they may not move in the other direction.  That is to say, processes may not enter  any
       ancestor namespaces (parent, grandparent, etc.).  Changing PID namespaces is a one-way operation.

       The  NS_GET_PARENT  ioctl(2)  operation  can  be  used  to discover the parental relationship between PID
       namespaces; see ioctl_ns(2).

   setns(2) and unshare(2) semantics
       Calls to setns(2) that specify a  PID  namespace  file  descriptor  and  calls  to  unshare(2)  with  the
       CLONE_NEWPID  flag  cause  children  subsequently  created  by the caller to be placed in a different PID
       namespace  from  the  caller.    (Since   Linux   4.12,   that   PID   namespace   is   shown   via   the
       /proc/[pid]/ns/pid_for_children  file,  as  described  in  namespaces(7).)   These calls do not, however,
       change the PID namespace of the calling process, because doing so would change the caller's idea  of  its
       own PID (as reported by getpid()), which would break many applications and libraries.
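
       One way to check which PID namespace future children will be placed in is to compare the two symbolic
       links under /proc/self/ns.  The following is a sketch only; read_ns_link and
       children_share_pid_namespace are hypothetical names, and a kernel that provides pid_for_children
       (Linux 4.12 or later) is assumed.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads a namespace link such as "pid:[4026531836]" into buf. */
static int read_ns_link(const char *path, char *buf, size_t size)
{
    ssize_t n = readlink(path, buf, size - 1);

    if (n == -1)
        return -1;
    buf[n] = '\0';
    return 0;
}

/*
 * Returns 1 if children of the caller will be created in the caller's
 * own PID namespace, 0 if they will be placed in a different one, and
 * -1 if either link could not be read.
 */
int children_share_pid_namespace(void)
{
    char own[64], kids[64];

    if (read_ns_link("/proc/self/ns/pid", own, sizeof(own)) == -1 ||
        read_ns_link("/proc/self/ns/pid_for_children", kids, sizeof(kids)) == -1)
        return -1;
    return strcmp(own, kids) == 0;
}
```

       In a process that has called neither setns(2) nor unshare(2) for a PID namespace, the two links are
       expected to be identical.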

       To put things another way: a process's PID namespace membership is determined when the process is created
       and  cannot be changed thereafter.  Among other things, this means that the parental relationship between
       processes mirrors the parental relationship between PID namespaces: the parent of a process is either  in
       the same namespace or resides in the immediate parent PID namespace.

   Compatibility of CLONE_NEWPID with other CLONE_* flags
       In  current versions of Linux, CLONE_NEWPID can't be combined with CLONE_THREAD.  Threads are required to
       be in the same PID namespace such that the  threads  in  a  process  can  send  signals  to  each  other.
       Similarly, it must be possible to see all of the threads of a process in the proc(5) filesystem.
       Additionally, if two threads were in different PID namespaces, the process ID of the process sending a
       signal could not be meaningfully encoded when a signal is sent (see the description of the siginfo_t type
       in sigaction(2)).  Since the sender's process ID is computed when a signal is enqueued, a signal queue
       shared by processes in multiple PID namespaces would make that computation impossible.

       In earlier versions of Linux, CLONE_NEWPID was additionally disallowed (failing with the error EINVAL) in
       combination with CLONE_SIGHAND (before Linux 4.3) as well as CLONE_VM (before Linux 3.12).   The  changes
       that lifted these restrictions have also been ported to earlier stable kernels.
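
       The restriction on CLONE_THREAD can be demonstrated with the sketch below.  The names never_runs and
       errno_for_newpid_thread are hypothetical.  clone(2) documents EINVAL for this flag combination; it is
       an assumption of this example that a sandbox or seccomp filter may instead reject the call earlier
       with EPERM.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdlib.h>

#define STACK_SIZE (64 * 1024)

/* Thread function that is not expected to run, because clone() fails. */
static int never_runs(void *arg)
{
    (void)arg;
    return 0;
}

/*
 * Returns the errno produced by combining CLONE_NEWPID with
 * CLONE_THREAD, 0 if the call unexpectedly succeeded, or -1 if the
 * stack could not be allocated.
 */
int errno_for_newpid_thread(void)
{
    /* CLONE_THREAD requires CLONE_SIGHAND, which in turn requires CLONE_VM. */
    int flags = CLONE_NEWPID | CLONE_THREAD | CLONE_SIGHAND | CLONE_VM;
    int rc, err;
    char *stack = malloc(STACK_SIZE);

    if (stack == NULL)
        return -1;

    rc = clone(never_runs, stack + STACK_SIZE, flags, NULL);
    if (rc != -1)
        return 0;   /* unexpected: the new thread owns the stack, so keep it */

    err = errno;
    free(stack);
    return err;
}
```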

   /proc and PID namespaces
       A  /proc filesystem shows (in the /proc/[pid] directories) only processes visible in the PID namespace of
       the process that performed the mount, even if the /proc filesystem is  viewed  from  processes  in  other
       namespaces.

       After  creating  a new PID namespace, it is useful for the child to change its root directory and mount a
       new procfs instance at /proc so that tools such as ps(1) work correctly.  If a  new  mount  namespace  is
       simultaneously  created by including CLONE_NEWNS in the flags argument of clone(2) or unshare(2), then it
       isn't necessary to change the root directory: a new procfs instance can be mounted directly over /proc.

       From a shell, the command to mount /proc is:

           $ mount -t proc proc /proc
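
       The same steps can be carried out from C.  The following is a minimal sketch under stated assumptions:
       the names child_fn and fresh_procfs_shows_pid_one are hypothetical, the caller is assumed to have
       CAP_SYS_ADMIN, and the root directory is assumed to be a mount point.  The child is created in new PID
       and mount namespaces, mounts a new procfs instance over /proc, and then reads /proc/self.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

/* Exit status: 0 = /proc/self is "1", 1 = another value, 2 = setup failed. */
static int child_fn(void *arg)
{
    char buf[32];
    ssize_t n;

    (void)arg;
    /* Stop the new mount from propagating to the parent's mount namespace. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1)
        return 2;
    /* Same effect as the shell command: mount -t proc proc /proc */
    if (mount("proc", "/proc", "proc", 0, NULL) == -1)
        return 2;

    n = readlink("/proc/self", buf, sizeof(buf) - 1);
    if (n == -1)
        return 2;
    buf[n] = '\0';
    return strcmp(buf, "1") == 0 ? 0 : 1;
}

/* Returns 1 if the new procfs showed the child as PID 1, 0 if not, -1 on failure. */
int fresh_procfs_shows_pid_one(void)
{
    int status;
    pid_t pid;
    char *stack = malloc(STACK_SIZE);

    if (stack == NULL)
        return -1;
    pid = clone(child_fn, stack + STACK_SIZE,
                CLONE_NEWPID | CLONE_NEWNS | SIGCHLD, NULL);
    free(stack);    /* without CLONE_VM the child has its own copy */
    if (pid == -1 || waitpid(pid, &status, 0) == -1)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

       Because the mount is made inside a new mount namespace, the /proc seen by the parent is left untouched.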

       Calling readlink(2) on the path /proc/self yields the process ID of the caller in the  PID  namespace  of
       the  procfs  mount  (i.e., the PID namespace of the process that mounted the procfs).  This can be useful
       for introspection purposes, when a process wants to discover its PID in other namespaces.
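
       A small helper along the following lines performs this lookup.  It is an illustrative sketch, and the
       name pid_seen_by_procfs is hypothetical.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns the caller's PID as seen by the procfs instance mounted at
 * /proc, or -1 if /proc/self could not be read.
 */
long pid_seen_by_procfs(void)
{
    char buf[32];
    ssize_t n = readlink("/proc/self", buf, sizeof(buf) - 1);

    if (n == -1)
        return -1;
    buf[n] = '\0';              /* readlink() does not add a terminator */
    return strtol(buf, NULL, 10);
}
```

       When /proc was mounted from the caller's own PID namespace, the result matches getpid(2); a different
       value indicates that the procfs instance belongs to another PID namespace.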

   /proc files
       /proc/sys/kernel/ns_last_pid (since Linux 3.3)
              This file displays the last PID that was allocated in this PID namespace.  When the  next  PID  is
              allocated,  the kernel will search for the lowest unallocated PID that is greater than this value,
              and when this file is subsequently read it will show that PID.

              This file is writable by  a  process  that  has  the  CAP_SYS_ADMIN  capability  inside  its  user
              namespace.  This makes it possible to determine the PID that is allocated to the next process that
              is created inside this PID namespace.
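
       The file can be read and written with ordinary stdio calls, as in the sketch below.  The function
       names read_ns_last_pid and request_next_pid are hypothetical.  Writing the value wanted - 1 is only a
       request: it is assumed that the wanted PID is unallocated and that no other process is created in the
       namespace between the write and the caller's own fork(2).

```c
#include <stdio.h>

#define NS_LAST_PID_PATH "/proc/sys/kernel/ns_last_pid"

/* Returns the last PID allocated in the caller's PID namespace, or -1. */
long read_ns_last_pid(void)
{
    long pid = -1;
    FILE *f = fopen(NS_LAST_PID_PATH, "r");

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &pid) != 1)
        pid = -1;
    fclose(f);
    return pid;
}

/*
 * Asks the kernel to start its search for the next PID just after
 * wanted - 1.  Returns 0 on success, -1 on failure (for example,
 * when the caller lacks CAP_SYS_ADMIN).
 */
int request_next_pid(long wanted)
{
    int ok;
    FILE *f = fopen(NS_LAST_PID_PATH, "w");

    if (f == NULL)
        return -1;
    ok = fprintf(f, "%ld", wanted - 1) > 0;
    if (fclose(f) != 0)         /* write errors may surface only at flush time */
        ok = 0;
    return ok ? 0 : -1;
}
```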

   Miscellaneous
       When  a process ID is passed over a UNIX domain socket to a process in a different PID namespace (see the
       description of SCM_CREDENTIALS in unix(7)), it is translated into the  corresponding  PID  value  in  the
       receiving process's PID namespace.
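
       The mechanics of receiving a PID in this way are sketched below.  The function name
       pid_seen_by_receiver is hypothetical, and for simplicity both ends of the socket are held by the same
       process, so no translation is visible here; with the receiving end in a different PID namespace, the
       pid field would hold the sender's PID as seen from the receiver's namespace.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Sends one byte over a UNIX domain socket pair and returns the sender
 * PID reported to the receiver via SCM_CREDENTIALS, or -1 on failure.
 */
pid_t pid_seen_by_receiver(void)
{
    int sv[2];
    int on = 1;
    pid_t result = -1;
    char byte = 'x';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(struct ucred))];
        struct cmsghdr align;   /* forces suitable alignment of buf */
    } ctl;
    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = ctl.buf,
        .msg_controllen = sizeof(ctl.buf),
    };

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
        return -1;

    /* The receiver must enable SO_PASSCRED before the message is sent. */
    if (setsockopt(sv[1], SOL_SOCKET, SO_PASSCRED, &on, sizeof(on)) == -1)
        goto out;
    if (send(sv[0], &byte, 1, 0) != 1)
        goto out;
    if (recvmsg(sv[1], &msg, 0) != 1)
        goto out;

    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c != NULL;
         c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_CREDENTIALS) {
            struct ucred cred;

            memcpy(&cred, CMSG_DATA(c), sizeof(cred));
            result = cred.pid;
        }
    }
out:
    close(sv[0]);
    close(sv[1]);
    return result;
}
```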

CONFORMING TO

       Namespaces are a Linux-specific feature.

EXAMPLE

       See user_namespaces(7).

COLOPHON

       This  page  is  part  of  release  4.15  of  the  Linux man-pages project.  A description of the project,
       information  about  reporting  bugs,  and  the  latest  version  of  this   page,   can   be   found   at
       https://www.kernel.org/doc/man-pages/.

Linux                                              2017-11-26                                  PID_NAMESPACES(7)
