Ubuntu Manpage: pid_namespaces - overview of Linux PID namespaces

NAME

       pid_namespaces - overview of Linux PID namespaces

DESCRIPTION

       For an overview of namespaces, see namespaces(7).

       PID  namespaces  isolate  the process ID number space, meaning that processes in different PID namespaces
       can  have  the  same  PID.   PID  namespaces  allow  containers  to   provide   functionality   such   as
       suspending/resuming the set of processes in the container and migrating the container to a new host while
       the processes inside the container maintain the same PIDs.

       PIDs  in  a  new  PID  namespace  start  at  1,  somewhat like a standalone system, and calls to fork(2),
       vfork(2), or clone(2) will produce processes with PIDs that are unique within the namespace.
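
       The numbering can be observed with a minimal sketch such as the following. The helper name
       pid_seen_in_new_namespace() is hypothetical, and CLONE_NEWUSER is added here only as an assumption so
       that the sketch has a chance of running without privileges; it is not required by PID namespaces
       themselves.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

/* Entry point of the first process in the new PID namespace. */
static int report_own_pid(void *arg)
{
    (void)arg;
    /* Raw system call, so that no value cached by the C library is used. */
    return (int)syscall(SYS_getpid);    /* becomes the exit status */
}

/* Hypothetical helper: returns the PID that the child observed for
   itself, or -1 if the namespaces could not be created. */
int pid_seen_in_new_namespace(void)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downward, so pass the top of the allocation. */
    pid_t child = clone(report_own_pid, stack + STACK_SIZE,
                        CLONE_NEWUSER | CLONE_NEWPID | SIGCHLD, NULL);
    if (child == -1) {
        free(stack);
        return -1;
    }

    int status = 0;
    pid_t waited = waitpid(child, &status, 0);
    free(stack);
    if (waited == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

       Where the kernel permits the namespaces to be created, the helper is expected to return 1; a return
       value of -1 only means that the attempt was refused (for example, with EPERM).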

       Use of PID namespaces requires a kernel that is configured with the CONFIG_PID_NS option.

   The namespace init process
       The first process created in a  new  namespace  (i.e.,  the  process  created  using  clone(2)  with  the
       CLONE_NEWPID  flag,  or  the  first  child  created  by  a  process  after a call to unshare(2) using the
       CLONE_NEWPID flag) has the PID 1, and is the "init" process for the namespace  (see  init(1)).   A  child
       process  that  is  orphaned  within  the namespace will be reparented to this process rather than init(1)
       (unless  one  of  the  ancestors  of  the  child  in  the  same  PID  namespace  employed  the   prctl(2)
       PR_SET_CHILD_SUBREAPER command to mark itself as the reaper of orphaned descendant processes).

       If  the  "init"  process of a PID namespace terminates, the kernel terminates all of the processes in the
       namespace via a SIGKILL signal.  This behavior reflects the fact that the "init" process is essential for
       the correct operation of a PID namespace.  In this case, a subsequent fork(2)  into  this  PID  namespace
       will  fail  with  the error ENOMEM; it is not possible to create a new process in a PID namespace whose
       "init" process has terminated.  Such scenarios can occur when, for example, a process uses an  open  file
       descriptor  for  a /proc/[pid]/ns/pid file corresponding to a process that was in a namespace to setns(2)
       into that namespace after the "init" process has terminated.  Another possible scenario can occur after a
       call to unshare(2): if the first child subsequently created by  a  fork(2)  terminates,  then  subsequent
       calls to fork(2) will fail with ENOMEM.
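
       The unshare(2) scenario can be reproduced with a sketch along the following lines. The helper name
       fork_fails_after_init_exit() and its return convention are hypothetical, and CLONE_NEWUSER is an
       assumption made so that the sketch may run unprivileged. The work is done in a forked helper process
       so that the caller's own namespace settings are left untouched.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper.  Returns 1 if the second fork() failed with ENOMEM,
   0 if it behaved differently, and -1 if the setup itself failed. */
int fork_fails_after_init_exit(void)
{
    pid_t helper = fork();
    if (helper == -1)
        return -1;

    if (helper == 0) {
        /* Children created from now on go into a new PID namespace. */
        if (unshare(CLONE_NEWUSER | CLONE_NEWPID) == -1)
            _exit(2);

        pid_t init = fork();            /* PID 1 of the new namespace */
        if (init == -1)
            _exit(2);
        if (init == 0)
            _exit(0);                   /* "init" terminates at once */
        if (waitpid(init, NULL, 0) == -1)
            _exit(2);

        pid_t second = fork();          /* the namespace has lost its "init" */
        if (second == 0)
            _exit(0);
        if (second == -1 && errno == ENOMEM)
            _exit(1);
        _exit(0);
    }

    int status = 0;
    if (waitpid(helper, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code == 2 ? -1 : code;
}
```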

       Only  signals  for  which  the  "init" process has established a signal handler can be sent to the "init"
       process by other members of the PID namespace.  This restriction applies even  to  privileged  processes,
       and prevents other members of the PID namespace from accidentally killing the "init" process.

       Likewise,  a  process  in  an  ancestor namespace can—subject to the usual permission checks described in
       kill(2)—send signals to the "init" process of a child PID  namespace  only  if  the  "init"  process  has
       established  a  handler  for  that  signal.  (Within the handler, the siginfo_t si_pid field described in
       sigaction(2) will be zero.)  SIGKILL and SIGSTOP are treated exceptionally: these  signals  are  forcibly
       delivered when sent from an ancestor PID namespace.  Neither of these signals can be caught by the "init"
       process, and so will result in the usual actions associated with those signals (respectively, terminating
       and stopping the process).
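
       The protection of an "init" process against signals from members of its own namespace can be
       illustrated roughly as follows. The names init_without_handler() and init_survives_sigterm() are
       hypothetical, the choice of SIGTERM and of the exit status 42 is arbitrary, and CLONE_NEWUSER is an
       assumption made so that the sketch may run unprivileged.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

/* Runs as "init" (PID 1) of the new namespace with default signal
   dispositions, that is, without a SIGTERM handler. */
static int init_without_handler(void *arg)
{
    (void)arg;
    pid_t member = fork();
    if (member == -1)
        return 2;
    if (member == 0) {
        kill(1, SIGTERM);       /* PID 1 here is the namespace "init" */
        _exit(0);
    }
    if (waitpid(member, NULL, 0) == -1)
        return 2;
    return 42;                  /* reached only if SIGTERM had no effect */
}

/* Hypothetical helper.  Returns 1 if "init" survived the signal, 0 if it
   was killed by it, and -1 if the namespaces could not be set up. */
int init_survives_sigterm(void)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    pid_t init = clone(init_without_handler, stack + STACK_SIZE,
                       CLONE_NEWUSER | CLONE_NEWPID | SIGCHLD, NULL);
    if (init == -1) {
        free(stack);
        return -1;
    }

    int status = 0;
    pid_t waited = waitpid(init, &status, 0);
    free(stack);
    if (waited == -1)
        return -1;
    if (WIFSIGNALED(status))
        return 0;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 42) ? 1 : -1;
}
```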

       Starting  with  Linux  3.4,  the reboot(2) system call causes a signal to be sent to the namespace "init"
       process.  See reboot(2) for more details.

   Nesting PID namespaces
       PID namespaces can be nested: each PID namespace has a  parent,  except  for  the  initial  ("root")  PID
       namespace.   The parent of a PID namespace is the PID namespace of the process that created the namespace
       using clone(2) or unshare(2).  PID namespaces thus form a tree, with all  namespaces  ultimately  tracing
       their ancestry to the root namespace.

       A  process  is  visible  to  other  processes  in  its PID namespace, and to the processes in each direct
       ancestor PID namespace going back to the root PID namespace.  In this context, "visible" means  that  one
       process  can be the target of operations by another process using system calls that specify a process ID.
       Conversely, the processes in a child PID namespace can't see processes in the parent and further  removed
       ancestor  namespaces.   More  succinctly:  a  process  can see (e.g., send signals with kill(2), set nice
       values with setpriority(2), etc.) only processes contained in its own PID namespace and in descendants of
       that namespace.

       A process has one process ID in each of the layers of the PID namespace hierarchy in which it is visible,
       walking back through each direct ancestor namespace to the root PID namespace.  System calls
       that operate on process IDs always operate using the process ID that is visible in the PID  namespace  of
       the  caller.   A  call  to  getpid(2)  always  returns the PID associated with the namespace in which the
       process was created.

       Some processes in a PID namespace may have parents that are outside of the namespace.  For  example,  the
       parent  of  the initial process in the namespace (i.e., the init(1) process with PID 1) is necessarily in
       another namespace.  Likewise, the direct children of a process that uses setns(2) to cause  its  children
       to  join  a  PID  namespace  are  in  a  different  PID  namespace from the caller of setns(2).  Calls to
       getppid(2) for such processes return 0.
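
       The getppid(2) behavior can be checked with a short sketch such as this one. The names
       check_parent_pid() and init_parent_reported_as_zero() and the exit status values 10 and 11 are
       hypothetical, and CLONE_NEWUSER is again an assumption made so that no privileges are needed.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

/* Runs as PID 1 of the new namespace; its parent lives outside of it. */
static int check_parent_pid(void *arg)
{
    (void)arg;
    return getppid() == 0 ? 10 : 11;
}

/* Hypothetical helper.  Returns 1 if the child saw a parent PID of 0,
   0 if it saw another value, and -1 if the setup failed. */
int init_parent_reported_as_zero(void)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    pid_t child = clone(check_parent_pid, stack + STACK_SIZE,
                        CLONE_NEWUSER | CLONE_NEWPID | SIGCHLD, NULL);
    if (child == -1) {
        free(stack);
        return -1;
    }

    int status = 0;
    pid_t waited = waitpid(child, &status, 0);
    free(stack);
    if (waited == -1 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 10)
        return 1;
    return WEXITSTATUS(status) == 11 ? 0 : -1;
}
```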

       While processes may freely descend into child PID namespaces (e.g., using  setns(2)  with  CLONE_NEWPID),
       they  may  not  move  in  the  other  direction.   That  is  to say, processes may not enter any ancestor
       namespaces (parent, grandparent, etc.).  Changing PID namespaces is a one-way operation.

   setns(2) and unshare(2) semantics
       Calls to setns(2) that specify a  PID  namespace  file  descriptor  and  calls  to  unshare(2)  with  the
       CLONE_NEWPID  flag  cause  children  subsequently  created  by the caller to be placed in a different PID
       namespace from the caller.  These calls do not, however, change the PID namespace of the calling process,
       because doing so would change the caller's idea of its own PID (as reported  by  getpid()),  which  would
       break many applications and libraries.
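
       A sketch of these semantics for unshare(2) is given below. The helper name
       unshare_keeps_caller_pid() and its return convention are hypothetical, and CLONE_NEWUSER is an
       assumption made so that the sketch may run unprivileged. The PID is read with a raw system call so
       that the comparison cannot be satisfied by a value cached in the C library.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper.  Returns 1 if the caller of unshare() kept its PID
   while its first child became PID 1, 0 if not, and -1 on setup failure. */
int unshare_keeps_caller_pid(void)
{
    pid_t helper = fork();
    if (helper == -1)
        return -1;

    if (helper == 0) {
        long before = syscall(SYS_getpid);
        if (unshare(CLONE_NEWUSER | CLONE_NEWPID) == -1)
            _exit(2);
        long after = syscall(SYS_getpid);   /* caller's namespace is unchanged */

        pid_t child = fork();               /* placed in the new namespace */
        if (child == -1)
            _exit(2);
        if (child == 0)
            _exit(syscall(SYS_getpid) == 1 ? 0 : 1);

        int status = 0;
        if (waitpid(child, &status, 0) == -1 || !WIFEXITED(status))
            _exit(2);
        int child_is_init = (WEXITSTATUS(status) == 0);
        _exit((before == after && child_is_init) ? 0 : 1);
    }

    int status = 0;
    if (waitpid(helper, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 0)
        return 1;
    return WEXITSTATUS(status) == 1 ? 0 : -1;
}
```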

       To put things another way: a process's PID namespace membership is determined when the process is created
       and  cannot be changed thereafter.  Among other things, this means that the parental relationship between
       processes mirrors the parental relationship between PID namespaces: the parent of a process is either  in
       the same namespace or resides in the immediate parent PID namespace.

   Compatibility of CLONE_NEWPID with other CLONE_* flags
       CLONE_NEWPID can't be combined with some other CLONE_* flags:

       *  CLONE_THREAD  requires being in the same PID namespace in order that the threads in a process can send
          signals to each other.  Similarly, it must be possible to see all of the threads of a process in the
          proc(5) filesystem.

       *  CLONE_SIGHAND requires being in the same PID namespace;  otherwise  the  process  ID  of  the  process
          sending  a  signal could not be meaningfully encoded when a signal is sent (see the description of the
          siginfo_t type in sigaction(2)).  A signal queue shared by processes in multiple PID  namespaces  will
          defeat that.

       *  CLONE_VM  requires all of the threads to be in the same PID namespace, because, from the point of view
          of a core dump, if two processes share the same address space then they are threads and will  be  core
          dumped  together.   When a core dump is written, the PID of each thread is written into the core dump.
          Writing the process IDs could not meaningfully succeed if some of the process IDs were in a parent PID
          namespace.

       To summarize: there is a technical requirement for each of CLONE_THREAD, CLONE_SIGHAND, and  CLONE_VM  to
       share  a  PID  namespace.   (Note  furthermore  that  clone(2)  requires  CLONE_VM  to  be  specified  if
       CLONE_THREAD or CLONE_SIGHAND is specified.)  Thus, call sequences such as the following will fail  (with
       the error EINVAL):

           unshare(CLONE_NEWPID);
           clone(..., CLONE_VM, ...);    /* Fails */

           setns(fd, CLONE_NEWPID);
           clone(..., CLONE_VM, ...);    /* Fails */

           clone(..., CLONE_VM, ...);
           setns(fd, CLONE_NEWPID);      /* Fails */

           clone(..., CLONE_VM, ...);
           unshare(CLONE_NEWPID);        /* Fails */

   /proc and PID namespaces
       A  /proc  filesystem  shows (in the /proc/PID directories) only processes visible in the PID namespace of
       the process that performed the mount, even if the /proc filesystem is  viewed  from  processes  in  other
       namespaces.

       After  creating  a new PID namespace, it is useful for the child to change its root directory and mount a
       new procfs instance at /proc so that tools such as ps(1) work correctly.  If a  new  mount  namespace  is
       simultaneously  created by including CLONE_NEWNS in the flags argument of clone(2) or unshare(2), then it
       isn't necessary to change the root directory: a new procfs instance can be mounted directly over /proc.

       From a shell, the command to mount /proc is:

           $ mount -t proc proc /proc

       Calling readlink(2) on the path /proc/self yields the process ID of the caller in the  PID  namespace  of
       the  procfs  mount  (i.e., the PID namespace of the process that mounted the procfs).  This can be useful
       for introspection purposes, when a process wants to discover its PID in other namespaces.
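
       The lookup can be sketched as follows. The helper name pid_from_proc_self() is hypothetical, and the
       sketch assumes that a procfs instance is mounted at /proc.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: returns the caller's PID as seen through the procfs
   instance mounted at /proc, or -1 if the link cannot be read or parsed. */
long pid_from_proc_self(void)
{
    char target[32];
    ssize_t len = readlink("/proc/self", target, sizeof(target) - 1);
    if (len == -1)
        return -1;
    target[len] = '\0';     /* readlink() does not terminate the string */

    char *end = NULL;
    long pid = strtol(target, &end, 10);
    if (end == target || *end != '\0')
        return -1;
    return pid;
}
```

       If the procfs instance was mounted by a process in the caller's own PID namespace, the result matches
       getpid(2); otherwise it is the caller's PID in the namespace of the mounting process.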

   Miscellaneous
       When a process ID is passed over a UNIX domain socket to a process in a different PID namespace (see  the
       description  of  SCM_CREDENTIALS  in  unix(7)),  it is translated into the corresponding PID value in the
       receiving process's PID namespace.
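
       The receiving side of such a transfer might look like the sketch below. The helper name
       sender_pid_from_credentials() is hypothetical. To stay self-contained, the sketch sends and receives
       within a single process over a socketpair(2), so both ends share one PID namespace and no translation
       is visible; with a sender in another PID namespace, the pid field would hold the translated value.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sends one byte over a socket pair and returns the
   sender PID that the kernel reports to the receiver, or -1 on error. */
long sender_pid_from_credentials(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
        return -1;

    long result = -1;
    int on = 1;
    char data = 0;
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    union {                         /* union ensures cmsghdr alignment */
        char buf[CMSG_SPACE(sizeof(struct ucred))];
        struct cmsghdr align;
    } control;
    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = control.buf,
        .msg_controllen = sizeof(control.buf),
    };

    /* SO_PASSCRED asks the kernel to attach the sender's credentials. */
    if (setsockopt(sv[1], SOL_SOCKET, SO_PASSCRED, &on, sizeof(on)) == 0 &&
        write(sv[0], "x", 1) == 1 &&
        recvmsg(sv[1], &msg, 0) != -1) {
        for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c != NULL;
             c = CMSG_NXTHDR(&msg, c)) {
            if (c->cmsg_level == SOL_SOCKET &&
                c->cmsg_type == SCM_CREDENTIALS) {
                struct ucred cred;
                memcpy(&cred, CMSG_DATA(c), sizeof(cred));
                result = (long)cred.pid;    /* in the receiver's namespace */
            }
        }
    }

    close(sv[0]);
    close(sv[1]);
    return result;
}
```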

CONFORMING TO

       Namespaces are a Linux-specific feature.

EXAMPLE

       See user_namespaces(7).

COLOPHON

       This  page  is  part  of  release  4.04  of  the  Linux man-pages project.  A description of the project,
       information  about  reporting  bugs,  and  the  latest  version  of  this   page,   can   be   found   at
       http://www.kernel.org/doc/man-pages/.

Linux                                              2015-01-10                                  PID_NAMESPACES(7)
