Provided by: manpages-dev_4.15-1_all

NAME

       clone, __clone2 - create a child process

SYNOPSIS

       /* Prototype for the glibc wrapper function */

       #define _GNU_SOURCE
       #include <sched.h>

       int clone(int (*fn)(void *), void *child_stack,
                 int flags, void *arg, ...
                 /* pid_t *ptid, void *newtls, pid_t *ctid */ );

       /* For the prototype of the raw system call, see NOTES */

DESCRIPTION

       clone() creates a new process, in a manner similar to fork(2).

       This page describes both the glibc clone() wrapper function and the underlying system call
       on which it is based.  The main text describes the wrapper function; the  differences  for
       the raw system call are described toward the end of this page.

       Unlike  fork(2),  clone() allows the child process to share parts of its execution context
       with the  calling  process,  such  as  the  virtual  address  space,  the  table  of  file
       descriptors,  and  the table of signal handlers.  (Note that on this manual page, "calling
       process"  normally  corresponds  to  "parent  process".   But  see  the   description   of
       CLONE_PARENT below.)

       One  use  of  clone() is to implement threads: multiple flows of control in a program that
       run concurrently in a shared address space.

       When the child process is created with clone(), it  commences  execution  by  calling  the
       function  pointed  to  by  the  argument  fn.  (This differs from fork(2), where execution
       continues in the child from the point of the fork(2) call.)  The arg argument is passed as
       the argument of the function fn.

       When  the fn(arg) function returns, the child process terminates.  The integer returned by
       fn is the exit status for the  child  process.   The  child  process  may  also  terminate
       explicitly by calling exit(2) or after receiving a fatal signal.

       The  child_stack  argument  specifies the location of the stack used by the child process.
       Since the child and calling process may share memory, it is not  possible  for  the  child
       process  to  execute  in  the same stack as the calling process.  The calling process must
       therefore set up memory space for the child stack and pass a  pointer  to  this  space  to
       clone().   Stacks  grow  downward  on  all  processors  that  run  Linux (except the HP PA
       processors), so child_stack usually points to the topmost address of the memory space  set
       up for the child stack.
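       The following is a minimal sketch of the wrapper in use.  It assumes Linux with glibc on an
       architecture whose stack grows downward.  The helper names run_child() and child_fn() and the
       64 KiB stack allocated with malloc(3) are illustrative choices, not part of the API.  The value
       returned by child_fn() becomes the exit status that the parent collects with waitpid(2).

```c
#define _GNU_SOURCE
#include <sched.h>      /* clone() */
#include <signal.h>     /* SIGCHLD */
#include <stdlib.h>     /* malloc(), free() */
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid(), WIFEXITED(), WEXITSTATUS() */

#define STACK_SIZE (64 * 1024)   /* Illustrative; real programs often use more */

/* Runs in the child; the returned int becomes the child's exit status */
static int child_fn(void *arg)
{
    int *value = arg;            /* Child's copy of the parent's variable */
    return *value + 1;
}

/* Hypothetical helper: clone a child, reap it, and return its exit status
   (or -1 on error) */
int run_child(int value)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* Stacks grow downward, so pass the topmost address of the buffer */
    char *stack_top = stack + STACK_SIZE;

    /* Low byte of flags is SIGCHLD: the parent is signaled when the child dies */
    pid_t pid = clone(child_fn, stack_top, SIGCHLD, &value);

    int status = 0;
    int ok = pid != -1 && waitpid(pid, &status, 0) != -1;
    free(stack);     /* Safe: the child has terminated (or was never created) */
    return (ok && WIFEXITED(status)) ? WEXITSTATUS(status) : -1;
}
```

       For example, run_child(41) should return 42.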

       The  low  byte  of  flags contains the number of the termination signal sent to the parent
       when the child dies.  If this signal is specified as anything other than SIGCHLD, then the
       parent process must specify the __WALL or __WCLONE options when waiting for the child with
       wait(2).  If no signal is specified, then the parent process  is  not  signaled  when  the
       child terminates.
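       A sketch of the no-signal case, assuming glibc's <sys/wait.h> exposes the Linux-specific
       __WALL flag.  The hypothetical helper passes 0 as the termination signal.  An ordinary
       waitpid(2) is then expected to fail with ECHILD because the child is not eligible, while a
       wait with __WALL reaps it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>      /* clone() */
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid(), __WALL */

#define STACK_SIZE (64 * 1024)

static int quiet_child(void *arg)
{
    (void) arg;
    return 3;
}

/* Hypothetical helper: the low byte of flags is 0, so no signal is sent when the
   child terminates.  Stores in *plain_errno the errno from a waitpid() without
   __WALL, and returns the exit status obtained with __WALL (or -1 on error). */
int wait_for_signalless_child(int *plain_errno)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    pid_t pid = clone(quiet_child, stack + STACK_SIZE, 0, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status = 0;
    /* A child whose termination signal is not SIGCHLD is not eligible here */
    *plain_errno = (waitpid(pid, &status, 0) == -1) ? errno : 0;

    /* __WALL waits for all children, whatever their termination signal */
    int ok = waitpid(pid, &status, __WALL) != -1;
    free(stack);
    return (ok && WIFEXITED(status)) ? WEXITSTATUS(status) : -1;
}
```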

       flags  may  also be bitwise-ORed with zero or more of the following constants, in order to
       specify what is shared between the calling process and the child process:

       CLONE_CHILD_CLEARTID (since Linux 2.5.49)
              Clear (zero) the child thread ID at the location ctid  in  child  memory  when  the
              child  exits,  and  do a wakeup on the futex at that address.  The address involved
              may be changed by the set_tid_address(2) system call.  This is  used  by  threading
              libraries.
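              A sketch, assuming the glibc wrapper, in which the trailing arguments are ptid,
              newtls, and ctid.  CLONE_VM is included so that the kernel's store and clear at ctid
              land in memory that the parent shares.  The helper name is hypothetical.  After the
              child has been reaped, the location should hold 0.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (64 * 1024)

static int short_lived(void *arg)
{
    (void) arg;
    return 0;
}

/* Hypothetical helper: returns the value left at ctid after the child has
   exited (0 is expected), or -1 on error. */
int cleared_tid_after_exit(void)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    volatile pid_t ctid = -1;   /* Kernel stores the TID here, then zeroes it on exit */
    int flags = CLONE_VM | CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD;

    /* glibc wrapper trailing arguments: ptid, newtls, ctid */
    pid_t pid = clone(short_lived, stack + STACK_SIZE, flags, NULL,
                      NULL, NULL, (pid_t *) &ctid);
    if (pid == -1 || waitpid(pid, NULL, 0) == -1) {
        free(stack);
        return -1;
    }
    free(stack);
    return ctid;
}
```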

       CLONE_CHILD_SETTID (since Linux 2.5.49)
              Store  the  child  thread ID at the location ctid in the child's memory.  The store
              operation completes before clone() returns control to user space.

       CLONE_FILES (since Linux 2.0)
              If CLONE_FILES is set, the calling process and the child  process  share  the  same
              file  descriptor  table.   Any file descriptor created by the calling process or by
              the child process is also valid in the other process.  Similarly,  if  one  of  the
              processes  closes  a  file  descriptor,  or changes its associated flags (using the
              fcntl(2) F_SETFD operation), the other process is  also  affected.   If  a  process
              sharing  a  file  descriptor  table  calls  execve(2), its file descriptor table is
              duplicated (unshared).

              If CLONE_FILES is  not  set,  the  child  process  inherits  a  copy  of  all  file
              descriptors  opened  in  the  calling  process  at the time of clone().  Subsequent
              operations that open or close file descriptors, or change  file  descriptor  flags,
              performed  by  either  the  calling  process or the child process do not affect the
              other process.  Note, however, that the duplicated file descriptors  in  the  child
              refer  to  the same open file descriptions as the corresponding file descriptors in
              the calling process, and thus  share  file  offsets  and  file  status  flags  (see
              open(2)).
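              A sketch contrasting the two cases, assuming /dev/null can be opened; the helper name
              is hypothetical.  The child closes a descriptor, and the parent then probes it with
              the fcntl(2) F_GETFD operation, which fails with EBADF if the descriptor is no longer
              open.

```c
#define _GNU_SOURCE
#include <fcntl.h>      /* open(), fcntl() */
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>     /* close() */

#define STACK_SIZE (64 * 1024)

/* Child closes the descriptor whose number is passed in arg */
static int close_fd_child(void *arg)
{
    close(*(int *) arg);
    return 0;
}

/* Hypothetical helper: returns 1 if a descriptor closed by the child is also
   closed in the parent, 0 if it is still open, -1 on error. */
int child_close_affects_parent(int extra_flags)
{
    int fd = open("/dev/null", O_RDONLY);
    char *stack = malloc(STACK_SIZE);
    if (fd == -1 || stack == NULL) {
        free(stack);
        if (fd != -1)
            close(fd);
        return -1;
    }

    pid_t pid = clone(close_fd_child, stack + STACK_SIZE, extra_flags | SIGCHLD, &fd);
    if (pid == -1 || waitpid(pid, NULL, 0) == -1) {
        free(stack);
        close(fd);
        return -1;
    }
    free(stack);

    /* F_GETFD fails with EBADF if fd is no longer open in this process */
    int closed = (fcntl(fd, F_GETFD) == -1);
    if (!closed)
        close(fd);
    return closed;
}
```

              child_close_affects_parent(CLONE_FILES) should return 1, and
              child_close_affects_parent(0) should return 0.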

       CLONE_FS (since Linux 2.0)
              If  CLONE_FS  is  set,  the  caller and the child process share the same filesystem
              information.  This includes  the  root  of  the  filesystem,  the  current  working
              directory,  and  the umask.  Any call to chroot(2), chdir(2), or umask(2) performed
              by the calling process or the child process also affects the other process.

              If CLONE_FS is not set, the child  process  works  on  a  copy  of  the  filesystem
              information  of  the  calling  process  at  the time of the clone() call.  Calls to
              chroot(2), chdir(2), or umask(2) performed later by one of  the  processes  do  not
              affect the other process.
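              A sketch that uses the umask as the observable piece of filesystem information (changing
              it needs no privilege, unlike chroot(2)).  The helper name and the 077 and 022 values are
              illustrative.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/stat.h>   /* umask() */
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (64 * 1024)

static int set_umask_child(void *arg)
{
    (void) arg;
    umask(077);
    return 0;
}

/* Hypothetical helper: reports the parent's umask after a child created with
   extra_flags has set its own umask to 077, or -1 on error.  The caller's
   original umask is restored before returning. */
int parent_umask_after_child(int extra_flags)
{
    mode_t saved = umask(022);          /* Known starting value; keep the original */
    int result = -1;
    char *stack = malloc(STACK_SIZE);

    if (stack != NULL) {
        pid_t pid = clone(set_umask_child, stack + STACK_SIZE,
                          extra_flags | SIGCHLD, NULL);
        if (pid != -1 && waitpid(pid, NULL, 0) != -1)
            result = (int) umask(022);  /* umask() returns the previous value */
        free(stack);
    }
    umask(saved);                       /* Restore the caller's original umask */
    return result;
}
```

              With CLONE_FS the parent should observe 077 (the child's change); without it, 022.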

       CLONE_IO (since Linux 2.6.25)
              If  CLONE_IO  is  set,  then the new process shares an I/O context with the calling
              process.  If this flag is not set, then (as with fork(2)) the new process  has  its
              own I/O context.

              The  I/O  context  is  the  I/O  scope  of  the  disk scheduler (i.e., what the I/O
              scheduler uses to model scheduling of a process's I/O).   If  processes  share  the
              same  I/O context, they are treated as one by the I/O scheduler.  As a consequence,
              they get to share disk time.  For some I/O schedulers, if two  processes  share  an
              I/O  context,  they  will  be  allowed to interleave their disk access.  If several
              threads are doing I/O on behalf of the same process  (aio_read(3),  for  instance),
              they should employ CLONE_IO to get better I/O performance.

              If the kernel is not configured with the CONFIG_BLOCK option, this flag is a no-op.

       CLONE_NEWCGROUP (since Linux 4.6)
              Create the process in a new cgroup namespace.  If this flag is not set, then (as
              with fork(2)) the process is created in the same cgroup namespace as the calling
              process.  This flag is intended for the implementation of containers.

              For further information on cgroup namespaces, see cgroup_namespaces(7).

              Only a privileged process (CAP_SYS_ADMIN) can employ CLONE_NEWCGROUP.

       CLONE_NEWIPC (since Linux 2.6.19)
              If CLONE_NEWIPC is set, then create the process in a new IPC namespace.  If this
              flag is not set, then (as with fork(2)) the process is created in the same IPC
              namespace as the calling process.  This flag is intended for the implementation of
              containers.

              An IPC namespace provides an isolated view of System V IPC objects  (see  svipc(7))
              and  (since  Linux  2.6.30)  POSIX message queues (see mq_overview(7)).  The common
              characteristic of these IPC mechanisms  is  that  IPC  objects  are  identified  by
              mechanisms other than filesystem pathnames.

              Objects  created  in  an  IPC namespace are visible to all other processes that are
              members of  that  namespace,  but  are  not  visible  to  processes  in  other  IPC
              namespaces.

              When an IPC namespace is destroyed (i.e., when the last process that is a member of
              the namespace terminates), all IPC  objects  in  the  namespace  are  automatically
              destroyed.

              Only a privileged process (CAP_SYS_ADMIN) can employ CLONE_NEWIPC.  This flag can't
              be specified in conjunction with CLONE_SYSVSEM.

              For further information on IPC namespaces, see namespaces(7).

       CLONE_NEWNET (since Linux 2.6.24)
              (The implementation of this  flag  was  completed  only  by  about  kernel  version
              2.6.29.)

              If  CLONE_NEWNET  is  set,  then create the process in a new network namespace.  If
              this flag is not set, then (as with fork(2)) the process is  created  in  the  same
              network  namespace  as  the  calling  process.   This  flag  is  intended  for  the
              implementation of containers.

              A network namespace provides an isolated view  of  the  networking  stack  (network
              device  interfaces,  IPv4  and  IPv6  protocol  stacks, IP routing tables, firewall
              rules, the  /proc/net  and  /sys/class/net  directory  trees,  sockets,  etc.).   A
              physical  network  device  can  live  in  exactly one network namespace.  A virtual
              network (veth(4)) device pair provides a pipe-like abstraction that can be used  to
              create  tunnels between network namespaces, and can be used to create a bridge to a
              physical network device in another namespace.

              When a network namespace is freed (i.e., when the last  process  in  the  namespace
              terminates),  its  physical  network  devices are moved back to the initial network
              namespace (not to the parent of the process).  For further information  on  network
              namespaces, see namespaces(7).

              Only a privileged process (CAP_SYS_ADMIN) can employ CLONE_NEWNET.

       CLONE_NEWNS (since Linux 2.4.19)
              If  CLONE_NEWNS  is  set,  the  cloned  child  is started in a new mount namespace,
              initialized with a copy of the namespace of the parent.  If CLONE_NEWNS is not set,
              the child lives in the same mount namespace as the parent.

              Only  a  privileged  process  (CAP_SYS_ADMIN)  can  employ  CLONE_NEWNS.  It is not
              permitted to specify both CLONE_NEWNS and CLONE_FS in the same clone() call.

              For   further   information   on   mount   namespaces,   see   namespaces(7)    and
              mount_namespaces(7).

       CLONE_NEWPID (since Linux 2.6.24)
              If  CLONE_NEWPID  is  set, then create the process in a new PID namespace.  If this
              flag is not set, then (as with fork(2)) the process is  created  in  the  same  PID
              namespace  as the calling process.  This flag is intended for the implementation of
              containers.

              For further information on PID namespaces, see namespaces(7) and pid_namespaces(7).

              Only a privileged process (CAP_SYS_ADMIN) can employ CLONE_NEWPID.  This flag can't
              be specified in conjunction with CLONE_THREAD or CLONE_PARENT.

       CLONE_NEWUSER
              (This flag first became meaningful for clone() in Linux 2.6.23; the current clone()
              semantics were merged in Linux 3.5; and the final pieces to make user namespaces
              completely usable were merged in Linux 3.8.)

              If  CLONE_NEWUSER is set, then create the process in a new user namespace.  If this
              flag is not set, then (as with fork(2)) the process is created  in  the  same  user
              namespace as the calling process.

              Before  Linux  3.8,  use  of  CLONE_NEWUSER  required  that  the  caller have three
              capabilities: CAP_SYS_ADMIN, CAP_SETUID, and CAP_SETGID.  Starting with Linux  3.8,
              no privileges are needed to create a user namespace.

              This flag can't be specified in conjunction with CLONE_THREAD or CLONE_PARENT.  For
              security reasons, CLONE_NEWUSER cannot be specified in conjunction with CLONE_FS.

              For   further   information   on   user   namespaces,   see    namespaces(7)    and
              user_namespaces(7).

       CLONE_NEWUTS (since Linux 2.6.19)
              If  CLONE_NEWUTS  is  set,  then  create  the process in a new UTS namespace, whose
              identifiers are initialized by duplicating the identifiers from the  UTS  namespace
              of  the  calling  process.   If  this  flag  is not set, then (as with fork(2)) the
              process is created in the same UTS namespace as the calling process.  This flag  is
              intended for the implementation of containers.

              A  UTS  namespace  is the set of identifiers returned by uname(2); among these, the
              domain  name  and  the  hostname  can   be   modified   by   setdomainname(2)   and
              sethostname(2),  respectively.   Changes made to the identifiers in a UTS namespace
              are visible to all other processes in the same namespace, but are  not  visible  to
              processes in other UTS namespaces.

              Only a privileged process (CAP_SYS_ADMIN) can employ CLONE_NEWUTS.

              For further information on UTS namespaces, see namespaces(7).
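              A sketch, assuming a kernel built with UTS namespace support.  The hostname
              "clone-demo" and the helper name are hypothetical.  Without CAP_SYS_ADMIN the clone()
              call is expected to fail with EPERM, which the helper reports as an errno value.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/utsname.h>  /* uname() */
#include <sys/wait.h>
#include <unistd.h>       /* sethostname() */

#define STACK_SIZE (64 * 1024)

static const char child_name[] = "clone-demo";   /* Hypothetical hostname */

/* In the new UTS namespace: change the hostname and confirm uname() sees it */
static int rename_host(void *arg)
{
    (void) arg;
    struct utsname u;
    if (sethostname(child_name, strlen(child_name)) == -1 || uname(&u) == -1)
        return 1;
    return strcmp(u.nodename, child_name) == 0 ? 0 : 1;
}

/* Hypothetical helper: returns 0 if the child renamed its host while the
   parent's hostname stayed unchanged; the errno value if clone() failed
   (typically EPERM without CAP_SYS_ADMIN); -1 on any other failure. */
int uts_namespace_demo(void)
{
    struct utsname before, after;
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL || uname(&before) == -1) {
        free(stack);
        return -1;
    }

    pid_t pid = clone(rename_host, stack + STACK_SIZE, CLONE_NEWUTS | SIGCHLD, NULL);
    if (pid == -1) {
        int err = errno;
        free(stack);
        return err;
    }

    int status;
    int ok = waitpid(pid, &status, 0) != -1 && WIFEXITED(status)
             && WEXITSTATUS(status) == 0 && uname(&after) == 0
             && strcmp(before.nodename, after.nodename) == 0;
    free(stack);
    return ok ? 0 : -1;
}
```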

       CLONE_PARENT (since Linux 2.3.12)
              If  CLONE_PARENT  is  set,  then  the  parent  of  the  new  child  (as returned by
              getppid(2)) will be the same as that of the calling process.

              If CLONE_PARENT is not set, then (as  with  fork(2))  the  child's  parent  is  the
              calling process.

              Note  that  it  is the parent process, as returned by getppid(2), which is signaled
              when the child terminates, so that if CLONE_PARENT is set, then the parent  of  the
              calling process, rather than the calling process itself, will be signaled.

       CLONE_PARENT_SETTID (since Linux 2.5.49)
              Store  the  child thread ID at the location ptid in the parent's memory.  (In Linux
              2.5.32-2.5.48 there was a flag CLONE_SETTID that did this.)   The  store  operation
              completes before clone() returns control to user space.
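              A minimal check, with a hypothetical helper name.  Because the store to ptid completes
              before clone() returns, the parent can compare it with the return value immediately,
              without any synchronization.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (64 * 1024)

static int noop_child(void *arg)
{
    (void) arg;
    return 0;
}

/* Hypothetical helper: returns 1 if the TID stored at ptid matches clone()'s
   return value, 0 if not, -1 on error. */
int parent_settid_matches(void)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    pid_t ptid = 0;
    /* glibc wrapper trailing arguments: ptid, newtls, ctid */
    pid_t pid = clone(noop_child, stack + STACK_SIZE,
                      CLONE_PARENT_SETTID | SIGCHLD, NULL, &ptid, NULL, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }
    int match = (ptid == pid);   /* Already stored when clone() returned */
    waitpid(pid, NULL, 0);
    free(stack);
    return match;
}
```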

       CLONE_PID (Linux 2.0 to 2.5.15)
              If  CLONE_PID  is set, the child process is created with the same process ID as the
              calling process.  This is good for hacking the system, but otherwise  of  not  much
              use.   From  Linux  2.3.21  onward, this flag could be specified only by the system
              boot process (PID 0).  The flag disappeared completely from the kernel  sources  in
              Linux  2.5.16.  Since then, the kernel silently ignores this bit if it is specified
              in flags.

       CLONE_PTRACE (since Linux 2.2)
              If CLONE_PTRACE is specified, and the calling process is being traced,  then  trace
              the child also (see ptrace(2)).

       CLONE_SETTLS (since Linux 2.5.32)
              The TLS (Thread Local Storage) descriptor is set to newtls.

              The  interpretation  of  newtls and the resulting effect is architecture dependent.
              On x86, newtls is interpreted as a struct user_desc * (see set_thread_area(2)).  On
              x86-64 it is the new value to be set for the %fs base register (see the ARCH_SET_FS
              argument to arch_prctl(2)).  On architectures with a dedicated TLS register, it  is
              the new value of that register.

       CLONE_SIGHAND (since Linux 2.0)
              If  CLONE_SIGHAND  is set, the calling process and the child process share the same
              table  of  signal  handlers.   If  the  calling  process  or  child  process  calls
              sigaction(2)  to  change  the  behavior  associated  with a signal, the behavior is
              changed in the other process as well.   However,  the  calling  process  and  child
              processes still have distinct signal masks and sets of pending signals.  So, one of
              them may block or unblock signals using sigprocmask(2) without affecting the  other
              process.

              If  CLONE_SIGHAND  is  not  set,  the  child  process inherits a copy of the signal
              handlers of  the  calling  process  at  the  time  clone()  is  called.   Calls  to
              sigaction(2)  performed  later  by one of the processes have no effect on the other
              process.

              Since Linux 2.6.0-test6, flags must also include CLONE_VM if CLONE_SIGHAND is
              specified.
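              A sketch of these rules, with a hypothetical helper; SIGUSR1 is an arbitrary choice.
              With CLONE_VM | CLONE_SIGHAND, a disposition set by the child should be visible in
              the parent.  With CLONE_VM alone it should not.  CLONE_SIGHAND without CLONE_VM
              should be rejected with EINVAL.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>     /* sigaction(), SIGUSR1 */
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (64 * 1024)

/* Child sets the disposition of SIGUSR1 to "ignore" */
static int ignore_usr1(void *arg)
{
    (void) arg;
    struct sigaction sa = { .sa_handler = SIG_IGN };
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL) == 0 ? 0 : 1;
}

/* Hypothetical helper: returns 1 if the parent's SIGUSR1 disposition is SIG_IGN
   after the child ran, 0 if not, or -1 on error (with errno from clone() if that
   call failed).  The caller's original disposition is restored. */
int child_sigaction_visible(int extra_flags)
{
    struct sigaction dfl = { .sa_handler = SIG_DFL }, orig, now;
    sigemptyset(&dfl.sa_mask);

    char *stack = malloc(STACK_SIZE);
    if (stack == NULL || sigaction(SIGUSR1, &dfl, &orig) == -1) {
        free(stack);
        return -1;
    }

    int result = -1;
    pid_t pid = clone(ignore_usr1, stack + STACK_SIZE, extra_flags | SIGCHLD, NULL);
    int clone_errno = errno;
    if (pid != -1 && waitpid(pid, NULL, 0) != -1 &&
        sigaction(SIGUSR1, NULL, &now) == 0)
        result = (now.sa_handler == SIG_IGN);

    sigaction(SIGUSR1, &orig, NULL);   /* Restore the caller's disposition */
    free(stack);
    if (pid == -1)
        errno = clone_errno;           /* Report why clone() failed */
    return result;
}
```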

       CLONE_STOPPED (since Linux 2.6.0-test2)
              If CLONE_STOPPED is set, then the child is initially stopped (as though it had been
              sent a SIGSTOP signal), and must be resumed by sending it a SIGCONT signal.

              This flag was deprecated from Linux 2.6.25 onward, and was  removed  altogether  in
              Linux  2.6.38.  Since then, the kernel silently ignores it without error.  Starting
              with Linux 4.6, the same bit was reused for the CLONE_NEWCGROUP flag.

       CLONE_SYSVSEM (since Linux 2.5.10)
              If CLONE_SYSVSEM is set, then the child and the calling process share a single list
              of System V semaphore adjustment (semadj) values (see semop(2)).  In this case, the
              shared list accumulates semadj values across all processes sharing  the  list,  and
              semaphore  adjustments are performed only when the last process that is sharing the
              list terminates (or ceases sharing the list using unshare(2)).  If this flag is not
              set, then the child has a separate semadj list that is initially empty.

       CLONE_THREAD (since Linux 2.4.0-test8)
              If CLONE_THREAD is set, the child is placed in the same thread group as the calling
              process.  To make the remainder of the discussion of  CLONE_THREAD  more  readable,
              the term "thread" is used to refer to the processes within a thread group.

              Thread groups were a feature added in Linux 2.4 to support the POSIX threads notion
              of a set of threads that share a single PID.  Internally, this shared  PID  is  the
              so-called  thread  group  identifier (TGID) for the thread group.  Since Linux 2.4,
              calls to getpid(2) return the TGID of the caller.

              The threads within a group can  be  distinguished  by  their  (system-wide)  unique
              thread  IDs (TID).  A new thread's TID is available as the function result returned
              to the caller of clone(), and a thread can obtain its own TID using gettid(2).

              When a call is made to clone() without specifying CLONE_THREAD, then the  resulting
              thread  is placed in a new thread group whose TGID is the same as the thread's TID.
              This thread is the leader of the new thread group.

              A new thread created with CLONE_THREAD has the same parent process as the caller of
              clone()  (i.e.,  like  CLONE_PARENT),  so  that calls to getppid(2) return the same
              value for all of the threads  in  a  thread  group.   When  a  CLONE_THREAD  thread
              terminates,  the  thread  that  created  it using clone() is not sent a SIGCHLD (or
              other termination) signal; nor can the status of such a thread  be  obtained  using
              wait(2).  (The thread is said to be detached.)
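              The following sketch creates a thread with the minimal flag set required by the rules
              in this entry (CLONE_VM | CLONE_SIGHAND | CLONE_THREAD) and collects it without
              wait(2).  The join technique resembles what threading libraries do, but it is an
              illustration, not glibc's implementation.  CLONE_PARENT_SETTID and
              CLONE_CHILD_CLEARTID point at one futex word, and the caller waits with FUTEX_WAIT
              until the kernel zeroes that word.  The sketch assumes kernel headers that provide
              <linux/futex.h>.  Because CLONE_SETTLS is not used, the thread shares the caller's
              TLS, so it should do little more than make system calls.  The helper and structure
              names are hypothetical.

```c
#define _GNU_SOURCE
#include <linux/futex.h>    /* FUTEX_WAIT */
#include <sched.h>          /* clone(), CLONE_* */
#include <stdlib.h>
#include <sys/syscall.h>    /* SYS_gettid, SYS_futex */
#include <sys/types.h>
#include <unistd.h>         /* getpid(), syscall() */

#define STACK_SIZE (64 * 1024)

struct thread_ids {
    pid_t parent_pid, parent_tid;   /* Seen by the caller of clone() */
    pid_t child_pid, child_tid;     /* Seen by the new thread */
};

/* Runs in the new thread, which shares the caller's TLS (no CLONE_SETTLS) */
static int record_ids(void *arg)
{
    struct thread_ids *ids = arg;
    ids->child_pid = getpid();                      /* TGID, shared by the group */
    ids->child_tid = (pid_t) syscall(SYS_gettid);   /* Unique thread ID */
    return 0;   /* The glibc wrapper then exits this thread only */
}

/* Hypothetical helper: create a thread in the caller's thread group and wait
   for it through the futex word that CLONE_CHILD_CLEARTID zeroes on exit.
   Returns 0 on success, -1 on error. */
int run_clone_thread(struct thread_ids *ids)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* CLONE_PARENT_SETTID stores the TID here before clone() returns;
       CLONE_CHILD_CLEARTID zeroes it and wakes futex waiters at thread exit */
    pid_t tid_word = 0;
    int flags = CLONE_VM | CLONE_SIGHAND | CLONE_THREAD
              | CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID;

    ids->parent_pid = getpid();
    ids->parent_tid = (pid_t) syscall(SYS_gettid);

    /* glibc wrapper trailing arguments: ptid, newtls, ctid */
    if (clone(record_ids, stack + STACK_SIZE, flags, ids,
              &tid_word, NULL, &tid_word) == -1) {
        free(stack);
        return -1;
    }

    /* FUTEX_WAIT sleeps only while the word still holds v */
    pid_t v;
    while ((v = __atomic_load_n(&tid_word, __ATOMIC_ACQUIRE)) != 0)
        syscall(SYS_futex, &tid_word, FUTEX_WAIT, v, NULL, NULL, 0);

    free(stack);   /* The thread has exited and no longer uses its stack */
    return 0;
}
```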

              After all of the threads in a thread group terminate, the parent process of the
              thread group is sent a SIGCHLD (or other termination) signal.

              If any of the threads in a thread group performs an  execve(2),  then  all  threads
              other  than the thread group leader are terminated, and the new program is executed
              in the thread group leader.

              If one of the threads in a thread group creates a child  using  fork(2),  then  any
              thread in the group can wait(2) for that child.

              Since  Linux  2.5.35,  flags  must  also  include  CLONE_SIGHAND if CLONE_THREAD is
              specified (and note that, since  Linux  2.6.0-test6,  CLONE_SIGHAND  also  requires
              CLONE_VM to be included).

              Signals  may  be sent to a thread group as a whole (i.e., a TGID) using kill(2), or
              to a specific thread (i.e., TID) using tgkill(2).

              Signal dispositions and  actions  are  process-wide:  if  an  unhandled  signal  is
              delivered  to  a thread, then it will affect (terminate, stop, continue, be ignored
              in) all members of the thread group.

              Each thread has its own signal mask, as set by sigprocmask(2), but signals  can  be
              pending  either:  for  the  whole  process  (i.e., deliverable to any member of the
              thread group), when sent with kill(2); or for an individual thread, when sent  with
              tgkill(2).   A  call to sigpending(2) returns a signal set that is the union of the
              signals pending for the whole process and the signals  that  are  pending  for  the
              calling thread.

              If  kill(2)  is  used  to send a signal to a thread group, and the thread group has
              installed a handler for the signal, then the handler will  be  invoked  in  exactly
              one,  arbitrarily  selected  member  of  the  thread group that has not blocked the
              signal.  If multiple threads in a group are waiting to accept the same signal using
              sigwaitinfo(2),  the kernel will arbitrarily select one of these threads to receive
              a signal sent using kill(2).

       CLONE_UNTRACED (since Linux 2.5.46)
              If CLONE_UNTRACED is specified, then a tracing process cannot force CLONE_PTRACE on
              this child process.

       CLONE_VFORK (since Linux 2.2)
              If  CLONE_VFORK is set, the execution of the calling process is suspended until the
              child releases its virtual memory resources via a call to execve(2) or _exit(2) (as
              with vfork(2)).
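              A sketch with a hypothetical helper.  The caller is suspended until the child exits,
              so a flag that the child writes through shared memory (CLONE_VM) is already set when
              clone() returns in the parent, without further synchronization.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (64 * 1024)

static int set_flag(void *arg)
{
    *(volatile int *) arg = 1;
    return 0;   /* Returning exits the child, which releases the parent */
}

/* Hypothetical helper: returns the shared flag as seen immediately after
   clone() returns in the parent (1 expected), or -1 on error. */
int flag_seen_after_vfork_clone(void)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    volatile int flag = 0;
    /* CLONE_VM shares memory; CLONE_VFORK suspends the caller until the child
       exits or calls execve(2) */
    pid_t pid = clone(set_flag, stack + STACK_SIZE,
                      CLONE_VM | CLONE_VFORK | SIGCHLD, (void *) &flag);
    if (pid == -1) {
        free(stack);
        return -1;
    }
    int seen = flag;   /* The child has already run to completion */
    waitpid(pid, NULL, 0);
    free(stack);
    return seen;
}
```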

              If  CLONE_VFORK  is  not  set,  then  both  the  calling  process and the child are
              schedulable after the call,  and  an  application  should  not  rely  on  execution
              occurring in any particular order.

       CLONE_VM (since Linux 2.0)
              If  CLONE_VM  is  set,  the  calling  process and the child process run in the same
              memory space.  In particular, memory writes performed by the calling process or  by
              the  child  process  are  also  visible in the other process.  Moreover, any memory
              mapping or unmapping performed with mmap(2) or munmap(2) by the  child  or  calling
              process also affects the other process.

              If  CLONE_VM  is  not  set, the child process runs in a separate copy of the memory
              space of the calling process at  the  time  of  clone().   Memory  writes  or  file
              mappings/unmappings  performed  by one of the processes do not affect the other, as
              with fork(2).
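              A sketch contrasting the two cases with a global variable; the helper name is
              hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (64 * 1024)

static volatile int counter = 0;   /* Global written by the child */

static int bump_counter(void *arg)
{
    (void) arg;
    counter = 100;
    return 0;
}

/* Hypothetical helper: returns the parent's view of counter after a child
   created with extra_flags has written to it, or -1 on error. */
int counter_after_child(int extra_flags)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    counter = 0;
    pid_t pid = clone(bump_counter, stack + STACK_SIZE, extra_flags | SIGCHLD, NULL);
    if (pid == -1 || waitpid(pid, NULL, 0) == -1) {
        free(stack);
        return -1;
    }
    free(stack);
    return counter;
}
```

              counter_after_child(CLONE_VM) should return 100, whereas counter_after_child(0)
              should return 0 because the child wrote to its own copy of the variable.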

NOTES

       Note that the glibc clone() wrapper function makes some changes in the memory  pointed  to
       by  child_stack  (changes  required  to  set  the stack up correctly for the child) before
       invoking the clone() system call.  So, in cases  where  clone()  is  used  to  recursively
       create children, do not use the buffer employed for the parent's stack as the stack of the
       child.

   C library/kernel differences
       The raw clone() system call corresponds more closely to fork(2) in that execution  in  the
       child  continues  from  the  point  of the call.  As such, the fn and arg arguments of the
       clone() wrapper function are omitted.

       Another difference for the raw clone() system call is that the child_stack argument may be
       zero,  in  which  case  the  child uses a duplicate of the parent's stack.  (Copy-on-write
       semantics ensure that the child gets separate copies of stack pages  when  either  process
       modifies  the stack.)  In this case, for correct operation, the CLONE_VM option should not
       be specified.  (If the child shares the parent's memory because of the use of the CLONE_VM
       flag, then no copy-on-write duplication occurs and chaos is likely to result.)
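       A sketch of the raw call used fork-style through syscall(2), with child_stack set to zero.
       It assumes an architecture on which flags is the first argument (so not cris or s390).
       Because ptid, ctid, and newtls are all zero, their per-architecture order does not matter.
       The child restricts itself to _exit(2), since glibc does not perform its usual fork(2)
       bookkeeping for a raw clone().  The helper name is hypothetical.

```c
#define _GNU_SOURCE
#include <signal.h>       /* SIGCHLD */
#include <sys/syscall.h>  /* SYS_clone */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>       /* syscall(), _exit() */

/* Hypothetical helper: returns the child's exit status (42 expected), or -1 */
int raw_clone_exit_status(void)
{
    /* child_stack == NULL: the child runs on a copy-on-write duplicate of the
       parent's stack, so CLONE_VM must not be used */
    long ret = syscall(SYS_clone, (unsigned long) SIGCHLD, NULL, NULL, NULL, 0UL);
    if (ret == -1)
        return -1;
    if (ret == 0)
        _exit(42);   /* Child: execution continues here, as with fork(2) */

    int status;
    if (waitpid((pid_t) ret, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```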

       The  order  of the arguments also differs in the raw system call, and there are variations
       in the arguments across architectures, as detailed in the following paragraphs.

       The raw system call interface on x86-64 and some other architectures (including sh,  tile,
       and alpha) is roughly:

           long clone(unsigned long flags, void *child_stack,
                      int *ptid, int *ctid,
                      unsigned long newtls);

       On  x86-32, and several other common architectures (including score, ARM, ARM 64, PA-RISC,
       arc, Power PC, xtensa, and MIPS), the order of the last two arguments is reversed:

           long clone(unsigned long flags, void *child_stack,
                     int *ptid, unsigned long newtls,
                     int *ctid);

       On the cris and s390 architectures, the order of the first two arguments is reversed:

           long clone(void *child_stack, unsigned long flags,
                      int *ptid, int *ctid,
                      unsigned long newtls);

       On the microblaze architecture, an additional argument is supplied:

           long clone(unsigned long flags, void *child_stack,
                      int stack_size,         /* Size of stack */
                      int *ptid, int *ctid,
                      unsigned long newtls);

   blackfin, m68k, and sparc
       The argument-passing conventions on blackfin, m68k,  and  sparc  are  different  from  the
       descriptions above.  For details, see the kernel (and glibc) source.

   ia64
       On ia64, a different interface is used:

       int __clone2(int (*fn)(void *),
                    void *child_stack_base, size_t stack_size,
                    int flags, void *arg, ...
                 /* pid_t *ptid, struct user_desc *tls, pid_t *ctid */ );

       The prototype shown above is for the glibc wrapper function; the raw system call interface
       has no fn or arg argument, and changes the order of the arguments so  that  flags  is  the
       first argument, and tls is the last argument.

       __clone2() operates in the same way as clone(), except that child_stack_base points to the
       lowest address of the child's stack area, and stack_size specifies the size of  the  stack
       pointed to by child_stack_base.

   Linux 2.4 and earlier
       In Linux 2.4 and earlier, clone() does not take arguments ptid, tls, and ctid.

RETURN VALUE

       On  success,  the  thread  ID  of  the child process is returned in the caller's thread of
       execution.  On failure, -1 is returned in the caller's context, no child process  will  be
       created, and errno will be set appropriately.

ERRORS

       EAGAIN Too many processes are already running; see fork(2).

       EINVAL CLONE_SIGHAND was specified, but CLONE_VM was not.  (Since Linux 2.6.0-test6.)

       EINVAL CLONE_THREAD was specified, but CLONE_SIGHAND was not.  (Since Linux 2.5.35.)

       EINVAL Both CLONE_FS and CLONE_NEWNS were specified in flags.

       EINVAL (since Linux 3.9)
              Both CLONE_NEWUSER and CLONE_FS were specified in flags.

       EINVAL Both CLONE_NEWIPC and CLONE_SYSVSEM were specified in flags.

       EINVAL One (or both) of CLONE_NEWPID or CLONE_NEWUSER and one (or both) of CLONE_THREAD or
              CLONE_PARENT were specified in flags.

       EINVAL Returned by the glibc clone() wrapper function when fn or child_stack is  specified
              as NULL.

       EINVAL CLONE_NEWIPC  was  specified  in  flags, but the kernel was not configured with the
              CONFIG_SYSVIPC and CONFIG_IPC_NS options.

       EINVAL CLONE_NEWNET was specified in flags, but the kernel was  not  configured  with  the
              CONFIG_NET_NS option.

       EINVAL CLONE_NEWPID  was  specified  in  flags, but the kernel was not configured with the
              CONFIG_PID_NS option.

       EINVAL CLONE_NEWUTS was specified in flags, but the kernel was not configured with the
              CONFIG_UTS_NS option.

       EINVAL child_stack  is  not  aligned  to  a  suitable boundary for this architecture.  For
              example, on aarch64, child_stack must be a multiple of 16.

       ENOMEM Cannot allocate sufficient memory to allocate a task structure for the child, or to
              copy those parts of the caller's context that need to be copied.

       ENOSPC (since Linux 3.7)
              CLONE_NEWPID  was  specified  in  flags,  but the limit on the nesting depth of PID
              namespaces would have been exceeded; see pid_namespaces(7).

       ENOSPC (since Linux 4.9; beforehand EUSERS)
              CLONE_NEWUSER was specified in flags, and the call would cause  the  limit  on  the
              number of nested user namespaces to be exceeded.  See user_namespaces(7).

              From Linux 3.11 to Linux 4.8, the error diagnosed in this case was EUSERS.

       ENOSPC (since Linux 4.9)
              One  of  the  values  in  flags specified the creation of a new user namespace, but
              doing so would  have  caused  the  limit  defined  by  the  corresponding  file  in
              /proc/sys/user to be exceeded.  For further details, see namespaces(7).

       EPERM  CLONE_NEWCGROUP,   CLONE_NEWIPC,   CLONE_NEWNET,   CLONE_NEWNS,   CLONE_NEWPID,  or
              CLONE_NEWUTS  was  specified  by   an   unprivileged   process   (process   without
              CAP_SYS_ADMIN).

       EPERM  CLONE_PID was specified by a process other than process 0.  (This error occurs only
              on Linux 2.5.15 and earlier.)

       EPERM  CLONE_NEWUSER was specified in flags, but either  the  effective  user  ID  or  the
              effective  group  ID  of the caller does not have a mapping in the parent namespace
              (see user_namespaces(7)).

       EPERM (since Linux 3.9)
              CLONE_NEWUSER was specified in flags and the caller  is  in  a  chroot  environment
              (i.e.,  the  caller's root directory does not match the root directory of the mount
              namespace in which it resides).

       ERESTARTNOINTR (since Linux 2.6.17)
              System call was interrupted by a signal and will be restarted.  (This can  be  seen
              only during a trace.)

       EUSERS (Linux 3.11 to Linux 4.8)
              CLONE_NEWUSER  was  specified  in flags, and the limit on the number of nested user
              namespaces would be exceeded.  See the discussion of the ENOSPC error above.
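       A sketch that exercises three of the EINVAL cases above, none of which needs privileges:
       a NULL fn passed to the glibc wrapper, CLONE_SIGHAND without CLONE_VM, and CLONE_THREAD
       without CLONE_SIGHAND.  The helper name is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>

#define STACK_SIZE (64 * 1024)

static int never_runs(void *arg)
{
    (void) arg;
    return 0;
}

/* Hypothetical helper: calls the glibc clone() wrapper with arguments that
   should be rejected and returns the resulting errno (0 if the call
   unexpectedly succeeded). */
int clone_errno_for(int flags, int null_fn)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return ENOMEM;

    errno = 0;
    pid_t pid = clone(null_fn ? NULL : never_runs, stack + STACK_SIZE, flags, NULL);
    int err = (pid == -1) ? errno : 0;
    free(stack);   /* Acceptable only because every case here is expected to fail */
    return err;
}
```

       Each of clone_errno_for(SIGCHLD, 1), clone_errno_for(CLONE_SIGHAND | SIGCHLD, 0), and
       clone_errno_for(CLONE_THREAD | SIGCHLD, 0) should return EINVAL.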

CONFORMING TO

       clone() is Linux-specific and should not be used in programs intended to be portable.

NOTES

       The kcmp(2) system call can be used to test whether two processes share various  resources
       such  as a file descriptor table, System V semaphore undo operations, or a virtual address
       space.
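       glibc provides no wrapper for kcmp(2), so it has to be invoked through syscall(2).  The
       sketch below uses the KCMP_VM comparison type to ask whether two processes share a
       virtual address space, as a child created with CLONE_VM does with its caller.  It assumes
       that the <linux/kcmp.h> header is installed and that the kernel provides kcmp(2); on
       kernels without it, the call fails with ENOSYS.  The helper name shares_vm() is
       hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/kcmp.h>     /* KCMP_VM and the other comparison types */
#include <sys/syscall.h>    /* SYS_kcmp */
#include <sys/types.h>
#include <unistd.h>         /* syscall() */

/* Hypothetical helper: return 1 if pid1 and pid2 share a virtual address
   space, 0 if they do not, or -1 if kcmp(2) failed (for example ENOSYS,
   or EPERM when ptrace access is denied or a seccomp filter blocks it). */
static int
shares_vm(pid_t pid1, pid_t pid2)
{
    /* idx1 and idx2 are ignored for KCMP_VM; pass them as unsigned long,
       which is the type the kernel expects for these arguments */
    long r = syscall(SYS_kcmp, pid1, pid2, KCMP_VM,
                     (unsigned long) 0, (unsigned long) 0);

    if (r == -1)
        return -1;
    return r == 0;          /* 0 means "same"; nonzero means different */
}
```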

       Handlers registered using pthread_atfork(3) are not executed during a call to clone().
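       The following sketch illustrates this by counting how often a pthread_atfork(3) prepare
       handler runs: it is expected to run once for fork(2), but not at all for clone().  The
       helper names count_prepare(), child_fn(), and demo_atfork() are hypothetical.  On older
       glibc versions, the program may need to be linked with -pthread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (64 * 1024)

static int prepare_calls;           /* Bumped by the "prepare" handler */

static void
count_prepare(void)
{
    prepare_calls++;
}

static int
child_fn(void *arg)
{
    (void) arg;
    return 0;                       /* Child terminates immediately */
}

/* Hypothetical helper: record the prepare-handler count after one
   clone() and after one fork().  Returns 0 on success, -1 on error. */
static int
demo_atfork(int *after_clone, int *after_fork)
{
    char *stack = malloc(STACK_SIZE);
    pid_t pid;

    if (stack == NULL)
        return -1;
    if (pthread_atfork(count_prepare, NULL, NULL) != 0)
        goto fail;

    /* clone() does not run atfork handlers */
    pid = clone(child_fn, stack + STACK_SIZE, SIGCHLD, NULL);
    if (pid == -1 || waitpid(pid, NULL, 0) == -1)
        goto fail;
    *after_clone = prepare_calls;

    /* fork() runs the prepare handler in the parent before the child exists */
    pid = fork();
    if (pid == -1)
        goto fail;
    if (pid == 0)
        _exit(EXIT_SUCCESS);
    if (waitpid(pid, NULL, 0) == -1)
        goto fail;
    *after_fork = prepare_calls;

    free(stack);
    return 0;

fail:
    free(stack);
    return -1;
}
```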

       In the Linux 2.4.x series, CLONE_THREAD generally does not make  the  parent  of  the  new
       thread  the same as the parent of the calling process.  However, for kernel versions 2.4.7
       to 2.4.18 the CLONE_THREAD flag implied the CLONE_PARENT  flag  (as  in  Linux  2.6.0  and
       later).

       For a while there was a CLONE_DETACHED flag (introduced in Linux 2.5.32), which meant
       that the parent did not want a child-exit signal.  In Linux 2.6.2, the need to give this
       flag together with CLONE_THREAD disappeared.  This flag is still defined, but has no
       effect.

       On i386, clone() should not be called through vsyscall, but directly through int $0x80.

BUGS

       GNU  C  library  versions  2.3.4 up to and including 2.24 contained a wrapper function for
       getpid(2) that performed caching of PIDs.  This caching relied on  support  in  the  glibc
       wrapper for clone(), but limitations in the implementation meant that the cache was not up
       to date in some circumstances.  In particular, if a signal  was  delivered  to  the  child
       immediately  after  the clone() call, then a call to getpid(2) in a handler for the signal
       could return the PID of the calling process ("the parent"), if the clone wrapper  had  not
       yet  had a chance to update the PID cache in the child.  (This discussion ignores the case
       where the child was created using CLONE_THREAD, when  getpid(2)  should  return  the  same
       value  in the child and in the process that called clone(), since the caller and the child
       are in the same thread group.  The stale-cache problem also does not occur  if  the  flags
       argument includes CLONE_VM.)  To obtain the correct PID, it was sometimes necessary to
       use code such as the following:

           #include <sys/syscall.h>    /* Definition of SYS_getpid */
           #include <unistd.h>         /* Declaration of syscall() */

           pid_t mypid;

           mypid = syscall(SYS_getpid);

       Because of the stale-cache problem, as well as other problems noted in getpid(2), the  PID
       caching feature was removed in glibc 2.25.
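       Since glibc 2.25, getpid() always asks the kernel, so in a child created with clone() it
       should agree with the raw system call.  The following sketch checks this in a child
       created without CLONE_VM or CLONE_THREAD.  It does not reproduce the signal-handler race
       described above; it only shows how the two values can be compared.  The helper names
       check_pid() and getpid_consistent_in_child() are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (64 * 1024)

/* Child: exit with status 0 if the library's getpid() matches the
   kernel's answer, 1 otherwise */
static int
check_pid(void *arg)
{
    (void) arg;
    return getpid() == (pid_t) syscall(SYS_getpid) ? 0 : 1;
}

/* Hypothetical helper: returns 1 if getpid() was consistent in the
   clone() child, 0 if it was not, -1 on error */
static int
getpid_consistent_in_child(void)
{
    char *stack = malloc(STACK_SIZE);
    int status;
    pid_t pid;

    if (stack == NULL)
        return -1;
    pid = clone(check_pid, stack + STACK_SIZE, SIGCHLD, NULL);
    if (pid == -1 || waitpid(pid, &status, 0) == -1) {
        free(stack);
        return -1;
    }
    free(stack);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```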

EXAMPLE

       The  following  program  demonstrates  the  use  of clone() to create a child process that
       executes in a separate  UTS  namespace.   The  child  changes  the  hostname  in  its  UTS
       namespace.   Both parent and child then display the system hostname, making it possible to
       see that the hostname differs in the UTS namespaces of  the  parent  and  child.   For  an
       example of the use of this program, see setns(2).

   Program source
       #define _GNU_SOURCE
       #include <sys/wait.h>
       #include <sys/utsname.h>
       #include <sched.h>
       #include <string.h>
       #include <stdio.h>
       #include <stdlib.h>
       #include <unistd.h>

       #define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                               } while (0)

       static int              /* Start function for cloned child */
       childFunc(void *arg)
       {
           struct utsname uts;

           /* Change hostname in UTS namespace of child */

           if (sethostname(arg, strlen(arg)) == -1)
               errExit("sethostname");

           /* Retrieve and display hostname */

           if (uname(&uts) == -1)
               errExit("uname");
           printf("uts.nodename in child:  %s\n", uts.nodename);

           /* Keep the namespace open for a while, by sleeping.
              This allows some experimentation--for example, another
              process might join the namespace. */

           sleep(200);

           return 0;           /* Child terminates now */
       }

       #define STACK_SIZE (1024 * 1024)    /* Stack size for cloned child */

       int
       main(int argc, char *argv[])
       {
           char *stack;                    /* Start of stack buffer */
           char *stackTop;                 /* End of stack buffer */
           pid_t pid;
           struct utsname uts;

           if (argc < 2) {
               fprintf(stderr, "Usage: %s <child-hostname>\n", argv[0]);
               exit(EXIT_FAILURE);
           }

           /* Allocate stack for child */

           stack = malloc(STACK_SIZE);
           if (stack == NULL)
               errExit("malloc");
           stackTop = stack + STACK_SIZE;  /* Assume stack grows downward */

           /* Create child that has its own UTS namespace;
              child commences execution in childFunc() */

           pid = clone(childFunc, stackTop, CLONE_NEWUTS | SIGCHLD, argv[1]);
           if (pid == -1)
               errExit("clone");
           printf("clone() returned %ld\n", (long) pid);

           /* Parent falls through to here */

           sleep(1);           /* Give child time to change its hostname */

           /* Display hostname in parent's UTS namespace. This will be
              different from hostname in child's UTS namespace. */

           if (uname(&uts) == -1)
               errExit("uname");
           printf("uts.nodename in parent: %s\n", uts.nodename);

           if (waitpid(pid, NULL, 0) == -1)    /* Wait for child */
               errExit("waitpid");
           printf("child has terminated\n");

           exit(EXIT_SUCCESS);
       }
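       The program above allocates the child's stack with malloc(3).  An alternative is to
       allocate it with mmap(2), passing MAP_STACK as a hint that the region will be used as a
       stack, and to release it with munmap(2) once the child has been waited for.  The sketch
       below assumes, as the program does, that the stack grows downward.  It uses no CLONE_NEW*
       flags, so it needs no privileges.  The helper names alloc_stack_top(), child_fn(), and
       run_child() are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/wait.h>

/* Map an anonymous region for a child stack.  Stores the start of the
   region in *base and returns its end (the initial stack pointer when
   the stack grows downward), or NULL on failure. */
static char *
alloc_stack_top(size_t size, char **base)
{
    char *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);

    if (p == MAP_FAILED)
        return NULL;
    *base = p;
    return p + size;
}

static int
child_fn(void *arg)
{
    return *(int *) arg;            /* Exit status chosen by the caller */
}

/* Run child_fn() in a clone() child; returns its exit status, or -1 */
static int
run_child(int wanted)
{
    const size_t size = 1024 * 1024;    /* Same size as STACK_SIZE above */
    char *base;
    char *top = alloc_stack_top(size, &base);
    int status;
    pid_t pid;

    if (top == NULL)
        return -1;

    /* Without CLONE_VM the child gets a copy of the caller's memory,
       so &wanted remains valid in the child */
    pid = clone(child_fn, top, SIGCHLD, &wanted);
    if (pid == -1 || waitpid(pid, &status, 0) == -1) {
        munmap(base, size);
        return -1;
    }
    munmap(base, size);             /* Child has terminated; free its stack */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```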

SEE ALSO

       fork(2),  futex(2), getpid(2), gettid(2), kcmp(2), set_thread_area(2), set_tid_address(2),
       setns(2), tkill(2), unshare(2), wait(2), capabilities(7), namespaces(7), pthreads(7)

COLOPHON

       This page is part of release 4.15 of the Linux man-pages project.  A  description  of  the
       project,  information  about  reporting  bugs, and the latest version of this page, can be
       found at https://www.kernel.org/doc/man-pages/.