Ubuntu Manpage: mount_namespaces - overview of Linux mount namespaces

NAME

       mount_namespaces - overview of Linux mount namespaces

DESCRIPTION

       For an overview of namespaces, see namespaces(7).

       Mount  namespaces  provide  isolation  of  the  list  of  mounts  seen by the processes in each namespace
       instance.  Thus, the processes in each of  the  mount  namespace  instances  will  see  distinct  single-
       directory hierarchies.

       The  views  provided  by  the  /proc/pid/mounts, /proc/pid/mountinfo, and /proc/pid/mountstats files (all
       described in proc(5)) correspond to the mount namespace in which the process with the  PID  pid  resides.
       (All of the processes that reside in the same mount namespace will see the same view in these files.)

       A  new  mount namespace is created using either clone(2) or unshare(2) with the CLONE_NEWNS flag.  When a
       new mount namespace is created, its mount list is initialized as follows:

       •  If the namespace is created using clone(2), the mount list of the child's namespace is a copy  of  the
          mount list in the parent process's mount namespace.

       •  If  the  namespace  is  created using unshare(2), the mount list of the new namespace is a copy of the
          mount list in the caller's previous mount namespace.

       Subsequent modifications to the mount list (mount(2) and umount(2)) in either mount  namespace  will  not
       (by  default)  affect  the  mount  list  seen in the other namespace (but see the following discussion of
       shared subtrees).
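
       Whether two processes reside in the same mount namespace can be checked by comparing the
       /proc/pid/ns/mnt symbolic links described in namespaces(7).  The following is a minimal sketch;
       the function name same_mntns is hypothetical, and reading the link of another user's process may
       require privilege.

```shell
# same_mntns PID1 PID2: succeed if both processes are in the same mount
# namespace.  Each link target has the form "mnt:[inode-number]"; equal
# targets mean the same namespace.  (same_mntns is a hypothetical helper.)
same_mntns() {
    ns1=$(readlink "/proc/$1/ns/mnt") || return 2
    ns2=$(readlink "/proc/$2/ns/mnt") || return 2
    [ "$ns1" = "$ns2" ]
}
```

       For example, "same_mntns $$ 1" reports whether the current shell shares a mount namespace with
       the process whose PID is 1.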

SHARED SUBTREES

       After the implementation of mount namespaces was completed, experience showed  that  the  isolation  that
       they  provided  was, in some cases, too great.  For example, in order to make a newly loaded optical disk
       available in all mount namespaces, a mount operation was required in each namespace.  For this use  case,
       and  others,  the  shared  subtree  feature  was  introduced  in  Linux  2.6.15.  This feature allows for
       automatic, controlled  propagation  of  mount(2)  and  umount(2)  events  between  namespaces  (or,  more
       precisely,  between  the  mounts  that  are  members  of  a peer group that are propagating events to one
       another).

       Each mount is marked (via mount(2)) as having one of the following propagation types:

       MS_SHARED
              This mount shares events with members of a peer group.  mount(2) and umount(2) events  immediately
              under  this  mount  will  propagate  to  the  other  mounts  that  are  members of the peer group.
              Propagation here means that the same mount(2) or umount(2) will automatically occur under  all  of
              the  other  mounts  in  the peer group.  Conversely, mount(2) and umount(2) events that take place
              under peer mounts will propagate to this mount.

       MS_PRIVATE
              This mount is private; it does not have a peer  group.   mount(2)  and  umount(2)  events  do  not
              propagate into or out of this mount.

       MS_SLAVE
              mount(2)  and  umount(2)  events  propagate  into  this  mount  from a (master) shared peer group.
              mount(2) and umount(2) events under this mount do not propagate to any peer.

              Note that a mount can be the slave of another peer group while at the same time  sharing  mount(2)
              and  umount(2)  events with a peer group of which it is a member.  (More precisely, one peer group
              can be the slave of another peer group.)

       MS_UNBINDABLE
              This is like a private mount, and in addition this mount can't be bind mounted.  Attempts to  bind
              mount this mount (mount(2) with the MS_BIND flag) will fail.

              When a recursive bind mount (mount(2) with the MS_BIND and MS_REC flags) is performed on a
              directory subtree, any unbindable mounts within the subtree are automatically pruned (i.e.,
              not replicated) when replicating that subtree to produce the target subtree.

       For a discussion of the propagation type assigned to a new mount, see NOTES.

       The  propagation type is a per-mount-point setting; some mounts may be marked as shared (with each shared
       mount being a member of a distinct peer group), while others are private (or slaved or unbindable).

       Note that a mount's propagation type determines whether mount(2)  and  umount(2)  of  mounts  immediately
       under  the  mount  are  propagated.  Thus, the propagation type does not affect propagation of events for
       grandchildren and further removed descendant mounts.  What happens if the mount itself  is  unmounted  is
       determined by the propagation type that is in effect for the parent of the mount.

       Members are added to a peer group when a mount is marked as shared and either:

       (a)  the mount is replicated during the creation of a new mount namespace; or

       (b)  a new bind mount is created from the mount.

       In both of these cases, the new mount joins the peer group of which the existing mount is a member.

       A  new peer group is also created when a child mount is created under an existing mount that is marked as
       shared.  In this case, the new child mount is also marked as shared and the resulting peer group consists
       of all the mounts that are replicated under the peers of parent mounts.

       A mount ceases to be a member of a peer group when either the mount is explicitly unmounted, or when  the
       mount  is  implicitly  unmounted  because  a  mount  namespace  is removed (because it has no more member
       processes).

       The propagation type of the mounts in a mount namespace can  be  discovered  via  the  "optional  fields"
       exposed  in  /proc/pid/mountinfo.  (See proc(5) for details of this file.)  The following tags can appear
       in the optional fields for a record in that file:

       shared:X
              This mount is shared in peer group X.  Each peer group has  a  unique  ID  that  is  automatically
              generated  by the kernel, and all mounts in the same peer group will show the same ID.  (These IDs
              are assigned starting from the value 1, and may be recycled when a peer group ceases to  have  any
              members.)

       master:X
              This mount is a slave to shared peer group X.

       propagate_from:X (since Linux 2.6.26)
              This  mount  is  a  slave and receives propagation from shared peer group X.  This tag will always
              appear in conjunction with a master:X tag.  Here, X is the closest dominant peer group  under  the
              process's  root  directory.  If X is the immediate master of the mount, or if there is no dominant
              peer  group  under  the  same  root,  then  only  the  master:X  field  is  present  and  not  the
              propagate_from:X field.  For further details, see below.

       unbindable
              This is an unbindable mount.

       If none of the above tags is present, then this is a private mount.
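
       The rules above can be applied mechanically to each record.  The following sketch assumes the
       record layout described in proc(5): six fixed fields, then the optional fields, terminated by a
       "-" separator.  The function name propagation_type is hypothetical.

```shell
# propagation_type: read mountinfo records on stdin and print
# "<mount point> <propagation type>" for each record.
# (propagation_type is a hypothetical helper, not part of any package.)
propagation_type() {
    awk '{
        shared = 0; slave = 0; unbind = 0
        # Optional fields start at field 7 and end at the "-" separator
        # (or at end of line if the separator has been stripped).
        for (i = 7; i <= NF && $i != "-"; i++) {
            if ($i ~ /^shared:/)         shared = 1
            else if ($i ~ /^master:/)    slave = 1
            else if ($i == "unbindable") unbind = 1
        }
        if (unbind)               type = "unbindable"
        else if (shared && slave) type = "slave+shared"
        else if (shared)          type = "shared"
        else if (slave)           type = "slave"
        else                      type = "private"
        print $5, type
    }'
}
```

       A typical invocation would be "propagation_type < /proc/self/mountinfo".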

   MS_SHARED and MS_PRIVATE example
       Suppose  that  on  a  terminal in the initial mount namespace, we mark one mount as shared and another as
       private, and then view the mounts in /proc/self/mountinfo:

           sh1# mount --make-shared /mntS
           sh1# mount --make-private /mntP
           sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           77 61 8:17 / /mntS rw,relatime shared:1
           83 61 8:15 / /mntP rw,relatime

       From the /proc/self/mountinfo output, we see that /mntS is a shared mount in peer group 1, and that /mntP
       has no optional tags, indicating that it is a private mount.  The first two fields in each record in this
       file are the unique ID for this mount, and the mount ID of the parent mount.  We can further inspect this
       file to see that the parent mount of /mntS and /mntP is the  root  directory,  /,  which  is  mounted  as
       private:

           sh1# cat /proc/self/mountinfo | awk '$1 == 61' | sed 's/ - .*//'
           61 0 8:2 / / rw,relatime
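
       The lookup of a parent mount via the first two fields can be automated.  This is a sketch only;
       the function name parent_of is hypothetical, and if several mounts are stacked on the same mount
       point, the last record wins.

```shell
# parent_of MOUNTPOINT: read mountinfo records on stdin and print the mount
# point of the parent of MOUNTPOINT (nothing if the parent is not listed).
# (parent_of is a hypothetical helper.)
parent_of() {
    awk -v target="$1" '
        { point[$1] = $5 }              # map mount ID -> mount point
        $5 == target { parent = $2 }    # remember the parent mount ID
        END { if (parent in point) print point[parent] }
    '
}
```

       With the mounts shown above, "parent_of /mntS < /proc/self/mountinfo" would print "/".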

       On a second terminal, we create a new mount namespace where we run a second shell and inspect the mounts:

           $ PS1='sh2# ' sudo unshare -m --propagation unchanged sh
           sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           222 145 8:17 / /mntS rw,relatime shared:1
           225 145 8:15 / /mntP rw,relatime

       The  new  mount  namespace  received  a  copy  of the initial mount namespace's mounts.  These new mounts
       maintain the same propagation types, but have unique mount  IDs.   (The  --propagation  unchanged  option
       prevents unshare(1) from marking all mounts as private when creating a new mount namespace, which it does
       by default.)

       In the second terminal, we then create submounts under each of /mntS and /mntP and inspect the set-up:

           sh2# mkdir /mntS/a
           sh2# mount /dev/sdb6 /mntS/a
           sh2# mkdir /mntP/b
           sh2# mount /dev/sdb7 /mntP/b
           sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           222 145 8:17 / /mntS rw,relatime shared:1
           225 145 8:15 / /mntP rw,relatime
           178 222 8:22 / /mntS/a rw,relatime shared:2
           230 225 8:23 / /mntP/b rw,relatime

       From  the  above,  it  can  be  seen that /mntS/a was created as shared (inheriting this setting from its
       parent mount) and /mntP/b was created as a private mount.

       Returning to the first terminal and inspecting the set-up, we see that the new mount  created  under  the
       shared  mount  /mntS  propagated  to  its  peer mount (in the initial mount namespace), but the new mount
       created under the private mount /mntP did not propagate:

           sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           77 61 8:17 / /mntS rw,relatime shared:1
           83 61 8:15 / /mntP rw,relatime
           179 77 8:22 / /mntS/a rw,relatime shared:2
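
       Because all members of a peer group show the same ID, the peers can be matched up by
       concatenating the mountinfo records collected in both namespaces and grouping them by the
       shared:X tag.  The following sketch assumes that the records have been saved to files; the
       function name peer_groups is hypothetical.

```shell
# peer_groups: read mountinfo records on stdin and print one line per peer
# group: the group ID, a colon, and the mount points of its members.
# (peer_groups is a hypothetical helper.)
peer_groups() {
    awk '{
        for (i = 7; i <= NF && $i != "-"; i++)
            if ($i ~ /^shared:/) {
                id = substr($i, 8)          # text after "shared:"
                members[id] = members[id] " " $5
            }
    }
    END { for (id in members) print id ":" members[id] }' | sort -n
}
```

       Fed the records from both shells above, this would list /mntS twice under group 1 and /mntS/a
       twice under group 2, while /mntP and /mntP/b would not appear at all.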

   MS_SLAVE example
       Making a mount a slave allows it to receive propagated mount(2) and umount(2) events from a master shared
       peer group, while preventing it from propagating events to that master.  This is useful  if  we  want  to
       (say)  receive  a mount event when an optical disk is mounted in the master shared peer group (in another
       mount namespace), but want to prevent mount(2) and umount(2) events under the  slave  mount  from  having
       side effects in other namespaces.

       We  can  demonstrate  the  effect  of  slaving by first marking two mounts as shared in the initial mount
       namespace:

           sh1# mount --make-shared /mntX
           sh1# mount --make-shared /mntY
           sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           132 83 8:23 / /mntX rw,relatime shared:1
           133 83 8:22 / /mntY rw,relatime shared:2

       On a second terminal, we create a new mount namespace and inspect the mounts:

           sh2# unshare -m --propagation unchanged sh
           sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           168 167 8:23 / /mntX rw,relatime shared:1
           169 167 8:22 / /mntY rw,relatime shared:2

       In the new mount namespace, we then mark one of the mounts as a slave:

           sh2# mount --make-slave /mntY
           sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           168 167 8:23 / /mntX rw,relatime shared:1
           169 167 8:22 / /mntY rw,relatime master:2

       From the above output, we see that /mntY is now a slave mount that is receiving propagation  events  from
       the shared peer group with the ID 2.

       Continuing in the new namespace, we create submounts under each of /mntX and /mntY:

           sh2# mkdir /mntX/a
           sh2# mount /dev/sda3 /mntX/a
           sh2# mkdir /mntY/b
           sh2# mount /dev/sda5 /mntY/b

       When  we inspect the state of the mounts in the new mount namespace, we see that /mntX/a was created as a
       new shared mount (inheriting the "shared" setting from its parent mount) and /mntY/b  was  created  as  a
       private mount:

           sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           168 167 8:23 / /mntX rw,relatime shared:1
           169 167 8:22 / /mntY rw,relatime master:2
           173 168 8:3 / /mntX/a rw,relatime shared:3
           175 169 8:5 / /mntY/b rw,relatime

       Returning  to  the  first  terminal  (in  the  initial  mount  namespace),  we see that the mount /mntX/a
       propagated to the peer (the shared /mntX), but the mount /mntY/b was not propagated:

           sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           132 83 8:23 / /mntX rw,relatime shared:1
           133 83 8:22 / /mntY rw,relatime shared:2
           174 132 8:3 / /mntX/a rw,relatime shared:3

       Now we create a new mount under /mntY in the first shell:

           sh1# mkdir /mntY/c
           sh1# mount /dev/sda1 /mntY/c
           sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           132 83 8:23 / /mntX rw,relatime shared:1
           133 83 8:22 / /mntY rw,relatime shared:2
           174 132 8:3 / /mntX/a rw,relatime shared:3
           178 133 8:1 / /mntY/c rw,relatime shared:4

       When we examine the mounts in the second mount namespace, we see that in this case the new mount has been
       propagated to the slave mount, and that the new mount is itself a slave mount (to peer group 4):

           sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           168 167 8:23 / /mntX rw,relatime shared:1
           169 167 8:22 / /mntY rw,relatime master:2
           173 168 8:3 / /mntX/a rw,relatime shared:3
           175 169 8:5 / /mntY/b rw,relatime
           179 169 8:1 / /mntY/c rw,relatime master:4

   MS_UNBINDABLE example
       One of the primary purposes of  unbindable  mounts  is  to  avoid  the  "mount  explosion"  problem  when
       repeatedly  performing  bind  mounts  of  a  higher-level subtree at a lower-level mount.  The problem is
       illustrated by the following shell session.

       Suppose we have a system with the following mounts:

           # mount | awk '{print $1, $2, $3}'
           /dev/sda1 on /
           /dev/sdb6 on /mntX
           /dev/sdb7 on /mntY

       Suppose furthermore that we wish to recursively bind mount the root directory under several  users'  home
       directories.  We do this for the first user, and inspect the mounts:

           # mount --rbind / /home/cecilia/
           # mount | awk '{print $1, $2, $3}'
           /dev/sda1 on /
           /dev/sdb6 on /mntX
           /dev/sdb7 on /mntY
           /dev/sda1 on /home/cecilia
           /dev/sdb6 on /home/cecilia/mntX
           /dev/sdb7 on /home/cecilia/mntY

       When we repeat this operation for the second user, we start to see the explosion problem:

           # mount --rbind / /home/henry
           # mount | awk '{print $1, $2, $3}'
           /dev/sda1 on /
           /dev/sdb6 on /mntX
           /dev/sdb7 on /mntY
           /dev/sda1 on /home/cecilia
           /dev/sdb6 on /home/cecilia/mntX
           /dev/sdb7 on /home/cecilia/mntY
           /dev/sda1 on /home/henry
           /dev/sdb6 on /home/henry/mntX
           /dev/sdb7 on /home/henry/mntY
           /dev/sda1 on /home/henry/home/cecilia
           /dev/sdb6 on /home/henry/home/cecilia/mntX
           /dev/sdb7 on /home/henry/home/cecilia/mntY

       Under  /home/henry, we have not only recursively added the /mntX and /mntY mounts, but also the recursive
       mounts of those directories under /home/cecilia that were created in the previous step.   Upon  repeating
       the step for a third user, it becomes obvious that the explosion is exponential in nature:

           # mount --rbind / /home/otto
           # mount | awk '{print $1, $2, $3}'
           /dev/sda1 on /
           /dev/sdb6 on /mntX
           /dev/sdb7 on /mntY
           /dev/sda1 on /home/cecilia
           /dev/sdb6 on /home/cecilia/mntX
           /dev/sdb7 on /home/cecilia/mntY
           /dev/sda1 on /home/henry
           /dev/sdb6 on /home/henry/mntX
           /dev/sdb7 on /home/henry/mntY
           /dev/sda1 on /home/henry/home/cecilia
           /dev/sdb6 on /home/henry/home/cecilia/mntX
           /dev/sdb7 on /home/henry/home/cecilia/mntY
           /dev/sda1 on /home/otto
           /dev/sdb6 on /home/otto/mntX
           /dev/sdb7 on /home/otto/mntY
           /dev/sda1 on /home/otto/home/cecilia
           /dev/sdb6 on /home/otto/home/cecilia/mntX
           /dev/sdb7 on /home/otto/home/cecilia/mntY
           /dev/sda1 on /home/otto/home/henry
           /dev/sdb6 on /home/otto/home/henry/mntX
           /dev/sdb7 on /home/otto/home/henry/mntY
           /dev/sda1 on /home/otto/home/henry/home/cecilia
           /dev/sdb6 on /home/otto/home/henry/home/cecilia/mntX
           /dev/sdb7 on /home/otto/home/henry/home/cecilia/mntY
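
       The growth can be checked with a little arithmetic: each recursive bind mount of the root
       directory replicates every mount that exists at that point, so the total doubles each time.  A
       minimal sketch (the function name is hypothetical):

```shell
# mounts_after_rbinds BASE N: number of mounts after N successive
# "mount --rbind / DIR" operations, starting from BASE mounts.
# (mounts_after_rbinds is a hypothetical helper.)
mounts_after_rbinds() {
    count=$1
    i=0
    while [ "$i" -lt "$2" ]; do
        count=$((count * 2))    # every existing mount is replicated once
        i=$((i + 1))
    done
    echo "$count"
}
```

       Starting from the 3 mounts above, this gives 6, 12, and 24 mounts after one, two, and three
       recursive bind mounts, matching the listings.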

       The  mount  explosion  problem  in  the  above  scenario  can be avoided by making each of the new mounts
       unbindable.  The effect of doing this is that recursive mounts of the root directory will  not  replicate
       the unbindable mounts.  We make such a mount for the first user:

           # mount --rbind --make-unbindable / /home/cecilia

       Before going further, we show that unbindable mounts are indeed unbindable:

           # mkdir /mntZ
           # mount --bind /home/cecilia /mntZ
           mount: wrong fs type, bad option, bad superblock on /home/cecilia,
                  missing codepage or helper program, or other error

                  In some cases useful info is found in syslog - try
                  dmesg | tail or so.

       Now we create unbindable recursive bind mounts for the other two users:

           # mount --rbind --make-unbindable / /home/henry
           # mount --rbind --make-unbindable / /home/otto

       Upon  examining  the list of mounts, we see there has been no explosion of mounts, because the unbindable
       mounts were not replicated under each user's directory:

           # mount | awk '{print $1, $2, $3}'
           /dev/sda1 on /
           /dev/sdb6 on /mntX
           /dev/sdb7 on /mntY
           /dev/sda1 on /home/cecilia
           /dev/sdb6 on /home/cecilia/mntX
           /dev/sdb7 on /home/cecilia/mntY
           /dev/sda1 on /home/henry
           /dev/sdb6 on /home/henry/mntX
           /dev/sdb7 on /home/henry/mntY
           /dev/sda1 on /home/otto
           /dev/sdb6 on /home/otto/mntX
           /dev/sdb7 on /home/otto/mntY
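
       With unbindable targets, each recursive bind mount replicates only the original mounts, so the
       total grows linearly rather than doubling.  A minimal sketch of the arithmetic (the function
       name is hypothetical):

```shell
# mounts_after_unbindable_rbinds BASE N: number of mounts after N successive
# "mount --rbind --make-unbindable / DIR" operations, starting from BASE
# mounts.  Earlier (unbindable) copies are pruned, so each step adds BASE.
# (mounts_after_unbindable_rbinds is a hypothetical helper.)
mounts_after_unbindable_rbinds() {
    echo $(( $1 * ($2 + 1) ))
}
```

       For the 3 original mounts and 3 users, this gives the 12 mounts listed above.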

   Propagation type transitions
       The following table shows the effect that applying a new propagation type (i.e., mount  --make-xxxx)  has
       on  the existing propagation type of a mount.  The rows correspond to existing propagation types, and the
       columns are the new propagation settings.  For reasons of space, "private" is abbreviated as  "priv"  and
       "unbindable" as "unbind".
                     make-shared   make-slave      make-priv  make-unbind
       ─────────────┬───────────────────────────────────────────────────────
       shared       │shared        slave/priv [1]  priv       unbind
       slave        │slave+shared  slave [2]       priv       unbind
       slave+shared │slave+shared  slave           priv       unbind
       private      │shared        priv [2]        priv       unbind
       unbindable   │shared        unbind [2]      priv       unbind

       Note the following details to the table:

       [1]  If  a  shared  mount  is  the only mount in its peer group, making it a slave automatically makes it
            private.

       [2]  Slaving a nonshared mount has no effect on the mount.
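
       The table and its notes can be written as a lookup function.  This is a sketch only: the
       function name new_propagation and the "sole" argument, which marks a shared mount as the only
       member of its peer group, are hypothetical.

```shell
# new_propagation CURRENT OP [sole]: print the propagation type that results
# from applying "mount --make-OP" to a mount whose type is CURRENT.
#   CURRENT: shared | slave | slave+shared | private | unbindable
#   OP:      shared | slave | private | unbindable
#   sole:    pass "sole" if a shared mount is alone in its peer group
# (new_propagation is a hypothetical helper.)
new_propagation() {
    case "$2" in
    shared)
        case "$1" in
        slave|slave+shared) echo "slave+shared" ;;
        *)                  echo "shared" ;;
        esac ;;
    slave)
        case "$1" in
        shared)
            # Note [1]: a shared mount with no peers becomes private.
            if [ "${3:-}" = "sole" ]; then echo "private"; else echo "slave"; fi ;;
        slave+shared) echo "slave" ;;
        *)            echo "$1" ;;   # note [2]: no effect on nonshared mounts
        esac ;;
    private)    echo "private" ;;
    unbindable) echo "unbindable" ;;
    *)          echo "unknown operation: $2" >&2; return 1 ;;
    esac
}
```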

   Bind (MS_BIND) semantics
       Suppose that the following command is performed:

           mount --bind A/a B/b

       Here, A is the source mount, B is the destination mount, a is a subdirectory path under the  mount  point
       A,  and  b  is a subdirectory path under the mount point B.  The propagation type of the resulting mount,
       B/b, depends on the propagation types of the mounts A and B, and is summarized in the following table.

                                  source(A)
                          shared  private    slave         unbind
       ──────────────────┬──────────────────────────────────────────
       dest(B)  shared   │shared  shared     slave+shared  invalid
                nonshared│shared  private    slave         invalid

       Note that a recursive bind of a subtree follows the same semantics as for a bind operation on each  mount
       in the subtree.  (Unbindable mounts are automatically pruned at the target mount point.)

       For further details, see Documentation/filesystems/sharedsubtree.rst in the kernel source tree.

   Move (MS_MOVE) semantics
       Suppose that the following command is performed:

           mount --move A B/b

       Here,  A  is  the  source mount, B is the destination mount, and b is a subdirectory path under the mount
       point B.  The propagation type of the resulting mount, B/b, depends  on  the  propagation  types  of  the
       mounts A and B, and is summarized in the following table.

                                  source(A)
                          shared  private    slave         unbind
       ──────────────────┬─────────────────────────────────────────────
       dest(B)  shared   │shared  shared     slave+shared  invalid
                nonshared│shared  private    slave         unbindable

       Note: moving a mount that resides under a shared mount is invalid.

       For further details, see Documentation/filesystems/sharedsubtree.rst in the kernel source tree.

   Mount semantics
       Suppose that we use the following command to create a mount:

           mount device B/b

       Here,  B is the destination mount, and b is a subdirectory path under the mount point B.  The propagation
       type of the resulting mount, B/b, follows the same rules as for a bind mount, where the propagation  type
       of the source mount is considered always to be private.
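
       The bind, move, and mount rules differ only in how the source mount is treated, so they can be
       expressed as a single lookup.  This is a sketch under that reading of the tables; the function
       name result_type is hypothetical, and the restriction on moving a mount that resides under a
       shared mount is not modeled.

```shell
# result_type OP SRC DEST: print the propagation type of the mount B/b
# produced by OP (bind | move | mount).
#   SRC:  type of the source mount A: shared | private | slave | unbindable
#         (ignored for "mount", where the source counts as private)
#   DEST: type of the destination mount B: shared | nonshared
# (result_type is a hypothetical helper.)
result_type() {
    op=$1
    src=$2
    dest=$3
    if [ "$op" = "mount" ]; then
        src=private
    fi
    case "$src:$dest" in
    shared:*)             echo "shared" ;;
    private:shared)       echo "shared" ;;
    private:nonshared)    echo "private" ;;
    slave:shared)         echo "slave+shared" ;;
    slave:nonshared)      echo "slave" ;;
    unbindable:shared)    echo "invalid" ;;
    unbindable:nonshared)
        # An unbindable mount can be moved, but never bind mounted.
        if [ "$op" = "move" ]; then echo "unbindable"; else echo "invalid"; fi ;;
    *)                    echo "unknown" >&2; return 1 ;;
    esac
}
```

       For example, "result_type bind slave shared" prints "slave+shared", and "result_type mount
       private nonshared" prints "private".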

   Unmount semantics
       Suppose that we use the following command to tear down a mount:

           umount A

       Here, A is a mount on B/b, where B is the parent mount and b is a subdirectory path under the mount point
       B.   If  B  is shared, then all most-recently-mounted mounts at b on mounts that receive propagation from
       mount B and do not have submounts under them are unmounted.

   The /proc/pid/mountinfo propagate_from tag
       The propagate_from:X tag is shown in the optional fields of a /proc/pid/mountinfo record in cases where a
       process can't see a slave's immediate master (i.e., the pathname of the master is not reachable from  the
       filesystem  root  directory)  and  so cannot determine the chain of propagation between the mounts it can
       see.

       In the following example, we first  create  a  two-link  master-slave  chain  between  the  mounts  /mnt,
       /tmp/etc,  and  /mnt/tmp/etc.   Then  the  chroot(1)  command  is  used  to make the /tmp/etc mount point
       unreachable from the root directory, creating a  situation  where  the  master  of  /mnt/tmp/etc  is  not
       reachable from the (new) root directory of the process.

       First,  we  bind  mount the root directory onto /mnt and then bind mount /proc at /mnt/proc so that after
       the later chroot(1) the proc(5) filesystem remains visible at  the  correct  location  in  the  chroot-ed
       environment.

           # mkdir -p /mnt/proc
           # mount --bind / /mnt
           # mount --bind /proc /mnt/proc

       Next, we ensure that the /mnt mount is a shared mount in a new peer group (with no peers):

           # mount --make-private /mnt  # Isolate from any previous peer group
           # mount --make-shared /mnt
           # cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           239 61 8:2 / /mnt ... shared:102
           248 239 0:4 / /mnt/proc ... shared:5

       Next, we bind mount /mnt/etc onto /tmp/etc:

           # mkdir -p /tmp/etc
           # mount --bind /mnt/etc /tmp/etc
           # cat /proc/self/mountinfo | egrep '/mnt|/tmp/' | sed 's/ - .*//'
           239 61 8:2 / /mnt ... shared:102
           248 239 0:4 / /mnt/proc ... shared:5
           267 40 8:2 /etc /tmp/etc ... shared:102

       Initially, these two mounts are in the same peer group, but we then make /tmp/etc a slave of
       /mnt/etc, and then make /tmp/etc shared as well, so that it can propagate events to the next slave
       in the chain:

           # mount --make-slave /tmp/etc
           # mount --make-shared /tmp/etc
           # cat /proc/self/mountinfo | egrep '/mnt|/tmp/' | sed 's/ - .*//'
           239 61 8:2 / /mnt ... shared:102
           248 239 0:4 / /mnt/proc ... shared:5
           267 40 8:2 /etc /tmp/etc ... shared:105 master:102

       Then we bind mount /tmp/etc onto /mnt/tmp/etc.  Again, the two mounts are  initially  in  the  same  peer
       group, but we then make /mnt/tmp/etc a slave of /tmp/etc:

           # mkdir -p /mnt/tmp/etc
           # mount --bind /tmp/etc /mnt/tmp/etc
           # mount --make-slave /mnt/tmp/etc
           # cat /proc/self/mountinfo | egrep '/mnt|/tmp/' | sed 's/ - .*//'
           239 61 8:2 / /mnt ... shared:102
           248 239 0:4 / /mnt/proc ... shared:5
           267 40 8:2 /etc /tmp/etc ... shared:105 master:102
           273 239 8:2 /etc /mnt/tmp/etc ... master:105

       From  the above, we see that /mnt is the master of the slave /tmp/etc, which in turn is the master of the
       slave /mnt/tmp/etc.

       We then chroot(1) to the /mnt directory, which renders the mount with ID 267 unreachable from  the  (new)
       root directory:

           # chroot /mnt

       When we examine the state of the mounts inside the chroot-ed environment, we see the following:

           # cat /proc/self/mountinfo | sed 's/ - .*//'
           239 61 8:2 / / ... shared:102
           248 239 0:4 / /proc ... shared:5
           273 239 8:2 /etc /tmp/etc ... master:105 propagate_from:102

       Above,  we see that the mount with ID 273 is a slave whose master is the peer group 105.  The mount point
       for that master is unreachable, and so a propagate_from tag is displayed,  indicating  that  the  closest
       dominant  peer group (i.e., the nearest reachable mount in the slave chain) is the peer group with the ID
       102 (corresponding to the /mnt mount point before the chroot(1) was performed).
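
       The master and propagate_from tags can be extracted together to show, for each slave mount,
       both its immediate master and the closest dominant peer group.  A minimal sketch, assuming the
       record layout described in proc(5); the function name master_info is hypothetical.

```shell
# master_info: for each slave record on stdin, print the mount point, the
# master peer group ID, and the propagate_from peer group ID ("-" if the
# tag is absent).  (master_info is a hypothetical helper.)
master_info() {
    awk '{
        master = ""; from = "-"
        for (i = 7; i <= NF && $i != "-"; i++) {
            if ($i ~ /^master:/)              master = substr($i, 8)
            else if ($i ~ /^propagate_from:/) from = substr($i, 16)
        }
        if (master != "") print $5, master, from
    }'
}
```

       Inside the chroot-ed environment above, "master_info < /proc/self/mountinfo" would be expected
       to print "/tmp/etc 105 102".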

STANDARDS

       Linux.

HISTORY

       Linux 2.4.19.

NOTES

       The propagation type assigned to a new mount depends on the propagation type of the parent mount.  If the
       mount has a parent (i.e., it is a non-root mount point)  and  the  propagation  type  of  the  parent  is
       MS_SHARED, then the propagation type of the new mount is also MS_SHARED.  Otherwise, the propagation type
       of the new mount is MS_PRIVATE.

       Notwithstanding the fact that the default propagation type for a new mount is in many cases MS_PRIVATE,
       MS_SHARED is typically more useful.  For this reason, systemd(1) automatically  remounts  all  mounts  as
       MS_SHARED  on  system startup.  Thus, on most modern systems, the default propagation type is in practice
       MS_SHARED.

       Since, when one uses unshare(1) to create a mount  namespace,  the  goal  is  commonly  to  provide  full
       isolation  of  the  mounts  in the new namespace, unshare(1) (since util-linux 2.27) in turn reverses the
       step performed by systemd(1), by making all mounts private in the new  namespace.   That  is,  unshare(1)
       performs the equivalent of the following in the new mount namespace:

           mount --make-rprivate /

       To prevent this, one can use the --propagation unchanged option to unshare(1).

       An  application  that  creates  a new mount namespace directly using clone(2) or unshare(2) may desire to
       prevent propagation of mount events to other mount namespaces (as is done by unshare(1)).   This  can  be
       done  by  changing  the propagation type of mounts in the new namespace to either MS_SLAVE or MS_PRIVATE,
       using a call such as the following:

           mount(NULL, "/", MS_SLAVE | MS_REC, NULL);
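
       From the shell, the same change can be made with the --make-* options of mount(8); the call
       above corresponds to "mount --make-rslave /".  The following sketch builds the option name from
       a propagation type; the function name make_flag is hypothetical.

```shell
# make_flag TYPE [recursive]: print the mount(8) option that sets the
# propagation type TYPE (shared | slave | private | unbindable).  With
# "recursive", print the variant that also applies to all submounts,
# which corresponds to adding MS_REC in mount(2).
# (make_flag is a hypothetical helper.)
make_flag() {
    case "$1" in
    shared|slave|private|unbindable) ;;
    *) echo "unknown propagation type: $1" >&2; return 1 ;;
    esac
    if [ "${2:-}" = "recursive" ]; then
        echo "--make-r$1"
    else
        echo "--make-$1"
    fi
}
```

       For example, running mount "$(make_flag slave recursive)" / as a privileged user in the new
       namespace has the same effect as the C call shown above.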

       For a discussion of propagation types when moving mounts (MS_MOVE) and creating  bind  mounts  (MS_BIND),
       see Documentation/filesystems/sharedsubtree.rst.

   Restrictions on mount namespaces
       Note the following points with respect to mount namespaces:

       [1]  Each mount namespace has an owner user namespace.  As explained above, when a new mount namespace is
            created,  its  mount list is initialized as a copy of the mount list of another mount namespace.  If
            the new namespace and the namespace from which the mount list was copied are owned by different user
            namespaces, then the new mount namespace is considered less privileged.

       [2]  When creating a less privileged mount namespace, shared mounts are reduced to slave mounts.  This
            ensures that mount operations performed in less privileged mount namespaces will not propagate to
            more privileged mount namespaces.

       [3]  Mounts that come as a single unit from a more privileged mount namespace are locked together and may
            not be separated in a less privileged mount namespace.  (The unshare(2) CLONE_NEWNS operation brings
            across all of the mounts from the original mount namespace as a single unit,  and  recursive  mounts
            that propagate between mount namespaces propagate as a single unit.)

            In  this  context,  "may  not be separated" means that the mounts are locked so that they may not be
            individually unmounted.  Consider the following example:

                $ sudo sh
                # mount --bind /dev/null /etc/shadow
                # cat /etc/shadow       # Produces no output

            The above steps, performed in a more privileged mount namespace, have  created  a  bind  mount  that
            obscures the contents of the shadow password file, /etc/shadow.  For security reasons, it should not
            be  possible  to  umount(2) that mount in a less privileged mount namespace, since that would reveal
            the contents of /etc/shadow.

            Suppose we now create a new mount namespace owned by a new user namespace.  The new mount  namespace
            will  inherit  copies of all of the mounts from the previous mount namespace.  However, those mounts
            will be locked because the new mount namespace is less  privileged.   Consequently,  an  attempt  to
            umount(2) the mount fails as shown in the following step:

                # unshare --user --map-root-user --mount \
                               strace -o /tmp/log \
                               umount /etc/shadow
                umount: /etc/shadow: not mounted.
                # grep '^umount' /tmp/log
                umount2("/etc/shadow", 0)     = -1 EINVAL (Invalid argument)

            The error message from umount(8) is a little confusing, but the strace(1) output reveals that the
            underlying umount2(2) system call failed with the error EINVAL, which is the error that  the  kernel
            returns to indicate that the mount is locked.

            Note,  however,  that  it  is possible to stack (and unstack) a mount on top of one of the inherited
            locked mounts in a less privileged mount namespace:

                # echo 'aaaaa' > /tmp/a    # File to mount onto /etc/shadow
                # unshare --user --map-root-user --mount \
                    sh -c 'mount --bind /tmp/a /etc/shadow; cat /etc/shadow'
                aaaaa
                # umount /etc/shadow

            The final umount(8) command above, which is performed in the  initial  mount  namespace,  makes  the
            original /etc/shadow file once more visible in that namespace.

       [4]  Following  on from point [3], note that it is possible to umount(2) an entire subtree of mounts that
            propagated as a unit into a less  privileged  mount  namespace,  as  illustrated  in  the  following
            example.

            First,  we  create  new user and mount namespaces using unshare(1).  In the new mount namespace, the
            propagation type of all mounts is set to private.  We then create a shared bind mount at /mnt, and a
            small hierarchy of mounts underneath that mount.

                $ PS1='ns1# ' sudo unshare --user --map-root-user \
                                       --mount --propagation private bash
                ns1# echo $$        # We need the PID of this shell later
                778501
                ns1# mount --make-shared --bind /mnt /mnt
                ns1# mkdir /mnt/x
                ns1# mount --make-private -t tmpfs none /mnt/x
                ns1# mkdir /mnt/x/y
                ns1# mount --make-private -t tmpfs none /mnt/x/y
                ns1# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
                986 83 8:5 /mnt /mnt rw,relatime shared:344
                989 986 0:56 / /mnt/x rw,relatime
                990 989 0:57 / /mnt/x/y rw,relatime

            Continuing in the same shell session, we then create a second shell in a new user  namespace  and  a
            new (less privileged) mount namespace and check the state of the propagated mounts rooted at /mnt.

                ns1# PS1='ns2# ' unshare --user --map-root-user \
                                       --mount --propagation unchanged bash
                ns2# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
                1239 1204 8:5 /mnt /mnt rw,relatime master:344
                1240 1239 0:56 / /mnt/x rw,relatime
                1241 1240 0:57 / /mnt/x/y rw,relatime

            Of  note  in  the  above  output  is that the propagation type of the mount /mnt has been reduced to
            slave, as explained in point [2].  This means that submount events will propagate  from  the  master
            /mnt in "ns1", but propagation will not occur in the opposite direction.
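
            The propagation type can also be read out of mountinfo programmatically.  The sketch below is one
            possible way to do so and rests on the field layout described in proc(5): the mount point is the
            fifth field, and the optional fields (shared:N, master:N, and so on) start at the seventh field and
            end at a lone "-" separator.  The function name propagation_of is hypothetical.

```shell
#!/bin/sh
# propagation_of MOUNTPOINT (hypothetical helper): read mountinfo-style
# lines on standard input and print the propagation tags of each mount
# whose mount point is MOUNTPOINT.
propagation_of() {
    awk -v mp="$1" '
        $5 == mp {
            tags = ""
            # Optional fields run from field 7 up to the "-" separator.
            # The loop also copes with lines already trimmed by sed.
            for (i = 7; i <= NF && $i != "-"; i++)
                tags = tags (tags == "" ? "" : " ") $i
            # A mount with no propagation tags is a private mount.
            print (tags == "" ? "private" : tags)
        }'
}
```

            Applied to the two listings above, "propagation_of /mnt" reports shared:344 for "ns1" and master:344
            for "ns2", while /mnt/x is reported as private in both.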

            From  a  separate  terminal  window,  we  then use nsenter(1) to enter the mount and user namespaces
            corresponding to "ns1".  In that terminal window, we then  recursively  bind  mount  /mnt/x  at  the
            location /mnt/ppp.

                $ PS1='ns3# ' sudo nsenter -t 778501 --user --mount
                ns3# mount --rbind --make-private /mnt/x /mnt/ppp
                ns3# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
                986 83 8:5 /mnt /mnt rw,relatime shared:344
                989 986 0:56 / /mnt/x rw,relatime
                990 989 0:57 / /mnt/x/y rw,relatime
                1242 986 0:56 / /mnt/ppp rw,relatime
                1243 1242 0:57 / /mnt/ppp/y rw,relatime shared:518

            Because  the  propagation  type  of  the  parent  mount,  /mnt, was shared, the recursive bind mount
            propagated a small subtree of mounts under the slave mount /mnt into "ns2", as can  be  verified  by
            executing the following command in that shell session:

                ns2# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
                1239 1204 8:5 /mnt /mnt rw,relatime master:344
                1240 1239 0:56 / /mnt/x rw,relatime
                1241 1240 0:57 / /mnt/x/y rw,relatime
                1244 1239 0:56 / /mnt/ppp rw,relatime
                1245 1244 0:57 / /mnt/ppp/y rw,relatime master:518

            While  it is not possible to umount(2) a part of the propagated subtree (/mnt/ppp/y) in "ns2", it is
            possible to umount(2) the entire subtree, as shown by the following commands:

                ns2# umount /mnt/ppp/y
                umount: /mnt/ppp/y: not mounted.
                ns2# umount -l /mnt/ppp      # Succeeds...
                ns2# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
                1239 1204 8:5 /mnt /mnt rw,relatime master:344
                1240 1239 0:56 / /mnt/x rw,relatime
                1241 1240 0:57 / /mnt/x/y rw,relatime
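
            The set of mounts that make up such a subtree can be listed by following the parent mount IDs in
            mountinfo.  The following is only an illustrative sketch: the function name subtree_of is
            hypothetical, and the code assumes the proc(5) field layout (mount ID in the first field, parent ID
            in the second, mount point in the fifth).  If several mounts are stacked at the same mount point,
            all of them are treated as roots of the subtree.

```shell
#!/bin/sh
# subtree_of MOUNTPOINT (hypothetical helper): read mountinfo-style lines
# on standard input and print the mount point of every mount located at
# MOUNTPOINT or below it in the mount tree.
subtree_of() {
    awk -v mp="$1" '
        { id[NR] = $1; parent[NR] = $2; point[NR] = $5 }
        END {
            # Start from the mount(s) attached at the requested mount point.
            for (n = 1; n <= NR; n++)
                if (point[n] == mp)
                    in_tree[id[n]] = 1
            # Add children until nothing changes, so that the result
            # does not depend on the order of the input lines.
            do {
                changed = 0
                for (n = 1; n <= NR; n++)
                    if (!(id[n] in in_tree) && (parent[n] in in_tree)) {
                        in_tree[id[n]] = 1
                        changed = 1
                    }
            } while (changed)
            for (n = 1; n <= NR; n++)
                if (id[n] in in_tree)
                    print point[n]
        }'
}
```

            Run in "ns2" before the lazy unmount as "subtree_of /mnt/ppp < /proc/self/mountinfo", this would
            list /mnt/ppp and /mnt/ppp/y, the two mounts that can only be removed together.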

       [5]  The settings of the mount(2) flags MS_RDONLY, MS_NOSUID, MS_NOEXEC, and  the  "atime"  flags
            (MS_NOATIME,  MS_NODIRATIME,  MS_RELATIME)  become locked when propagated from a more privileged to
            a less privileged mount namespace, and may not be changed in the less privileged mount namespace.

            This point is illustrated in the following example where, in a more privileged mount  namespace,  we
            create a bind mount that is marked as read-only.  For security reasons, it should not be possible to
            make the mount writable in a less privileged mount namespace, and indeed the kernel prevents this:

                $ sudo mkdir /mnt/dir
                $ sudo mount --bind -o ro /some/path /mnt/dir
                $ sudo unshare --user --map-root-user --mount \
                               mount -o remount,rw /mnt/dir
                mount: /mnt/dir: permission denied.
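
            Whether a flag is still in effect can be confirmed from the per-mount options, which proc(5)
            documents as the sixth field of mountinfo.  The following sketch shows one way to test for a single
            option; the function name has_mount_flag is hypothetical, and if several lines match the mount
            point, the last one read decides the result.

```shell
#!/bin/sh
# has_mount_flag MOUNTPOINT FLAG (hypothetical helper): read mountinfo-style
# lines on standard input; succeed (exit status 0) if the per-mount options
# of the mount at MOUNTPOINT include FLAG, and fail (exit status 1) otherwise.
has_mount_flag() {
    awk -v mp="$1" -v flag="$2" '
        $5 == mp {
            found = 0
            # Field 6 holds the comma-separated per-mount options.
            n = split($6, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == flag)
                    found = 1
        }
        END { exit(found ? 0 : 1) }'
}
```

            For example, "has_mount_flag /mnt/dir ro < /proc/self/mountinfo" could be run in the less privileged
            mount namespace after the failed remount to check that the mount is still marked read-only.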

       [6]  A file or directory that is a mount point in one namespace but is not a mount point  in  another
            namespace may be renamed, unlinked, or removed (rmdir(2)) in the mount namespace in which it is not
            a mount point (subject to the usual permission checks).  Consequently, the mount point is removed in
            the mount namespace where it was a mount point.

            Previously (before Linux 3.18), attempting to unlink, rename, or remove a file or directory that was
            a mount point in another mount namespace would  result  in  the  error  EBUSY.   That  behavior  had
            technical  problems  of  enforcement (e.g., for NFS) and permitted denial-of-service attacks against
            more privileged users (i.e., preventing individual files from being updated by bind mounting on  top
            of them).

EXAMPLES

       See pivot_root(2).
