Bug 712089 - mount command should be able to handle mounting a file system as shared. [NEEDINFO]
Summary: mount command should be able to handle mounting a file system as shared.
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Depends On:
Blocks: 755216
Reported: 2011-06-09 13:34 UTC by Daniel Walsh
Modified: 2015-11-02 19:21 UTC
CC List: 18 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Last Closed: 2015-11-02 19:21:17 UTC
Type: ---
kzak: needinfo? (aviro)


Description Daniel Walsh 2011-06-09 13:34:21 UTC
As we see the proliferation of using namespaces, I believe the user would expect to see all updates to the mount table from their namespace.

Currently, /etc/init.d/sandbox is used only to mark / as a shared mount:

	mount --make-rshared / || return $? 

If systemd did this by default I can remove this init script altogether.
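For reference, mount --make-rshared / comes down to a single mount(2) call that changes the propagation type of / and, because of MS_REC, of every mount beneath it. Below is a minimal C sketch assuming glibc's <sys/mount.h>; propagation_flags() and make_root_rshared() are hypothetical helper names, and the table simply mirrors the --make-* options documented in mount(8):

```c
#include <stddef.h>
#include <string.h>
#include <sys/mount.h>

/* Map a mount(8) "--make-*" option (without the leading dashes) to the
 * mount(2) flags it corresponds to; returns 0 for unknown names. */
unsigned long propagation_flags(const char *opt)
{
    static const struct {
        const char *name;
        unsigned long flags;
    } table[] = {
        { "make-shared",      MS_SHARED },
        { "make-slave",       MS_SLAVE },
        { "make-private",     MS_PRIVATE },
        { "make-unbindable",  MS_UNBINDABLE },
        { "make-rshared",     MS_SHARED | MS_REC },
        { "make-rslave",      MS_SLAVE | MS_REC },
        { "make-rprivate",    MS_PRIVATE | MS_REC },
        { "make-runbindable", MS_UNBINDABLE | MS_REC },
    };

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(opt, table[i].name) == 0)
            return table[i].flags;
    return 0;
}

/* Equivalent of "mount --make-rshared /". For a propagation change,
 * mount(2) ignores the source, filesystem type and data arguments. */
int make_root_rshared(void)
{
    return mount("none", "/", NULL, propagation_flags("make-rshared"), NULL);
}
```

make_root_rshared() needs CAP_SYS_ADMIN; without it the call fails with EPERM.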

Unless / and the other file systems are mounted shared, changes to the mount table will not be seen by other namespaces. Here is an example of where this matters:

The xguest (kiosk) account uses pam_namespace to mount a temporary /tmp and $HOME. To do this, it needs to separate from the main namespace. If an xguest user plugs a USB stick into the machine, system processes (udev/hal or udisks) mount the USB disk in root's namespace. If the "/" file system is not shared, the xguest user will not see the USB stick.

Comment 1 Daniel Berrangé 2011-09-21 11:54:01 UTC
IMHO any code relying on the existing private vs shared state inherited from the primary OS filesystem is flawed. Code should always explicitly set the desired sharing mode in its namespace before doing anything else.

So if xguest wants a guarantee that mounts from the primary OS will propagate into the private namespace, then after doing unshare(CLONE_NEWNS), pam_namespace must explicitly re-mount the existing root filesystem passing the MS_SHARED|MS_REC flags. If it does that, there is no need for the sandbox initscript, nor for systemd to set a default sharing policy that suits only xguest's needs and not other apps.

So IMHO the sensible thing to do is to just pick the safest default in systemd. Defaulting to MS_PRIVATE for / is more robust against broken applications which might create a new namespace and then forget to make their namespace private, leaking mounts from the host in the process.

Comment 2 Tomas Mraz 2011-09-21 12:17:04 UTC
Hmm, that's interesting. The concept of namespaces and shared hierarchies is so convoluted that only a few assorted kernel hackers understand it, so can you please help me specify what pam_namespace should really do?

Assume the following situation:

1. / is private by default.
2. There is a regular private mount /media/flash in the parent namespace. It should be shared from the parent namespace with the child namespaces created by pam_namespace. Also, pam_namespace does not know that this private mount exists in the parent namespace, and it can be unmounted/remounted at any time.
3. There is a /tmp-inst/alice directory that should be bind-mounted privately over /tmp in the child namespace.

Is this the correct mode of operation?
1. unshare(CLONE_NEWNS)
2. mount("/", "/", NULL, MS_SHARED|MS_REC, NULL)
3. mount("/tmp", "/tmp", NULL, MS_BIND, NULL)
4. mount("/tmp", "/tmp", NULL, MS_PRIVATE, NULL)
5. mount("/tmp-inst/alice", "/tmp", NULL, MS_BIND, NULL)

Or can we somehow avoid the bind mount in step 3? For example, would this be correct?
1. unshare(CLONE_NEWNS)
2. mount("/tmp-inst/alice", "/tmp", NULL, MS_BIND, NULL) /* still private here */
3. mount("/", "/", NULL, MS_SHARED|MS_REC, NULL) /* this will not make the /tmp shared? or will it? */
That is, what does marking / as shared affect?
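The first five-step sequence above could look like this in C. It is only a sketch under Tomas's assumptions (/ private in the parent); struct mount_step, build_tmp_steps() and setup_private_tmp() are hypothetical names, and the two directory arguments stand in for /tmp-inst/alice and /tmp. Keeping the mount calls in a table makes the order easy to inspect without running anything privileged:

```c
#define _GNU_SOURCE            /* for unshare() and CLONE_NEWNS */
#include <sched.h>
#include <stddef.h>
#include <sys/mount.h>

/* One mount(2) call of the sequence. */
struct mount_step {
    const char *source;
    const char *target;
    unsigned long flags;
};

/* Fill 'steps' with steps 2-5; 'inst_dir' and 'tmp_dir' stand in for
 * /tmp-inst/alice and /tmp. Returns the number of steps. */
size_t build_tmp_steps(const char *inst_dir, const char *tmp_dir,
                       struct mount_step steps[4])
{
    size_t n = 0;

    /* 2: mark the child's copies shared (this does not pull in mounts
     *    from a private parent, see Comment 3) */
    steps[n++] = (struct mount_step){ "/", "/", MS_SHARED | MS_REC };
    /* 3: bind tmp_dir onto itself so it becomes a mount point; the
     *    propagation type can only be changed on a mount point */
    steps[n++] = (struct mount_step){ tmp_dir, tmp_dir, MS_BIND };
    /* 4: make that mount point private */
    steps[n++] = (struct mount_step){ tmp_dir, tmp_dir, MS_PRIVATE };
    /* 5: bind the per-user directory over it */
    steps[n++] = (struct mount_step){ inst_dir, tmp_dir, MS_BIND };
    return n;
}

/* Step 1 plus steps 2-5. Needs CAP_SYS_ADMIN; returns 0, or -1 with errno
 * set. A failure part-way leaves the new namespace half set up. */
int setup_private_tmp(const char *inst_dir, const char *tmp_dir)
{
    struct mount_step steps[4];
    size_t n = build_tmp_steps(inst_dir, tmp_dir, steps);

    if (unshare(CLONE_NEWNS) == -1)          /* 1 */
        return -1;
    for (size_t i = 0; i < n; i++)
        if (mount(steps[i].source, steps[i].target, NULL,
                  steps[i].flags, NULL) == -1)
            return -1;
    return 0;
}
```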

Comment 3 Daniel Berrangé 2011-09-21 13:33:37 UTC
Sorry, I was mistaken in what I described, it only works in certain directions / certain combinations.

If the original namespace is MS_SHARED, then remounting with MS_PRIVATE after unshare will prevent stuff from propagating from the original namespace into the child.

The reverse, however, does not work. If the original namespace is MS_PRIVATE, then remounting with MS_SHARED after unshare will *not* enable mounts from the parent to propagate to the child. It will only apply to further nested child namespaces.

So unfortunately I think Dan Walsh is correct. If we need mounts from the primary namespace to propagate into child namespaces, then / needs to be shared from the start. Any apps not wanting this must explicitly mount with MS_PRIVATE after unsharing the mount namespace.
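Following that advice, a program that does not want the host's mount events can detach itself right after unshare(). A minimal sketch, assuming glibc; enter_mount_ns() is a hypothetical name. It also accepts MS_SLAVE, which, as far as we understand the shared-subtree semantics, lets the namespace keep receiving mounts from the parent (such as the xguest USB stick) without its own mounts leaking back:

```c
#define _GNU_SOURCE            /* for unshare() and CLONE_NEWNS */
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <sys/mount.h>

/* Move the calling process into a new mount namespace and set the
 * propagation type of all its mounts. 'type' must be MS_PRIVATE (no
 * events in either direction) or MS_SLAVE (receive from the parent,
 * send nothing back). Returns 0, or -1 with errno set. */
int enter_mount_ns(unsigned long type)
{
    if (type != MS_PRIVATE && type != MS_SLAVE) {
        errno = EINVAL;
        return -1;
    }
    if (unshare(CLONE_NEWNS) == -1)
        return -1;             /* typically EPERM without CAP_SYS_ADMIN */
    /* unshare() copies the parent's mounts along with their propagation
     * type, so with / rshared in the parent the copies are still its
     * peers until this call. */
    return mount(NULL, "/", NULL, MS_REC | type, NULL);
}
```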

Comment 4 Tomas Mraz 2011-09-21 13:47:08 UTC
That's unfortunate as doing it the other way around would be much more elegant.

Comment 5 Fedora Admin XMLRPC Client 2011-10-20 16:25:24 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 6 Daniel Walsh 2011-11-04 15:37:54 UTC
I think the current thinking on this is that this should be a mount option, that could be specified in the /etc/fstab and then we could eliminate race conditions.

Lennart states that after discussing this issue:

"We would like to see this implemented in the kernel itself as a mount option like any other, that is applied to a mount point. We want this to guarantee that the propagation mode is applied atomically to the mounts at the time of mounting, and not done in a second step, to avoid any races. Also, with that in place we'd have a nice way to define the default, via a mount option for / in /etc/fstab. We wouldn't have to change anything in systemd then, but most likely /bin/mount would need a few minor updates.

All other options sound much worse, since you either make things racy or things might end up being implemented in a layer where they better shouldn't be implemented."

Comment 7 Daniel Walsh 2011-11-04 16:05:43 UTC
Al, Karel says that currently you cannot combine other mount options with a propagation flag (MS_SHARED, MS_PRIVATE, etc.) in the same syscall, for example:

mount /dev/sda1 /mnt -o foo,bar,shared

So mount would have to make two separate syscalls, one to set foo,bar and one to set shared, which opens a potential race condition. Can the kernel be fixed to accept them all in a single call?
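The two-call sequence described above can be sketched as follows. As we read mount(2), a call that carries one of the propagation flags is treated as a propagation change on an existing mount, which is why the flags have to be split. split_mount_flags() and mount_with_propagation() are hypothetical helpers, not util-linux code:

```c
#include <stddef.h>
#include <sys/mount.h>

/* Propagation-type flags. As we read mount(2), a call carrying any of
 * these changes the propagation of an existing mount instead of
 * creating a new one, so they cannot ride along with other options. */
#define PROPAGATION_MASK (MS_SHARED | MS_SLAVE | MS_PRIVATE | MS_UNBINDABLE)

/* Split the flags for "-o foo,bar,shared" into the mount call and the
 * follow-up propagation call. */
void split_mount_flags(unsigned long all, unsigned long *mount_flags,
                       unsigned long *prop_flags)
{
    unsigned long prop = all & PROPAGATION_MASK;

    /* MS_REC makes the propagation change recursive ... */
    *prop_flags = prop ? (prop | (all & MS_REC)) : 0;
    /* ... but also belongs to the mount call for recursive binds. */
    *mount_flags = all & ~(PROPAGATION_MASK | ((all & MS_BIND) ? 0 : MS_REC));
}

/* Two syscalls: first the mount itself, then the propagation change. */
int mount_with_propagation(const char *source, const char *target,
                           const char *fstype, unsigned long all_flags,
                           const void *data)
{
    unsigned long mflags, pflags;

    split_mount_flags(all_flags, &mflags, &pflags);
    if (mount(source, target, fstype, mflags, data) == -1)
        return -1;
    /* Race window: the mount exists here but does not yet have the
     * requested propagation type. */
    if (pflags && mount(NULL, target, NULL, pflags, NULL) == -1)
        return -1;
    return 0;
}
```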

Comment 8 Tomas Mraz 2011-11-22 13:21:56 UTC
We really need to fix the mount() syscall in kernel to allow multiple flags. The bug 755216 is another reason why.

Comment 9 Josh Boyer 2011-11-22 14:03:26 UTC
This obviously needs to happen upstream.  Is someone actively working on a patch to do this?

(Marking as FutureFeature so it doesn't get auto-closed)

Comment 10 Joe Orton 2013-01-08 16:43:58 UTC
Given this bug, I'm inclined to say that PrivateTmp=true is an inappropriate default for many services. It is very surprising behaviour that "mount /foo" does not show up in my running httpd daemon.

I take it this isn't getting fixed any time soon?

Comment 11 Daniel Walsh 2013-01-08 22:16:29 UTC
I thought that systemd was going to take care of this, i.e. run

mount --make-rshared /

At boot time?

Comment 12 Michal Schmidt 2013-01-09 13:32:19 UTC
systemd already does the equivalent of this in early boot.

It does it since version 190. It's also been backported to F17 in systemd-44-23.fc17.

Comment 13 Daniel Walsh 2013-01-09 15:37:10 UTC
Strange, I am seeing similar problems with xguest not seeing USB sticks when they are inserted. If I then log in as root and execute

mount --make-rshared /

and log in as xguest again, the disk starts to show up.

Looking at mountinfo shows everything shared.
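Those shared markers are the optional "shared:N" fields in /proc/self/mountinfo, documented in proc(5); the optional fields end at a lone "-". A minimal C sketch for spotting mounts that are not shared, assuming that field layout; mountinfo_is_shared() and count_unshared_mounts() are hypothetical helpers:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if a mountinfo line carries a "shared:N" optional field.
 * Fields 1-6 are fixed (mount ID .. mount options); optional fields
 * follow and end at a lone "-". Spaces inside paths are escaped as
 * \040, so splitting on spaces is safe. */
bool mountinfo_is_shared(const char *line)
{
    const char *p = line;
    int field = 0;

    for (;;) {
        while (*p == ' ')
            p++;
        if (*p == '\0' || *p == '\n')
            return false;               /* end of line, no "shared:" seen */
        const char *start = p;
        while (*p != '\0' && *p != ' ' && *p != '\n')
            p++;
        size_t len = (size_t)(p - start);

        if (++field <= 6)
            continue;                   /* still in the fixed fields */
        if (len == 1 && *start == '-')
            return false;               /* separator: optional fields done */
        if (len > 7 && strncmp(start, "shared:", 7) == 0)
            return true;
    }
}

/* Count entries in a mountinfo file that are not shared; returns -1 if
 * the file cannot be opened. Lines longer than the buffer are not
 * handled in this sketch. */
int count_unshared_mounts(const char *path)
{
    char line[4096];
    int n = 0;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f))
        if (!mountinfo_is_shared(line))
            n++;
    fclose(f);
    return n;
}
```

With / rshared, count_unshared_mounts("/proc/self/mountinfo") should return 0; pointing it at /proc/<pid>/mountinfo of a process in the xguest session shows that namespace's view instead.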

Joe, if you stop the apache server and run

systemctl stop httpd.service
mount --make-rshared /
systemctl start httpd.service
mount /foo

Does httpd see the new mount points?
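The same question can also be answered by looking at the daemon's own mount table in /proc/<pid>/mountinfo. A sketch, assuming Linux procfs and read access to that file; mountinfo_mount_point() and pid_sees_mount() are hypothetical helpers:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Copy field 5 (the mount point) of a mountinfo line into 'out'.
 * Returns false if the line has fewer than five fields or it does not fit. */
bool mountinfo_mount_point(const char *line, char *out, size_t outlen)
{
    const char *p = line;

    for (int field = 1; field <= 5; field++) {
        while (*p == ' ')
            p++;
        const char *start = p;
        while (*p != '\0' && *p != ' ' && *p != '\n')
            p++;
        if (p == start)
            return false;               /* ran out of fields */
        if (field == 5) {
            size_t len = (size_t)(p - start);
            if (len >= outlen)
                return false;
            memcpy(out, start, len);
            out[len] = '\0';
            return true;
        }
    }
    return false;
}

/* True if 'mount_point' appears in /proc/<pid>/mountinfo, i.e. the mount
 * is visible in that process's namespace. Spaces in mount points appear
 * as "\040" there, so 'mount_point' must use the same escaping. */
bool pid_sees_mount(pid_t pid, const char *mount_point)
{
    char path[64], line[4096], mp[4096];
    bool found = false;
    FILE *f;

    snprintf(path, sizeof path, "/proc/%ld/mountinfo", (long)pid);
    f = fopen(path, "r");
    if (!f)
        return false;
    while (!found && fgets(line, sizeof line, f))
        found = mountinfo_mount_point(line, mp, sizeof mp) &&
                strcmp(mp, mount_point) == 0;
    fclose(f);
    return found;
}
```

With the daemon's PID (for example from systemctl show -p MainPID httpd.service), pid_sees_mount(pid, "/foo") reports whether /foo is visible inside httpd's namespace.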

Comment 14 Josh Boyer 2015-11-02 19:21:17 UTC
Closing this.  There's been no new comment in over 2 years.  Please open a new bug if there's something actionable on this still.
