As the use of namespaces proliferates, I believe users would expect to see all updates to the mount table from within their namespace.
Currently, /etc/init.d/sandbox is used only to mark / as a shared mount:
mount --make-rshared / || return $?
If systemd did this by default, I could remove this init script altogether.
Unless / and the other file systems are mounted shared, changes to the mount table will not be seen by other namespaces. Here is an example of where this matters:
The xguest (kiosk) account uses pam_namespace to mount a temporary /tmp and $HOME. To do this, it needs its own namespace, separate from the main one. If an xguest user plugs a USB stick into the machine, a system process (udev/hal or udisks) mounts the USB disk in root's namespace. If the "/" file system is not shared, the xguest user will not see the USB stick.
IMHO any code relying on the existing private vs shared state inherited from the primary OS filesystem is flawed. Code should always explicitly set the desired sharing mode in its namespace before doing anything else.
So if xguest wants a guarantee that mounts from the primary OS will propagate into the private namespace, then after doing unshare(CLONE_NEWNS), pam_namespace must explicitly re-mount the existing root filesystem passing the MS_SHARED|MS_REC flags. If it does that, there is no need for the sandbox initscript, nor for systemd to set a default sharing policy that suits xguest's needs but not other apps'.
So IMHO the sensible thing to do is to just pick the safest default in systemd. Defaulting to MS_PRIVATE for / is more robust against broken applications which might create a new namespace and then forget to make their namespace private, leaking mounts from the host in the process.
Hmmm, that's interesting. The concept of namespaces and shared hierarchies is so convoluted that only a few assorted kernel hackers understand it, so can you please help me specify what pam_namespace should really do?
Assume the following situation:
1. / is private by default
2. There is a regular private mount /media/flash in the parent namespace. It should be shared from the parent namespace with the child namespaces created by pam_namespace. pam_namespace does not know that this mount exists in the parent namespace, and it can be unmounted/remounted at any time.
3. There is a /tmp-inst/alice directory that should be mounted as a private bind mount over /tmp in the child namespace.
Is this the correct mode of operation?
2. mount("/", "/", NULL, MS_SHARED|MS_REC, NULL)
3. mount("/tmp", "/tmp", NULL, MS_BIND, NULL)
4. mount("/tmp", "/tmp", NULL, MS_PRIVATE, NULL)
5. mount("/tmp-inst/alice", "/tmp", NULL, MS_BIND, NULL)
Or can we somehow avoid the bind mount in step 3? For example, would this be correct?
2. mount("/tmp-inst/alice", "/tmp", NULL, MS_BIND, NULL) /* still private here */
3. mount("/", "/", NULL, MS_SHARED|MS_REC, NULL) /* this will not make the /tmp shared? or will it? */
That is, what exactly does marking / as shared affect?
Sorry, I was mistaken in what I described; it only works in certain directions / certain combinations.
If the original namespace is MS_SHARED, then remounting with MS_PRIVATE after unshare will prevent mounts from propagating from the original namespace into the child.
The reverse does not, however, work. If the original namespace is MS_PRIVATE, then remounting with MS_SHARED after unshare will *not* enable mounts from the parent to propagate to the child. It will only apply to further nested child namespaces.
So unfortunately I think Dan Walsh is correct. If we need mounts from the primary namespace to propagate into child namespaces, then we need to have / shared from the start. Any apps not wanting this must explicitly remount with MS_PRIVATE after unsharing the mount namespace.
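For an application that does not want mounts from the primary namespace propagating into it, the explicit opt-out described above could look roughly like this. It is a minimal sketch, not pam_namespace's actual code: the function name is hypothetical, the call needs CAP_SYS_ADMIN, and it assumes / was already made shared at boot.

```c
#define _GNU_SOURCE
#include <sched.h>      /* unshare(), CLONE_NEWNS */
#include <stddef.h>     /* NULL */
#include <sys/mount.h>  /* mount(), MS_REC, MS_PRIVATE */

/* Hypothetical helper: move the calling process into its own mount
 * namespace and stop mount/unmount events propagating in either
 * direction between it and the parent namespace.
 * Returns 0 on success, -1 with errno set on failure. */
int enter_private_mount_ns(void)
{
    /* New mount namespace holding a copy of the parent's mount table.
     * Copies of shared mounts normally join the parent's peer groups,
     * so propagation is still active at this point. */
    if (unshare(CLONE_NEWNS) < 0)
        return -1;

    /* Recursively mark every mount in the new namespace private,
     * which removes it from those peer groups. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) < 0)
        return -1;

    return 0;
}
```

Doing this first, before setting up any per-namespace mounts, keeps those later mounts from leaking back into the host.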
That's unfortunate as doing it the other way around would be much more elegant.
This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.
I think the current thinking is that this should be a mount option that could be specified in /etc/fstab; that would eliminate the race conditions.
After discussing this issue, Lennart states:
"We would like to see this implemented in the kernel itself as a mount option like any other, that is applied to a mount point. We want this to guarantee that the propagation mode is applied atomically to the mounts at the time of mounting, and not done in a second step, to avoid any races. Also, with that in place we'd have a nice way to define the default, via a mount option for / in /etc/fstab. We wouldn't have to change anything in systemd then, but most likely /bin/mount would need a few minor updates.
All other options sound much worse, since you either make things racy or things might end up being implemented in a layer where they shouldn't be implemented."
Al, Karel says that currently you cannot combine other mount options with a propagation flag (MS_SHARED, MS_PRIVATE, etc.) in the same syscall, e.g.:
mount /dev/sda1 /mnt -o foo,bar,shared
So mount would have to make two separate syscalls, one to set foo,bar and one to set shared, which is a potential race condition. Can the kernel be fixed to allow them all at the same time?
We really need to fix the mount() syscall in the kernel to allow multiple flags. Bug 755216 is another reason why.
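To make the race concrete: without kernel support for mixing the two kinds of flags, "mount -o foo,bar,shared" ends up as two mount() calls, roughly as below. This is a sketch only; the helper name is hypothetical, and foo,bar stand for whatever ordinary flags and data string the caller passes in.

```c
#include <stddef.h>     /* NULL */
#include <sys/mount.h>  /* mount(), MS_SHARED */

/* Hypothetical helper showing the two-step sequence /bin/mount needs
 * for something like "mount /dev/sda1 /mnt -o foo,bar,shared".
 * Returns 0 on success, -1 with errno set on failure. */
int mount_then_share(const char *source, const char *target,
                     const char *fstype, unsigned long flags,
                     const void *data)
{
    /* Syscall 1: create the mount with the ordinary flags and options. */
    if (mount(source, target, fstype, flags, data) < 0)
        return -1;

    /* Race window: the mount already exists with the propagation type it
     * got at creation time, and other processes can use it or mount on
     * top of it before syscall 2 runs. */

    /* Syscall 2: change only the propagation type of the new mount.
     * If this fails, the mount from syscall 1 is left in place. */
    if (mount(NULL, target, NULL, MS_SHARED, NULL) < 0)
        return -1;

    return 0;
}
```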
This obviously needs to happen upstream. Is someone actively working on a patch to do this?
(Marking as FutureFeature so it doesn't get auto-closed)
Given this bug, I'm inclined to say that PrivateTmp=true is an inappropriate default for many services. It is very surprising behaviour that a "mount /foo" does not show up in my running httpd daemon.
I take it this isn't getting fixed any time soon?
I thought that systemd was going to take care of this, i.e. running
mount --make-rshared /
At boot time?
systemd calls this in early boot:
mount(NULL, "/", NULL, MS_REC|MS_SHARED, NULL);
It has done so since version 190. It's also been backported to F17 in systemd-44-23.fc17.
Strange. I am seeing similar problems with xguest not seeing USB sticks when they are inserted. If I then log in as root and execute
mount --make-rshared /
and log in as xguest again, the disk starts to show up.
Looking at mountinfo shows everything as shared.
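A mount's sharing state is visible in the optional fields of its mountinfo line: a "shared:N" tag means the mount is shared in peer group N, while a mount without a shared: tag does not propagate to peers. A minimal C sketch that pulls the shared peer group out of one line; the function name is hypothetical.

```c
#include <stdlib.h>   /* strtol() */
#include <string.h>   /* strspn(), strcspn(), strncmp() */

/* Hypothetical helper. Given one line of /proc/<pid>/mountinfo, return
 * the peer-group ID N from a "shared:N" optional field, 0 if the line
 * has no shared: tag (the mount is not shared), or -1 if the "-"
 * separator that ends the optional fields is never found. */
long mountinfo_shared_group(const char *line)
{
    const char *p = line;
    long group = 0;
    int field = 0;

    for (;;) {
        p += strspn(p, " \n");              /* skip separators */
        if (*p == '\0')
            return -1;                      /* malformed: no "-" separator */
        size_t len = strcspn(p, " \n");     /* length of the current field */

        /* Fields 0-5: mount ID, parent ID, major:minor, root,
         * mount point, mount options. Optional fields start at 6. */
        if (field >= 6) {
            if (len == 1 && p[0] == '-')
                return group;               /* end of optional fields */
            if (len > 7 && strncmp(p, "shared:", 7) == 0)
                group = strtol(p + 7, NULL, 10);
        }
        p += len;
        field++;
    }
}
```

Feeding it each line read with fgets() from /proc/self/mountinfo, and picking the line whose fifth field (the mount point) is "/", shows whether / is shared in the calling process's namespace.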
Joe, if you stop the Apache server, make / shared, and start it again:
systemctl stop httpd.service
mount --make-rshared /
systemctl start httpd.service
does httpd see the new mount points?
Closing this. There's been no new comment in over 2 years. Please open a new bug if there's something actionable on this still.