Description of problem:
One of the first things that is required when setting up a container with a private root filesystem is to unmount all filesystems inherited from the host OS. For most filesystems this works without trouble, but for autofs this is not the case.

For demonstration, consider an autofs map with one direct mount and one indirect mount:

  # cat /etc/auto.master
  /net -hosts
  /- /etc/auto.marrow
  # cat /etc/auto.marrow
  /mnt/demo marrow.example.com:/var/lib/libvirt/images

When autofs initially starts, these mount points are visible:

  # grep -E ' (/net|/mnt)' /proc/mounts
  -hosts /net autofs rw,relatime,fd=6,pgrp=2938,timeout=300,minproto=5,maxproto=5,indirect 0 0
  /etc/auto.marrow /mnt/demo autofs rw,relatime,fd=12,pgrp=2938,timeout=300,minproto=5,maxproto=5,direct 0 0

The attached demo program creates a container and attempts to unmount the requested filesystem inside the container. With a direct mount which has not yet been triggered, it fails:

  # ./autofsdemo /mnt/demo
  We are the parent
  We are the container!
  Found mount point 1 /mnt/demo auto.marrow autofs rw,relatime,fd=13,pgrp=1363,timeout=300,minproto=5,maxproto=5,direct
  Umount point 1 /mnt/demo
  Could not umount /mnt/demo: Operation not permitted

If I then trigger the mount:

  # ls /mnt/demo
  debian-6.0.2.1-amd64-netinst.iso debian6-x86_64.img f16_x86_64.img migtest.img rhel6x86_64.img sanlock tck
  # grep /mnt/demo /proc/mounts
  auto.marrow /mnt/demo autofs rw,relatime,fd=13,pgrp=1363,timeout=300,minproto=5,maxproto=5,direct 0 0
  marrow.gsslab.fab.redhat.com:/var/lib/libvirt/images/ /mnt/demo nfs rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.33.8.114,mountvers=3,mountport=35386,mountproto=udp,local_lock=none,addr=10.33.8.114 0 0
  # ./autofsdemo /mnt/demo
  We are the parent
  We are the container!
  Found mount point 1 /mnt/demo auto.marrow autofs rw,relatime,fd=13,pgrp=1363,timeout=300,minproto=5,maxproto=5,direct
  Found mount point 2 /mnt/demo marrow.gsslab.fab.redhat.com:/var/lib/libvirt/images/ nfs rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.33.8.114,mountvers=3,mountport=35386,mountproto=udp,local_lock=none,addr=10.33.8.114
  Umount point 2 /mnt/demo
  Could not umount /mnt/demo: Operation not permitted

We see I can't even unmount the NFS server mount, let alone the autofs mount.

I believe indirect mounts will have similar problems, but I am currently unable to check that, pending resolution of bug 745781.

Attempting to use umount2() with MNT_DETACH also fails, so I can't simply hide the mounts from the container either.

Version-Release number of selected component (if applicable):
kernel-3.1.0-0.rc9.git0.0.fc16.x86_64
autofs-5.0.6-2.fc16.x86_64

How reproducible:
Always

Steps to Reproduce:
1. See above + attached demo program
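For readers without the attachment, the "Found mount point N" step in the output above can be approximated as follows. This is only a sketch, not the attached demo program: the struct and function names are hypothetical, and it assumes the usual /proc/mounts layout of "source target fstype options dump pass" with no escaped spaces in the first three fields.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one /proc/mounts entry (not from the attachment). */
struct mount_entry {
    char source[256];
    char target[256];
    char fstype[64];
};

/* Parse the first three fields of one /proc/mounts line.
 * Returns 1 on success, 0 if the line has fewer than three fields. */
static int parse_mount_line(const char *line, struct mount_entry *out)
{
    return sscanf(line, "%255s %255s %63s",
                  out->source, out->target, out->fstype) == 3;
}

/* Collect, in file order, every entry mounted exactly at `target`.
 * Returns the number of entries stored in `found` (at most `max`). */
static size_t find_mounts_at(FILE *fp, const char *target,
                             struct mount_entry *found, size_t max)
{
    char line[4096];
    struct mount_entry e;
    size_t n = 0;

    while (n < max && fgets(line, sizeof(line), fp) != NULL) {
        if (parse_mount_line(line, &e) && strcmp(e.target, target) == 0)
            found[n++] = e;
    }
    return n;
}
```

A caller would open /proc/mounts, call find_mounts_at(), and then walk the result from the last entry to the first, since later mounts are stacked on top of earlier ones. That matches the output above, where the NFS mount (point 2) is tried before the autofs mount underneath it.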
Created attachment 527963 [details] Demo unmount inside a container
Have you sent a query on this upstream at all?
I have mentioned this in private email to Ian Kent a few weeks back.
Adding Ian and David to CC.
A little more investigation shows that this is where the umount failure occurs:

  SYSCALL_DEFINE2(umount, char __user *, name, int, flags)
  {
          .....
          retval = -EPERM;
          if (!capable(CAP_SYS_ADMIN)) {
                  printk(KERN_INFO "umount: CAP_SYS_ADMIN check failed\n");
                  goto dput_and_out;
          }
          ...
  }

The printk() is mine and triggers when I run the test.

It seems to me that root user does not inherit CAP_SYS_ADMIN over the clone() call. The same test is done during mount, so that must be forbidden as well.

I don't doubt there will be other challenges, but we can't even start to work out what they are if umount and mount are denied by the CAP_SYS_ADMIN check.

A quick Google search shows that this difficulty is well known, but I didn't see any sensible way of overcoming it. Mind you, the posts were dated some months ago and that may have changed since.

Ian
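The same condition can probably be confirmed from userspace, without patching the kernel, by looking at the effective capability mask the process sees. The sketch below parses the CapEff line of /proc/self/status and tests bit 21, which is CAP_SYS_ADMIN in <linux/capability.h>. The function names and the DEMO_ constant are hypothetical and not part of the demo program or of any library.

```c
#include <stdint.h>
#include <stdio.h>

/* CAP_SYS_ADMIN is capability number 21 in <linux/capability.h>. */
#define DEMO_CAP_SYS_ADMIN 21

/* Parse a "CapEff:\t<hex>" line from /proc/<pid>/status.
 * Returns 1 and stores the mask on success, 0 for any other line. */
static int parse_cap_eff_line(const char *line, uint64_t *mask)
{
    unsigned long long v;

    if (sscanf(line, "CapEff: %llx", &v) != 1)
        return 0;
    *mask = (uint64_t)v;
    return 1;
}

/* Returns 1 if capability number `cap` is set in `mask`, else 0. */
static int has_cap(uint64_t mask, int cap)
{
    return (int)((mask >> cap) & 1u);
}

/* Returns 1 if this process has CAP_SYS_ADMIN in its effective set,
 * 0 if it does not, and -1 if /proc/self/status could not be read. */
static int self_has_cap_sys_admin(void)
{
    FILE *fp = fopen("/proc/self/status", "r");
    char line[256];
    uint64_t mask = 0;
    int found = 0;

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof(line), fp) != NULL) {
        if (parse_cap_eff_line(line, &mask)) {
            found = 1;
            break;
        }
    }
    fclose(fp);
    return found ? has_cap(mask, DEMO_CAP_SYS_ADMIN) : -1;
}
```

Calling self_has_cap_sys_admin() in the child right after clone(), and again in the parent, would show whether the two processes really differ in the capability that the umount path checks.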
> It seems to me that root user does not inherit CAP_SYS_ADMIN over
> the clone() call.

This is very odd, capabilities are untouched across clone(), and containers definitely have the ability to mount()/umount() filesystems, because in libvirt, by the time we hit the autofs problem, we've already mounted and unmounted many, many other filesystems since the clone() call.
> This is very odd, capabilities are untouched across clone(),

Arrrrrrrrrrrggggh, my fault :-(

My demo program has the CLONE_NEWUSER flag set. Prior to 3.x kernels, this was effectively a no-op, but it now has special meaning wrt containers.

Please modify the demo program in this BZ to remove the CLONE_NEWUSER flag, and you should then see the real autofs problem.
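The change requested above amounts to masking CLONE_NEWUSER out of the flags passed to clone(). A minimal sketch follows. The helper name is hypothetical, and since the attachment's real flag set is not shown in this report, the sketch simply assumes the demo wants a new mount namespace (CLONE_NEWNS) and SIGCHLD as the exit signal.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sched.h>
#include <signal.h>

/* Fallbacks matching the Linux UAPI values, in case the libc headers in use
 * do not expose these (for example if _GNU_SOURCE took effect too late). */
#ifndef CLONE_NEWNS
#define CLONE_NEWNS 0x00020000
#endif
#ifndef CLONE_NEWUSER
#define CLONE_NEWUSER 0x10000000
#endif

/* Hypothetical helper: take whatever flags the caller asked for, drop
 * CLONE_NEWUSER, and make sure a private mount namespace is requested
 * and that the parent is notified with SIGCHLD when the child exits. */
static int container_clone_flags(int requested)
{
    return (requested & ~CLONE_NEWUSER) | CLONE_NEWNS | SIGCHLD;
}
```

The result would be passed as the flags argument of clone(). With CLONE_NEWUSER gone, the child stays in the parent's user namespace, so the CAP_SYS_ADMIN check in the umount path is evaluated the same way as for root on the host.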
Ok, after fixing the demo program, it appears that there is *not* any autofs problem in the kernel 3.1 or later. It must have been fixed sometime between 3.0 (where I originally saw the problem) and 3.1. Sorry for the noise.
(In reply to comment #9)
> Ok, after fixing the demo program, it appears that there is *not* any autofs
> problem in the kernel 3.1 or later. It must have been fixed sometime between
> 3.0 (where I originally saw the problem) and 3.1. Sorry for the noise.

Yes, there was a change that I believe went into 3.1.0-rc9.

This is the only other (we had one other) reported problem caused by the initial vfs-automount implementation. That means I'll need to remember this if (when) I lobby to re-introduce that semantic behaviour.

Thanks, is there anything else we need to investigate?
WRT F16/rawhide, I believe we're all OK with autofs + LXC now.