1390057 – unshare --mount-proc fails with CLONE_NEWUSER without CLONE_NEWPID and fork

Summary: unshare --mount-proc fails with CLONE_NEWUSER without CLONE_NEWPID and fork

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	util-linux
Sub Component:
Version:	25
Hardware:	Unspecified
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Karel Zak
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-10-31 05:16 UTC by Arcadiy Ivanov
Modified:	2017-02-27 11:12 UTC
CC List:	4 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2017-02-27 11:12:20 UTC
Type:	Bug
Embargoed:
Dependent Products:

Description Arcadiy Ivanov 2016-10-31 05:16:21 UTC

Description of problem:

This might not be util-linux at all, but I don't know the appropriate component (systemd?). Please reassign if necessary.

First, I want to acknowledge that `/proc` is mounted as shared in Fedora. However, Fedora's behavior with respect to namespaces created with CLONE_NEWUSER, CLONE_NEWNS and CLONE_NEWPID does not appear to adhere to the specification in unshare(1):

--mount-proc[=mountpoint]
              Just before running the program, mount the proc filesystem at
              mountpoint (default is /proc). This is useful when creating a new
              PID namespace. It also implies creating a new mount namespace
              since the /proc mount would otherwise mess up existing programs
              on the system. The new proc filesystem is explicitly mounted as
              private (with MS_PRIVATE|MS_REC).

From everything I can gather, a process using CLONE_NEWUSER and CLONE_NEWNS, without CLONE_NEWPID and a subsequent fork, can never mount a namespace-specific `/proc` in Fedora 24. I can reproduce this consistently in my own code as well.

Observe:

bash-4.3$ unshare -U --mount-proc
unshare: mount /proc failed: Operation not permitted 

<CLONE_NEWUSER + `--mount-proc` implies CLONE_NEWNS, and unshare has all capabilities in the new user namespace. Why didn't we mount /proc?>

bash-4.3$ unshare -Up --mount-proc
unshare: mount /proc failed: Operation not permitted 

<OK: CLONE_NEWPID but no fork. I buy that; for the unshare process itself nothing has changed compared with the first example.>

bash-4.3$ unshare -Upf --mount-proc

<Success! But why?>

On Fedora it matters that unshare's `--mount-proc` first performs the equivalent of `--make-rprivate` on `/proc`, and it does:

/* Make procmnt and everything below it private, then mount a new proc on it. */
if (procmnt &&
    (mount("none", procmnt, NULL, MS_PRIVATE|MS_REC, NULL) != 0 ||
     mount("proc", procmnt, "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, NULL) != 0))
	err(EXIT_FAILURE, _("mount %s failed"), procmnt);

With `unshare -Umr` (no CLONE_NEWPID and no fork), the first mount operation succeeds, but the second one fails:

bash-4.3$ unshare -Umr     

<-r is required because the bash exec'ed by unshare doesn't inherit its capabilities>

-bash-4.3# mount --make-rprivate /proc
-bash-4.3# mount -t proc proc /proc
mount: permission denied - ???
-bash-4.3# exit
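For comparison, the `unshare -Umr` session above can be reproduced directly with system calls. This is a minimal sketch, not util-linux code; it assumes a kernel with unprivileged user namespaces enabled, and all helper names (`write_file`, `format_id_map`, `map_to_root`, `try_mount_proc_without_pidns`) are hypothetical. The work happens in a forked child so the caller's own namespaces stay untouched. Under the rules Eric describes in comment 4, the final proc mount is expected to fail with EPERM, because the child is still in the original PID namespace, which its new user namespace does not own.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: write a short string to a file such as /proc/self/uid_map. */
static int write_file(const char *path, const char *text)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    size_t len = strlen(text);
    ssize_t n = write(fd, text, len);
    int saved = errno;
    close(fd);
    if (n != (ssize_t)len) {
        errno = n < 0 ? saved : EIO;
        return -1;
    }
    return 0;
}

/* Format a single-ID map entry "<inside> <outside> 1\n"; returns snprintf's result. */
int format_id_map(char *buf, size_t size, unsigned inside, unsigned outside)
{
    return snprintf(buf, size, "%u %u 1\n", inside, outside);
}

/* Map the caller's UID/GID to 0 in its new user namespace, roughly like `unshare -r`. */
static int map_to_root(uid_t uid, gid_t gid)
{
    char line[64];
    /* Unprivileged writers must deny setgroups before writing gid_map;
       the file is missing before Linux 3.19, so ENOENT is tolerated. */
    if (write_file("/proc/self/setgroups", "deny") != 0 && errno != ENOENT)
        return -1;
    format_id_map(line, sizeof line, 0, (unsigned)uid);
    if (write_file("/proc/self/uid_map", line) != 0)
        return -1;
    format_id_map(line, sizeof line, 0, (unsigned)gid);
    return write_file("/proc/self/gid_map", line);
}

/* Returns 0 if a new proc was mounted, the errno of the first failing step
   otherwise, or -1 if the child did not exit normally. */
int try_mount_proc_without_pidns(void)
{
    uid_t uid = getuid();
    gid_t gid = getgid();
    pid_t pid = fork();
    if (pid < 0)
        return errno;
    if (pid == 0) {
        /* CLONE_NEWUSER | CLONE_NEWNS only: no CLONE_NEWPID and no second fork. */
        if (unshare(CLONE_NEWUSER | CLONE_NEWNS) != 0 || map_to_root(uid, gid) != 0)
            _exit(errno ? errno : EIO);
        /* Equivalent of `mount --make-rprivate /proc`: succeeds in the bug report. */
        if (mount("none", "/proc", NULL, MS_PRIVATE | MS_REC, NULL) != 0)
            _exit(errno);
        /* Equivalent of `mount -t proc proc /proc`: EPERM expected here. */
        if (mount("proc", "/proc", "proc", MS_NOSUID | MS_NOEXEC | MS_NODEV, NULL) != 0)
            _exit(errno);
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return errno;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would print the result with strerror(); on a kernel behaving as described in this bug, the value should be EPERM.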

Version-Release number of selected component (if applicable):
2.24-2.fc24.x86_64

How reproducible:
Always

Steps to Reproduce:
Above

Actual results:
A process with CLONE_NEWUSER and CLONE_NEWNS cannot mount its own /proc without CLONE_NEWPID and a fork.

Expected results:
A process with CLONE_NEWUSER and CLONE_NEWNS should be able to mount its own /proc.

Additional info:

From "man 7 mount_namespaces":
NOTES:
       Since, when one uses unshare(1) to create a mount namespace, the goal
       is commonly to provide full isolation of the mount points in the new
       namespace, unshare(1) (since util-linux version 2.27) in turn
       reverses the step performed by systemd(1), by making all mount points
       private in the new namespace.  That is, unshare(1) performs the
       equivalent of the following in the new mount namespace:

           mount --make-rprivate /
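Whether a mount point is currently shared (as systemd leaves `/` and `/proc` on Fedora) or private (after the step above) can be read from the optional fields of `/proc/self/mountinfo`, where a shared mount carries a `shared:N` tag (see proc(5)). The sketch below uses hypothetical function names and assumes the mount point is given in mountinfo's escaped form, so paths containing spaces are not handled.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>

/* Returns 1 if one /proc/self/mountinfo line describes a shared mount,
   0 if not, -1 if the line is malformed. Line format (proc(5)):
   ID parentID major:minor root mountpoint options [optional...] - fstype source superopts */
int mountinfo_line_is_shared(const char *line)
{
    char copy[4096];
    if (strlen(line) >= sizeof copy)
        return -1;
    strcpy(copy, line);

    char *save = NULL;
    int field = 0;
    for (char *tok = strtok_r(copy, " \n", &save); tok; tok = strtok_r(NULL, " \n", &save)) {
        field++;
        if (field <= 6)
            continue;                 /* skip the six fixed leading fields */
        if (strcmp(tok, "-") == 0)
            return 0;                 /* end of optional fields, no "shared:" tag */
        if (strncmp(tok, "shared:", 7) == 0)
            return 1;
    }
    return -1;                        /* no "-" separator: malformed line */
}

/* Returns 1 if the topmost mount on `mountpoint` is shared, 0 if not,
   -1 if it was not found or mountinfo could not be read. */
int mount_is_shared(const char *mountpoint)
{
    FILE *f = fopen("/proc/self/mountinfo", "r");
    if (!f)
        return -1;
    char line[4096], mnt[4096];
    int result = -1;
    while (fgets(line, sizeof line, f)) {
        /* The fifth field is the mount point; later lines are stacked on top. */
        if (sscanf(line, "%*s %*s %*s %*s %4095s", mnt) == 1 && strcmp(mnt, mountpoint) == 0)
            result = mountinfo_line_is_shared(line);
    }
    fclose(f);
    return result;
}
```

Run before and after `unshare -m`, `mount_is_shared("/proc")` should flip from 1 to 0 on a systemd host, since unshare(1) reverses systemd's shared propagation inside the new namespace.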

From "man 7 user_namespaces":

       Holding CAP_SYS_ADMIN within the user namespace associated with a
       process's mount namespace allows that process to create bind mounts
       and mount the following types of filesystems:

           * /proc (since Linux 3.8)
           [snip-snip]

This last excerpt directly specifies the expected behavior.

Comment 1 Karel Zak 2016-12-01 12:34:21 UTC

> unshare --user --mount-proc

In this case you want to create a user namespace (root-like permissions inside the namespace) and mount the *shared system* /proc. I doubt this makes sense from a security point of view, because as root in the user namespace you could access everything that was originally hidden in the system /proc.

Let's imagine:

$ cat /proc/1/maps
cat: /proc/1/maps: Permission denied

If "unshare --user --mount-proc" succeeded, then I could bypass the "Permission denied" and access other processes. Right?


IMHO, requiring CLONE_NEWPID in order to also unshare /proc makes sense.

And I guess that CLONE_NEWPID requires fork() to create an "init" process in the new PID namespace; without fork() the new /proc would not contain any processes.

  $ unshare --user --pid --fork --mount-proc

seems correct. 

Does it make sense? ;-)
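The `unshare --user --pid --fork --mount-proc` sequence above can be sketched in C as follows. This is a hypothetical reimplementation, not util-linux code, assuming unprivileged user namespaces are enabled; the helper names are illustrative. It shows the point about fork(): the process that calls unshare(2) with CLONE_NEWPID stays in its old PID namespace, only its next child becomes PID 1 of the new one, and it is that child which can mount proc. An outer fork keeps the caller's own namespaces untouched.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: write a short string to a /proc/self file. */
static int write_file(const char *path, const char *text)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    size_t len = strlen(text);
    ssize_t n = write(fd, text, len);
    int saved = errno;
    close(fd);
    if (n != (ssize_t)len) {
        errno = n < 0 ? saved : EIO;
        return -1;
    }
    return 0;
}

/* Format a single-ID map entry "<inside> <outside> 1\n". */
int format_id_map(char *buf, size_t size, unsigned inside, unsigned outside)
{
    return snprintf(buf, size, "%u %u 1\n", inside, outside);
}

/* Map the caller's UID/GID to 0 in the new user namespace (like `unshare -r`). */
static int map_to_root(uid_t uid, gid_t gid)
{
    char line[64];
    if (write_file("/proc/self/setgroups", "deny") != 0 && errno != ENOENT)
        return -1;
    format_id_map(line, sizeof line, 0, (unsigned)uid);
    if (write_file("/proc/self/uid_map", line) != 0)
        return -1;
    format_id_map(line, sizeof line, 0, (unsigned)gid);
    return write_file("/proc/self/gid_map", line);
}

/* Runs in the forked child, which should be PID 1 of the new PID namespace. */
static int mount_private_proc_as_init(void)
{
    if (getpid() != 1)
        return ESRCH;
    if (mount("none", "/proc", NULL, MS_PRIVATE | MS_REC, NULL) != 0 ||
        mount("proc", "/proc", "proc", MS_NOSUID | MS_NOEXEC | MS_NODEV, NULL) != 0)
        return errno;
    /* In the new proc instance, /proc/self should resolve to "1". */
    char target[16];
    ssize_t n = readlink("/proc/self", target, sizeof target - 1);
    if (n < 0)
        return errno;
    target[n] = '\0';
    return strcmp(target, "1") == 0 ? 0 : EINVAL;
}

/* Returns 0 on success, an errno value on failure, -1 on abnormal exit. */
int run_with_private_proc(void)
{
    uid_t uid = getuid();
    gid_t gid = getgid();
    pid_t outer = fork();
    if (outer < 0)
        return errno;
    if (outer == 0) {
        if (unshare(CLONE_NEWUSER | CLONE_NEWNS | CLONE_NEWPID) != 0 || map_to_root(uid, gid) != 0)
            _exit(errno ? errno : EIO);
        pid_t inner = fork();           /* the --fork step */
        if (inner < 0)
            _exit(errno);
        if (inner == 0)
            _exit(mount_private_proc_as_init());
        int st;
        if (waitpid(inner, &st, 0) < 0)
            _exit(errno);
        _exit(WIFEXITED(st) ? WEXITSTATUS(st) : EIO);
    }
    int status;
    if (waitpid(outer, &status, 0) < 0)
        return errno;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

On a system where `unshare -Upf --mount-proc` works, run_with_private_proc() should return 0.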

Comment 2 Arcadiy Ivanov 2016-12-01 14:46:29 UTC

Karel, 

I'm not asking for access elevation on a shared /proc.

The issue is that, per the code, --mount-proc implies CLONE_NEWNS. So the actual combination is CLONE_NEWUSER and CLONE_NEWNS, i.e. a user NS plus a mount NS. If I have NEWNS, I should be able to mount my own /proc in any case.

If what you're saying is correct, then the only way to mount /proc is with CLONE_NEWPID, yet the documentation is adamant that /proc can be mounted with NEWUSER and NEWNS; the 4.08 man-pages even give specific examples of this behavior (http://man7.org/linux/man-pages/man7/user_namespaces.7.html).

CLONE_NEWUSER inherently cannot exceed the access permissions of the outer NS and the topmost user, so `cat /proc/1/maps` is only possible if you already have access to it in the parent NS. Holding CAP_SYS_ADMIN in your new UNS will not magically grant you access to a resource that the constraining NS cannot access. And if I'm mounting a /proc within my own NS and UNS, my /proc should, I presume, be limited to what I have access to in my new NS, i.e. behavior similar to hidepid=2 but constrained by namespace (I haven't found documentation that actually specifies what a private /proc should contain under NEWUSER + NEWNS without NEWPID).

And if you're right, then there is a huuuuuuuge ;) documentation problem. As I pointed out, `man 7 user_namespaces` states unambiguously that:

       Holding CAP_SYS_ADMIN within the user namespace associated with a
       process's mount namespace allows that process to create bind mounts
       and mount the following types of filesystems:

           * /proc (since Linux 3.8)

And significant portions of `man 7 mount_namespaces` would have to be rewritten (http://man7.org/linux/man-pages/man7/mount_namespaces.7.html). Incidentally, the `mount_namespaces(7)` page is entirely absent from F25.

Comment 3 Karel Zak 2016-12-02 10:11:22 UTC

The --mount-proc option was designed for CLONE_NEWPID. I cannot imagine a usable --mount-proc without CLONE_NEWNS; I guess nobody wants to mess up the system /proc with another, newly unshared /proc. You don't have to use this option; you can mount your /proc manually after the unshare(1) call.

Frankly, I'm not sure about the relationship between CLONE_NEWUSER and CLONE_NEWPID. Comment #1 was my guess :-) It would be better to ask Eric (added to CC:) for more information.

Comment 4 Eric W. Biederman 2016-12-02 15:28:54 UTC

To set the stage for answering the question, I need to add a little bit of information. The man pages are not the specification and frequently omit fiddly details that are too difficult to explain clearly. There have been a couple of changes, especially to the mounting of proc and sysfs, to keep things secure, and I don't believe the man pages have always been updated.

The case being tested is what can be done with a user namespace, where getting all of the fiddly details exactly right is important so that people can't use what user namespaces allow to break preexisting setups.

The practical rules for mounting proc and sysfs as root in a user namespace are:
- The user namespace root must have created a mount namespace for the mounts
  to happen in.
- If you don't have a namespace or namespaces that would affect the contents of proc or sysfs when they are mounted, you need to use a bind mount.
- If, apart from the change in the filesystem's contents, the view of proc or sysfs that you would get is not equivalent to your existing view (i.e., someone more privileged than you has mounted over parts of proc or sysfs in significant ways), then mounting a new proc or sysfs is not allowed.
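The three rules above can be summarised as a small decision function. This is a simplified, hypothetical model of the rules as stated in this comment, not the kernel's implementation; the type and field names are invented for illustration, and it says nothing about whether a bind mount still works when rule 3 denies a new mount.

```c
#include <stdbool.h>

/* Hypothetical verdicts for "may root in this user namespace mount proc/sysfs?" */
enum mount_verdict {
    MOUNT_NEW_ALLOWED,  /* a fresh proc/sysfs instance may be mounted */
    MOUNT_BIND_ONLY,    /* rule 2: no owned namespace changes the contents */
    MOUNT_DENIED        /* rule 1 or rule 3 */
};

struct userns_state {
    bool created_mount_ns;  /* the userns root created the mount namespace */
    bool owns_content_ns;   /* owns a namespace that changes the contents,
                               e.g. a PID namespace in the case of proc */
    bool view_equivalent;   /* existing proc/sysfs not significantly
                               overmounted by someone more privileged */
};

enum mount_verdict may_mount_proc(const struct userns_state *s)
{
    if (!s->created_mount_ns)
        return MOUNT_DENIED;     /* rule 1 */
    if (!s->owns_content_ns)
        return MOUNT_BIND_ONLY;  /* rule 2 */
    if (!s->view_equivalent)
        return MOUNT_DENIED;     /* rule 3 */
    return MOUNT_NEW_ALLOWED;
}
```

Applied to the commands from the description: `unshare -U --mount-proc` and `unshare -Up --mount-proc` both leave the mounting process in a PID namespace it does not own, so the model yields MOUNT_BIND_ONLY, while `unshare -Upf --mount-proc` runs the mount in the forked PID 1 of an owned PID namespace and yields MOUNT_NEW_ALLOWED.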

Similarly, the mount options for proc and sysfs are restricted to the mount options of the mount that is already visible.
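One reading of this restriction is that a new proc or sysfs mount may not drop any restrictive flag that the visible mount already has. The sketch below models only that reading; the macro and function names are hypothetical, the flag set is an assumption, and the real kernel checks differ in detail between versions. It assumes the visible /proc is mounted nosuid,nodev,noexec, as is typical on systemd hosts.

```c
#include <stdbool.h>
#include <sys/mount.h>

/* Assumed set of flags a new proc/sysfs mount must not drop (hypothetical). */
#define RESTRICTIVE_FLAGS (MS_RDONLY | MS_NOSUID | MS_NODEV | MS_NOEXEC)

/* True if every restrictive flag on the visible mount is also requested. */
bool mount_options_allowed(unsigned long visible_flags, unsigned long requested_flags)
{
    unsigned long required = visible_flags & RESTRICTIVE_FLAGS;
    return (requested_flags & required) == required;
}
```

Under this model, the MS_NOSUID|MS_NOEXEC|MS_NODEV flags that unshare(1) passes for its proc mount keep all of those restrictions, whereas a request with only MS_NOSUID would be rejected.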

Proc and sysfs have to be restricted so that a less privileged user cannot gain more access to sensitive files that a more privileged user made unavailable.
 
In short the rule for proc and sysfs is as close to a bind mount as possible.

Comment 5 Karel Zak 2017-02-27 11:12:20 UTC

Thanks Eric.

I have added a short note to the unshare(1) man page that mounting sysfs and procfs in a user namespace has to be restricted to prevent access to sensitive data. IMHO it's good enough :-)
