501718 – /dev/pts must use the 'newinstance' mount flag to avoid security problem with containers

Bug 501718 - /dev/pts must use the 'newinstance' mount flag to avoid security problem with containers

Summary: /dev/pts must use the 'newinstance' mount flag to avoid security problem with...

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2009-05-20 13:36 UTC by Daniel Berrangé
Modified:	2015-01-15 12:09 UTC (History)
CC List:	19 users (show)
Fixed In Version:
Clone Of:
Clones:	527483 (view as bug list)
Environment:
Last Closed:	2015-01-15 12:09:16 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Description Daniel Berrangé 2009-05-20 13:36:51 UTC

Description of problem:
The Linux kernel now allows multiple instances of 'devpts' filesystem to be mounted on a host, and the pty indexes of each instance are independant. This feature allows Linux containers to be given private 'devpts' instances, preventing them from accessing PTYs from the host OS.

This is only secure, however, if every single mount of 'devpts' uses the 'newinstance' flag, including that of the initial host OS.

If the host OS does not use 'newinstance' mount option, then a container can still get access to the host's PTYs, which is clearly a security problem

When using 'newinstance', the /dev/ptmx file must also be symlinked, or bind-mounted to /dev/pts/ptmx

There is more information about the required setup in the kernel tree documentation, 'filesystems/devpts.txt'. The key point is in the last paragraph

[quote]
7. A mount of devpts without the 'newinstance' option results in binding to
initial kernel mount. This behavior while preserving legacy semantics,
does not provide strict isolation in a container environment. i.e by
mounting devpts without the 'newinstance' option, a container could
get visibility into the 'host' or root container's devpts.

To workaround this and have strict isolation, all mounts of devpts,
including the mount in the root container, should use the newinstance
option.
[/quote]

In Fedora 12, libvirt and the LXC toolchain will both be creating containers using 'newinstance' and want this to provide secure isolation of devpts. Thus we require that the host OS devpts is switched to use 'newinstance' and symlink of /dev/ptmx to /dev/pts/ptmx.

RPM claims the 'setup' RPM owns /etc/fstab, but that seems to be a ghost file, and AFAICT anaconda is responsible for initial creation. Thus I'm filing this bug against anaconda - feel free to reassign if this is the wrong place.

Version-Release number of selected component (if applicable):
anaconda-11.5.0.54-1.fc11

How reproducible:
Always

Steps to Reproduce:
1. Install a new Fedora host
2. 'mount | grep devpts'
3. ls -l /dev/ptmx

Actual results:
# mount | grep devpts
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
# ls -l /dev/ptmx
crw-rw-rw-. 1 root tty 5, 2 2009-05-20 14:35 /dev/ptmx

Expected results:
# mount | grep devpts
devpts on /dev/pts type devpts (rw,gid=5,mode=620,newinstance)
# ls -l /dev/ptmx
lrwxrwxrwx. 1 root root 8 2009-05-20 14:36 ptmx -> pts/ptmx

Additional info:

Comment 1 Daniel Berrangé 2009-05-20 13:47:59 UTC

Just to clarify, I do not want this for F11. It is too risky to make such a change for F11 at this stage.  Fedora 12 is my desired target for resolution.

Comment 2 Jeremy Katz 2009-05-20 15:34:24 UTC

If we want it mounted that way by default, then we should really change the defaults in the kernel.  Not require changes in anaconda, livecd-creator, appliance-creator, ...

Comment 3 Bug Zapper 2010-03-15 12:37:02 UTC

This bug appears to have been reported against 'rawhide' during the Fedora 13 development cycle.
Changing version to '13'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 4 Bug Zapper 2011-06-02 18:04:39 UTC

This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 5 Daniel Berrangé 2012-09-25 09:41:04 UTC

Latest proposal for upstream kernel is to kill off the 'newinstance' flag and make all /dev/pts instances private by default.

https://lkml.org/lkml/2012/9/22/142

Comment 6 H. Peter Anvin 2013-02-12 05:43:38 UTC

This request has been pending for three years.  This still needs to be done.

Comment 7 Dave Jones 2013-02-12 16:21:39 UTC

That lkml conversation didn't seem to have any particular outcome afaics.

perhaps just making systemd do it itself would get this happening quicker.

Comment 8 Kay Sievers 2013-03-07 12:28:07 UTC

Daniel,
what is the exact reason to need "newinstance" in the init namespace? Isn't
it sufficient to use that in the container setup only?

We never share /dev from the host with the containers for obvious other
reasons, so why is there any change needed in the init /dev setup?

Comment 9 Daniel Berrangé 2013-03-07 12:36:42 UTC

This is best illustrated with an example

$ virsh -c lxc:/// console busy
Connected to domain busy
Escape character is ^]
# ls /dev/
console  full  ptmx  random  stdin   tty1  tty3  urandom
fd	 null  pts   stderr  stdout  tty2  tty4  zero


So inside my container I've got a private devpts instance which was mounted with the 'newinstance' flag.

But inside the container, if I'm evil I can do

# mkdir /mnt
# mount -t devpts none /mnt
# ls /mnt/
0   12	15  18	20  23	26  3	34  4	42  45	6  9
10  13	16  19	21  24	27  32	36  40	43  46	7  ptmx
11  14	17  2	22  25	28  33	37  41	44  5	8

and so I can access the host's devpts instance.

If the host OS had used the 'newinstance' flag in the initial mount namespace, then the container would not have been able to access the host's devpts instance.

Comment 10 Daniel Berrangé 2013-03-07 12:37:50 UTC

Opps, cut + paste the wrong bit - that first example should have been showing /dev/pts not /dev. The containers devpts is:

# ls /dev/pts
0  1  2  3  ptmx

Comment 11 Kay Sievers 2013-03-07 12:39:19 UTC

Being able to call mount() in a container and protecting against "evil"
makes not much sense to me. If you can mount random stuff in a container
you have lost anyway, right?

Comment 12 Daniel Berrangé 2013-03-07 12:55:15 UTC

Historically, yes, if you have mount() privileges it is mostly game unless you have SELinux further controlling what the container can mount.

A goal of the user namespace work though is to make it possible for a container to be able use mount() safely, without us needing SELinux. The uid mapping between container & host protects the host filesystems in general, so even if you can mount the host devpts user, you would be blocked from using existing the ptys there. The caveat is that the admin may have changed the /dev/pts/ptmx to be world writable. That would allow container access, unless the host /dev/pts were using the 'newinstance' flag. Fedora /dev/pts/ptmx is restricted c--------- so we're safe by default from that issue.

So I don't think lack of 'newinstance' is a security problem out of the box (once LXC users usernamespacs), but it could become one if the admin were to unwittingly change permissions of /dev/pts/ptmx for some reason, not realizing the consequences for containers.

Comment 13 Kay Sievers 2013-03-07 13:41:26 UTC

I see.

The status quo is kind of sufficient for container setups which restrict
mount(), the problem with devpts is not much bigger than with /sys or /dev
itself, where the container could reconfigure the entire hardware.

For the complete picture with "newinstance" in the init namespace with
devtmpfs, we would need to change the kernel. Systemd does not want to
play games with kernel-provided "dead" device nodes, does not want to
delete them, or overmount them. That needs all to be cleaned up on the
kernel side, not in userspace.

I guess the kernel could:
1)
make:
  CONFIG_DEVPTS_MULTIPLE_INSTANCES=y
the default and not have it configurable at all.

2)
Make "ptmxmode=0666" the default and ignore "newinstance" entirely.

3)
get rid of: drivers/tty/pty.c
  device_create(tty_class, NULL, MKDEV(TTYAUX_MAJOR, 2), NULL, "ptmx");

4)
add a function to devtmpfs.c to be able to create symlinks in the devtmpfs
filesystem, and and call that to provide:
  /dev/ptmx --> /dev/pts/ptmx
instead of creating the (now useless) device node there.

If strict backwards compat is needed, the current behaviour can be
encapsulated in a new legacy CONFIG option for !DEVTMPFS systems.

Comment 14 H. Peter Anvin, Intel 2013-03-07 20:57:59 UTC

This seems reasonable to me, although we might want to leave an option in the kernel that provides strict legacy compatibility.  In other words trade CONFIG_DEVPTS_MULTIPLE_INSTANCES for CONFIG_DEVPTS_PTMX_LEGACY and default to off.

This will instantly break any user that relies on a fixed /dev, but the fix for them is quite simple (change /dev/ptmx from a device node to a symlink, and you're good to go.)

Comment 15 Fedora End Of Life 2013-04-03 18:41:59 UTC

This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 16 Josh Boyer 2013-05-13 18:46:48 UTC

Kay, do you know if anyone did what you suggested?  I'm guessing no, but I don't know if anyone is working on it.

I'm tempted to close this as UPSTREAM.

Comment 17 Kay Sievers 2013-05-13 20:16:36 UTC

I'm not aware of any patches or current work.

Comment 18 H. Peter Anvin 2013-05-13 20:38:59 UTC

Since this is a proposal that involved breaking existing userspace, it needs to be pitched to the kernel community as such.  Linus Torvalds or Al Viro may have not-so-friendly things to say about it.

Comment 19 Kay Sievers 2013-05-13 20:53:52 UTC

I'm pretty sure, it could be wrapped in a kernel config option, if that
makes people with a static /dev happier.

Comment 20 Daniel Berrangé 2013-12-19 17:23:37 UTC

FYI from my POV this bug can just be closed WONTFIX. Without user namespaces enabled, nothing about LXC is at all secure if you are uid==0 in the container, so accessing devpts is the least of our worries. If you do have user namespaces enabled for a container, then you won't be permitted to mount a devpts instance without the newinstance flag avoiding this issue


static struct dentry *devpts_mount(struct file_system_type *fs_type,
        int flags, const char *dev_name, void *data)
{
....snip....
        /* Require newinstance for all user namespace mounts to ensure
         * the mount options are not changed.
         */
        if ((current_user_ns() != &init_user_ns) && !opts.newinstance)
                return ERR_PTR(-EINVAL);

Comment 21 H. Peter Anvin, Intel 2013-12-19 18:32:11 UTC

For newinstance to work properly, all instances including the root instance should have it enabled, though.

The sad part is that there seems to be a huge stalemate with regards to where this should be done.

Comment 22 Daniel Berrangé 2015-01-15 12:09:16 UTC

I'm closing this NOTABGU on basis that user namespaces are required in order to get any credible level of security for containers, at which point the lack of the newinstance flag in the host OS is not an problem.

Note You need to log in before you can comment on or make changes to this bug.