Bug 501718

Summary: /dev/pts must use the 'newinstance' mount flag to avoid security problem with containers
Product: [Fedora] Fedora Reporter: Daniel Berrange <berrange>
Component: kernelAssignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED NOTABUG QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: rawhideCC: crobinso, gansalmon, hpa, h.peter.anvin, itamar, james, johannbg, jonathan, kernel-maint, lnykryn, madhu.chinakonda, metherid, msekleta, plautrba, rmaximo, rvokal, systemd-maint, vanmeeuwen+fedora, vpavlin
Target Milestone: ---Keywords: FutureFeature, Triaged
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
: 527483 (view as bug list) Environment:
Last Closed: 2015-01-15 07:09:16 EST Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:

Description Daniel Berrange 2009-05-20 09:36:51 EDT
Description of problem:
The Linux kernel now allows multiple instances of 'devpts' filesystem to be mounted on a host, and the pty indexes of each instance are independant. This feature allows Linux containers to be given private 'devpts' instances, preventing them from accessing PTYs from the host OS.

This is only secure, however, if every single mount of 'devpts' uses the 'newinstance' flag, including that of the initial host OS.

If the host OS does not use 'newinstance' mount option, then a container can still get access to the host's PTYs, which is clearly a security problem

When using 'newinstance', the /dev/ptmx  file must also be symlinked, or bind-mounted to /dev/pts/ptmx


There is more information about the required setup in the kernel tree documentation, 'filesystems/devpts.txt'. The key point is in the last paragraph

[quote]
7. A mount of devpts without the 'newinstance' option results in binding to
   initial kernel mount.  This behavior while preserving legacy semantics,
   does not provide strict isolation in a container environment. i.e by
   mounting devpts without the 'newinstance' option, a container could
   get visibility into the 'host' or root container's devpts.
   
   To workaround this and have strict isolation, all mounts of devpts,
   including the mount in the root container, should use the newinstance
   option.
[/quote]


In Fedora 12, libvirt and the LXC toolchain will both be creating containers using 'newinstance' and want this to provide secure isolation of devpts. Thus we require that the host OS devpts is switched to use 'newinstance' and symlink of /dev/ptmx to /dev/pts/ptmx.

RPM claims the 'setup' RPM owns /etc/fstab, but that seems to be a ghost file, and AFAICT anaconda is responsible for initial creation. Thus I'm filing this bug against anaconda - feel free to reassign if this is the wrong place.

Version-Release number of selected component (if applicable):
   anaconda-11.5.0.54-1.fc11

How reproducible:
Always

Steps to Reproduce:
1. Install a new Fedora host
2. 'mount | grep devpts'
3. ls -l /dev/ptmx
  
Actual results:
# mount | grep devpts
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
# ls -l /dev/ptmx
crw-rw-rw-. 1 root tty 5, 2 2009-05-20 14:35 /dev/ptmx


Expected results:
# mount | grep devpts
devpts on /dev/pts type devpts (rw,gid=5,mode=620,newinstance)
# ls -l /dev/ptmx
lrwxrwxrwx. 1 root root 8 2009-05-20 14:36 ptmx -> pts/ptmx


Additional info:
Comment 1 Daniel Berrange 2009-05-20 09:47:59 EDT
Just to clarify, I do not want this for F11. It is too risky to make such a change for F11 at this stage.  Fedora 12 is my desired target for resolution.
Comment 2 Jeremy Katz 2009-05-20 11:34:24 EDT
If we want it mounted that way by default, then we should really change the defaults in the kernel.  Not require changes in anaconda, livecd-creator, appliance-creator, ...
Comment 3 Bug Zapper 2010-03-15 08:37:02 EDT
This bug appears to have been reported against 'rawhide' during the Fedora 13 development cycle.
Changing version to '13'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 4 Bug Zapper 2011-06-02 14:04:39 EDT
This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 5 Daniel Berrange 2012-09-25 05:41:04 EDT
Latest proposal for upstream kernel is to kill off the 'newinstance' flag and make all /dev/pts instances private by default.

https://lkml.org/lkml/2012/9/22/142
Comment 6 H. Peter Anvin 2013-02-12 00:43:38 EST
This request has been pending for three years.  This still needs to be done.
Comment 7 Dave Jones 2013-02-12 11:21:39 EST
That lkml conversation didn't seem to have any particular outcome afaics.

perhaps just making systemd do it itself would get this happening quicker.
Comment 8 Kay Sievers 2013-03-07 07:28:07 EST
Daniel,
what is the exact reason to need "newinstance" in the init namespace? Isn't
it sufficient to use that in the container setup only?

We never share /dev from the host with the containers for obvious other
reasons, so why is there any change needed in the init /dev setup?
Comment 9 Daniel Berrange 2013-03-07 07:36:42 EST
This is best illustrated with an example

$ virsh -c lxc:/// console busy
Connected to domain busy
Escape character is ^]
# ls /dev/
console  full  ptmx  random  stdin   tty1  tty3  urandom
fd	 null  pts   stderr  stdout  tty2  tty4  zero


So inside my container I've got a private devpts instance which was mounted with the 'newinstance' flag.

But inside the container, if I'm evil I can do

# mkdir /mnt
# mount -t devpts none /mnt
# ls /mnt/
0   12	15  18	20  23	26  3	34  4	42  45	6  9
10  13	16  19	21  24	27  32	36  40	43  46	7  ptmx
11  14	17  2	22  25	28  33	37  41	44  5	8

and so I can access the host's devpts instance.

If the host OS had used the 'newinstance' flag in the initial mount namespace, then the container would not have been able to access the host's devpts instance.
Comment 10 Daniel Berrange 2013-03-07 07:37:50 EST
Opps, cut + paste the wrong bit - that first example should have been showing /dev/pts not /dev. The containers devpts is:

# ls /dev/pts
0  1  2  3  ptmx
Comment 11 Kay Sievers 2013-03-07 07:39:19 EST
Being able to call mount() in a container and protecting against "evil"
makes not much sense to me. If you can mount random stuff in a container
you have lost anyway, right?
Comment 12 Daniel Berrange 2013-03-07 07:55:15 EST
Historically, yes, if you have mount() privileges it is mostly game unless you have SELinux further controlling what the container can mount.

A goal of the user namespace work though is to make it possible for a container to be able use mount() safely, without us needing SELinux. The uid mapping between container & host protects the host filesystems in general, so even if you can mount the host devpts user, you would be blocked from using existing the ptys there. The caveat is that the admin may have changed the /dev/pts/ptmx to be world writable. That would allow container access, unless the host /dev/pts were using the 'newinstance' flag. Fedora /dev/pts/ptmx is restricted c--------- so we're safe by default from that issue.

So I don't think lack of 'newinstance' is a security problem out of the box (once LXC users usernamespacs), but it could become one if the admin were to unwittingly change permissions of /dev/pts/ptmx for some reason, not realizing the consequences for containers.
Comment 13 Kay Sievers 2013-03-07 08:41:26 EST
I see.

The status quo is kind of sufficient for container setups which restrict
mount(), the problem with devpts is not much bigger than with /sys or /dev
itself, where the container could reconfigure the entire hardware.

For the complete picture with "newinstance" in the init namespace with
devtmpfs, we would need to change the kernel. Systemd does not want to
play games with kernel-provided "dead" device nodes, does not want to
delete them, or overmount them. That needs all to be cleaned up on the
kernel side, not in userspace.

I guess the kernel could:
1)
make:
  CONFIG_DEVPTS_MULTIPLE_INSTANCES=y
the default and not have it configurable at all.

2)
Make "ptmxmode=0666" the default and ignore "newinstance" entirely.

3)
get rid of: drivers/tty/pty.c
  device_create(tty_class, NULL, MKDEV(TTYAUX_MAJOR, 2), NULL, "ptmx");

4)
add a function to devtmpfs.c to be able to create symlinks in the devtmpfs
filesystem, and and call that to provide:
  /dev/ptmx --> /dev/pts/ptmx
instead of creating the (now useless) device node there.

If strict backwards compat is needed, the current behaviour can be
encapsulated in a new legacy CONFIG option for !DEVTMPFS systems.
Comment 14 H. Peter Anvin, Intel 2013-03-07 15:57:59 EST
This seems reasonable to me, although we might want to leave an option in the kernel that provides strict legacy compatibility.  In other words trade CONFIG_DEVPTS_MULTIPLE_INSTANCES for CONFIG_DEVPTS_PTMX_LEGACY and default to off.

This will instantly break any user that relies on a fixed /dev, but the fix for them is quite simple (change /dev/ptmx from a device node to a symlink, and you're good to go.)
Comment 15 Fedora End Of Life 2013-04-03 14:41:59 EDT
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19
Comment 16 Josh Boyer 2013-05-13 14:46:48 EDT
Kay, do you know if anyone did what you suggested?  I'm guessing no, but I don't know if anyone is working on it.

I'm tempted to close this as UPSTREAM.
Comment 17 Kay Sievers 2013-05-13 16:16:36 EDT
I'm not aware of any patches or current work.
Comment 18 H. Peter Anvin 2013-05-13 16:38:59 EDT
Since this is a proposal that involved breaking existing userspace, it needs to be pitched to the kernel community as such.  Linus Torvalds or Al Viro may have not-so-friendly things to say about it.
Comment 19 Kay Sievers 2013-05-13 16:53:52 EDT
I'm pretty sure, it could be wrapped in a kernel config option, if that
makes people with a static /dev happier.
Comment 20 Daniel Berrange 2013-12-19 12:23:37 EST
FYI from my POV this bug can just be closed WONTFIX. Without user namespaces enabled, nothing about LXC is at all secure if you are uid==0 in the container, so accessing devpts is the least of our worries. If you do have user namespaces enabled for a container, then you won't be permitted to mount a devpts instance without the newinstance flag avoiding this issue


static struct dentry *devpts_mount(struct file_system_type *fs_type,
        int flags, const char *dev_name, void *data)
{
....snip....
        /* Require newinstance for all user namespace mounts to ensure
         * the mount options are not changed.
         */
        if ((current_user_ns() != &init_user_ns) && !opts.newinstance)
                return ERR_PTR(-EINVAL);
Comment 21 H. Peter Anvin, Intel 2013-12-19 13:32:11 EST
For newinstance to work properly, all instances including the root instance should have it enabled, though.

The sad part is that there seems to be a huge stalemate with regards to where this should be done.
Comment 22 Daniel Berrange 2015-01-15 07:09:16 EST
I'm closing this NOTABGU on basis that user namespaces are required in order to get any credible level of security for containers, at which point the lack of the newinstance flag in the host OS is not an problem.