Description of problem:
The Linux kernel now allows multiple instances of the 'devpts' filesystem to be mounted on a host, and the pty indexes of each instance are independent. This feature allows Linux containers to be given private 'devpts' instances, preventing them from accessing PTYs from the host OS.
This is only secure, however, if every single mount of 'devpts' uses the 'newinstance' flag, including that of the initial host OS.
If the host OS does not use the 'newinstance' mount option, then a container can still get access to the host's PTYs, which is clearly a security problem.
When using 'newinstance', /dev/ptmx must also be either a symlink to /dev/pts/ptmx or a bind mount of it.
There is more information about the required setup in the kernel tree documentation, 'filesystems/devpts.txt'. The key point is in the last paragraph:
7. A mount of devpts without the 'newinstance' option results in binding to
initial kernel mount. This behavior while preserving legacy semantics,
does not provide strict isolation in a container environment. i.e by
mounting devpts without the 'newinstance' option, a container could
get visibility into the 'host' or root container's devpts.
To workaround this and have strict isolation, all mounts of devpts,
including the mount in the root container, should use the newinstance
option.
In Fedora 12, libvirt and the LXC toolchain will both create containers using 'newinstance' and rely on it to provide secure isolation of devpts. Thus we require that the host OS devpts be switched to use 'newinstance', and that /dev/ptmx become a symlink to /dev/pts/ptmx.
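A minimal sketch of the host setup requested above, not anaconda's actual code. The helper names and the root-directory argument are hypothetical; the argument exists so the symlink step can be tried against a scratch directory rather than the live /dev. The `ptmxmode=0666` option is an assumption drawn from the kernel's devpts documentation, not from this report: a 'newinstance' mount creates /dev/pts/ptmx with mode 0000 by default, so without it the symlinked /dev/ptmx would be unusable by ordinary users.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the requested setup.

# Print an /etc/fstab entry that mounts devpts as a private instance.
# gid=5,mode=620 match the report; ptmxmode=0666 is an assumption.
devpts_fstab_line() {
    printf '%s\n' 'devpts /dev/pts devpts gid=5,mode=620,newinstance,ptmxmode=0666 0 0'
}

# Replace ROOT/dev/ptmx with a relative symlink to pts/ptmx.
# ROOT defaults to empty (the real /dev); pass a scratch directory to experiment.
link_ptmx() {
    root=${1:-}
    ln -sfn pts/ptmx "$root/dev/ptmx"
}
```

For example, after `mkdir -p /tmp/scratch/dev/pts`, running `link_ptmx /tmp/scratch` leaves /tmp/scratch/dev/ptmx pointing at pts/ptmx.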
RPM claims the 'setup' RPM owns /etc/fstab, but that seems to be a ghost file, and AFAICT anaconda is responsible for initial creation. Thus I'm filing this bug against anaconda - feel free to reassign if this is the wrong place.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Install a new Fedora host
2. 'mount | grep devpts'
3. 'ls -l /dev/ptmx'
Actual results:
# mount | grep devpts
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
# ls -l /dev/ptmx
crw-rw-rw-. 1 root tty 5, 2 2009-05-20 14:35 /dev/ptmx
Expected results:
# mount | grep devpts
devpts on /dev/pts type devpts (rw,gid=5,mode=620,newinstance)
# ls -l /dev/ptmx
lrwxrwxrwx. 1 root root 8 2009-05-20 14:36 ptmx -> pts/ptmx
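The comparison above can be automated. The following is a rough sketch, assuming the `mount` output format shown in this report (filesystem type in the fifth field, parenthesised options in the sixth) and a kernel of this era that lists 'newinstance' among the mount options. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `mount`-style lines on stdin and succeed only if
# at least one devpts mount is present and every one lists 'newinstance'.
devpts_all_newinstance() {
    awk '
        $5 == "devpts" {
            seen = 1
            opts = $6
            gsub(/[()]/, "", opts)      # strip the surrounding parentheses
            n = split(opts, o, ",")
            ok = 0
            for (i = 1; i <= n; i++)
                if (o[i] == "newinstance") ok = 1
            if (!ok) bad = 1            # found a devpts mount sharing the initial instance
        }
        END { exit (seen && !bad) ? 0 : 1 }
    '
}
```

A typical invocation would be `mount | devpts_all_newinstance || echo "devpts is not mounted with newinstance"`.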
Just to clarify, I do not want this for F11. It is too risky to make such a change for F11 at this stage. Fedora 12 is my desired target for resolution.
If we want it mounted that way by default, then we should really change the defaults in the kernel. Not require changes in anaconda, livecd-creator, appliance-creator, ...
This bug appears to have been reported against 'rawhide' during the Fedora 13 development cycle.
Changing version to '13'.
This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '13'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 13's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 13 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
The latest proposal for the upstream kernel is to kill off the 'newinstance' flag and make all /dev/pts instances private by default.
This request has been pending for three years. This still needs to be done.
That lkml conversation didn't seem to have any particular outcome, afaics.
Perhaps just making systemd do it itself would get this happening quicker.
What is the exact reason to need "newinstance" in the init namespace? Isn't
it sufficient to use that in the container setup only?
We never share /dev from the host with the containers for obvious other
reasons, so why is there any change needed in the init /dev setup?
This is best illustrated with an example:
$ virsh -c lxc:/// console busy
Connected to domain busy
Escape character is ^]
# ls /dev/
console full ptmx random stdin tty1 tty3 urandom
fd null pts stderr stdout tty2 tty4 zero
So inside my container I've got a private devpts instance which was mounted with the 'newinstance' flag.
But inside the container, if I'm evil, I can do:
# mkdir /mnt
# mount -t devpts none /mnt
# ls /mnt/
0 12 15 18 20 23 26 3 34 4 42 45 6 9
10 13 16 19 21 24 27 32 36 40 43 46 7 ptmx
11 14 17 2 22 25 28 33 37 41 44 5 8
and so I can access the host's devpts instance.
If the host OS had used the 'newinstance' flag in the initial mount namespace, then the container would not have been able to access the host's devpts instance.
Oops, I cut and pasted the wrong bit: that first example should have shown /dev/pts, not /dev. The container's devpts is:
# ls /dev/pts
0 1 2 3 ptmx
Being able to call mount() in a container and protecting against "evil"
does not make much sense to me. If you can mount random stuff in a container
you have lost anyway, right?
Historically, yes, if you have mount() privileges it is mostly game over unless you have SELinux further controlling what the container can mount.
A goal of the user namespace work, though, is to make it possible for a container to use mount() safely, without needing SELinux. The uid mapping between container and host protects the host filesystems in general, so even if you can mount the host devpts, you would be blocked from using the existing ptys there. The caveat is that the admin may have changed /dev/pts/ptmx to be world-writable. That would allow container access, unless the host /dev/pts were using the 'newinstance' flag. On Fedora, /dev/pts/ptmx is restricted to c---------, so we're safe from that issue by default.
So I don't think the lack of 'newinstance' is a security problem out of the box (once LXC uses user namespaces), but it could become one if the admin were to unwittingly change the permissions of /dev/pts/ptmx for some reason, not realizing the consequences for containers.
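As a rough illustration of the caveat above, an admin could check whether /dev/pts/ptmx has been opened up to all users. This is only a sketch: the function name is hypothetical, and it assumes the octal permission string printed by GNU `stat -c %a`.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an octal permission string (as printed by
# `stat -c %a`) grants read or write access to "other" users.
mode_world_accessible() {
    # The leading 0 makes the shell parse the argument as an octal constant.
    [ $(( 0$1 & 06 )) -ne 0 ]
}
```

On a host it might be used as `mode_world_accessible "$(stat -c %a /dev/pts/ptmx)" && echo "ptmx is open to all users"`. With the Fedora default of c--------- the check fails, which is the safe case described above.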
The status quo is kind of sufficient for container setups which restrict
mount(); the problem with devpts is not much bigger than with /sys or /dev
itself, where the container could reconfigure the entire hardware.
For the complete picture with "newinstance" in the init namespace with
devtmpfs, we would need to change the kernel. Systemd does not want to
play games with kernel-provided "dead" device nodes, does not want to
delete them, or overmount them. That all needs to be cleaned up on the
kernel side, not in userspace.
I guess the kernel could:
the default and not have it configurable at all.
Make "ptmxmode=0666" the default and ignore "newinstance" entirely.
get rid of: drivers/tty/pty.c
device_create(tty_class, NULL, MKDEV(TTYAUX_MAJOR, 2), NULL, "ptmx");
add a function to devtmpfs.c to be able to create symlinks in the devtmpfs
filesystem, and call that to provide:
/dev/ptmx --> /dev/pts/ptmx
instead of creating the (now useless) device node there.
If strict backwards compat is needed, the current behaviour can be
encapsulated in a new legacy CONFIG option for !DEVTMPFS systems.
This seems reasonable to me, although we might want to leave an option in the kernel that provides strict legacy compatibility. In other words trade CONFIG_DEVPTS_MULTIPLE_INSTANCES for CONFIG_DEVPTS_PTMX_LEGACY and default to off.
This will instantly break any user that relies on a fixed /dev, but the fix for them is quite simple (change /dev/ptmx from a device node to a symlink, and you're good to go.)
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.
(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)
Kay, do you know if anyone did what you suggested? I'm guessing no, but I don't know if anyone is working on it.
I'm tempted to close this as UPSTREAM.
I'm not aware of any patches or current work.
Since this is a proposal that involved breaking existing userspace, it needs to be pitched to the kernel community as such. Linus Torvalds or Al Viro may have not-so-friendly things to say about it.
I'm pretty sure, it could be wrapped in a kernel config option, if that
makes people with a static /dev happier.
FYI, from my POV this bug can just be closed WONTFIX. Without user namespaces enabled, nothing about LXC is at all secure if you are uid==0 in the container, so accessing devpts is the least of our worries. If you do have user namespaces enabled for a container, then you won't be permitted to mount a devpts instance without the newinstance flag, which avoids this issue:
static struct dentry *devpts_mount(struct file_system_type *fs_type,
	int flags, const char *dev_name, void *data)
{
	...
	/* Require newinstance for all user namespace mounts to ensure
	 * the mount options are not changed.
	 */
	if ((current_user_ns() != &init_user_ns) && !opts.newinstance)
		return ERR_PTR(-EINVAL);
	...
For newinstance to work properly, all instances including the root instance should have it enabled, though.
The sad part is that there seems to be a huge stalemate with regard to where this should be done.
I'm closing this NOTABUG on the basis that user namespaces are required in order to get any credible level of security for containers, at which point the lack of the newinstance flag in the host OS is not a problem.