Bug 1326502 - SELinux blocks transfer of file descriptors across containers
Summary: SELinux blocks transfer of file descriptors across containers
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: docker
Version: 25
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Daniel Walsh
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-04-12 20:11 UTC by Vaibhav Rastogi
Modified: 2016-08-19 22:24 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-08-19 22:24:12 UTC
Type: Bug
Embargoed:


Attachments
Code to reproduce bug (6.58 KB, application/x-gzip)
2016-04-12 20:11 UTC, Vaibhav Rastogi

Description Vaibhav Rastogi 2016-04-12 20:11:05 UTC
Created attachment 1146654
Code to reproduce bug

Description of problem:
Unix sockets can be used to transfer file descriptors across processes. Because containers share the same container state, this is also possible for processes running in different containers. However, SELinux blocks the file descriptor transfer (which I view as a bug) while otherwise allowing regular communication on Unix sockets across containers. File descriptors can be transferred if one of the two containers shares the host's IPC namespace, if both containers are in the same IPC namespace, or if SELinux is off.
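
For readers unfamiliar with the mechanism, here is a minimal Python sketch of passing a file descriptor over a Unix socket as SCM_RIGHTS ancillary data. It is not the attached reproducer: the helper names send_fd, recv_fd and demo are hypothetical, both ends live in one process on a socketpair rather than in two containers, and a pipe stands in for the client's stdout, so no SELinux check between containers is exercised.

```python
import array
import os
import socket


def send_fd(sock, fd, payload=b"x"):
    """Send one file descriptor as SCM_RIGHTS ancillary data."""
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [fd]))]
    sock.sendmsg([payload], ancillary)


def recv_fd(sock):
    """Receive one file descriptor sent with send_fd()."""
    fds = array.array("i")
    payload, ancdata, _flags, _addr = sock.recvmsg(16, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            fds.frombytes(data[:fds.itemsize])
    if not fds:
        raise RuntimeError("no file descriptor received")
    return payload, fds[0]


def demo():
    """Pass the write end of a pipe across a socketpair and write through it."""
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    try:
        send_fd(client, write_end)            # the "client" hands over its descriptor
        _payload, new_fd = recv_fd(server)    # the "server" gets a new fd number
        os.write(new_fd, b"printing to new stdout\n")
        os.close(new_fd)
        return os.read(read_end, 64)          # what the "client" side sees
    finally:
        client.close()
        server.close()
        os.close(read_end)
        os.close(write_end)


if __name__ == "__main__":
    print(demo())
```

The receiver gets its own descriptor number for the same open file, which matches the "remote fd number" and "new fd number" lines printed by the server in the transcripts.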

Version-Release number of selected component (if applicable):
Tested on CentOS-based Atomic running Docker 1.9.1. Also tested on Fedora 23 with the same Docker version.


How reproducible:
Always reproduced


Steps to Reproduce:
Use the attached code and create a Docker container using the provided Dockerfile. Here is a transcript. The message "printing to new stdout" is a printf done by the remote server after it receives client's stdout fd.

bash-4.2# docker build -t test .
...
bash-4.2# mkdir myvol
bash-4.2# docker run -it --rm --name server -v myvol:/tmp test /server

In another terminal...

bash-4.2# setenforce 0
bash-4.2# docker run -it --rm --name client -v myvol:/tmp test /client
sending message now
printing to new stdout
bash-4.2# setenforce 1
bash-4.2# docker run -it --rm --name client -v myvol:/tmp test /client
sending message now
bash-4.2# docker run -it --rm --name client -v myvol:/tmp --ipc host test /client
sending message now
printing to new stdout
bash-4.2# docker run -it --rm --name client -v myvol:/tmp --ipc container:server test /client
sending message now
printing to new stdout


Actual results:
Results shown above in the transcript

Expected results:
With setenforce 1 and no --ipc option, we should still see a "printing to new stdout" message.

Additional info:

Comment 1 Vaibhav Rastogi 2016-04-12 20:14:03 UTC
typo here: "Because containers share the same container state" => "Because containers share the same kernel state"

Comment 2 Daniel Walsh 2016-05-09 15:23:49 UTC
When containers share the same IPC namespace, they also share the same SELinux label. So to make this work, both containers need to share the same MCS label.

Something like this will work:

docker run -it --rm --security-opt label:level:s0:c1000,c1001 --name server -v myvol:/tmp test /server
docker run -it --rm --security-opt label:level:s0:c1000,c1001 --name client -v myvol:/tmp test /client

When a socket is created by a process, it is automatically assigned the same label as the process that created it. If that socket is passed to another process, the receiving process's label must be allowed access to the socket's label.

So if I have a process labeled system_u:system_r:svirt_lxc_net_t:s0:c1,c2 and I pass that socket to a process labeled system_u:system_r:svirt_lxc_net_t:s0:c4,c5, SELinux will block the access.
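
The category comparison behind this can be sketched in Python. This is a simplified model, not the policy itself: it assumes access requires the source's MCS category set to include every category on the target, and it ignores types, sensitivity ranges and every other rule. The function names parse_categories and dominates are hypothetical.

```python
def parse_categories(context):
    """Return the MCS category numbers in a context such as
    system_u:system_r:svirt_lxc_net_t:s0:c1,c2 (single level only)."""
    parts = context.split(":", 4)  # user:role:type:sensitivity[:categories]
    if len(parts) < 5:
        return frozenset()         # no categories, e.g. a plain "s0" level
    categories = set()
    for item in parts[4].split(","):
        if "." in item:            # a range such as c0.c3
            low, high = item.split(".")
            categories.update(range(int(low[1:]), int(high[1:]) + 1))
        else:
            categories.add(int(item[1:]))
    return frozenset(categories)


def dominates(source, target):
    """Simplified check: source must hold every category on target."""
    return parse_categories(source) >= parse_categories(target)


sender = "system_u:system_r:svirt_lxc_net_t:s0:c1,c2"
receiver = "system_u:system_r:svirt_lxc_net_t:s0:c4,c5"
print(dominates(receiver, sender))  # disjoint categories
print(dominates(sender, sender))    # identical label
```

Under this model the two labels above fail the check in both directions, while two containers started with the same level (as in the docker commands above) pass it.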

Comment 3 Daniel Walsh 2016-05-09 16:53:07 UTC
Wrote a blog covering this.

http://danwalsh.livejournal.com/74421.html

Comment 4 Vaibhav Rastogi 2016-05-09 17:31:40 UTC
Thanks, this solution works well. However, the two containers now have no isolation from each other.

I do not know much about SELinux, so I have a couple of questions:

1. Why can regular communication happen on sockets (created as in OP in shared volumes) while only the transfer of file descriptors is restricted?
2. When a volume is mounted with the :z directive, the socket created inside it does not have MCS labels (the label, as shown by ls -lZ on host, is only system_u:object_r:svirt_sandbox_file_t:s0), and yet the file descriptors cannot be transferred. Why should this be the case?

Comment 5 Vaibhav Rastogi 2016-05-09 17:40:26 UTC
Thanks for the comment and the blog post; I didn't realize there was new activity until I posted my comment. Feel free to close the bug again, but I would learn a lot from answers to my questions. Also, is there a way to enable passing of file descriptors on a particular socket while otherwise keeping the two containers isolated?

I'd be happy to continue this discussion in the comments on the blog post, if you prefer.

Comment 6 Daniel Walsh 2016-05-09 18:30:22 UTC
Could you attach the AVCs that you received?

And were you able to get two containers to talk over a shared socket? That should probably be blocked.

Comment 7 Vaibhav Rastogi 2016-05-09 19:26:23 UTC
Here is an example that illustrates it all. The first is a typescript from the terminal running the server. The second is from the terminal running the client, interspersed with disabling and enabling SELinux. The "hello world" message, which is sent by the client, is always printed on the server, even when SELinux is enabled. The message "printing to new stdout" is written by the server to the new fd: if the fd transfer succeeds, the message is printed on the client; otherwise it is printed on the server. The AVC received is shown below the client terminal output. The path "/3" probably corresponds to fd 3.

[root@localhost code]# docker run -it --rm --name server -v /vagrant/code/myvol:/tmp:z test /server  
hello world
remote fd number 1
new fd number 3
hello world
remote fd number 1
new fd number 0
printing to new stdout


[root@localhost code]# setenforce 0
[root@localhost code]# docker run -it --rm --name client -v /vagrant/code/myvol:/tmp test /client
sending message now
printing to new stdout
[root@localhost code]# setenforce 1
[root@localhost code]# docker run -it --rm --name client -v /vagrant/code/myvol:/tmp test /client
sending message now


type=AVC msg=audit(1462820444.992:2087): avc:  denied  { read write } for  pid=23359 comm="server" path="/3" dev="devpts" ino=6 scontext=system_u:system_r:svirt_lxc_net_t:s0:c463,c604 tcontext=system_u:object_r:svirt_sandbox_file_t:s0:c386,c667 tclass=chr_file permissive=0
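
A short Python sketch can pull this record apart so the relevant fields are easier to compare. The helpers parse_avc and categories are hypothetical names written for this illustration, and the parsing assumes the single-line key=value layout shown above.

```python
import re

AVC_LINE = (
    'type=AVC msg=audit(1462820444.992:2087): avc:  denied  { read write } for  '
    'pid=23359 comm="server" path="/3" dev="devpts" ino=6 '
    'scontext=system_u:system_r:svirt_lxc_net_t:s0:c463,c604 '
    'tcontext=system_u:object_r:svirt_sandbox_file_t:s0:c386,c667 '
    'tclass=chr_file permissive=0'
)


def parse_avc(line):
    """Split an AVC record into its key=value fields plus a 'perms' list."""
    perms = re.search(r"\{([^}]*)\}", line)
    fields = {
        key: value.strip('"')
        for key, value in re.findall(r'(\w+)=("[^"]*"|\S+)', line)
    }
    fields["perms"] = perms.group(1).split() if perms else []
    return fields


def categories(context):
    """Return the MCS categories of a context as a set of strings."""
    parts = context.split(":", 4)
    return set(parts[4].split(",")) if len(parts) == 5 else set()


record = parse_avc(AVC_LINE)
shared = categories(record["scontext"]) & categories(record["tcontext"])
print(record["comm"], "denied", record["perms"], "on", record["tclass"], "at", record["dev"])
print("categories in common:", sorted(shared))
```

Read this way, the record names a chr_file on devpts as the target, not the socket file in the shared volume, and the source and target contexts carry different category pairs.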

Containers talking over shared sockets appears to be a common case, and I can see it elsewhere on the Web:
http://stackoverflow.com/questions/24956322/can-docker-port-forward-to-a-unix-file-socket-on-the-host-container
http://jpetazzo.github.io/2014/06/23/docker-ssh-considered-evil/#restart-my-service
http://stackoverflow.com/questions/32180589/docker-how-to-expose-a-socket-over-a-port-for-a-django-application

Servers like mysql and redis optionally provide services over Unix sockets and some containers use them as such.

Comment 8 Fedora Admin XMLRPC Client 2016-06-08 14:09:06 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 9 Jan Kurik 2016-07-26 04:15:26 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 25 development cycle.
Changing version to '25'.

Comment 10 Daniel Walsh 2016-08-19 22:24:12 UTC
The best way to handle this would be to tell them to share IPC between the containers.

docker run --ipc container:CONTAINER1UUID ...

This will cause the SELinux labels to be the same and the IPC will work.

