Bug 1326502

Summary: SELinux blocks transfer of file descriptors across containers
Product: [Fedora] Fedora Reporter: Vaibhav Rastogi <vrastogi>
Component: docker Assignee: Daniel Walsh <dwalsh>
Status: CLOSED NOTABUG QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 25 CC: adimania, admiller, amurdaca, dwalsh, ichavero, jcajka, jchaloup, lsm5, marianne, miminar, nalin, riek, vbatts
Target Milestone: --- Keywords: Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2016-08-19 22:24:12 UTC Type: Bug
Attachments:
Description Flags
Code to reproduce bug none

Description Vaibhav Rastogi 2016-04-12 20:11:05 UTC
Created attachment 1146654 [details]
Code to reproduce bug

Description of problem:
Unix sockets can be used to transfer file descriptors across processes. Because containers share the same container state, this is also possible for processes running in different containers. However, SELinux blocks the file descriptor transfer (which I view as a bug) while otherwise allowing regular communication on Unix sockets across containers. File descriptors can be transferred if one of the (two) containers has the same IPC namespace as the host or if both the containers are in the same IPC namespace or SELinux is off.
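
For readers unfamiliar with the mechanism, the descriptor transfer described above can be sketched as follows. This is not the attached reproducer: it is a minimal, single-process illustration that assumes a socketpair in place of the socket shared through the volume and a pipe in place of the client's stdout. The helper names send_fd, recv_fd and demo are hypothetical.

```python
import array
import os
import socket


def send_fd(sock: socket.socket, fd: int, payload: bytes = b"x") -> None:
    """Send one file descriptor as SCM_RIGHTS ancillary data."""
    fds = array.array("i", [fd])
    sock.sendmsg([payload], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock: socket.socket) -> int:
    """Receive one file descriptor sent with send_fd()."""
    fds = array.array("i")
    _, ancdata, _, _ = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, kind, data in ancdata:
        if level == socket.SOL_SOCKET and kind == socket.SCM_RIGHTS:
            # Drop any trailing bytes that do not form a whole int.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
            return fds[0]
    raise RuntimeError("no file descriptor received")


def demo() -> bytes:
    # Assumption: a socketpair stands in for the Unix socket in the volume.
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    # Assumption: a pipe stands in for the client's stdout.
    read_end, write_end = os.pipe()
    try:
        send_fd(client, write_end)      # "client" hands over its output fd
        received = recv_fd(server)      # "server" gets a new fd number
        os.write(received, b"printing to new stdout\n")
        os.close(received)
        return os.read(read_end, 64)
    finally:
        os.close(read_end)
        os.close(write_end)
        client.close()
        server.close()


if __name__ == "__main__":
    print(demo().decode(), end="")
```

With SELinux enforcing and two differently labeled containers, the receiving process presumably gets the message but cannot use the passed descriptor, which matches the transcript below.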

Version-Release number of selected component (if applicable):
Tested on CentOS-based Atomic running Docker 1.9.1. Also tested on Fedora 23 with the same Docker version.


How reproducible:
Always reproduced


Steps to Reproduce:
Use the attached code and create a Docker container using the provided Dockerfile. Here is a transcript. The message "printing to new stdout" is a printf done by the remote server after it receives client's stdout fd.

bash-4.2# docker build -t test .
...
bash-4.2# mkdir myvol
bash-4.2# docker run -it --rm --name server -v myvol:/tmp test /server

In another terminal...

bash-4.2# setenforce 0
bash-4.2# docker run -it --rm --name client -v myvol:/tmp test /client                       
sending message now
printing to new stdout
bash-4.2# setenforce 1
bash-4.2# docker run -it --rm --name client -v myvol:/tmp test /client
sending message now
bash-4.2# docker run -it --rm --name client -v myvol:/tmp --ipc host test /client
sending message now
printing to new stdout
bash-4.2# docker run -it --rm --name client -v myvol:/tmp --ipc container:server test /client
sending message now
printing to new stdout


Actual results:
Results shown above in the transcript

Expected results:
With setenforce 1 and no --ipc option, we should still see a "printing to new stdout" message.

Additional info:

Comment 1 Vaibhav Rastogi 2016-04-12 20:14:03 UTC
typo here: "Because containers share the same container state" => "Because containers share the same kernel state"

Comment 2 Daniel Walsh 2016-05-09 15:23:49 UTC
When you share the same IPC namespace, your containers share the same SELinux label, so to make this work you need both containers to share the same MCS label.

Something like this will work

docker run -it --rm --security-opt label:level:s0:c1000,c1001 --name server -v myvol:/tmp test /server
docker run -it --rm --security-opt label:level:s0:c1000,c1001 --name client -v myvol:/tmp test /client

When a socket is created by a process, it is automatically assigned the same label as the process creating it. If that socket is passed to another process, the receiving process's label has to have access to the "socket label".

So if I have a process labeled system_u:system_r:svirt_lxc_net_t:s0:c1,c2 and I pass its socket to a process labeled system_u:system_r:svirt_lxc_net_t:s0:c4,c5, SELinux will block the access.
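
The category comparison behind this can be approximated in a few lines. This is a simplified model only, under the assumption that access requires the accessing process's MCS category set to contain every category on the object; real policy also evaluates types, object classes and level ranges, none of which are modeled here. The function names parse_level and mcs_allows are hypothetical.

```python
def parse_level(context: str):
    """Split an SELinux context into (sensitivity, set of category numbers).

    Expects user:role:type:sensitivity[:categories]; level ranges such as
    s0-s0:c0.c1023 are not handled in this sketch.
    """
    parts = context.split(":", 4)
    sensitivity = parts[3]
    categories = set()
    if len(parts) == 5 and parts[4]:
        for item in parts[4].split(","):
            if "." in item:  # category range such as c0.c3
                low, high = item.split(".")
                categories.update(range(int(low[1:]), int(high[1:]) + 1))
            else:
                categories.add(int(item[1:]))
    return sensitivity, categories


def mcs_allows(source: str, target: str) -> bool:
    """True if the source's categories include all of the target's."""
    _, src = parse_level(source)
    _, tgt = parse_level(target)
    return tgt <= src


if __name__ == "__main__":
    a = "system_u:system_r:svirt_lxc_net_t:s0:c1,c2"
    b = "system_u:system_r:svirt_lxc_net_t:s0:c4,c5"
    print(mcs_allows(a, b))  # different category pairs
    print(mcs_allows(a, a))  # identical labels
```

Under this model, two containers started with the same level (for example s0:c1000,c1001) pass the check against each other, while containers with different randomly assigned category pairs do not.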

Comment 3 Daniel Walsh 2016-05-09 16:53:07 UTC
Wrote a blog covering this.

http://danwalsh.livejournal.com/74421.html

Comment 4 Vaibhav Rastogi 2016-05-09 17:31:40 UTC
Thanks, this solution works well. However, the two containers now have no isolation from each other.

I do not know much about SELinux so have a couple of questions:

1. Why can regular communication happen on sockets (created as in OP in shared volumes) while only the transfer of file descriptors is restricted?
2. When a volume is mounted with the :z directive, the socket created inside it does not have MCS labels (the label, as shown by ls -lZ on host, is only system_u:object_r:svirt_sandbox_file_t:s0), and yet the file descriptors cannot be transferred. Why should this be the case?

Comment 5 Vaibhav Rastogi 2016-05-09 17:40:26 UTC
Thanks for the comment and the blog post; I didn't realize there was new activity until I posted my comment. Feel free to close the bug again, but I would learn a lot from answers to my questions. Also, is there a way to enable passing of file descriptors on a particular socket while otherwise keeping the two containers isolated?

I will be happy to continue this discussion in the comments to the blog post, if you prefer.

Comment 6 Daniel Walsh 2016-05-09 18:30:22 UTC
Could you attach the AVCs that you received?

And were you able to get two containers to talk over a shared socket? That should probably be blocked.

Comment 7 Vaibhav Rastogi 2016-05-09 19:26:23 UTC
Here is the example that illustrates it all. The first is a typescript from a terminal running the server. The second is the one with the client, interspersed with disabling/enabling of SELinux. The "hello world" message is always printed on the server (even when SELinux is enabled); this message is sent from the client. The message "printing to new stdout" is written by the server on the new fd. If the fd transfer succeeds, the message is printed on the client; otherwise it is printed on the server. The AVC received is shown below the client terminal output. The path "/3" probably corresponds to fd 3.

[root@localhost code]# docker run -it --rm --name server -v /vagrant/code/myvol:/tmp:z test /server  
hello world
remote fd number 1
new fd number 3
hello world
remote fd number 1
new fd number 0
printing to new stdout


[root@localhost code]# setenforce 0
[root@localhost code]# docker run -it --rm --name client -v /vagrant/code/myvol:/tmp test /client
sending message now
printing to new stdout
[root@localhost code]# setenforce 1
[root@localhost code]# docker run -it --rm --name client -v /vagrant/code/myvol:/tmp test /client
sending message now


type=AVC msg=audit(1462820444.992:2087): avc:  denied  { read write } for  pid=23359 comm="server" path="/3" dev="devpts" ino=6 scontext=system_u:system_r:svirt_lxc_net_t:s0:c463,c604 tcontext=system_u:object_r:svirt_sandbox_file_t:s0:c386,c667 tclass=chr_file permissive=0
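
As a reading aid, the record above can be split into its fields with a short script. This is an ad hoc sketch, not an official audit-log parser; the helper names parse_avc and categories are hypothetical. It suggests that the denied object is a character device on devpts (apparently the client's terminal, i.e. the passed stdout) carrying categories different from the server's, rather than the socket file in the shared volume.

```python
import re

# The AVC record quoted in this comment, copied verbatim.
AVC = (
    'type=AVC msg=audit(1462820444.992:2087): avc:  denied  { read write } '
    'for  pid=23359 comm="server" path="/3" dev="devpts" ino=6 '
    'scontext=system_u:system_r:svirt_lxc_net_t:s0:c463,c604 '
    'tcontext=system_u:object_r:svirt_sandbox_file_t:s0:c386,c667 '
    'tclass=chr_file permissive=0'
)


def parse_avc(line: str) -> dict:
    """Extract key=value fields and the denied permissions from one record."""
    pairs = re.findall(r'(\w+)=("[^"]*"|\S+)', line)
    fields = {key: value.strip('"') for key, value in pairs}
    perms = re.search(r"\{([^}]*)\}", line)
    fields["perms"] = perms.group(1).split() if perms else []
    return fields


def categories(context: str) -> set:
    """Return the MCS categories of a context as a set of strings."""
    parts = context.split(":", 4)
    return set(parts[4].split(",")) if len(parts) == 5 else set()


if __name__ == "__main__":
    record = parse_avc(AVC)
    print("class:", record["tclass"], "on", record["dev"])
    print("denied:", record["perms"])
    print("source categories:", sorted(categories(record["scontext"])))
    print("target categories:", sorted(categories(record["tcontext"])))
```

The source and target category sets share no element here, which is consistent with the denial being an MCS separation between the two containers.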

Containers talking over shared sockets appears to be a common case, and I can see it elsewhere on the Web:
http://stackoverflow.com/questions/24956322/can-docker-port-forward-to-a-unix-file-socket-on-the-host-container
http://jpetazzo.github.io/2014/06/23/docker-ssh-considered-evil/#restart-my-service
http://stackoverflow.com/questions/32180589/docker-how-to-expose-a-socket-over-a-port-for-a-django-application

Servers like mysql and redis optionally provide services over Unix sockets and some containers use them as such.

Comment 8 Fedora Admin XMLRPC Client 2016-06-08 14:09:06 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 9 Jan Kurik 2016-07-26 04:15:26 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 25 development cycle.
Changing version to '25'.

Comment 10 Daniel Walsh 2016-08-19 22:24:12 UTC
The best way to handle this would be to tell them to share IPC between the containers.

docker run --ipc container:CONTAINER1UUID ...

This will cause the SELinux labels to be the same and the IPC will work.