Bug 1340324

Summary: [docker1.10]Need more robust method to utilize docker client in node container
Product: OpenShift Container Platform
Reporter: Scott Dodson <sdodson>
Component: Release
Assignee: Scott Dodson <sdodson>
Status: CLOSED ERRATA
QA Contact: Ma xiaoqiang <xiama>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.2.0
CC: adellape, aos-bugs, bleanhar, jialiu, jokerman, mmccomas, wsun, xtian
Target Milestone: ---
Keywords: TestBlocker
Target Release: 3.2.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Due to newer releases of docker changing the path of the docker executable, containerized nodes could fail to initialize the SDN because they could not execute docker properly. This bug fix updates the containerized node image to accommodate this change, and as a result containerized nodes work properly with current and future versions of docker.
Story Points: ---
Clone Of:
: 1342762
Environment:
Last Closed: 2016-06-27 15:07:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1342762

Description Scott Dodson 2016-05-27 02:14:54 UTC
Description of problem:
The SDN components of OpenShift call /usr/bin/docker. To ensure that the client matches the version of the daemon, we currently mount /usr/bin/docker and /usr/bin/docker-current (if it exists) from the host into the node container.

docker-1.10 will introduce dependencies on libseccomp. If the node container continues to rely on /usr/bin/docker and /usr/bin/docker-current being mounted from the host, we will also need to start mounting libseccomp into the container. We can do this, but it seems like we would be continuously chasing the problem.

One way to handle this may be to provide a chroot wrapper in the node container. This seems to work, but I need to test it more thoroughly.

https://github.com/openshift/origin/pull/9046
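
For illustration, a minimal sketch of what such a chroot wrapper could look like. It is not necessarily what the pull request above implements. HOST_ROOT and the function name docker_on_host are hypothetical names used here; the /rootfs default assumes the host's root filesystem is mounted into the node container at that path.

```shell
#!/bin/sh
# Sketch of a chroot-based docker wrapper (assumption, not the shipped script).
# HOST_ROOT is a hypothetical variable: the path inside the container where
# the host's root filesystem is mounted.
HOST_ROOT="${HOST_ROOT:-/rootfs}"

# Run the host's own docker client, with the host's libraries, by changing
# root into the host filesystem first. All arguments are passed through.
docker_on_host() {
    chroot "$HOST_ROOT" docker "$@"
}
```

A wrapper script built on this would end with a call to docker_on_host "$@". Because the client and everything it links against then come from the host, the container no longer needs individual bind mounts for the binary or for libraries such as libseccomp.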

Comment 2 Scott Dodson 2016-06-04 19:05:59 UTC
To reproduce, install a containerized node using docker-1.10. Exec into the node container and attempt to use docker; you'll get an error. Then update to an image that fixes this by adding the wrapper script at /usr/local/bin/docker. Docker should now work inside the container.
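
The check in the steps above could be scripted roughly as follows. This is a sketch only: the function name check_node_docker is hypothetical, and the default container name atomic-openshift-node is an assumption about how the node container is named on the host.

```shell
#!/bin/sh
# Sketch: report whether the docker client is usable inside the node container.
# The default container name is an assumption; pass another name as $1.
check_node_docker() {
    container="${1:-atomic-openshift-node}"
    if docker exec "$container" docker version >/dev/null 2>&1; then
        echo "docker client works inside $container"
    else
        echo "docker client fails inside $container" >&2
        return 1
    fi
}
```

Run it on the host before and after updating the node image; a non-zero return on the old image and a zero return on the fixed image would match the behavior described here.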

Comment 4 Johnny Liu 2016-06-08 12:34:37 UTC
Tested this bug with openshift3/node:v3.2.1.1 (2344f942d5a0); the docker wrapper is working well. STI build and deployment completed successfully.


# docker exec -ti c5ddbdb49619 /bin/sh
sh-4.2# docker version
Client:
 Version:         1.10.3
 API version:     1.22
 Package version: docker-common-1.10.3-31.el7.x86_64
 Go version:      go1.4.2
 Git commit:      4779225/1.10.3
 Built:           
 OS/Arch:         linux/amd64

Server:
 Version:         1.10.3
 API version:     1.22
 Package version: docker-common-1.10.3-31.el7.x86_64
 Go version:      go1.4.2
 Git commit:      4779225/1.10.3
 Built:           
 OS/Arch:         linux/amd64


Found some minor issues in the node systemd unit file:
# cat /etc/systemd/system/atomic-openshift-node.service
[Unit]
After=atomic-openshift-master.service
After=docker.service
After=openvswitch.service
PartOf=docker.service
Requires=docker.service
Requires=openvswitch.service
Wants=atomic-openshift-master.service
Requires=atomic-openshift-node-dep.service
After=atomic-openshift-node-dep.service

[Service]
EnvironmentFile=/etc/sysconfig/atomic-openshift-node
EnvironmentFile=/etc/sysconfig/atomic-openshift-node-dep
ExecStartPre=-/usr/bin/docker rm -f atomic-openshift-node
ExecStart=/usr/bin/docker run --name atomic-openshift-node --rm --privileged --net=host --pid=host --env-file=/etc/sysconfig/atomic-openshift-node -v /:/rootfs:ro -e CONFIG_FILE=${CONFIG_FILE} -e OPTIONS=${OPTIONS} -e HOST=/rootfs -e HOST_ETC=/host-etc -v /var/lib/origin:/var/lib/origin -v /etc/origin/node:/etc/origin/node -v /etc/localtime:/etc/localtime:ro -v /etc/machine-id:/etc/machine-id:ro -v /run:/run -v /sys:/sys:ro -v /usr/bin/docker:/usr/bin/docker:ro -v /var/lib/docker:/var/lib/docker -v /lib/modules:/lib/modules -v /etc/origin/openvswitch:/etc/openvswitch -v /etc/origin/sdn:/etc/openshift-sdn -v /etc/systemd/system:/host-etc/systemd/system -v /var/log:/var/log -v /dev:/dev $DOCKER_ADDTL_BIND_MOUNTS openshift3/node:${IMAGE_VERSION}
ExecStartPost=/usr/bin/sleep 10
ExecStop=/usr/bin/docker stop atomic-openshift-node
SyslogIdentifier=atomic-openshift-node
Restart=always
RestartSec=5s

[Install]
WantedBy=docker.service


Since we now use chroot to run docker from the host's rootfs, I think we should clean up the node service unit file, e.g.:
1. Remove the "/etc/sysconfig/atomic-openshift-node-dep" file and the "EnvironmentFile=/etc/sysconfig/atomic-openshift-node-dep" line.
2. Remove "-v /usr/bin/docker:/usr/bin/docker:ro", as was done in https://github.com/openshift/origin/pull/9046
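
As a rough aid for spotting these two leftovers on an installed host, a sketch like the following could be used. The function name audit_node_unit is hypothetical; the default path is the unit file shown above, and the two strings it searches for are copied from that file.

```shell
#!/bin/sh
# Sketch: flag the two leftover settings in the node unit file.
# Returns 0 when neither is present, 1 otherwise.
audit_node_unit() {
    unit="${1:-/etc/systemd/system/atomic-openshift-node.service}"
    rc=0
    # "--" stops option parsing so the pattern starting with "-v" is not
    # read as a grep option; -F matches it as a fixed string.
    if grep -qF -- '-v /usr/bin/docker:/usr/bin/docker:ro' "$unit"; then
        echo "leftover: host docker bind mount"
        rc=1
    fi
    # -x requires the whole line to match.
    if grep -qxF 'EnvironmentFile=/etc/sysconfig/atomic-openshift-node-dep' "$unit"; then
        echo "leftover: atomic-openshift-node-dep EnvironmentFile"
        rc=1
    fi
    return "$rc"
}
```

This only reports; it does not edit the unit file, so it is safe to run on hosts where the older settings are still needed.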

Comment 5 Scott Dodson 2016-06-08 13:05:52 UTC
Jianlin,

I agree that we should clean that up, but I don't think we can do that until we can be sure that all, or a majority, of users are using these newer images. I think what we should do is leave it in for 3.2 installs and stop adding it for 3.3 installs. What do you think?

Comment 6 Brenton Leanhardt 2016-06-08 13:24:28 UTC
If it's not a blocking issue, I definitely agree that we should consider this ON_QA and track the remaining problem in another bug.

Comment 7 Johnny Liu 2016-06-12 05:39:51 UTC
Verified this bug with the atomicOpenShift-errata/3.2/2016-06-09.2 puddle; the same behavior as in comment 4 is seen. Based on comments 5 and 6, moving this bug to "Verified".

Comment 9 errata-xmlrpc 2016-06-27 15:07:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1343