Description of problem:
The installer hangs at task "Start the Container Engine service" when using the docker system container:

TASK [docker : Start the Container Engine service] *****************************
Friday 26 May 2017 00:42:40 +0000 (0:00:02.786) 0:03:26.592 ************

Version-Release number of selected component (if applicable):
atomic-1.17.2-3.git2760e30.el7.x86_64
https://github.com/openshift/openshift-ansible/pull/4272 (This should not be related to this PR; I guess it is an issue in the master branch or in the `atomic` package.)

How reproducible:
always

Steps to Reproduce:
1. Trigger a proxy installation with the docker system container:
# cat inventory
<--snip-->
openshift_docker_use_system_container=true
openshift_docker_systemcontainer_image_registry_override=brew-xxx.redhat.com:8888/rhel7
openshift_http_proxy=http://xxx.redhat.com:3128
openshift_https_proxy=http://xxx.redhat.com:3128

Actual results:
The installer hangs at task "Start the Container Engine service" when using the docker system container:

TASK [docker : Start the Container Engine service] *****************************
Friday 26 May 2017 00:42:40 +0000 (0:00:02.786) 0:03:26.592 ************

Expected results:

Additional info:
# cat /etc/systemd/system/container-engine.service
[Unit]
Description=Docker service
After=network.target

[Service]
EnvironmentFile=-/etc/sysconfig/docker-storage
EnvironmentFile=-/etc/sysconfig/docker-network
Environment=GOTRACEBACK=crash
ExecStartPre=/bin/bash -c 'export -p > /run/docker-bash-env'
ExecStart=/bin/runc --systemd-cgroup run 'container-engine'
ExecStop=/bin/runc --systemd-cgroup kill 'container-engine'
Restart=on-failure
WorkingDirectory=/var/lib/containers/atomic/container-engine.0
RuntimeDirectory=docker
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
TimeoutStartSec=0
Type=notify
NotifyAccess=all

[Install]
WantedBy=multi-user.target

# journalctl -u container-engine
-- Logs begin at Thu 2017-05-25 20:35:27 EDT, end at Thu 2017-05-25 22:10:19 EDT. --
May 25 20:42:41 qe-ghuang-master-nfs-1.novalocal systemd[1]: Starting Docker service...
May 25 20:42:41 qe-ghuang-master-nfs-1.novalocal runc[12600]: NOCHANGE: partition 2 could only be grown by -287 [fudge=20480]
May 25 20:42:41 qe-ghuang-master-nfs-1.novalocal runc[12600]: Physical volume "/dev/vda2" changed
May 25 20:42:41 qe-ghuang-master-nfs-1.novalocal runc[12600]: 1 physical volume(s) resized / 0 physical volume(s) not resized
May 25 20:42:41 qe-ghuang-master-nfs-1.novalocal runc[12600]: INFO: Found an already configured thin pool /dev/mapper/rhel-docker--pool in /etc/sysconfig/docker-storage
May 25 20:42:41 qe-ghuang-master-nfs-1.novalocal runc[12600]: INFO: Device node /dev/mapper/rhel-docker--pool exists.
May 25 20:42:41 qe-ghuang-master-nfs-1.novalocal runc[12600]: Logical volume rhel/docker-pool changed.
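The unit file above combines Type=notify with TimeoutStartSec=0. With Type=notify, systemd keeps the unit in the "activating" state until the service sends READY=1 on the notification socket. TimeoutStartSec=0 disables the start timeout. If the notification never arrives, the start job, and therefore the Ansible task, can wait indefinitely.

The following is a minimal sketch that flags this combination, assuming the unit file is available as plain text. The names `parse_unit` and `notify_start_may_hang` are hypothetical. The parser only handles simple Key=Value lines; it does not resolve drop-ins (such as the custom.conf drop-in shown later in this bug), line continuations or specifiers.

```python
def parse_unit(text):
    """Parse simple systemd unit text into {section: {key: [values]}}.

    Only plain Key=Value lines are handled; repeated keys (such as
    EnvironmentFile=) are kept as a list in order of appearance.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
            continue
        if current is None or "=" not in line:
            continue  # ignore anything outside a section or not Key=Value
        key, value = line.split("=", 1)
        current.setdefault(key.strip(), []).append(value.strip())
    return sections


def notify_start_may_hang(text):
    """True if the unit waits for READY=1 with the start timeout disabled."""
    service = parse_unit(text).get("Service", {})
    # For single-valued settings the last assignment wins.
    unit_type = service.get("Type", ["simple"])[-1]
    timeout = service.get("TimeoutStartSec", [None])[-1]
    # "0" (and "infinity" on newer systemd) disables the start timeout.
    return unit_type == "notify" and timeout in ("0", "infinity")


# Abridged copy of the container-engine.service shown in this bug.
UNIT = """\
[Unit]
Description=Docker service
After=network.target

[Service]
EnvironmentFile=-/etc/sysconfig/docker-storage
Environment=GOTRACEBACK=crash
ExecStart=/bin/runc --systemd-cgroup run 'container-engine'
TimeoutStartSec=0
Type=notify
NotifyAccess=all

[Install]
WantedBy=multi-user.target
"""

print(notify_start_may_hang(UNIT))  # True
```

Keeping a finite TimeoutStartSec would not fix a missing notification. It would, however, turn an indefinite hang into a visible start failure.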
This may be caused by using systemd notifications in the Docker container. I've seen this issue when using an old version of runC; Atomic Host 7.3.4 already has an updated runC that supports sd-notify correctly. I've also tried on Atomic Host 7.3.5 and it works fine for me.

What version of runc are you using? What does "systemctl status container-engine" show? Here I have:

$ runc --version
runc version 1.0.0-rc3
commit: cafb8d8755dc2b990fc73fbf7bff62f534da9219-dirty
spec: 1.0.0-rc5

$ systemctl status container-engine
● container-engine.service - Docker service
   Loaded: loaded (/etc/systemd/system/container-engine.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/container-engine.service.d
           └─custom.conf
   Active: active (running) since Mon 2017-05-29 10:44:38 UTC; 36min ago
 Main PID: 743 (runc)
   Memory: 109.5M

$ sudo atomic images list | grep container-engine
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhel7/container-engine latest a3ccc45997e0 2017-05-29 10:56 ostree
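For context on the sd-notify mechanism mentioned above: systemd passes the path of a Unix datagram socket to the service in the NOTIFY_SOCKET environment variable. A Type=notify service signals readiness by sending the string READY=1 to that socket. When the service runs inside a container, runc must carry that message from inside the container to systemd's socket. If it does not, systemd never sees READY=1.

The following is a minimal, illustrative client of that protocol, not runc's or libsystemd's implementation. The function name `send_ready` is hypothetical; real services usually call sd_notify(3) from libsystemd or run `systemd-notify --ready`. A local stand-in receiver is included so the sketch can be tried without systemd.

```python
import os
import socket
import tempfile


def send_ready(status=None):
    """Send READY=1 (and optionally STATUS=...) to $NOTIFY_SOCKET.

    Returns False without sending anything if NOTIFY_SOCKET is unset,
    similar to sd_notify() doing nothing outside a notify-type service.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading '@' denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    message = "READY=1"
    if status:
        message += "\nSTATUS=" + status
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), path)
    return True


if __name__ == "__main__":
    # Stand-in for systemd: bind a datagram socket and point NOTIFY_SOCKET at it.
    with tempfile.TemporaryDirectory() as tmp:
        sock_path = os.path.join(tmp, "notify")
        with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as receiver:
            receiver.bind(sock_path)
            os.environ["NOTIFY_SOCKET"] = sock_path
            send_ready("demo")
            print(receiver.recv(4096).decode())
```

If this datagram is lost between the container and the host, systemd keeps waiting. With TimeoutStartSec=0 that wait has no upper bound.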
It works for runc 1.0.0-rc5 as well. However, I believe releases before 7.3 do not ship a version of runc that is new enough to handle some of these feature updates.
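`runc --version` prints two separate versions. The `runc version` line is the runc release itself, and the `spec:` line is the OCI runtime spec version it implements. In the outputs in this bug, runc is 1.0.0-rc3 while the spec is 1.0.0-rc5.

The following is a small sketch that splits the output into those fields so they can be compared separately across hosts. The function name `parse_runc_version` is hypothetical. The input is assumed to follow the three-line format shown in this bug.

```python
def parse_runc_version(output):
    """Split `runc --version` output into runc, commit and spec fields.

    Assumes the three-line format shown in this bug:
      runc version <ver> / commit: <sha> / spec: <ver>
    Lines that match none of these prefixes are ignored.
    """
    fields = {}
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("runc version "):
            fields["runc"] = line[len("runc version "):]
        elif line.startswith("commit:"):
            fields["commit"] = line[len("commit:"):].strip()
        elif line.startswith("spec:"):
            fields["spec"] = line[len("spec:"):].strip()
    return fields


# Output copied from the comments in this bug.
OUTPUT = """\
runc version 1.0.0-rc3
commit: cafb8d8755dc2b990fc73fbf7bff62f534da9219-dirty
spec: 1.0.0-rc5
"""

info = parse_runc_version(OUTPUT)
print(info["runc"], info["spec"])  # 1.0.0-rc3 1.0.0-rc5
```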
After talking with some folks it looks like this is a container issue that is being worked on. Moving the ownership over to jhonce so he can update as work progresses.
An updated container build is ready for testing.
The container-engine service can be started successfully.

atomic-1.17.2-4.git2760e30.el7.x86_64
# atomic --version
1.17.1

runc-1.0.0-6.gite800860.el7.x86_64
# runc --version
runc version 1.0.0-rc3
commit: cafb8d8755dc2b990fc73fbf7bff62f534da9219-dirty
spec: 1.0.0-rc5

# docker version
Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-28.git1398f24.el7.x86_64
 Go version:      go1.7.4
 Git commit:      1398f24/1.12.6
 Built:           Wed May 17 01:16:44 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-28.git1398f24.el7.x86_64
 Go version:      go1.7.4
 Git commit:      1398f24/1.12.6
 Built:           Wed May 17 01:16:44 2017
 OS/Arch:         linux/amd64

# atomic images list |grep container-
brew-xxxx.redhat.com:8888/rhel7/container-engine latest 0fd49accf210 2017-06-05 02:16 ostree

Moving to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1716