Bug 1455749

Summary: Installer is hanging at Task "Start the Container Engine service" when using docker system container
Product: OpenShift Container Platform Reporter: Gan Huang <ghuang>
Component: Containers Assignee: Jhon Honce <jhonce>
Status: CLOSED ERRATA QA Contact: Gan Huang <ghuang>
Severity: medium Docs Contact:
Priority: high    
Version: 3.6.0 CC: aos-bugs, ghuang, gscrivan, jokerman, mmccomas, sdodson, smilner, smunilla
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-10 05:25:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1450307    

Description Gan Huang 2017-05-26 02:32:45 UTC
Description of problem:
Installer is hanging at Task "Start the Container Engine service" when using docker system container:

TASK [docker : Start the Container Engine service] *****************************
Friday 26 May 2017  00:42:40 +0000 (0:00:02.786)       0:03:26.592 ************ 



Version-Release number of selected component (if applicable):
atomic-1.17.2-3.git2760e30.el7.x86_64

https://github.com/openshift/openshift-ansible/pull/4272
(This should not be related to the PR above. I guess it's an issue in the master branch or in the `atomic` package.)

How reproducible:
always

Steps to Reproduce:
1. Trigger a proxy installation with the docker system container
# cat inventory
<--snip-->
openshift_docker_use_system_container=true
openshift_docker_systemcontainer_image_registry_override=brew-xxx.redhat.com:8888/rhel7
openshift_http_proxy=http://xxx.redhat.com:3128
openshift_https_proxy=http://xxx.redhat.com:3128


Actual results:
Installer is hanging at Task "Start the Container Engine service" when using docker system container:

TASK [docker : Start the Container Engine service] *****************************
Friday 26 May 2017  00:42:40 +0000 (0:00:02.786)       0:03:26.592 ************ 


Expected results:


Additional info:
# cat /etc/systemd/system/container-engine.service
[Unit]
Description=Docker service
After=network.target

[Service]
EnvironmentFile=-/etc/sysconfig/docker-storage
EnvironmentFile=-/etc/sysconfig/docker-network
Environment=GOTRACEBACK=crash
ExecStartPre=/bin/bash -c 'export -p > /run/docker-bash-env'
ExecStart=/bin/runc --systemd-cgroup run 'container-engine'
ExecStop=/bin/runc --systemd-cgroup kill 'container-engine'
Restart=on-failure
WorkingDirectory=/var/lib/containers/atomic/container-engine.0
RuntimeDirectory=docker
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
TimeoutStartSec=0
Type=notify
NotifyAccess=all

[Install]
WantedBy=multi-user.target
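
Note on the unit file above: with Type=notify, systemd considers the service started only once it receives READY=1 on the notification socket, and TimeoutStartSec=0 disables the start timeout. If that message never arrives, a start job would wait indefinitely, which is consistent with the hang, although nothing in this report confirms it as the root cause. A minimal Python sketch that flags this combination follows; the function name and the trimmed unit text are illustrative assumptions, not part of openshift-ansible or atomic.

```python
import configparser

# Trimmed copy of the unit file from this report (illustrative subset).
UNIT_TEXT = """\
[Unit]
Description=Docker service
After=network.target

[Service]
EnvironmentFile=-/etc/sysconfig/docker-storage
EnvironmentFile=-/etc/sysconfig/docker-network
ExecStart=/bin/runc --systemd-cgroup run 'container-engine'
TimeoutStartSec=0
Type=notify
NotifyAccess=all
"""


def start_can_block_forever(unit_text):
    """Return True if the unit waits for READY=1 and has no start timeout."""
    # strict=False tolerates repeated keys such as EnvironmentFile=;
    # interpolation=None keeps systemd specifiers like %i untouched.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep systemd's case-sensitive key names
    parser.read_string(unit_text)
    service = parser["Service"]
    waits_for_ready = service.get("Type", "simple") == "notify"
    no_timeout = service.get("TimeoutStartSec", "") in ("0", "infinity")
    return waits_for_ready and no_timeout


print(start_can_block_forever(UNIT_TEXT))  # prints True
```

This only inspects the main unit file; drop-ins such as custom.conf could override either setting and are not considered here.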


# journalctl -u container-engine
-- Logs begin at Thu 2017-05-25 20:35:27 EDT, end at Thu 2017-05-25 22:10:19 EDT. --
May 25 20:42:41 qe-ghuang-master-nfs-1.novalocal systemd[1]: Starting Docker service...
May 25 20:42:41 qe-ghuang-master-nfs-1.novalocal runc[12600]: NOCHANGE: partition 2 could only be grown by -287 [fudge=20480]
May 25 20:42:41 qe-ghuang-master-nfs-1.novalocal runc[12600]: Physical volume "/dev/vda2" changed
May 25 20:42:41 qe-ghuang-master-nfs-1.novalocal runc[12600]: 1 physical volume(s) resized / 0 physical volume(s) not resized
May 25 20:42:41 qe-ghuang-master-nfs-1.novalocal runc[12600]: INFO: Found an already configured thin pool /dev/mapper/rhel-docker--pool in /etc/sysconfig/docker-storage
May 25 20:42:41 qe-ghuang-master-nfs-1.novalocal runc[12600]: INFO: Device node /dev/mapper/rhel-docker--pool exists.
May 25 20:42:41 qe-ghuang-master-nfs-1.novalocal runc[12600]: Logical volume rhel/docker-pool changed.

Comment 1 Giuseppe Scrivano 2017-05-29 11:22:07 UTC
This may be caused by using systemd notifications in the Docker container. I've seen this issue when using an old version of runC; Atomic Host 7.3.4 already has an updated runC that supports sd-notify correctly.

I've also tried on Atomic Host 7.3.5, and it works fine for me.

What version of runc are you using?  What does "systemctl status container-engine" show?

Here I have:

$ runc --version
runc version 1.0.0-rc3
commit: cafb8d8755dc2b990fc73fbf7bff62f534da9219-dirty
spec: 1.0.0-rc5

$ systemctl status container-engine
● container-engine.service - Docker service
   Loaded: loaded (/etc/systemd/system/container-engine.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/container-engine.service.d
           └─custom.conf
   Active: active (running) since Mon 2017-05-29 10:44:38 UTC; 36min ago
 Main PID: 743 (runc)
   Memory: 109.5M

$ sudo atomic images list | grep container-engine
   brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhel7/container-engine   latest    a3ccc45997e0   2017-05-29 10:56                  ostree
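
For readers unfamiliar with the sd-notify mechanism mentioned in this comment: a Type=notify service reports readiness by sending a READY=1 datagram to the Unix socket named in the NOTIFY_SOCKET environment variable. The Python sketch below shows the general protocol only; it is not how runC or the container-engine image implements it, and `sd_notify` and `demo_roundtrip` are hypothetical helper names. The demo uses a throwaway socket in place of systemd.

```python
import os
import socket
import tempfile


def sd_notify(state, env=None):
    """Send one notification datagram; return False if NOTIFY_SOCKET is unset."""
    env = os.environ if env is None else env
    address = env.get("NOTIFY_SOCKET")
    if not address:
        return False  # not started by systemd with notification enabled
    if address.startswith("@"):
        # A leading "@" denotes a Linux abstract socket (leading NUL byte).
        address = b"\0" + address[1:].encode("utf-8")
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode("utf-8"), address)
    return True


def demo_roundtrip():
    """Stand in for systemd with a temporary socket and return what it receives."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "notify.sock")
        with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as listener:
            listener.bind(path)
            sd_notify("READY=1", env={"NOTIFY_SOCKET": path})
            return listener.recv(64)


print(demo_roundtrip())
```

If the process inside the container never sends this message, or the runtime does not pass it through to systemd, the unit stays in the "activating" state.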

Comment 2 Steve Milner 2017-05-30 13:18:07 UTC
It works for runc 1.0.0-rc5 as well. However, I believe the versions before 7.3 do not have a version of runc which is new enough to handle some of the feature updates.
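
When comparing hosts, note that `runc --version` reports the runc version and the runtime spec version as separate fields, so it may help to record both. A small Python sketch that splits the output quoted in comment 1 into fields follows; `parse_runc_version` is a hypothetical helper, and no minimum version is assumed because this bug does not state one.

```python
import re

# Output quoted in comment 1 of this bug.
SAMPLE = """\
runc version 1.0.0-rc3
commit: cafb8d8755dc2b990fc73fbf7bff62f534da9219-dirty
spec: 1.0.0-rc5
"""


def parse_runc_version(output):
    """Split `runc --version` output into a dict of field name to value."""
    fields = {}
    for line in output.splitlines():
        match = re.match(r"runc version (\S+)", line)
        if match:
            fields["runc"] = match.group(1)
            continue
        key, sep, value = line.partition(":")
        if sep:  # "commit: ..." and "spec: ..." lines
            fields[key.strip()] = value.strip()
    return fields


info = parse_runc_version(SAMPLE)
print(info["runc"], info["spec"])  # prints 1.0.0-rc3 1.0.0-rc5
```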

Comment 3 Steve Milner 2017-05-31 13:58:26 UTC
After talking with some folks, it looks like this is a container issue that is being worked on. Moving the ownership over to jhonce so he can update the bug as work progresses.

Comment 4 Steve Milner 2017-06-01 18:57:26 UTC
An updated container build is ready for testing.

Comment 6 Gan Huang 2017-06-05 08:26:03 UTC
The container-engine service can now be started successfully.

atomic-1.17.2-4.git2760e30.el7.x86_64
# atomic --version
1.17.1

runc-1.0.0-6.gite800860.el7.x86_64
# runc --version
runc version 1.0.0-rc3
commit: cafb8d8755dc2b990fc73fbf7bff62f534da9219-dirty
spec: 1.0.0-rc5

# docker version
Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-28.git1398f24.el7.x86_64
 Go version:      go1.7.4
 Git commit:      1398f24/1.12.6
 Built:           Wed May 17 01:16:44 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-28.git1398f24.el7.x86_64
 Go version:      go1.7.4
 Git commit:      1398f24/1.12.6
 Built:           Wed May 17 01:16:44 2017
 OS/Arch:         linux/amd64

# atomic images list |grep container-
   brew-xxxx.redhat.com:8888/rhel7/container-engine   latest   0fd49accf210   2017-06-05 02:16                  ostree    

Moving to verified.

Comment 8 errata-xmlrpc 2017-08-10 05:25:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716