Bug 1455749 - Installer is hanging at Task "Start the Container Engine service" when using docker system container
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Jhon Honce
QA Contact: Gan Huang
URL:
Whiteboard:
Depends On:
Blocks: 1450307
Reported: 2017-05-26 02:32 UTC by Gan Huang
Modified: 2017-08-16 19:51 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-10 05:25:32 UTC
Target Upstream Version:
Embargoed:



Links
System: Red Hat Product Errata
ID: RHEA-2017:1716
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat OpenShift Container Platform 3.6 RPM Release Advisory
Last Updated: 2017-08-10 09:02:50 UTC

Description Gan Huang 2017-05-26 02:32:45 UTC
Description of problem:
The installer hangs at the task "Start the Container Engine service" when using the docker system container:

TASK [docker : Start the Container Engine service] *****************************
Friday 26 May 2017  00:42:40 +0000 (0:00:02.786)       0:03:26.592 ************ 



Version-Release number of selected component (if applicable):
atomic-1.17.2-3.git2760e30.el7.x86_64

https://github.com/openshift/openshift-ansible/pull/4272
(This should not be related to this PR. I suspect it is an issue in the master branch or in the `atomic` package.)

How reproducible:
always

Steps to Reproduce:
1. Trigger a proxy installation with the docker system container:
# cat inventory
<--snip-->
openshift_docker_use_system_container=true
openshift_docker_systemcontainer_image_registry_override=brew-xxx.redhat.com:8888/rhel7
openshift_http_proxy=http://xxx.redhat.com:3128
openshift_https_proxy=http://xxx.redhat.com:3128


Actual results:
The installer hangs at the task "Start the Container Engine service" when using the docker system container:

TASK [docker : Start the Container Engine service] *****************************
Friday 26 May 2017  00:42:40 +0000 (0:00:02.786)       0:03:26.592 ************ 


Expected results:
The container-engine service starts and the installer continues past this task.

Additional info:
# cat /etc/systemd/system/container-engine.service
[Unit]
Description=Docker service
After=network.target

[Service]
EnvironmentFile=-/etc/sysconfig/docker-storage
EnvironmentFile=-/etc/sysconfig/docker-network
Environment=GOTRACEBACK=crash
ExecStartPre=/bin/bash -c 'export -p > /run/docker-bash-env'
ExecStart=/bin/runc --systemd-cgroup run 'container-engine'
ExecStop=/bin/runc --systemd-cgroup kill 'container-engine'
Restart=on-failure
WorkingDirectory=/var/lib/containers/atomic/container-engine.0
RuntimeDirectory=docker
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
TimeoutStartSec=0
Type=notify
NotifyAccess=all

[Install]
WantedBy=multi-user.target


# journalctl -u container-engine
-- Logs begin at Thu 2017-05-25 20:35:27 EDT, end at Thu 2017-05-25 22:10:19 EDT. --
May 25 20:42:41 qe-ghuang-master-nfs-1.novalocal systemd[1]: Starting Docker service...
May 25 20:42:41 qe-ghuang-master-nfs-1.novalocal runc[12600]: NOCHANGE: partition 2 could only be grown by -287 [fudge=20480]
May 25 20:42:41 qe-ghuang-master-nfs-1.novalocal runc[12600]: Physical volume "/dev/vda2" changed
May 25 20:42:41 qe-ghuang-master-nfs-1.novalocal runc[12600]: 1 physical volume(s) resized / 0 physical volume(s) not resized
May 25 20:42:41 qe-ghuang-master-nfs-1.novalocal runc[12600]: INFO: Found an already configured thin pool /dev/mapper/rhel-docker--pool in /etc/sysconfig/docker-storage
May 25 20:42:41 qe-ghuang-master-nfs-1.novalocal runc[12600]: INFO: Device node /dev/mapper/rhel-docker--pool exists.
May 25 20:42:41 qe-ghuang-master-nfs-1.novalocal runc[12600]: Logical volume rhel/docker-pool changed.

Comment 1 Giuseppe Scrivano 2017-05-29 11:22:07 UTC
This may be caused by the use of systemd notifications (sd-notify) in the Docker container. I've seen this issue with an old version of runC; Atomic Host 7.3.4 already ships an updated runC that supports sd-notify correctly.

I've also tried it on Atomic Host 7.3.5, and it works fine for me.

What version of runc are you using? What does "systemctl status container-engine" show?

Here is what I have:

$ runc --version
runc version 1.0.0-rc3
commit: cafb8d8755dc2b990fc73fbf7bff62f534da9219-dirty
spec: 1.0.0-rc5

$ systemctl status container-engine
● container-engine.service - Docker service
   Loaded: loaded (/etc/systemd/system/container-engine.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/container-engine.service.d
           └─custom.conf
   Active: active (running) since Mon 2017-05-29 10:44:38 UTC; 36min ago
 Main PID: 743 (runc)
   Memory: 109.5M

$ sudo atomic images list | grep container-engine
   brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhel7/container-engine   latest    a3ccc45997e0   2017-05-29 10:56                  ostree
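The hang is consistent with how a Type=notify unit such as the container-engine.service above behaves: systemd passes the path of a datagram socket in the NOTIFY_SOCKET environment variable and treats the service as started only after a process sends "READY=1" to it (NotifyAccess=all lets any process in the service's cgroup send it). Because the unit also sets TimeoutStartSec=0, which disables the start timeout, a notification that never makes it out of the container (presumably the case with an older runC that does not relay NOTIFY_SOCKET correctly) would leave the start job waiting indefinitely. The following is a minimal Python sketch of both sides of that protocol, not the code used by systemd, runC, or the container image. The helper names `sd_notify` (modeled loosely on libsystemd's sd_notify(3)) and `wait_for_ready` are hypothetical, and the socket is a temporary file rather than one created by systemd.

```python
import os
import socket
import tempfile


def sd_notify(state: str) -> bool:
    """Send a state string such as "READY=1" to the socket named in $NOTIFY_SOCKET.

    Returns False when NOTIFY_SOCKET is unset, i.e. the process was not started
    by a Type=notify unit or the variable was not passed through to it.
    """
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    # A leading '@' denotes a Linux abstract-namespace socket.
    if addr.startswith("@"):
        addr = "\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(addr)
        sock.send(state.encode())
    return True


def wait_for_ready(listener: socket.socket, timeout: float) -> bool:
    """Hypothetical stand-in for the manager's side: wait for READY=1 or give up."""
    listener.settimeout(timeout)
    try:
        while True:
            # One datagram may carry several newline-separated assignments.
            msg = listener.recv(4096).decode()
            if "READY=1" in msg.split("\n"):
                return True
    except socket.timeout:
        # With TimeoutStartSec=0 there is no such deadline, so a lost
        # READY=1 would keep the start job pending forever.
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "notify.sock")
        with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as listener:
            listener.bind(path)
            os.environ["NOTIFY_SOCKET"] = path
            sent = sd_notify("READY=1")
            # Prints "sent: True ready: True"
            print("sent:", sent, "ready:", wait_for_ready(listener, timeout=1.0))
```

The sketch uses a finite timeout only so the demo terminates; if the sending side is removed, `wait_for_ready` returns False after the timeout, which is the failure the real unit cannot report because its start timeout is disabled.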

Comment 2 Steve Milner 2017-05-30 13:18:07 UTC
It works for runc 1.0.0-rc5 as well. However, I believe releases before 7.3 do not ship a version of runc that is new enough to handle some of the feature updates.

Comment 3 Steve Milner 2017-05-31 13:58:26 UTC
After talking with some folks, it looks like this is a container issue that is being worked on. Moving ownership over to jhonce so he can post updates as work progresses.

Comment 4 Steve Milner 2017-06-01 18:57:26 UTC
An updated container build is ready for testing.

Comment 6 Gan Huang 2017-06-05 08:26:03 UTC
The container-engine service can now be started successfully.

atomic-1.17.2-4.git2760e30.el7.x86_64
# atomic --version
1.17.1

runc-1.0.0-6.gite800860.el7.x86_64
# runc --version
runc version 1.0.0-rc3
commit: cafb8d8755dc2b990fc73fbf7bff62f534da9219-dirty
spec: 1.0.0-rc5

# docker version
Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-28.git1398f24.el7.x86_64
 Go version:      go1.7.4
 Git commit:      1398f24/1.12.6
 Built:           Wed May 17 01:16:44 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-28.git1398f24.el7.x86_64
 Go version:      go1.7.4
 Git commit:      1398f24/1.12.6
 Built:           Wed May 17 01:16:44 2017
 OS/Arch:         linux/amd64

# atomic images list |grep container-
   brew-xxxx.redhat.com:8888/rhel7/container-engine   latest   0fd49accf210   2017-06-05 02:16                  ostree    

Moving to verified.

Comment 8 errata-xmlrpc 2017-08-10 05:25:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716

