Bug 1230321 - Upgrading atomic 7.1.3 initial docker runs always time-out then eventually pass
Summary: Upgrading atomic 7.1.3 initial docker runs always time-out then eventually pass
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Atomic
Classification: Retired
Component: docker-io
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Lokesh Mandvekar
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-06-10 16:03 UTC by Timothy St. Clair
Modified: 2016-07-28 19:46 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-07-28 19:46:53 UTC
RHEL 7.3 requirements from Atomic Host:
Embargoed:



Description Timothy St. Clair 2015-06-10 16:03:11 UTC
Description of problem:
Upgraded Atomic from 7.1.2 to 7.1.3; after startup it takes several attempts to start docker.

Version-Release number of selected component (if applicable):
# docker version
Client version: 1.6.2
Client API version: 1.18
Go version (client): go1.4.2
Git commit (client): ac7d43f/1.6.2
OS/Arch (client): linux/amd64
Server version: 1.6.2
Server API version: 1.18
Go version (server): go1.4.2
Git commit (server): ac7d43f/1.6.2
OS/Arch (server): linux/amd64


How reproducible:
100%

Steps to Reproduce:
1. atomic host upgrade
2. systemctl reboot
3. systemctl start docker 

Actual results:
Docker times out several times. 

Expected results:
Docker starts without incident.

Additional info:
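A minimal way to watch the repeated start attempts, using standard journalctl options (a sketch; unit name matches the log below):

# Follow docker's start attempts live
journalctl -u docker.service -f

# Or review all attempts from the current boot
journalctl -u docker.service -b --no-pager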

Comment 1 Daniel Walsh 2015-06-10 16:59:13 UTC
Anything in the logs?  Could this have something to do with devicemapper?

Comment 2 Timothy St. Clair 2015-06-10 17:06:49 UTC
Very little detail in the logs.

Jun 10 10:17:15 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: Starting Docker Application Container Engine...
Jun 10 09:58:43 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: Unit docker.service entered failed state.
Jun 10 09:58:43 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: Failed to start Docker Application Container Engine.
Jun 10 09:58:43 host06-rack10.scale.openstack.engineering.redhat.com docker[12239]: time="2015-06-10T09:58:43-04:00" level=info msg="Received signal 'terminated', starting shutdown of docker..."
Jun 10 09:58:43 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: docker.service operation timed out. Terminating.
Jun 10 09:57:13 host06-rack10.scale.openstack.engineering.redhat.com docker[12239]: time="2015-06-10T09:57:13-04:00" level=info msg="Listening for HTTP on unix (/var/run/docker.sock)"
Jun 10 09:57:13 host06-rack10.scale.openstack.engineering.redhat.com docker[12239]: time="2015-06-10T09:57:13-04:00" level=info msg="+job serveapi(unix:///var/run/docker.sock)"
Jun 10 09:57:13 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: Starting Docker Application Container Engine...
Jun 10 09:56:46 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: Unit docker.service entered failed state.
Jun 10 09:56:46 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: Failed to start Docker Application Container Engine.
Jun 10 09:56:46 host06-rack10.scale.openstack.engineering.redhat.com docker[11877]: time="2015-06-10T09:56:46-04:00" level=info msg="Received signal 'terminated', starting shutdown of docker..."
Jun 10 09:56:46 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: docker.service operation timed out. Terminating.
Jun 10 09:55:16 host06-rack10.scale.openstack.engineering.redhat.com docker[11877]: time="2015-06-10T09:55:16-04:00" level=info msg="Listening for HTTP on unix (/var/run/docker.sock)"
Jun 10 09:55:16 host06-rack10.scale.openstack.engineering.redhat.com docker[11877]: time="2015-06-10T09:55:16-04:00" level=info msg="+job serveapi(unix:///var/run/docker.sock)"
Jun 10 09:55:16 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: Starting Docker Application Container Engine...
Jun 10 09:54:14 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: Dependency failed for Docker Application Container Engine.

Comment 3 Vivek Goyal 2015-06-10 17:15:12 UTC
This does not look related to devicemapper. It looks like systemd does not think docker started properly and terminates it.

Maybe docker started but did not communicate back to systemd properly? If docker did not start properly, something should have been in the logs.

Comment 4 Daniel Walsh 2015-06-10 17:35:29 UTC
Is docker set up to do sd_notify? Might it be timing out?
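If docker.service is Type=notify, systemd waits for the daemon to send READY=1 via sd_notify(3) and terminates it when TimeoutStartSec expires; the 90-second gap between "Listening for HTTP" and "operation timed out" in the log above matches systemd's default start timeout. A quick check of what the unit expects (a sketch using standard systemctl property names):

# Is the unit Type=notify, and how long will systemd wait for READY=1?
systemctl show docker.service -p Type -p TimeoutStartUSec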

Comment 5 Timothy St. Clair 2015-06-11 21:46:38 UTC
In upgrading our cluster, docker always starts on the 4th try on every machine, which is slightly odd.

Comment 6 Daniel Walsh 2015-06-15 12:18:38 UTC
Martin, can you confirm whether you are seeing similar behavior?

Comment 7 Lokesh Mandvekar 2015-06-16 18:59:54 UTC
Is this RHEL or Fedora?

Comment 8 Eric Paris 2015-06-16 19:10:45 UTC
I believe he was doing RHEL 7.1.2 to RHEL 7.1.3

Comment 9 Timothy St. Clair 2015-06-18 16:32:39 UTC
This is Atomic 7.1.2 & 7.1.3.

Update:

This appears to be an ordering issue with flannel on startup, which I thought was all working.

If I run `ip link delete docker0`, then `systemctl start flannel`, then `systemctl start docker`, all is well.
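Spelled out as a script, that workaround would be (a sketch; unit names as written above, though the flannel unit may actually be named flanneld.service):

#!/bin/bash
# Workaround sketch: remove the stale docker0 bridge, start flannel
# first, then start docker.
set -e
ip link delete docker0 || true   # ignore the error if the bridge is already gone
systemctl start flannel
systemctl start docker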

Comment 10 Eric Paris 2015-06-18 16:37:05 UTC
Do you have flannel 'enabled'?

Comment 11 Timothy St. Clair 2015-06-18 16:53:48 UTC
flannel is not enabled.

But I would not expect docker's start to lag if flannel is not enabled.
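One way to check whether the shipped unit still orders docker after flannel even when flannel is disabled (a sketch using standard systemctl queries):

# Does docker.service declare ordering or dependencies on flannel?
systemctl show docker.service -p After -p Wants -p Requires

# Confirm flannel's enablement state (prints 'disabled' and exits nonzero if so)
systemctl is-enabled flannel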

Comment 12 Timothy St. Clair 2015-06-18 19:37:05 UTC
Scratch comment #9; I'm still seeing it on other 7.1.3 machines.

Comment 13 Lokesh Mandvekar 2015-06-18 19:48:46 UTC
Is this even after the recent compose (which should have the docker-1.6.2-14 build)?

Comment 14 Timothy St. Clair 2015-06-18 20:20:50 UTC
yes, latest 7.1.3 release.

Comment 15 Daniel Walsh 2015-08-21 03:27:25 UTC
Any change with 7.1.4?

Comment 16 Daniel Walsh 2015-09-28 18:37:25 UTC
Should be fixed in docker-1.9, in the 7.2.1 release.

Comment 17 Lokesh Mandvekar 2016-07-28 19:46:53 UTC
Closing this one.

