Bug 1230321

Summary: Upgrading atomic 7.1.3 initial docker runs always time-out then eventually pass
Product: [Retired] Atomic Reporter: Timothy St. Clair <tstclair>
Component: docker-ioAssignee: Lokesh Mandvekar <lsm5>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: unspecifiedCC: dwalsh, eparis, lsu, miabbott, mjenner, tstclair, vgoyal
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-07-28 19:46:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Timothy St. Clair 2015-06-10 16:03:11 UTC
Description of problem:
Upgraded atomic from 7.1.2->7.1.3 on after startup it takes several attempts to start docker. 

Version-Release number of selected component (if applicable):
# docker version
Client version: 1.6.2
Client API version: 1.18
Go version (client): go1.4.2
Git commit (client): ac7d43f/1.6.2
OS/Arch (client): linux/amd64
Server version: 1.6.2
Server API version: 1.18
Go version (server): go1.4.2
Git commit (server): ac7d43f/1.6.2
OS/Arch (server): linux/amd64


How reproducible:
100%

Steps to Reproduce:
1. atomic host upgrade
2. systemctl reboot
3. systemctl start docker 

Actual results:
Docker times out several times. 

Expected results:
Startup without incident.

Additional info:

Comment 1 Daniel Walsh 2015-06-10 16:59:13 UTC
Anything in the logs?  Could this have something to do with devicemapper?

Comment 2 Timothy St. Clair 2015-06-10 17:06:49 UTC
very little details in the logs.  

n 10 10:17:15 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: Starting Docker Application Container Engine...
n 10 09:58:43 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: Unit docker.service entered failed state.
n 10 09:58:43 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: Failed to start Docker Application Container Engine.
n 10 09:58:43 host06-rack10.scale.openstack.engineering.redhat.com docker[12239]: time="2015-06-10T09:58:43-04:00" level=info msg="Received signal 'terminated', starting shutdown of docker..."
n 10 09:58:43 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: docker.service operation timed out. Terminating.
n 10 09:57:13 host06-rack10.scale.openstack.engineering.redhat.com docker[12239]: time="2015-06-10T09:57:13-04:00" level=info msg="Listening for HTTP on unix (/var/run/docker.sock)"
n 10 09:57:13 host06-rack10.scale.openstack.engineering.redhat.com docker[12239]: time="2015-06-10T09:57:13-04:00" level=info msg="+job serveapi(unix:///var/run/docker.sock)"
n 10 09:57:13 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: Starting Docker Application Container Engine...
n 10 09:56:46 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: Unit docker.service entered failed state.
n 10 09:56:46 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: Failed to start Docker Application Container Engine.
n 10 09:56:46 host06-rack10.scale.openstack.engineering.redhat.com docker[11877]: time="2015-06-10T09:56:46-04:00" level=info msg="Received signal 'terminated', starting shutdown of docker..."
n 10 09:56:46 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: docker.service operation timed out. Terminating.
n 10 09:55:16 host06-rack10.scale.openstack.engineering.redhat.com docker[11877]: time="2015-06-10T09:55:16-04:00" level=info msg="Listening for HTTP on unix (/var/run/docker.sock)"
n 10 09:55:16 host06-rack10.scale.openstack.engineering.redhat.com docker[11877]: time="2015-06-10T09:55:16-04:00" level=info msg="+job serveapi(unix:///var/run/docker.sock)"
n 10 09:55:16 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: Starting Docker Application Container Engine...
n 10 09:54:14 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: Dependency failed for Docker Application Container Engine.

Comment 3 Vivek Goyal 2015-06-10 17:15:12 UTC
Does not look like this is something related to devicemapper. Looks like systemd does not think that docker started properly and terminates it.

May be docker has started but did not communicate back to systemd properly? If docker did not start properly, something should have been in the logs.

Comment 4 Daniel Walsh 2015-06-10 17:35:29 UTC
Is docker setup to do sd_notofy?  Might be timing out?

Comment 5 Timothy St. Clair 2015-06-11 21:46:38 UTC
In upgrading our cluster it always takes on the 4th try on every machine, which is slightly odd.

Comment 6 Daniel Walsh 2015-06-15 12:18:38 UTC
Martin can you confirm if you are seeing similar?

Comment 7 Lokesh Mandvekar 2015-06-16 18:59:54 UTC
is this rhel or fedora ??

Comment 8 Eric Paris 2015-06-16 19:10:45 UTC
I believe he was doing RHEL 7.1.2 to RHEL 7.1.3

Comment 9 Timothy St. Clair 2015-06-18 16:32:39 UTC
This is atomic 7.1.2 & 7.1.3.  

Update: 

This appears to be an ordering issue with flannel on startup, which I thought was  all working. 
 
If I `ip link delete docker0` + systemctl start flannel + systemctl start docker 

all is well.

Comment 10 Eric Paris 2015-06-18 16:37:05 UTC
do you have flannel 'enabled' ?

Comment 11 Timothy St. Clair 2015-06-18 16:53:48 UTC
flannel is not enabled.  

But I would not expect docker start to lag if it's not enabled.

Comment 12 Timothy St. Clair 2015-06-18 19:37:05 UTC
scratch comment #9, I'm still seeing it on other 7.1.3 machines.

Comment 13 Lokesh Mandvekar 2015-06-18 19:48:46 UTC
is this even after the recent compose (should have docker 1.6.2-14 build) ??

Comment 14 Timothy St. Clair 2015-06-18 20:20:50 UTC
yes, latest 7.1.3 release.

Comment 15 Daniel Walsh 2015-08-21 03:27:25 UTC
Any change with 7.1.4?

Comment 16 Daniel Walsh 2015-09-28 18:37:25 UTC
Should be fixed in docker-1.9

7.2.1 release.

Comment 17 Lokesh Mandvekar 2016-07-28 19:46:53 UTC
Closing this one..