Description of problem:
After upgrading Atomic from 7.1.2 to 7.1.3 and rebooting, it takes several attempts to start docker.

Version-Release number of selected component (if applicable):
# docker version
Client version: 1.6.2
Client API version: 1.18
Go version (client): go1.4.2
Git commit (client): ac7d43f/1.6.2
OS/Arch (client): linux/amd64
Server version: 1.6.2
Server API version: 1.18
Go version (server): go1.4.2
Git commit (server): ac7d43f/1.6.2
OS/Arch (server): linux/amd64

How reproducible:
100%

Steps to Reproduce:
1. atomic host upgrade
2. systemctl reboot
3. systemctl start docker

Actual results:
Starting docker times out several times before it succeeds.

Expected results:
Startup without incident.

Additional info:
Anything in the logs? Could this have something to do with devicemapper?
Very little detail in the logs (newest entries first):
Jun 10 10:17:15 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: Starting Docker Application Container Engine...
Jun 10 09:58:43 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: Unit docker.service entered failed state.
Jun 10 09:58:43 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: Failed to start Docker Application Container Engine.
Jun 10 09:58:43 host06-rack10.scale.openstack.engineering.redhat.com docker[12239]: time="2015-06-10T09:58:43-04:00" level=info msg="Received signal 'terminated', starting shutdown of docker..."
Jun 10 09:58:43 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: docker.service operation timed out. Terminating.
Jun 10 09:57:13 host06-rack10.scale.openstack.engineering.redhat.com docker[12239]: time="2015-06-10T09:57:13-04:00" level=info msg="Listening for HTTP on unix (/var/run/docker.sock)"
Jun 10 09:57:13 host06-rack10.scale.openstack.engineering.redhat.com docker[12239]: time="2015-06-10T09:57:13-04:00" level=info msg="+job serveapi(unix:///var/run/docker.sock)"
Jun 10 09:57:13 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: Starting Docker Application Container Engine...
Jun 10 09:56:46 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: Unit docker.service entered failed state.
Jun 10 09:56:46 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: Failed to start Docker Application Container Engine.
Jun 10 09:56:46 host06-rack10.scale.openstack.engineering.redhat.com docker[11877]: time="2015-06-10T09:56:46-04:00" level=info msg="Received signal 'terminated', starting shutdown of docker..."
Jun 10 09:56:46 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: docker.service operation timed out. Terminating.
Jun 10 09:55:16 host06-rack10.scale.openstack.engineering.redhat.com docker[11877]: time="2015-06-10T09:55:16-04:00" level=info msg="Listening for HTTP on unix (/var/run/docker.sock)"
Jun 10 09:55:16 host06-rack10.scale.openstack.engineering.redhat.com docker[11877]: time="2015-06-10T09:55:16-04:00" level=info msg="+job serveapi(unix:///var/run/docker.sock)"
Jun 10 09:55:16 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: Starting Docker Application Container Engine...
Jun 10 09:54:14 host06-rack10.scale.openstack.engineering.redhat.com systemd[1]: Dependency failed for Docker Application Container Engine.
This does not look related to devicemapper. It looks like systemd does not think that docker started properly and terminates it. Maybe docker has started but did not report back to systemd properly? If docker had failed to start, there should be something in the logs.
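For what it's worth, both timed-out attempts in the journal were terminated exactly 90 seconds after "Starting", which would match systemd's default start timeout (90s unless the unit or system.conf overrides it). A quick sketch to check the arithmetic, assuming GNU date; elapsed_seconds is just an illustrative helper, not an existing tool:

```shell
# Seconds between two HH:MM:SS timestamps on the same day (requires GNU date).
elapsed_seconds() {
    start=$(date -u -d "1970-01-01 $1" +%s)
    end=$(date -u -d "1970-01-01 $2" +%s)
    echo $(( end - start ))
}

# Timestamps copied from the journal excerpt:
# "Starting ..." to "docker.service operation timed out. Terminating."
elapsed_seconds 09:55:16 09:56:46   # docker[11877]
elapsed_seconds 09:57:13 09:58:43   # docker[12239]
```

Both calls print 90, so docker was most likely still in the "activating" state when systemd gave up on it.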
Is docker set up to do sd_notify? Might it be timing out?
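One way to check this on an affected host is to look at the unit's Type. With Type=notify, systemd keeps the unit in "activating" until the daemon sends READY=1 via sd_notify, and terminates it when the start timeout expires. A small sketch; waits_for_ready is a hypothetical helper, and I'm assuming the unit is named docker.service as in the journal:

```shell
# Reads `systemctl show` output (KEY=VALUE lines) on stdin and reports
# whether systemd waits for a readiness notification from the service.
# Hypothetical helper, not part of docker or systemd.
waits_for_ready() {
    unit_type=$(sed -n 's/^Type=//p')
    case "$unit_type" in
        notify*) echo "Type=$unit_type: systemd waits for READY=1 until the start timeout" ;;
        *)       echo "Type=${unit_type:-unknown}: systemd does not wait for READY=1" ;;
    esac
}

# On the affected host, something like:
#   systemctl show docker.service -p Type -p TimeoutStartUSec | waits_for_ready
```

If the unit turns out to be Type=notify, a daemon that is listening on the socket but never sends READY=1 would produce exactly this kind of timeout.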
While upgrading our cluster, docker always starts on the 4th try on every machine, which is slightly odd.
Martin, can you confirm whether you are seeing something similar?
Is this RHEL or Fedora?
I believe he was doing RHEL 7.1.2 to RHEL 7.1.3
This is Atomic 7.1.2 and 7.1.3. Update: this appears to be an ordering issue with flannel on startup, which I thought was already working. If I run `ip link delete docker0`, then `systemctl start flannel`, then `systemctl start docker`, all is well.
Do you have flannel 'enabled'?
flannel is not enabled. But I would not expect docker start to lag if it's not enabled.
Scratch comment #9; I'm still seeing it on other 7.1.3 machines.
Is this even after the recent compose (which should have the docker 1.6.2-14 build)?
Yes, latest 7.1.3 release.
Any change with 7.1.4?
Should be fixed in docker-1.9, in the 7.2.1 release.
Closing this one.