Bug 1310576

Summary: atomic-openshift-master.service sometimes fails to start in a containerized install environment with Docker 1.9
Product: OpenShift Container Platform
Reporter: DeShuai Ma <dma>
Component: Containers
Assignee: Jhon Honce <jhonce>
Status: CLOSED DEFERRED
QA Contact: DeShuai Ma <dma>
Severity: medium
Priority: medium
Version: 3.2.0
CC: aos-bugs, jokerman, mmccomas
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-02-23 20:31:28 UTC

Description DeShuai Ma 2016-02-22 09:51:22 UTC
Description of problem:
In a containerized install environment with Docker 1.9, restarting atomic-openshift-master.service sometimes fails. Once it does, every subsequent restart attempt fails with the error "Could not find container for entity id <id>".

Version-Release number of selected component (if applicable):
openshift v3.1.1.904
kubernetes v1.2.0-alpha.7-703-gbc4550d
etcd 2.2.5
docker version: 1.9.1

How reproducible:
Sometimes

Steps to Reproduce:
1. Restart atomic-openshift-master.service:
$ systemctl restart atomic-openshift-master

Error logs:
Feb 22 17:10:34 openshift-135.lab.sjc.redhat.com systemd[1]: Failed to start atomic-openshift-master.service.
Feb 22 17:10:34 openshift-135.lab.sjc.redhat.com systemd[1]: Unit atomic-openshift-master.service entered failed state.
Feb 22 17:10:34 openshift-135.lab.sjc.redhat.com systemd[1]: atomic-openshift-master.service failed.
Feb 22 17:10:34 openshift-135.lab.sjc.redhat.com systemd[1]: atomic-openshift-master.service holdoff time over, scheduling restart.
Feb 22 17:10:34 openshift-135.lab.sjc.redhat.com systemd[1]: Starting atomic-openshift-master.service...
Feb 22 17:10:34 openshift-135.lab.sjc.redhat.com docker[16971]: Error response from daemon: no such id: atomic-openshift-master
Feb 22 17:10:34 openshift-135.lab.sjc.redhat.com docker[16971]: Error: failed to remove containers: [atomic-openshift-master]
Feb 22 17:10:34 openshift-135.lab.sjc.redhat.com docker[16976]: Error response from daemon: Could not find container for entity id 9c5f29934c463c4cd51b440b1c35cc3dcc276e06df3396369eabbbeebcc07c56
Feb 22 17:10:34 openshift-135.lab.sjc.redhat.com systemd[1]: atomic-openshift-master.service: main process exited, code=exited, status=1/FAILURE

Actual results:
1. When this occurs, removing "/var/lib/docker/linkgraph.db" and then restarting the master succeeds.
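The workaround above can be sketched as a small shell function. This is a sketch under assumptions the report does not state: that the master service and the Docker daemon should be stopped before Docker's link database is touched, and that moving linkgraph.db aside (keeping a backup) is acceptable instead of deleting it. The names `reset_linkgraph`, `run`, and the `DRY_RUN` switch are hypothetical.

```shell
#!/bin/sh
# Hypothetical workaround sketch for "Could not find container for entity id"
# on Docker 1.9.x. Assumption (not from the report): stop the master and the
# Docker daemon before moving the link database aside.

LINKGRAPH_DB=/var/lib/docker/linkgraph.db

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reset_linkgraph() {
    run systemctl stop atomic-openshift-master
    run systemctl stop docker
    # Keep a backup copy instead of deleting the file outright.
    run mv "$LINKGRAPH_DB" "${LINKGRAPH_DB}.bak"
    run systemctl start docker
    run systemctl start atomic-openshift-master
}
```

On an affected host, run `reset_linkgraph` as root; `DRY_RUN=1 reset_linkgraph` only prints the commands it would run. Per the last comment, upgrading to Docker 1.10.2 removes the need for this.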

Expected results:
1. atomic-openshift-master.service restarts successfully.

Additional info:
Related upstream issues:
https://github.com/docker/docker/issues/17691
https://github.com/kubernetes/kubernetes/issues/20904

Comment 1 Jhon Honce 2016-02-23 20:31:28 UTC
Fixed in Docker 1.10.2