Bug 1310576 - Sometimes can't start atomic-openshift-master.service in container install env for docker 1.9
Summary: Sometimes can't start atomic-openshift-master.service in container install env...
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 3.2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Jhon Honce
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-02-22 09:51 UTC by DeShuai Ma
Modified: 2016-02-23 20:31 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-02-23 20:31:28 UTC
Target Upstream Version:


Description DeShuai Ma 2016-02-22 09:51:22 UTC
Description of problem:
In a containerized install environment with Docker 1.9, restarting atomic-openshift-master.service sometimes fails with the error "Could not find container for entity id <id>". Once the failure occurs, every subsequent restart attempt fails with the same error.

Version-Release number of selected component (if applicable):
openshift v3.1.1.904
kubernetes v1.2.0-alpha.7-703-gbc4550d
etcd 2.2.5
docker version: 1.9.1

How reproducible:
Sometimes (intermittent)

Steps to Reproduce:
1. Restart atomic-openshift-master.service:
$ systemctl restart atomic-openshift-master

Error logs:
Feb 22 17:10:34 openshift-135.lab.sjc.redhat.com systemd[1]: Failed to start atomic-openshift-master.service.
Feb 22 17:10:34 openshift-135.lab.sjc.redhat.com systemd[1]: Unit atomic-openshift-master.service entered failed state.
Feb 22 17:10:34 openshift-135.lab.sjc.redhat.com systemd[1]: atomic-openshift-master.service failed.
Feb 22 17:10:34 openshift-135.lab.sjc.redhat.com systemd[1]: atomic-openshift-master.service holdoff time over, scheduling restart.
Feb 22 17:10:34 openshift-135.lab.sjc.redhat.com systemd[1]: Starting atomic-openshift-master.service...
Feb 22 17:10:34 openshift-135.lab.sjc.redhat.com docker[16971]: Error response from daemon: no such id: atomic-openshift-master
Feb 22 17:10:34 openshift-135.lab.sjc.redhat.com docker[16971]: Error: failed to remove containers: [atomic-openshift-master]
Feb 22 17:10:34 openshift-135.lab.sjc.redhat.com docker[16976]: Error response from daemon: Could not find container for entity id 9c5f29934c463c4cd51b440b1c35cc3dcc276e06df3396369eabbbeebcc07c56
Feb 22 17:10:34 openshift-135.lab.sjc.redhat.com systemd[1]: atomic-openshift-master.service: main process exited, code=exited, status=1/FAILURE

Actual results:
1. The restart fails with the errors shown above. When this occurs, removing "/var/lib/docker/linkgraph.db" and then restarting the master succeeds.
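
The workaround above can be sketched as a small script. This is only an illustration of the two steps described here, not a supported fix: the DRY_RUN and LINKGRAPH_DB variables and the function names are assumptions added for the example, and the script prints the commands by default instead of running them.

```shell
#!/bin/sh
# Sketch of the reporter's workaround: remove Docker's link graph database,
# then restart the master unit. DRY_RUN and LINKGRAPH_DB are hypothetical
# knobs added for this example; they are not part of the original report.

LINKGRAPH_DB="${LINKGRAPH_DB:-/var/lib/docker/linkgraph.db}"
UNIT="atomic-openshift-master"

run() {
    # In dry-run mode (the default), print the command instead of executing it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

workaround() {
    # Step 1: remove the link graph database named in the report.
    run rm -f -- "$LINKGRAPH_DB"
    # Step 2: restart the master service, as in the reproduction step.
    run systemctl restart "$UNIT"
}

workaround
```

Running the script as root with DRY_RUN=0 would execute both commands. Whether the Docker daemon also needs to be restarted is not stated in this report, so that step is left out.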

Expected results:
1. Restarting atomic-openshift-master.service succeeds.

Additional info:
Related upstream issues:
https://github.com/docker/docker/issues/17691
https://github.com/kubernetes/kubernetes/issues/20904

Comment 1 Jhon Honce 2016-02-23 20:31:28 UTC
Fixed in Docker 1.10.2
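
A host can be checked against that version with a short script. This is a minimal sketch: the version_ge helper is a hypothetical name, the comparison relies on GNU sort -V, and the version is assumed to be passed as the first argument (for example from `docker version --format '{{.Server.Version}}'`). It defaults to the 1.9.1 version listed in the description.

```shell
#!/bin/sh
# Succeeds when version $1 is greater than or equal to version $2.
# Relies on GNU sort -V, because a plain string comparison would rank
# 1.9.1 above 1.10.2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: the caller passes the daemon version as the first argument.
installed="${1:-1.9.1}"
fixed="1.10.2"

if version_ge "$installed" "$fixed"; then
    echo "Docker $installed: at or above $fixed"
else
    echo "Docker $installed: older than $fixed"
fi
```

Vendor builds may append suffixes to the version string, so the output is only a hint.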

