Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1364176

Summary: Pod stuck in terminating state
Product: OpenShift Container Platform
Reporter: Vikas Laad <vlaad>
Component: Node
Assignee: Derek Carr <decarr>
Status: CLOSED DUPLICATE
QA Contact: DeShuai Ma <dma>
Severity: medium
Priority: medium
Version: 3.3.0
CC: agoldste, aos-bugs, jokerman, mmccomas, vlaad
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2016-08-12 15:25:17 UTC
Type: Bug
Attachments:
docker logs (flags: none)

Description Vikas Laad 2016-08-04 15:51:23 UTC
Created attachment 1187572 [details]
docker logs

Description of problem:
Ran a Master Vertical test on a 300-node cluster with 1000 projects. While deleting the projects, one of the pods got stuck in the Terminating state. Docker is not responding on the node where the pod was created.

root@300node-support-2: ~ # oc get pods --all-namespaces
NAMESPACE           NAME                          READY     STATUS        RESTARTS   AGE
clusterproject266   deploymentconfig2v0-1-9us8s   1/1       Terminating   0          21h
default             docker-registry-1-340k3       1/1       Running       3          6d
default             docker-registry-1-v65wn       1/1       Running       2          6d
default             router-1-pd1vs                1/1       Running       3          6d
default             router-1-ug67u                1/1       Running       2          6d

root@300node-support-2: ~ # oc get projects 
NAME                DISPLAY NAME   STATUS
default                            Active
kube-system                        Active
logging                            Active
management-infra                   Active
openshift                          Active
openshift-infra                    Active
test                               Active
clusterproject266                  Terminating

Errors on the node where the pod was initially created:
Aug  4 10:48:17 300node-node-263 kernel: device-mapper: thin: 253:2: unable to service pool target messages in READ_ONLY or FAIL mode
Aug  4 10:48:17 300node-node-263 atomic-openshift-node: E0804 10:48:17.930969    2402 thin_pool_watcher.go:72] encountered error refreshing thin pool watcher: error reserving metadata for thin-pool docker_vg-docker--pool: exit status 1 output:
Aug  4 10:48:32 300node-node-263 atomic-openshift-node: I0804 10:48:32.946495    2402 thin_pool_watcher.go:126] reserving metadata snapshot for thin-pool docker_vg-docker--pool
Aug  4 10:48:32 300node-node-263 kernel: device-mapper: thin: 253:2: unable to service pool target messages in READ_ONLY or FAIL mode
Aug  4 10:48:32 300node-node-263 atomic-openshift-node: E0804 10:48:32.955184    2402 thin_pool_watcher.go:72] encountered error refreshing thin pool watcher: error reserving metadata for thin-pool docker_vg-docker--pool: exit status 1 output:
Aug  4 10:48:36 300node-node-263 atomic-openshift-node: I0804 10:48:36.177548    2402 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF
Aug  4 10:48:45 300node-node-263 atomic-openshift-node: I0804 10:48:45.629315    2402 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF
Aug  4 10:48:47 300node-node-263 atomic-openshift-node: I0804 10:48:47.966770    2402 thin_pool_watcher.go:126] reserving metadata snapshot for thin-pool docker_vg-docker--pool
Aug  4 10:48:47 300node-node-263 kernel: device-mapper: thin: 253:2: unable to service pool target messages in READ_ONLY or FAIL mode
Aug  4 10:48:47 300node-node-263 atomic-openshift-node: E0804 10:48:47.975480    2402 thin_pool_watcher.go:72] encountered error refreshing thin pool watcher: error reserving metadata for thin-pool docker_vg-docker--pool: exit status 1 output:
Aug  4 10:49:02 300node-node-263 atomic-openshift-node: I0804 10:49:02.987081    2402 thin_pool_watcher.go:126] reserving metadata snapshot for thin-pool docker_vg-docker--pool
Aug  4 10:49:02 300node-node-263 kernel: device-mapper: thin: 253:2: unable to service pool target messages in READ_ONLY or FAIL mode
Aug  4 10:49:02 300node-node-263 atomic-openshift-node: E0804 10:49:02.996018    2402 thin_pool_watcher.go:72] encountered error refreshing thin pool watcher: error reserving metadata for thin-pool docker_vg-docker--pool: exit status 1 output:
Aug  4 10:49:04 300node-node-263 atomic-openshift-node: E0804 10:49:04.097705    2402 kubelet.go:2264] Error listing containers: dockertools.operationTimeout{err:(*errors.errorString)(0xc82000e7a0)}
Aug  4 10:49:04 300node-node-263 atomic-openshift-node: E0804 10:49:04.097733    2402 kubelet.go:2611] Failed cleaning pods: operation timeout: context deadline exceeded
Aug  4 10:49:08 300node-node-263 atomic-openshift-node: E0804 10:49:08.308619    2402 generic.go:197] GenericPLEG: Unable to retrieve pods: operation timeout: context deadline exceeded
Aug  4 10:49:18 300node-node-263 atomic-openshift-node: I0804 10:49:18.007706    2402 thin_pool_watcher.go:126] reserving metadata snapshot for thin-pool docker_vg-docker--pool
Aug  4 10:49:18 300node-node-263 kernel: device-mapper: thin: 253:2: unable to service pool target messages in READ_ONLY or FAIL mode
Aug  4 10:49:18 300node-node-263 atomic-openshift-node: E0804 10:49:18.016503    2402 thin_pool_watcher.go:72] encountered error refreshing thin pool watcher: error reserving metadata for thin-pool docker_vg-docker--pool: exit status 1 output:
Aug  4 10:49:19 300node-node-263 atomic-openshift-node: E0804 10:49:19.603278    2402 desired_state_of_world_populator.go:162] kubeContainerRuntime.findAndRemoveDeletedPods returned error operation timeout: context deadline exceeded.
Aug  4 10:49:33 300node-node-263 atomic-openshift-node: I0804 10:49:33.027995    2402 thin_pool_watcher.go:126] reserving metadata snapshot for thin-pool docker_vg-docker--pool
Aug  4 10:49:33 300node-node-263 kernel: device-mapper: thin: 253:2: unable to service pool target messages in READ_ONLY or FAIL mode
Aug  4 10:49:33 300node-node-263 atomic-openshift-node: E0804 10:49:33.036797    2402 thin_pool_watcher.go:72] encountered error refreshing thin pool watcher: error reserving metadata for thin-pool docker_vg-docker--pool: exit status 1 output:
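The kernel messages above say the thin pool is in READ_ONLY or FAIL mode. A minimal sketch of how that state can be spotted from `dmsetup status` output (the status line below is a fabricated sample standing in for the real `docker_vg-docker--pool` device; on the affected node you would feed in the actual command output):

```shell
# Fabricated sample of a `dmsetup status` thin-pool line; the "ro" flag marks
# the pool read-only, matching the READ_ONLY mode in the kernel log above.
status_line='docker_vg-docker--pool: 0 16777216 thin-pool 5 1262/524288 6023/131072 - ro discard_passdown queue_if_no_space'

# A healthy pool reports "rw"; "ro" or "Fail" means metadata operations
# (including the snapshot reservation the node logs keep retrying) will fail.
case "$status_line" in
  *" ro "*|*Fail*) echo "thin pool is read-only or failed" ;;
  *)               echo "thin pool is read-write" ;;
esac
```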

Version-Release number of selected component (if applicable):
openshift v3.3.0.10
kubernetes v1.3.0+57fb9ac
etcd 2.3.0+git

How reproducible:
Difficult to reproduce

Steps to Reproduce:
1. Create 1000 projects using cluster loader for Master Vertical testing
2. Delete projects
3. Project/Pod stuck in terminating state
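Step 2 above amounts to bulk-deleting the load-test projects. A hypothetical sketch, assuming the clusterprojectNNN naming pattern seen in the listing (the commands are echoed so the loop runs anywhere; drop the echo to issue real deletions against a cluster, and COUNT would be 1000 in the actual test):

```shell
# Hypothetical bulk-delete loop for the Master Vertical cleanup phase.
COUNT=3
for i in $(seq 1 "$COUNT"); do
  echo oc delete project "clusterproject$i"
done
```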

Actual results:


Expected results:


Additional info:

Comment 1 Jhon Honce 2016-08-04 15:54:24 UTC
Which docker rpm is installed on this system?

Comment 2 Vikas Laad 2016-08-04 16:36:15 UTC
docker.x86_64 1.10.3-46.el7.8

Comment 3 Andy Goldstein 2016-08-04 20:05:39 UTC
Moving to kubernetes because I doubt it's a docker issue (other than the fact that https://bugzilla.redhat.com/show_bug.cgi?id=1362109 corrupted the thin pool and Docker can't start).

Comment 4 Derek Carr 2016-08-12 15:25:17 UTC
Given the corrupted LVM, the kubelet would have no way of acknowledging the container as dead unless the docker daemon itself was restarted (which it could not be, because of the LVM corruption). I am marking this as a duplicate of the original issue: 1362109
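Until docker is healthy again, the stuck pod can at least be identified mechanically from the STATUS column. A minimal sketch, reusing the listing from the description as sample input (on a live cluster you would pipe `oc get pods --all-namespaces --no-headers` instead of the here-variable):

```shell
# Sample rows copied from the `oc get pods` output in the description.
pods='clusterproject266   deploymentconfig2v0-1-9us8s   1/1   Terminating   0   21h
default   docker-registry-1-340k3   1/1   Running   3   6d'

# Column 4 is STATUS in `oc get pods --all-namespaces` output; print
# namespace/name for every pod stuck in Terminating.
printf '%s\n' "$pods" | awk '$4 == "Terminating" { print $1 "/" $2 }'
```

Once the daemon can be restarted, such pods should then be reaped normally (or, failing that, deleted with `--grace-period=0`).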

*** This bug has been marked as a duplicate of bug 1362109 ***