Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1364176

Summary: Pod stuck in terminating state
Product: OpenShift Container Platform
Reporter: Vikas Laad <vlaad>
Component: Node
Assignee: Derek Carr <decarr>
Status: CLOSED DUPLICATE
QA Contact: DeShuai Ma <dma>
Severity: medium
Priority: medium
Version: 3.3.0
CC: agoldste, aos-bugs, jokerman, mmccomas, vlaad
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2016-08-12 15:25:17 UTC
Type: Bug
Attachments:
docker logs (flags: none)

Description Vikas Laad 2016-08-04 15:51:23 UTC
Created attachment 1187572 [details]
docker logs

Description of problem:
Ran a Master Vertical test on a 300-node cluster with 1000 projects. While deleting the projects, one of the pods got stuck in the Terminating state. Docker is not responding on the node where the pod was created.

root@300node-support-2: ~ # oc get pods --all-namespaces
NAMESPACE           NAME                          READY     STATUS        RESTARTS   AGE
clusterproject266   deploymentconfig2v0-1-9us8s   1/1       Terminating   0          21h
default             docker-registry-1-340k3       1/1       Running       3          6d
default             docker-registry-1-v65wn       1/1       Running       2          6d
default             router-1-pd1vs                1/1       Running       3          6d
default             router-1-ug67u                1/1       Running       2          6d

root@300node-support-2: ~ # oc get projects 
NAME                DISPLAY NAME   STATUS
default                            Active
kube-system                        Active
logging                            Active
management-infra                   Active
openshift                          Active
openshift-infra                    Active
test                               Active
clusterproject266                  Terminating

Errors on the node where the pod was initially created:
Aug  4 10:48:17 300node-node-263 kernel: device-mapper: thin: 253:2: unable to service pool target messages in READ_ONLY or FAIL mode
Aug  4 10:48:17 300node-node-263 atomic-openshift-node: E0804 10:48:17.930969    2402 thin_pool_watcher.go:72] encountered error refreshing thin pool watcher: error reserving metadata for thin-pool docker_vg-docker--pool: exit status 1 output:
Aug  4 10:48:32 300node-node-263 atomic-openshift-node: I0804 10:48:32.946495    2402 thin_pool_watcher.go:126] reserving metadata snapshot for thin-pool docker_vg-docker--pool
Aug  4 10:48:32 300node-node-263 kernel: device-mapper: thin: 253:2: unable to service pool target messages in READ_ONLY or FAIL mode
Aug  4 10:48:32 300node-node-263 atomic-openshift-node: E0804 10:48:32.955184    2402 thin_pool_watcher.go:72] encountered error refreshing thin pool watcher: error reserving metadata for thin-pool docker_vg-docker--pool: exit status 1 output:
Aug  4 10:48:36 300node-node-263 atomic-openshift-node: I0804 10:48:36.177548    2402 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF
Aug  4 10:48:45 300node-node-263 atomic-openshift-node: I0804 10:48:45.629315    2402 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF
Aug  4 10:48:47 300node-node-263 atomic-openshift-node: I0804 10:48:47.966770    2402 thin_pool_watcher.go:126] reserving metadata snapshot for thin-pool docker_vg-docker--pool
Aug  4 10:48:47 300node-node-263 kernel: device-mapper: thin: 253:2: unable to service pool target messages in READ_ONLY or FAIL mode
Aug  4 10:48:47 300node-node-263 atomic-openshift-node: E0804 10:48:47.975480    2402 thin_pool_watcher.go:72] encountered error refreshing thin pool watcher: error reserving metadata for thin-pool docker_vg-docker--pool: exit status 1 output:
Aug  4 10:49:02 300node-node-263 atomic-openshift-node: I0804 10:49:02.987081    2402 thin_pool_watcher.go:126] reserving metadata snapshot for thin-pool docker_vg-docker--pool
Aug  4 10:49:02 300node-node-263 kernel: device-mapper: thin: 253:2: unable to service pool target messages in READ_ONLY or FAIL mode
Aug  4 10:49:02 300node-node-263 atomic-openshift-node: E0804 10:49:02.996018    2402 thin_pool_watcher.go:72] encountered error refreshing thin pool watcher: error reserving metadata for thin-pool docker_vg-docker--pool: exit status 1 output:
Aug  4 10:49:04 300node-node-263 atomic-openshift-node: E0804 10:49:04.097705    2402 kubelet.go:2264] Error listing containers: dockertools.operationTimeout{err:(*errors.errorString)(0xc82000e7a0)}
Aug  4 10:49:04 300node-node-263 atomic-openshift-node: E0804 10:49:04.097733    2402 kubelet.go:2611] Failed cleaning pods: operation timeout: context deadline exceeded
Aug  4 10:49:08 300node-node-263 atomic-openshift-node: E0804 10:49:08.308619    2402 generic.go:197] GenericPLEG: Unable to retrieve pods: operation timeout: context deadline exceeded
Aug  4 10:49:18 300node-node-263 atomic-openshift-node: I0804 10:49:18.007706    2402 thin_pool_watcher.go:126] reserving metadata snapshot for thin-pool docker_vg-docker--pool
Aug  4 10:49:18 300node-node-263 kernel: device-mapper: thin: 253:2: unable to service pool target messages in READ_ONLY or FAIL mode
Aug  4 10:49:18 300node-node-263 atomic-openshift-node: E0804 10:49:18.016503    2402 thin_pool_watcher.go:72] encountered error refreshing thin pool watcher: error reserving metadata for thin-pool docker_vg-docker--pool: exit status 1 output:
Aug  4 10:49:19 300node-node-263 atomic-openshift-node: E0804 10:49:19.603278    2402 desired_state_of_world_populator.go:162] kubeContainerRuntime.findAndRemoveDeletedPods returned error operation timeout: context deadline exceeded.
Aug  4 10:49:33 300node-node-263 atomic-openshift-node: I0804 10:49:33.027995    2402 thin_pool_watcher.go:126] reserving metadata snapshot for thin-pool docker_vg-docker--pool
Aug  4 10:49:33 300node-node-263 kernel: device-mapper: thin: 253:2: unable to service pool target messages in READ_ONLY or FAIL mode
Aug  4 10:49:33 300node-node-263 atomic-openshift-node: E0804 10:49:33.036797    2402 thin_pool_watcher.go:72] encountered error refreshing thin pool watcher: error reserving metadata for thin-pool docker_vg-docker--pool: exit status 1 output:
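The kernel messages above say the thin pool is in READ_ONLY or FAIL mode. A minimal sketch of how that state can be spotted from `dmsetup status` output (the status line below is a fabricated sample standing in for the real `docker_vg-docker--pool` device; on the affected node you would feed in the actual command output):

```shell
# Fabricated sample of a `dmsetup status` thin-pool line; the "ro" flag marks
# the pool read-only, matching the READ_ONLY mode in the kernel log above.
status_line='docker_vg-docker--pool: 0 16777216 thin-pool 5 1262/524288 6023/131072 - ro discard_passdown queue_if_no_space'

# A healthy pool reports "rw"; "ro" or "Fail" means metadata operations
# (including the snapshot reservation the node logs keep retrying) will fail.
case "$status_line" in
  *" ro "*|*Fail*) echo "thin pool is read-only or failed" ;;
  *)               echo "thin pool is read-write" ;;
esac
```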

Version-Release number of selected component (if applicable):
openshift v3.3.0.10
kubernetes v1.3.0+57fb9ac
etcd 2.3.0+git

How reproducible:
Difficult to reproduce

Steps to Reproduce:
1. Create 1000 projects using cluster loader for Master Vertical testing
2. Delete projects
3. Project/Pod stuck in terminating state
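Step 2 above amounts to bulk-deleting the load-test projects. A hypothetical sketch, assuming the clusterprojectNNN naming pattern seen in the listing (the commands are echoed so the loop runs anywhere; drop the echo to issue real deletions against a cluster, and COUNT would be 1000 in the actual test):

```shell
# Hypothetical bulk-delete loop for the Master Vertical cleanup phase.
COUNT=3
for i in $(seq 1 "$COUNT"); do
  echo oc delete project "clusterproject$i"
done
```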

Actual results:


Expected results:


Additional info:

Comment 1 Jhon Honce 2016-08-04 15:54:24 UTC
Which docker rpm is installed on this system?

Comment 2 Vikas Laad 2016-08-04 16:36:15 UTC
docker.x86_64 1.10.3-46.el7.8

Comment 3 Andy Goldstein 2016-08-04 20:05:39 UTC
Moving to kubernetes because I doubt it's a docker issue (other than the fact that https://bugzilla.redhat.com/show_bug.cgi?id=1362109 corrupted the thin pool and Docker can't start).

Comment 4 Derek Carr 2016-08-12 15:25:17 UTC
Given the corrupted LVM, the kubelet would have no way of acknowledging the container as dead unless the docker daemon itself was restarted (which it could not be, because of the LVM corruption). I am marking this as a duplicate of the original issue: 1362109
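Until docker is healthy again, the stuck pod can at least be identified mechanically from the STATUS column. A minimal sketch, reusing the listing from the description as sample input (on a live cluster you would pipe `oc get pods --all-namespaces --no-headers` instead of the here-variable):

```shell
# Sample rows copied from the `oc get pods` output in the description.
pods='clusterproject266   deploymentconfig2v0-1-9us8s   1/1   Terminating   0   21h
default   docker-registry-1-340k3   1/1   Running   3   6d'

# Column 4 is STATUS in `oc get pods --all-namespaces` output; print
# namespace/name for every pod stuck in Terminating.
printf '%s\n' "$pods" | awk '$4 == "Terminating" { print $1 "/" $2 }'
```

Once the daemon can be restarted, such pods should then be reaped normally (or, failing that, deleted with `--grace-period=0`).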

*** This bug has been marked as a duplicate of bug 1362109 ***