Created attachment 1187572 [details]
docker logs

Description of problem:
Ran a Master Vertical test on a 300-node cluster with 1000 projects. While deleting the projects, one of the pods got stuck in the Terminating state. Docker is not responding on the node where the pod was created.

root@300node-support-2: ~ # oc get pods --all-namespaces
NAMESPACE           NAME                          READY     STATUS        RESTARTS   AGE
clusterproject266   deploymentconfig2v0-1-9us8s   1/1       Terminating   0          21h
default             docker-registry-1-340k3       1/1       Running       3          6d
default             docker-registry-1-v65wn       1/1       Running       2          6d
default             router-1-pd1vs                1/1       Running       3          6d
default             router-1-ug67u                1/1       Running       2          6d

root@300node-support-2: ~ # oc get projects
NAME                DISPLAY NAME   STATUS
default                            Active
kube-system                        Active
logging                            Active
management-infra                   Active
openshift                          Active
openshift-infra                    Active
test                               Active
clusterproject266                  Terminating

Errors on the node where the pod was initially created:

Aug 4 10:48:17 300node-node-263 kernel: device-mapper: thin: 253:2: unable to service pool target messages in READ_ONLY or FAIL mode
Aug 4 10:48:17 300node-node-263 atomic-openshift-node: E0804 10:48:17.930969 2402 thin_pool_watcher.go:72] encountered error refreshing thin pool watcher: error reserving metadata for thin-pool docker_vg-docker--pool: exit status 1 output:
Aug 4 10:48:32 300node-node-263 atomic-openshift-node: I0804 10:48:32.946495 2402 thin_pool_watcher.go:126] reserving metadata snapshot for thin-pool docker_vg-docker--pool
Aug 4 10:48:32 300node-node-263 kernel: device-mapper: thin: 253:2: unable to service pool target messages in READ_ONLY or FAIL mode
Aug 4 10:48:32 300node-node-263 atomic-openshift-node: E0804 10:48:32.955184 2402 thin_pool_watcher.go:72] encountered error refreshing thin pool watcher: error reserving metadata for thin-pool docker_vg-docker--pool: exit status 1 output:
Aug 4 10:48:36 300node-node-263 atomic-openshift-node: I0804 10:48:36.177548 2402 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF
Aug 4 10:48:45 300node-node-263 atomic-openshift-node: I0804 10:48:45.629315 2402 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF
Aug 4 10:48:47 300node-node-263 atomic-openshift-node: I0804 10:48:47.966770 2402 thin_pool_watcher.go:126] reserving metadata snapshot for thin-pool docker_vg-docker--pool
Aug 4 10:48:47 300node-node-263 kernel: device-mapper: thin: 253:2: unable to service pool target messages in READ_ONLY or FAIL mode
Aug 4 10:48:47 300node-node-263 atomic-openshift-node: E0804 10:48:47.975480 2402 thin_pool_watcher.go:72] encountered error refreshing thin pool watcher: error reserving metadata for thin-pool docker_vg-docker--pool: exit status 1 output:
Aug 4 10:49:02 300node-node-263 atomic-openshift-node: I0804 10:49:02.987081 2402 thin_pool_watcher.go:126] reserving metadata snapshot for thin-pool docker_vg-docker--pool
Aug 4 10:49:02 300node-node-263 kernel: device-mapper: thin: 253:2: unable to service pool target messages in READ_ONLY or FAIL mode
Aug 4 10:49:02 300node-node-263 atomic-openshift-node: E0804 10:49:02.996018 2402 thin_pool_watcher.go:72] encountered error refreshing thin pool watcher: error reserving metadata for thin-pool docker_vg-docker--pool: exit status 1 output:
Aug 4 10:49:04 300node-node-263 atomic-openshift-node: E0804 10:49:04.097705 2402 kubelet.go:2264] Error listing containers: dockertools.operationTimeout{err:(*errors.errorString)(0xc82000e7a0)}
Aug 4 10:49:04 300node-node-263 atomic-openshift-node: E0804 10:49:04.097733 2402 kubelet.go:2611] Failed cleaning pods: operation timeout: context deadline exceeded
Aug 4 10:49:08 300node-node-263 atomic-openshift-node: E0804 10:49:08.308619 2402 generic.go:197] GenericPLEG: Unable to retrieve pods: operation timeout: context deadline exceeded
Aug 4 10:49:18 300node-node-263 atomic-openshift-node: I0804 10:49:18.007706 2402 thin_pool_watcher.go:126] reserving metadata snapshot for thin-pool docker_vg-docker--pool
Aug 4 10:49:18 300node-node-263 kernel: device-mapper: thin: 253:2: unable to service pool target messages in READ_ONLY or FAIL mode
Aug 4 10:49:18 300node-node-263 atomic-openshift-node: E0804 10:49:18.016503 2402 thin_pool_watcher.go:72] encountered error refreshing thin pool watcher: error reserving metadata for thin-pool docker_vg-docker--pool: exit status 1 output:
Aug 4 10:49:19 300node-node-263 atomic-openshift-node: E0804 10:49:19.603278 2402 desired_state_of_world_populator.go:162] kubeContainerRuntime.findAndRemoveDeletedPods returned error operation timeout: context deadline exceeded.
Aug 4 10:49:33 300node-node-263 atomic-openshift-node: I0804 10:49:33.027995 2402 thin_pool_watcher.go:126] reserving metadata snapshot for thin-pool docker_vg-docker--pool
Aug 4 10:49:33 300node-node-263 kernel: device-mapper: thin: 253:2: unable to service pool target messages in READ_ONLY or FAIL mode
Aug 4 10:49:33 300node-node-263 atomic-openshift-node: E0804 10:49:33.036797 2402 thin_pool_watcher.go:72] encountered error refreshing thin pool watcher: error reserving metadata for thin-pool docker_vg-docker--pool: exit status 1 output:

Version-Release number of selected component (if applicable):
openshift v3.3.0.10
kubernetes v1.3.0+57fb9ac
etcd 2.3.0+git

How reproducible:
Difficult to reproduce

Steps to Reproduce:
1. Create 1000 projects using cluster loader for Master Vertical testing
2. Delete the projects
3. A project/pod gets stuck in the Terminating state

Actual results:

Expected results:

Additional info:
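The repeated kernel message "unable to service pool target messages in READ_ONLY or FAIL mode" means the devicemapper thin pool backing Docker is no longer read-write, which is why the thin pool watcher's metadata snapshot reservation fails. A minimal diagnostic sketch, assuming the dm name docker_vg-docker--pool corresponds to LV docker-pool in VG docker_vg (the doubled dash is dm escaping); these commands are illustrative and not taken from this report:

  # Show the pool's device-mapper status line; a trailing "ro" or "Fail"
  # instead of "rw" matches the READ_ONLY/FAIL kernel message above.
  dmsetup status docker_vg-docker--pool

  # Show data/metadata usage; a full metadata device is a common reason
  # for a thin pool dropping into read-only mode.
  lvs -a -o lv_name,vg_name,data_percent,metadata_percent docker_vg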
Which docker rpm is installed on this system?
docker.x86_64 1.10.3-46.el7.8
Moving to kubernetes because I doubt it's a docker issue (other than the fact that the problem tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1362109 corrupted the thin pool, so Docker can't start).
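For completeness, a possible recovery path once Docker is stopped; this is an illustration under the assumption that the thin-pool metadata is repairable and that the pool is docker_vg/docker-pool as above, not a procedure from this bug. lvconvert --repair can discard thin volumes, so take backups first:

  # Stop the consumers of the pool.
  systemctl stop atomic-openshift-node docker

  # Deactivate the pool so lvconvert can operate on it.
  lvchange -an docker_vg/docker-pool

  # Repair the metadata (runs thin_check/thin_repair under the hood and
  # needs free space in the VG for the replacement metadata LV).
  lvconvert --repair docker_vg/docker-pool

  # Reactivate and restart the services.
  lvchange -ay docker_vg/docker-pool
  systemctl start docker atomic-openshift-node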
Given the corrupted LVM thin pool, the kubelet has no way of acknowledging the container as dead unless the docker daemon itself is restarted (which it cannot be, because of the LVM corruption). I am marking this as a duplicate of the original issue, bug 1362109.

*** This bug has been marked as a duplicate of bug 1362109 ***