Description of problem:
Some pods (using persistent volumes) are not evicted after a node goes NotReady.

Version-Release number of selected component (if applicable):
v3.11.248+

How reproducible:
Every time.

Steps to Reproduce:
1. Deploy an OCP 3.11 cluster with the Monitoring stack (Prometheus, Alertmanager) using persistent volumes (e.g. Gluster).
2. Bring a node down as in a hardware failure (loss of the atomic-openshift-node and crio services).
3. Some pods (e.g. Prometheus) are stuck being deleted forever and are never evicted.

Actual results:
The pod is marked for deletion but never terminates, and the PVC is not detached.

Expected results:
The pod terminates after its "Termination Grace Period" and releases the PVC so a new pod can start on another node.

Additional info:

$ oc get pvc -n openshift-monitoring
NAME                                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS              AGE
alertmanager-main-db-alertmanager-main-0   Bound    pvc-994475a1-4037-11eb-a07c-fa163ee92557   256Mi      RWO            glusterfs-storage-block   24d
alertmanager-main-db-alertmanager-main-1   Bound    pvc-b7c424cc-4037-11eb-a07c-fa163ee92557   256Mi      RWO            glusterfs-storage-block   24d
alertmanager-main-db-alertmanager-main-2   Bound    pvc-cdfea613-4037-11eb-a07c-fa163ee92557   256Mi      RWO            glusterfs-storage-block   24d
prometheus-k8s-db-prometheus-k8s-0         Bound    pvc-a058ce9a-4036-11eb-a07c-fa163ee92557   2Gi        RWO            glusterfs-storage-block   24d
prometheus-k8s-db-prometheus-k8s-1         Bound    pvc-f7fec9d5-4036-11eb-a07c-fa163ee92557   2Gi        RWO            glusterfs-storage-block   24d

After downing the "infra-0" node 20 minutes ago:

NAME                                           READY   STATUS     RESTARTS   AGE   IP              NODE
alertmanager-main-0                            3/3     Running    0          48m   10.131.0.17     node-0.example.com
alertmanager-main-1                            3/3     Running    0          24d   10.130.0.11     infra-1.example.com
alertmanager-main-2                            3/3     Running    0          17m   10.128.2.15     infra-2.example.com
cluster-monitoring-operator-5bf785d58c-z46ns   1/1     Running    0          24d   10.130.0.9      infra-1.example.com
grafana-65c68b58f5-fg8mq                       2/2     Running    0          17m   10.128.2.14     infra-2.example.com
grafana-65c68b58f5-w499j                       2/2     Unknown    0          24d   10.129.0.6      infra-0.example.com
kube-state-metrics-7ffb5dc8ff-989cp            3/3     Running    0          48m   10.131.0.16     node-0.example.com
node-exporter-6xfb8                            2/2     Running    0          24d   10.74.180.149   infra-1.example.com
node-exporter-7jq98                            2/2     NodeLost   0          24d   10.74.182.171   infra-0.example.com
node-exporter-c92k7                            2/2     Running    4          24d   10.74.178.75    node-0.example.com
node-exporter-ch6jg                            2/2     Running    6          24d   10.74.180.200   infra-2.example.com
node-exporter-vttsp                            2/2     Running    0          24d   10.74.183.60    master-0.example.com
prometheus-k8s-0                               4/4     Unknown    1          24d   10.129.0.7      infra-0.example.com
prometheus-k8s-1                               4/4     Running    1          24d   10.130.0.8      infra-1.example.com
prometheus-operator-687c7dd6c4-djzgl           1/1     Unknown    0          24d   10.129.0.5      infra-0.example.com
prometheus-operator-687c7dd6c4-hccsd           1/1     Running    0          17m   10.131.0.18     node-0.example.com

The master-controllers log shows the alertmanager PVC forcefully detached after maxWaitForUnmountDuration 6m0s:

I0110 23:44:48.359850       1 controller_utils.go:224] Recording status change NodeNotReady event message for node infra-0.example.com
I0110 23:44:48.359916       1 controller_utils.go:141] Update ready status of pods on node [infra-0.example.com]
I0110 23:46:18.841192       1 node_lifecycle_controller.go:651] Node is unresponsive. Adding Pods on Node infra-0.example.com to eviction queues: 2021-01-10 23:46:18.841162346 -0500 EST m=+2082764.641742551 is later than 2021-01-10 23:44:48.359817099 -0500 EST m=+2082674.160397328 + 1m30s
I0110 23:46:18.941484       1 controller_utils.go:212] Recording Deleting all Pods from Node infra-0.example.com. event message for node infra-0.example.com
W0110 23:46:19.335042       1 reconciler.go:327] Multi-Attach error for volume "pvc-cdfea613-4037-11eb-a07c-fa163ee92557" (UniqueName: "kubernetes.io/iscsi/10.74.180.200:iqn.2016-12.org.gluster-block:0345570f-eada-480d-a240-50a64ea9cf46:0") from node "infra-2.example.com" Volume is already exclusively attached to node infra-0.example.com and can't be attached to another
W0110 23:52:19.224855       1 reconciler.go:231] attacherDetacher.DetachVolume started for volume "pvc-cdfea613-4037-11eb-a07c-fa163ee92557" (UniqueName: "kubernetes.io/iscsi/10.74.180.200:iqn.2016-12.org.gluster-block:0345570f-eada-480d-a240-50a64ea9cf46:0") on node "infra-0.example.com" This volume is not safe to detach, but maxWaitForUnmountDuration 6m0s expired, force detaching
I0110 23:52:19.224945       1 operation_generator.go:419] DetachVolume.Detach succeeded for volume "pvc-cdfea613-4037-11eb-a07c-fa163ee92557" (UniqueName: "kubernetes.io/iscsi/10.74.180.200:iqn.2016-12.org.gluster-block:0345570f-eada-480d-a240-50a64ea9cf46:0") on node "infra-0.example.com"

For prometheus-k8s-0, deletion starts but never completes [master-controllers log]:

I0110 23:44:48.636533       1 controller_utils.go:158] Updating ready status of pod prometheus-k8s-0 to false
I0110 23:46:19.266664       1 controller_utils.go:95] Starting deletion of pod openshift-monitoring/prometheus-k8s-0

$ oc describe pod prometheus-k8s-0
Node:                      infra-0.example.com/10.74.182.171
Start Time:                Thu, 17 Dec 2020 02:13:56 -0500
Status:                    Terminating (lasts 4m)
Termination Grace Period:  600s
Reason:                    NodeLost
Message:                   Node infra-0.example.com which was running pod prometheus-k8s-0 is unresponsive
IP:                        10.129.0.7
Controlled By:             StatefulSet/prometheus-k8s
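
The describe output above shows the pod held in "Terminating" (deletion timestamp set) without ever being removed. As a minimal illustration only (not part of the original report), a short client-go program along these lines could enumerate pods stuck in that state; the openshift-monitoring namespace, in-cluster configuration, and the newer List signature with a context argument are assumptions here:

package main

import (
    "context"
    "fmt"
    "log"
    "time"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
)

func main() {
    // Assumes the check runs inside the cluster; a kubeconfig-based client
    // would work equally well.
    cfg, err := rest.InClusterConfig()
    if err != nil {
        log.Fatal(err)
    }
    client, err := kubernetes.NewForConfig(cfg)
    if err != nil {
        log.Fatal(err)
    }

    pods, err := client.CoreV1().Pods("openshift-monitoring").List(context.TODO(), metav1.ListOptions{})
    if err != nil {
        log.Fatal(err)
    }

    for _, p := range pods.Items {
        // A pod whose DeletionTimestamp is set but which still exists in the
        // API is stuck terminating; report how long it has been in that state.
        if p.DeletionTimestamp != nil {
            fmt.Printf("%s on %s has been terminating for %s\n",
                p.Name, p.Spec.NodeName,
                time.Since(p.DeletionTimestamp.Time).Round(time.Second))
        }
    }
}

In the failure described here, prometheus-k8s-0 stays in that list indefinitely because the node agent on infra-0 is down and can never confirm the termination.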
This has been fixed in 4.x. I talked to the engineer about backporting the patch into 3.11. The patch itself has a dependency on Golang, and 3.11 is receiving mostly security patches only. The patch is too complex to backport into 3.11, and an upgrade to 4.x is advisable.

The issue is with HTTP/2 timeouts: stale HTTP/2 sockets can be placed back into the connection pool and re-used.

*** This bug has been marked as a duplicate of bug 1901208 ***
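
For illustration of the mechanism mentioned above: in Go, stale HTTP/2 connections can be detected and dropped by enabling ping-based health checks on the HTTP/2 transport, so that an unresponsive socket is closed instead of being handed back out of the connection pool. The sketch below shows those knobs from golang.org/x/net/http2; it is not the actual 4.x patch, and the timeout values are arbitrary examples:

package main

import (
    "crypto/tls"
    "log"
    "net/http"
    "time"

    "golang.org/x/net/http2"
)

func newClient() (*http.Client, error) {
    t1 := &http.Transport{
        TLSClientConfig: &tls.Config{MinVersion: tls.VersionTLS12},
    }

    // Upgrade the HTTP/1.1 transport for HTTP/2 and obtain the underlying
    // http2.Transport so its health-check settings can be tuned.
    t2, err := http2.ConfigureTransports(t1)
    if err != nil {
        return nil, err
    }

    // If no frame arrives for ReadIdleTimeout, a PING is sent; if the PING is
    // not answered within PingTimeout, the connection is closed and removed
    // from the pool rather than being re-used while stale.
    t2.ReadIdleTimeout = 30 * time.Second
    t2.PingTimeout = 15 * time.Second

    return &http.Client{Transport: t1}, nil
}

func main() {
    client, err := newClient()
    if err != nil {
        log.Fatal(err)
    }
    _ = client // use for API traffic that must notice dead peers promptly
}

Without such a health check, a connection to a host that died abruptly (as in the hardware-failure scenario in this bug) can sit in the pool and keep being picked up until the kernel's TCP timeouts finally fire, which is what the comment above describes.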