Bug 1954509
Summary: FC volume is marked as unmounted after failed reconstruction
Product: OpenShift Container Platform
Component: Storage
Storage sub component: Kubernetes
Reporter: Juan Luis de Sousa-Valadas <jdesousa>
Assignee: Jan Safranek <jsafrane>
QA Contact: Chao Yang <chaoyang>
Status: CLOSED ERRATA
Severity: high
Priority: medium
CC: aos-bugs, chaoyang, jsafrane, piqin, rbdiri, rpalco
Version: 3.11.0
Target Release: 4.8.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Doc Text:
Cause:
FibreChannel volumes were not correctly unmounted from a node when a pod that used the volume was deleted in the API server while kubelet on the node was not running, e.g. because of a crash.
Consequence:
While the volume is still mounted on the original node, application pods may be scheduled to another node and the volume may be mounted there. Having a single volume mounted on two nodes corrupts the filesystem on the volume.
Fix:
FibreChannel volumes are unmounted when a new kubelet starts. At the same time, the volume cannot be mounted on another node until the new kubelet fully starts and confirms the volume is unmounted.
Result:
FibreChannel volumes are not corrupted.
Story Points: ---
Clones: 1970977 (view as bug list)
Last Closed: 2021-07-27 23:04:13 UTC
Type: Bug
Bug Blocks: 1970977
Description
Juan Luis de Sousa-Valadas
2021-04-28 10:04:43 UTC
The volumes are indeed unmounted (UnmountDevice) *after* a Pod is deleted; however, Kubernetes has code that prevents the volume from being attached to another node before UnmountDevice really succeeds. Check node.status.volumesAttached and node.status.volumesInUse: volumesInUse should be cleared only after a volume is really unmounted, and only after that can a RWO volume be attached somewhere else. I know FC itself does not have a 3rd-party attach service, but that is the reason we have a fake one in Kubernetes that does nothing except prevent one RWO volume from being "attached" to multiple nodes in parallel, with the error "Volume is already exclusively attached to one node and can't be attached to another".

In https://bugzilla.redhat.com/show_bug.cgi?id=1903524 we even added an extra "blockdev --flushbufs" and "echo '- - -' to /sys/block/<device>/device/delete" to flush everything after a volume is unmounted.

Is it possible that this does not work in the customer environment?

Can you please post kubelet + kube-controller-manager logs showing the volume was mounted on node A *before* it was unmounted from node B? That should not happen, and if it does, it is indeed a bug. "oc get node -o yaml" would help too, to see how the volume is added to / removed from volumesAttached / volumesInUse.

Jan,

> Is it possible that this does not work in the customer environment?

I think that's the case, and I managed to force a reproducer in a Red Hat environment. I'll ask QA for another cluster to reproduce this.

> Can you please post kubelet + kube-controller-manager logs showing the volume was mounted on node A *before* it was unmounted from node B? That should not happen, and if it does, it is indeed a bug.

Unfortunately I don't have both kubelet logs; the kube-controller-manager logs are in the sosreport, but we only have the kubelet logs from node B.

Qin, would it be possible for QA to reproduce it, or to give me a 3.11 cluster with Fibre Channel?
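The volumesAttached / volumesInUse check above can be sketched as a couple of shell helpers. This is a hedged illustration, not part of the fix: the function names are made up, and it assumes `oc` access to the cluster; the node name is a placeholder.

```shell
# List volumes a node still reports as in use. A RWO volume should only be
# attachable elsewhere once it disappears from this list.
volumes_in_use() {
  oc get node "$1" -o jsonpath='{.status.volumesInUse[*]}'
}

# List volumes the (fake) attach/detach controller considers attached.
volumes_attached() {
  oc get node "$1" -o jsonpath='{.status.volumesAttached[*].name}'
}
```

For example, `volumes_in_use hp-dl388g9-06.rhts.eng.pek2.redhat.com` should stop listing the FC volume once UnmountDevice has really succeeded on that node.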
Chao Yang got me one about 3 weeks ago when I filed the bug.

Filed issue upstream: https://github.com/kubernetes/kubernetes/issues/101791
Upstream PR: https://github.com/kubernetes/kubernetes/pull/101792

https://github.com/kubernetes/kubernetes/pull/94781/ fixes "could not get consistent content of /proc/mounts after 3 attempts" (there will be 999 attempts now) and recovery from "multipath -f" errors. Still, I don't know how to reproduce it reliably:

1. Create a pod that uses a FC volume.
2. Stop kubelet.
3. Delete the pod with force (oc delete pod XYZ --force --grace-period=0).
4. Start kubelet.
5. Hope that the FC volume plugin hits "could not get consistent content of /proc/mounts after 3 attempts".

If you're lucky, then before the patch you'll see "could not get consistent content of /proc/mounts after 3 attempts" and the volume still mounted in /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/fc/... even after ~5 minutes. After the patch it's impossible to get that unlucky (or you'll see "could not get consistent content of /proc/mounts after 999 attempts"); the volume should be unmounted, the multipath device flushed, and the device deleted.

(Instead of step 3, I can see in the logs that a pod probe failed and caused the pod to be deleted. Not sure how reliably you can reproduce that.)

This needs some serious discussion upstream. As you can see, several bad things need to happen at the same time, and when they do, I don't see a good workaround. Perhaps always drain a node before restarting kubelet, but that's hard to ensure with ansible.

This PR could be helpful as well: https://github.com/kubernetes/kubernetes/pull/101862. I noticed that "multipath -f /dev/dm-4" (= the multipath device) sometimes fails with "device is busy", even after umount. Therefore kubelet should try it several times.

I created a cherry-pick to Kubernetes 1.21: https://github.com/kubernetes/kubernetes/pull/102656

Could not install a cluster because PSI is not stable today.
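The five repro steps above can be sketched as one script. This is only a hedged sketch: `fc-pod.yaml` and the pod name are placeholders, the systemctl commands must run on the node where the pod is scheduled, and since the race is timing-dependent it may not trigger on every run.

```shell
repro_fc_unmount_race() {
  oc apply -f fc-pod.yaml                          # 1. create a pod using a FC volume
  systemctl stop kubelet                           # 2. stop kubelet on the pod's node
  oc delete pod fc-pod --force --grace-period=0    # 3. force-delete while kubelet is down
  systemctl start kubelet                          # 4. start kubelet again
  # 5. look for the reconstruction failure in the kubelet logs
  journalctl -u kubelet --no-pager \
    | grep 'could not get consistent content of /proc/mounts' || true
}
```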
1. Create pv/pvc/pod:

    /dev/sdd on /var/lib/kubelet/plugins/kubernetes.io/fc/50050768030539b6-lun-0 type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota)
    /dev/sdd on /var/lib/kubelet/pods/e402f318-d871-47d8-86a0-a0c98ee89c9b/volumes/kubernetes.io~fc/pv0001 type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota)

2. Stop the kubelet service:

    hp-dl388g9-06.rhts.eng.pek2.redhat.com   NotReady   worker   3h14m   v1.21.0-rc.0+120883f

3. oc delete pod deployment-1-6db9b5d75d-zrl7s --force --grace-period=0

4. oc describe pods:

    Events:
      Type     Reason              Age    From                     Message
      ----     ------              ----   ----                     -------
      Normal   Scheduled           3m12s  default-scheduler        Successfully assigned valadas/deployment-1-6db9b5d75d-g8shf to hp-dl388g9-03.rhts.eng.pek2.redhat.com
      Warning  FailedAttachVolume  3m13s  attachdetach-controller  Multi-Attach error for volume "pv0001" Volume is already exclusively attached to one node and can't be attached to another
      Warning  FailedMount         69s    kubelet                  Unable to attach or mount volumes: unmounted volumes=[local], unattached volumes=[kube-api-access-mbqcc local]: timed out waiting for the condition

5. Start the kubelet service.

6. The pod is running:

    oc get pods -o wide
    NAME                            READY   STATUS    RESTARTS   AGE   IP            NODE                                     NOMINATED NODE   READINESS GATES
    deployment-1-6db9b5d75d-g8shf   1/1     Running   0          30m   10.131.2.85   hp-dl388g9-03.rhts.eng.pek2.redhat.com   <none>           <none>

Kubelet logs from the original node:

    Jun 16 12:27:40 hp-dl388g9-06.rhts.eng.pek2.redhat.com hyperkube[2280]: I0616 12:27:40.867577 2280 operation_generator.go:616] MountVolume.MountDevice succeeded for volume "pv0001" (UniqueName: "kubernetes.io/fc/[50050768030539b6]:0") pod "deployment-1-6db9b5d75d-zrl7s" (UID: "e402f318-d871-47d8-86a0-a0c98ee89c9b") device mount path "/var/lib/kubelet/plugins/kubernetes.io/fc/50050768030539b6-lun-0"
    Jun 16 12:27:40 hp-dl388g9-06.rhts.eng.pek2.redhat.com hyperkube[2280]: I0616 12:27:40.874155 2280 operation_generator.go:698] MountVolume.SetUp succeeded for volume "pv0001" (UniqueName: "kubernetes.io/fc/[50050768030539b6]:0") pod "deployment-1-6db9b5d75d-zrl7s" (UID: "e402f318-d871-47d8-86a0-a0c98ee89c9b")
    Jun 16 15:13:37 hp-dl388g9-06.rhts.eng.pek2.redhat.com hyperkube[219234]: I0616 15:13:37.362108 219234 reconciler.go:429] "Reconciler sync states: could not find pod information in desired state, update it in actual state" reconstructedVolume=&{volumeName:kubernetes.io/fc/[50050768030539b6]:0 podName:e402f318-d871-47d8-86a0-a0c98ee89c9b volumeSpec:0xc000011b90 outerVolumeSpecName:pv0001 pod:0xc00148a400 volumeGidValue: devicePath: mounter:0xc0013eb260 deviceMounter:0xc000f7cd80 blockVolumeMapper:<nil>}
    Jun 16 15:13:38 hp-dl388g9-06.rhts.eng.pek2.redhat.com hyperkube[219234]: I0616 15:13:38.997411 219234 reconciler.go:196] "operationExecutor.UnmountVolume started for volume \"pv0001\" (UniqueName: \"kubernetes.io/fc/[50050768030539b6]:0\") pod \"e402f318-d871-47d8-86a0-a0c98ee89c9b\" (UID: \"e402f318-d871-47d8-86a0-a0c98ee89c9b\") "
    Jun 16 15:13:39 hp-dl388g9-06.rhts.eng.pek2.redhat.com hyperkube[219234]: W0616 15:13:39.015696 219234 mount_helper_common.go:133] Warning: "/var/lib/kubelet/pods/e402f318-d871-47d8-86a0-a0c98ee89c9b/volumes/kubernetes.io~fc/pv0001" is not a mountpoint, deleting
    Jun 16 15:13:39 hp-dl388g9-06.rhts.eng.pek2.redhat.com hyperkube[219234]: I0616 15:13:39.026348 219234 operation_generator.go:829] UnmountVolume.TearDown succeeded for volume "kubernetes.io/fc/[50050768030539b6]:0" (OuterVolumeSpecName: "pv0001") pod "e402f318-d871-47d8-86a0-a0c98ee89c9b" (UID: "e402f318-d871-47d8-86a0-a0c98ee89c9b"). InnerVolumeSpecName "pv0001". PluginName "kubernetes.io/fc", VolumeGidValue ""
    Jun 16 15:13:39 hp-dl388g9-06.rhts.eng.pek2.redhat.com hyperkube[219234]: I0616 15:13:39.104454 219234 reconciler.go:312] "operationExecutor.UnmountDevice started for volume \"pv0001\" (UniqueName: \"kubernetes.io/fc/[50050768030539b6]:0\") on node \"hp-dl388g9-06.rhts.eng.pek2.redhat.com\" "
    Jun 16 15:13:39 hp-dl388g9-06.rhts.eng.pek2.redhat.com hyperkube[219234]: I0616 15:13:39.288127 219234 operation_generator.go:930] UnmountDevice succeeded for volume "pv0001" %!(EXTRA string=UnmountDevice succeeded for volume "pv0001" (UniqueName: "kubernetes.io/fc/[50050768030539b6]:0") on node "hp-dl388g9-06.rhts.eng.pek2.redhat.com" )
    Jun 16 15:13:39 hp-dl388g9-06.rhts.eng.pek2.redhat.com hyperkube[219234]: I0616 15:13:39.352556 219234 reconciler.go:319] "Volume detached for volume \"pv0001\" (UniqueName: \"kubernetes.io/fc/[50050768030539b6]:0\") on node \"hp-dl388g9-06.rhts.eng.pek2.redhat.com\" DevicePath \"\""

Second scenario:

1. Create pv/pvc/pod.
2. touch /var/lib/kubelet/plugins/kubernetes.io/fc/50050768030539b6-lun-0/my-file
3. Make the device busy.
4. Stop the kubelet service.
5. oc delete pod --force --grace-period=0
6. Start the kubelet service.
7. After a while, the pod is running on another node.
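One way to sketch step 3 above ("make the device busy"): park a process whose working directory is inside the FC mount, so umount fails with "target is busy" until the process is killed. The mount path is the one from this comment; the helper function name is made up for illustration.

```shell
hold_mount_busy() {
  # Background a process with its cwd inside the given mount point; while it
  # lives, umount of that filesystem fails with "target is busy".
  ( cd "$1" && exec sleep infinity ) >/dev/null 2>&1 &
  echo $!   # PID of the holder; kill it to release the mount
}
```

Usage: `pid=$(hold_mount_busy /var/lib/kubelet/plugins/kubernetes.io/fc/50050768030539b6-lun-0)`, then `kill "$pid"` once the busy condition is no longer needed.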
But we can see the message below on the first node:

    mount | grep sdd
    /dev/sdd on /var/lib/kubelet/plugins/kubernetes.io/fc/50050768030539b6-lun-0 type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota)

    Jun 17 14:41:02 hp-dl388g9-06.rhts.eng.pek2.redhat.com hyperkube[233004]: I0617 14:41:02.395210 233004 reconciler.go:312] "operationExecutor.UnmountDevice started for volume \"pv0001\" (UniqueName: \"kubernetes.io/fc/[50050768030539b6]:0\") on node \"hp-dl388g9-06.rhts.eng.pek2.redhat.com\" "
    Jun 17 14:41:02 hp-dl388g9-06.rhts.eng.pek2.redhat.com hyperkube[233004]: E0617 14:41:02.406178 233004 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/fc/[50050768030539b6]:0 podName: nodeName:}" failed. No retries permitted until 2021-06-17 14:43:04.406085866 +0000 UTC m=+3799.915289059 (durationBeforeRetry 2m2s). Error: "UnmountDevice failed for volume \"pv0001\" (UniqueName: \"kubernetes.io/fc/[50050768030539b6]:0\") on node \"hp-dl388g9-06.rhts.eng.pek2.redhat.com\" : fc: failed to unmount: /var/lib/kubelet/plugins/kubernetes.io/fc/50050768030539b6-lun-0\nError: unmount failed: exit status 32\nUnmounting arguments: /var/lib/kubelet/plugins/kubernetes.io/fc/50050768030539b6-lun-0\nOutput: umount: /var/lib/kubelet/plugins/kubernetes.io/fc/50050768030539b6-lun-0: target is busy.\n"

    [root@hp-dl388g9-06 50050768030539b6-lun-0]# date
    Thu Jun 17 14:42:05 UTC 2021

@jsafrane, I can see that kubelet keeps trying to unmount the device on the first node. Is this correct?
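The retry discussed earlier in this thread (kubelet should try "multipath -f" several times because the device can stay busy briefly even after umount) can be sketched as a shell loop. This is a hedged illustration, not the shipped kubelet code: the function name, attempt count, and delay are all made up here.

```shell
flush_multipath_with_retry() {
  local dev="$1" attempts="${2:-5}" i
  for i in $(seq 1 "$attempts"); do
    # Flush buffered writes first, then try to remove the multipath map.
    blockdev --flushbufs "$dev" 2>/dev/null
    if multipath -f "$dev" 2>/dev/null; then
      echo "flushed $dev on attempt $i"
      return 0
    fi
    sleep 1   # the device may stop being busy shortly after umount
  done
  echo "$dev still busy after $attempts attempts" >&2
  return 1
}
```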
When kubelet can't unmount a volume for 6 minutes, kube-controller-manager force-detaches the volume:

    W0617 19:42:33.596490 1 reconciler.go:222] attacherDetacher.DetachVolume started for volume "pv0001" (UniqueName: "kubernetes.io/fc/[50050768030539b6]:0") on node "hp-dl388g9-03.rhts.eng.pek2.redhat.com" This volume is not safe to detach, but maxWaitForUnmountDuration 6m0s expired, force detaching
    I0617 19:42:33.596547 1 operation_generator.go:483] DetachVolume.Detach succeeded for volume "pv0001" (UniqueName: "kubernetes.io/fc/[50050768030539b6]:0") on node "hp-dl388g9-03.rhts.eng.pek2.redhat.com"

Since there is no 3rd-party attach/detach for FC (and iSCSI) volumes, the volume is marked as detached and then becomes usable on the other node. This is not related to this BZ and will happen even without a kubelet restart; just block the unmount for 6 minutes, with any volume.

What I fixed is "could not get consistent content of /proc/mounts after 3 attempts": kubelet now retries reading /proc/mounts for 1 minute. But I don't know how to reproduce it; the kernel must be seriously busy and/or constantly mounting/unmounting volumes when kubelet starts.

Verified on 4.8.0-0.nightly-2021-06-16-190035

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438