Bug 1883993
Summary: OpenShift 4.5.8: PV disk (VMDK) deleted after deleting machine
Product: OpenShift Container Platform
Reporter: hgomes
Component: Cloud Compute
Assignee: dmoiseev
Cloud Compute sub component: Other Providers
QA Contact: Milind Yadav <miyadav>
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: high
CC: ableisch, amurdaca, ansverma, aos-bugs, dmoiseev, ebrizuel, gbravi, gfontana, hekumar, jcallen, jsafrane, mkrejci, rkant, sreber, suchaudh
Version: 4.5
Keywords: Reopened
Target Milestone: ---
Flags: jcallen: needinfo-, jcallen: needinfo-
Target Release: 4.8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text: Previously, during the machine deletion process, VMDKs that were created for persistent volumes (PVs) and attached to the node could be deleted along with the machine if the kubelet was unreachable, leading to unrecoverable data loss. Now the vSphere cloud provider checks for such disks and detaches them from the VM if the kubelet is not reachable, so they can be reattached to a different node without losing the data on them.
Story Points: ---
Clone Of:
: 1884643
Environment:
Last Closed: 2021-07-27 22:33:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1884643, 1947372
Description
hgomes
2020-09-30 17:18:49 UTC
Kumar, I was wondering if the following use case could be implemented in the machine-api-operator in order to avoid accidental PV removal during the machine deletion process:

- Before removing a VM on vSphere, check all VMDKs attached to the VM.
- If any VMDK comes from a PV, the machine-api will try to detach that volume from the VM (before VM deletion).
- If the VMDKs are detached successfully, the machine-api-operator continues to remove the VM.
- If any VMDK cannot be detached for some reason, the machine-api fails to remove the machine with an intuitive message such as "Failed to delete the VM due to a failure to detach one or more VMDKs from Persistent Volumes: VMDK: <VMDK_NAME> - Error: <MSG FROM VMWARE>".

Do you think this is feasible? I would like to propose this use case for implementation. Do you know how to make a proposal for an enhancement in the machine-api-operator?

Thanks a lot! Appreciate that!

Regards,
Giovanni Fontana

Did some testing today with a 4.6 nightly (but these parts have not changed for quite some time):

1. When a pod that uses a volume attached to a shut-down node is deleted, Kubernetes waits forever for the kubelet to confirm the volume has been unmounted (to prevent data corruption).
2. When such a pod is deleted with force, "oc delete pod --force", the volume is detached after ~6 minutes and a new pod can start. So please delete pods with force and be patient (an API-level sketch of this force delete follows below).
3. "oc adm drain <node>" does not drain the nodes with force. It waits forever for such a pod to get deleted.

So, from the storage point of view (VMware volume plugin / attach-detach controller), the system works as designed and prevents data corruption. On the overall OCP level, something (MCO?) could force-delete pods on nodes that are confirmed to be shut down in the cloud - the MCO has / could have such knowledge, while Kubernetes/kube-controller-manager does not.
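For illustration only, here is a minimal client-go sketch of roughly what a force delete such as "oc delete pod <name> --grace-period=0 --force" amounts to at the API level: a delete request with a zero grace period. The kubeconfig path, namespace, and pod name below are placeholders; this is not code from the machine-api-operator or the MCO.

// force_delete_pod.go - illustrative sketch only; all names are placeholders.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from a local kubeconfig (placeholder path).
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// A zero grace period tells the API server not to wait for the kubelet
	// to confirm termination; only then can the attach/detach controller
	// eventually force-detach the volume (after its ~6 minute timeout).
	grace := int64(0)
	err = clientset.CoreV1().Pods("my-namespace").Delete(
		context.TODO(),
		"my-pod",
		metav1.DeleteOptions{GracePeriodSeconds: &grace},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println("pod force-deleted; the volume should detach after the force-detach timeout")
}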
I've been thinking about this and there is not much we can do on the storage side. The Kubernetes attach/detach controller does not know what the status of the VM in vSphere is, and it protects data from corruption, i.e. it never force-detaches volumes from nodes unless either the Node object is deleted or the Pod is force-deleted (Terminating is not enough, as the kubelet must confirm the volumes are unmounted and it is not running at that time).

There is already a bug to track documentation changes ("force delete pods before deleting VMs", #1884643).

Leaving it to the MCO team to judge whether they can initiate force-deleting pods from a node before removing it from vSphere, or force-detach volumes from it. Still, users can go to the vSphere console directly and delete VMs manually, potentially losing their data. And there is nothing we can do about that.

(In reply to Jan Safranek from comment #13)
> Leaving it to the MCO team to judge whether they can initiate force-deleting
> pods from a node before removing it from vSphere, or force-detach volumes
> from it. Still, users can go to the vSphere console directly and delete VMs
> manually, potentially losing their data. And there is nothing we can do
> about that.

The MCO wouldn't force-delete a pod, as we also want to avoid any data corruption, so until kube tells us it's gone, we still loop the drain. I understand the corruption shouldn't be a problem, but the MCO won't grow the knowledge of force-deleting a pod (in the short term), and it really sounds like both the MCO and kube are doing the right thing, but vSphere isn't.

Hello,

Did you get a chance to check the above comment made by Sudarshan?

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
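For illustration only, here is a rough govmomi-based sketch of the kind of detach-before-delete flow proposed in the first comment and described in the Doc Text above. This is not the actual cloud-provider or machine-api code: the vCenter URL, VM name, and the "kubevols" path filter used to recognize PV-backed disks are assumptions made for the example.

// detach_pv_disks_then_delete.go - illustrative sketch only, not the shipped fix.
package main

import (
	"context"
	"fmt"
	"net/url"
	"strings"

	"github.com/vmware/govmomi"
	"github.com/vmware/govmomi/find"
	"github.com/vmware/govmomi/vim25/types"
)

func main() {
	ctx := context.TODO()

	// Placeholder vCenter endpoint and credentials.
	u, _ := url.Parse("https://user:pass@vcenter.example.com/sdk")
	client, err := govmomi.NewClient(ctx, u, true)
	if err != nil {
		panic(err)
	}

	finder := find.NewFinder(client.Client, true)
	dc, err := finder.DefaultDatacenter(ctx)
	if err != nil {
		panic(err)
	}
	finder.SetDatacenter(dc)

	vm, err := finder.VirtualMachine(ctx, "ocp-worker-0") // placeholder VM name
	if err != nil {
		panic(err)
	}

	devices, err := vm.Device(ctx)
	if err != nil {
		panic(err)
	}

	// Collect disks that look like PV-backed volumes. Filtering on a
	// "kubevols" directory is an assumption about where dynamically
	// provisioned VMDKs live on the datastore.
	var pvDisks []types.BaseVirtualDevice
	for _, dev := range devices.SelectByType((*types.VirtualDisk)(nil)) {
		disk := dev.(*types.VirtualDisk)
		if backing, ok := disk.Backing.(*types.VirtualDiskFlatVer2BackingInfo); ok {
			if strings.Contains(backing.FileName, "kubevols") {
				pvDisks = append(pvDisks, disk)
			}
		}
	}

	// Detach with keepFiles=true so the VMDKs survive VM deletion and can be
	// re-attached to another node. A detach failure aborts the deletion.
	if len(pvDisks) > 0 {
		if err := vm.RemoveDevice(ctx, true, pvDisks...); err != nil {
			panic(fmt.Errorf("refusing to delete VM, detach failed: %w", err))
		}
	}

	// Only destroy the VM once the PV-backed disks are detached.
	task, err := vm.Destroy(ctx)
	if err != nil {
		panic(err)
	}
	if err := task.Wait(ctx); err != nil {
		panic(err)
	}
}

The key design point, as in the original proposal, is ordering: the VM is destroyed only after the PV-backed disks have been detached, so a detach failure stops the machine deletion instead of silently destroying the volumes.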