Bug 2024328

Summary: [oVirt / RHV] PV disks are lost when machine deleted while node is disconnected
Product: OpenShift Container Platform
Reporter: Andrew Austin <aaustin>
Component: Cloud Compute
Sub component: oVirt Provider
Assignee: Alberto <agarcial>
QA Contact: michal <mgold>
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
CC: ableisch, lsvaty, mburman, mkalinin, pelauter, plarsen
Version: 4.8
Flags: mburman: needinfo-
Target Release: 4.10.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2022-03-10 16:28:46 UTC
Type: Bug
Bug Blocks: 2028509

Description Andrew Austin 2021-11-17 20:19:00 UTC
Description of problem:

When a Machine is deleted on an OpenShift IPI cluster running on oVirt / RHV, any PV disks attached to the virtual machine are also deleted if the Node is in "Not Ready" status and unable to drain pods.

Version-Release number of selected component (if applicable):
OCP 4.8.13 on oVirt 4.4.8.5-1.el8

How reproducible:

Install an OpenShift IPI cluster on oVirt / RHV with a workload that has an oVirt CSI PV attached. If you set the VM's NIC down, or the VM otherwise becomes Not Ready, and you then delete the associated Machine object, the VM is deleted from oVirt along with all of the PV disks attached to it. I first encountered this when a worker VM was deleted by a MachineHealthCheck after being Not Ready for more than five minutes.

Steps to Reproduce:
1. Create an OpenShift IPI cluster on oVirt / RHV
2. Observe which worker VM has the image-registry pod and associated PV disk
3. Log into the RHV console and either power off or unplug the network interface of the VM
4. After the Node object in OCP shows "Not Ready," delete the associated Machine object. This may also happen automatically via a MachineHealthCheck.
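
For step 4, the Machine can be deleted with "oc delete machine <machine-name> -n openshift-machine-api", or programmatically. As a rough illustration only, the following Go sketch uses client-go's dynamic client to delete the Machine backing the Not Ready node; the kubeconfig path and machine name are placeholders.

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client config from a local kubeconfig (path is a placeholder).
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Machine objects live in the machine.openshift.io/v1beta1 API group,
	// in the openshift-machine-api namespace.
	machineGVR := schema.GroupVersionResource{
		Group:    "machine.openshift.io",
		Version:  "v1beta1",
		Resource: "machines",
	}

	// Delete the Machine that backs the Not Ready node ("worker-0" is a placeholder).
	if err := client.Resource(machineGVR).
		Namespace("openshift-machine-api").
		Delete(context.TODO(), "worker-0", metav1.DeleteOptions{}); err != nil {
		panic(err)
	}
	fmt.Println("machine deletion requested")
}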

Actual results:

The worker VM is deleted from oVirt along with all PV disks attached to it, leading to permanent data loss.

Expected results:

The PV disks are detached prior to deleting the VM so that PVs can be re-attached to another worker node.

Comment 1 Peter Lauterbach 2021-11-17 21:19:26 UTC
Since the deletion is unexpected and occurs in the default configuration, I am marking this urgent/urgent.

Comment 2 Gal Zaidman 2021-11-18 15:05:09 UTC
The issue has probably existed in our code from day one, which surprises me; how are we only hearing about it now?
Anyway, in RHV, when you delete a VM, all of the attached disks are deleted as well by default. Since we use templates to provision our VMs, we want the bootable disk to be removed with the VM but not the non-bootable disks (the PVs), so we need to detach those disks from the VM before deleting it.
The attached PR solves the issue, and we will need to backport it as far back as we can, since this bug can cause data loss.
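
For illustration, here is a minimal Go sketch of the detach-before-delete ordering described above. The oVirtClient interface and its method names are placeholders standing in for the real SDK calls, not the code from the attached PR.

package provider

import "fmt"

// DiskAttachment describes one disk attached to an oVirt VM.
type DiskAttachment struct {
	ID       string
	DiskID   string
	Bootable bool
}

// oVirtClient is a placeholder for the subset of oVirt API calls needed here.
type oVirtClient interface {
	ListDiskAttachments(vmID string) ([]DiskAttachment, error)
	DetachDisk(vmID, attachmentID string) error
	RemoveVM(vmID string) error
}

// deleteVMKeepingPVs detaches every non-bootable disk (the PV-backed disks)
// before removing the VM, so only the template's bootable OS disk is
// cascade-deleted along with the VM.
func deleteVMKeepingPVs(c oVirtClient, vmID string) error {
	attachments, err := c.ListDiskAttachments(vmID)
	if err != nil {
		return fmt.Errorf("listing disk attachments: %w", err)
	}
	for _, att := range attachments {
		if att.Bootable {
			continue // the OS disk should be removed together with the VM
		}
		if err := c.DetachDisk(vmID, att.ID); err != nil {
			return fmt.Errorf("detaching disk %s: %w", att.DiskID, err)
		}
	}
	return c.RemoveVM(vmID)
}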

Comment 4 Peter Lauterbach 2021-11-18 16:29:23 UTC
> The issue has probably existed in our code from day one, which surprises me; how are we only hearing about it now?

I've asked about the test suite that should have caught this, but the machine delete tests don't include additional storage. We'll work with the cloud team to make them more robust, which will benefit all virtualization platforms.

It's not clear whether this was previously "undefined" or ambiguous behavior. Since we found a similar issue on VMware, is it left to each virtual infrastructure provider to figure out how to handle this use case? Or is it well defined, and I just don't understand it yet?

Comment 6 Gal Zaidman 2021-11-24 07:50:38 UTC
(In reply to Peter Lauterbach from comment #4)
> > The issue has probably existed in our code from day one, which surprises
> > me; how are we only hearing about it now?
> 
> I've asked about the test suite that should have caught this, but the
> machine delete tests don't include additional storage. We'll work with the
> cloud team to make them more robust, which will benefit all virtualization
> platforms.
> 
> It's not clear whether this was previously "undefined" or ambiguous
> behavior. Since we found a similar issue on VMware, is it left to each
> virtual infrastructure provider to figure out how to handle this use case?
> Or is it well defined, and I just don't understand it yet?

We will need to dive a bit into the why to understand what is going on here.
The normal flow should be that when a node is marked for deletion it is drained of all pods, and only when the node is fully drained is it deleted.
The draining process should take care of detaching the disks before the VM is deleted: once the pod that is using the PVC is no longer on the node, the disk should be detached.
On my dev environment that was the case, but I haven't tried any edge cases like a network failure between the node and the cluster during deletion.
The PR should make sure we don't delete things that shouldn't be deleted, but we also need to understand if and when a Machine will be deleted while draining has not completed, to rule out other bugs that may be hiding behind this one.
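
One way to express such a guard, as a sketch only (this is not the code from the PR): before removing the VM, check whether the kubelet still reports oVirt CSI volumes attached to the Node. It relies on Node.Status.VolumesAttached and assumes the oVirt CSI driver name is csi.ovirt.org.

package guard

import (
	"fmt"
	"strings"

	corev1 "k8s.io/api/core/v1"
)

// ensureNoOvirtVolumesAttached returns an error while the kubelet still lists
// oVirt CSI volumes as attached to the node; in that case draining/detaching
// has not finished and deleting the VM could cascade-delete PV disks.
func ensureNoOvirtVolumesAttached(node *corev1.Node) error {
	for _, av := range node.Status.VolumesAttached {
		// Attached CSI volume names embed the driver name, e.g.
		// "kubernetes.io/csi/csi.ovirt.org^<disk-id>" (assumed format).
		if strings.Contains(string(av.Name), "csi.ovirt.org") {
			return fmt.Errorf("node %s still reports volume %s attached; refusing to remove the VM",
				node.Name, av.Name)
		}
	}
	return nil
}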

Comment 7 Peter Larsen 2021-11-24 20:41:54 UTC
(In reply to Peter Lauterbach from comment #4)
> > The issue has probably existed in our code from day one, which surprises
> > me; how are we only hearing about it now?
> 
> I've asked about the test suite that should have caught this, but the
> machine delete tests don't include additional storage. We'll work with the
> cloud team to make them more robust, which will benefit all virtualization
> platforms.
> 
> It's not clear whether this was previously "undefined" or ambiguous
> behavior. Since we found a similar issue on VMware, is it left to each
> virtual infrastructure provider to figure out how to handle this use case?
> Or is it well defined, and I just don't understand it yet?

I just opened BZ #2026179 with the same problem: a machineset scale event deleted all the data used for ODF. My suggestion is to avoid using the "cascade delete all storage" option when removing a VM. Instead, explicitly detach ALL storage from the VM before deleting it. Once the VM is deleted, look up the osDisk that was created when the VM template was instantiated and remove it. I'm not sure whether a better option would be something that looks for "hanging" disk volumes by comparing each disk volume belonging to the OCP cluster to the list of PVs; any volume that is unattached and not referenced by a PV could be deleted once all VMs are removed. This would also handle the odd template where more than one disk is used as part of the VM template.

Bottom line: we need to verify each volume attached to the VM against the existing PV definitions, and if there is a match, the disk volume cannot be deleted. HOW that check is actually done is probably not as important - the above will work in all cases, and if it fails it leaves unused disk volumes behind instead of removing too much data, which seems the safer option.
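
As an illustration of that cross-check, here is a Go sketch; it assumes the oVirt CSI driver records the oVirt disk ID in PV.Spec.CSI.VolumeHandle, and the attached-disk list would come from the oVirt API.

package cleanup

import corev1 "k8s.io/api/core/v1"

// disksSafeToDelete returns only the attached disk IDs that no PersistentVolume
// references; the caller may delete these (e.g. the boot disk) but must leave
// PV-backed disks behind, detached, when removing the VM.
func disksSafeToDelete(attachedDiskIDs []string, pvs []corev1.PersistentVolume) []string {
	referenced := make(map[string]bool)
	for _, pv := range pvs {
		if pv.Spec.CSI != nil {
			referenced[pv.Spec.CSI.VolumeHandle] = true
		}
	}
	var deletable []string
	for _, id := range attachedDiskIDs {
		if !referenced[id] {
			deletable = append(deletable, id)
		}
	}
	return deletable
}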

Comment 8 michal 2021-12-02 10:56:12 UTC
rhv: 4.4.9.5
ocp: 4.10.0-0.nightly-2021-11-29-191648


steps:

Scenario 1:
1) create a PVC
2) create a deployment -> pod was created
3) verify that the pod runs on a machine
4) delete the machine and verify that the pod moves to another machine

Scenario 2:
1) create a PVC
2) create a deployment -> pod was created
3) verify that the pod runs on a machine
4) disconnect the network from the VM the pod runs on
5) delete the VM whose network you disconnected
6) verify that the pod moves to a different machine

actual:
the pod comes up on a different machine and the PVC is not deleted
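
For the "PVC is not deleted" check, a small client-go sketch like the following could be used after the machine is removed (namespace and PVC name are placeholders):

package verify

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// pvcSurvived checks that the PVC still exists and is still Bound after the
// machine was deleted, i.e. the backing oVirt disk was not cascade-deleted.
func pvcSurvived(ctx context.Context, cs kubernetes.Interface, namespace, name string) error {
	pvc, err := cs.CoreV1().PersistentVolumeClaims(namespace).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return fmt.Errorf("getting PVC %s/%s: %w", namespace, name, err)
	}
	if pvc.Status.Phase != corev1.ClaimBound {
		return fmt.Errorf("PVC %s/%s is in phase %s, expected Bound", namespace, name, pvc.Status.Phase)
	}
	return nil
}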

Comment 9 Peter Lauterbach 2021-12-02 13:09:58 UTC
How far back into the OCP z-stream will this be backported? I do not see any linked BZ. Definitely both OCP 4.9 and OCP 4.8 are needed.

Comment 18 errata-xmlrpc 2022-03-10 16:28:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056