Bug 1983567 - Disk is missing after importing VM from Storage Domain that was detached from another DC.
Summary: Disk is missing after importing VM from Storage Domain that was detached from...
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.4.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.5.3
Assignee: Pavel Bar
QA Contact: Ilia Markelov
URL:
Whiteboard:
Depends On:
Blocks: 1541529
 
Reported: 2021-07-19 06:36 UTC by Germano Veit Michel
Modified: 2024-10-01 19:01 UTC
CC: 9 users

Fixed In Version: ovirt-engine-4.5.3.1
Doc Type: Bug Fix
Doc Text:
There may be stale data in some DB tables, resulting in missing disks after importing a VM (after a Storage Domain was imported from a source RHV environment to a destination RHV environment and the VM was then imported). Bug fixes BZ#1910858 and BZ#1705338 solved similar issues, and since this bug is hard to reproduce, it may have been resolved by those two fixes. In this release, the VM is imported with all of its attached disks.
Clone Of:
Environment:
Last Closed: 2022-10-11 12:00:51 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:




Links
Github oVirt ovirt-engine pull 681 (Merged): Fix SD detach flow ("unregistered_ovf_of_entities" DB table), last updated 2022-09-27 14:14:02 UTC

Description Germano Veit Michel 2021-07-19 06:36:11 UTC
Description of problem:

This is about a Storage Domain being detached from the RHV SRC environment and attached to the RHV DST environment. A VM had 2 disks on the SRC RHV, but upon being imported at the destination RHV it had only 1 disk attached to it. The other disk was nowhere in the database (not attached to the VM and not unregistered), but was still sitting on the Storage Domain.

Details will follow in the next comment.


Version-Release number of selected component (if applicable):
rhvm-4.4.4.7-0.2.el8ev.noarch

How reproducible:
* 0%, tried several ways, including going back and forth multiple times between the RHV environments, using VirtIO for one disk and VirtIO-SCSI for the other, basing the VMs on a template, etc.

Comment 4 Germano Veit Michel 2021-07-19 06:41:10 UTC
So basically the disk was on the Storage Domain, was seen by the engine and added to the unregistered entities on SD import, but upon the VM being imported the disk is gone: not attached to the VM, not in base_disks and not unregistered. Yet the image was still on the Storage Domain, just no longer referenced in the DB.
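
For reference, this is roughly the kind of check that exposes the inconsistent state. A rough Python sketch only: table and column names are from memory of the 4.4 engine schema and may differ between versions, and the connection details and disk UUID are placeholders.

#!/usr/bin/env python3
# Rough diagnostic sketch (not verified): given the image group UUID of the
# disk as seen on the storage domain, report whether it is referenced anywhere
# in the engine DB after the VM import. Table/column names are from memory of
# the 4.4 schema and may differ between versions.
import psycopg2

DISK_ID = "00000000-0000-0000-0000-000000000000"  # placeholder image group UUID

conn = psycopg2.connect(dbname="engine", user="engine",
                        host="localhost", password="...")  # adjust credentials
cur = conn.cursor()

checks = {
    "base_disks":         "SELECT 1 FROM base_disks WHERE disk_id = %s",
    "images":             "SELECT 1 FROM images WHERE image_group_id = %s",
    "vm_device":          "SELECT 1 FROM vm_device WHERE device_id = %s",
    "unregistered_disks": "SELECT 1 FROM unregistered_disks WHERE disk_id = %s",
}

for table, query in checks.items():
    cur.execute(query, (DISK_ID,))
    print(f"{table}: {'present' if cur.fetchone() else 'MISSING'}")

conn.close()

In the case reported here the disk was absent from all of these tables even though the image still existed on the Storage Domain.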

I've asked the customer to collect additional data (a DB dump before the VM import) and to manually collect the OVFs the next time they perform this activity, in case it reproduces again.

Any ideas on what could have happened? Maybe a code review could reveal such a possibility?

Comment 6 Arik 2021-07-19 12:21:47 UTC
(In reply to Germano Veit Michel from comment #4)
> Any ideas on what could have happened? Maybe a code review could reveal
> such a possibility?

If the VM started with those disks, it rules out that it's the same as bz 1970792
Maybe we didn't get to updating the OVFSTORE after the VM was set with the disk and before the storage domain was detached - was the VM created with that disk, or was it plugged into the VM later?

Comment 7 Arik 2021-07-19 12:40:06 UTC
(In reply to Arik from comment #6)
> (In reply to Germano Veit Michel from comment #4)
> > Any ideas on what could have happened? Maybe a code review could reveal
> > such a possibility?
> 
> If the VM started with those disks, it rules out that it's the same as bz
> 1970792
> Maybe we didn't get to updating the OVFSTORE after the VM was set with the
> disk and before the storage domain was detached - was the VM created with
> that disk, or was it plugged into the VM later?

Actually, I see now that the VM was started with both disks a few days before the storage domain was detached, and that there's an OVF update when detaching the storage domain that I was not aware of.
So the above is probably irrelevant.

Comment 8 Eyal Shenitzky 2021-07-19 14:16:01 UTC
Germano,

Can you provide more details on the environment? Is this the only case where they encountered it?

Comment 9 Germano Veit Michel 2021-07-21 00:57:07 UTC
(In reply to Arik from comment #6)
> (In reply to Germano Veit Michel from comment #4)
> > Any ideas on what could have happened? Maybe a code review could reveal
> > such a possibility?
> 
> If the VM started with those disks, it rules out that it's the same as bz
> 1970792
> Maybe we didn't get to updating the OVFSTORE after the VM was set with the
> disk and before the storage domain was detached - was the VM created with
> that disk, or was it plugged into the VM later?

It cannot be that, right? That is expected when the SD is in maintenance, so a VM edit won't update the OVFs on that particular SD.

Here in this bug all of the VM's disks are on a single SD, and the SD was put into maintenance and the OVFs updated after the VM was shut down,
when it already had its disks. The OVFs were updated successfully by the engine at SD maintenance, and no VM config change was made afterwards.

I tried the hotplug theory to attempt to reproduce, but the OVF was updated fine.

Comment 10 Germano Veit Michel 2021-07-21 00:59:23 UTC
Sorry Arik, I've replied to your needinfo from comment #6 without reading comment #7. All good, I agree.

Comment 12 Germano Veit Michel 2021-07-21 01:06:19 UTC
I was thinking a bit more about this...

a) If the missing disk was not included in the OVF, the disk would have been present in unregistered_disks and available for loose disk import; it's not.
b) If the missing disk was present in the OVF, the disk would have been imported (into base_disks) and attached to the VM during the VM import; it's not either.

It feels like it was referenced in the OVF and removed from the unregistered entities, otherwise it would have been left untouched? But in the end it was not actually imported and attached. Maybe a code review of the ImportVm flow could give some ideas on how this could happen?
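
To make the expected outcome from (a)/(b) explicit, a minimal illustration of the invariant that seems to be violated here (illustration only, not engine code):

# Minimal sketch of the reasoning above: after a VM import, a disk image that
# exists on the storage domain should either still be in unregistered_disks
# (it was not referenced by the OVF) or have been created in base_disks and
# attached to the VM (it was referenced by the OVF).

def classify(in_unregistered_disks: bool, imported_and_attached: bool) -> str:
    if in_unregistered_disks and not imported_and_attached:
        return "case (a): not in the OVF, still available for loose disk import"
    if imported_and_attached and not in_unregistered_disks:
        return "case (b): in the OVF, imported and attached with the VM"
    if not in_unregistered_disks and not imported_and_attached:
        return "this bug: image on the SD but unknown to the engine DB"
    return "unexpected: both unregistered and imported"

# The state reported in this bug:
print(classify(in_unregistered_disks=False, imported_and_attached=False))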

The customer has been given instructions to capture the DB before the import, and also a copy of the OVF disks on the SD before attaching it. In case it happens again we should have more data to narrow this down further.
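
For the OVF part of that plan, something along these lines could be used to inspect a copy of an OVF_STORE volume (its content is a tar archive of per-VM .ovf files). The path and disk UUID below are placeholders; a copy of the volume has to be taken first.

#!/usr/bin/env python3
# Hypothetical helper: list the VM configurations stored in a copied OVF_STORE
# volume and report whether a given disk UUID is referenced by any of them.
# OVF_STORE_COPY and DISK_ID are placeholders.
import tarfile

OVF_STORE_COPY = "/tmp/ovf_store_copy.img"
DISK_ID = "00000000-0000-0000-0000-000000000000"

with tarfile.open(OVF_STORE_COPY) as tar:
    for member in tar.getmembers():
        if not member.name.endswith(".ovf"):
            continue
        data = tar.extractfile(member).read().decode("utf-8", errors="replace")
        verdict = "references" if DISK_ID in data else "does not reference"
        print(f"{member.name}: {verdict} disk {DISK_ID}")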

Comment 13 Fulvio Carrus 2021-09-20 13:20:17 UTC
Hi team,
I stumbled upon bz 1910858 by chance, and it could explain *everything* - if we consider that:
- CPU and RAM are kept in the same XML as the disk list
- the issues reported by the Customer also mention wrong CPU and RAM values, almost as if they "reverted" to an earlier point in time
- the Customer moves the SDs from one manager to the other regularly (every 6 months) and the VMs can vary in between
- neither manager is up to date

Could this be the same issue?
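
To illustrate the first point above: in an oVirt OVF the CPU/memory items and the disk list live in the same per-VM XML, so a stale OVF would revert all of them together. A rough, namespace-agnostic parse of a single .ovf file (the path is a placeholder and the exact element and attribute names may vary between oVirt versions):

#!/usr/bin/env python3
# Rough sketch: dump the CPU item, memory item and disk entries from one .ovf
# file. Standard OVF resource types are used (3 = processor, 4 = memory);
# element/attribute names may differ between oVirt versions.
import xml.etree.ElementTree as ET

OVF_PATH = "/tmp/example_vm.ovf"  # placeholder, e.g. extracted from OVF_STORE

def local(tag):
    return tag.rsplit("}", 1)[-1]  # strip the XML namespace, if any

root = ET.parse(OVF_PATH).getroot()

for elem in root.iter():
    name = local(elem.tag)
    if name == "Disk":
        attrs = {local(k): v for k, v in elem.attrib.items()}
        print("disk entry:", attrs)
    elif name == "Item":
        fields = {local(c.tag): (c.text or "").strip() for c in elem}
        rtype = fields.get("ResourceType")
        if rtype == "3":
            print("CPU item:", {k: v for k, v in fields.items() if v})
        elif rtype == "4":
            print("memory item:", {k: v for k, v in fields.items() if v})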

Comment 14 Germano Veit Michel 2021-09-20 21:21:15 UTC
(In reply to Fulvio Carrus from comment #13)
> Hi team,
> I stumbled upon bz 1910858 by chance, and it could explain *everything* - if
> we consider that:
> - CPU and RAM are kept in the same XML as the disk list
> - the issues reported by the Customer also mention wrong CPU and RAM values,
> almost as if they "reverted" to an earlier point in time
> - the Customer moves the SDs from one manager to the other regularly (every
> 6 months) and the VMs can vary in between
> - neither manager is up to date
> 
> Could this be the same issue?

Assuming the root-cause option that the OVF is not updated properly even though the SRC RHV-M logs a successful OVF update on the SD, possibly yes.
The previously given action plan, where we capture the OVF in between, would help clarify this too.

Comment 17 Arik 2022-05-11 09:20:40 UTC
Yeah, I also think that it might be a result of having stale data in vm_ovf_generation.
Not sure if the fix for bz 1910858 would resolve it, but I believe that a proper fix for bz 1705338 would do it.
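
If it is stale data there, a check along these lines could surface it. Hedged sketch only: the columns vm_ovf_generations.ovf_generation and vm_static.db_generation are assumed from memory of the engine schema and may need adjusting.

#!/usr/bin/env python3
# Hedged sketch: list VMs whose OVF generation recorded for a storage domain
# lags behind the VM's current configuration generation. Column names are
# assumed and may differ between engine versions.
import psycopg2

conn = psycopg2.connect(dbname="engine", user="engine",
                        host="localhost", password="...")  # adjust credentials
cur = conn.cursor()
cur.execute("""
    SELECT s.vm_name, g.storage_domain_id, g.ovf_generation, s.db_generation
      FROM vm_ovf_generations g
      JOIN vm_static s ON s.vm_guid = g.vm_guid
     WHERE g.ovf_generation < s.db_generation
""")
for vm_name, sd_id, ovf_gen, db_gen in cur.fetchall():
    print(f"possibly stale OVF for {vm_name} on SD {sd_id}: {ovf_gen} < {db_gen}")
conn.close()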

Comment 18 Arik 2022-05-24 09:00:15 UTC
We discussed offline that we should check the engine log and see what is going on during import-vm.

Comment 20 Casper (RHV QE bot) 2022-09-19 15:01:13 UTC
This bug has low overall severity and is not going to be further verified by QE. If you believe special care is required, feel free to properly align the relevant severity, flags and keywords to raise the PM_Score, or use one of the Bumps ('PrioBumpField', 'PrioBumpGSS', 'PrioBumpPM', 'PrioBumpQA') in Keywords to raise its PM_Score above the verification threshold (1000).

Comment 22 Arik 2022-09-27 14:14:03 UTC
Pavel, please set the doc text

Comment 26 Casper (RHV QE bot) 2022-10-11 12:00:51 UTC
This bug has low overall severity and passed an automated regression suite, and is not going to be further verified by QE. If you believe special care is required, feel free to re-open to ON_QA status.

