Bug 1978672 - VMs with block based storage do not recover from hibernation (suspend)
Summary: VMs with block based storage do not recover from hibernation (suspend)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.4.9
Assignee: Liran Rotenberg
QA Contact: Tamir
URL:
Whiteboard:
Depends On:
Blocks: 1417161
 
Reported: 2021-07-02 13:09 UTC by Frank DeLorey
Modified: 2022-12-07 15:26 UTC
CC: 14 users

Fixed In Version: vdsm-4.40.90.2
Doc Type: Bug Fix
Doc Text:
Previously, virtual machines failed to restore from hibernation when the memory dump resided on block based storage. In the current release, the memory dump is written as raw data, allowing the virtual machine restore to succeed.
Clone Of:
Environment:
Last Closed: 2021-11-16 15:12:47 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:




Links
Red Hat Knowledge Base (Solution) 6163692 (2021-07-02 13:58:17 UTC)
Red Hat Product Errata RHBA-2021:4704 (2021-11-16 15:12:59 UTC)
oVirt gerrit 115668, master, ABANDONED: configurators: move back to lzop for memory dump (2021-09-13 15:39:02 UTC)
oVirt gerrit 116982, master, MERGED: configurators: switch to raw memory dump (2021-10-07 08:23:14 UTC)
oVirt gerrit 116996, ovirt-4.4.z, MERGED: configurators: switch to raw memory dump (2021-10-07 08:25:45 UTC)

Description Frank DeLorey 2021-07-02 13:09:58 UTC
Description of problem:
If a VM with block based storage is put into suspend mode (hibernation), it fails to resume when it is run again.

Version-Release number of selected component (if applicable):
RHV 4.4.6

How reproducible:
Every time

Steps to Reproduce:
1. Select a VM with block based storage and suspend it.
2. Select Run to bring the VM out of hibernation.
3. The VM fails to exit hibernation.

Actual results:
Fails 100% of the time to exit from suspension. The VM must be rebooted or shut down to come back up, and even then its state in the UI never reaches Up; it stays at "Powering Up" or "Rebooting".

Expected results:

Should resume without any failures

Additional info:

This works as expected with file based storage. The customer also stated that this was working in RHV 4.4.5 but stopped working after upgrading to RHV 4.4.6.

Comment 5 Arik 2021-07-05 10:56:07 UTC
Same as bz 1733804 and bz 1708031, which were supposed to be resolved by the fix for bz 1503468, but the latter was closed as WORKSFORME.
I managed to reproduce it as well

Comment 6 Arik 2021-07-05 11:10:10 UTC
It seems to work for snapshots with memory, and the handling of the memory volumes should be the same..

Comment 11 Arik 2021-07-15 06:54:39 UTC
How about creating a loopback device with --sizelimit in case the memory dump resides on a block device when restoring the memory?
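The idea above can be illustrated with a toy sketch (not VDSM code; all names here are made up for illustration): a volume on block storage is padded out beyond the written payload, so restoring a memory dump from it requires bounding the read at the exact dump size, which is the role a loopback device created with --sizelimit would play.

```python
# Toy illustration, not VDSM code: a memory dump written to a block
# volume is padded with zeros up to a volume boundary, so a restore
# must read back exactly the payload size (what --sizelimit enforces
# for a loopback device over the volume).

TOY_EXTENT = 4096  # toy "extent" size; real block volumes use much larger units


def write_to_volume(payload: bytes) -> bytes:
    """Simulate a block volume: the payload plus zero padding to a boundary."""
    padded_size = -(-len(payload) // TOY_EXTENT) * TOY_EXTENT  # round up
    return payload + b"\x00" * (padded_size - len(payload))


def restore(volume: bytes, payload_size: int) -> bytes:
    """Read back exactly payload_size bytes, ignoring the trailing padding."""
    return volume[:payload_size]


dump = b"guest memory state"
volume = write_to_volume(dump)
assert len(volume) % TOY_EXTENT == 0       # the volume is padded
assert restore(volume, len(dump)) == dump  # an exact-size read recovers the dump
```

A compressed dump stream is sensitive to this trailing padding at decompression time, whereas a raw dump read with a known size limit is not, which matches the direction of the eventual fix (switching to a raw memory dump).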

Comment 15 Tamir 2021-10-14 15:12:23 UTC
Verified on RHV 4.4.9-4. All looks good to me.

Env:
  - Engine instance with RHV 4.4.9-4 (ovirt-engine-4.4.9.1-0.13.el8ev) and RHEL 8.5 installed.
  - 3 Hosts with RHV 4.4.9-4 and RHEL 8.5, vdsm-4.40.90.2-1.el8ev, ovirt-engine-4.4.9.1-0.13.el8ev.

Steps:

In Admin Portal:

1. Create a 4.6 data center and a 4.6 cluster.
2. Install the hosts.
3. Add NFS, iSCSI and GlusterFS storage domains.
4. Create 3 RHEL 8.5 VMs, each with a bootable disk on one of those storage domains.
5. Run the VMs.
6. Suspend the VMs.
7. Run the VMs.

Results (As Expected):
1. The 4.6 data center and cluster were created.
2. The hosts were installed.
3. The NFS, iSCSI and GlusterFS storage domains were added.
4. The 3 RHEL 8.5 VMs were created.
5. The VMs ran.
6. The VMs were suspended.
7. The VMs ran successfully.

Snapshot test:

Setup: shut down the VMs.

1. Run the iSCSI VM.
2. Create a snapshot with all disks.
3. Stop the VM.
4. Add 2 more iSCSI disks.
5. Start the VM.
6. Snapshot it with all disks.
7. Stop the VM.
8. Preview the first snapshot.
9. Run the VM (with all the disks or only part of them; I checked both cases).
10. Suspend the VM.
11. Run the VM.

Results:
In steps 1, 5, 9, and 11, the VM ran correctly without any error.
In steps 3 and 7, the VM stopped.
In steps 2 and 6, the snapshot was created.
In step 4, the disks were added.
In step 8, the first snapshot was previewed.
In step 10, the VM was suspended without any errors.

Comment 16 Arik 2021-10-14 21:26:00 UTC
Tamir, only steps 10 and 11 are relevant in this context, and what we are most interested in is where the memory dump volume is stored.
We need to make sure that when suspending a VM whose memory dump volume resides on any of the three storages you have (especially iSCSI, since we had a problem on block storage before), we are able to resume from suspension properly (not only that the VM runs, but also that the memory is restored).
Can you please make sure that is covered?

Comment 17 Tamir 2021-10-19 14:54:43 UTC
Verified on RHV 4.4.9-5. Thanks for the comment Arik.

Env:
  - Engine instance with RHV 4.4.9-5 (ovirt-engine-4.4.9.2-0.6.el8ev) and RHEL 8.5 installed.
  - 3 Hosts with RHV 4.4.9-5 and RHEL 8.5, vdsm-4.40.90.2-1.el8ev, ovirt-engine-4.4.9.2-0.6.el8ev.

Steps:

In Admin Portal:

1. Create a 4.6 data center and a 4.6 cluster.
2. Install the hosts.
3. Add NFS, iSCSI and GlusterFS storage domains.
4. Create 3 RHEL 8.5 VMs, each with a bootable disk on one of those storage domains.
5. Run the VMs.
6. Create a mount dir (mkdir /mnt/ramdisk).
7. Mount ramfs onto the mount dir (mount -t ramfs -o size=20m ramfs /mnt/ramdisk).
8. Create a small file with content in the mount dir.
9. Run Firefox and open 3 different tabs.
10. Suspend the VMs.
11. Run the VMs.
12. Check that Firefox is still open with those 3 tabs.
13. Check that the file contents match those from step 8.

* The NFS and GlusterFS are tested for regression purposes.

Results (As Expected):
1. The 4.6 data center and cluster were created.
2. The hosts were installed.
3. The NFS, iSCSI and GlusterFS storage domains were added.
4. The 3 RHEL 8.5 VMs were created.
5. The VMs ran.
6. The mount dir was created.
7. The ramfs was mounted.
8. The file with content was created in the mount dir.
9. Firefox was running with 3 different tabs.
10. The VMs were suspended.
11. The VMs ran successfully.
12. Firefox was still open with those 3 tabs.
13. The file exists with the same data.

Snapshot test:

Setup: shut down the VMs.

1. Run the iSCSI VM.
2. Open Firefox with 3 different tabs.
3. Create a snapshot with all disks.
4. Add 2 more iSCSI disks.
5. Snapshot it with all disks.
6. Stop the VM.
7. Preview the first snapshot.
8. Run the VM (with all the disks or only part of them; I checked both cases).
9. Open another window of Firefox with one tab.
10. Suspend the VM.
11. Run the VM.
12. Check that all the Firefox instances are open as stated before. 

Results:
In steps 1, 8, and 11, the VM ran correctly without any error.
In step 6, the VM stopped.
In steps 3 and 5, the snapshot was created.
In step 4, the disks were added.
In step 7, the first snapshot was previewed.
In step 10, the VM was suspended without any errors.
In steps 2 and 9, a Firefox window was opened with the number of tabs specified.
In step 12, the Firefox instances were still open with the same tabs.

Comment 21 errata-xmlrpc 2021-11-16 15:12:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV RHEL Host (ovirt-host) [ovirt-4.4.9]), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4704

