Description of problem:
Suspending a VM with an NVDIMM device attached hangs forever (the VM stays in "saving" state). Eventually the VM cannot be used again and there is no reasonable workaround: any action on the VM such as power off or resume is grayed out, and the host cannot be moved to maintenance.

Version-Release number of selected component (if applicable):
ovirt-engine-4.4.3.8-0.1.el8ev.noarch
vdsm-4.40.35-1.el8ev.x86_64
libvirt-daemon-6.6.0-6.module+el8.3.0+8125+aefcf088.x86_64
qemu-kvm-5.1.0-13.module+el8.3.0+8382+afc3bbea.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Run a VM with an NVDIMM device attached.
2. Suspend the VM.

Actual results:
Suspending the VM hangs forever.

Expected results:
Suspending the VM should succeed, or be declined by WebAdmin if suspending a VM with an NVDIMM is not allowed.

Additional info:
Attached vdsm.log and engine.log (VM suspended at:
2020-11-15 14:54:22,371+02 INFO [org.ovirt.engine.core.bll.HibernateVmCommand] (EE-ManagedThreadFactory-engine-Thread-546738) [5c759d51-ca38-478c-a50b-6d33e0b94ec6] Running command: HibernateVmCommand internal: false. Entities affected : ID: a125d2eb-91fa-4399-b931-6c1aea6d9d55 Type: VMAction group HIBERNATE_VM with role type USER)
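For reference, the suspend can also be triggered outside WebAdmin through the oVirt REST API; a minimal sketch, assuming the VM id a125d2eb-91fa-4399-b931-6c1aea6d9d55 from the log above, with the engine FQDN and the admin password as placeholders:

    # Suspend (hibernate) the VM via the REST API instead of WebAdmin;
    # engine.example.com and PASSWORD are placeholders, not values from this report.
    curl -k -u admin@internal:PASSWORD \
      -H "Content-Type: application/xml" \
      -d "<action/>" \
      https://engine.example.com/ovirt-engine/api/vms/a125d2eb-91fa-4399-b931-6c1aea6d9d55/suspend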
Created attachment 1729522 [details] qemu log
Created attachment 1729523 [details] vdsm.log
Created attachment 1729524 [details] engine.log
It works for me, with a much smaller NVDIMM (~5 GB) than yours (~256 GB). Saving state takes about half a minute for my fully emulated device. I can see in the attached logs that in your case it's still saving state after more than an hour; there is no relevant error and no end.

Nisim, what NVDIMM modes did you use on the host and in the guest? And it is a hardware device, right? How long did you actually wait? Would it be possible to retest with fsdax and devdax modes (the latter requires switching SELinux to permissive mode)? I think it would be interesting to see if it happens with those modes too.
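For reference, a rough sketch of how the namespace mode could be switched for such a retest, assuming ndctl is available on the host; the namespace name namespace0.0 is a placeholder for whatever the actual device is:

    # Reconfigure the existing namespace to fsdax (filesystem DAX) mode
    ndctl create-namespace --force --reconfig=namespace0.0 --mode=fsdax

    # ...or to devdax (device DAX) mode, which exposes a /dev/daxX.Y character device
    ndctl create-namespace --force --reconfig=namespace0.0 --mode=devdax

    # devdax currently needs SELinux in permissive mode on the host
    setenforce 0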
> Nisim, what NVDIMM modes did you use on the host and in the guest? And it is
> a hardware device, right?

HW device, in devdax mode.

> How long did you actually wait?

More than 2 hours.

> Would it be possible to retest with fsdax and devdax modes (the latter requires
> switching SELinux to permissive mode)? I think it would be interesting to
> see if it happens with those modes too.

Yes, I will update you with the outcome.
Nisim, did you have a chance to test with the different modes already?
(In reply to Milan Zamazal from comment #6)
> Nisim, did you have a chance to test with the different modes already?

Yes, it behaves the same when using fsdax and devdax (with permissive SELinux).
Thanks for testing. A QEMU bug has been filed: https://bugzilla.redhat.com/1902691
Let's disable suspending VMs with NVDIMMs for now, see Bug 1912426. We will handle this bug and enable suspending VMs with NVDIMMs again once a platform fix is available.
Verified with libvirt upstream code version v7.0.0-rc1 & qemu-kvm-5.1.0-17.module+el8.3.1+9213+7ace09c3.x86_64

Start vm with the below xml -

    <memory model='nvdimm' access='shared'>
      <source>
        <path>/dev/dax0.0</path>
        <alignsize unit='KiB'>2048</alignsize>
        <pmem/>
      </source>
      <target>
        <size unit='KiB'>262144000</size>
        <node>0</node>
        <label>
          <size unit='KiB'>128</size>
        </label>
      </target>
      <address type='dimm' slot='0'/>
    </memory>

From the qemu command line, there is no "prealloc" and there is no long waiting time when issuing the "start vm" command:

    -object memory-backend-file,id=memnvdimm0,mem-path=/dev/dax0.0,share=yes,size=268435456000,align=2097152,pmem=yes
    -device nvdimm,node=0,label-size=131072,memdev=memnvdimm0,id=nvdimm0,slot=0
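For anyone re-checking this, a minimal sketch of how the absence of "prealloc" can be confirmed, assuming the domain XML is saved as nvdimm-vm.xml (the file name and VM name below are placeholders):

    # Translate the domain XML to the QEMU command line libvirt would generate
    # and check that no prealloc option shows up (no output expected)
    virsh domxml-to-native qemu-argv --xml nvdimm-vm.xml | grep prealloc

    # For an already running VM, the generated command line can also be
    # inspected in the per-VM QEMU log on the host
    grep prealloc /var/log/libvirt/qemu/<vm-name>.log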
The bot shouldn't change the status just because a bug is mentioned anywhere in the commit message...
Closing, since the platform bugs have not been prioritized for el8. If we upgrade to el9 and the dependent platform bugs are resolved, we should handle this bug then.