Bug 1897906 - Enable suspending a VM with an NVDIMM device
Summary: Enable suspending a VM with an NVDIMM device
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 4.4.3.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Milan Zamazal
QA Contact: meital avital
URL:
Whiteboard:
Depends On: 1902691 1923905
Blocks:
 
Reported: 2020-11-15 14:05 UTC by Nisim Simsolo
Modified: 2022-04-13 07:35 UTC
CC List: 4 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2022-04-13 07:35:41 UTC
oVirt Team: Virt
Embargoed:
pm-rhel: ovirt-4.5?


Attachments
qemu log (132.80 KB, text/plain), 2020-11-15 14:08 UTC, Nisim Simsolo
vdsm.log (3.06 MB, text/plain), 2020-11-15 14:08 UTC, Nisim Simsolo
engine.log (2.29 MB, text/plain), 2020-11-15 14:09 UTC, Nisim Simsolo


Links
oVirt gerrit 113199 (master, MERGED): core: Prevent hibernating VMs with NVDIMMs (last updated 2021-02-14 07:54:59 UTC)

Description Nisim Simsolo 2020-11-15 14:05:01 UTC
Description of problem:
Suspending a VM with an NVDIMM device attached hangs forever (the VM stays in the "Saving" state). Eventually the VM cannot be used again, and there is no reasonable workaround: any action on the VM, such as power off or resume, is grayed out, and the host cannot be moved to maintenance.

Version-Release number of selected component (if applicable):
ovirt-engine-4.4.3.8-0.1.el8ev.noarch
vdsm-4.40.35-1.el8ev.x86_64
libvirt-daemon-6.6.0-6.module+el8.3.0+8125+aefcf088.x86_64
qemu-kvm-5.1.0-13.module+el8.3.0+8382+afc3bbea.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Run a VM with an NVDIMM device attached.
2. Suspend the VM.
3.

Actual results:
Suspending the VM hangs forever.

Expected results:
Suspending the VM should succeed, or be declined by WebAdmin if suspending a VM with an NVDIMM device should not be allowed.

Additional info:
Attached vdsm.log and engine.log. The VM was suspended at 2020-11-15 14:54:22:
2020-11-15 14:54:22,371+02 INFO  [org.ovirt.engine.core.bll.HibernateVmCommand] (EE-ManagedThreadFactory-engine-Thread-546738) [5c759d51-ca38-478c-a50b-6d33e0b94ec6] Running command: HibernateVmCommand internal: false. Entities affected :  ID: a125d2eb-91fa-4399-b931-6c1aea6d9d55 Type: VMAction group HIBERNATE_VM with role type USER

Comment 1 Nisim Simsolo 2020-11-15 14:08:11 UTC
Created attachment 1729522 [details]
qemu log

Comment 2 Nisim Simsolo 2020-11-15 14:08:41 UTC
Created attachment 1729523 [details]
vdsm.log

Comment 3 Nisim Simsolo 2020-11-15 14:09:11 UTC
Created attachment 1729524 [details]
engine.log

Comment 4 Milan Zamazal 2020-11-19 11:17:52 UTC
It works for me, with a much smaller NVDIMM (~5 GB) than yours (~256 GB). Saving the state takes about half a minute for my fully emulated device. I can see in the attached logs that in your case it is still saving state after more than an hour. There is no relevant error and no end to the operation.

Nisim, what NVDIMM modes did you use on the host and in the guest? And it is a hardware device, right? How long did you actually wait? Would it be possible to retest with fsdax and devdax modes (the latter requires switching SELinux to permissive mode)? I think it would be interesting to see if it happens with those modes too.
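
For reference, switching a namespace between the fsdax and devdax modes mentioned above is typically done on the host with ndctl, and SELinux can be put into permissive mode with setenforce. A minimal sketch, assuming the namespace is namespace0.0 (the name is only a placeholder, not taken from this bug's environment):

# Show the existing namespaces and their current modes
ndctl list -N
# Reconfigure the namespace to fsdax (exposes a /dev/pmemN block device) ...
ndctl create-namespace --reconfig=namespace0.0 --mode=fsdax --force
# ... or to devdax (exposes a /dev/daxN.M character device)
ndctl create-namespace --reconfig=namespace0.0 --mode=devdax --force
# devdax additionally requires SELinux in permissive mode, as noted above
setenforce 0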

Comment 5 Nisim Simsolo 2020-11-19 12:37:58 UTC
> Nisim, what NVDIMM modes did you use on the host and in the guest? And it is
> a hardware device, right? 
HW device, in devdax mode.

> How long did you actually wait? 
more than 2 hours

> Would it be possible to retest with fsdax and devdax modes (the latter requires
> switching SELinux to permissive mode)? I think it would be interesting to
> see if it happens with those modes too.

Yes, I will update you with the outcome.
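
When a suspend appears stuck like this, one way to check at the libvirt level whether the state save is making any progress is virsh domjobinfo. A minimal sketch, assuming the libvirt domain name of the VM is known (the name below is a placeholder):

# Find the domain name of the running VM
virsh list
# While the VM is stuck in "Saving", query the active job; the processed/remaining
# counters show whether the save is progressing at all
virsh domjobinfo <vm-name>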

Comment 6 Milan Zamazal 2020-11-27 11:20:43 UTC
Nisim, did you have a chance to test with the different modes already?

Comment 7 Nisim Simsolo 2020-11-29 11:59:02 UTC
(In reply to Milan Zamazal from comment #6)
> Nisim, did you have a chance to test with the different modes already?

Yes, it behaves the same when using fsdax and devdax (with permissive SELinux).

Comment 8 Milan Zamazal 2020-11-30 12:41:18 UTC
Thanks for testing, a QEMU bug filed: https://bugzilla.redhat.com/1902691

Comment 9 Milan Zamazal 2021-01-04 12:54:44 UTC
Let's disable suspending VMs with NVDIMMs for now, see Bug 1912426. We will handle this bug and enable suspending VMs with NVDIMMs again once a platform fix is available.

Comment 10 Jing Qi 2021-01-11 02:49:36 UTC
Verified with libvirt upstream code version v7.0.0-rc1
& qemu-kvm-5.1.0-17.module+el8.3.1+9213+7ace09c3.x86_64

Start the VM with the XML below:
<memory model='nvdimm' access='shared'>
  <source>
    <path>/dev/dax0.0</path>
    <alignsize unit='KiB'>2048</alignsize>
    <pmem/>
  </source>
  <target>
    <size unit='KiB'>262144000</size>
    <node>0</node>
    <label>
      <size unit='KiB'>128</size>
    </label>
  </target>
  <address type='dimm' slot='0'/>
</memory>

On the QEMU command line there is no "prealloc", and there is no long wait when issuing the "start vm" command:
-object memory-backend-file,id=memnvdimm0,mem-path=/dev/dax0.0,share=yes,size=268435456000,align=2097152,pmem=yes -device nvdimm,node=0,label-size=131072,memdev=memnvdimm0,id=nvdimm0,slot=0
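
For comparison, preallocation of the backing file would show up as an extra prealloc=yes property on the same -object; the line below is illustrative only and not taken from the attached logs:
-object memory-backend-file,id=memnvdimm0,mem-path=/dev/dax0.0,share=yes,size=268435456000,align=2097152,pmem=yes,prealloc=yes -device nvdimm,node=0,label-size=131072,memdev=memnvdimm0,id=nvdimm0,slot=0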

Comment 11 Milan Zamazal 2021-01-28 10:17:30 UTC
The bot shouldn't change the status just because a bug is mentioned anywhere in the commit message...

Comment 12 Arik 2022-04-13 07:35:41 UTC
Closing since the platform bugs have not been prioritized for el8.
If we upgrade to el9 and the dependent platform bugs are resolved, we should revisit this and enable suspending VMs with NVDIMMs again.

