Bug 1902691

Summary: virDomainSave of a VM with an NVDIMM device is very slow
Product: Red Hat Enterprise Linux 9
Reporter: Milan Zamazal <mzamazal>
Component: libvirt
Assignee: Michal Privoznik <mprivozn>
libvirt sub component: General
QA Contact: liang cong <lcong>
Status: CLOSED WONTFIX
Docs Contact:
Severity: medium
Priority: medium
CC: ailan, dgilbert, eric.auger, imammedo, jinqi, jinzhao, jsuchane, juzhang, lmen, smitterl, virt-maint, xuzhang, yalzhang, yuhuang
Version: 9.1
Keywords: Triaged
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1923905 (view as bug list)
Environment:
Last Closed: 2022-10-19 03:29:08 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1923905    
Bug Blocks: 1897906    

Description Milan Zamazal 2020-11-30 12:39:47 UTC
Description of problem:

When a VM is started from libvirt with an NVDIMM device, e.g.

  <memory access="shared" model="nvdimm">
      <source>
          <path>/dev/pmem0</path>
          <alignsize unit="KiB">2048</alignsize>
          <pmem />
      </source>
      <target>
          <size unit="KiB">259917824</size>
          <node>0</node>
          <label>
              <size unit="KiB">128</size>
          </label>
      </target>
      <alias name="ua-0b946fd6-9a90-4882-a23c-2d027965e8cd" />
  </memory>

resulting in the following on the QEMU command line:

  -object memory-backend-file,id=memua-0b946fd6-9a90-4882-a23c-2d027965e8cd,prealloc=yes,mem-path=/dev/pmem0,share=yes,size=266155851776,align=2097152,pmem=yes \
  -device nvdimm,node=0,label-size=131072,memdev=memua-0b946fd6-9a90-4882-a23c-2d027965e8cd,id=ua-0b946fd6-9a90-4882-a23c-2d027965e8cd,slot=0 \

it can take a very long time to get it suspended via the virDomainSuspend libvirt call. For instance, suspending a VM with the 256 GB NVDIMM above, backed by a hardware NVDIMM device, doesn't finish within 2 hours.
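
For reference, the size, align, and label-size values on the QEMU command line follow directly from the KiB values in the XML:

  size:       259917824 KiB * 1024 = 266155851776 bytes
  align:      2048 KiB * 1024      = 2097152 bytes
  label-size: 128 KiB * 1024       = 131072 bytes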

Version-Release number of selected component (if applicable):

libvirt-daemon-6.6.0-6.module+el8.3.0+8125+aefcf088.x86_64
qemu-kvm-5.1.0-13.module+el8.3.0+8382+afc3bbea.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Start a VM with an NVDIMM device.
2. Suspend the VM.
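
In virsh terms, roughly (the domain name and save path are placeholders; as comments 10 and 14 below clarify, the slow call is virDomainSave, i.e. "virsh save", rather than virDomainSuspend):

  # virsh start vm1
  # virsh save vm1 /var/tmp/vm1.save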

Actual results:

It takes a long time to suspend the VM, if it ever finishes.

Expected results:

The VM is suspended in a reasonable time.

Additional info:

See https://bugzilla.redhat.com/1897906 for additional details.

Comment 1 Milan Zamazal 2020-12-10 11:15:25 UTC
Is there any idea of how to proceed with the bug? Is there a chance of getting it fixed, or should RHV document that suspending a VM doesn't work when an NVDIMM is present?

Comment 2 John Ferlan 2020-12-17 19:46:24 UTC
Assigned to Amnon for initial triage per the BZ process, based on the age of the bug and it being created or assigned to virt-maint without triage.

Comment 3 Igor Mammedov 2020-12-29 12:19:04 UTC
Suspend on bare QEMU (the "stop" HMP command) is instantaneous.
In the comment 1 case, I'd guess libvirt tries to save RAM, which currently includes the NVDIMM => long execution time.
We probably should not be saving address ranges belonging to the NVDIMM.

Moving the BZ to libvirt for now to discuss how to deal with it; the BZ could then be cloned to qemu
to implement the QEMU part of it.

Comment 4 Jing Qi 2021-01-07 06:04:19 UTC
Version:
libvirt-daemon-6.6.0-11.module+el8.3.1+9196+74a80ca4.x86_64
qemu-kvm-5.1.0-17.module+el8.3.1+9213+7ace09c3.x86_64

kernel version : 4.18.0-240.10.1.el8_3.x86_64 

I tried to use a machine with CPU "Intel Purley 4s (Lightning Ridge) CPU: CascadeLake B0 QS, (8) Optane 512 DIMMs" to do the test --
1. Start the domain with the configuration below -
<maxMemory slots='16' unit='KiB'>1048576000</maxMemory>
  <memory unit='KiB'>1703936</memory>
  <currentMemory unit='KiB'>1572864</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB'/>
    </hugepages>
  </memoryBacking>
...
 <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>qemu64</model>
    <feature policy='require' name='x2apic'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='lahf_lm'/>
    <feature policy='disable' name='svm'/>
    <numa>
      <cell id='0' cpus='0-1' memory='524288' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='524288' unit='KiB'/>
    </numa>
  </cpu>
...
<memory model='nvdimm' access='shared'>
      <source>
        <path>/dev/pmem0</path>
        <alignsize unit='KiB'>2048</alignsize>
        <pmem/>
      </source>
      <target>
        <size unit='KiB'>131072</size>
        <node>0</node>
        <label>
          <size unit='KiB'>128</size>
        </label>
      </target>
      <alias name='ua-0b946fd6-9a90-4882-a23c-2d027965e8cd'/>
      <address type='dimm' slot='0'/>
    </memory>
  </devices>

2. # virsh suspend vm1
Domain vm1 suspended

3. # virsh resume vm1
Domain vm1 resumed

The scenario works in a pure libvirt environment.

Comment 5 Milan Zamazal 2021-01-07 09:18:31 UTC
Jing, do you mean suspending the VM is reasonably fast in your environment? Could it be because you use hugepages?

Comment 6 Jing Qi 2021-01-07 10:06:15 UTC
Yes, the VM can suspend and resume quickly in my test.
I did more testing without hugepages and got the same result.

Comment 7 Milan Zamazal 2021-01-07 10:19:12 UTC
Ah, I missed that your target NVDIMM device size is small, only 128 MB; this is indeed fast. Could you try it with a larger device? The problem with a hanging suspend was observed when using an NVDIMM device of 256 GB size.

Comment 8 Igor Mammedov 2021-01-07 10:43:35 UTC
Saving a VM's RAM to a file is closest to offline migration.
Adding David to CC to look at the issue from the QEMU side.

Comment 9 Jaroslav Suchanek 2021-01-07 16:24:23 UTC
Michal, can you please look into this from libvirt side? Thanks.

Comment 10 Michal Privoznik 2021-01-07 19:39:42 UTC
(In reply to Milan Zamazal from comment #0)
>
> it can take a very long time to get it suspended via the virDomainSuspend
> libvirt call. For instance, suspending a VM with the 256 GB NVDIMM above,
> backed by a hardware NVDIMM device, doesn't finish within 2 hours.
> 
> Version-Release number of selected component (if applicable):
> 
> libvirt-daemon-6.6.0-6.module+el8.3.0+8125+aefcf088.x86_64
> qemu-kvm-5.1.0-13.module+el8.3.0+8382+afc3bbea.x86_64
> 
> How reproducible:
> 
> 100%
> 
> Steps to Reproduce:
> 1. Start a VM with an NVDIMM device.
> 2. Suspend the VM.
> 

Milan, are you sure about virDomainSuspend()? Because all it does is stop the vCPUs by issuing "stop" on the monitor (plus saving new internal state, but that's not connected to NVDIMMs in any way). Could it be virDomainSave() or virDomainManagedSave() that you had in mind?
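
For illustration, the virsh equivalents of the calls in question (the domain name and save path are placeholders; per this bug, the NVDIMM-backed region is written out along with the rest of guest RAM):

  # virsh suspend vm1                    (virDomainSuspend: just "stop" on the monitor, no RAM written)
  # virsh save vm1 /var/tmp/vm1.save     (virDomainSave: streams guest RAM, including the NVDIMM backend, into the file)
  # virsh managedsave vm1                (virDomainManagedSave: the same, into a libvirt-managed file)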

Comment 11 Jing Qi 2021-01-08 02:55:40 UTC
(In reply to Milan Zamazal from comment #7)
> Ah, I missed that your target NVDIMM device size is small, only 128 MB; this
> is indeed fast. Could you try it with a larger device? The problem with a
> hanging suspend was observed when using an NVDIMM device of 256 GB size.

 <memory model='nvdimm' access='shared'>
      <source>
        <path>/dev/dax0.0</path>
        <alignsize unit='KiB'>2048</alignsize>
      </source>
      <target>
        <size unit='KiB'>262144000</size>
        <node>0</node>
        <label>
          <size unit='KiB'>128</size>
        </label>
      </target>
      <address type='dimm' slot='0'/>
    </memory>

In the guest, I used up space on /dev/pmem0.
The df command shows 61% of the space occupied:
 
/dev/pmem0            262013956 159466148 102547808  61% /mnt


# virsh suspend rhel8
Domain rhel8 suspended

# virsh resume rhel8
Domain rhel8 resumed

It's still quite quick to do suspend/resume for it.

Comment 12 Michal Privoznik 2021-01-08 09:13:47 UTC
(In reply to Jing Qi from comment #11)
> 
> It's still quite quick to do suspend/resume for it.

Yes, because 'virsh suspend' just stops vCPUs; there is no memory saving and thus it's almost instant. I suspect Milan might have had a different API in mind. Milan?

Comment 13 Igor Mammedov 2021-01-08 10:04:06 UTC
(In reply to Michal Privoznik from comment #12)
> (In reply to Jing Qi from comment #11)
> > 
> > It's still quite quick to do suspend/resume for it.
> 
> Yes, because 'virsh suspend' just stops vCPUs; there is no memory saving and
> thus it's almost instant. I suspect Milan might have had a different API in
> mind. Milan?

'Memory saving' was my assumption, and we may need to make up some sort of
new command, or an option to an existing one, to skip NVDIMM regions.
Of course, if one were to move such a saved VM to another host, one would also
have to manually move the corresponding NVDIMM content (the same as with any other storage).

Comment 14 Milan Zamazal 2021-01-08 10:33:55 UTC
Oh yes, I indeed meant virDomainSave() call. Sorry for the confusion.

Comment 15 Milan Zamazal 2021-01-08 10:40:53 UTC
(In reply to Igor Mammedov from comment #13)

> Of course, if one were to move such a saved VM to another host, one would also
> have to manually move the corresponding NVDIMM content (the same as with any
> other storage).

Yes, we currently pin VMs with NVDIMMs to particular NVDIMM devices and the corresponding hosts and they can't be migrated elsewhere.

Comment 16 Michal Privoznik 2021-01-08 10:54:00 UTC
I'm not really sure what's the right thing to do here. On one hand, NVDIMM is mapped into guest memory and thus when saving the guest memory (for later use) NVDIMMs should be saved with it. However, NVDIMMs are persistent so we might get away with offloading this responsibility to users/mgmt apps. Of course, things will go terribly wrong when users don't save and restore NVDIMM themselves.

Comment 17 Milan Zamazal 2021-01-08 11:23:27 UTC
In theory, one could save a VM, use the attached NVDIMM for another purpose after and then restore the original VM. That would break of course but would anybody do the same with a normal hard drive attached to a VM? If such a scenario might make sense with an NVDIMM, would adding a corresponding virDomainSave() flag to not save NVDIMM memory be a solution?

Comment 18 Michal Privoznik 2021-01-08 11:45:02 UTC
(In reply to Milan Zamazal from comment #17)
> In theory, one could save a VM, use the attached NVDIMM for another purpose
> after and then restore the original VM. That would break of course but would
> anybody do the same with a normal hard drive attached to a VM? If such a
> scenario might make sense with an NVDIMM, would adding a corresponding
> virDomainSave() flag to not save NVDIMM memory be a solution?

Yes, that could work. Libvirt would then set another flag for 'savevm' that would make it skip all NVDIMMs. Hopefully, no one will need more fine grained approach (choosing per each NVDIMM whether to save it or not). Igor, would this work for qemu?
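
Purely to illustrate the proposal (nothing below exists; the virsh option and the flag name are hypothetical and were never implemented):

  # hypothetical virsh spelling:
  #   virsh save vm1 /var/tmp/vm1.save --skip-nvdimm
  # hypothetical libvirt API spelling:
  #   virDomainSaveFlags(dom, "/var/tmp/vm1.save", NULL, VIR_DOMAIN_SAVE_SKIP_NVDIMM)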

Comment 19 Igor Mammedov 2021-01-25 10:12:12 UTC
(In reply to Michal Privoznik from comment #18)
> (In reply to Milan Zamazal from comment #17)
> > In theory, one could save a VM, use the attached NVDIMM for another purpose
> > after and then restore the original VM. That would break of course but would
> > anybody do the same with a normal hard drive attached to a VM? If such a
> > scenario might make sense with an NVDIMM, would adding a corresponding
> > virDomainSave() flag to not save NVDIMM memory be a solution?
> 
> Yes, that could work. Libvirt would then set another flag for 'savevm' that
> would make it skip all NVDIMMs. Hopefully, no one will need more fine
> grained approach (choosing per each NVDIMM whether to save it or not). Igor,
> would this work for qemu?

It should work fine for QEMU.

Comment 20 Michal Privoznik 2021-02-02 08:09:45 UTC
I just realized that when I mentioned 'savevm' I really meant the 'migrate' command, because virDomainSave() uses migration into an FD (which is just an opened, user-provided path).
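
Roughly, the QMP sequence behind virDomainSave() looks like this (the fd name is illustrative; the file descriptor itself is handed to QEMU over the monitor socket, and "fd:" is a standard migration URI):

  { "execute": "getfd",   "arguments": { "fdname": "save-fd" } }
  { "execute": "migrate", "arguments": { "uri": "fd:save-fd" } }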

Comment 21 John Ferlan 2021-09-08 13:19:25 UTC
Bulk update - Move RHEL-AV bugs to RHEL

Comment 23 Milan Zamazal 2022-05-03 15:55:43 UTC
This is now targeted at RHEL 9, so it won't fix the problem in RHV. But a fix might still be useful if another product would like to support both NVDIMM and suspending a VM.

Comment 24 liang cong 2022-05-23 06:06:04 UTC
The fix for this bug depends on the QEMU bug #1923905. Confirmed with Michal that this bug will be fixed once the dependent bug is fixed, so setting the stale date of this bug to 2022-06-30.

Comment 25 Michal Privoznik 2022-06-01 13:41:58 UTC
Setting stale date to match the QEMU bug.

Comment 29 John Ferlan 2022-07-20 11:09:08 UTC
NB: Removed RHV as dependent product since this is a RHEL9 issue

Comment 30 yalzhang@redhat.com 2022-10-19 03:29:08 UTC
Closing this as the dependent QEMU bug is closed. For RHV, this feature is not an issue any more (refer to bug 1897906#c9).