Bug 1923905

Summary: Allow 'migrate' to skip NVDIMMs
Product: Red Hat Enterprise Linux 8
Component: qemu-kvm
Sub component: Live Migration
Version: 8.3
Hardware: Unspecified
OS: Unspecified
Status: CLOSED WONTFIX
Severity: medium
Priority: medium
Reporter: Michal Privoznik <mprivozn>
Assignee: Virtualization Maintenance <virt-maint>
QA Contact: Mario Casquero <mcasquer>
Docs Contact:
CC: ailan, chayang, dgilbert, eric.auger, imammedo, jen, jinqi, jinzhao, jsuchane, juzhang, lmen, mzamazal, smitterl, virt-maint, xiaohli, xuzhang, yuhuang
Keywords: RFE, Triaged
Target Milestone: rc
Target Release: ---
Flags: pm-rhel: mirror+
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1902691
Environment:
Last Closed: 2022-08-02 07:28:09 UTC
Type: Feature Request
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1897906, 1902691

Description Michal Privoznik 2021-02-02 08:05:44 UTC
+++ This bug was initially created as a clone of Bug #1902691 +++

Description of problem:

When a VM is started from libvirt with an NVDIMM device, e.g.

  <memory access="shared" model="nvdimm">
      <source>
          <path>/dev/pmem0</path>
          <alignsize unit="KiB">2048</alignsize>
          <pmem/>
      </source>
      <target>
          <size unit="KiB">259917824</size>
          <node>0</node>
          <label>
              <size unit="KiB">128</size>
          </label>
      </target>
      <alias name="ua-0b946fd6-9a90-4882-a23c-2d027965e8cd" />
  </memory>

resulting in the following on the QEMU command line:

  -object memory-backend-file,id=memua-0b946fd6-9a90-4882-a23c-2d027965e8cd,prealloc=yes,mem-path=/dev/pmem0,share=yes,size=266155851776,align=2097152,pmem=yes \
  -device nvdimm,node=0,label-size=131072,memdev=memua-0b946fd6-9a90-4882-a23c-2d027965e8cd,id=ua-0b946fd6-9a90-4882-a23c-2d027965e8cd,slot=0 \

it can take a very long time to suspend the VM via the virDomainSave() libvirt call. For instance, suspending a VM with the 256 GB NVDIMM above, backed by a hardware NVDIMM device, does not finish within 2 hours.
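A rough back-of-envelope calculation illustrates why saving the NVDIMM contents dominates the save time. The transfer rate below is an assumption for illustration, not a measured value; effective rates for reading a hardware pmem device and writing the save file can be far lower.

```python
# Size of the NVDIMM backend from the QEMU command line above, in bytes
# (size=266155851776).
nvdimm_size = 266_155_851_776

# Assumed sustained rate for reading the pmem device and writing the
# save file; illustrative only, not a measurement.
rate = 200 * 1024 ** 2  # 200 MiB/s

seconds = nvdimm_size / rate
print(f"~{seconds / 60:.0f} minutes at 200 MiB/s")
```

Even at this optimistic rate, the NVDIMM alone adds roughly 20 minutes to every save; at a few tens of MiB/s it stretches into hours, matching the behavior reported above.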

Version-Release number of selected component (if applicable):

libvirt-daemon-6.6.0-6.module+el8.3.0+8125+aefcf088.x86_64
qemu-kvm-5.1.0-13.module+el8.3.0+8382+afc3bbea.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Start a VM with an NVDIMM device.
2. Suspend the VM.

Actual results:

It takes a long time to suspend the VM, if it ever finishes.

Expected results:

The VM is suspended in a reasonable time.

Additional info:

The libvirt call virDomainSave stops the vCPUs and then migrates the guest state into an FD (a file descriptor representing the file where the user wants the state saved). The problem is that, since NVDIMMs are mapped into the guest's memory, their content is saved into the FD too. This may be unnecessary: NVDIMMs are capable of keeping data persistent on their own.

{"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"},"id":"libvirt-383"}

Comment 1 John Ferlan 2021-02-11 19:30:05 UTC
Assigned to Amnon for initial triage per the bz process, as the bug was created or assigned to virt-maint without triage and has aged.

Comment 3 John Ferlan 2021-09-09 18:54:41 UTC
Bulk update: Move RHEL-AV bugs to RHEL8.

Add RHV as a dependent product, since that is what the cloned-from bug has. That means that if this bug needs resolution/testing for RHEL 9, a clone must be created.

Comment 4 Eric Auger 2022-01-25 15:32:19 UTC
Hi Dave, John,

I have some interest in working on this BZ and I see it has been open for a while. I would just like to know whether I would be treading on someone's toes, or whether a fix is under development by somebody else at another level of the stack (I saw Bug 1902691 - virDomainSave a VM with an NVDIMM device is very slow). Of course, I would need access to an x86 Beaker machine with such an NVDIMM slot.

Thanks

Eric

Comment 5 Jaroslav Suchanek 2022-06-01 13:42:19 UTC
Dave, please see Eric's question in comment 4. Thanks.

Comment 6 Dr. David Alan Gilbert 2022-06-14 09:35:15 UTC
(In reply to Eric Auger from comment #4)
> Hi Dave, John,
> 
> I have some interest working on this BZ and I see it is opened for a while.
> I just would like to know if I would tread on someone's toes or if the fix
> is under development by sbdy else at another level of the stack (saw Bug
> 1902691 - virDomainSave a VM with an NVDIMM device is very slow). Of course
> I would need to get access to an x86 beaker machine with such NVDIMM slot.

I think it's fine from my side if you'd like to work on that.
IMHO what's really needed is the ability to skip any given RAMBlock or device,
probably as a flag on the memory backend?


> 
> Thanks
> 
> Eric
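For reference, upstream QEMU already has an experimental migration capability in this direction: x-ignore-shared, which skips RAM blocks backed by shared memory during migration. Since the memory-backend-file above is created with share=yes, it might apply to this case, though that has not been evaluated in this BZ. A sketch of enabling it via QMP before migrating:

```json
{"execute": "migrate-set-capabilities",
 "arguments": {"capabilities": [
     {"capability": "x-ignore-shared", "state": true}]}}
```

The capability would need to be set on both the source and destination sides, and it skips all shared RAM blocks, not only NVDIMMs, so whether it is a suitable fix here is an open question.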

Comment 7 Eric Auger 2022-06-20 15:37:10 UTC
Hi Dave,

In January this year I had cycles to work on some new BZs; unfortunately, that is no longer the case. Moving the BZ back to the backlog. I will check the BZ assignment later if I get the opportunity to work on this again.

Comment 13 RHEL Program Management 2022-08-02 07:28:09 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, the bug can be reopened.

Comment 14 Mario Casquero 2022-09-19 07:28:46 UTC
It seems this bug blocks https://bugzilla.redhat.com/show_bug.cgi?id=1902691, so is there any plan to fix it or not?

Comment 15 Chao Yang 2022-09-26 03:04:25 UTC
(In reply to Mario Casquero from comment #14)
> Seems this bug blocks https://bugzilla.redhat.com/show_bug.cgi?id=1902691 so
> is there any plan to fix this or not?

Hi Eric,

Could you please help answer the question above? Sorry for bothering you if you are not the correct contact.

Comment 16 Eric Auger 2022-09-26 07:27:46 UTC
(In reply to Chao Yang from comment #15)
> (In reply to Mario Casquero from comment #14)
> > Seems this bug blocks https://bugzilla.redhat.com/show_bug.cgi?id=1902691 so
> > is there any plan to fix this or not?
> 
> Hi Eric,
> 
> Could you please help answer the question above? Sorry for bothering if you
> are not the correct contact

As mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1923905#c7, I no longer have cycles to look at this BZ. If you want this to be addressed, I suggest you reopen the BZ and increase the priority. Then I hope the BZ will be triaged accordingly.

Comment 17 Chao Yang 2022-09-27 02:32:58 UTC
Hi Michal,

Do you still want this feature in QEMU? If yes, please reopen it. Thank you!

Comment 19 Jeff Nelson 2022-09-27 14:32:36 UTC
BZ is closed; clearing needinfo.