Bug 2180679

Summary: Restore guest fails after unplugging dimm memory device with virtio-mem memory device attached
Product: Red Hat Enterprise Linux 9
Component: libvirt (sub component: General)
Version: 9.2
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: liang cong <lcong>
Assignee: Michal Privoznik <mprivozn>
QA Contact: liang cong <lcong>
CC: chayang, jdenemar, jinzhao, juzhang, lmen, menli, mprivozn, qizhu, virt-maint, zhguo
Keywords: AutomationTriaged, Triaged, Upstream
Target Milestone: rc
Flags: pm-rhel: mirror+
Fixed In Version: libvirt-9.4.0-1.el9
Target Upstream Version: 9.4.0
Type: Bug
Last Closed: 2023-11-07 08:31:00 UTC

Description liang cong 2023-03-22 04:05:18 UTC
Description of problem:
Restoring a guest fails after hot-unplugging a DIMM memory device while a virtio-mem memory device is attached.

Version-Release number of selected component (if applicable):
libvirt-9.0.0-9.el9_2.x86_64
qemu-kvm-7.2.0-12.el9_2.x86_64

How reproducible:
100%


Steps to Reproduce:
1. In the guest, set the kernel parameter memhp_default_state to online_movable:
# grubby --update-kernel=ALL --remove-args=memhp_default_state --args=memhp_default_state=online_movable
# reboot
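
After the reboot, the setting can be confirmed from the guest's kernel command line (a minimal check; /proc/cmdline is standard on Linux):

# grep -o 'memhp_default_state=[^ ]*' /proc/cmdline
memhp_default_state=online_movable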



2. Define and start the guest with DIMM and virtio-mem memory devices, with the related config XML as below:
<memory model="dimm" discard="no">
  <source>
    <pagesize unit="KiB">4</pagesize>
  </source>
  <target>
    <size unit="KiB">262144</size>
    <node>0</node>
  </target>
</memory>

<memory model='virtio-mem'>
  <target>
    <size unit='KiB'>2097152</size>
    <node>0</node>
    <block unit='KiB'>2048</block>
    <requested unit='KiB'>1048576</requested>
  </target>
</memory>
...
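
(That is, a 256 MiB DIMM backed by 4 KiB pages, plus a 2 GiB virtio-mem device with a 2 MiB block size and 1 GiB of memory requested.)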


3. Wait for the guest to boot up and for the virtio-mem current size to become non-zero:
# virsh dumpxml vm1 --xpath "//memory[@model='virtio-mem']"
<memory model="virtio-mem">
  <target>
    <size unit="KiB">2097152</size>
    <node>0</node>
    <block unit="KiB">2048</block>
    <requested unit="KiB">1048576</requested>
    <current unit="KiB">1048576</current>
  </target>
  <alias name="virtiomem0"/>
  <address type="pci" domain="0x0000" bus="0x07" slot="0x00" function="0x0"/>
</memory>
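
Waiting for a non-zero current size can be scripted (a minimal sketch, assuming the <current> element prints as in the dump above):

# until virsh dumpxml vm1 --xpath "//memory[@model='virtio-mem']/target/current" | grep -q 'KiB">[1-9]'; do sleep 1; done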



4. Prepare a DIMM memory device config XML:
# cat memory1.xml
<memory model='dimm' discard='no'>
  <source>
    <pagesize unit='KiB'>4</pagesize>
  </source>
  <target>
    <size unit='KiB'>262144</size>
    <node>0</node>
  </target>
  <alias name='dimm0'/>
  <address type='dimm' slot='0' base='0x100000000'/>
</memory>
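
The alias and dimm address here must match the live device; if needed, they can be read back from the running domain with the same --xpath style used above:

# virsh dumpxml vm1 --xpath "//memory[@model='dimm']"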


5. Hot-unplug the DIMM memory device using the config XML from step 4:
# virsh detach-device vm1 memory1.xml
Device detached successfully


6. Save and restore the domain.
# virsh save vm1 vm1.save
Domain 'vm1' saved to vm1.save

# virsh restore vm1.save
error: Failed to restore domain from vm1.save
error: internal error: qemu unexpectedly closed the monitor: 2023-03-20T04:36:49.299229Z qemu-kvm: Property 'memaddr' changed from 0x110000000 to 0x100000000
2023-03-20T04:36:49.299264Z qemu-kvm: Failed to load virtio-mem-device:tmp
2023-03-20T04:36:49.299269Z qemu-kvm: Failed to load virtio-mem:virtio
2023-03-20T04:36:49.299274Z qemu-kvm: error while loading state for instance 0x0 of device '0000:00:02.6:00.0/virtio-mem'
2023-03-20T04:36:49.299755Z qemu-kvm: load of migration failed: Invalid argument



Actual results:
Restore fails with the error shown in step 6.


Expected results:
The domain is restored successfully.


Additional info:
If the DIMM is defined after the virtio-mem device, or the virtio-mem current size is 0, the error does not occur.

Comment 1 Michal Privoznik 2023-03-23 12:35:05 UTC
I believe this is a QEMU bug. I'm able to reproduce it, and the command line generated on the first start of the VM contains the following:

-object '{"qom-type":"memory-backend-file","id":"memdimm0","mem-path":"/var/lib/libvirt/qemu/ram/3-fedora/dimm0","discard-data":false,"share":false,"prealloc":true,"prealloc-threads":16,"size":268435456}' \
-device '{"driver":"pc-dimm","node":0,"memdev":"memdimm0","id":"dimm0","slot":0}' \
-object '{"qom-type":"memory-backend-file","id":"memvirtiomem0","mem-path":"/hugepages2M/libvirt/qemu/3-fedora","discard-data":true,"share":false,"reserve":false,"size":2147483648}' \
-device '{"driver":"virtio-mem-pci","node":0,"block-size":2097152,"requested-size":1073741824,"memdev":"memvirtiomem0","prealloc":true,"id":"virtiomem0","bus":"pci.0","addr":"0xa"}' \

and after dimm0 is unplugged and the domain saved, this is the command line that libvirt tries to run on restore:

-object '{"qom-type":"memory-backend-file","id":"memvirtiomem0","mem-path":"/hugepages2M/libvirt/qemu/4-fedora","discard-data":true,"share":false,"reserve":false,"size":2147483648}' \
-device '{"driver":"virtio-mem-pci","node":0,"block-size":2097152,"requested-size":1073741824,"memdev":"memvirtiomem0","prealloc":true,"id":"virtiomem0","bus":"pci.0","addr":"0xa"}' \

IOW - the pc-dimm device is gone (which is expected - it was unplugged). Let me switch over to QEMU for further investigation.

Comment 2 David Hildenbrand 2023-03-24 22:20:29 UTC
QEMU auto-assigns memory addresses for memory devices; however, these addresses cannot really be migrated, because they have to be known when such a device is plugged. So upper layers have to provide these properties when creating the devices (e.g., read the property on the source and configure it for the destination), similar to DIMM slots or PCI addresses.

That's nothing special about virtio-mem (it's just the only one that actually checks instead of crashing the guest later :) ); it's the same for all memory devices, only the "address property" name differs.

For DIMMs/NVDIMMs it's the "addr" property; for virtio-mem/virtio-pmem, it's the "memaddr" property.

I would assume that such handling is already in place for DIMMs/NVDIMMs? Otherwise, unplugging some DIMMs and migrating the VM would crash it on the destination when the location of some DIMMs changes in the guest address space.

@Michal, is such handling for DIMMs already in place? Then we need similar handling for virtio-mem/virtio-pmem in libvirt. If it's not around for DIMMs, then we need it there as well ...
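
For illustration, the auto-assigned address can be inspected on the source via the standard QMP command query-memory-devices, whose per-device info includes memaddr for virtio-mem (a sketch using the domain name from this report):

# virsh qemu-monitor-command vm1 --pretty '{"execute": "query-memory-devices"}'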

Comment 3 Michal Privoznik 2023-03-28 08:49:04 UTC
Yeah, it's implemented for DIMMs, and indeed virtio-mem was the missing piece. After making memaddr stable, I can restore from a save file. So this is indeed a libvirt bug. Let me switch back to libvirt and post patches.
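
As a sketch of what "stable" means here, the restore command line from comment 1 would carry the source's address explicitly via the memaddr property (the value 0x110000000 is the source address reported in the error in step 6; the exact generated JSON is not shown in this report):

-device '{"driver":"virtio-mem-pci","node":0,"block-size":2097152,"requested-size":1073741824,"memdev":"memvirtiomem0","prealloc":true,"id":"virtiomem0","memaddr":4563402752,"bus":"pci.0","addr":"0xa"}' \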

Comment 4 Michal Privoznik 2023-03-28 11:58:37 UTC
Patches posted on the list:

https://listman.redhat.com/archives/libvir-list/2023-March/239051.html

Comment 5 Michal Privoznik 2023-05-26 14:48:36 UTC
Merged upstream as:

a1bdffdd96 qemu_command: Generate .memaddr for virtio-mem and virtio-pmem
2c15506254 qemu: Fill virtio-mem/virtio-pmem .memaddr at runtime
677156f662 conf: Introduce <address/> for virtio-mem and virtio-pmem

v9.4.0-rc1-5-ga1bdffdd96
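
With these patches, the address is exposed as a new <address/> element under <target>, for example (matching the verification dumps below):

<memory model='virtio-mem'>
  <target>
    <size unit='KiB'>2097152</size>
    <node>0</node>
    <block unit='KiB'>2048</block>
    <requested unit='KiB'>1048576</requested>
    <address base='0x120000000'/>
  </target>
</memory>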

Comment 6 liang cong 2023-06-05 06:58:41 UTC
Pre-verified on upstream build libvirt v9.4.0-12-gf26923fb2e 

Test steps:
1. In the guest, set the kernel parameter memhp_default_state to online_movable:
# grubby --update-kernel=ALL --remove-args=memhp_default_state --args=memhp_default_state=online_movable
# reboot



2. Define and start the guest with four DIMM devices and a virtio-mem memory device, with the related config XML as below:
<memory model="dimm" discard="no">
  <source>
    <pagesize unit="KiB">4</pagesize>
  </source>
  <target>
    <size unit="KiB">131072</size>
    <node>0</node>
  </target>
</memory>
<memory model="dimm" discard="no">
  <source>
    <pagesize unit="KiB">4</pagesize>
  </source>
  <target>
    <size unit="KiB">131072</size>
    <node>0</node>
  </target>
</memory>
<memory model="dimm" discard="no">
  <source>
    <pagesize unit="KiB">4</pagesize>
  </source>
  <target>
    <size unit="KiB">131072</size>
    <node>0</node>
  </target>
</memory>
<memory model="dimm" discard="no">
  <source>
    <pagesize unit="KiB">4</pagesize>
  </source>
  <target>
    <size unit="KiB">131072</size>
    <node>0</node>
  </target>
</memory>
<memory model='virtio-mem'>
  <target>
    <size unit='KiB'>2097152</size>
    <node>0</node>
    <block unit='KiB'>2048</block>
    <requested unit='KiB'>1048576</requested>
  </target>
</memory>
...


3. Wait for the guest to boot up and for the virtio-mem current size to become non-zero:
# virsh dumpxml vm1 --xpath "//memory[@model='virtio-mem']"
<memory model="virtio-mem">
  <target>
    <size unit="KiB">2097152</size>
    <node>0</node>
    <block unit="KiB">2048</block>
    <requested unit="KiB">1048576</requested>
    <current unit="KiB">0</current>
    <address base="0x120000000"/>
  </target>
  <alias name="virtiomem0"/>
  <address type="pci" domain="0x0000" bus="0x07" slot="0x00" function="0x0"/>
</memory>


4. Prepare a DIMM memory device config XML:
# cat memory1.xml
<memory model="dimm" discard="no">
  <source>
    <pagesize unit="KiB">4</pagesize>
  </source>
  <target>
    <size unit="KiB">131072</size>
    <node>0</node>
  </target>
  <alias name="dimm2"/>
  <address type="dimm" slot="2" base="0x110000000"/>
</memory>


5. Hot-unplug the DIMM memory device using the config XML from step 4:
# virsh detach-device vm1 memory1.xml
Device detached successfully


6. Save and restore the domain.
# virsh save vm1 vm1.save
Domain 'vm1' saved to vm1.save

# virsh restore vm1.save

Comment 7 liang cong 2023-07-25 07:40:48 UTC
Verified on build:
# rpm -q libvirt qemu-kvm
libvirt-9.5.0-3.el9.x86_64
qemu-kvm-8.0.0-9.el9.x86_64

Test steps:
1. In the guest, set the kernel parameter memhp_default_state to online_movable:
# grubby --update-kernel=ALL --remove-args=memhp_default_state --args=memhp_default_state=online_movable
# reboot



2. Define and start the guest with four DIMM devices and a virtio-mem memory device, with the related config XML as below:
<memory model="dimm" discard="no">
  <source>
    <pagesize unit="KiB">4</pagesize>
  </source>
  <target>
    <size unit="KiB">131072</size>
    <node>0</node>
  </target>
</memory>
<memory model="dimm" discard="no">
  <source>
    <pagesize unit="KiB">4</pagesize>
  </source>
  <target>
    <size unit="KiB">131072</size>
    <node>0</node>
  </target>
</memory>
<memory model="dimm" discard="no">
  <source>
    <pagesize unit="KiB">4</pagesize>
  </source>
  <target>
    <size unit="KiB">131072</size>
    <node>0</node>
  </target>
</memory>
<memory model="dimm" discard="no">
  <source>
    <pagesize unit="KiB">4</pagesize>
  </source>
  <target>
    <size unit="KiB">131072</size>
    <node>0</node>
  </target>
</memory>
<memory model='virtio-mem'>
  <target>
    <size unit='KiB'>2097152</size>
    <node>0</node>
    <block unit='KiB'>2048</block>
    <requested unit='KiB'>1048576</requested>
  </target>
</memory>
...


3. Wait for the guest to boot up and for the virtio-mem current size to become non-zero:
# virsh dumpxml vm1 --xpath "//memory[@model='virtio-mem']"
<memory model="virtio-mem">
  <target>
    <size unit="KiB">2097152</size>
    <node>0</node>
    <block unit="KiB">2048</block>
    <requested unit="KiB">1048576</requested>
    <current unit="KiB">1048576</current>
    <address base="0x120000000"/>
  </target>
  <alias name="virtiomem0"/>
  <address type="pci" domain="0x0000" bus="0x07" slot="0x00" function="0x0"/>
</memory>



4. Prepare a DIMM memory device config XML:
# cat mem.xml
<memory model="dimm" discard="no">
  <source>
    <pagesize unit="KiB">4</pagesize>
  </source>
  <target>
    <size unit="KiB">131072</size>
    <node>0</node>
  </target>
  <alias name="dimm3"/>
  <address type="dimm" slot="3" base="0x118000000"/>
</memory>



5. Hot-unplug the DIMM memory device using the config XML from step 4:
# virsh detach-device vm1 mem.xml
Device detached successfully


6. Save and restore the domain.
# virsh save vm1 vm1.save
Domain 'vm1' saved to vm1.save

# virsh restore vm1.save

7. Check the memory device config via virsh dumpxml:
# virsh dumpxml vm1 --xpath "//memory"
<memory unit="KiB">4587520</memory>
<memory model="dimm" discard="no">
  <source>
    <pagesize unit="KiB">4</pagesize>
  </source>
  <target>
    <size unit="KiB">131072</size>
    <node>0</node>
  </target>
  <alias name="dimm0"/>
  <address type="dimm" slot="0" base="0x100000000"/>
</memory>
<memory model="dimm" discard="no">
  <source>
    <pagesize unit="KiB">4</pagesize>
  </source>
  <target>
    <size unit="KiB">131072</size>
    <node>0</node>
  </target>
  <alias name="dimm1"/>
  <address type="dimm" slot="1" base="0x108000000"/>
</memory>
<memory model="dimm" discard="no">
  <source>
    <pagesize unit="KiB">4</pagesize>
  </source>
  <target>
    <size unit="KiB">131072</size>
    <node>0</node>
  </target>
  <alias name="dimm2"/>
  <address type="dimm" slot="2" base="0x110000000"/>
</memory>
<memory model="virtio-mem">
  <target>
    <size unit="KiB">2097152</size>
    <node>0</node>
    <block unit="KiB">2048</block>
    <requested unit="KiB">1048576</requested>
    <current unit="KiB">1048576</current>
    <address base="0x120000000"/>
  </target>
  <alias name="virtiomem0"/>
  <address type="pci" domain="0x0000" bus="0x07" slot="0x00" function="0x0"/>
</memory>
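
Note that dimm3 is no longer present after the restore, and the virtio-mem device kept its <address base="0x120000000"/> across save and restore.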

Comment 10 liang cong 2023-08-07 03:23:01 UTC
Marking it verified per comment 7.

Comment 12 errata-xmlrpc 2023-11-07 08:31:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: libvirt security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6409