Bug 2158701

Summary: Hotplugged dimm device has wrong alias name in some specific scenario
Product: Red Hat Enterprise Linux 9
Reporter: Fangge Jin <fjin>
Component: libvirt
Assignee: Peter Krempa <pkrempa>
libvirt sub component: General
QA Contact: liang cong <lcong>
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
CC: dzheng, jdenemar, jsuchane, lcong, lmen, nanli, virt-maint
Version: 9.2
Keywords: Triaged
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libvirt-9.0.0-2.el9
Last Closed: 2023-05-09 07:27:43 UTC
Type: Bug
Attachments: virtqemud and qemu log

Description Fangge Jin 2023-01-06 07:28:42 UTC
Created attachment 1936113 [details]
virtqemud and qemu log

Description of problem:
Start a guest with a dimm device (alias=dimm0) and an nvdimm device (alias=nvdimm1), restart virtqemud, then hotplug another dimm device. The hotplugged dimm device gets the wrong alias, dimm1 (the correct alias is dimm2). A subsequent live migration (or save & restore) then fails with this error from qemu:
Unknown ramblock "memdimm1", cannot accept migration

If virtqemud is not restarted, the hotplugged dimm device gets the correct alias (dimm2) and live migration succeeds.
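
The alias matters for migration because libvirt derives the memory backend object id, and therefore the QEMU ramblock name, from it (alias dimm1 -> memdev /objects/memdimm1); the upstream commit referenced below notes that this naming is part of the migration stream and must be treated as ABI. A quick way to see the mapping on a running guest (guest name as used in the steps below; assumes jq is installed):

# virsh qemu-monitor-command rhel9.0.0-full '{"execute":"query-memory-devices"}' | jq -r '.return[].data | "\(.id) -> \(.memdev)"'

On an affected guest this prints "dimm1 -> /objects/memdimm1" for the hotplugged device, while the pre-existing numbering would require dimm2/memdimm2.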

Version-Release number of selected component:
libvirt-8.10.0-2.el9.x86_64

How reproducible:
100% 

Steps to Reproduce:
1. Start a guest with a dimm device (alias=dimm0) and an nvdimm device (alias=nvdimm1):
<memory model="dimm" access="private" discard="no">
  <source>
    <nodemask>0</nodemask>
    <pagesize unit="KiB">2048</pagesize>
  </source>
  <target>
    <size unit="KiB">262144</size>
    <node>0</node>
  </target>
  <alias name="dimm0"/>
  <address type="dimm" slot="0" base="0x100000000"/>
</memory>
<memory model="nvdimm">
  <source>
    <path>/tmp/nvdimm</path>
  </source>
  <target>
    <size unit="KiB">524288</size>
    <node>1</node>
    <label>
      <size unit="KiB">128</size>
    </label>
  </target>
  <alias name="nvdimm1"/>
  <address type="dimm" slot="1" base="0x110000000"/>
</memory>

2. Restart virtqemud
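For example (the same command used in the verification steps below):
# systemctl restart virtqemud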

3. Hotplug another dimm device to guest.
# cat dimm.xml
<memory model="dimm" access="private" discard="no">
  <source>
    <nodemask>0</nodemask>
    <pagesize unit="KiB">2048</pagesize>
  </source>
  <target>
    <size unit="KiB">262144</size>
    <node>1</node>
  </target>
</memory>

# virsh attach-device rhel9.0.0-full dimm.xml
Device attached successfully

# virsh dumpxml rhel9.0.0-full --xpath //memory
...skip....
<memory model="dimm" access="private" discard="no">
  <source>
    <nodemask>0</nodemask>
    <pagesize unit="KiB">2048</pagesize>
  </source>
  <target>
    <size unit="KiB">262144</size>
    <node>1</node>
  </target>
  <alias name="dimm1"/>
  <address type="dimm" slot="2" base="0x130000000"/>
</memory>

4. Query memory devices at the qemu level:
# virsh qemu-monitor-command rhel9.0.0-full '{"execute":"query-memory-devices"}'
{"return":[{"type":"dimm","data":{"memdev":"/objects/memdimm0","hotplugged":false,"addr":4294967296,"hotpluggable":true,"size":268435456,"slot":0,"node":0,"id":"dimm0"}},{"type":"nvdimm","data":{"memdev":"/objects/memnvdimm1","hotplugged":false,"addr":4563402752,"hotpluggable":true,"size":536739840,"slot":1,"node":1,"id":"nvdimm1"}},{"type":"dimm","data":{"memdev":"/objects/memdimm1","hotplugged":true,"addr":5100273664,"hotpluggable":true,"size":268435456,"slot":2,"node":1,"id":"dimm1"}}],"id":"libvirt-19"}

5. Try to do live migration (or save & restore):
# virsh migrate rhel9.0.0-full qemu+tcp://dell-per750-04.lab.eng.pek2.redhat.com/system --live --p2p --undefinesource --persistent
error: operation failed: job 'migration out' failed: Unable to write to socket: Bad file descriptor

6. Check qemu log on destination host:
2023-01-04T08:36:48.520108Z qemu-kvm: Unknown ramblock "memdimm1", cannot accept migration
2023-01-04T08:36:48.520182Z qemu-kvm: error while loading state for instance 0x0 of device 'ram'
2023-01-04T08:36:48.520499Z qemu-kvm: load of migration failed: Invalid argument
2023-01-04 08:36:48.922+0000: shutting down, reason=crashed
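
The destination-side messages above come from the per-domain QEMU log, which on a default libvirt installation lives under /var/log/libvirt/qemu/<guest>.log, so a quick check on the destination host would be something like:

# grep -i ramblock /var/log/libvirt/qemu/rhel9.0.0-full.log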

Actual results:
As above

Expected results:
The hotplugged dimm device has the correct alias (dimm2), and live migration succeeds.

Additional info:

Comment 1 Peter Krempa 2023-01-24 12:41:34 UTC
Fixed upstream by:

commit 5764930463eb8f450e45fa982651ef6b7a7afd7c
Author: Peter Krempa <pkrempa>
Date:   Thu Jan 19 15:18:45 2023 +0100

    qemu: Remove 'memAliasOrderMismatch' field from VM private data
    
    The field is no longer used so we can remove it and the code filling it.
    
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Martin Kletzander <mkletzan>

commit 6d3f0b11b2b056313b123510c96f2924689341f9
Author: Peter Krempa <pkrempa>
Date:   Thu Jan 19 15:16:58 2023 +0100

    qemu: alias: Remove 'oldAlias' argument of qemuAssignDeviceMemoryAlias
    
    All callers pass 'false' so we no longer need it.
    
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Martin Kletzander <mkletzan>

commit 50ce3463d514950350143f03e8421c8c31889c5d
Author: Peter Krempa <pkrempa>
Date:   Thu Jan 19 15:06:11 2023 +0100

    qemu: hotplug: Remove legacy quirk for 'dimm' address generation
    
    Commit b7798a07f93 (in fall of 2016) changed the way we generate aliases
    for 'dimm' memory devices as the alias itself is part of the migration
    stream section naming and thus must be treated as ABI.
    
    The code added compatibility layer for VMs with memory hotplug started
    with the old scheme to prevent from generating wrong aliases. The
    compatibility layer broke though later when 'nvdimm' and 'pmem' devices
    were introduced as it wrongly detected them as old configuration.
    
    Now rather than attempting to fix the legacy compat layer to treat other
    devices properly we'll be better off simply removing it as it's
    extremely unlikely that somebody has a VM started in 2016 running with
    today's libvirt and attempts to hotplug more memory.
    
    This fixes a corner case when a user hot-adds a 'dimm' into a VM with a
    'dimm' and a 'nvdimm' after restart of libvirtd and then attempts to
    migrate the VM.
    
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2158701
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Martin Kletzander <mkletzan>
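
To check whether an installed build already contains the fix, compare the installed package with the Fixed In Version noted above (libvirt-9.0.0-2.el9 or newer), e.g.:

# rpm -q libvirt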

Comment 3 liang cong 2023-01-28 11:03:27 UTC
Preverified on upstream build libvirt v9.0.0-118-g9f8fba7501

Test steps:
1. Start a guest with a dimm device (alias=dimm0) and an nvdimm device (alias=nvdimm1):
<memory model="dimm" access="private" discard="no">
  <source>
    <nodemask>0-1</nodemask>
    <pagesize unit="KiB">2048</pagesize>
  </source>
  <target>
    <size unit="KiB">262144</size>
    <node>0</node>
  </target>
  <alias name="dimm0"/>
  <address type="dimm" slot="0" base="0x100000000"/>
</memory>
<memory model="nvdimm">
  <source>
    <path>/tmp/nvdimm</path>
  </source>
  <target>
    <size unit="KiB">524288</size>
    <node>1</node>
    <label>
      <size unit="KiB">128</size>
    </label>
  </target>
  <alias name="nvdimm1"/>
  <address type="dimm" slot="1" base="0x110000000"/>
</memory>

2. Restart virtqemud
# systemctl restart virtqemud

3. Hotplug another dimm device to guest.
# cat dimm.xml
<memory model="dimm" access="private" discard="no">
  <source>
    <nodemask>0-1</nodemask>
    <pagesize unit="KiB">2048</pagesize>
  </source>
  <target>
    <size unit="KiB">262144</size>
    <node>1</node>
  </target>
</memory>

# virsh attach-device vm1 dimm.xml 
Device attached successfully

4. Check the memory config

# virsh dumpxml vm1 --xpath 'devices//memory'
<memory model="dimm" access="private" discard="no">
  <source>
    <nodemask>0-1</nodemask>
    <pagesize unit="KiB">2048</pagesize>
  </source>
  <target>
    <size unit="KiB">262144</size>
    <node>0</node>
  </target>
  <alias name="dimm0"/>
  <address type="dimm" slot="0" base="0x100000000"/>
</memory>
<memory model="nvdimm">
  <source>
    <path>/tmp/nvdimm</path>
  </source>
  <target>
    <size unit="KiB">524288</size>
    <node>1</node>
    <label>
      <size unit="KiB">128</size>
    </label>
  </target>
  <alias name="nvdimm1"/>
  <address type="dimm" slot="1" base="0x110000000"/>
</memory>
<memory model="dimm" access="private" discard="no">
  <source>
    <nodemask>0-1</nodemask>
    <pagesize unit="KiB">2048</pagesize>
  </source>
  <target>
    <size unit="KiB">262144</size>
    <node>1</node>
  </target>
  <alias name="dimm2"/>
  <address type="dimm" slot="2" base="0x130000000"/>
</memory>


5. Query memory devices at the qemu level:
# virsh qemu-monitor-command vm1 '{"execute":"query-memory-devices"}'
{"return":[{"type":"dimm","data":{"memdev":"/objects/memdimm0","hotplugged":false,"addr":4294967296,"hotpluggable":true,"size":268435456,"slot":0,"node":0,"id":"dimm0"}},{"type":"nvdimm","data":{"memdev":"/objects/memnvdimm1","hotplugged":false,"addr":4563402752,"hotpluggable":true,"size":536739840,"slot":1,"node":1,"id":"nvdimm1"}},{"type":"dimm","data":{"memdev":"/objects/memdimm2","hotplugged":true,"addr":5100273664,"hotpluggable":true,"size":268435456,"slot":2,"node":1,"id":"dimm2"}}],"id":"libvirt-18"}

6. Do virsh save & restore
# virsh save vm1 /tmp/save

Domain 'vm1' saved to /tmp/save

# virsh restore /tmp/save 
Domain restored from /tmp/save


7. Do live migration
# virsh migrate vm1 qemu+tcp://dell-per740xd-14.lab.eng.pek2.redhat.com/system --live --p2p --undefinesource --persistent

Comment 6 liang cong 2023-02-02 10:23:03 UTC
Verified on upstream build libvirt v9.0.0-118-g9f8fba7501

Test steps:
1. Start a guest with a dimm device (alias=dimm0) and an nvdimm device (alias=nvdimm1):
<memory model="dimm" access="private" discard="no">
  <source>
    <nodemask>0-1</nodemask>
    <pagesize unit="KiB">2048</pagesize>
  </source>
  <target>
    <size unit="KiB">262144</size>
    <node>0</node>
  </target>
  <alias name="dimm0"/>
  <address type="dimm" slot="0" base="0x100000000"/>
</memory>
<memory model="nvdimm">
  <source>
    <path>/tmp/nvdimm</path>
  </source>
  <target>
    <size unit="KiB">524288</size>
    <node>1</node>
    <label>
      <size unit="KiB">128</size>
    </label>
  </target>
  <alias name="nvdimm1"/>
  <address type="dimm" slot="1" base="0x110000000"/>
</memory>

2. Restart virtqemud
# systemctl restart virtqemud

3. Hotplug another dimm device to guest.
# cat dimm.xml
<memory model="dimm" access="private" discard="no">
  <source>
    <nodemask>0-1</nodemask>
    <pagesize unit="KiB">2048</pagesize>
  </source>
  <target>
    <size unit="KiB">262144</size>
    <node>1</node>
  </target>
</memory>

# virsh attach-device vm1 dimm.xml 
Device attached successfully

4. Check the memory config

# virsh dumpxml vm1  --xpath 'devices//memory'
<memory model="dimm" access="private" discard="no">
  <source>
    <nodemask>0-1</nodemask>
    <pagesize unit="KiB">2048</pagesize>
  </source>
  <target>
    <size unit="KiB">262144</size>
    <node>0</node>
  </target>
  <alias name="dimm0"/>
  <address type="dimm" slot="0" base="0x100000000"/>
</memory>
<memory model="nvdimm">
  <source>
    <path>/tmp/nvdimm</path>
  </source>
  <target>
    <size unit="KiB">524288</size>
    <node>1</node>
    <label>
      <size unit="KiB">128</size>
    </label>
  </target>
  <alias name="nvdimm1"/>
  <address type="dimm" slot="1" base="0x110000000"/>
</memory>
<memory model="dimm" access="private" discard="no">
  <source>
    <nodemask>0-1</nodemask>
    <pagesize unit="KiB">2048</pagesize>
  </source>
  <target>
    <size unit="KiB">262144</size>
    <node>1</node>
  </target>
  <alias name="dimm2"/>
  <address type="dimm" slot="2" base="0x130000000"/>
</memory>


5. Query memory devices at the qemu level:
# virsh qemu-monitor-command vm1 '{"execute":"query-memory-devices"}'
{"return":[{"type":"dimm","data":{"memdev":"/objects/memdimm0","hotplugged":false,"addr":4294967296,"hotpluggable":true,"size":268435456,"slot":0,"node":0,"id":"dimm0"}},{"type":"nvdimm","data":{"memdev":"/objects/memnvdimm1","hotplugged":false,"addr":4563402752,"hotpluggable":true,"size":536739840,"slot":1,"node":1,"id":"nvdimm1"}},{"type":"dimm","data":{"memdev":"/objects/memdimm2","hotplugged":true,"addr":5100273664,"hotpluggable":true,"size":268435456,"slot":2,"node":1,"id":"dimm2"}}],"id":"libvirt-18"}

6. Do virsh save & restore
# virsh save vm1 /tmp/save

Domain 'vm1' saved to /tmp/save

# virsh restore /tmp/save 
Domain restored from /tmp/save


7. Do live migration
# virsh migrate vm1 qemu+tcp://dell-per740xd-14.lab.eng.pek2.redhat.com/system --live --p2p --undefinesource --persistent

Comment 7 liang cong 2023-02-02 10:24:22 UTC
Verified on libvirt-9.0.0-3.el9.x86_64 with the same steps as in comment 6.

Comment 9 errata-xmlrpc 2023-05-09 07:27:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (libvirt bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2171