Bug 2050702

Summary: Libvirt can't start a guest if virtio-mem/virtio-pmem is on PCI bus != 0
Product: Red Hat Enterprise Linux 8
Component: libvirt
Version: 8.6
Reporter: Michal Privoznik <mprivozn>
Assignee: Michal Privoznik <mprivozn>
QA Contact: Jing Qi <jinqi>
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
CC: dhildenb, jdenemar, jinqi, jsuchane, lcheng, lmen, mprivozn, pkrempa, virt-maint, xuzhang, yanghliu
Target Milestone: rc
Target Release: ---
Keywords: Triaged, Upstream
Hardware: x86_64
OS: Unspecified
Fixed In Version: libvirt-8.0.0-4.module+el8.6.0+14186+211b270d
Clone Of: 2047271
Target Upstream Version: 8.1.0
Type: Bug
Last Closed: 2022-05-10 13:25:26 UTC

Description Michal Privoznik 2022-02-04 13:40:21 UTC
+++ This bug was initially created as a clone of Bug #2047271 +++

+++ This bug was initially created as a clone of Bug #2014487 +++

If a virtio-mem or virtio-pmem memory device is placed on a PCI bus other than the default pci.0, then starting such a guest results in a QEMU error:

error: internal error: qemu unexpectedly closed the monitor: 2022-01-27T13:44:29.462369Z qemu-system-x86_64: -device {"driver":"virtio-pmem-pci","memdev":"memvirtiopmem0","id":"virtiopmem0","bus":"pci.1","addr":"0xa"}: Bus 'pci.1' not found

Steps to reproduce:
1) add a virtio-mem/virtio-pmem device to the domain XML so that <address bus='0x1'/>
2) start the guest
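As an illustration, a minimal offending device definition could look like the following (the sizes and slot number are examples, not taken from the original report):

```xml
<memory model='virtio-mem'>
  <target>
    <size unit='KiB'>4194304</size>
    <node>0</node>
    <block unit='KiB'>2048</block>
  </target>
  <!-- bus 0x01 (a pci-bridge/non-root bus, not the default pci.0) triggers the error -->
  <address type='pci' domain='0x0000' bus='0x01' slot='0x0a' function='0x0'/>
</memory>
```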

--- Additional comment from Michal Privoznik on 2022-01-27 11:06:07 CET ---

(In reply to Jing Qi from comment #6)
> Verified with libvirt-8.0.0-1.el9.x86_64 & qemu-kvm-6.2.0-4.el9.x86_64 &
> kernel version 5.14.0-47.el9.x86_64 -

> So, can you please help to confirm if the attach virtio-mem device works as
> expected?

Yeah, the failure is not expected. But it looks like a command line argument ordering problem. I mean, when I configure virtio-mem to be on bus='0x01', the following command line is generated:

qemu-system-x86_64
-name guest=gentoo,debug-threads=on
-S
-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-1-gentoo/master-key.aes"}'
-machine pc-i440fx-7.0,usb=off,dump-guest-core=off
...
-object '{"qom-type":"memory-backend-file","id":"memua-virtiomem","mem-path":"/hugepages2M/libvirt/qemu/1-gentoo","reserve":false,"size":4294967296}'
-device '{"driver":"virtio-mem-pci","node":0,"block-size":2097152,"memdev":"memua-virtiomem","prealloc":true,"id":"ua-virtiomem","bus":"pci.0","addr":"0x6"}'
-object '{"qom-type":"memory-backend-ram","id":"memua-virtiomem2","reserve":false,"size":4294967296}'
-device '{"driver":"virtio-mem-pci","node":0,"block-size":2097152,"memdev":"memua-virtiomem2","id":"ua-virtiomem2","bus":"pci.1","addr":"0x9"}'
...
-device '{"driver":"pci-bridge","chassis_nr":1,"id":"pci.1","bus":"pci.0","addr":"0x9"}'
-device '{"driver":"piix3-usb-uhci","id":"usb","bus":"pci.0","addr":"0x1.0x2"}'
-device '{"driver":"lsi","id":"scsi0","bus":"pci.0","addr":"0x5"}'
-device '{"driver":"virtio-serial-pci","id":"virtio-serial0","bus":"pci.0","addr":"0x7"}'

Therefore, when QEMU starts up and sees the first virtio-mem-pci device ("id":"ua-virtiomem"), it just creates it and continues to the next one (ua-virtiomem2), which references the "pci.1" bus. That bus does not exist at that point yet; it is created (well, would be) a few arguments later. Let me see if a simple reorder fixes the problem (and think through all the implications).
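With the reorder that was eventually merged (af23241cfe, "qemu_command: Generate memory only after controllers"), the bridge is emitted before any memory device that sits on it, i.e. the relevant part of the command line becomes (abbreviated from the lines above):

```
-device '{"driver":"pci-bridge","chassis_nr":1,"id":"pci.1","bus":"pci.0","addr":"0x9"}'
...
-object '{"qom-type":"memory-backend-ram","id":"memua-virtiomem2","reserve":false,"size":4294967296}'
-device '{"driver":"virtio-mem-pci","node":0,"block-size":2097152,"memdev":"memua-virtiomem2","id":"ua-virtiomem2","bus":"pci.1","addr":"0x9"}'
```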

--- Additional comment from Michal Privoznik on 2022-01-27 14:48:26 CET ---

Patch posted on the list:

https://listman.redhat.com/archives/libvir-list/2022-January/msg01234.html

--- Additional comment from Jing Qi on 2022-01-29 03:02:34 CET ---

Michal, can you please help confirm whether the migration issue is also fixed by the above patch? Thanks

<memory model='virtio-mem'>
  <source>
    <pagesize unit='KiB'>2048</pagesize>
  </source>
  <target>
    <size unit='KiB'>131072</size>
    <node>0</node>
    <block unit='KiB'>2048</block>
    <requested unit='KiB'>131072</requested>
  </target>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
</memory>

virsh migrate rhel9  qemu+ssh://dell-per740xd-27.lab.eng.pek2.redhat.com/system --live --

error: internal error: qemu unexpectedly closed the monitor: 2022-01-29T01:57:12.336180Z qemu-kvm: -device virtio-mem-pci,node=0,block-size=2097152,requested-size=134217728,memdev=memvirtiomem0,id=virtiomem0,bus=pcie.0,addr=0x1: 'virtio-mem-pci' is not a valid device model name

--- Additional comment from Jing Qi on 2022-01-29 05:30:49 CET ---

More info about the above comment: the VM is migrated from RHEL 9 to RHEL 8.6, but RHEL 8.6 still doesn't support virtio-mem. The error message could be enhanced.
For migrating a VM with virtio-mem from RHEL 9 to RHEL 9, I filed a new bug 2048022.

--- Additional comment from Michal Privoznik on 2022-01-31 09:38:39 CET ---

(In reply to Jing Qi from comment #2)
>
> error: internal error: qemu unexpectedly closed the monitor:
> 2022-01-29T01:57:12.336180Z qemu-kvm: -device
> virtio-mem-pci,node=0,block-size=2097152,requested-size=134217728,
> memdev=memvirtiomem0,id=virtiomem0,bus=pcie.0,addr=0x1: 'virtio-mem-pci' is
> not a valid device model name

Huh, so this indeed is a problem, but again not specific to virtio-mem. It only manifests via virtio-mem because that's one of the few differences between the RHEL 9 and RHEL 8.6 QEMUs. But in general, XMLs used in migration or save/restore of a domain are not validated. Let me open a new bug for it.
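In the meantime, the domain XML can be checked by hand before migrating. A sketch, assuming the libvirt client tools are installed and a hypothetical dumped domain XML at /tmp/dom.xml; note this only catches schema errors, not capability gaps on the target host such as a missing device model:

```shell
# Validate /tmp/dom.xml against libvirt's domain RNG schema.
# (Schema validation only; it cannot know which device models
# the destination QEMU actually supports.)
virt-xml-validate /tmp/dom.xml domain
```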

--- Additional comment from Michal Privoznik on 2022-02-02 14:27:15 CET ---

Merged upstream as:

af23241cfe qemu_command: Generate memory only after controllers

v8.0.0-260-gaf23241cfe

--- Additional comment from Michal Privoznik on 2022-02-04 14:19:32 CET ---

To POST:

https://gitlab.com/redhat/rhel/src/libvirt/-/merge_requests/9

Comment 4 Jing Qi 2022-02-14 02:36:23 UTC
Verified with libvirt-daemon-8.0.0-4.module+el8.6.0+14186+211b270d.x86_64 & qemu-kvm-6.2.0-5.module+el8.6.0+14025+ca131e0a.x86_64.

Because virtio-mem is not supported in RHEL 8.6, only the QEMU command line can be checked.


Add two controllers in the domain XML:
<controller type='pci' index='9' model='pcie-switch-upstream-port'>
  <model name='x3130-upstream'/>
  <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
</controller>
<controller type='pci' index='10' model='pcie-switch-downstream-port'>
  <model name='xio3130-downstream'/>
  <target chassis='10' port='0x0'/>
  <address type='pci' domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
</controller>

Start a VM with a dimm device:


<memory model='dimm' access='private' discard='yes'>
  <target>
    <size unit='KiB'>524287</size>
    <node>0</node>
  </target>
  <address type='dimm' slot='0'/>
</memory>

Check the QEMU command line: the memory dimm device is generated after the controllers (pci.x, x > 1):

-device x3130-upstream,id=pci.9,bus=pci.7,addr=0x0 \
-device xio3130-downstream,port=0,chassis=10,id=pci.10,bus=pci.9,addr=0x0 \
-device pcie-root-port,port=23,chassis=8,id=pci.8,bus=pcie.0,addr=0x2.0x7 \
-device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.2,addr=0x0 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.3,addr=0x0 \
-object '{"qom-type":"memory-backend-file","id":"memdimm0","mem-path":"/var/lib/libvirt/qemu/ram/3-avocado-vt-vm1/dimm0","discard-data":true,"share":false,"size":536870912}' \
-device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0 \

Comment 6 errata-xmlrpc 2022-05-10 13:25:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: virt:rhel and virt-devel:rhel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1759