Bug 2224472

Summary: [RFE] Request to expose pcie-root-port's mem-reserve option
Product: Red Hat Enterprise Linux 9
Reporter: Yanghang Liu <yanghliu>
Component: libvirt
Assignee: Virtualization Maintenance <virt-maint>
libvirt sub component: General
QA Contact: Meina Li <meili>
Status: NEW
Severity: medium
Priority: unspecified
CC: chayang, imammedo, kraxel, lmen, meili, mprivozn, virt-maint, xuwei, yalzhang, yanqzhan, yuma
Version: 9.3
Keywords: FutureFeature
Target Milestone: rc
Flags: mprivozn: needinfo? (kraxel), mprivozn: needinfo? (imammedo)
Target Release: ---
Hardware: All
OS: Linux
Doc Type: If docs needed, set a value
Type: Feature Request

Description Yanghang Liu 2023-07-21 03:53:10 UTC
Description of problem:
Currently we need to set the pcie-root-port's mem-reserve property if we want to hot-plug a PF/VF with a non-prefetchable memory BAR larger than 2M into the domain.

Considering that hot-plugging a PF/VF into a domain is a basic use case, we expect the pcie-root-port's mem-reserve property to be exposed in upper-layer products.

How reproducible:
100%

Steps to Reproduce:
(1) Choose a PF with a non-prefetchable memory BAR larger than 2M for testing

# lspci -v -s 0000:1a:00.1
1a:00.1 Ethernet controller: Solarflare Communications SFC9220 10/40G Ethernet Controller (rev 02)
        Subsystem: Solarflare Communications SFN8522-R2 8000 Series 10G Adapter
        Flags: bus master, fast devsel, latency 0, IRQ 363, NUMA node 0, IOMMU group 34
        I/O ports at 4000 [size=256]
        Memory at 9d800000 (64-bit, non-prefetchable) [size=8M]
        Memory at a6800000 (64-bit, non-prefetchable) [size=16K]
        Expansion ROM at a6a80000 [disabled] [size=256K]
        Capabilities: [40] Power Management version 3
        Capabilities: [70] Express Endpoint, MSI 00
        Capabilities: [b0] MSI-X: Enable+ Count=32 Masked-
        Capabilities: [d0] Vital Product Data
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [148] Device Serial Number 00-0f-53-ff-ff-4d-8c-30
        Capabilities: [158] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [198] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [1d8] Transaction Processing Hints
        Kernel driver in use: sfc
        Kernel modules: sfc
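
The check in step (1) can be scripted. The following is a minimal sketch, assuming the `lspci -v` text has already been captured into a string; the function names and the 2M threshold parameter are hypothetical and taken only from the wording of this report.

```python
import re

# Matches region lines such as:
#   Memory at 9d800000 (64-bit, non-prefetchable) [size=8M]
# Optional markers like [disabled] or [virtual] before [size=...] are skipped.
BAR_RE = re.compile(
    r"Memory at \S+ \(([^)]*)\)(?: \[[a-z]+\])* \[size=(\d+)([KMG]?)\]"
)
UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def non_prefetchable_bars(lspci_text):
    """Return the sizes in bytes of all non-prefetchable memory BARs."""
    sizes = []
    for attrs, number, unit in BAR_RE.findall(lspci_text):
        flags = [a.strip() for a in attrs.split(",")]
        if "non-prefetchable" in flags:
            sizes.append(int(number) * UNITS[unit])
    return sizes


def exceeds_threshold(lspci_text, threshold=2 << 20):
    """True if any non-prefetchable memory BAR is larger than the threshold."""
    return any(size > threshold for size in non_prefetchable_bars(lspci_text))
```

For the SFC9220 output above, this reports one 8M and one 16K non-prefetchable BAR, so the device qualifies for the test.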

(2) Start a Q35 + OVMF domain

# virt-install --machine=q35 --noreboot --name=rhel93 --memory=4096 --vcpus=4 --graphics type=vnc,port=5993,listen=0.0.0.0 --boot=uefi --network bridge=switch,model=virtio,mac=52:54:00:00:93:93 --import --noautoconsole --disk path=/home/images/RHEL93.qcow2,bus=virtio,cache=none,format=qcow2,io=threads,size=20 --osinfo detect=on,require=off

(3) Hot-plug the SFC9220 PF into the domain
# /bin/virsh attach-device rhel93 /tmp/device/0000:1a:00.1.xml
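
The contents of /tmp/device/0000:1a:00.1.xml are not shown in this report. As an assumption, a typical libvirt PCI `<hostdev>` definition for that address could be generated as sketched below; `managed="yes"` and the helper name `hostdev_xml` are illustrative choices, not taken from the report.

```python
import xml.etree.ElementTree as ET


def hostdev_xml(pci_addr):
    """Build a PCI <hostdev> element for an address like 0000:1a:00.1."""
    domain, bus, slot_func = pci_addr.split(":")
    slot, function = slot_func.split(".")
    # managed="yes" is an assumption; the report does not show the real file.
    hostdev = ET.Element("hostdev", mode="subsystem", type="pci", managed="yes")
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(
        source,
        "address",
        domain=f"0x{domain}",
        bus=f"0x{bus}",
        slot=f"0x{slot}",
        function=f"0x{function}",
    )
    return ET.tostring(hostdev, encoding="unicode")


if __name__ == "__main__":
    print(hostdev_xml("0000:1a:00.1"))
```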

(4) Check whether the SFC9220 PF was hot-plugged into the domain

# ifconfig  <--- no output for the SFC9220 PF

# dmesg 
[   38.986998] pci 0000:04:00.0: [1924:0a03] type 00 class 0x020000
[   38.988300] pci 0000:04:00.0: reg 0x10: [io  0x0000-0x00ff]
[   38.989381] pci 0000:04:00.0: reg 0x18: [mem 0x00000000-0x007fffff 64bit]
[   38.990229] pci 0000:04:00.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit]
[   38.990523] pci 0000:04:00.0: reg 0x30: [mem 0x00000000-0x0003ffff pref]
[   38.991142] pci 0000:04:00.0: Max Payload Size set to 128 (was 256, max 1024)
[   38.994702] pci 0000:04:00.0: supports D1 D2
[   39.001323] pci 0000:04:00.0: BAR 2: no space for [mem size 0x00800000 64bit]
[   39.001330] pci 0000:04:00.0: BAR 2: failed to assign [mem size 0x00800000 64bit]
[   39.001335] pci 0000:04:00.0: BAR 6: assigned [mem 0xc2400000-0xc243ffff pref]
[   39.001340] pci 0000:04:00.0: BAR 4: assigned [mem 0xc2440000-0xc2443fff 64bit]
[   39.001808] pci 0000:04:00.0: BAR 0: assigned [io  0x4000-0x40ff]
[   39.073706] Solarflare NET driver
[   39.074273] sfc 0000:04:00.0: Solarflare NIC detected
[   39.081769] sfc 0000:04:00.0: Part Number : SFN8522
[   39.081804] sfc 0000:04:00.0: enabling device (0000 -> 0003)
[   39.084283] sfc 0000:04:00.0: ERROR: No BAR2 mapping from the BIOS. Try pci=realloc on the kernel command line
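
The failure signature in this log can be picked out automatically. This is a small sketch, assuming the guest's dmesg output is available as text and that the kernel message format matches the lines above; `failed_bars` is a hypothetical helper name.

```python
import re

# Matches lines such as:
#   pci 0000:04:00.0: BAR 2: failed to assign [mem size 0x00800000 64bit]
FAIL_RE = re.compile(
    r"pci (\S+): BAR (\d+): failed to assign \[mem size (0x[0-9a-fA-F]+)"
)


def failed_bars(dmesg_text):
    """Return (device, bar_index, size_in_bytes) for each unassigned memory BAR."""
    return [
        (device, int(bar), int(size, 16))
        for device, bar, size in FAIL_RE.findall(dmesg_text)
    ]
```

Applied to the log above, it yields a single entry: BAR 2 of 0000:04:00.0 with a size of 0x00800000 bytes (8M).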

(5) If we set the pcie-root-port's mem-reserve value to 16M, the SFC9220 PF can be hot-plugged into the domain successfully.

The detailed test steps are in https://bugzilla.redhat.com/show_bug.cgi?id=2055123#c31
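
As a rough sanity check on the 16M value, the sketch below estimates a lower bound for the memory window from the non-prefetchable BAR sizes in step (1). It is an assumption-laden estimate, not how QEMU, OVMF, or libvirt choose a value: it assumes each BAR is aligned to its own size, that the window starts on a boundary at least as large as the biggest BAR, and that bridge memory windows have 1 MiB granularity. The name `min_mem_window` is hypothetical.

```python
MIB = 1 << 20


def min_mem_window(bar_sizes, granularity=MIB):
    """Estimate the smallest window (in bytes) that can hold all given BARs."""
    offset = 0
    # Place the largest BARs first so alignment padding stays minimal.
    for size in sorted(bar_sizes, reverse=True):
        offset = (offset + size - 1) // size * size  # align to the BAR's own size
        offset += size
    # Round the total up to the window granularity.
    return (offset + granularity - 1) // granularity * granularity


if __name__ == "__main__":
    # 8M and 16K non-prefetchable BARs of the SFC9220 PF
    print(min_mem_window([8 * MIB, 16 * 1024]) // MIB, "MiB")
```

For the 8M and 16K BARs this gives 9 MiB, so the 16M used in this report leaves headroom above that estimate.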



Actual results:
The pcie-root-port's mem-reserve option cannot be set via the libvirt domain XML

Expected results:
We can set the pcie-root-port's mem-reserve option via the libvirt domain XML



Additional info:
(1) This bug was opened based on https://bugzilla.redhat.com/show_bug.cgi?id=2055123#c32


(2) The related qemu-kvm command line looks like:
-device '{"driver":"pcie-root-port","port":16,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x2"}' \
...
-device '{"driver":"pcie-root-port","port":20,"chassis":5,"id":"pci.5","mem-reserve":16777216,"bus":"pcie.0","addr":"0x2.0x4"}' \
-device '{"driver":"pcie-root-port","port":21,"chassis":6,"id":"pci.6","mem-reserve":16777216,"bus":"pcie.0","addr":"0x2.0x5"}' \
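
For reference, the mem-reserve value 16777216 in these arguments is 16M expressed in bytes. The sketch below rebuilds the JSON for the `pci.5` root port from its properties; `root_port_arg` is a hypothetical helper for illustration only and is not part of libvirt or QEMU.

```python
import json


def root_port_arg(port, chassis, dev_id, addr, mem_reserve=None):
    """Return the JSON value for a pcie-root-port -device argument."""
    props = {
        "driver": "pcie-root-port",
        "port": port,
        "chassis": chassis,
        "id": dev_id,
    }
    if mem_reserve is not None:
        props["mem-reserve"] = mem_reserve  # size in bytes
    props["bus"] = "pcie.0"
    props["addr"] = addr
    return json.dumps(props, separators=(",", ":"))


if __name__ == "__main__":
    print("-device '%s'" % root_port_arg(20, 5, "pci.5", "0x2.0x4", 16 * 1024 * 1024))
```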

Comment 2 Michal Privoznik 2023-07-25 08:05:21 UTC
Adding the knob should be fairly trivial. But what I am wondering about is how users know the value to pass, or - whether there is an automated way for libvirt to select the correct value automagically, without user intervention (which then might create a problem on its own, e.g. during migration because I bet this knob is part of guest ABI). Gerd? Igor?