Bug 2174749 - [edk2] re-enable dynamic mmio window
Summary: [edk2] re-enable dynamic mmio window
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: edk2
Version: 9.2
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Gerd Hoffmann
QA Contact: Xueqiang Wei
URL:
Whiteboard:
Depends On: 2171860
Blocks: 2055123 2209005 2209571
Reported: 2023-03-02 11:24 UTC by Gerd Hoffmann
Modified: 2023-11-07 09:05 UTC (History)
CC List: 22 users

Fixed In Version: edk2-20230524-2.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2174605
Environment:
Last Closed: 2023-11-07 08:24:29 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Gitlab | redhat/centos-stream/src edk2 merge_requests 41 | 0 | None | opened | enable dynamic mmio window | 2023-06-29 15:40:09 UTC
Red Hat Issue Tracker | RHELPLAN-150409 | 0 | None | None | None | 2023-03-02 11:24:26 UTC
Red Hat Product Errata | RHSA-2023:6330 | 0 | None | None | None | 2023-11-07 08:25:07 UTC

Description Gerd Hoffmann 2023-03-02 11:24:03 UTC
+++ This bug was initially created as a clone of Bug #2174605 +++

This bug was initially created as a copy of Bug #2171860

I am copying this bug because: 



Description of problem:
vm migration failed with "failed to set MSR 0x202 to 0x380000000000"

Version-Release number of selected component (if applicable):
source and target host:
libvirt-9.0.0-6.el9.x86_64
qemu-kvm-7.2.0-9.el9.x86_64
kernel-5.14.0-268.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare a source host with CPU model "Cascadelake-Server-noTSX" and a target host with CPU model "Skylake-Client-noTSX-IBRS".
Mount NFS on both the source and the target host at /var/lib/libvirt/migrate/.
On the source host, run:
# virsh domcapabilities > /var/lib/libvirt/migrate/cpu.xml
On the target host, run:
# virsh domcapabilities >> /var/lib/libvirt/migrate/cpu.xml

On the source host, generate the baseline cpu by:
# virsh hypervisor-cpu-baseline /var/lib/libvirt/migrate/cpu.xml --migratable
<cpu mode='custom' match='exact'>
  <model fallback='forbid'>Skylake-Client-IBRS</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='vmx'/>
  <feature policy='require' name='pdcm'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='tsc_adjust'/>
  <feature policy='require' name='clflushopt'/>
  <feature policy='require' name='umip'/>
  <feature policy='require' name='md-clear'/>
  <feature policy='require' name='stibp'/>
  <feature policy='require' name='arch-capabilities'/>
  <feature policy='require' name='ssbd'/>
  <feature policy='require' name='xsaves'/>
  <feature policy='require' name='pdpe1gb'/>
  <feature policy='require' name='ibpb'/>
  <feature policy='require' name='ibrs'/>
  <feature policy='require' name='amd-stibp'/>
  <feature policy='require' name='amd-ssbd'/>
  <feature policy='require' name='skip-l1dfl-vmentry'/>
  <feature policy='require' name='pschange-mc-no'/>
  <feature policy='disable' name='hle'/>
  <feature policy='disable' name='rtm'/>
</cpu>

2. Start a VM on the source host with the CPU configuration above, and try to migrate the VM to the target host:
# virsh migrate rhel --live --verbose qemu+ssh://${target_host}/system --p2p --persistent --undefinesource
Migration: [100 %]error: operation failed: job 'migration out' unexpectedly failed

Check the libvirtd log on the target host:
2023-02-18 10:15:47.792+0000: 7216: error : qemuProcessReportLogError:1971 : internal error: qemu unexpectedly closed the monitor: 2023-02-18T10:15:47.735537Z qemu-kvm: warning: TSC frequency mismatch between VM (2194843 kHz) and host (2903990 kHz), and TSC scaling unavailable
2023-02-18T10:15:47.735651Z qemu-kvm: error: failed to set MSR 0x202 to 0x380000000000
qemu-kvm: ../target/i386/kvm/kvm.c:3177: int kvm_buf_set_msrs(X86CPU *): Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.

Actual results:
VM migration failed with baseline cpu

Expected results:
VM migration should succeed

Additional info:

--- Additional comment from Gerd Hoffmann on 2023-03-02 12:22:51 CET ---

Problem: recent OVMF versions started using the full physical address space that is available.
See https://bugzilla.redhat.com//show_bug.cgi?id=2084533
and https://issues.redhat.com/browse/RHEL-60

libvirt host capabilities (for live migration compatibility)
do not include the physical address space size though, so this
causes problems in heterogeneous clusters.

PLAN: disable for 9.2, enable again for 9.3 (and eventually 9.2.z),
after libvirt has been fixed.
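
A rough illustration of why the wider address space matters for migration (a sketch, not taken from QEMU or KVM code). In the Intel SDM numbering, MSR 0x202 is IA32_MTRR_PHYSBASE1, so the value in the failing write is an MTRR base address. The sketch assumes the write fails because that base does not fit into the physical address width of the destination; the helper names are hypothetical.

```python
def phys_bits_needed(msr_value: int) -> int:
    """Number of address bits needed to represent the base address in the value."""
    base = msr_value & ~0xFFF  # bits 0-11 hold the memory type and reserved bits
    return base.bit_length()


def fits(msr_value: int, phys_bits: int) -> bool:
    """True if the base address fits into `phys_bits` physical address bits."""
    return phys_bits_needed(msr_value) <= phys_bits


value = 0x380000000000  # from the "failed to set MSR 0x202" message above
print(phys_bits_needed(value))  # 46
print(fits(value, 46))          # wide enough
print(fits(value, 45))          # any narrower host would be too small
```

Under that assumption, a destination reporting fewer than 46 physical address bits cannot accept the MTRR layout the firmware set up on the source.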

Comment 4 Xueqiang Wei 2023-04-23 07:32:40 UTC
Tested the edk2 test loop with the scratch build on AMD and Intel hosts; no new bug was found.

Versions:
kernel-5.14.0-299.el9.x86_64
qemu-kvm-8.0.0-1.el9
edk2-ovmf-20230301gitf80f052277c8-2.el9.bz2174749.20230418.1152.noarch
Guest: rhel9.3, win11
 

Job link:
amd host: http://virtqetools.lab.eng.pek2.redhat.com/kvm_autotest_job_log/?jobid=7775914
          Existing bug: Bug 2168446 - Booting VM failed on AMD EPYC 7252 host with npt=0
intel host: http://virtqetools.lab.eng.pek2.redhat.com/kvm_autotest_job_log/?jobid=7775924

Comment 6 Li Xiaohui 2023-05-09 10:14:59 UTC
Hi, I'm trying to migrate a RHEL 9.3 OVMF guest from the source host (Xeon(R) Silver 4110) to the destination host (Xeon(R) CPU E3-1240 v5) with edk2-ovmf-20230301gitf80f052277c8-2.el9.bz2174749.20230418.1152.noarch.

Address sizes: the source reports 46 bits physical, the destination 39 bits physical.

When the guest is booted without host-phys-bits-limit=39, the destination qemu dumps core after migration completes:
qemu-kvm: error: failed to set MSR 0x202 to 0xe000000000
qemu-kvm: ../target/i386/kvm/kvm.c:3177: int kvm_buf_set_msrs(X86CPU *): Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.

With host-phys-bits-limit=39, migration succeeds and the VM works well after migration.


Are these test results what we expect?
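
For reference, a minimal sketch of how the limit value could be derived for a set of hosts, assuming each host exposes its width in the "address sizes" line of /proc/cpuinfo. The function names are hypothetical, and the sample strings only reuse the 46 and 39 bit figures from this comment; the "48 bits virtual" part is an assumed placeholder.

```python
import re

# Matches e.g. "address sizes\t: 46 bits physical, 48 bits virtual"
ADDRESS_SIZES = re.compile(r"^address sizes\s*:\s*(\d+) bits physical", re.MULTILINE)


def physical_bits(cpuinfo_text: str) -> int:
    """Extract the physical address width from /proc/cpuinfo-style text."""
    match = ADDRESS_SIZES.search(cpuinfo_text)
    if match is None:
        raise ValueError("no 'address sizes' line found")
    return int(match.group(1))


def common_phys_bits_limit(*host_bits: int) -> int:
    """The smallest width among all hosts a guest may be migrated to."""
    return min(host_bits)


# Sample input; on a real host this would be open("/proc/cpuinfo").read()
source = "address sizes\t: 46 bits physical, 48 bits virtual\n"
destination = "address sizes\t: 39 bits physical, 48 bits virtual\n"

limit = common_phys_bits_limit(physical_bits(source), physical_bits(destination))
print(f"host-phys-bits-limit={limit}")
```

For the two hosts above this yields the same value that was used in the successful test.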

Comment 7 Gerd Hoffmann 2023-05-09 13:42:47 UTC
(In reply to Li Xiaohui from comment #6)
> Hi, I'm trying to migrate RHEL 9.3 OVMF guest from the src host: Xeon(R)
> Silver 4110 to the destination host: Xeon(R) CPU E3-1240 v5 on
> edk2-ovmf-20230301gitf80f052277c8-2.el9.bz2174749.20230418.1152.noarch, 
> 
> the Address sizes of the source is 46 bits physical, the destination is 39
> bits physical
> 
> when boot guest without host-phys-bits-limit=39, then dst qemu would core
> dump after migration completion:
> qemu-kvm: error: failed to set MSR 0x202 to 0xe000000000
> qemu-kvm: ../target/i386/kvm/kvm.c:3177: int kvm_buf_set_msrs(X86CPU *):
> Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
> 
> But with host-phys-bits-limit=39, migration succeeds, and VM works well
> after migration.
> 
> 
> The above test results are our expectation, right?

Yes.

If you check /proc/iomem and /proc/mtrr in the guest (running
on the host with 46 bits physical) you can see the different
address space layouts used with/without host-phys-bits-limit=39.
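
A small sketch of how the two layouts could be compared from /proc/iomem, assuming the usual "start-end : name" line format. The function names are hypothetical and the sample text is illustrative, not captured from one of the test guests. Note that /proc/iomem has to be read as root; unprivileged reads show zeroed addresses.

```python
import re

# One /proc/iomem entry: optional indentation, hex range, resource name
IOMEM_LINE = re.compile(r"^\s*([0-9a-f]+)-([0-9a-f]+) : (.+)$")


def highest_address(iomem_text: str) -> int:
    """Return the highest end address of any resource listed in the text."""
    highest = 0
    for line in iomem_text.splitlines():
        match = IOMEM_LINE.match(line)
        if match:
            highest = max(highest, int(match.group(2), 16))
    return highest


def bits_in_use(iomem_text: str) -> int:
    """Number of address bits needed to reach the highest listed address."""
    return highest_address(iomem_text).bit_length()


# Illustrative sample; on a real guest use open("/proc/iomem").read() as root
SAMPLE = """\
00000000-00000fff : Reserved
00100000-7fffffff : System RAM
c0000000-c0ffffff : 0000:00:01.0
380000000000-3807ffffffff : PCI Bus 0000:05
"""

print(bits_in_use(SAMPLE))
```

Running this in the guest once with and once without host-phys-bits-limit=39 should show whether the 64-bit MMIO window was placed above the 39-bit boundary.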

Comment 8 Yanghang Liu 2023-05-11 03:16:40 UTC
(In reply to Gerd Hoffmann from comment #3)
> https://gitlab.com/kraxel/centos-edk2/-/commits/bz2174749-enable-mmio-window
> https://kojihub.stream.centos.org/koji/taskinfo?taskID=2134405

I used the above build to test Bug 2055123 - [Q35] Failed to hot-plug a device whose membar > 2M into the vm

My test result shows that Bug 2055123 can still be reproduced.

Test env:
# uname -r
5.14.0-310.el9.x86_64
# rpm -q qemu-kvm
qemu-kvm-8.0.0-1.el9.x86_64
# rpm -qa|grep edk2
edk2-tools-20230301gitf80f052277c8-2.el9.bz2174749.20230418.1152.x86_64
edk2-ovmf-20230301gitf80f052277c8-2.el9.bz2174749.20230418.1152.noarch



Test steps:
(1) start a domain
(2) hot-plug an XL710 PF into the domain
# lspci -s 87:00.0
87:00.0 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)
# /bin/virsh attach-device rhel93 /tmp/device/0000:87:00.0.xml
Device attached successfully

(3) check the PF status in the domain
# ifconfig <-- I cannot get any PF info here
# dmesg 
[   96.606506] pci 0000:04:00.0: [8086:1583] type 00 class 0x020000
[   96.607008] pci 0000:04:00.0: reg 0x10: [mem 0x00000000-0x00ffffff 64bit pref]
[   96.608131] pci 0000:04:00.0: reg 0x1c: [mem 0x00000000-0x00007fff 64bit pref]
[   96.608873] pci 0000:04:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
[   96.609398] pci 0000:04:00.0: Max Payload Size set to 128 (was 256, max 2048)
[   96.619171] pci 0000:04:00.0: BAR 0: no space for [mem size 0x01000000 64bit pref]
[   96.619176] pci 0000:04:00.0: BAR 0: failed to assign [mem size 0x01000000 64bit pref]
[   96.619179] pci 0000:04:00.0: BAR 6: assigned [mem 0x80200000-0x8027ffff pref]
[   96.619184] pci 0000:04:00.0: BAR 3: assigned [mem 0x80400000-0x80407fff 64bit pref]
[   96.674774] i40e: Intel(R) Ethernet Connection XL710 Network Driver
[   96.674776] i40e: Copyright (c) 2013 - 2019 Intel Corporation.
[   96.675213] i40e 0000:04:00.0: enabling device (0000 -> 0002)
[   96.676685] i40e 0000:04:00.0: Cannot map registers, bar size 0x0 too small, aborting
[   96.677242] i40e: probe of 0000:04:00.0 failed with error -12
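
To make the failing allocation easier to spot, here is a minimal sketch that extracts failed BAR assignments from dmesg text and reports their sizes. The regular expression and function name are assumptions based only on the message format shown above; the sample lines are copied from this log.

```python
import re

# Matches kernel messages such as:
# pci 0000:04:00.0: BAR 0: failed to assign [mem size 0x01000000 64bit pref]
FAILED_BAR = re.compile(
    r"pci (?P<dev>[0-9a-f:.]+): BAR (?P<bar>\d+): failed to assign "
    r"\[mem size (?P<size>0x[0-9a-f]+)"
)


def failed_bars(dmesg_text: str) -> list:
    """Return (device, BAR index, size in bytes) for every failed assignment."""
    return [
        (m.group("dev"), int(m.group("bar")), int(m.group("size"), 16))
        for m in FAILED_BAR.finditer(dmesg_text)
    ]


DMESG = """\
[   96.619171] pci 0000:04:00.0: BAR 0: no space for [mem size 0x01000000 64bit pref]
[   96.619176] pci 0000:04:00.0: BAR 0: failed to assign [mem size 0x01000000 64bit pref]
[   96.619179] pci 0000:04:00.0: BAR 6: assigned [mem 0x80200000-0x8027ffff pref]
"""

for dev, bar, size in failed_bars(DMESG):
    print(f"{dev} BAR {bar}: needs {size // (1024 * 1024)} MiB")
```

For the log above this reports a 16 MiB prefetchable BAR 0, which is the "membar > 2M" case from the title of Bug 2055123.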

Comment 9 Gerd Hoffmann 2023-05-11 05:16:12 UTC
> I use the above build to test Bug 2055123 - [Q35] Failed to hot-plug a
> device whose membar > 2M into the vm
> 
> My test result shows Bug 2055123 still can be reproduced.

What is the exact qemu command line (or libvirt xml)?

Comment 10 Yanghang Liu 2023-05-11 06:27:11 UTC
(In reply to Gerd Hoffmann from comment #9)
> > I use the above build to test Bug 2055123 - [Q35] Failed to hot-plug a
> > device whose membar > 2M into the vm
> > 
> > My test result shows Bug 2055123 still can be reproduced.
> 
> What is the exact qemu command line (or libvirt xml)?

The virt-install command I used to import the domain:

# virt-install --machine=q35 --noreboot --name=rhel93 --memory=4096 --vcpus=4 --graphics type=vnc,port=5993,listen=0.0.0.0 --boot=uefi --network bridge=switch,model=virtio,mac=52:54:00:00:93:93 --import --noautoconsole --disk path=/home/images/RHEL93.qcow2,bus=virtio,cache=none,format=qcow2,io=threads,size=20 --osinfo detect=on,require=off

The detailed domain xml:

<domain type='kvm'>
  <name>rhel93</name>
  <uuid>317e4316-4bc9-4997-b74a-acffc0056e4e</uuid>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <os firmware='efi'>
    <type arch='x86_64' machine='pc-q35-rhel9.2.0'>hvm</type>
    <firmware>
      <feature enabled='yes' name='enrolled-keys'/>
      <feature enabled='yes' name='secure-boot'/>
    </firmware>
    <loader readonly='yes' secure='yes' type='pflash'>/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd</loader>
    <nvram template='/usr/share/edk2/ovmf/OVMF_VARS.secboot.fd'>/var/lib/libvirt/qemu/nvram/rhel93_VARS.fd</nvram>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <smm state='on'/>
  </features>
  <cpu mode='host-passthrough' check='none' migratable='on'/>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none' io='threads'/>
      <source file='/home/images/RHEL93.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x12'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x14'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x15'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0x16'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x6'/>
    </controller>
    <controller type='pci' index='8' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='8' port='0x17'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x7'/>
    </controller>
    <controller type='pci' index='9' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='9' port='0x18'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='10' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='10' port='0x19'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x1'/>
    </controller>
    <controller type='pci' index='11' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='11' port='0x1a'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x2'/>
    </controller>
    <controller type='pci' index='12' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='12' port='0x1b'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x3'/>
    </controller>
    <controller type='pci' index='13' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='13' port='0x1c'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x4'/>
    </controller>
    <controller type='pci' index='14' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='14' port='0x1d'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x5'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:00:93:93'/>
      <source bridge='switch'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='tablet' bus='usb'>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='5993' autoport='no' listen='0.0.0.0'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <audio id='1' type='none'/>
    <video>
      <model type='bochs' vram='16384' heads='1' primary='yes'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </video>
    <watchdog model='itco' action='reset'/>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </memballoon>
  </devices>
</domain>

Comment 11 Gerd Hoffmann 2023-05-11 09:05:35 UTC
> <domain type='kvm'>
>   <cpu mode='host-passthrough' check='none' migratable='on'/>

Looks good.

Can you add 'lspci -v' output for the device and the pcie bridge
it is connected to (inside the guest, after hotplug)?
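
A sketch of one way to pick the relevant bridge out of a full 'lspci -v' dump, assuming the hot-plugged device keeps the address 0000:04:00.0 seen in the dmesg output of comment 8, so the bridge of interest is the one with secondary=04. The function name is hypothetical; the sample lines follow the format of the lspci output posted later in this bug.

```python
import re


def bridge_windows(lspci_text: str) -> dict:
    """Map secondary bus number -> (bridge address, prefetchable window size)."""
    windows = {}
    # lspci -v separates devices with blank lines
    for block in re.split(r"\n\s*\n", lspci_text.strip()):
        header = block.splitlines()[0]
        bus = re.search(r"secondary=([0-9a-f]{2})", block)
        pref = re.search(
            r"Prefetchable memory behind bridge: \S+ \[size=(\w+)\]", block
        )
        if bus and pref:
            windows[bus.group(1)] = (header.split()[0], pref.group(1))
    return windows


LSPCI = """\
00:02.3 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
        Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
        Prefetchable memory behind bridge: 0000000080400000-00000000805fffff [size=2M]

00:02.4 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
        Bus: primary=00, secondary=05, subordinate=05, sec-latency=0
        Prefetchable memory behind bridge: 0000380000000000-00003807ffffffff [size=32G]
"""

bridge, window = bridge_windows(LSPCI)["04"]
print(f"bus 04 sits behind {bridge}, prefetchable window {window}")
```

Comparing that window size with the BAR size the kernel failed to assign shows directly whether the bridge has enough prefetchable space for the device.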

Comment 12 Yanghang Liu 2023-05-11 09:21:51 UTC
Hi Gerd,

Please check:

[root@vm-210-139 ~]# lspci 
00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
00:01.0 Display controller: Device 1234:1111 (rev 02)
00:02.0 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:02.1 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:02.2 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:02.3 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:02.4 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:02.5 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:02.6 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:02.7 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:03.0 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:03.1 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:03.2 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:03.3 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:03.4 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:03.5 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:1d.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03)
00:1d.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03)
00:1d.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03)
00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
01:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
02:00.0 SCSI storage controller: Red Hat, Inc. Virtio block device (rev 01)
03:00.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon (rev 01)
04:00.0 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)


[root@vm-210-139 ~]# lspci -v
00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
        Subsystem: Red Hat, Inc. QEMU Virtual Machine
        Flags: bus master, fast devsel, latency 0

00:01.0 Display controller: Device 1234:1111 (rev 02)
        Subsystem: Red Hat, Inc. Device 1100
        Flags: bus master, fast devsel, latency 0
        Memory at c0000000 (32-bit, prefetchable) [size=16M]
        Memory at c2650000 (32-bit, non-prefetchable) [size=4K]
        Expansion ROM at 80600000 [disabled] [size=32K]
        Capabilities: [80] Express Root Complex Integrated Endpoint, MSI 00
        Kernel driver in use: bochs-drm
        Kernel modules: bochs

00:02.0 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 22
        Memory at c264f000 (32-bit, non-prefetchable) [size=4K]
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: 00001000-00001fff [size=4K]
        Memory behind bridge: c2500000-c25fffff [size=1M]
        Prefetchable memory behind bridge: 0000385000000000-00003850000fffff [size=1M]
        Capabilities: [54] Express Root Port (Slot+), MSI 00
        Capabilities: [48] MSI-X: Enable+ Count=1 Masked-
        Capabilities: [40] Subsystem: Red Hat, Inc. Device 0000
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [148] Access Control Services
        Kernel driver in use: pcieport

00:02.1 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 22
        Memory at c264e000 (32-bit, non-prefetchable) [size=4K]
        Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
        I/O behind bridge: 00002000-00002fff [size=4K]
        Memory behind bridge: c2400000-c24fffff [size=1M]
        Prefetchable memory behind bridge: 0000385000100000-00003850001fffff [size=1M]
        Capabilities: [54] Express Root Port (Slot+), MSI 00
        Capabilities: [48] MSI-X: Enable+ Count=1 Masked-
        Capabilities: [40] Subsystem: Red Hat, Inc. Device 0000
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [148] Access Control Services
        Kernel driver in use: pcieport

00:02.2 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 22
        Memory at c264d000 (32-bit, non-prefetchable) [size=4K]
        Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
        I/O behind bridge: 00003000-00003fff [size=4K]
        Memory behind bridge: 80000000-801fffff [size=2M]
        Prefetchable memory behind bridge: 0000385000200000-00003850002fffff [size=1M]
        Capabilities: [54] Express Root Port (Slot+), MSI 00
        Capabilities: [48] MSI-X: Enable+ Count=1 Masked-
        Capabilities: [40] Subsystem: Red Hat, Inc. Device 0000
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [148] Access Control Services
        Kernel driver in use: pcieport

00:02.3 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 22
        Memory at c264c000 (32-bit, non-prefetchable) [size=4K]
        Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
        I/O behind bridge: 00004000-00004fff [size=4K]
        Memory behind bridge: 80200000-803fffff [size=2M]
        Prefetchable memory behind bridge: 0000000080400000-00000000805fffff [size=2M]
        Capabilities: [54] Express Root Port (Slot+), MSI 00
        Capabilities: [48] MSI-X: Enable+ Count=1 Masked-
        Capabilities: [40] Subsystem: Red Hat, Inc. Device 0000
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [148] Access Control Services
        Kernel driver in use: pcieport

00:02.4 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 22
        Memory at c264b000 (32-bit, non-prefetchable) [size=4K]
        Bus: primary=00, secondary=05, subordinate=05, sec-latency=0
        I/O behind bridge: 0000f000-0000ffff [size=4K]
        Memory behind bridge: c2200000-c23fffff [size=2M]
        Prefetchable memory behind bridge: 0000380000000000-00003807ffffffff [size=32G]
        Capabilities: [54] Express Root Port (Slot+), MSI 00
        Capabilities: [48] MSI-X: Enable+ Count=1 Masked-
        Capabilities: [40] Subsystem: Red Hat, Inc. Device 0000
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [148] Access Control Services
        Kernel driver in use: pcieport

00:02.5 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 22
        Memory at c264a000 (32-bit, non-prefetchable) [size=4K]
        Bus: primary=00, secondary=06, subordinate=06, sec-latency=0
        I/O behind bridge: 0000e000-0000efff [size=4K]
        Memory behind bridge: c2000000-c21fffff [size=2M]
        Prefetchable memory behind bridge: 0000380800000000-0000380fffffffff [size=32G]
        Capabilities: [54] Express Root Port (Slot+), MSI 00
        Capabilities: [48] MSI-X: Enable+ Count=1 Masked-
        Capabilities: [40] Subsystem: Red Hat, Inc. Device 0000
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [148] Access Control Services
        Kernel driver in use: pcieport

00:02.6 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 22
        Memory at c2649000 (32-bit, non-prefetchable) [size=4K]
        Bus: primary=00, secondary=07, subordinate=07, sec-latency=0
        I/O behind bridge: 0000d000-0000dfff [size=4K]
        Memory behind bridge: c1e00000-c1ffffff [size=2M]
        Prefetchable memory behind bridge: 0000381000000000-00003817ffffffff [size=32G]
        Capabilities: [54] Express Root Port (Slot+), MSI 00
        Capabilities: [48] MSI-X: Enable+ Count=1 Masked-
        Capabilities: [40] Subsystem: Red Hat, Inc. Device 0000
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [148] Access Control Services
        Kernel driver in use: pcieport

00:02.7 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 22
        Memory at c2648000 (32-bit, non-prefetchable) [size=4K]
        Bus: primary=00, secondary=08, subordinate=08, sec-latency=0
        I/O behind bridge: 0000c000-0000cfff [size=4K]
        Memory behind bridge: c1c00000-c1dfffff [size=2M]
        Prefetchable memory behind bridge: 0000381800000000-0000381fffffffff [size=32G]
        Capabilities: [54] Express Root Port (Slot+), MSI 00
        Capabilities: [48] MSI-X: Enable+ Count=1 Masked-
        Capabilities: [40] Subsystem: Red Hat, Inc. Device 0000
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [148] Access Control Services
        Kernel driver in use: pcieport

00:03.0 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 23
        Memory at c2647000 (32-bit, non-prefetchable) [size=4K]
        Bus: primary=00, secondary=09, subordinate=09, sec-latency=0
        I/O behind bridge: 0000b000-0000bfff [size=4K]
        Memory behind bridge: c1a00000-c1bfffff [size=2M]
        Prefetchable memory behind bridge: 0000382000000000-00003827ffffffff [size=32G]
        Capabilities: [54] Express Root Port (Slot+), MSI 00
        Capabilities: [48] MSI-X: Enable+ Count=1 Masked-
        Capabilities: [40] Subsystem: Red Hat, Inc. Device 0000
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [148] Access Control Services
        Kernel driver in use: pcieport

00:03.1 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 23
        Memory at c2646000 (32-bit, non-prefetchable) [size=4K]
        Bus: primary=00, secondary=0a, subordinate=0a, sec-latency=0
        I/O behind bridge: 0000a000-0000afff [size=4K]
        Memory behind bridge: c1800000-c19fffff [size=2M]
        Prefetchable memory behind bridge: 0000382800000000-0000382fffffffff [size=32G]
        Capabilities: [54] Express Root Port (Slot+), MSI 00
        Capabilities: [48] MSI-X: Enable+ Count=1 Masked-
        Capabilities: [40] Subsystem: Red Hat, Inc. Device 0000
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [148] Access Control Services
        Kernel driver in use: pcieport

00:03.2 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 23
        Memory at c2645000 (32-bit, non-prefetchable) [size=4K]
        Bus: primary=00, secondary=0b, subordinate=0b, sec-latency=0
        I/O behind bridge: 00009000-00009fff [size=4K]
        Memory behind bridge: c1600000-c17fffff [size=2M]
        Prefetchable memory behind bridge: 0000383000000000-00003837ffffffff [size=32G]
        Capabilities: [54] Express Root Port (Slot+), MSI 00
        Capabilities: [48] MSI-X: Enable+ Count=1 Masked-
        Capabilities: [40] Subsystem: Red Hat, Inc. Device 0000
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [148] Access Control Services
        Kernel driver in use: pcieport

00:03.3 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 23
        Memory at c2644000 (32-bit, non-prefetchable) [size=4K]
        Bus: primary=00, secondary=0c, subordinate=0c, sec-latency=0
        I/O behind bridge: 00008000-00008fff [size=4K]
        Memory behind bridge: c1400000-c15fffff [size=2M]
        Prefetchable memory behind bridge: 0000383800000000-0000383fffffffff [size=32G]
        Capabilities: [54] Express Root Port (Slot+), MSI 00
        Capabilities: [48] MSI-X: Enable+ Count=1 Masked-
        Capabilities: [40] Subsystem: Red Hat, Inc. Device 0000
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [148] Access Control Services
        Kernel driver in use: pcieport

00:03.4 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 23
        Memory at c2643000 (32-bit, non-prefetchable) [size=4K]
        Bus: primary=00, secondary=0d, subordinate=0d, sec-latency=0
        I/O behind bridge: 00007000-00007fff [size=4K]
        Memory behind bridge: c1200000-c13fffff [size=2M]
        Prefetchable memory behind bridge: 0000384000000000-00003847ffffffff [size=32G]
        Capabilities: [54] Express Root Port (Slot+), MSI 00
        Capabilities: [48] MSI-X: Enable+ Count=1 Masked-
        Capabilities: [40] Subsystem: Red Hat, Inc. Device 0000
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [148] Access Control Services
        Kernel driver in use: pcieport

00:03.5 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 23
        Memory at c2642000 (32-bit, non-prefetchable) [size=4K]
        Bus: primary=00, secondary=0e, subordinate=0e, sec-latency=0
        I/O behind bridge: 00006000-00006fff [size=4K]
        Memory behind bridge: c1000000-c11fffff [size=2M]
        Prefetchable memory behind bridge: 0000384800000000-0000384fffffffff [size=32G]
        Capabilities: [54] Express Root Port (Slot+), MSI 00
        Capabilities: [48] MSI-X: Enable+ Count=1 Masked-
        Capabilities: [40] Subsystem: Red Hat, Inc. Device 0000
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [148] Access Control Services
        Kernel driver in use: pcieport

00:1d.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03) (prog-if 00 [UHCI])
        Subsystem: Red Hat, Inc. QEMU Virtual Machine
        Flags: bus master, fast devsel, latency 0, IRQ 16
        I/O ports at 5040 [size=32]
        Kernel driver in use: uhci_hcd

00:1d.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03) (prog-if 00 [UHCI])
        Subsystem: Red Hat, Inc. QEMU Virtual Machine
        Flags: bus master, fast devsel, latency 0, IRQ 17
        I/O ports at 5060 [size=32]
        Kernel driver in use: uhci_hcd

00:1d.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03) (prog-if 00 [UHCI])
        Subsystem: Red Hat, Inc. QEMU Virtual Machine
        Flags: bus master, fast devsel, latency 0, IRQ 18
        I/O ports at 5080 [size=32]
        Kernel driver in use: uhci_hcd

00:1d.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03) (prog-if 20 [EHCI])
        Subsystem: Red Hat, Inc. QEMU Virtual Machine
        Flags: bus master, fast devsel, latency 0, IRQ 19
        Memory at c2641000 (32-bit, non-prefetchable) [size=4K]
        Kernel driver in use: ehci-pci

00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface Controller (rev 02)
        Subsystem: Red Hat, Inc. QEMU Virtual Machine
        Flags: bus master, fast devsel, latency 0
        Kernel driver in use: lpc_ich
        Kernel modules: lpc_ich

00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] (rev 02) (prog-if 01 [AHCI 1.0])
        Subsystem: Red Hat, Inc. QEMU Virtual Machine
        Flags: bus master, fast devsel, latency 0, IRQ 46
        I/O ports at 50a0 [size=32]
        Memory at c2640000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [a8] SATA HBA v1.0
        Kernel driver in use: ahci
        Kernel modules: ahci

00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
        Subsystem: Red Hat, Inc. QEMU Virtual Machine
        Flags: bus master, fast devsel, latency 0, IRQ 16
        I/O ports at 5000 [size=64]
        Kernel driver in use: i801_smbus
        Kernel modules: i2c_i801

01:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
        Subsystem: Red Hat, Inc. Device 1100
        Physical Slot: 0
        Flags: bus master, fast devsel, latency 0, IRQ 22
        Memory at c2500000 (32-bit, non-prefetchable) [size=4K]
        Memory at 385000000000 (64-bit, prefetchable) [size=16K]
        Expansion ROM at c2540000 [disabled] [size=256K]
        Capabilities: [dc] MSI-X: Enable+ Count=4 Masked-
        Capabilities: [c8] Vendor Specific Information: VirtIO: <unknown>
        Capabilities: [b4] Vendor Specific Information: VirtIO: Notify
        Capabilities: [a4] Vendor Specific Information: VirtIO: DeviceCfg
        Capabilities: [94] Vendor Specific Information: VirtIO: ISR
        Capabilities: [84] Vendor Specific Information: VirtIO: CommonCfg
        Capabilities: [7c] Power Management version 3
        Capabilities: [40] Express Endpoint, MSI 00
        Kernel driver in use: virtio-pci

02:00.0 SCSI storage controller: Red Hat, Inc. Virtio block device (rev 01)
        Subsystem: Red Hat, Inc. Device 1100
        Physical Slot: 0-2
        Flags: bus master, fast devsel, latency 0, IRQ 22
        Memory at c2400000 (32-bit, non-prefetchable) [size=4K]
        Memory at 385000100000 (64-bit, prefetchable) [size=16K]
        Capabilities: [dc] MSI-X: Enable+ Count=5 Masked-
        Capabilities: [c8] Vendor Specific Information: VirtIO: <unknown>
        Capabilities: [b4] Vendor Specific Information: VirtIO: Notify
        Capabilities: [a4] Vendor Specific Information: VirtIO: DeviceCfg
        Capabilities: [94] Vendor Specific Information: VirtIO: ISR
        Capabilities: [84] Vendor Specific Information: VirtIO: CommonCfg
        Capabilities: [7c] Power Management version 3
        Capabilities: [40] Express Endpoint, MSI 00
        Kernel driver in use: virtio-pci

03:00.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon (rev 01)
        Subsystem: Red Hat, Inc. Device 1100
        Physical Slot: 0-3
        Flags: bus master, fast devsel, latency 0, IRQ 22
        Memory at 385000200000 (64-bit, prefetchable) [size=16K]
        Capabilities: [c8] Vendor Specific Information: VirtIO: <unknown>
        Capabilities: [b4] Vendor Specific Information: VirtIO: Notify
        Capabilities: [a4] Vendor Specific Information: VirtIO: DeviceCfg
        Capabilities: [94] Vendor Specific Information: VirtIO: ISR
        Capabilities: [84] Vendor Specific Information: VirtIO: CommonCfg
        Capabilities: [7c] Power Management version 3
        Capabilities: [40] Express Endpoint, MSI 00
        Kernel driver in use: virtio-pci

04:00.0 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)
        Subsystem: Intel Corporation Ethernet Converged Network Adapter XL710-Q2
        Physical Slot: 0-4
        Flags: fast devsel
        Memory at <unassigned> (64-bit, prefetchable)
        Memory at 80400000 (64-bit, prefetchable) [size=32K]
        Expansion ROM at 80200000 [virtual] [disabled] [size=512K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Capabilities: [70] MSI-X: Enable- Count=129 Masked-
        Capabilities: [a0] Express Endpoint, MSI 00
        Capabilities: [e0] Vital Product Data
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number a8-90-15-ff-ff-fe-fd-3c
        Capabilities: [1a0] Transaction Processing Hints
        Capabilities: [1b0] Access Control Services
        Kernel modules: i40e

Comment 13 Gerd Hoffmann 2023-05-11 13:39:31 UTC
> 00:02.3 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal
> decode])
>         Flags: bus master, fast devsel, latency 0, IRQ 22
>         Memory at c264c000 (32-bit, non-prefetchable) [size=4K]
>         Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
>         I/O behind bridge: 00004000-00004fff [size=4K]
>         Memory behind bridge: 80200000-803fffff [size=2M]
>         Prefetchable memory behind bridge: 0000000080400000-00000000805fffff
> [size=2M]
>         Capabilities: [54] Express Root Port (Slot+), MSI 00
>         Capabilities: [48] MSI-X: Enable+ Count=1 Masked-
>         Capabilities: [40] Subsystem: Red Hat, Inc. Device 0000
>         Capabilities: [100] Advanced Error Reporting
>         Capabilities: [148] Access Control Services
>         Kernel driver in use: pcieport

This should be the root port used by the NIC.  It has only a 2M prefetchable memory window.

> 00:02.4 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal
> decode])
>         Prefetchable memory behind bridge: 0000380000000000-00003807ffffffff
> [size=32G]

> 00:02.5 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal
> decode])
>         Prefetchable memory behind bridge: 0000380800000000-0000380fffffffff
> [size=32G]

All other ports have 32G.  Hmm.  There is no difference in the libvirt XML ...

> 04:00.0 Ethernet controller: Intel Corporation Ethernet Controller XL710 for
> 40GbE QSFP+ (rev 02)
>         Subsystem: Intel Corporation Ethernet Converged Network Adapter
> XL710-Q2
>         Physical Slot: 0-4
>         Flags: fast devsel
>         Memory at <unassigned> (64-bit, prefetchable)
>         Memory at 80400000 (64-bit, prefetchable) [size=32K]
>         Expansion ROM at 80200000 [virtual] [disabled] [size=512K]

Finally the NIC.

Can you attach the complete kernel log (booting plus hotplug) please?
I'm wondering where these differences in pcie root port configuration
are coming from.  I'd expect all ports to have 32G windows ...
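
The comparison above (one port with a 2M prefetchable window, the others with 32G) can be automated when looking at larger dumps. A minimal sketch, assuming `lspci -v` style text as input; the helper names `prefetchable_windows` and `odd_ports` are hypothetical and the sample is trimmed from the output quoted above:

```python
import re

# Device header lines start in column 0, e.g. "00:02.3 PCI bridge: ..."
DEVICE_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) ")
WINDOW_RE = re.compile(
    r"Prefetchable memory behind bridge: ([0-9a-f]+)-([0-9a-f]+)"
)


def prefetchable_windows(lspci_text):
    """Return {device address: prefetchable window size in bytes}."""
    windows = {}
    current = None
    for line in lspci_text.splitlines():
        header = DEVICE_RE.match(line)
        if header:
            current = header.group(1)
            continue
        window = WINDOW_RE.search(line)
        if window and current is not None:
            start = int(window.group(1), 16)
            end = int(window.group(2), 16)
            windows[current] = end - start + 1
    return windows


def odd_ports(windows):
    """Return the ports whose window size differs from the most common size."""
    sizes = list(windows.values())
    common = max(set(sizes), key=sizes.count)
    return {port: size for port, size in windows.items() if size != common}


SAMPLE = """\
00:02.3 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
        Prefetchable memory behind bridge: 0000000080400000-00000000805fffff [size=2M]
00:02.4 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
        Prefetchable memory behind bridge: 0000380000000000-00003807ffffffff [size=32G]
00:02.5 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
        Prefetchable memory behind bridge: 0000380800000000-0000380fffffffff [size=32G]
"""

if __name__ == "__main__":
    for port, size in odd_ports(prefetchable_windows(SAMPLE)).items():
        print(f"{port}: {size // (1 << 20)} MiB prefetchable window")
```

With the trimmed sample, only 00:02.3 is reported, which is the port the NIC was plugged into.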

Comment 15 Gerd Hoffmann 2023-05-15 11:59:37 UTC
New test build:
https://kojihub.stream.centos.org/koji/taskinfo?taskID=2216516

Any change with this one?

Comment 16 Yanghang Liu 2023-05-15 13:08:30 UTC
(In reply to Gerd Hoffmann from comment #15)
> New test build:
> https://kojihub.stream.centos.org/koji/taskinfo?taskID=2216516
> 
> Any change with this one?


Hi Gerd,

My test results show that the PF can now be hot-plugged into the domain successfully.

Test env: edk2-ovmf-20230301gitf80f052277c8-3.el9.bz2174749.20230515.1346.noarch

Test step:
(1) start a domain
# virt-install --machine=q35 --noreboot --name=rhel93 --memory=4096 --vcpus=4 --graphics type=vnc,port=5993,listen=0.0.0.0 --boot=uefi --network bridge=switch,model=virtio,mac=52:54:00:00:93:93 --import --noautoconsole --disk path=/home/images/RHEL93.qcow2,bus=virtio,cache=none,format=qcow2,io=threads,size=20 --osinfo detect=on,require=off

(2) hot-plug an XL710 PF into the domain
# lspci -s 87:00.0
87:00.0 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)
# /bin/virsh attach-device rhel93 /tmp/device/0000:87:00.0.xml
Device attached successfully

(3) check the PF status in the domain
# ifconfig 
...
enp4s0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 3c:fd:fe:15:90:a8  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
# dmesg 
[   38.456761] pci 0000:04:00.0: [8086:1583] type 00 class 0x020000
[   38.457666] pci 0000:04:00.0: reg 0x10: [mem 0x00000000-0x00ffffff 64bit pref]
[   38.458598] pci 0000:04:00.0: reg 0x1c: [mem 0x00000000-0x00007fff 64bit pref]
[   38.459323] pci 0000:04:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
[   38.459874] pci 0000:04:00.0: Max Payload Size set to 128 (was 256, max 2048)
[   38.471403] pci 0000:04:00.0: BAR 0: assigned [mem 0x381800000000-0x381800ffffff 64bit pref]
[   38.472489] pci 0000:04:00.0: BAR 6: assigned [mem 0xc2400000-0xc247ffff pref]
[   38.472496] pci 0000:04:00.0: BAR 3: assigned [mem 0x381801000000-0x381801007fff 64bit pref]
[   38.514703] i40e: Intel(R) Ethernet Connection XL710 Network Driver
[   38.514705] i40e: Copyright (c) 2013 - 2019 Intel Corporation.
[   38.515275] i40e 0000:04:00.0: enabling device (0000 -> 0002)
[   38.536779] i40e 0000:04:00.0: fw 9.80.70867 api 1.15 nvm 9.00 0x8000cadc 21.5.9 [8086:1583] [8086:0006]
[   38.603332] i40e 0000:04:00.0: MAC address: 3c:fd:fe:15:90:a8
[   38.604352] i40e 0000:04:00.0: FW LLDP is enabled
[   38.613157] i40e 0000:04:00.0: PCI-Express: Speed 8.0GT/s Width x8
[   38.613832] i40e 0000:04:00.0: Features: PF-id[0] VFs: 64 VSIs: 66 QP: 4 RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA
[   38.625170] i40e 0000:04:00.0 enp4s0: renamed from eth0

The full domain dmesg logs for the different PFs are as follows:
http://10.73.72.41/log/bug/Bug2174749/2023_05_15_08:15:19_XL710
http://10.73.72.41/log/bug/Bug2174749/2023_05_15_08:30:25_82599ES
http://10.73.72.41/log/bug/Bug2174749/2023_05_15_08:58:13_MT2892
http://10.73.72.41/log/bug/Bug2174749/2023_05_15_09:01:27_QL41112
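
The BAR assignments in the dmesg excerpt above can be checked mechanically. A minimal sketch, assuming the kernel message format shown in the excerpt; the helper name `assigned_bars` is hypothetical:

```python
import re

# Matches lines such as:
#   pci 0000:04:00.0: BAR 0: assigned [mem 0x381800000000-0x381800ffffff 64bit pref]
ASSIGN_RE = re.compile(
    r"pci (\S+): BAR (\d+): assigned \[mem 0x([0-9a-f]+)-0x([0-9a-f]+)"
)
FOUR_GIB = 1 << 32


def assigned_bars(dmesg_text):
    """Return one dict per assigned memory BAR found in the dmesg text."""
    bars = []
    for m in ASSIGN_RE.finditer(dmesg_text):
        start = int(m.group(3), 16)
        end = int(m.group(4), 16)
        bars.append({
            "device": m.group(1),
            "bar": int(m.group(2)),
            "size": end - start + 1,
            "above_4g": start >= FOUR_GIB,
        })
    return bars


SAMPLE = """\
[   38.471403] pci 0000:04:00.0: BAR 0: assigned [mem 0x381800000000-0x381800ffffff 64bit pref]
[   38.472489] pci 0000:04:00.0: BAR 6: assigned [mem 0xc2400000-0xc247ffff pref]
[   38.472496] pci 0000:04:00.0: BAR 3: assigned [mem 0x381801000000-0x381801007fff 64bit pref]
"""

if __name__ == "__main__":
    for bar in assigned_bars(SAMPLE):
        where = "above 4G" if bar["above_4g"] else "below 4G"
        print(f"{bar['device']} BAR {bar['bar']}: {bar['size'] // 1024} KiB, {where}")
```

For the XL710, the two 64-bit prefetchable BARs (16M and 32K) land above 4G, and only the 512K expansion ROM (BAR 6) is placed in the 32-bit range.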

Comment 17 liunana 2023-05-16 09:38:47 UTC
(In reply to Gerd Hoffmann from comment #15)
> New test build:
> https://kojihub.stream.centos.org/koji/taskinfo?taskID=2216516


Hi Gerd,

Does QE need to run the sanity tests with this build instead of edk2-ovmf-20230301gitf80f052277c8-2.el9.bz2174749.20230418.1152?
Thanks.


Best regards
Nana

Comment 18 Gerd Hoffmann 2023-05-16 10:05:58 UTC
Patch posted upstream
https://edk2.groups.io/g/devel/message/104919

> > New test build:
> > https://kojihub.stream.centos.org/koji/taskinfo?taskID=2216516
> 
> Does QE need to do sanity test with this build instead of
> edk2-ovmf-20230301gitf80f052277c8-2.el9.bz2174749.20230418.1152?

Yes, please.  The new scratch build has both patches,
#1 which enables the dynamic mmio window, and
#2 which fixes the problem from comment 8.

Comment 20 Yanghang Liu 2023-05-22 09:18:48 UTC
> 
> Hi Gerd,
> 
> My test result shows the PF can be hot-plugged into domain successfully now.
> 
> Test env:
> edk2-ovmf-20230301gitf80f052277c8-3.el9.bz2174749.20230515.1346.noarch
> 
> Test step:
> (1) start a domain
> # virt-install --machine=q35 --noreboot --name=rhel93 --memory=4096
> --vcpus=4 --graphics type=vnc,port=5993,listen=0.0.0.0 --boot=uefi --network
> bridge=switch,model=virtio,mac=52:54:00:00:93:93 --import --noautoconsole
> --disk
> path=/home/images/RHEL93.qcow2,bus=virtio,cache=none,format=qcow2,io=threads,
> size=20 --osinfo detect=on,require=off
> 
> (2) hot-plug a XL710 PF into domain
> # lspci -s 87:00.0
> 87:00.0 Ethernet controller: Intel Corporation Ethernet Controller XL710 for
> 40GbE QSFP+ (rev 02)
> # /bin/virsh attach-device rhel93 /tmp/device/0000:87:00.0.xml
> Device attached successfully
> 
> (3) check the PF status in the domain
> # ifconfig 
> ...
> enp4s0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
>         ether 3c:fd:fe:15:90:a8  txqueuelen 1000  (Ethernet)
>         RX packets 0  bytes 0 (0.0 B)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 0  bytes 0 (0.0 B)
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
> # dmesg 
> [   38.456761] pci 0000:04:00.0: [8086:1583] type 00 class 0x020000
> [   38.457666] pci 0000:04:00.0: reg 0x10: [mem 0x00000000-0x00ffffff 64bit
> pref]
> [   38.458598] pci 0000:04:00.0: reg 0x1c: [mem 0x00000000-0x00007fff 64bit
> pref]
> [   38.459323] pci 0000:04:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
> [   38.459874] pci 0000:04:00.0: Max Payload Size set to 128 (was 256, max
> 2048)
> [   38.471403] pci 0000:04:00.0: BAR 0: assigned [mem
> 0x381800000000-0x381800ffffff 64bit pref]
> [   38.472489] pci 0000:04:00.0: BAR 6: assigned [mem 0xc2400000-0xc247ffff
> pref]
> [   38.472496] pci 0000:04:00.0: BAR 3: assigned [mem
> 0x381801000000-0x381801007fff 64bit pref]
> [   38.514703] i40e: Intel(R) Ethernet Connection XL710 Network Driver
> [   38.514705] i40e: Copyright (c) 2013 - 2019 Intel Corporation.
> [   38.515275] i40e 0000:04:00.0: enabling device (0000 -> 0002)
> [   38.536779] i40e 0000:04:00.0: fw 9.80.70867 api 1.15 nvm 9.00 0x8000cadc
> 21.5.9 [8086:1583] [8086:0006]
> [   38.603332] i40e 0000:04:00.0: MAC address: 3c:fd:fe:15:90:a8
> [   38.604352] i40e 0000:04:00.0: FW LLDP is enabled
> [   38.613157] i40e 0000:04:00.0: PCI-Express: Speed 8.0GT/s Width x8
> [   38.613832] i40e 0000:04:00.0: Features: PF-id[0] VFs: 64 VSIs: 66 QP: 4
> RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA
> [   38.625170] i40e 0000:04:00.0 enp4s0: renamed from eth0
> 
> The full domain dmesg of different PFs is as following: 
> http://10.73.72.41/log/bug/Bug2174749/2023_05_15_08:15:19_XL710
> http://10.73.72.41/log/bug/Bug2174749/2023_05_15_08:30:25_82599ES
> http://10.73.72.41/log/bug/Bug2174749/2023_05_15_08:58:13_MT2892
> http://10.73.72.41/log/bug/Bug2174749/2023_05_15_09:01:27_QL41112



Hi Gerd,

My test results show that the SFC9220 PF still cannot be hot-plugged into the domain, even with edk2-ovmf-20230301gitf80f052277c8-3.el9.bz2174749.20230515.1346.noarch.

The full domain dmesg:   http://10.73.72.41/log/bug/Bug2174749/2023_05_22_17:08:19_SFC9220_BZ

Only the SFC9220 PF test currently fails.

Could you please help check it?

Comment 23 Gerd Hoffmann 2023-05-22 10:37:10 UTC
> The full domain dmesg:  
> http://10.73.72.41/log/bug/Bug2174749/2023_05_22_17:08:19_SFC9220_BZ
> 
> Only the SFC9220 PF test currently fails.

What is lspci output for this device (on the host)?

Comment 24 Yanghang Liu 2023-05-22 11:20:34 UTC
(In reply to Gerd Hoffmann from comment #23)
> > The full domain dmesg:  
> > http://10.73.72.41/log/bug/Bug2174749/2023_05_22_17:08:19_SFC9220_BZ
> > 
> > Only the SFC9220 PF test currently fails.
> 
> What is lspci output for this device (on the host)?


Just like:

# lspci -vv -s  0000:1a:00.0
1a:00.0 Ethernet controller: Solarflare Communications SFC9220 10/40G Ethernet Controller (rev 02)
        Subsystem: Solarflare Communications SFN8522-R2 8000 Series 10G Adapter
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 114
        NUMA node: 0
        IOMMU group: 23
        Region 0: I/O ports at 4100 [size=256]
        Region 2: Memory at 9e000000 (64-bit, non-prefetchable) [size=8M]
        Region 4: Memory at a6904000 (64-bit, non-prefetchable) [size=16K]
        Expansion ROM at a6a40000 [disabled] [size=256K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 1024 bytes, PhantFunc 0, Latency L0s <64ns, L1 <8us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75.000W
                DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s (ok), Width x8 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
                         10BitTagComp- 10BitTagReq- OBFF Via message/WAKE#, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp+ ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis- LTR- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
                         EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [b0] MSI-X: Enable+ Count=32 Masked-
                Vector table: BAR=4 offset=00000000
                PBA: BAR=4 offset=00002000
        Capabilities: [d0] Vital Product Data
                Product Name: Solarflare Flareon Ultra 8000 Series 10G Adapter
                Read-only fields:
                        [PN] Part number: SFN8522
                        [SN] Serial number: 852200210000170117100443
                        [EC] Engineering changes: PCBR2:CCSA2
                        [V0] Vendor specific: 8.0.0
                        [VD] Vendor specific: 8.0.0
                        [VL] Vendor specific:                                 
                        [VA] Vendor specific: 0x0000000000000000
                        [VF] Vendor specific: 0x0000000000000000
                        [RV] Reserved: checksum good, 148 byte(s) reserved
                End
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                CEMsk:  RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn+ ECRCChkCap+ ECRCChkEn+
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [148 v1] Device Serial Number 00-0f-53-ff-ff-4d-8c-30
        Capabilities: [158 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 1
                ARICtl: MFVC- ACS-, Function Group: 0
        Capabilities: [168 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-
                LaneErrStat: 0
        Capabilities: [198 v1] Single Root I/O Virtualization (SR-IOV)
                IOVCap: Migration-, Interrupt Message Number: 000
                IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
                IOVSta: Migration-
                Initial VFs: 64, Total VFs: 64, Number of VFs: 0, Function Dependency Link: 00
                VF offset: 2, stride: 1, Device ID: 1a03
                Supported Page Size: 00000553, System Page Size: 00000001
                Region 0: Memory at 00000000a2800000 (64-bit, non-prefetchable)
                Region 2: Memory at 00000000a6908000 (64-bit, non-prefetchable)
                VF Migration: offset: 00000000, BIR: 0
        Capabilities: [1d8 v1] Transaction Processing Hints
                Device specific mode supported
                No steering table available
        Capabilities: [26c v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
                L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
                           T_CommonMode=0us LTR1.2_Threshold=0ns
                L1SubCtl2: T_PwrOn=10us
        Kernel driver in use: sfc
        Kernel modules: sfc

Comment 25 Gerd Hoffmann 2023-05-22 14:22:15 UTC
> > What is lspci output for this device (on the host)?
> 
> # lspci -vv -s  0000:1a:00.0
> 1a:00.0 Ethernet controller: Solarflare Communications SFC9220 10/40G
> Ethernet Controller (rev 02)

>         Region 2: Memory at 9e000000 (64-bit, non-prefetchable) [size=8M]

So an 8M non-prefetchable BAR.  Hmm.

The mmio window scaling is applied only to the prefetchable memory
window (which usually represents device memory and can be quite big).

Changing the window via '-device pcie-root-port.mem-reserve=...'
should work as a workaround here.

The non-prefetchable memory window is 32-bit, so we don't have that
much address space available there.  Bumping the default size doesn't
look like a good plan.  There are two BARs, so we'll need a 16M window,
which consumes 1G of address space with only 64 pcie root ports.
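
The arithmetic behind the 1G figure can be sketched as follows. This is an illustration only: the 2M value is the non-prefetchable window size seen in the lspci output earlier in this bug, 4 GiB is an upper bound for the 32-bit space (RAM and firmware below 4G reduce it further), and the helper name `reserved_bytes` is hypothetical.

```python
MIB = 1 << 20
GIB = 1 << 30
ADDRESS_SPACE_32BIT = 4 * GIB  # upper bound for 32-bit MMIO


def reserved_bytes(window_mib, root_ports):
    """Total 32-bit MMIO reserved when every root port gets the same window."""
    return window_mib * MIB * root_ports


if __name__ == "__main__":
    for window_mib in (2, 16):
        total = reserved_bytes(window_mib, 64)
        share = total / ADDRESS_SPACE_32BIT
        print(f"{window_mib}M x 64 ports = {total // MIB} MiB "
              f"({share:.1%} of 4 GiB)")
```

A 16M default would take a quarter of the whole 32-bit address space for 64 ports, compared with 128 MiB for the 2M windows.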

The 'PF' in the test name suggests this is a SR/IOV device.  Is there
a specific reason you assign the complete PF device instead of the VFs?

Comment 26 Xueqiang Wei 2023-05-22 18:20:59 UTC
Tested the edk2 test loop with the scratch build mentioned in comment 15; no new issues were found.

Versions:
kernel-5.14.0-306.el9.x86_64
qemu-kvm-8.0.0-2.el9
edk2-ovmf-20230301gitf80f052277c8-3.el9.bz2174749.20230515.1346.noarch


Edk2 test loop with scratch build edk2-ovmf-20230301gitf80f052277c8-3.el9.bz2174749.20230515.1346.noarch
Job link: http://virtqetools.lab.eng.pek2.redhat.com/kvm_autotest_job_log/?jobid=7875589

Comment 27 Guo, Zhiyi 2023-05-22 23:30:12 UTC
(In reply to Gerd Hoffmann from comment #25)
> > > What is lspci output for this device (on the host)?
> > 
> > # lspci -vv -s  0000:1a:00.0
> > 1a:00.0 Ethernet controller: Solarflare Communications SFC9220 10/40G
> > Ethernet Controller (rev 02)
> 
> >         Region 2: Memory at 9e000000 (64-bit, non-prefetchable) [size=8M]
> 
> So a 8M non-prefetchable bar.  Hmm.
> 
> The mmio window scaling is applied only to the prefetchable memory
> window (which usually represent device memory and can be quite big).
> 
> Changing the window via '-device pcie-root-port.mem-reserve=...'
> should work as workaround here.
> 
> The non-prefetchable memory window is 32-bit, so we don't have that
> much address space available there.  Bumping the default size doesn't
> look like a good plan.  There are two bars, so we'll need a 16M window,
> which consume 1G address space with only 64 pcie root ports.
> 
> The 'PF' in the test name suggests this is a SR/IOV device.  Is there
> a specific reason you assign the complete PF device instead of the VFs?

I think this question would be better answered by Alex. We should also decide whether to open another bug to look for a solution or simply document this limitation.

Comment 29 Alex Williamson 2023-05-23 16:42:15 UTC
(In reply to Guo, Zhiyi from comment #27)
> (In reply to Gerd Hoffmann from comment #25)
> > > > What is lspci output for this device (on the host)?
> > > 
> > > # lspci -vv -s  0000:1a:00.0
> > > 1a:00.0 Ethernet controller: Solarflare Communications SFC9220 10/40G
> > > Ethernet Controller (rev 02)
> > 
> > >         Region 2: Memory at 9e000000 (64-bit, non-prefetchable) [size=8M]
> > 
> > So a 8M non-prefetchable bar.  Hmm.
> > 
> > The mmio window scaling is applied only to the prefetchable memory
> > window (which usually represent device memory and can be quite big).
> > 
> > Changing the window via '-device pcie-root-port.mem-reserve=...'
> > should work as workaround here.
> > 
> > The non-prefetchable memory window is 32-bit, so we don't have that
> > much address space available there.  Bumping the default size doesn't
> > look like a good plan.  There are two bars, so we'll need a 16M window,
> > which consume 1G address space with only 64 pcie root ports.
> > 
> > The 'PF' in the test name suggests this is a SR/IOV device.  Is there
> > a specific reason you assign the complete PF device instead of the VFs?
> 
> I think this question might be better to be answered by Alex, and see if we
> need to open another bug to search for a solution or document such
> limitation only

I'm not sure what I'm supposed to answer here.  It's very unusual for a device to report a 64-bit, non-prefetchable BAR requirement.  There is no way for a bridge to provide anything other than 32-bit non-prefetchable apertures, so these must fit within the 32-bit MMIO space.  It's even more absurd that the SR-IOV BARs for the device are also non-prefetchable.  Is there maybe a firmware update for this device?

We were recently pointed to an Insights report for customer NIC configurations (https://issues.redhat.com/browse/INSPEC-395).  There's not a single Solarflare card there.

As Gerd says, we cannot allocate arbitrarily large non-prefetchable space; there's a small, finite range of 32-bit MMIO.  It seems sufficient to me if there are workarounds to make this device hot-pluggable.  This extent of non-prefetchable space is simply not tenable for a default aperture margin.
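
To spot devices like this one before a hotplug test, the host-side `lspci -vv` output can be scanned for non-prefetchable memory BARs, which have to fit into the 32-bit window of the root port. A minimal sketch, assuming the "Region N: Memory at ..." line format shown in comment 24; the helper names are hypothetical:

```python
import re

REGION_RE = re.compile(
    r"Region (\d+): Memory at ([0-9a-f]+) "
    r"\((32|64)-bit, (non-)?prefetchable\) \[size=(\d+)([KMG])\]"
)
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def memory_bars(lspci_text):
    """Return one dict per memory BAR that reports a size."""
    bars = []
    for m in REGION_RE.finditer(lspci_text):
        bars.append({
            "region": int(m.group(1)),
            "width": int(m.group(3)),
            "prefetchable": m.group(4) is None,
            "size": int(m.group(5)) * UNITS[m.group(6)],
        })
    return bars


def non_prefetchable(bars):
    """BARs that must be placed in the 32-bit non-prefetchable window."""
    return [bar for bar in bars if not bar["prefetchable"]]


SAMPLE = """\
        Region 0: I/O ports at 4100 [size=256]
        Region 2: Memory at 9e000000 (64-bit, non-prefetchable) [size=8M]
        Region 4: Memory at a6904000 (64-bit, non-prefetchable) [size=16K]
"""

if __name__ == "__main__":
    needed = non_prefetchable(memory_bars(SAMPLE))
    for bar in needed:
        print(f"Region {bar['region']}: {bar['size'] // 1024} KiB, "
              f"{bar['width']}-bit, non-prefetchable")
    print(f"raw total: {sum(bar['size'] for bar in needed) // 1024} KiB")
```

The raw total ignores BAR alignment and bridge window granularity, so the window actually required is larger than the sum (Gerd estimates 16M above).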

Comment 30 liunana 2023-05-24 05:31:14 UTC
Hi Gerd,

The CPU sanity test failed when booting with old CPU models (Skylake-Client-noTSX, Broadwell-noTSX, Haswell-noTSX or older).
Test Env:
Host:
    edk2-ovmf-20230301gitf80f052277c8-3.el9.bz2174749.20230515.1346.noarch
    5.14.0-316.el9.x86_64
    qemu-kvm-8.0.0-2.el9.x86_64
    intel-eaglestream-spr-07.khw1.lab.eng.bos.redhat.com
    Model name:            Intel(R) Xeon(R) Platinum 8468H
Guest: RHEL9.3

The firmware log shows the following error:
CPU[0BF]  APIC ID=00DF  SMBASE=7FFAF000  SaveState=7FFBEC00  Size=00000400
Stacks                   - 0x7F9B4000
mSmmStackSize            - 0x6000
PcdCpuSmmStackGuard      - 0x1
mXdSupported - 0x1
One Semaphore Size    = 0x40
Total Semaphores Size = 0xC140
PhysicalAddressBits = 48, 5LPageTable = 0.
5LevelPaging Needed             - 0
1GPageTable Support             - 0
PcdCpuSmmRestrictedMemoryAccess - 1
PhysicalAddressBits             - 40
Initialize IDT IST field for SMM Stack Guard
InstallProtocolInterface: 26EEB3DE-B689-492E-80F0-BE8BD7DA4BA7 7FFD4130
SMM IPL registered SMM Entry Point address 7FFEFD89
SmmInstallProtocolInterface: EB346B97-975F-4A9F-8B22-F8E92BB3D569 7FFD4170
SmmInstallProtocolInterface: 69B792EA-39CE-402D-A2A6-F721DE351DFE 7FFD4070
CpuSmm: SpinLock Size = 0x40, PcdCpuSmmMpTokenCountPerChunk = 0x40
SmmInstallProtocolInterface: 5D5450D7-990C-4180-A803-8E63F0608307 7FFD4220
SmmInstallProtocolInterface: 1D202CAB-C8AB-4D5C-94F7-3CFCC0D3D335 7FFD41E0
SmmInstallProtocolInterface: AA00D50B-4911-428F-B91A-A59DDB13E24C 7FFD4020
SMM CPU Module exit from SMRAM with EFI_SUCCESS
SMM IPL closed SMRAM window
CcMeasurementProtocol is not installed. - Not Found
Tcg2Protocol is not installed. - Not Found
None of Tcg2Protocol/CcMeasurementProtocol is installed.
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 7D86F118
SmmInstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 7FFEB6C0
Loading SMM driver at 0x0007F96C000 EntryPoint=0x0007F96F8F7 FvbServicesSmm.efi
QEMU Flash: Attempting flash detection at FFC00010
QemuFlashDetected => FD behaves as FLASH
QemuFlashDetected => Yes
Installing QEMU flash SMM FVB
SmmInstallProtocolInterface: D326D041-BD31-4C01-B5A8-628BE87F0653 7F96BEB0
SmmInstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 7F96BE18
CcMeasurementProtocol is not installed. - Not Found
Tcg2Protocol is not installed. - Not Found
None of Tcg2Protocol/CcMeasurementProtocol is installed.
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 7D86F918
SmmInstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 7FFEB2C0
Loading SMM driver at 0x0007F0B7000 EntryPoint=0x0007F1008BB VariableSmm.efi
mSmmMemLibInternalMaximumSupportAddress = 0xFFFFFFFFFF
VarCheckLibRegisterSetVariableCheckHandler - 0x7F0FC652 Success
VarCheckLibRegisterSetVariableCheckHandler - 0x7F0FADBF Success
Variable driver common space: 0x3FF9C 0x3FF9C 0x3FF9C
Variable driver will work with auth variable format!

ASSERT_EFI_ERROR (Status = Out of Resources)
ASSERT /builddir/build/BUILD/edk2-f80f052277c8/MdeModulePkg/Universal/Variable/RuntimeDxe/VariableSmm.c(1164): !(((INTN)(RETURN_STATUS)(Status)) < 0)


Could you please help with this? Thanks.


Best regards
Nana

Comment 31 liunana 2023-05-24 05:36:01 UTC
Adding the full command line for comment 30.

/usr/libexec/qemu-kvm \
     -S  \
     -name 'avocado-vt-vm1'  \
     -sandbox on  \
     -blockdev '{"node-name": "file_ovmf_code", "driver": "file", "filename": "/usr/share/OVMF/OVMF_CODE.secboot.fd", "auto-read-only": true, "discard": "unmap"}' \
     -blockdev '{"node-name": "drive_ovmf_code", "driver": "raw", "read-only": true, "file": "file_ovmf_code"}' \
     -blockdev '{"node-name": "file_ovmf_vars", "driver": "file", "filename": "/home/filesystem_VARS.fd", "auto-read-only": true, "discard": "unmap"}' \
     -blockdev '{"node-name": "drive_ovmf_vars", "driver": "raw", "read-only": false, "file": "file_ovmf_vars"}' \
     -machine q35,memory-backend=mem-machine_mem,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars \
     -device '{"id": "pcie-root-port-0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x1", "chassis": 1}' \
     -device '{"id": "pcie-pci-bridge-0", "driver": "pcie-pci-bridge", "addr": "0x0", "bus": "pcie-root-port-0"}'  \
     -nodefaults \
     -device '{"driver": "VGA", "bus": "pcie-pci-bridge-0", "addr": "0x1"}' \
     -m 125952 \
     -object '{"size": 132070244352, "id": "mem-machine_mem", "qom-type": "memory-backend-ram"}'  \
     -smp 192,maxcpus=192,cores=96,threads=1,dies=1,sockets=2  \
     -cpu 'Skylake-Client-noTSX-IBRS',enforce,+kvm_pv_unhalt \
     -chardev socket,server=on,wait=off,path=/tmp/monitor-qmpmonitor,id=qmp_id_qmpmonitor1  \
     -mon chardev=qmp_id_qmpmonitor1,mode=control \
     -device '{"ioport": 1285, "driver": "pvpanic", "id": "idVJSPrN"}' \
     -chardev socket,server=on,wait=off,path=/tmp/serial-serial0,id=chardev_serial0 \
     -device '{"id": "serial0", "driver": "isa-serial", "chardev": "chardev_serial0"}'  \
     -device '{"id": "pcie-root-port-1", "port": 1, "driver": "pcie-root-port", "addr": "0x1.0x1", "bus": "pcie.0", "chassis": 2}' \
     -device '{"driver": "qemu-xhci", "id": "usb1", "bus": "pcie-root-port-1", "addr": "0x0"}' \
     -device '{"driver": "usb-tablet", "id": "usb-tablet1", "bus": "usb1.0", "port": "1"}' \
     -device '{"id": "pcie-root-port-2", "port": 2, "driver": "pcie-root-port", "addr": "0x1.0x2", "bus": "pcie.0", "chassis": 3}' \
     -device '{"id": "virtio_scsi_pci0", "driver": "virtio-scsi-pci", "bus": "pcie-root-port-2", "addr": "0x0"}' \
     -blockdev '{"node-name": "file_image1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/images/rhel930-64-virtio-scsi-ovmf.qcow2", "cache": {"direct": true, "no-flush": false}}' \
     -blockdev '{"node-name": "drive_image1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_image1"}' \
     -device '{"driver": "scsi-hd", "id": "image1", "drive": "drive_image1", "write-cache": "on"}' \
     -device '{"id": "pcie-root-port-3", "port": 3, "driver": "pcie-root-port", "addr": "0x1.0x3", "bus": "pcie.0", "chassis": 4}' \
     -device '{"driver": "virtio-net-pci", "mac": "9a:49:ba:f9:e1:88", "id": "idx7j24n", "netdev": "idLn9MLW", "bus": "pcie-root-port-3", "addr": "0x0"}'  \
     -netdev tap,id=idLn9MLW,vhost=on \
     -vnc :0  \
     -rtc base=localtime,clock=host,driftfix=slew  \
     -boot menu=off,order=cdn,once=c,strict=off \
     -enable-kvm \
     -monitor stdio \
     -chardev file,id=firmware,path=/tmp/edk2.log \
     -device isa-debugcon,iobase=0x402,chardev=firmware

Comment 33 Gerd Hoffmann 2023-05-24 06:59:08 UTC
>      -smp 192,maxcpus=192,cores=96,threads=1,dies=1,sockets=2  \

With that many cpus you most likely need more tseg memory.

From docs/interop/firmware.json (qemu.git):
<quote>
#                Furthermore, a large guest-physical address space
#                (comprising guest RAM, memory hotplug range, and 64-bit
#                PCI MMIO aperture), and/or a high VCPU count, may
#                present high SMRAM requirements from the firmware. On
#                the "pc-q35-*" machine types of the @i386 and @x86_64
#                emulation targets, the SMRAM size may be increased
#                above the default 16MB with the "-global
#                mch.extended-tseg-mbytes=uint16" option. As a rule of
#                thumb, the default 16MB size suffices for 1TB of
#                guest-phys address space and a few tens of VCPUs; for
#                every further TB of guest-phys address space, add 8MB
#                of SMRAM. 48MB should suffice for 4TB of guest-phys
#                address space and 2-3 hundred VCPUs.
</quote>

I'd suggest starting with '-global mch.extended-tseg-mbytes=32'.
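
The rule of thumb quoted above can be sketched as a small calculation. This is only an illustration: the function names `estimate_tseg_mbytes` and `tseg_option` are hypothetical, and the sketch models just the address-space part of the rule (16MB for the first TB, 8MB for every further TB). The quoted text does not quantify the VCPU contribution, so for a guest like the one in comment 31 (192 VCPUs, about 123GB of RAM) the estimate stays at the 16MB default even though that is evidently not enough here. Treat the result as a lower bound and raise it for high VCPU counts.

```python
TIB = 1 << 40  # bytes in one TB as used by the rule of thumb (assumed binary TB)


def estimate_tseg_mbytes(guest_phys_bytes: int) -> int:
    """Lower-bound SMRAM size in MB for a given guest-phys address space.

    guest_phys_bytes should cover guest RAM, the memory hotplug range and
    the 64-bit PCI MMIO aperture. The VCPU count is NOT modelled.
    """
    # Round up to whole TB; the first TB is covered by the 16MB default.
    tib = max(1, -(-guest_phys_bytes // TIB))
    return 16 + 8 * (tib - 1)


def tseg_option(mbytes: int) -> str:
    """Format the QEMU option named in the quoted documentation."""
    return f"-global mch.extended-tseg-mbytes={mbytes}"


# RAM size of the guest from comment 31 (address space only, VCPUs ignored).
print(estimate_tseg_mbytes(132070244352))
# The value suggested in this comment.
print(tseg_option(32))
```

For 4TB of address space the sketch yields 40MB, which is consistent with the quoted "48MB should suffice for 4TB ... and 2-3 hundred VCPUs" once some headroom for the VCPUs is added.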

Comment 34 liunana 2023-05-24 12:19:20 UTC
(In reply to Gerd Hoffmann from comment #33)
> >      -smp 192,maxcpus=192,cores=96,threads=1,dies=1,sockets=2  \
> 
> With that many cpus you most likely need more tseg memory.
> 
> From docs/interop/firmware.json (qemu.git):
> <quote>
> #                Furthermore, a large guest-physical address space
> #                (comprising guest RAM, memory hotplug range, and 64-bit
> #                PCI MMIO aperture), and/or a high VCPU count, may
> #                present high SMRAM requirements from the firmware. On
> #                the "pc-q35-*" machine types of the @i386 and @x86_64
> #                emulation targets, the SMRAM size may be increased
> #                above the default 16MB with the "-global
> #                mch.extended-tseg-mbytes=uint16" option. As a rule of
> #                thumb, the default 16MB size suffices for 1TB of
> #                guest-phys address space and a few tens of VCPUs; for
> #                every further TB of guest-phys address space, add 8MB
> #                of SMRAM. 48MB should suffice for 4TB of guest-phys
> #                address space and 2-3 hundred VCPUs.
> </quote>
> 
> I'd suggest to start with '-global mch.extended-tseg-mbytes=32'.

Thanks, the VM works after adding this option to the QEMU command line.
Will this value be set automatically by default in the future?


Best regards
Nana

Comment 35 Gerd Hoffmann 2023-05-25 08:38:38 UTC
> > I'd suggest to start with '-global mch.extended-tseg-mbytes=32'.
> 
> Thanks, the VM works after adding this option to the QEMU command line.
> Will this value be set automatically by default in the future?

The default is 16, not 32.  Usually 16 works fine, but for large VMs it might not be enough.
There are no plans to change the default.

Comment 36 Xueqiang Wei 2023-05-25 17:21:16 UTC
(In reply to liunana from comment #34)
> (In reply to Gerd Hoffmann from comment #33)
> > >      -smp 192,maxcpus=192,cores=96,threads=1,dies=1,sockets=2  \
> > 
> > With that many cpus you most likely need more tseg memory.
> > 
> > From docs/interop/firmware.json (qemu.git):
> > <quote>
> > #                Furthermore, a large guest-physical address space
> > #                (comprising guest RAM, memory hotplug range, and 64-bit
> > #                PCI MMIO aperture), and/or a high VCPU count, may
> > #                present high SMRAM requirements from the firmware. On
> > #                the "pc-q35-*" machine types of the @i386 and @x86_64
> > #                emulation targets, the SMRAM size may be increased
> > #                above the default 16MB with the "-global
> > #                mch.extended-tseg-mbytes=uint16" option. As a rule of
> > #                thumb, the default 16MB size suffices for 1TB of
> > #                guest-phys address space and a few tens of VCPUs; for
> > #                every further TB of guest-phys address space, add 8MB
> > #                of SMRAM. 48MB should suffice for 4TB of guest-phys
> > #                address space and 2-3 hundred VCPUs.
> > </quote>
> > 
> > I'd suggest to start with '-global mch.extended-tseg-mbytes=32'.
> 
> Thanks, the VM works after adding this option to the QEMU command line.
> Will this value be set automatically by default in the future?
> 
> 
> Best regards
> Nana


Hi Nana,

There is already a bug for this: Bug 1866110 - automated TSEG size calculation. Igor set its ITR to 9.4.

Comment 39 Gerd Hoffmann 2023-06-21 09:37:36 UTC
Note to self: when re-enabling also backport this commit:

commit c1e853769046b322690ad336fdb98966757e7414 (github.kraxel/master)
Author: Gerd Hoffmann <kraxel>
Date:   Thu Jun 1 09:57:31 2023 +0200

    OvmfPkg/PlatformInitLib: limit phys-bits to 46.
    
    Older linux kernels have problems with phys-bits larger than 46,
    ubuntu 18.04 (kernel 4.15) has been reported to be affected.
    
    Reduce phys-bits limit from 47 to 46.
    
    Reported-by: Fiona Ebner <f.ebner>
    Signed-off-by: Gerd Hoffmann <kraxel>
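
The effect of the commit above can be illustrated with a short sketch. This is not the edk2 code (which is C in OvmfPkg/PlatformInitLib); the function names are hypothetical, and only the limit values 46 and 47 come from the commit message.

```python
PHYS_BITS_LIMIT = 46  # new limit from the commit message (previously 47)


def clamp_phys_bits(cpuid_phys_bits: int, limit: int = PHYS_BITS_LIMIT) -> int:
    """Return the address width the firmware would use after clamping."""
    return min(cpuid_phys_bits, limit)


def address_space_gib(phys_bits: int) -> int:
    """Size of a phys_bits-wide address space in GiB (valid for phys_bits >= 30)."""
    return (1 << phys_bits) >> 30


for bits in (39, 46, 52):
    used = clamp_phys_bits(bits)
    print(f"CPUID reports {bits} bits, firmware uses {used} bits "
          f"({address_space_gib(used)} GiB)")
```

With the limit lowered from 47 to 46, the largest address space the firmware will use shrinks from 128 TiB to 64 TiB.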

Comment 40 Gerd Hoffmann 2023-06-26 10:46:03 UTC
(In reply to Gerd Hoffmann from comment #15)
> New test build:
> https://kojihub.stream.centos.org/koji/taskinfo?taskID=2216516

Seems to be expired now, new test build (no changes):
https://kojihub.stream.centos.org/koji/taskinfo?taskID=2399507

Comment 42 Gerd Hoffmann 2023-06-29 11:42:35 UTC
New scratch build (on top of the 2023-05 rebase this time):
https://kojihub.stream.centos.org/koji/taskinfo?taskID=2424988

Comment 43 Gerd Hoffmann 2023-06-29 12:27:35 UTC
Testing: should be tested together with the upcoming libvirt-9.5.0 release
(see bug 2171860, I just noticed there are already release candidate builds).

Comment 45 Yanghang Liu 2023-07-05 13:26:12 UTC
(In reply to Alex Williamson from comment #29) 
> > Hi Gerd,
> > 
> > My test result shows the PF can be hot-plugged into domain successfully now.
> > 
> > Test env:
> > edk2-ovmf-20230301gitf80f052277c8-3.el9.bz2174749.20230515.1346.noarch
> > 
> > Test step:
> > (1) start a domain
> > # virt-install --machine=q35 --noreboot --name=rhel93 --memory=4096
> > --vcpus=4 --graphics type=vnc,port=5993,listen=0.0.0.0 --boot=uefi --network
> > bridge=switch,model=virtio,mac=52:54:00:00:93:93 --import --noautoconsole
> > --disk
> > path=/home/images/RHEL93.qcow2,bus=virtio,cache=none,format=qcow2,io=threads,
> > size=20 --osinfo detect=on,require=off
> > 
> > (2) hot-plug a XL710 PF into domain
> > # lspci -s 87:00.0
> > 87:00.0 Ethernet controller: Intel Corporation Ethernet Controller XL710 for
> > 40GbE QSFP+ (rev 02)
> > # /bin/virsh attach-device rhel93 /tmp/device/0000:87:00.0.xml
> > Device attached successfully
> > 
> > (3) check the PF status in the domain
> > # ifconfig 
> > ...
> > enp4s0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
> >         ether 3c:fd:fe:15:90:a8  txqueuelen 1000  (Ethernet)
> >         RX packets 0  bytes 0 (0.0 B)
> >         RX errors 0  dropped 0  overruns 0  frame 0
> >         TX packets 0  bytes 0 (0.0 B)
> >         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
> > # dmesg 
> > [   38.456761] pci 0000:04:00.0: [8086:1583] type 00 class 0x020000
> > [   38.457666] pci 0000:04:00.0: reg 0x10: [mem 0x00000000-0x00ffffff 64bit
> > pref]
> > [   38.458598] pci 0000:04:00.0: reg 0x1c: [mem 0x00000000-0x00007fff 64bit
> > pref]
> > [   38.459323] pci 0000:04:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
> > [   38.459874] pci 0000:04:00.0: Max Payload Size set to 128 (was 256, max
> > 2048)
> > [   38.471403] pci 0000:04:00.0: BAR 0: assigned [mem
> > 0x381800000000-0x381800ffffff 64bit pref]
> > [   38.472489] pci 0000:04:00.0: BAR 6: assigned [mem 0xc2400000-0xc247ffff
> > pref]
> > [   38.472496] pci 0000:04:00.0: BAR 3: assigned [mem
> > 0x381801000000-0x381801007fff 64bit pref]
> > [   38.514703] i40e: Intel(R) Ethernet Connection XL710 Network Driver
> > [   38.514705] i40e: Copyright (c) 2013 - 2019 Intel Corporation.
> > [   38.515275] i40e 0000:04:00.0: enabling device (0000 -> 0002)
> > [   38.536779] i40e 0000:04:00.0: fw 9.80.70867 api 1.15 nvm 9.00 0x8000cadc
> > 21.5.9 [8086:1583] [8086:0006]
> > [   38.603332] i40e 0000:04:00.0: MAC address: 3c:fd:fe:15:90:a8
> > [   38.604352] i40e 0000:04:00.0: FW LLDP is enabled
> > [   38.613157] i40e 0000:04:00.0: PCI-Express: Speed 8.0GT/s Width x8
> > [   38.613832] i40e 0000:04:00.0: Features: PF-id[0] VFs: 64 VSIs: 66 QP: 4
> > RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA
> > [   38.625170] i40e 0000:04:00.0 enp4s0: renamed from eth0
> > 
> > The full domain dmesg of different PFs is as following: 
> > http://10.73.72.41/log/bug/Bug2174749/2023_05_15_08:15:19_XL710
> > http://10.73.72.41/log/bug/Bug2174749/2023_05_15_08:30:25_82599ES
> > http://10.73.72.41/log/bug/Bug2174749/2023_05_15_08:58:13_MT2892
> > http://10.73.72.41/log/bug/Bug2174749/2023_05_15_09:01:27_QL41112
> 
> 
> 
> Hi Gerd,
> 
> My test result shows the SFC9220 PF still cannot be hot-plugged into the domain
> even with
> edk2-ovmf-20230301gitf80f052277c8-3.el9.bz2174749.20230515.1346.noarch.
> 
> The full domain dmesg:  
> http://10.73.72.41/log/bug/Bug2174749/2023_05_22_17:08:19_SFC9220_BZ
> 
> Only the SFC9220 PF test fails currently.
> 


My test results show that the fix in the current edk2-ovmf build does not cover the scenario of hot-plugging an sfc PF/VF.

May I ask whether we plan to fix it? If not, can we ask the doc team to draft a known issue for it?

Comment 46 Gerd Hoffmann 2023-07-06 08:37:45 UTC
> My test results show that the fix in the current edk2-ovmf build does not
> cover the scenario of hot-plugging an sfc PF/VF.
> 
> May I ask whether we plan to fix it? If not, can we ask the doc team to
> draft a known issue for it?

See comment 25: there is no easy automatic way to handle non-prefetchable BARs,
so we'll continue to depend on manual configuration of the bridge windows.

Comment 48 Yanghang Liu 2023-07-11 06:41:28 UTC
(In reply to Gerd Hoffmann from comment #46)
> > My test results show that the fix in the current edk2-ovmf build does not
> > cover the scenario of hot-plugging an sfc PF/VF.
> > 
> > May I ask whether we plan to fix it? If not, can we ask the doc team to
> > draft a known issue for it?
> 
> See comment 25: there is no easy automatic way to handle non-prefetchable BARs,
> so we'll continue to depend on manual configuration of the bridge windows.


Hi Gerd, 

I have checked the SFC9220 PF and VF capabilities; their memory BARs are all non-prefetchable.

For the bugs whose scenario is hot-plugging an sfc PF/VF into a VM, we can close them as WONTFIX, am I right?


Those bugs include:
Bug 2209571 - [sfc] no VF interface in the VM after attached the VF to the VM
Bug 2137782 - [sfc] could not enable MSI-X & failed to create NIC


# lspci -v -s 0000:1a:00.1
1a:00.1 Ethernet controller: Solarflare Communications SFC9220 10/40G Ethernet Controller (rev 02)
        Subsystem: Solarflare Communications SFN8522-R2 8000 Series 10G Adapter
        Flags: bus master, fast devsel, latency 0, IRQ 363, NUMA node 0, IOMMU group 34
        I/O ports at 4000 [size=256]
        Memory at 9d800000 (64-bit, non-prefetchable) [size=8M]
        Memory at a6800000 (64-bit, non-prefetchable) [size=16K]
        Expansion ROM at a6a80000 [disabled] [size=256K]
        Capabilities: [40] Power Management version 3
        Capabilities: [70] Express Endpoint, MSI 00
        Capabilities: [b0] MSI-X: Enable+ Count=32 Masked-
        Capabilities: [d0] Vital Product Data
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [148] Device Serial Number 00-0f-53-ff-ff-4d-8c-30
        Capabilities: [158] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [198] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [1d8] Transaction Processing Hints
        Kernel driver in use: sfc
        Kernel modules: sfc


# lspci -v -s 0000:1a:08.2
1a:08.2 Ethernet controller: Solarflare Communications SFC9220 10/40G Ethernet Controller (Virtual Function) (rev 02)
        Subsystem: Solarflare Communications Device 8017
        Flags: bus master, fast devsel, latency 0, NUMA node 0, IOMMU group 188
        Memory at 9e800000 (64-bit, non-prefetchable) [virtual] [size=1M]
        Memory at a6804000 (64-bit, non-prefetchable) [virtual] [size=16K]
        Capabilities: [70] Express Endpoint, MSI 00
        Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
        Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [110] Transaction Processing Hints
        Kernel driver in use: sfc
        Kernel modules: sfc

Comment 49 Yanan Fu 2023-07-11 09:15:12 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 50 Gerd Hoffmann 2023-07-11 11:56:15 UTC
> I have checked the SFC9220 PF and VF capabilities; their memory BARs are all
> non-prefetchable.

> # lspci -v -s 0000:1a:00.1
> 1a:00.1 Ethernet controller: Solarflare Communications SFC9220 10/40G
> Ethernet Controller (rev 02)

>         Memory at 9d800000 (64-bit, non-prefetchable) [size=8M]

This is the PF, I assume?
The non-prefetchable bridge window size is 2M, so this 8M BAR is too big.

This requires manual pcie root port configuration using the
mem-reserve= property as discussed previously.

> # lspci -v -s 0000:1a:08.2
> 1a:08.2 Ethernet controller: Solarflare Communications SFC9220 10/40G
> Ethernet Controller (Virtual Function) (rev 02)

>         Memory at 9e800000 (64-bit, non-prefetchable) [virtual] [size=1M]
>         Memory at a6804000 (64-bit, non-prefetchable) [virtual] [size=16K]

This is probably the VF.
This should work; these two PCI BARs should fit into the 2M bridge window.
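
The fit check described above can be sketched in a few lines. This is a simplified model, not what the firmware or the kernel actually runs: it assumes BARs are power-of-two sized, naturally aligned, and packed largest-first into a window whose base is itself suitably aligned. The function name `bars_fit_window` is hypothetical; the BAR sizes are the non-prefetchable memory BARs from the lspci output in comment 48, and the 2M window size is the one stated in this comment.

```python
KIB = 1 << 10
MIB = 1 << 20


def bars_fit_window(bar_sizes, window_size):
    """Return True if naturally aligned BARs can be packed into the window."""
    offset = 0
    for size in sorted(bar_sizes, reverse=True):
        # Natural alignment: each BAR starts at a multiple of its own size.
        offset = (offset + size - 1) // size * size
        offset += size
    return offset <= window_size


pf_bars = [8 * MIB, 16 * KIB]  # SFC9220 PF, non-prefetchable memory BARs
vf_bars = [1 * MIB, 16 * KIB]  # SFC9220 VF, non-prefetchable memory BARs

print(bars_fit_window(pf_bars, 2 * MIB))   # False: the 8M BAR alone exceeds 2M
print(bars_fit_window(vf_bars, 2 * MIB))   # True: 1M + 16K fits into 2M
print(bars_fit_window(pf_bars, 16 * MIB))  # True: a larger reserved window is enough
```

The last call shows why a manually enlarged bridge window helps the PF case; the 16M value is only an example size for this sketch.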

Comment 51 Guo, Zhiyi 2023-07-11 13:36:04 UTC
Tested GPU passthrough with edk2-20230524-2.el9 and GPU devices with large video memory; I don't see any problems.

The devices I used:
Passthrough of 4x Nvidia V100 GPUs into a single RHEL 9.3 VM and a single Windows 11 VM on an Intel host.
The GPU:
61:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
        Subsystem: NVIDIA Corporation Device 1249
        Flags: bus master, fast devsel, latency 0, IRQ 318, NUMA node 0, IOMMU group 5
        Memory at c4000000 (32-bit, non-prefetchable) [size=16M]
        Memory at 3bf000000000 (64-bit, prefetchable) [size=32G]
        Memory at 3bf800000000 (64-bit, prefetchable) [size=32M]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [258] L1 PM Substates
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [420] Advanced Error Reporting
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] Secondary PCI Express
        Capabilities: [ac0] Designated Vendor-Specific: Vendor=10de ID=0001 Rev=1 Len=12 <?>
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_vgpu_vfio, nvidia

The host cpu:
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  80
  On-line CPU(s) list:   0-79
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        Intel(R) Corporation
  Model name:            Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz
    BIOS Model name:     Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz

Passthrough of 1x Nvidia A100 GPU into a single RHEL 9.3 VM and a single Windows 11 VM on an AMD host.
The GPU:
41:00.0 3D controller: NVIDIA Corporation GA100 [A100 PCIe 40GB] (rev a1)
        Subsystem: NVIDIA Corporation Device 145f
        Physical Slot: 1
        Flags: bus master, fast devsel, latency 0, IRQ 211, IOMMU group 43
        Memory at b0000000 (32-bit, non-prefetchable) [size=16M]
        Memory at 26000000000 (64-bit, prefetchable) [size=64G]
        Memory at 28020000000 (64-bit, prefetchable) [size=32M]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] Null
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [c8] MSI-X: Enable+ Count=6 Masked-
        Capabilities: [100] Virtual Channel
        Capabilities: [258] L1 PM Substates
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [420] Advanced Error Reporting
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] Secondary PCI Express
        Capabilities: [bb0] Physical Resizable BAR
        Capabilities: [bcc] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [c14] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [c1c] Physical Layer 16.0 GT/s <?>
        Capabilities: [d00] Lane Margining at the Receiver <?>
        Capabilities: [e00] Data Link Feature <?>
        Kernel driver in use: vfio-pci
        Kernel modules: nouveau, nvidia_vgpu_vfio, nvidia

The host cpu:
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  16
  On-line CPU(s) list:   0-15
Vendor ID:               AuthenticAMD
  BIOS Vendor ID:        Advanced Micro Devices, Inc.
  Model name:            AMD EPYC 7262 8-Core Processor
    BIOS Model name:     AMD EPYC 7262 8-Core Processor

These GPUs are identified correctly by the nvidia driver inside both the RHEL and Windows VMs, and the GPU function tests also pass.

qemu & kernel I used: qemu-kvm-8.0.0-7.el9.x86_64 & 5.14.0-335.el9.x86_64

Comment 52 Paolo Bonzini 2023-07-12 12:18:57 UTC
> Host CPU info:
> address sizes	: 43 bits physical, 48 bits virtual
>
> Guest CPU info:
> address sizes	: 48 bits physical, 48 bits virtual

The difference is due to the memory encryption bits.  The size of the address space is 43 bits, but the size of the *physical address field in page tables* is 48 bits on both host and guest.

Because memory encryption is not enabled in the guest, the address space reduction due to encryption is not included in /proc/cpuinfo, which only reports the size of the physical address field in page tables.

So it's ugly, but it's expected.
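
The arithmetic behind this can be written down as a tiny sketch. The function name is hypothetical, and the reduction of 5 bits is simply inferred from the two numbers quoted above (48 - 43); it was not read from the hardware.

```python
def usable_phys_bits(field_bits: int, encryption_reduction: int) -> int:
    """Address-space width left after memory encryption claims some bits.

    field_bits: width of the physical address field in the page tables.
    encryption_reduction: bits taken by memory encryption (0 if disabled).
    """
    if not 0 <= encryption_reduction <= field_bits:
        raise ValueError("reduction must be between 0 and field_bits")
    return field_bits - encryption_reduction


# Host: encryption enabled, so the address space is smaller than the field.
print(usable_phys_bits(48, 5))
# Guest: encryption not enabled, so the full field width is reported.
print(usable_phys_bits(48, 0))
```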

Comment 59 Yanghang Liu 2023-07-24 03:26:07 UTC
Test env: edk2-20230524-2.el9

Test result:

2023-07-14 18:14:55 | PASS - hot plug 1 MT2892 pf into rhel93 domain
2023-07-14 18:17:06 | PASS - hot plug 2 MT2892 pf into rhel93 domain
2023-07-14 18:09:48 | PASS - hot plug 1 MT2892 vf into rhel93 domain
2023-07-14 18:12:50 | PASS - hot plug 7 MT2892 vf into rhel93 domain

2023-07-14 18:22:19 | PASS - hot plug 1 QL41112 pf into rhel93 domain
2023-07-14 18:27:46 | PASS - hot plug 2 QL41112 pf into rhel93 domain
2023-07-17 11:08:12 | PASS - hot plug 1 QL41112 vf into rhel93 domain
2023-07-17 11:14:37 | PASS - hot plug 7 QL41112 vf into rhel93 domain

2023-07-14 18:20:39 | PASS - hot plug 1 82599ES vf into rhel93 domain
2023-07-14 18:23:32 | PASS - hot plug 7 82599ES vf into rhel93 domain
2023-07-14 18:16:27 | PASS - hot plug 1 82599ES pf into rhel93 domain
2023-07-14 18:18:39 | PASS - hot plug 2 82599ES pf into rhel93 domain

2023-07-14 18:11:45 | PASS - hot plug 1 E810 vf into rhel93 domain
2023-07-14 18:14:36 | PASS - hot plug 7 E810 vf into rhel93 domain
2023-07-14 18:07:52 | PASS - hot plug 1 E810 pf into rhel93 domain
2023-07-14 18:09:52 | PASS - hot plug 2 E810 pf into rhel93 domain

2023-07-17 10:57:10 | PASS - hot plug 1 XXV710 pf into rhel93 domain
2023-07-17 11:02:17 | PASS - hot plug 2 XXV710 pf into rhel93 domain
2023-07-17 11:07:24 | PASS - hot plug 1 XXV710 vf into rhel93 domain
2023-07-17 11:13:18 | PASS - hot plug 7 XXV710 vf into rhel93 domain

2023-07-19 02:43:01 | PASS - hot plug 1 SFC9220 pf to rhel93 domain whose pcie-root-port is 16M
2023-07-19 02:48:23 | PASS - hot plug 2 SFC9220 pf into rhel93 domain whose pcie-root-port is 16M
2023-07-17 11:04:35 | PASS - hot plug 1 SFC9220 vf to rhel93 domain
2023-07-17 11:10:41 | PASS - hot plug 7 SFC9220 vf into rhel93 domain

Comment 60 liunana 2023-07-24 05:47:12 UTC
CPU model sanity test on AMD Genoa PASS.

Test Env:
    edk2-ovmf-20230524-2.el9.noarch
    5.14.0-341.el9.x86_64
    qemu-kvm-8.0.0-8.el9.x86_64
    libvirt-client-9.5.0-3.el9.x86_64
    amd-genoa-02.khw1.lab.eng.bos.redhat.com
Guest: latest rhel9.3

Comment 61 liunana 2023-07-24 07:34:19 UTC
Other CPU sanity tests on Intel Icelake PASS:

Test results:
https://beaker.engineering.redhat.com/recipes/14283495#task163423385,task163423386

Comment 63 Xueqiang Wei 2023-08-01 14:01:31 UTC
Ran the following test loops; no new bugs were found.

Versions:
kernel-5.14.0-333.el9.x86_64
qemu-kvm-8.0.0-6.el9
edk2-ovmf-20230524-2.el9.noarch

1. Qemu_gating_test_rhel9
Job link: http://fileshare.hosts.qa.psi.pek2.redhat.com/pub/logs/qemu_gating_test_rhel9_with_edk2-ovmf-20230524-2.el9/results.html
2. Rhel8.6, rhel8.7, rhel8.8, rhel8.9, rhel9.0, rhel9.1, rhel9.2 secure boot with edk2-20230524-2.el9
Job link: http://fileshare.hosts.qa.psi.pek2.redhat.com/pub/logs/rhel860-rhel920_secure_boot_with_edk2-20230524-2.el9/results.html
3. win11_secure_boot_with_edk2-20230524-2.el9
Job link: http://fileshare.hosts.qa.psi.pek2.redhat.com/pub/logs/win11_secure_boot_with_edk2-20230524-2.el9/results.html
4. edk2_test_loop_on_intel_host
Job link: http://fileshare.hosts.qa.psi.pek2.redhat.com/pub/logs/edk2_test_with_edk2-20230524-2.el9/results.html
5. Edk2_test_loop_on_amd_host
Job link: http://fileshare.hosts.qa.psi.pek2.redhat.com/pub/logs/edk2_test_with_edk2-20230524-2.el9_amd_host/results.html
Existing bug: Bug 2168446 - Booting VM failed on AMD EPYC 7252 host with npt=0



Tested the host-phys-bits-limit parameter on an AMD host and an Intel host; hit one low-priority issue, which is tracked by RHEL-917.

Details:
Test on an AMD host
check host phys-bits
# lscpu | grep "Address sizes"
Address sizes:                   43 bits physical, 48 bits virtual

1. host-phys-bits-limit testing with -1 (i.e. host-phys-bits=on,host-phys-bits-limit=-1)
QEMU reports the error: (qemu) qemu-kvm: can't apply global EPYC-Rome-x86_64-cpu.host-phys-bits-limit=-1: Parameter 'host-phys-bits-limit' expects uint8_t
2. host-phys-bits-limit testing with 1 (i.e. host-phys-bits=on,host-phys-bits-limit=1)
QEMU reports the error: qemu-kvm: phys-bits should be between 32 and 52  (but is 1)
3. host-phys-bits-limit testing with 36 (i.e. host-phys-bits=on,host-phys-bits-limit=36)
The guest boots successfully, and the guest phys-bits is 36
# lscpu | grep "Address sizes"
Address sizes:                   36 bits physical, 48 bits virtual
4. host-phys-bits-limit testing with 40 (i.e. host-phys-bits=on,host-phys-bits-limit=40)
The guest boots successfully, and the guest phys-bits is 40
# lscpu | grep "Address sizes"
Address sizes:                   40 bits physical, 48 bits virtual
5. host-phys-bits-limit testing with 43 (i.e. host-phys-bits=on,host-phys-bits-limit=43)
The guest boots successfully, and the guest phys-bits is 43
# lscpu | grep "Address sizes"
Address sizes:                   43 bits physical, 48 bits virtual
6. host-phys-bits-limit testing with 48 (i.e. host-phys-bits=on,host-phys-bits-limit=48)
The guest boots successfully, and the guest phys-bits is 48
# lscpu | grep "Address sizes"
Address sizes:                   48 bits physical, 48 bits virtual
The edk2 debug log shows:
PlatformAddressWidthFromCpuid: Signature: 'AuthenticAMD', PhysBits: 48, QemuQuirk: On, Valid: Yes
PlatformAddressWidthFromCpuid: limit PhysBits to 46 (avoid 5-level paging)
7. host-phys-bits-limit testing with 53 (i.e. host-phys-bits=on,host-phys-bits-limit=53)
The guest boots successfully with no error message; tracked by RHEL-917.


Test on an intel host
check host phys-bits
# lscpu | grep "Address sizes"
Address sizes:                   46 bits physical, 57 bits virtual

1. host-phys-bits-limit testing with -1 (i.e. host-phys-bits=on,host-phys-bits-limit=-1)
QEMU reports the error: (qemu) qemu-kvm: can't apply global Icelake-Server-x86_64-cpu.host-phys-bits-limit=-1: Parameter 'host-phys-bits-limit' expects uint8_t
2. host-phys-bits-limit testing with 1 (i.e. host-phys-bits=on,host-phys-bits-limit=1)
QEMU reports the error: qemu-kvm: phys-bits should be between 32 and 52  (but is 1)
3. host-phys-bits-limit testing with 36 (i.e. host-phys-bits=on,host-phys-bits-limit=36)
QEMU reports the error: (qemu) qemu-kvm: Address space limit 0xfffffffff < 0x17bfffffff phys-bits too low (36)
4. host-phys-bits-limit testing with 39 (i.e. host-phys-bits=on,host-phys-bits-limit=39)
The guest boots successfully, and the guest phys-bits is 39
# lscpu |grep "Address sizes"
Address sizes:                   39 bits physical, 57 bits virtual
5. host-phys-bits-limit testing with 46 (i.e. host-phys-bits=on,host-phys-bits-limit=46)
The guest boots successfully, and the guest phys-bits is 46
# lscpu |grep "Address sizes"
Address sizes:                   46 bits physical, 57 bits virtual
6. host-phys-bits-limit testing with 52 (i.e. host-phys-bits=on,host-phys-bits-limit=52)
The guest boots successfully, and the guest phys-bits is 46
# lscpu | grep "Address sizes"
Address sizes:                   46 bits physical, 57 bits virtual
7. host-phys-bits-limit testing with 53 (i.e. host-phys-bits=on,host-phys-bits-limit=53)
The guest boots successfully with no error message; tracked by RHEL-917.


Test on an intel host with 52 phys-bits
# lscpu |grep Address
Address sizes:                   52 bits physical, 57 bits virtual

1. host-phys-bits-limit testing with 36 (i.e. host-phys-bits=on,host-phys-bits-limit=36)
The guest boots successfully, and the guest phys-bits is 36.
# lscpu | grep "Address sizes"
Address sizes:                   36 bits physical, 57 bits virtual
2. host-phys-bits-limit testing with 52 (i.e. host-phys-bits=on,host-phys-bits-limit=52)
The guest boots successfully, and the guest phys-bits is 52.
# lscpu |grep "Address sizes"
Address sizes:                   52 bits physical, 57 bits virtual
The edk2 debug log shows:
PlatformAddressWidthFromCpuid: Signature: 'GenuineIntel', PhysBits: 52, QemuQuirk: On, Valid: Yes
PlatformAddressWidthFromCpuid: limit PhysBits to 46 (avoid 5-level paging)
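
The QEMU-side results above are consistent with a simple model: the guest gets the host's phys-bits, capped by host-phys-bits-limit, and the result must fall between 32 and 52. The sketch below encodes that model so it can be checked against the numbers in this comment. It is not QEMU's code: the function name is hypothetical, treating a limit of 0 as "no limit" is an assumption, the error texts are paraphrased, and the separate "Address space limit ... phys-bits too low" check (which depends on the guest memory layout) is not modelled.

```python
def guest_phys_bits(host_phys_bits: int, limit: int) -> int:
    """Model of host-phys-bits=on combined with host-phys-bits-limit=<limit>."""
    # The property is a uint8_t, so values outside 0..255 are rejected up front.
    if not 0 <= limit <= 255:
        raise ValueError("'host-phys-bits-limit' expects uint8_t")
    phys_bits = host_phys_bits
    # Assumption: 0 means "no limit"; otherwise the limit only ever lowers the value.
    if limit and phys_bits > limit:
        phys_bits = limit
    if not 32 <= phys_bits <= 52:
        raise ValueError(f"phys-bits should be between 32 and 52 (but is {phys_bits})")
    return phys_bits


# Intel host with 46 phys-bits: limits above 46 have no effect, including 53.
for limit in (39, 46, 52, 53):
    print(limit, guest_phys_bits(46, limit))
```

Under this model a limit of 53 is silently accepted because it never lowers the host value, which matches the behaviour observed in step 7 on both hosts. For the AMD host the model has to be fed 48 rather than the 43 reported by lscpu, in line with the explanation in comment 52.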

Comment 64 Li Xiaohui 2023-08-02 10:12:54 UTC
Migrated between hosts with the physical address sizes listed below, using qemu-kvm-8.0.0-10.el9.x86_64 and edk2-ovmf-20230524-2.el9.noarch; all pass.


1. Intel
39 <-> 46
46 <-> 52
39 <-> 52

2. AMD
43 <-> 48

Comment 65 Xueqiang Wei 2023-08-02 14:40:35 UTC
Thank you Zhiyi, Yanbin, Yanghang, Nana, Mario and Xiaohui. Many thanks.
Based on Comment 51, Comment 57, Comment 59, Comment 60, Comment 61, Comment 62, Comment 63 and Comment 64, setting status to VERIFIED.

Comment 68 errata-xmlrpc 2023-11-07 08:24:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: edk2 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6330

