Bug 1975776 - [sfc][pc-i440fx] the pc + sfc9220 pf vm keeps rebooting after starting it
Summary: [sfc][pc-i440fx] the pc + sfc9220 pf vm keeps rebooting after starting it
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: beta
Target Release: ---
Assignee: Laszlo Ersek
QA Contact: Yanghang Liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-06-24 11:49 UTC by Yanghang Liu
Modified: 2023-12-27 08:01 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-09 12:04:15 UTC
Type: ---
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments
proposed patch: vfio/pci: hide ROM BAR on SFC9220 10/40G Ethernet Controller PF (1.79 KB, patch)
2023-07-24 20:03 UTC, Laszlo Ersek

Description Yanghang Liu 2021-06-24 11:49:44 UTC
Description of problem:
The pc + SFC9220 PF VM keeps rebooting after it is started.

Version-Release number of selected component (if applicable):
host:
4.18.0-316.el8.x86_64
qemu-kvm-6.0.0-20.module+el8.5.0+11499+199527ef.x86_64
guest:
4.18.0-314.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start a pc + SFC9220 PF VM.
A simplified QEMU command line is as follows:
/usr/libexec/qemu-kvm -name rhel85 -M pc -enable-kvm \
-monitor stdio \
-nodefaults \
-m 4G \
-boot menu=on \
-cpu Haswell-noTSX-IBRS \
-smp 8  \
-qmp tcp:0:5555,server,nowait \
-blockdev node-name=back_image,driver=file,cache.direct=on,cache.no-flush=off,filename=/nfsmount/migra_test/RHEL85.qcow2,aio=threads \
-blockdev node-name=drive-virtio-disk0,driver=qcow2,cache.direct=on,cache.no-flush=off,file=back_image \
-device virtio-blk-pci,drive=drive-virtio-disk0,id=disk0 \
-vnc :0 \
-device vfio-pci,host=0000:1a:00.0 \
-device VGA,id=video1

The full domain xml is in the attachment.


2. Check the QMP log:

# telnet 10.73.73.75 5555
Trying 10.73.73.75...
Connected to 10.73.73.75.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 0, "major": 6}, "package": "qemu-kvm-6.0.0-20.module+el8.5.0+11499+199527ef"}, "capabilities": ["oob"]}}
{"execute":"qmp_capabilities"}
{"return": {}}
{"timestamp": {"seconds": 1624532243, "microseconds": 998053}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532244, "microseconds": 112540}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532244, "microseconds": 567002}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532244, "microseconds": 680579}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532245, "microseconds": 134119}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532245, "microseconds": 248491}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532245, "microseconds": 693982}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532245, "microseconds": 808532}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532246, "microseconds": 255050}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532246, "microseconds": 368558}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532246, "microseconds": 814327}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532246, "microseconds": 928571}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532247, "microseconds": 375161}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532247, "microseconds": 496590}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532247, "microseconds": 942182}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532248, "microseconds": 56401}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532248, "microseconds": 503101}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532248, "microseconds": 616519}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532249, "microseconds": 70171}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532249, "microseconds": 184489}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532249, "microseconds": 631180}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532249, "microseconds": 744558}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532250, "microseconds": 190046}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532250, "microseconds": 304532}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532250, "microseconds": 751186}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532250, "microseconds": 864547}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532251, "microseconds": 310195}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532251, "microseconds": 424525}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532251, "microseconds": 871193}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532251, "microseconds": 984608}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532252, "microseconds": 430129}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532252, "microseconds": 544552}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532252, "microseconds": 999249}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532253, "microseconds": 112509}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532253, "microseconds": 567132}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532253, "microseconds": 680611}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532254, "microseconds": 133474}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532254, "microseconds": 248560}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532254, "microseconds": 694077}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532254, "microseconds": 808622}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532255, "microseconds": 261330}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532255, "microseconds": 376606}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532255, "microseconds": 829089}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532255, "microseconds": 944600}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532256, "microseconds": 398067}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532256, "microseconds": 512623}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532256, "microseconds": 965329}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532257, "microseconds": 80640}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532257, "microseconds": 525970}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1624532257, "microseconds": 640588}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
...
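A reboot loop like this can be spotted mechanically rather than by eye. The following is a minimal C sketch, assuming each QMP event arrives on its own line as in the capture above; the function name is hypothetical and not part of QEMU or any QMP client library, and it uses plain substring matching rather than a JSON parser.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts QMP "RESET" events whose reason is
 * "guest-reset" among n lines of QMP output. Substring matching only:
 * it assumes one event per line, formatted the way QEMU printed it in
 * the log above (with a space after each colon). */
size_t count_guest_resets(const char *const *lines, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (strstr(lines[i], "\"event\": \"RESET\"") != NULL &&
            strstr(lines[i], "\"reason\": \"guest-reset\"") != NULL)
            count++;
    }
    return count;
}
```

Dozens of guest-reset events within seconds of startup, as above, point to a reset loop rather than a single guest-initiated reboot.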


Actual results:
The VM cannot be started (it keeps rebooting).

Expected results:
The VM starts successfully.


Additional info:
(1) The same VM starts successfully with an MT2892 PF (mlx5_core), an XL710 PF (i40e), or an 82599ES (ixgbe).

(2)
# lshw -c network -businfo
Bus info          Device     Class          Description
=======================================================
pci@0000:1a:00.0  ens6f0np0  network        SFC9220 10/40G Ethernet Controller
pci@0000:1a:00.1  ens6f1np1  network        SFC9220 10/40G Ethernet Controller


# lspci -v -s 1a:00.0
1a:00.0 Ethernet controller: Solarflare Communications SFC9220 10/40G Ethernet Controller (rev 02)
	Subsystem: Solarflare Communications SFN8522-R2 8000 Series 10G Adapter
	Flags: bus master, fast devsel, latency 0, IRQ 86, NUMA node 0, IOMMU group 33
	I/O ports at 4100 [size=256]
	Memory at 9e000000 (64-bit, non-prefetchable) [size=8M]
	Memory at a6904000 (64-bit, non-prefetchable) [size=16K]
	Expansion ROM at a6a40000 [disabled] [size=256K]
	Capabilities: [40] Power Management version 3
	Capabilities: [70] Express Endpoint, MSI 00
	Capabilities: [b0] MSI-X: Enable+ Count=32 Masked-
	Capabilities: [d0] Vital Product Data
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [148] Device Serial Number 00-0f-53-ff-ff-4d-8c-30
	Capabilities: [158] Alternative Routing-ID Interpretation (ARI)
	Capabilities: [168] Secondary PCI Express
	Capabilities: [198] Single Root I/O Virtualization (SR-IOV)
	Capabilities: [1d8] Transaction Processing Hints
	Capabilities: [26c] L1 PM Substates
	Kernel driver in use: sfc
	Kernel modules: sfc


# ethtool -i ens6f0np0
driver: sfc
version: 4.18.0-316.el8.x86_64
firmware-version: 8.0.0.1015 rx0 tx0
expansion-rom-version: 
bus-info: 0000:1a:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no

Comment 2 Yanghang Liu 2021-06-24 12:03:02 UTC
The bug can also be reproduced with libvirt-7.4.0-1.module+el8.5.0.

The full xml is in the attachment.

Comment 4 John Ferlan 2021-07-07 18:28:56 UTC
Ariel - this looks like a RHEL 8 virt-networking bug from the needs-to-be-triaged virt-maint backlog.

Comment 6 Laurent Vivier 2021-07-08 16:40:46 UTC
Yanghang,

is this a regression?
Could you try to retrieve some guest logs?

I think it would be easier to capture the logs if you remove the VGA card and use a serial console
(remove "-vnc :0 -device VGA,id=video1", add "-nographic -serial stdio").

But at first glance, this looks more like a VFIO problem than a networking problem.

Comment 7 Yanghang Liu 2021-07-12 06:55:35 UTC
> is this a regression?


The problem also reproduces with all of the following guest kernel versions:

  7.9 -- 3.10.0-1160.el7.x86_64

  8.0 -- 4.18.0-80.11.1.el8_0.x86_64 

  8.1 -- 4.18.0-147.el8.x86_64  

  8.2 -- 4.18.0-193.el8.x86_64  

  8.3 -- 4.18.0-240.el8.x86_64

  8.4 -- 4.18.0-305.el8.x86_64


> Could you try to retrieve some guest logs?

> I think if you remove the VGA card and use a serial console it would be easier to take the logs 
> (remove "-vnc :0 -device VGA,id=video1", add "-nographic -serial stdio")

I tried the following QEMU command-line options to connect to the guest console:

    -chardev socket,id=charserial0,path=/tmp/testserial,server=on,wait=off \
    -device isa-serial,chardev=charserial0,id=serial0 \


However, I could not get any further guest logs.

The VM keeps restarting at the following startup screen (a screenshot of the VM is attached):

    # nc -U /tmp/testserial

    SeaBIOS (version 1.13.0-2.module+el8.3.0+7353+9de0a3cc)
    Solarflare Boot Manager (v5.2.2.1006)
    Solarflare Communications 2008-2019
    gPXE (http://etherboot.org) - 00:03.0 C000 PCI2.10 PnP PMM+BFF90A30+BFED0A30 C000
    Solarflare starting execution...  

If my understanding is not correct, please feel free to let me know.

Comment 8 Laurent Vivier 2021-07-15 17:45:36 UTC
Alex,

it sounds more like a VFIO problem than a virt networking one; could you have a look?

Thanks

Comment 9 Alex Williamson 2021-07-15 18:14:51 UTC
(In reply to Yanghang Liu from comment #7)
> > is this a regression?
> 
> 
> All the following kernel versions of vm can also reproduce this problem.
> 
>   7.9 -- 3.10.0-1160.el7.x86_64
> 
>   8.0 -- 4.18.0-80.11.1.el8_0.x86_64 
> 
>   8.1 -- 4.18.0-147.el8.x86_64  
> 
>   8.2 -- 4.18.0-193.el8.x86_64  
> 
>   8.3 -- 4.18.0-240.el8.x86_64
> 
>   8.4 -- 4.18.0-305.el8.x86_64

So this has never worked, we have no customer issues related to it, and this device supports SR-IOV, so VF assignment would be a much more typical use case. Is PF assignment even supported by the vendor for this device?

>     # nc -U /tmp/testserial
> 
>     SeaBIOS (version 1.13.0-2.module+el8.3.0+7353+9de0a3cc)
>     Solarflare Boot Manager (v5.2.2.1006)
>     Solarflare Communications 2008-2019
>     gPXE (http://etherboot.org) - 00:03.0 C000 PCI2.10 PnP PMM+BFF90A30+BFED0A30 C000
>     Solarflare starting execution...  

Looks like we're still in the BIOS, executing the PCI option ROM. Does adding "rombar=0" to the vfio-pci device options improve anything? We've seen other cases where the ROM code can access a device in non-standard ways and manage to get a host physical address (HPA) rather than a guest physical address (GPA), but the VM IOVA space prevents access, which could cause an exception in the ROM execution and trigger a reset. Our choices at that point are to either exclude default use of the ROM on this device or reverse engineer the access to virtualize it, as has been done for consumer GPU assignment.
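For reference, a minimal C sketch of how the suggested option attaches to the vfio-pci device from the reproducer command line. Both helper names are hypothetical; the only property involved is the "rombar=0" option named in this comment, and the host address is the one used in comment 0.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats the value of a "-device" option that
 * assigns host PCI device `host` through vfio-pci. With expose_rom_bar
 * false it appends "rombar=0", so the guest sees no ROM BAR and its
 * firmware never runs the device's option ROM. Returns what snprintf
 * returns: the untruncated length, or a negative value on error. */
int format_vfio_pci_device(char *buf, size_t len, const char *host,
                           bool expose_rom_bar)
{
    return snprintf(buf, len, "vfio-pci,host=%s%s", host,
                    expose_rom_bar ? "" : ",rombar=0");
}

/* Hypothetical check: true if "rombar=0" appears in the option string
 * as a complete comma-separated property (so "rombar=01" won't match). */
bool vfio_device_arg_hides_rom(const char *arg)
{
    static const char prop[] = "rombar=0";
    for (const char *p = strstr(arg, prop); p != NULL; p = strstr(p + 1, prop)) {
        bool at_start = (p == arg || p[-1] == ',');
        char next = p[sizeof prop - 1];
        if (at_start && (next == ',' || next == '\0'))
            return true;
    }
    return false;
}
```

With "0000:1a:00.0" and expose_rom_bar set to false, the first helper produces "vfio-pci,host=0000:1a:00.0,rombar=0", i.e. the reproducer's device line with the option appended by hand.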

Comment 10 Yanghang Liu 2021-07-19 15:40:39 UTC
Hi Alex,

Thanks for the info.

I am on vacation until July 20; I will post my test results here as soon as I am back.

Comment 11 Yanghang Liu 2021-07-22 09:40:04 UTC
Hi Alex,

(In reply to Alex Williamson from comment #9)
> (In reply to Yanghang Liu from comment #7)
> > > is this a regression?
> > 
> > 
> > All the following kernel versions of vm can also reproduce this problem.
> > 
> >   7.9 -- 3.10.0-1160.el7.x86_64
> > 
> >   8.0 -- 4.18.0-80.11.1.el8_0.x86_64 
> > 
> >   8.1 -- 4.18.0-147.el8.x86_64  
> > 
> >   8.2 -- 4.18.0-193.el8.x86_64  
> > 
> >   8.3 -- 4.18.0-240.el8.x86_64
> > 
> >   8.4 -- 4.18.0-305.el8.x86_64
> 
> So this has never worked and we have no customer issues related to this and this device supports SR-IOV, so VF assignment would be a much more typical use case.  
> Is PF assignment even supported by the vendor for this device?

The official SF-103837-CD-28_Solarflare_Server_Adapter_User_Guide document contains a section on "SR-IOV virtualization using KVM -- PFIOV".

So it seems to me that "PF assignment" is officially supported by the vendor for this device.

If my understanding is not correct, please correct me.

> Does it improve anything adding "rombar=0" to the vfio-pci device options?  

After adding "rombar=0" to the vfio-pci device options, the VM starts successfully.

> We've seen other cases where the ROM code can access a device in non-standard ways and manage to get an HPA rather than a GPA, 
> but the VM IOVA space prevents access, 
> which could cause an exception in the ROM execution and trigger a reset.  
> Our choices at that point are to either exclude default use of the ROM on this device or reverse engineer the access to virtualize it like has been done for consumer GPU assignment.

Comment 12 John Ferlan 2021-09-14 23:57:28 UTC
Bulk update: Move RHEL8 bugs to RHEL9. If necessary to resolve in RHEL8, then clone to the current RHEL8 release.

Comment 13 Yanghang Liu 2021-10-08 08:27:04 UTC
Hi

Comment 14 Yanghang Liu 2021-10-08 13:44:30 UTC
Hi Laurent,

When I rebooted a Q35 + UEFI + MT2892 VF Win2022 VM multiple times, the VM also failed to restart, and I got many QMP events like "{"timestamp": {"seconds": 1624532243, "microseconds": 998053}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}", which is very similar to this bug.

Should I open a separate bug to track this problem?

Comment 15 Yanghang Liu 2021-10-08 13:47:11 UTC
(In reply to Yanghang Liu from comment #14)
> Hi Laurent,
> 
> When I tried to reboot a Q35 + UEFI + MT2892 VF Win2022 vm multiple times, 
> I found that the vm would also fail to restart and I could also get a lot of
> qmp log like "{"timestamp": {"seconds": 1624532243, "microseconds": 998053},
> "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}", which
> is very similar to this bug.
> 
> 
> I want to confirm to you if I need to open a separate bug for tracking this
> problem ?

The test environment for the problem described in comment 14 is:
host:
qemu-kvm-6.1.0-4.el9.x86_64
5.14.0-5.el9.x86_64
guest:
5.14.0-3.el9.x86_64

Comment 16 Laurent Vivier 2021-10-12 06:06:14 UTC
(In reply to Yanghang Liu from comment #14)
> Hi Laurent,
> 
> When I tried to reboot a Q35 + UEFI + MT2892 VF Win2022 vm multiple times, 
> I found that the vm would also fail to restart and I could also get a lot of
> qmp log like "{"timestamp": {"seconds": 1624532243, "microseconds": 998053},
> "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}", which
> is very similar to this bug.
> 
> 
> I want to confirm to you if I need to open a separate bug for tracking this
> problem ?

In comment #9, Alex explains that the problem occurs in the BIOS, so the guest OS doesn't matter.
Don't open a new BZ.

Comment 17 Laurent Vivier 2022-02-02 10:15:41 UTC
As this is more of an HWE problem than a virt networking one, reassigning the BZ to the default assignee.

Comment 19 RHEL Program Management 2022-12-24 07:27:48 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 21 Yanghang Liu 2023-03-28 09:20:35 UTC
Machine type 'pc-i440fx-rhel7.6.0' is deprecated in RHEL 9, so I am moving the Priority and Severity to low.

Feel free to correct me if the developers have a different opinion.

Comment 23 Laszlo Ersek 2023-07-24 12:22:51 UTC
(In reply to Alex Williamson from comment #9)

> Our choices at that point are to either exclude default use of the
> ROM on this device or reverse engineer the access to virtualize it like has
> been done for consumer GPU assignment.

Reverse engineering does not seem justified, given that neither progress nor customer relevance has been demonstrated on this BZ since the quoted analysis, which is two years old.

We should just disable the ROM on this device in the VM when it is assigned. That will prevent the VM from being booted off the PF -- but presently the VM cannot boot at all anyway. So it's a faulty device (considering its onboard flash); we can only paper over the bug.

However, I didn't know we could disable the ROM BAR *by default*. I've only known of the <rom enabled='no'/> libvirt element.

... Do you mean "rom_denylist" in QEMU's "hw/vfio/pci-quirks.c" (exposed internally via vfio_opt_rom_in_denylist())?

That list contains { vendor, device } pairs; but comment#0 here does not contain a suitable (i.e., numeric) lspci output, AFAICT.

Yanghang Liu, can you please repeat the "lspci" command with the "-nn" option?
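A standalone C sketch of the kind of { vendor, device } lookup such a denylist implies. The struct, table, and function names are hypothetical and this is not QEMU's code (the real table and helper in "hw/vfio/pci-quirks.c" may differ); the single entry uses the 1924:0a03 IDs that comment 24 reports.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a { vendor, device } option-ROM denylist. */
struct pci_vendor_device {
    uint16_t vendor;
    uint16_t device;
};

static const struct pci_vendor_device sketch_rom_denylist[] = {
    { 0x1924, 0x0a03 }, /* Solarflare SFC9220 PF, per "lspci -nn" in comment 24 */
};

/* Returns true if the (vendor, device) pair appears in the table. */
bool sketch_rom_in_denylist(uint16_t vendor, uint16_t device)
{
    size_t n = sizeof sketch_rom_denylist / sizeof sketch_rom_denylist[0];
    for (size_t i = 0; i < n; i++) {
        if (sketch_rom_denylist[i].vendor == vendor &&
            sketch_rom_denylist[i].device == device)
            return true;
    }
    return false;
}
```

A table keyed only on vendor and device cannot distinguish machine types, so an entry applies to i440fx and q35 guests alike.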

Comment 24 Yanghang Liu 2023-07-24 14:56:56 UTC
(In reply to Laszlo Ersek from comment #23)
> (In reply to Alex Williamson from comment #9)
> 
> > Our choices at that point are to either exclude default use of the
> > ROM on this device or reverse engineer the access to virtualize it like has
> > been done for consumer GPU assignment.
> 
> Reverse engineering does not seem justified, given that no progress, or
> customer relevance, has been demonstrated on this BZ, since the quoted
> analysis -- which is two years old.
> 
> We should just disable the ROM on this device in the VM when it is assigned.
> That will prevent the VM from being booted off the PF -- but presently the
> VM cannot boot at all anyway. So it's a faulty device (considering its
> onboard flash); we can only paper over the bug.
> 
> However, I didn't know we could disable the ROM BAR *by default*. I've only
> known of the <rom enabled='no'/> libvirt element.
> 
> ... Do you mean "rom_denylist" in QEMU's "hw/vfio/pci-quirks.c" (exposed
> internally via vfio_opt_rom_in_denylist())?
> 
> That list contains { vendor, device } pairs; but comment#0 here does not
> contain a suitable (i.e., numeric) lspci output, AFAICT.
> 
> Yanghang Liu, can you please repeat the "lspci" command with the "-nn"
> option?

Hi Laszlo,

Please let me know if I need to provide more info :)

# virsh nodedev-dumpxml pci_0000_1a_00_0
<device>
  <name>pci_0000_1a_00_0</name>
  <path>/sys/devices/pci0000:17/0000:17:00.0/0000:1a:00.0</path>
  <parent>pci_0000_17_00_0</parent>
  <driver>
    <name>sfc</name>
  </driver>
  <capability type='pci'>
    <class>0x020000</class>
    <domain>0</domain>
    <bus>26</bus>
    <slot>0</slot>
    <function>0</function>
    <product id='0x0a03'>SFC9220 10/40G Ethernet Controller</product>
    <vendor id='0x1924'>Solarflare Communications</vendor>
    <capability type='virt_functions' maxCount='64'/>
    <capability type='vpd'>
      <name>Solarflare Flareon Ultra 8000 Series 10G Adapter</name>
      <fields access='readonly'>
        <change_level>PCBR2:CCSA2</change_level>
        <part_number>SFN8522</part_number>
        <serial_number>852200210000170117100443</serial_number>
        <vendor_field index='0'>8.0.0</vendor_field>
        <vendor_field index='D'>8.0.0</vendor_field>
        <vendor_field index='L'></vendor_field>
        <vendor_field index='A'>0x0000000000000000</vendor_field>
        <vendor_field index='F'>0x0000000000000000</vendor_field>
      </fields>
    </capability>
    <numa node='0'/>
    <pci-express>
      <link validity='cap' port='0' speed='8' width='8'/>
      <link validity='sta' speed='8' width='8'/>
    </pci-express>
  </capability>
</device>

# lspci -nn -s 1a:00.0
1a:00.0 Ethernet controller [0200]: Solarflare Communications SFC9220 10/40G Ethernet Controller [1924:0a03] (rev 02)
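The bracketed [vendor:device] pair in the "lspci -nn" line is what a denylist entry needs. A minimal C sketch (hypothetical helper, not part of pciutils) that extracts it while skipping the "[0200]" class-code token:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Parses exactly four hex digits starting at s. Stops at the first
 * non-hex character, so it never reads past a terminating NUL. */
static bool parse_hex4(const char *s, uint16_t *out)
{
    unsigned v = 0;
    for (int i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)s[i];
        if (!isxdigit(c))
            return false;
        v = v * 16 + (unsigned)(isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
    }
    *out = (uint16_t)v;
    return true;
}

/* Extracts the vendor and device IDs from the first "[vvvv:dddd]" token
 * in one line of "lspci -nn" output. "[0200]" is skipped because its
 * four hex digits are followed by ']' rather than ':'. The outputs are
 * written only on success. */
bool lspci_nn_ids(const char *line, uint16_t *vendor, uint16_t *device)
{
    for (const char *p = strchr(line, '['); p != NULL; p = strchr(p + 1, '[')) {
        uint16_t v, d;
        if (parse_hex4(p + 1, &v) && p[5] == ':' &&
            parse_hex4(p + 6, &d) && p[10] == ']') {
            *vendor = v;
            *device = d;
            return true;
        }
    }
    return false;
}
```

Fed the line above, lspci_nn_ids() yields vendor 0x1924 and device 0x0a03.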

Comment 25 Laszlo Ersek 2023-07-24 20:03:37 UTC
Created attachment 1977361 [details]
proposed patch: vfio/pci: hide ROM BAR on SFC9220 10/40G Ethernet Controller PF

Hi Yanghang Liu,

thanks.

Can you please:

(1) reproduce the issue using qemu-kvm-8.0.0-9.el9 (with both BIOS and UEFI VMs),

(2) if the symptom reproduces like that, try <https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=54147766> (again, both BIOS and UEFI VMs)?

Thanks!

(I'm attaching the patch that I'm proposing on top of qemu-kvm-8.0.0-9.el9. I understand that the patch needs to go upstream first, and then be backported; however, I can't repro the symptom / test the patch myself, due to lack of hardware. Therefore, I need to provide a scratch build for QE to test -- in turn, I strongly dislike providing scratch builds without exposing the patches that went into the build. So this is actually going to be a forward port -- backport dance, but it should be trivial, as "hw/vfio/pci-quirks.c" is identical between qemu-kvm-8.0.0-9.el9 and upstream @ current master (885fc169f09f)).

Comment 26 Laszlo Ersek 2023-07-25 11:51:03 UTC
(Setting needinfo for comment 25, just to be sure.)

Comment 27 Yanghang Liu 2023-07-25 14:30:25 UTC
(In reply to Laszlo Ersek from comment #25)
> Created attachment 1977361 [details]
> proposed patch: vfio/pci: hide ROM BAR on SFC9220 10/40G Ethernet Controller
> PF
> 
> Hi Yanghang Liu,
> 
> thanks.
> 
> Can you please:
> 
> (1) reproduce the issue using qemu-kvm-8.0.0-9.el9 (with both BIOS and UEFI
> VMs),
> 
> (2) if the symptom reproduces like that, try
> <https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=54147766>
> (again, both BIOS and UEFI VMs)?
> 
> Thanks!
> 

My test results show that this issue still reproduces in the latest RHEL 9.3 environment.

Test env:
host:
5.14.0-341.el9.x86_64
qemu-kvm-8.0.0-8.el9.x86_64
libvirt-9.5.0-3.el9.x86_64
seabios-bin-1.16.1-1.el9.noarch
edk2-ovmf-20230524-2.el9.noarch


Test result:
[1] start a PC + SeaBIOS VM with an SFC9220 PF -- FAILED
[2] start a Q35 + SeaBIOS VM with an SFC9220 PF -- PASS
[3] start a Q35 + UEFI VM with an SFC9220 PF -- PASS


Test steps to reproduce this issue:

[1] Import a PC + SeaBIOS VM with an SFC9220 (sfc) PF, then start the VM

# virt-install --machine=pc --noreboot --name=rhel93 --memory=4096 --vcpus=4 --graphics type=vnc,port=5993,listen=0.0.0.0  --network bridge=switch,model=virtio,mac=52:54:00:00:93:93 --import --noautoconsole --disk path=/home/images/RHEL93.qcow2,bus=virtio,cache=none,format=qcow2,io=threads,size=20 --osinfo detect=on,require=off --hostdev pci_0000_1a_00_0

# virsh start rhel93


[2] Check whether the PC + SeaBIOS VM with an SFC9220 (sfc) PF can be started

qemu-kvm keeps emitting the following log entries:

 24.801 ! 0x7f1ee003a010 {"timestamp": {"seconds": 1690293740, "microseconds": 739187}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
...
 80.721 ! 0x7f1ee003a010 {"timestamp": {"seconds": 1690293796, "microseconds": 659098}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
...
188.321 ! 0x7f1ee003a010 {"timestamp": {"seconds": 1690293904, "microseconds": 259390}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}

Comment 28 Yanghang Liu 2023-07-25 14:37:18 UTC
(In reply to Yanghang Liu from comment #27)
> (In reply to Laszlo Ersek from comment #25)
> > Created attachment 1977361 [details]
> > proposed patch: vfio/pci: hide ROM BAR on SFC9220 10/40G Ethernet Controller
> > PF
> > 
> > Hi Yanghang Liu,
> > 
> > thanks.
> > 
> > Can you please:
> > 
> > (1) reproduce the issue using qemu-kvm-8.0.0-9.el9 (with both BIOS and UEFI
> > VMs),
> > 
> > (2) if the symptom reproduces like that, try
> > <https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=54147766>
> > (again, both BIOS and UEFI VMs)?
> > 
> > Thanks!
> > 
> 
> My test result shows this issue can still be reproduced in the latest RHEL93
> environment.
> 
> Test env:
> host:
> 5.14.0-341.el9.x86_64
> qemu-kvm-8.0.0-8.el9.x86_64
> libvirt-9.5.0-3.el9.x86_64
> seabios-bin-1.16.1-1.el9.noarch
> edk2-ovmf-20230524-2.el9.noarch
> 
> 
> Test result:
> [1] start a PC + SEABIOS VM with a SFC9220 PF -- FAILED 
> [2] start a Q35 + SEABIOS VM with a SFC9220 PF   --- PASS 
> [3] start a Q35 + UEFI VM with a SFC9220 PF -- PASS
 

After upgrading to qemu-kvm-8.0.0-9.el9.bz1975776_1.gcc.x86_64, all three tests pass:

[1] start a PC + SeaBIOS VM with an SFC9220 PF -- PASS
[2] start a Q35 + SeaBIOS VM with an SFC9220 PF -- PASS
[3] start a Q35 + UEFI VM with an SFC9220 PF -- PASS

qemu-kvm also prints: "warning: Rom loading for device at 0000:1a:00.0 has been disabled due to system instability issues"
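The warning indicates that the scratch build skips the PF's ROM by default. Below is a simplified C model of that decision, written as a sketch only: it assumes, from our understanding of QEMU's vfio-pci behavior rather than anything verified in this BZ, that an explicit romfile= or a non-zero rombar= still overrides the denylist (with a warning). All names are hypothetical, and combinations such as romfile= together with rombar=0 are not modeled.

```c
#include <stdbool.h>

/* Hypothetical outcomes for an assigned device's option ROM. */
enum rom_outcome {
    ROM_HIDDEN_BY_USER,     /* rombar=0: no ROM BAR in the guest */
    ROM_FROM_FILE,          /* romfile=...: user-supplied image */
    ROM_DENYLISTED_SKIPPED, /* denylisted, no override: warn, skip ROM */
    ROM_DENYLISTED_FORCED,  /* denylisted, rombar set explicitly: warn, load */
    ROM_FROM_DEVICE,        /* normal case: expose the device's own ROM */
};

/* rombar_given says whether the user set rombar= at all; rombar is its
 * value (QEMU's default is 1, i.e. ROM BAR exposed). */
enum rom_outcome decide_rom(bool denylisted, bool has_romfile,
                            bool rombar_given, int rombar)
{
    if (rombar_given && rombar == 0)
        return ROM_HIDDEN_BY_USER;
    if (has_romfile)
        return ROM_FROM_FILE;
    if (denylisted)
        return rombar_given ? ROM_DENYLISTED_FORCED : ROM_DENYLISTED_SKIPPED;
    return ROM_FROM_DEVICE;
}
```

Under this model, the per-VM "rombar=0" from comment 9 and the denylist entry reach the same guest-visible result in the i440fx case tested here; the difference is that the denylist applies without any per-VM configuration.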

Comment 29 Laszlo Ersek 2023-07-26 19:33:17 UTC
Thanks.

I think I got confused earlier by comment 14. I interpreted that comment as "the card / PF breaks on Q35+UEFI too". That's how I worded the commit message for the patch in comment 25. And now I'm surprised to see, in comment 27, that Q35 actually works, regardless of the guest firmware?

... But the solution to this riddle is that I didn't read comment 14 carefully enough. Comment 14 actually refers to a *different* (vendor, product) pair: comment 14 says "MT2892 VF". That seems to be (a) a different NIC, and (b) a VF, not a PF.

So if we want to handle both issues at the same time, in this BZ (see comment 16), then I need the PCI vendor/product IDs for "MT2892 VF" as well.

(Also I'll have to reword the commit message -- make it clear that the problem with the SFC9220 PF is specific to i440fx!)

... Well, thinking further about this, given that the SFC9220 PF option ROM does not break on Q35 (with either guest firmware -- SeaBIOS or UEFI), then deny-listing the PF altogether for ROM loading in the guest may not be the best approach. It allows i440fx to boot (no crash), but it prevents q35 from *netbooting* via the PF!

I now wonder if we should at best document this issue (i.e., recommend <rom enabled='no'/> on i440fx, for this PF).
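The trade-off can be stated as a small decision function. This is a hypothetical C sketch encoding only the observations in this BZ: the comment 27 test matrix, this comment's point about q35 netbooting, and comment 36's finding that the ROM carries no UEFI image.

```c
#include <stdbool.h>

enum machine_type { MACHINE_I440FX, MACHINE_Q35 };
enum guest_firmware { FW_SEABIOS, FW_UEFI };

/* Hypothetical encoding of the SFC9220 PF observations:
 *  - i440fx + SeaBIOS: the option ROM runs and the guest reboot-loops;
 *  - q35 + SeaBIOS: the option ROM runs and the guest boots, so PXE boot
 *    via the PF remains possible if the ROM is left enabled;
 *  - UEFI (any board): the ROM has no UEFI image, so nothing runs.
 * Returns true where disabling the ROM for this PF is the suggested fix. */
bool sfc9220_should_disable_rom(enum machine_type machine,
                                enum guest_firmware fw)
{
    return machine == MACHINE_I440FX && fw == FW_SEABIOS;
}
```

A per-VM setting such as <rom enabled='no'/> can follow this function exactly, whereas a global { vendor, device } denylist effectively returns true for every combination.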

Comment 30 RHEL Program Management 2023-07-28 07:28:13 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 36 Laszlo Ersek 2023-08-08 14:37:50 UTC
The solarflare card (1924:0a03) does not cause a problem during UEFI boot because its expansion ROM does not contain a UEFI driver. :)

Using Alex's rom-parser utility:

> Valid ROM signature found @0h, PCIR offset 20h
>         PCIR: type 0 (x86 PC-AT), vendor: 1924, device: 0a03, class: 000002
>         PCIR: revision 3, vendor revision: 1
>         Last image

It only contains a legacy BIOS image.
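The rom-parser result can be understood as a walk over the ROM's image chain. Here is an independent C sketch of such a walk (not rom-parser's source), assuming the expansion ROM layout from the PCI Firmware Specification: each image starts with 0x55 0xAA; offset 0x18 holds the 16-bit little-endian offset of the "PCIR" structure (20h above); within that structure, offset 0x10 is the image length in 512-byte units, 0x14 the code type (0 = x86 PC-AT, 3 = EFI), and bit 7 of 0x15 marks the last image. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CODE_TYPE_EFI 0x03

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

/* Walks every image in rom[0..len) and sets *has_efi if any image has
 * code type EFI. Returns the number of images walked, or -1 if a header
 * is malformed or would run past the buffer. */
int rom_scan(const uint8_t *rom, size_t len, bool *has_efi)
{
    size_t off = 0;
    int images = 0;
    *has_efi = false;
    for (;;) {
        /* Image header: signature at +0, PCIR pointer at +0x18..+0x19. */
        if (off + 0x1a > len || rom[off] != 0x55 || rom[off + 1] != 0xaa)
            return -1;
        size_t pcir = off + rd16(rom + off + 0x18);
        /* PCIR structure: need bytes up to the indicator at +0x15. */
        if (pcir + 0x16 > len || rom[pcir] != 'P' || rom[pcir + 1] != 'C' ||
            rom[pcir + 2] != 'I' || rom[pcir + 3] != 'R')
            return -1;
        images++;
        if (rom[pcir + 0x14] == CODE_TYPE_EFI)
            *has_efi = true;
        if (rom[pcir + 0x15] & 0x80)
            return images;                 /* "Last image" */
        size_t img_len = (size_t)rd16(rom + pcir + 0x10) * 512;
        if (img_len == 0)
            return -1;                     /* avoid looping forever */
        off += img_len;
    }
}
```

For a ROM like this card's, with one image of code type 0 flagged as last, the walk stops after one image and reports no EFI image, which is consistent with OVMF having nothing to execute.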

Furthermore, running "strings" on the ROM shows it's a gPXE derivative:

> ...
> Solarflare Boot Manager (v5.2.2.1006)
> Solarflare Communications 2008-2019
> gPXE (http://etherboot.org) - 
>  PCI
>  PnP
>  PMM
>  INT19
> Press Ctrl-B to configure 
> ...

So on i440fx / SeaBIOS, it likely crashes due to an old gPXE bug.

On any board / UEFI, it does not crash because there is no UEFI driver image in the expansion ROM.

On q35 / SeaBIOS, it likely does not crash because the E820 memory map differs from the one on i440fx, so the gPXE bug is not tickled.

I think it's fine to disable the oprom by default, and now we see the trade-off clearly.

Comment 37 Laszlo Ersek 2023-08-08 15:00:50 UTC
[PATCH] vfio/pci: hide ROM BAR on SFC9220 10/40G Ethernet Controller PF
Message-Id: <20230808145916.81657-1-lersek>

Comment 39 Laszlo Ersek 2023-08-09 10:04:29 UTC
Here's a closer description of the reboot loop symptom (with i440fx/seabios):

(1) The SeaBIOS log portion leading up to the reboot reads like this:

> Scan for option roms
> Running option rom at ca00:0003
> pmm call arg1=1
> pmm call arg1=0
> pmm call arg1=1
> pmm call arg1=0
> In resume (status=0)
> In 32bit resume
> Attempting a hard reboot

So the oprom on the SFC9220 does something that causes control to jump to the reset vector. In turn, SeaBIOS runs

> void VISIBLE32FLAT
> handle_resume32(int status)
> {
>     ASSERT32FLAT();
>     dprintf(1, "In 32bit resume\n");
> 
>     if (status == 0xfe)
>         s3_resume();
> 
>     // Must be a soft reboot - invoke a hard reboot.
>     tryReboot();
> }

and tryReboot() is reached (and it succeeds, too) because status is clearly not 0xFE -- this is not an actual S3 resume.

Meanwhile the graphical display shows:

> Solarflare Boot Manager (v5.2.2.1006)
> Solarflare Communications 2008-2019
> gPXE (http://etherboot.org) - 00:05.0 CA00 PCI2.10 PnP PMM+5EFD1230+5EF11230 CA00
> Solarflare starting execution...
> [reboot]

This is with the SFC attached to the root bus -- you can already see 00:05.0 above, but here are the relevant parts of the domain XML for completeness:

    <controller type='pci' index='0' model='pci-root'/>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x1a' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </hostdev>

Comment 40 Laszlo Ersek 2023-08-09 11:07:47 UTC
Here's an attempt at plugging the SFC9220 PF into a different (extra)
root bus:

    <controller type='pci' index='0' model='pci-root'/>
    <controller type='pci' index='1' model='pci-expander-bus'>
      <model name='pxb'/>
      <target busNr='128'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x1a' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x07' function='0x0'/>
    </hostdev>

This changes the logged PCI address to 81:07.0, and the crash / reboot
persists.

*However*, if I change the *slot* number to zero (i.e., the guest
enumerates the assigned PF as "81:00.0"), then things work! Grub and the
OS are reached just fine.

In response to the patch (comment 37), Alex wrote:

> ROMs sometimes take shortcuts around the standard interfaces to the
> device and can therefore hit gaps in the virtualization, which is why
> that's suspect to me.  However if it works on q35 but not 440fx it
> might be more that we're not matching a PCI topology expectation of
> the ROM.  Was it only tested on 440fx attached to the root bus or does
> it also fail if the PF is attached downstream of a PCI-to-PCI bridge?

But now it seems that the decisive factor is not whether the PF is
attached to the VM's root bus vs. a non-root bus. What seems to decide
instead is the *slot number* on whatever bus the PF is seen on, in the
guest. The oprom seems to insist that the slot number be zero.

These two "candidate" requirements are not easy to tell apart. Namely,
on q35, we "naturally" satisfy the apparent "slot == 0" requirement.
Slot 0 on the root complex is taken by the MCH (memory controller hub)
/ DRAM Controller, so we could not assign the PF there even if we
wanted to; but in practice we never try placing the PF on the root
complex anyway. Instead, we always cold-plug it into a PCIe root port,
and the "bridge" of any PCIe root port only ever enables the use of
slot#0 (PCIe ARI may be an exception to this, but I'm rusty on that,
and it's not relevant here). So the default libvirt placement strategy
on q35 satisfies this hidden requirement of the oprom.

On "pc", things are different; by default the PF is assigned to the root
bus, but to a nonzero slot (slot#0 is taken by the i440FX host bridge).
So by default we break the hidden slot#0 requirement of the oprom. Once
we move the PF to an extra root bus (using the pxb expander bridge), we
can test both zero and nonzero slot numbers, while still not being
"downstream" of anything, and this shows that slot#0 is what matters to
the ROM. (Possibly a vendor expectation that the NIC be plugged into a
PCIe root port.)
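The elimination argument above can be spelled out as a small consistency check. The C sketch below is illustrative only. Each row is one placement reported in this bug (q35's default pcie-root-port placement, the default i440fx root-bus placement at 00:05.0, and the two pxb experiments from comment 40), tagged with whether the PF sits "downstream" of a bridge, as framed in this discussion, and with the guest slot number. The two predicate functions are hypothetical names for the two candidate requirements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One observed placement of the SFC9220 PF in the guest. */
struct placement {
    const char *where;
    bool downstream; /* behind a bridge/root port (vs. directly on a root bus) */
    unsigned slot;   /* guest slot number */
    bool boots;      /* did the option ROM run without the reboot loop? */
};

/* Observations reported in this bug (pxb treated as "not downstream"). */
static const struct placement observed[] = {
    { "q35, pcie-root-port",  true,  0, true  },
    { "pc, root bus 00:05.0", false, 5, false },
    { "pc, pxb 81:07.0",      false, 7, false },
    { "pc, pxb 81:00.0",      false, 0, true  },
};

/* Candidate 1: the ROM needs the NIC downstream of a bridge. */
static bool needs_downstream(const struct placement *p) { return p->downstream; }

/* Candidate 2: the ROM needs the NIC in slot 0. */
static bool needs_slot0(const struct placement *p) { return p->slot == 0; }

/* Does a candidate predict every observed outcome? */
static bool consistent(bool (*predict)(const struct placement *))
{
    for (size_t i = 0; i < sizeof observed / sizeof observed[0]; i++) {
        if (predict(&observed[i]) != observed[i].boots)
            return false;
    }
    return true;
}
```

Only the slot-0 candidate matches all four rows. The pxb slot-0 row alone already rules out the "downstream" candidate, since that placement boots without being behind a bridge.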

Next I'll test both zero and nonzero slot numbers, but not on an extra
root bus (pxb) -- I'll use a "pci-bridge" model.

Comment 41 Laszlo Ersek 2023-08-09 12:02:24 UTC
Wow, new discovery -- which is only a "discovery" because my memory on
PCI is super rusty :/

Slot#0 is *only* usable on pxb because we disable the SHPC (standard hot
plug controller) on it by default [1] [2]. (That's funny because I had
written those QEMU patches myself, in 2015, but I've totally forgotten
about them.)

[1] qemu commit 4e5c9bfecf5d
    ("hw/pci-bridge: introduce "shpc" property", 2015-06-23)

[2] qemu commit d10dda2d60c8
    ("hw/pci-bridge: disable SHPC in PXB", 2015-06-23)

Libvirt is aware of this -- see "bus->minSlot = 0" in [3].

[3] libvirt commit 52f3d0a4d2de
    ("conf: new pci controller model pci-expander-bus", 2016-04-14)

But that's not the (default) case on a normal "pci-bridge". There the
SHPC is supposed to be enabled by default. (We tried disabling it too
[4], but had to revert it [5]!)

[4] qemu commit dc0ae767700c
    ("hw/pci: disable pci-bridge's shpc by default", 2017-02-01)

[5] qemu commit 2fa356629ed2
    ("Revert "hw/pci: disable pci-bridge's shpc by default"", 2017-05-18)

Therefore, even *libvirt* doesn't allow me to assign the PF to slot#0 on
a non-root PCI bridge:

> error: XML error: Invalid PCI address 0000:01:00.0. slot must be >= 1

And "slot >= 1" does not satisfy the oprom -- the following
configuration (PF -> 01:01.0) triggers the crash:

    <controller type='pci' index='0' model='pci-root'/>
    <controller type='pci' index='1' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x1a' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
    </hostdev>

SUMMARY:

- For the SFC9220 PF option ROM (BIOS image) to work, we must plug the
  card in slot #0 (on any bus).

- On q35, this is trivial to satisfy, and libvirt satisfies it by
  default -- plug the PF in any pcie-root-port.

- On pc, the requirement is difficult to satisfy -- on root bus 0,
  slot#0 is taken by the i440FX host bridge, and on pci-bridge devices,
  slot#0 is taken by the SHPC (which QEMU allows the user to disable,
  but libvirt doesn't). Therefore the only solution is to configure a
  pxb / pci-expander-bus (which does not come with an SHPC), and plug
  the PF in slot#0 on the extra root bus that's provided by the pxb.
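The summary above reduces to a rule about the lowest slot number each kind of bus leaves available for the PF. The C sketch below encodes that rule as described in this thread. The enum and `min_pf_slot()` are hypothetical names, and the per-bus values restate the bullets above (root bus 0 on "pc" loses slot 0 to the i440FX host bridge, a libvirt-managed pci-bridge loses it to its SHPC, while pxb and pcie-root-port leave it free) rather than anything read out of libvirt's source.

```c
#include <assert.h>
#include <stdbool.h>

/* Kinds of guest buses the PF can be attached to, per the summary above. */
enum bus_kind {
    PC_ROOT_BUS,      /* "pc" root bus 0: slot 0 is the i440FX host bridge */
    PCI_BRIDGE,       /* pci-bridge: slot 0 is used by the SHPC */
    PCI_EXPANDER_BUS, /* pxb: no SHPC, slot 0 free */
    PCIE_ROOT_PORT,   /* q35 pcie-root-port: slot 0 is the only slot */
};

/* Lowest slot the PF can be placed in on a bus of the given kind. */
static unsigned min_pf_slot(enum bus_kind kind)
{
    switch (kind) {
    case PC_ROOT_BUS:
    case PCI_BRIDGE:
        return 1;
    case PCI_EXPANDER_BUS:
    case PCIE_ROOT_PORT:
        return 0;
    }
    assert(!"unknown bus kind");
    return 1;
}

/* The SFC9220 PF option ROM only works in slot 0 (comment 41). */
static bool oprom_can_work(enum bus_kind kind)
{
    return min_pf_slot(kind) == 0;
}
```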

Here's a working example for the last bullet (using the "pc" machine
type) -- this results in the PF having PCI B/D/F "ff:00.0" in the guest:

    <controller type='pci' index='0' model='pci-root'/>

    <controller type='pci' index='1' model='pci-expander-bus'>
      <model name='pxb'/>
      <target busNr='254'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x1a' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </hostdev>
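The guest-visible addresses quoted for the pxb cases follow a simple pattern: busNr='128' gave 81:07.0 and 81:00.0, and busNr='254' gives ff:00.0, i.e. the PF lands on bus busNr + 1. The C sketch below assumes that relationship, inferred only from these two data points, together with the usual bus:slot.function notation. `pxb_guest_bdf()` is a hypothetical helper. Under the same assumption, busNr 254 is the largest value that still leaves an 8-bit bus number (ff) for the PF.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the guest B/D/F of a device plugged into
 * the given slot/function behind a pxb with the given busNr, assuming
 * (from the data points in this bug) that it appears on bus busNr + 1.
 * buf must hold at least 8 bytes ("bb:dd.f" plus NUL).
 */
static void pxb_guest_bdf(unsigned bus_nr, unsigned slot, unsigned func, char buf[8])
{
    assert(bus_nr + 1 <= 0xff); /* guest bus numbers are 8 bits */
    assert(slot <= 0x1f);       /* 5-bit device (slot) number */
    assert(func <= 7);          /* 3-bit function number */
    snprintf(buf, 8, "%02x:%02x.%x", bus_nr + 1, slot, func);
}
```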

Comment 42 Laszlo Ersek 2023-08-09 12:04:15 UTC
So there's no need to patch QEMU. The bug is in the option ROM. We can work around it -- but where do we document the workaround?

