Bug 2203094 - Add more than 17 pcie-root-ports, display Out Of Resource
Summary: Add more than 17 pcie-root-ports, display Out Of Resource
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: edk2
Version: 9.3
Hardware: aarch64
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Gerd Hoffmann
QA Contact: Zhenyu Zhang
URL:
Whiteboard:
Depends On:
Blocks: 2024818
 
Reported: 2023-05-11 08:53 UTC by Zhenyu Zhang
Modified: 2023-11-07 09:06 UTC (History)
16 users

Fixed In Version: edk2-20230524-1.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-07 08:24:29 UTC
Type: ---
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-156933 0 None None None 2023-05-11 09:00:15 UTC
Red Hat Product Errata RHSA-2023:6330 0 None None None 2023-11-07 08:25:07 UTC

Description Zhenyu Zhang 2023-05-11 08:53:33 UTC
Description of problem:
Adding more than 17 pcie-root-ports makes the firmware display "Out Of Resource" on the serial console.

Version-Release number of selected component (if applicable):
hostkernel: 5.14.0-306.el9.aarch64
qemu: qemu-kvm-8.0.0-1.el9
edk2: edk2-aarch64-20230301gitf80f052277c8-2.el9.noarch

How reproducible:
100%

Steps to Reproduce:
1. Boot QEMU with more than 17 pcie-root-ports:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1'  \
-sandbox on  \
-blockdev '{"node-name": "file_aavmf_code", "driver": "file", "filename": "/usr/share/edk2/aarch64/QEMU_EFI-silent-pflash.raw", "auto-read-only": true, "discard": "unmap"}' \
-blockdev '{"node-name": "drive_aavmf_code", "driver": "raw", "read-only": true, "file": "file_aavmf_code"}' \
-blockdev '{"node-name": "file_aavmf_vars", "driver": "file", "filename": "/root/avocado/data/avocado-vt/avocado-vt-vm1_rhel930-aarch64-virtio-scsi_qcow2_filesystem_VARS.fd", "auto-read-only": true, "discard": "unmap"}' \
-blockdev '{"node-name": "drive_aavmf_vars", "driver": "raw", "read-only": false, "file": "file_aavmf_vars"}' \
-machine virt,gic-version=host,pflash0=drive_aavmf_code,pflash1=drive_aavmf_vars \
-device '{"id": "pcie-root-port-0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x1.0x0", "chassis": 1}' \
-nodefaults \
-device '{"id": "pcie-root-port-1", "port": 1, "driver": "pcie-root-port", "addr": "0x1.0x1", "bus": "pcie.0", "chassis": 2}' \
-cpu 'host' \
-device '{"id": "pcie-root-port-2", "port": 2, "driver": "pcie-root-port", "addr": "0x1.0x2", "bus": "pcie.0", "chassis": 3}' \
-device '{"id": "pcie-root-port-3", "port": 3, "driver": "pcie-root-port", "addr": "0x1.0x3", "bus": "pcie.0", "chassis": 4}' \
-device '{"id": "pcie-root-port-4", "port": 4, "driver": "pcie-root-port", "addr": "0x1.0x4", "bus": "pcie.0", "chassis": 5}' \
-enable-kvm \
-device '{"id": "pcie-root-port-5", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x2.0x0", "chassis": 6}' \
-device '{"id": "pcie-root-port-6", "driver": "pcie-root-port", "addr": "0x2.0x1", "bus": "pcie.0", "chassis": 7}' \
-device '{"id": "pcie-root-port-7", "driver": "pcie-root-port", "addr": "0x2.0x2", "bus": "pcie.0", "chassis": 8}' \
-device '{"id": "pcie-root-port-8", "driver": "pcie-root-port", "addr": "0x2.0x3", "bus": "pcie.0", "chassis": 9}' \
-device '{"id": "pcie-root-port-9", "driver": "pcie-root-port", "addr": "0x2.0x4", "bus": "pcie.0", "chassis": 10}' \
-device '{"id": "pcie-root-port-10", "driver": "pcie-root-port", "addr": "0x2.0x5", "bus": "pcie.0", "chassis": 11}' \
-device '{"id": "pcie-root-port-11", "driver": "pcie-root-port", "addr": "0x2.0x6", "bus": "pcie.0", "chassis": 12}' \
-device '{"id": "pcie-root-port-12", "driver": "pcie-root-port", "addr": "0x2.0x7", "bus": "pcie.0", "chassis": 13}' \
-device '{"id": "pcie-root-port-13", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x3.0x0", "chassis": 14}' \
-device '{"id": "pcie-root-port-14", "driver": "pcie-root-port", "addr": "0x3.0x1", "bus": "pcie.0", "chassis": 15}' \
-device '{"id": "pcie-root-port-15", "driver": "pcie-root-port", "addr": "0x3.0x2", "bus": "pcie.0", "chassis": 16}' \
-device '{"id": "pcie-root-port-16", "driver": "pcie-root-port", "addr": "0x3.0x3", "bus": "pcie.0", "chassis": 17}' \
-serial stdio

UEFI firmware (version edk2-20230301gitf80f052277c8-2.el9 built at 00:00:00 on Apr  5 2023)
Tpm2SubmitCommand - Tcg2 - Not Found
Tpm2GetCapabilityPcrs fail!
Tpm2SubmitCommand - Tcg2 - Not Found
Out Of Resource!
Call PciHostBridgeResourceConflict().
PciHostBridge: Resource conflict happens!
RootBridge[0]:
 I/O: Length/Alignment = 0x11000 / 0xFFF
 Mem: Length/Alignment = 0x2400000 / 0x1FFFFF
     Granularity/SpecificFlag = 32 / 00
 Mem: Length/Alignment = 0x0 / 0x0
     Granularity/SpecificFlag = 32 / 06 (Prefetchable)
 Mem: Length/Alignment = 0x100000 / 0xFFFFF
     Granularity/SpecificFlag = 64 / 00
 Mem: Length/Alignment = 0x0 / 0x0
     Granularity/SpecificFlag = 64 / 06 (Prefetchable)
 Bus: Length/Alignment = 0x13 / 0x0
PciBus: [00|01|00] was rejected due to resource confliction.
Out Of Resource!
Call PciHostBridgeResourceConflict().
PciHostBridge: Resource conflict happens!
RootBridge[0]:
 I/O: Length/Alignment = 0x11000 / 0xFFF
 Mem: Length/Alignment = 0x2400000 / 0x1FFFFF
     Granularity/SpecificFlag = 32 / 00
 Mem: Length/Alignment = 0x0 / 0x0
     Granularity/SpecificFlag = 32 / 06 (Prefetchable)
 Mem: Length/Alignment = 0x100000 / 0xFFFFF
     Granularity/SpecificFlag = 64 / 00
 Mem: Length/Alignment = 0x0 / 0x0
     Granularity/SpecificFlag = 64 / 06 (Prefetchable)
 Bus: Length/Alignment = 0x13 / 0x0
PciBus: [01|00|00] was rejected due to resource confliction.


Expected results:
No "Out Of Resource" error is displayed on the serial console.
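For reference, the 17 repetitive -device arguments in the reproducer above can be generated with a small shell loop. This is a sketch: it packs a uniform 8 functions per slot, which differs slightly from the hand-written layout above, and the IDs and chassis numbers are illustrative.

```shell
# Build N pcie-root-port -device arguments. Function 0 of each slot must
# enable multifunction so that functions 1-7 of that slot are scanned.
N=17
args=""
for i in $(seq 0 $((N - 1))); do
  slot=$((i / 8 + 1))     # slots 0x1, 0x2, 0x3, ... on pcie.0
  func=$((i % 8))         # 8 functions per slot
  extra=""
  [ "$func" -eq 0 ] && extra='"multifunction": true, '
  args="$args -device '{\"id\": \"pcie-root-port-$i\", $extra\"driver\": \"pcie-root-port\", \"bus\": \"pcie.0\", \"addr\": \"0x$slot.0x$func\", \"chassis\": $((i + 1))}'"
done
echo "/usr/libexec/qemu-kvm ...other options... $args -serial stdio"
```

Raising N past 17 is enough to reproduce the "Out Of Resource" logging on the affected edk2 builds.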

Comment 1 Zhenyu Zhang 2023-05-11 08:57:34 UTC
Hello xuwei,

Could you try it on x86?

Thanks
Zhenyu

Comment 2 Eric Auger 2023-05-11 09:13:01 UTC
Hi Zhenyu, is it a regression?

Comment 3 Xueqiang Wei 2023-05-11 09:22:30 UTC
(In reply to Zhenyu Zhang from comment #1)
> Hello xuwei,
> 
> Could you try it on x86?
> 
> Thanks
> Zhenyu

Tested related test cases, not hit it on x86.

Versions:
kernel-5.14.0-306.el9.x86_64
qemu-kvm-8.0.0-2.el9
edk2-ovmf-20230301gitf80f052277c8-3.el9.noarch

1. multi_disk_random_hotplug.single_type - the results were passed.
Job link: http://fileshare.hosts.qa.psi.pek2.redhat.com/pub/logs/multi_disk_random_hotplug.single_type/results.html

2. multi_disk - the results were passed.
Job link: http://fileshare.hosts.qa.psi.pek2.redhat.com/pub/logs/multi_disk_with_edk2-20230301gitf80f052277c8-3.el9/results.html

Comment 4 Zhenyu Zhang 2023-05-11 09:37:58 UTC
(In reply to Eric Auger from comment #2)
> Hi Zhenyu, is it a regression?

Not sure; I need to do some testing.
Will update the results later.
Also, I saw a similar bug on x86, not sure if it is related:
Bug 2024605 - 6.1.0 introduces regression in q35, unable to add more than 15 pcie-root-ports

Comment 5 Zhenyu Zhang 2023-05-11 11:47:14 UTC
Update Test Results:
on rhel9:
qemu-kvm-7.2.0-10.el9 ----> hit
qemu-kvm-7.1.0-2.el9 ----> hit
qemu-kvm-7.0.0-1.el9 ----> hit 

edk2-aarch64-20221207gitfff6d81270b5-7.el9.noarch ----> hit
edk2-aarch64-20221207gitfff6d81270b5-1.el9.noarch ----> hit
edk2-aarch64-20220221gitb24306f15d-1.el9.noarch ----> hit

on rhel8:
edk2-aarch64-20220126gitbb1bba3d77-5.el8.noarch ----> hit
qemu-kvm-6.2.0-32.module+el8.8.0+18361+9f407f6e ----> hit


Hi Eric,

So it's not a regression.
The issue itself does not prevent the kernel from loading;
it just keeps printing an error on the serial console, so the priority is low.
It was discovered recently while developing new automation cases.


/usr/libexec/qemu-kvm \
.......
.......
-serial stdio
VNC server running on ::1:5900
UEFI firmware starting.
Tpm2SubmitCommand - Tcg2 - Not Found
Tpm2GetCapabilityPcrs fail!
Tpm2SubmitCommand - Tcg2 - Not Found
Out Of Resource!
Call PciHostBridgeResourceConflict().
PciHostBridge: Resource conflict happens!
RootBridge[0]:
 I/O: Length/Alignment = 0x11000 / 0xFFF
 Mem: Length/Alignment = 0x2300000 / 0x1FFFFF
     Granularity/SpecificFlag = 32 / 00
 Mem: Length/Alignment = 0x0 / 0x0
     Granularity/SpecificFlag = 32 / 06 (Prefetchable)
 Mem: Length/Alignment = 0x0 / 0x0
     Granularity/SpecificFlag = 64 / 00
 Mem: Length/Alignment = 0x0 / 0x0
     Granularity/SpecificFlag = 64 / 06 (Prefetchable)
 Bus: Length/Alignment = 0x12 / 0x0
PciBus: [00|01|00] was rejected due to resource confliction.
BdsDxe: failed to load Boot0001 "Red Hat Enterprise Linux" from HD(1,GPT,1EC03478-E1A9-4CE9-8CC6-AA8593426FD7,0x800,0x12C000)/\EFI\redhat\shimaa64.efi: Not Found
BdsDxe: loading Boot0007 "EFI Internal Shell" from Fv(64074AFE-340A-4BE6-94BA-91B5B4D0F71E)/FvFile(7C04A583-9E3E-4F1C-AD65-E05268D0B4D1)
BdsDxe: starting Boot0007 "EFI Internal Shell" from Fv(64074AFE-340A-4BE6-94BA-91B5B4D0F71E)/FvFile(7C04A583-9E3E-4F1C-AD65-E05268D0B4D1)
UEFI Interactive Shell v2.2
EDK II
UEFI v2.70 (EDK II, 0x00010000)
Mapping table
     BLK0: Alias(s):
          VenHw(93E34C7E-B50E-11DF-9223-2443DFD72085,00)
Press ESC in 1 seconds to skip startup.nsh or any other key to continue.
Shell>

Comment 6 Eric Auger 2023-05-11 13:38:19 UTC
This is output by EDK2. Adding Gerd and Stefan in CC.

Comment 7 Gerd Hoffmann 2023-05-12 09:36:34 UTC
> It itself does not affect loading the kernel into the system. 
> It just keeps showing an error in serial. So the priority is low.

So there are no bad effects?  Good.

There is I/O address space for 16 PCIe root ports. So if you
plug in more, there isn't enough address space, and edk2
starts disabling the (optional) I/O bridge window on PCIe
root ports until everything fits.

That process produces some debug logging ...
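Comment 7's arithmetic can be checked against the resource-conflict dump above. This is a back-of-the-envelope sketch; the 4 KiB per-port I/O window and the 64 KiB total I/O aperture are assumptions inferred from the log, not stated explicitly in this bug:

```shell
# Each root port requests one 0x1000-byte (4 KiB) I/O bridge window. With
# 17 ports the firmware asks for 0x11000 bytes of I/O space (the exact
# "I/O: Length/Alignment = 0x11000 / 0xFFF" seen in the log), one window
# more than an assumed 0x10000-byte (64 KiB) aperture can satisfy.
ports=17
window=$((0x1000))
requested=$((ports * window))
aperture=$((16 * window))
printf 'requested=0x%X aperture=0x%X over_by=0x%X\n' \
  "$requested" "$aperture" "$((requested - aperture))"
```

This prints requested=0x11000 aperture=0x10000 over_by=0x1000: exactly one port's I/O window has to be dropped before everything fits.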

Comment 8 Zhenyu Zhang 2023-05-15 03:42:49 UTC
(In reply to Gerd Hoffmann from comment #7)
> So there are no bad effects?  Good.
> 
> There is io address space for 16 pcie root ports.  So in case
> you plug in more there isn't enough address space, and edk2
> goes start disabling the (optional) io bridge window in pcie
> root ports until everything fits.
> 
> That process produces some debug logging ...

Hello Hoffmann,
 
Updated test results:
yes, at boot it just shows some debug records.
But when we hotplug 17 disks through these root ports, I encountered the following error.
On the x86 platform, by contrast, no error is reported even when adding 200 pcie-root-ports,
and all 64 disk hot-plugs succeed.

2023-05-14 23:05:29: [   15.709868] sd 16:0:0:0: Attached scsi generic sg17 type 0
2023-05-14 23:05:29: [   15.713312] sd 13:0:0:0: [sdo] Attached SCSI disk
2023-05-14 23:05:29: [   15.763182] sd 16:0:0:0: [sdr] Attached SCSI disk
2023-05-14 23:05:29: [   15.763238] sd 15:0:0:0: [sdq] Attached SCSI disk
2023-05-14 23:05:51: [   37.922897] sd 0:0:0:0: LUN assignments on this target have changed. The Linux SCSI layer does not automatically remap LUN assignments.
2023-05-14 23:05:51: [   37.973160] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
2023-05-14 23:05:52: [   38.982576] sd 1:0:0:0: [sdc] Synchronizing SCSI cache
2023-05-14 23:05:52: [   38.983015] sd 1:0:0:0: [sdc] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
2023-05-14 23:05:53: [   39.915822] pcieport 0000:00:02.0: pciehp: Slot(0-5): Attention button pressed
2023-05-14 23:05:53: [   39.915826] pcieport 0000:00:02.0: pciehp: Slot(0-5): Powering off due to button press
2023-05-14 23:06:02: [   48.482522] sd 0:0:0:0: [sda] tag#164 abort
2023-05-14 23:06:04: [   51.063157] sd 2:0:0:0: [sdd] Synchronizing SCSI cache
2023-05-14 23:06:04: [   51.063294] sd 2:0:0:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
2023-05-14 23:06:05: [   51.980565] pcieport 0000:00:02.1: pciehp: Slot(0-6): Attention button pressed
2023-05-14 23:06:05: [   51.980569] pcieport 0000:00:02.1: pciehp: Slot(0-6): Powering off due to button press
2023-05-14 23:06:12: [   58.722541] sd 0:0:0:0: [sda] tag#163 abort
2023-05-14 23:06:17: [   63.112552] sd 3:0:0:0: [sde] Synchronizing SCSI cache
2023-05-14 23:06:17: [   63.112877] sd 3:0:0:0: [sde] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
2023-05-14 23:06:17: [   64.041353] pcieport 0000:00:02.2: pciehp: Slot(0-7): Attention button pressed
2023-05-14 23:06:17: [   64.041358] pcieport 0000:00:02.2: pciehp: Slot(0-7): Powering off due to button press
2023-05-14 23:06:22: [   68.962537] sd 0:0:0:0: [sda] tag#162 abort
2023-05-14 23:06:29: [   75.172897] sd 4:0:0:0: [sdf] Synchronizing SCSI cache
2023-05-14 23:06:29: [   75.173032] sd 4:0:0:0: [sdf] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
2023-05-14 23:06:29: [   76.103832] pcieport 0000:00:02.3: pciehp: Slot(0-8): Attention button pressed
2023-05-14 23:06:30: [   76.103838] pcieport 0000:00:02.3: pciehp: Slot(0-8): Powering off due to button press
2023-05-14 23:06:33: [   79.202535] sd 0:0:0:0: [sda] tag#161 abort
2023-05-14 23:06:41: [   87.252566] sd 5:0:0:0: [sdg] Synchronizing SCSI cache
2023-05-14 23:06:41: [   87.253250] sd 5:0:0:0: [sdg] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
2023-05-14 23:06:42: [   88.164881] pcieport 0000:00:02.4: pciehp: Slot(0-9): Attention button pressed
2023-05-14 23:06:42: [   88.164885] pcieport 0000:00:02.4: pciehp: Slot(0-9): Powering off due to button press
2023-05-14 23:06:43: [   89.442533] sd 0:0:0:0: [sda] tag#160 abort
2023-05-14 23:06:53: [   99.302555] sd 6:0:0:0: [sdh] Synchronizing SCSI cache
2023-05-14 23:06:53: [   99.303175] sd 6:0:0:0: [sdh] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
2023-05-14 23:06:53: [   99.682529] sd 0:0:0:0: [sda] tag#159 abort
2023-05-14 23:06:54: [  100.224569] pcieport 0000:00:02.5: pciehp: Slot(0-10): Attention button pressed
2023-05-14 23:06:54: [  100.224573] pcieport 0000:00:02.5: pciehp: Slot(0-10): Powering off due to button press
2023-05-14 23:07:03: [  109.922537] sd 0:0:0:0: [sda] tag#158 abort
2023-05-14 23:07:05: [  111.362582] sd 7:0:0:0: [sdi] Synchronizing SCSI cache
2023-05-14 23:07:05: [  111.363175] sd 7:0:0:0: [sdi] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
2023-05-14 23:07:06: [  112.282193] pcieport 0000:00:02.6: pciehp: Slot(0-11): Attention button pressed
2023-05-14 23:07:06: [  112.282198] pcieport 0000:00:02.6: pciehp: Slot(0-11): Powering off due to button press
2023-05-14 23:07:14: [  120.162544] sd 0:0:0:0: [sda] tag#157 abort
2023-05-14 23:07:14: [  120.163048] sd 0:0:0:0: device reset
2023-05-14 23:07:17: [  123.392555] sd 8:0:0:0: [sdj] Synchronizing SCSI cache
2023-05-14 23:07:17: [  123.392799] sd 8:0:0:0: [sdj] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
2023-05-14 23:07:18: [  124.338456] pcieport 0000:00:02.7: pciehp: Slot(0-12): Attention button pressed
2023-05-14 23:07:18: [  124.338465] pcieport 0000:00:02.7: pciehp: Slot(0-12): Powering off due to button press
2023-05-14 23:07:24: [  130.402548] sd 0:0:0:0: [sda] tag#164 abort
2023-05-14 23:07:24: [  130.403473] sd 0:0:0:0: Device offlined - not ready after error recovery
2023-05-14 23:07:24: [  130.403475] sd 0:0:0:0: Device offlined - not ready after error recovery
2023-05-14 23:07:24: [  130.403476] sd 0:0:0:0: Device offlined - not ready after error recovery
2023-05-14 23:07:24: [  130.403477] sd 0:0:0:0: Device offlined - not ready after error recovery
2023-05-14 23:07:24: [  130.403478] sd 0:0:0:0: Device offlined - not ready after error recovery
2023-05-14 23:07:24: [  130.403479] sd 0:0:0:0: Device offlined - not ready after error recovery
2023-05-14 23:07:24: [  130.403480] sd 0:0:0:0: Device offlined - not ready after error recovery
2023-05-14 23:07:24: [  130.403481] sd 0:0:0:0: Device offlined - not ready after error recovery
2023-05-14 23:07:24: [  130.403482] sd 0:0:0:0: Device offlined - not ready after error recovery
2023-05-14 23:07:24: [  130.403517] sd 0:0:0:0: [sda] tag#164 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=92s
2023-05-14 23:07:24: [  130.403520] sd 0:0:0:0: [sda] tag#164 CDB: Write(10) 2a 00 02 0c d6 e0 00 00 10 00
2023-05-14 23:07:24: [  130.403521] I/O error, dev sda, sector 34395872 op 0x1:(WRITE) flags 0x100000 phys_seg 2 prio class 2
2023-05-14 23:07:24: [  130.403554] sd 0:0:0:0: rejecting I/O to offline device
2023-05-14 23:07:24: gnome-initial-setup-first-login.service: Failed to execute /usr/libexec/gnome-initial-setup: Input/output error[  130.403556] I/O error, dev sda, sector 9390040 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 2
2023-05-14 23:07:24: 
2023-05-14 23:07:24: [  130.403559] I/O error, dev sda, sector 4196352 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 2
2023-05-14 23:07:24: [  130.403559] I/O error, dev sda, sector 9389856 op 0x1:(WRITE) flags 0x100000 phys_seg 3 prio class 2
2023-05-14 23:07:24: [  130.403560] I/O error, dev sda, sector 34398928 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 2
2023-05-14 23:07:24: [  130.403564] I/O error, dev sda, sector 17938528 op 0x1:(WRITE) flags 0x100000 phys_seg 5 prio class 2
2023-05-14 23:07:24: [  130.403571] I/O error, dev sda, sector 9390016 op 0x1:(WRITE) flags 0x100000 phys_seg 2 prio class 2
2023-05-14 23:07:24: [  130.403572] I/O error, dev sda, sector 18088760 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
2023-05-14 23:07:24: [  130.403572] I/O error, dev sda, sector 27761648 op 0x0:(READ) flags 0x80700 phys_seg 32 prio class 2
2023-05-14 23:07:24: [  130.403572] I/O error, dev sda, sector 26660592 op 0x0:(READ) flags 0x80700 phys_seg 31 prio class 2
2023-05-14 23:07:24: [  130.403580] Buffer I/O error on dev sda1, logical block 1, lost async page write
2023-05-14 23:07:24: [  130.403591] dm-0: writeback error on inode 50782590, offset 0, sector 26871520
2023-05-14 23:07:24: [  130.403595] dm-0: writeback error on inode 1865612, offset 0, sector 1865688
2023-05-14 23:07:24: [  130.403597] dm-0: writeback error on inode 18587912, offset 0, sector 10414176
2023-05-14 23:07:24: [  130.403599] dm-0: writeback error on inode 18587912, offset 110592, sector 10414184
2023-05-14 23:07:24: [  130.403601] dm-0: writeback error on inode 18587912, offset 684032, sector 10414192
2023-05-14 23:07:24: [  130.403602] dm-0: writeback error on inode 18587912, offset 950272, sector 10414200
2023-05-14 23:07:24: [  130.403602] dm-0: writeback error on inode 50783293, offset 204800, sector 26873864
2023-05-14 23:07:24: [  130.403604] dm-0: writeback error on inode 18587912, offset 1298432, sector 10414208
2023-05-14 23:07:24: [  130.403606] dm-0: writeback error on inode 1865600, offset 0, sector 1865664
2023-05-14 23:07:24: [  130.403606] dm-0: writeback error on inode 50783295, offset 0, sector 1848696
2023-05-14 23:07:24: [  130.408500] XFS (dm-0): metadata I/O error in "xfs_buf_ioend_handle_error+0x11c/0x470 [xfs]" at daddr 0x190bc20 len 32 error 5
2023-05-14 23:07:24: [  130.408607] XFS (dm-0): log I/O error -5
2023-05-14 23:07:24: [  130.408662] XFS (dm-0): Log I/O Error (0x2) detected at xlog_ioend_work+0x84/0x90 [xfs] (fs/xfs/xfs_log.c:1378).  Shutting down filesystem.
2023-05-14 23:07:24: [  130.408715] XFS (dm-0): Please unmount the filesystem and rectify the problem(s)
2023-05-14 23:07:24: [  130.417908] Core dump to |/usr/lib/systemd/systemd-coredump pipe failed
2023-05-14 23:07:24: [  130.422455] Core dump to |/usr/lib/systemd/systemd-coredump pipe failed
2023-05-14 23:07:24: [  130.422932] sd 0:0:1:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
2023-05-14 23:07:24: [  130.423185] Core dump to |/usr/lib/systemd/systemd-coredump pipe failed
2023-05-14 23:07:24: [  130.450549] Core dump to |/usr/lib/systemd/systemd-coredump pipe failed
2023-05-14 23:07:24: [FAILED] Failed to start Record Runlevel Change in UTMP.
2023-05-14 23:07:24: [  130.576100] Core dump to |/usr/lib/systemd/systemd-coredump pipe failed
2023-05-14 23:07:24: [  130.576241] Core dump to |/usr/lib/systemd/systemd-coredump pipe failed


Detailed log on aarch64:
http://10.0.136.47/zhenyzha/multi_disk_hotplug/test-results/1-Host_RHEL.m9.u3.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.3.0.aarch64.page_4k.io-github-autotest-qemu.multi_disk_random_hotplug.single_type.separated_bus.arm64-pci/

Pass log on x86_64:
http://virtqetools.lab.eng.pek2.redhat.com/autotest_static_job_log/7783034/test-results/4-Host_RHEL.m9.u3.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.3.0.x86_64.io-github-autotest-qemu.multi_disk_random_hotplug.single_type.separated_bus.q35/

Comment 9 Gerd Hoffmann 2023-05-15 12:00:48 UTC
Can you try this build?
https://kojihub.stream.centos.org/koji/taskinfo?taskID=2216516

Comment 10 Zhenyu Zhang 2023-05-16 05:48:15 UTC
(In reply to Gerd Hoffmann from comment #9)
> Can you try this build?
> https://kojihub.stream.centos.org/koji/taskinfo?taskID=2216516

Hello Hoffmann,

Thanks for the quick response,
When I used your build, the above "Out Of Resource" message disappeared.
But when I hotplug disks, I still get I/O errors.
It looks like these are two different issues.

About the I/O error, I found a bug you dealt with on x86 before:
Bug 1949813 - Some disks are not hotpluged in guest when hotplug many disks
It seems that on x86 they also hit this error when hotplugging 90 disks across 200 virtio-scsi controllers,
but the threshold is much smaller on aarch64:
we hit I/O errors when hotplugging 17 disks across 17 virtio-scsi controllers.
Do you think we need to follow up on this second issue, or is this expected behavior on Arm?

# rpm -qa | grep edk2
edk2-aarch64-20230301gitf80f052277c8-3.el9.bz2174749.20230515.1346.noarch

Same cmd as comment0
/usr/libexec/qemu-kvm \
......
-serial stdio
VNC server running on ::1:5900
UEFI firmware (version edk2-20230301gitf80f052277c8-3.el9.bz2174749.20230515.1346 built at 00:00:00Tpm2SubmitCommand - Tcg2 - Not Found
Tpm2GetCapabilityPcrs fail!
Tpm2SubmitCommand - Tcg2 - Not Found
BdsDxe: failed to load Boot0008 "Red Hat Enterprise Linux" from HD(1,GPT,758811EC-AD57-429F-821F-646B45E3D060,0x800,0x12C000)/\EFI\redhat\shimaa64.efi: Not Found
BdsDxe: loading Boot0006 "EFI Internal Shell" from Fv(64074AFE-340A-4BE6-94BA-91B5B4D0F71E)/FvFile(7C04A583-9E3E-4F1C-AD65-E05268D0B4D1)
BdsDxe: starting Boot0006 "EFI Internal Shell" from Fv(64074AFE-340A-4BE6-94BA-91B5B4D0F71E)/FvFile(7C04A583-9E3E-4F1C-AD65-E05268D0B4D1)
UEFI Interactive Shell v2.2
EDK II
UEFI v2.70 (EDK II, 0x00010000)
map: No mapping found.
Press ESC in 1 seconds to skip startup.nsh or any other key to continue.
Shell>

Comment 11 Gerd Hoffmann 2023-05-16 06:54:38 UTC
> Thanks for the quick response,
> When I used your build, the above "Out Of Resource" message disappeared.

Good.

> But when I do disks hotplugged, I still get I/O error.
> It looks like these are two different issues.

Indeed, and it also doesn't look firmware related.  Please open a new bug for that one.

Comment 12 Gerd Hoffmann 2023-05-16 09:55:21 UTC
> > When I used your build, the above "Out Of Resource" message disappeared.
> 
> Good.

Patch posted upstream.
https://edk2.groups.io/g/devel/message/104919

Comment 13 Zhenyu Zhang 2023-05-16 11:57:47 UTC
(In reply to Gerd Hoffmann from comment #11)
> 
> Indeed, and it also doesn't look firmware related.  Please open a new bug
> for that one.

Hello Hoffmann,

I created a new bug to track the I/O error issue.
Bug 2207634 - Multiple hot-plug/hot-unplug virtio-scsi disks operations hit core dump

Comment 14 Zhenyu Zhang 2023-05-17 07:56:26 UTC
(In reply to Gerd Hoffmann from comment #12)
> > > When I used your build, the above "Out Of Resource" message disappeared.
> > 
> > Good.
> 
> Patch posted upstream.
> https://edk2.groups.io/g/devel/message/104919

Hello Hoffmann,

I see that your code changes seem to be on all platforms, 
do I need to change bug Hardware to all?

Thanks in advance
Zhenyu

Comment 16 Gerd Hoffmann 2023-05-24 07:07:30 UTC
> I see that your code changes seem to be on all platforms, 
> do I need to change bug Hardware to all?

As you observe the behavior on aarch64 only, leave it set to aarch64.

Can you try the scratch build here?
https://bugzilla.redhat.com/show_bug.cgi?id=2174749#c18

Comment 17 Gerd Hoffmann 2023-05-24 07:09:36 UTC
> Can you try the scratch build here?
> https://bugzilla.redhat.com/show_bug.cgi?id=2174749#c18

Oh, we already had a scratch build with that patch (comment 9).
No need to test again then.

Comment 18 Zhenyu Zhang 2023-05-25 02:16:00 UTC
(In reply to Gerd Hoffmann from comment #17)
> > Can you try the scratch build here?
> > https://bugzilla.redhat.com/show_bug.cgi?id=2174749#c18
> 
> Oh, we already had a scratch build with that patch (comment 9).
> No need to test again then.

Ok, I see.
Thanks for the feedback :)

Comment 19 Gerd Hoffmann 2023-06-01 08:33:26 UTC
> Patch posted upstream.
> https://edk2.groups.io/g/devel/message/104919

merged now as commit 27727338b2c0e3f50eb0176a1044e903fcb3c3b1
(after edk2-stable202305, so this needs backporting, to be done after rebase).

Comment 22 Zhenyu Zhang 2023-06-28 02:59:57 UTC
When I used edk2-20230524-1.el9, the above "Out Of Resource" message disappeared.
So setting the bug to Verified: Tested.

Version-Release number of selected component (if applicable):
hostkernel: 5.14.0-331.el9.aarch64+64k
qemu: qemu-kvm-8.0.0-6.el9
edk2: edk2-aarch64-20230524-1.el9.noarch

Comment 25 Zhenyu Zhang 2023-07-04 13:41:08 UTC
When I used edk2-20230524-1.el9, the above "Out Of Resource" message disappeared.
So setting the bug to Verified.

Version-Release number of selected component (if applicable):
hostkernel: 5.14.0-332.el9.aarch64+64k
qemu: qemu-kvm-8.0.0-6.el9
edk2: edk2-aarch64-20230524-1.el9.noarch

/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1'  \
-sandbox on  \
-blockdev '{"node-name": "file_aavmf_code", "driver": "file", "filename": "/usr/share/edk2/aarch64/QEMU_EFI-silent-pflash.qcow2", "auto-read-only": true, "discard": "unmap"}' \
-blockdev '{"node-name": "drive_aavmf_code", "driver": "qcow2", "read-only": true, "file": "file_aavmf_code"}' \
-blockdev '{"node-name": "file_aavmf_vars", "driver": "file", "filename": "/root/avocado/data/avocado-vt/avocado-vt-vm1_rhel930-aarch64-64k-virtio-scsi_qcow2_filesystem_VARS.qcow2", "auto-read-only": true, "discard": "unmap"}' \
-blockdev '{"node-name": "drive_aavmf_vars", "driver": "qcow2", "read-only": false, "file": "file_aavmf_vars"}' \
-machine virt,gic-version=host,pflash0=drive_aavmf_code,pflash1=drive_aavmf_vars \
-device '{"id": "pcie-root-port-0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x1.0x0", "chassis": 1}' \
-nodefaults \
-device '{"id": "pcie-root-port-1", "port": 1, "driver": "pcie-root-port", "addr": "0x1.0x1", "bus": "pcie.0", "chassis": 2}' \
-cpu 'host' \
-device '{"id": "pcie-root-port-2", "port": 2, "driver": "pcie-root-port", "addr": "0x1.0x2", "bus": "pcie.0", "chassis": 3}' \
-device '{"id": "pcie-root-port-3", "port": 3, "driver": "pcie-root-port", "addr": "0x1.0x3", "bus": "pcie.0", "chassis": 4}' \
-device '{"id": "pcie-root-port-4", "port": 4, "driver": "pcie-root-port", "addr": "0x1.0x4", "bus": "pcie.0", "chassis": 5}' \
-enable-kvm \
-device '{"id": "pcie-root-port-5", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x2.0x0", "chassis": 6}' \
-device '{"id": "pcie-root-port-6", "driver": "pcie-root-port", "addr": "0x2.0x1", "bus": "pcie.0", "chassis": 7}' \
-device '{"id": "pcie-root-port-7", "driver": "pcie-root-port", "addr": "0x2.0x2", "bus": "pcie.0", "chassis": 8}' \
-device '{"id": "pcie-root-port-8", "driver": "pcie-root-port", "addr": "0x2.0x3", "bus": "pcie.0", "chassis": 9}' \
-device '{"id": "pcie-root-port-9", "driver": "pcie-root-port", "addr": "0x2.0x4", "bus": "pcie.0", "chassis": 10}' \
-device '{"id": "pcie-root-port-10", "driver": "pcie-root-port", "addr": "0x2.0x5", "bus": "pcie.0", "chassis": 11}' \
-device '{"id": "pcie-root-port-11", "driver": "pcie-root-port", "addr": "0x2.0x6", "bus": "pcie.0", "chassis": 12}' \
-device '{"id": "pcie-root-port-12", "driver": "pcie-root-port", "addr": "0x2.0x7", "bus": "pcie.0", "chassis": 13}' \
-device '{"id": "pcie-root-port-13", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x3.0x0", "chassis": 14}' \
-device '{"id": "pcie-root-port-14", "driver": "pcie-root-port", "addr": "0x3.0x1", "bus": "pcie.0", "chassis": 15}' \
-device '{"id": "pcie-root-port-15", "driver": "pcie-root-port", "addr": "0x3.0x2", "bus": "pcie.0", "chassis": 16}' \
-device '{"id": "pcie-root-port-16", "driver": "pcie-root-port", "addr": "0x3.0x3", "bus": "pcie.0", "chassis": 17}' \
-serial stdio
VNC server running on ::1:5900
UEFI firmware (version edk2-20230524-1.el9 built at 00:00:00 on Jun 27 2023)
Tpm2SubmitCommand - Tcg2 - Not Found
Tpm2GetCapabilityPcrs fail!
Tpm2SubmitCommand - Tcg2 - Not Found
BdsDxe: loading Boot0001 "EFI Internal Shell" from Fv(64074AFE-340A-4BE6-94BA-91B5B4D0F71E)/FvFile(7C04A583-9E3E-4F1C-AD65-E05268D0B4D1)
BdsDxe: starting Boot0001 "EFI Internal Shell" from Fv(64074AFE-340A-4BE6-94BA-91B5B4D0F71E)/FvFile(7C04A583-9E3E-4F1C-AD65-E05268D0B4D1)
UEFI Interactive Shell v2.2
EDK II
UEFI v2.70 (EDK II, 0x00010000)

Comment 27 errata-xmlrpc 2023-11-07 08:24:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: edk2 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6330

