Bug 1967494 - kernel BUG at mm/ioremap.c:76 for a guest exposed with pcie expander bridge/root port
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: kernel
Version: 9.0
Hardware: aarch64
OS: Linux
Priority: low
Severity: low
Target Milestone: beta
Target Release: ---
Assignee: Virtualization Maintenance
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-06-03 08:49 UTC by Eric Auger
Modified: 2021-12-07 22:30 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-06-24 08:06:41 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Description Eric Auger 2021-06-03 08:49:18 UTC
While testing the pcie expander bridge on a rhel9 guest (5.12.0-1.el9.aarch64) I noticed the boot fails with

[    0.818089] ------------[ cut here ]------------
[    0.818868] kernel BUG at mm/ioremap.c:76!
[    0.819551] Internal error: Oops - BUG: 0 [#1] SMP
[    0.820347] Modules linked in:
[    0.820858] CPU: 6 PID: 1 Comm: swapper/0 Not tainted 5.12.0-1.el9.aarch64 #1
[    0.822052] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[    0.823189] pstate: 20400005 (nzCv daif +PAN -UAO -TCO BTYPE=--)
[    0.824188] pc : ioremap_pte_range+0x160/0x190
[    0.824934] lr : ioremap_pmd_range+0x12c/0x1c0
[    0.825671] sp : ffff80001294f810
[    0.826224] x29: ffff80001294f810 x28: 0000000000000001 
[    0.827104] x27: 0068000000000717 x26: fffffbfffe801000 
[    0.828019] x25: 00000400407f0000 x24: ffff800011e70000 
[    0.828956] x23: 0400000000000001 x22: 0000000000000041 
[    0.829889] x21: fffffc0020070000 x20: ffff00038fc70038 
[    0.830819] x19: 0068000060850717 x18: ffffffffffffffff 
[    0.831762] x17: 0000000000000000 x16: 0000000000000001 
[    0.832690] x15: ffff0000c286de38 x14: 0000000000000002 
[    0.833618] x13: 0000000000000000 x12: 7830206d656d5b20 
[    0.834557] x11: 00000000ffffff76 x10: 000000000000002e 
[    0.835487] x9 : ffff800010307ffc x8 : ffff0000c2866580 
[    0.836416] x7 : ffff8000118fa000 x6 : 00000003cfc40000 
[    0.837354] x5 : ffff80001294f954 x4 : 0140000000000000 
[    0.838288] x3 : 00000003cfc50000 x2 : fffffbfffe801000 
[    0.839222] x1 : fff1000040000000 x0 : ffff0003ffdc9700 
[    0.840150] Call trace:
[    0.840579]  ioremap_pte_range+0x160/0x190
[    0.841305]  ioremap_pmd_range+0x12c/0x1c0
[    0.842026]  ioremap_page_range+0xa8/0x1d0
[    0.842743]  pci_remap_iospace+0x80/0x94
[    0.843432]  acpi_pci_probe_root_resources+0x190/0x250
[    0.844336]  pci_acpi_root_prepare_resources+0x28/0xd0
[    0.845239]  acpi_pci_root_create+0x9c/0x340
[    0.845985]  pci_acpi_scan_root+0x14c/0x240
[    0.846716]  acpi_pci_root_add+0x15c/0x2a0
[    0.847444]  acpi_bus_attach+0x15c/0x2f0
[    0.848134]  acpi_bus_attach+0x94/0x2f0
[    0.848810]  acpi_bus_attach+0x94/0x2f0
[    0.849480]  acpi_bus_scan+0x60/0x114
[    0.850120]  acpi_scan_init+0x110/0x268
[    0.850794]  acpi_init+0xd4/0x140
[    0.851385]  do_one_initcall+0x50/0x270
[    0.852061]  do_initcalls+0x104/0x144
[    0.852704]  kernel_init_freeable+0x174/0x1c0
[    0.853464]  kernel_init+0x20/0x134
[    0.854078]  ret_from_fork+0x10/0x18
[    0.854716] Code: aa1403e0 97f4b517 d2e02804 17ffffd9 (d4210000) 
[    0.855780] ---[ end trace d8b97d47eba36df8 ]---
[    0.856585] Kernel panic - not syncing: Oops - BUG: Fatal exception
[    0.857688] SMP: stopping secondary CPUs
[    0.858405] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception ]---

A BUG_ON also triggers with an upstream kernel (5.13-rc4) built with 64kB pages, although at a slightly different location:

[    0.928475] ------------[ cut here ]------------
[    0.929260] kernel BUG at mm/vmalloc.c:96!
[    0.929979] Internal error: Oops - BUG: 0 [#1] SMP
[    0.930803] Modules linked in:
[    0.931337] CPU: 7 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc4-guest-64K+ #48
[    0.932600] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[    0.933785] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO BTYPE=--)
[    0.934814] pc : vmap_range+0x1d8/0x208
[    0.935478] lr : vmap_range+0x38/0x208
[    0.936128] sp : fffffe001292f810
[    0.936702] x29: fffffe001292f810 x28: fffffffefe800000 x27: fffffffefe801000
[    0.937923] x26: ffffffff20070000 x25: 00000001407f0000 x24: 0140000000000000
[    0.939143] x23: fffffe001134ffb8 x22: 000000000000003f x21: 0068000000000717
[    0.940372] x20: 000000003eff3000 x19: fffffffefe801000 x18: 0000000000000030
[    0.941597] x17: 0000000000000000 x16: 0000000000000001 x15: ffffffffffffffff
[    0.942832] x14: fffffe00118b99c8 x13: fffffc00c1f069b8 x12: 0000000000000000
[    0.944056] x11: 000000000000002e x10: fffffe0012142000 x9 : fffffe0010d3141c
[    0.945278] x8 : fffffc00c1f0d400 x7 : 0000000000000000 x6 : 0000000000000000
[    0.946503] x5 : 0068000000000f17 x4 : 000000003eff0000 x3 : 0000000000001ff7
[    0.947726] x2 : 0040000000000001 x1 : fffffc03f0680038 x0 : fffffc03ffded340
[    0.948945] Call trace:
[    0.949367]  vmap_range+0x1d8/0x208
[    0.949978]  ioremap_page_range+0x20/0x30
[    0.950668]  pci_remap_iospace+0x74/0x88
[    0.951349]  acpi_pci_probe_root_resources+0x180/0x238
[    0.952241]  pci_acpi_root_prepare_resources+0x28/0xc8
[    0.953128]  acpi_pci_root_create+0x9c/0x2f8
[    0.953869]  pci_acpi_scan_root+0x14c/0x230
[    0.954587]  acpi_pci_root_add+0x268/0x5c0
[    0.955291]  acpi_bus_attach+0x15c/0x2c0
[    0.955965]  acpi_bus_attach+0x9c/0x2c0
[    0.956631]  acpi_bus_attach+0x9c/0x2c0
[    0.957290]  acpi_bus_scan+0x64/0x118
[    0.957919]  acpi_scan_init+0x10c/0x25c
[    0.958581]  acpi_init+0x40c/0x478
[    0.959166]  do_one_initcall+0x54/0x268
[    0.959836]  kernel_init_freeable+0x23c/0x2d8
[    0.960584]  kernel_init+0x1c/0x128
[    0.961190]  ret_from_fork+0x10/0x18
[    0.961815] Code: a90687e2 97f518cd a94687e2 17ffffe4 (d4210000) 
[    0.962870] ---[ end trace 1c594c9170b51a5f ]---
[    0.963667] Kernel panic - not syncing: Oops - BUG: Fatal exception
[    0.964738] SMP: stopping secondary CPUs
[    0.965442] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception ]---

However, the upstream guest boots normally if it uses a 4kB page size.

[root@vm-rhel9 ~]# lspci -tv
-+-[0000:fe]---00.0-[ff]----00.0  Red Hat, Inc. Virtio network device
 \-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
             +-01.0  Red Hat, Inc. QEMU PCIe Expander bridge
             +-02.0-[01]--
             +-03.0-[02]--
             \-04.0-[03]----00.0  Red Hat, Inc. Virtio block device


Used qemu command line:

qemu-system-aarch64 -M virt,gic-version=host -cpu host -smp 8 \
  -m 16G,maxmem=32G,slots=3 -display none --enable-kvm -serial stdio \
  -device pxb-pcie,bus_nr=254,id=bridge,bus=pcie.0 \
  -device pcie-root-port,bus=bridge,chassis=4,id=pcie.11 \
  -device virtio-net-pci,bus=pcie.11,netdev=nic0,mac=6a:f5:10:b1:3d:d2 \
  -netdev tap,id=nic0,script=/home/augere/TEST/SCRIPTS/qemu-ifup,downscript=/home/augere/TEST/SCRIPTS/qemu-ifdown \
  -device pcie-root-port,bus=pcie.0,chassis=1,id=pcie.1,addr=0x2.0x0 \
  -device pcie-root-port,bus=pcie.0,chassis=2,id=pcie.2,addr=0x3.0x0 \
  -device pcie-root-port,bus=pcie.0,chassis=3,id=pcie.3,addr=0x4.0x0 \
  -qmp unix:/home/augere/TEST/QEMU/qmp-sock,server,nowait \
  -device virtio-blk-pci,bus=pcie.3,scsi=off,drive=drv0,id=virtio-disk0,bootindex=1,werror=stop,rerror=stop \
  -drive file=/home/augere/VM/IMAGES/aarch64-vm2-rhel9.0.qcow2,format=qcow2,if=none,cache=writethrough,id=drv0 \
  -bios /home/augere/VM/UEFI/QEMU_EFI_4198.img \
  -kernel /home/augere/VM/BOOT/vmlinuz-5.13.0-rc4-guest-64K+ \
  -initrd /home/augere/VM/BOOT/initrd.img-5.13.0-rc4-guest-64K+ \
  -append 'root=/dev/mapper/rhel_fedora-root ro rd.lvm.lv=rhel_fedora/root rd.lvm.lv=rhel_fedora/swap acpi=force' \
  -net none -d guest_errors

Comment 2 Eric Auger 2021-06-22 15:45:58 UTC
Created attachment 1793082
Traces featuring upstream EDK2 in DEBUG mode and linux BUG_ON()

Comment 3 Eric Auger 2021-06-22 15:52:43 UTC
To me it looks like an EDK2 issue:

PciBus: Resource Map for Root Bridge PciRoot(0x0) -> GPEX
Type =   Io16; Base = 0x0;      Length = 0x3000;        Alignment = 0xFFF

PciBus: Resource Map for Root Bridge PciRoot(0x4) -> PCIe expander bridge
Type =   Io16; Base = 0x3000;   Length = 0x1000;        Alignment = 0xFFF
                         ^

The base is not aligned with the guest 64kB page

Then on PCIe expander bridge Io16 remap, we have
pci_remap_iospace vaddr=0xfffffffefe800000, size=0x1000, phys_addr=0x3eff3000
                                                                           ^
which hits a BUG_ON in vmap_range, I think because the 64kB page already has an entry that was created when ioremapping the GPEX Io16 window.

Do you share my understanding? Should the alignment be 0xFFFF instead?
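
The alignment argument above can be checked with a little arithmetic. The following is only a sketch: it parses the two Io16 lines quoted in this comment and tests each base against the page size. The helper names (io16_windows, misaligned, same_page) are made up for illustration, and the MMIO address 0x3EFF0000 for the GPEX window is an assumption derived from phys_addr=0x3eff3000 minus the 0x3000 base.

```python
import re

PAGE_4K = 0x1000    # page size under which the guest boots
PAGE_64K = 0x10000  # page size under which the BUG_ON triggers

# Io16 resource-map lines as quoted from the EDK2 debug log.
EDK2_LOG = """\
PciBus: Resource Map for Root Bridge PciRoot(0x0)
Type =   Io16; Base = 0x0;      Length = 0x3000;        Alignment = 0xFFF
PciBus: Resource Map for Root Bridge PciRoot(0x4)
Type =   Io16; Base = 0x3000;   Length = 0x1000;        Alignment = 0xFFF
"""

IO16_RE = re.compile(
    r"Type\s*=\s*Io16;\s*Base\s*=\s*(0x[0-9A-Fa-f]+);"
    r"\s*Length\s*=\s*(0x[0-9A-Fa-f]+)"
)


def io16_windows(log):
    """Return (base, length) for every Io16 line found in the log text."""
    return [(int(base, 16), int(length, 16))
            for base, length in IO16_RE.findall(log)]


def misaligned(windows, page_size):
    """Return the windows whose base is not a multiple of page_size."""
    return [w for w in windows if w[0] % page_size != 0]


def same_page(addr_a, addr_b, page_size):
    """True if both addresses fall into the same page of the given size."""
    return addr_a // page_size == addr_b // page_size


windows = io16_windows(EDK2_LOG)
print([hex(base) for base, _ in misaligned(windows, PAGE_64K)])
print(misaligned(windows, PAGE_4K))
# Assumed GPEX window start (0x3EFF0000) versus the PXB window (0x3EFF3000).
print(same_page(0x3EFF0000, 0x3EFF3000, PAGE_64K))
```

With 4kB pages every base is page aligned, which is consistent with the 4kB guest booting. With 64kB pages the second window starts inside the page already used by the first one.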

Comment 4 Laszlo Ersek 2021-06-23 20:49:00 UTC
UEFI (per spec) only deals with a single (last level) page size, and that's 4KB.

Short version: please set the following option on the QEMU command line, and retry:

  -global pcie-root-port.io-reserve=0


Long version: here's an excerpt from my email that I sent to Eric basically in parallel to the above needinfo being set on me.

I agree with your analysis that it's an alignment issue. The PCI bus
driver in the firmware, "MdeModulePkg/Bus/Pci/PciBusDxe", assigns bridge
IO port resources with a 4KB alignment (see "BridgeIoAlignment").
Furthermore, on aarch64/virt, the IO port aperture is simulated through
a special MMIO aperture. However, a guest kernel with 64KiB pages cannot
map the 4KB MMIO areas in question separately, since it maps in units of
64KiB.

Now, the guest firmware actually lets you dictate the "padding size" for
resource reservation. This was enabled for the above BZ in commit
e843a21e23ea ("ArmVirtPkg/ArmVirtQemu: Add support for HotPlug",
2021-01-20).

The syntax is

  -device pcie-root-port,[properties],io-reserve=...

While this is primarily for hotplug purposes, I think you could
theoretically use it for enforcing alignment as well...

Unfortunately however, the entire IO port aperture (which is simulated
through MMIO on aarch64/virt), is only 64KiB! All root bridges on the
sole host bridge share that aperture, and it is only 64KiB. This is
parsed by the firmware from QEMU's DTB, and it is logged as:

  ProcessPciHost: Config[0x4010000000+0x10000000) Bus[0x0..0xFF] Io[0x0+0x10000)@0x3EFF0000 Mem32[0x10000000+0x2EFF0000)@0x0 Mem64[0x8000000000+0x8000000000)@0x0

The relevant part is "Io[0x0+0x10000)@0x3EFF0000". It says that the full
size is 64KiB (0x10000), and it is based at MMIO 0x3EFF0000 (you can see
that constant in the kernel messages too).

So... even if you could force the IO port reservation *per root port* to
be 64KiB, that wouldn't work, because you have 64KiB IO port space, for
all root ports together.
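
To make the size constraint concrete, here is a small sketch that parses the Io[...] field of the ProcessPciHost line above and compares the aperture with a hypothetical 64KiB reservation per root port. The function names and the port count of 4 (taken from the root ports on the reporter's command line) are illustrative assumptions, not firmware behavior.

```python
import re

# ProcessPciHost line as logged by the firmware (quoted above).
HOST_LINE = (
    "ProcessPciHost: Config[0x4010000000+0x10000000) Bus[0x0..0xFF] "
    "Io[0x0+0x10000)@0x3EFF0000 Mem32[0x10000000+0x2EFF0000)@0x0 "
    "Mem64[0x8000000000+0x8000000000)@0x0"
)

IO_RE = re.compile(
    r"\bIo\[(0x[0-9A-Fa-f]+)\+(0x[0-9A-Fa-f]+)\)@(0x[0-9A-Fa-f]+)"
)


def parse_io_aperture(line):
    """Return (io_base, io_size, mmio_base) from a ProcessPciHost line."""
    match = IO_RE.search(line)
    if match is None:
        raise ValueError("no Io[...] aperture found")
    return tuple(int(group, 16) for group in match.groups())


def reservation_fits(n_ports, reserve_per_port, aperture_size):
    """True if n_ports reservations of the given size fit in the aperture."""
    return n_ports * reserve_per_port <= aperture_size


io_base, io_size, mmio_base = parse_io_aperture(HOST_LINE)
N_ROOT_PORTS = 4  # assumption: pcie.1, pcie.2, pcie.3 and pcie.11

print(hex(io_size), hex(mmio_base))
print(reservation_fits(N_ROOT_PORTS, 0x10000, io_size))  # 64KiB per port
print(reservation_fits(N_ROOT_PORTS, 0, io_size))        # no IO reserved
```

Only a single 64KiB-aligned window fits in the whole aperture, so a 64KiB reservation per port cannot work with more than one port, while a zero reservation trivially fits.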


How about this instead:

PCIe devices are required to function properly without IO resources. So
what if you explicitly state that *zero* IO port space should be
reserved for the root port in question?

  -device pcie-root-port,[properties],io-reserve=0
                                      ^^^^^^^^^^^^

This will definitely cause the guest firmware to reserve no IO port
space for the root port; therefore the guest kernel should not attempt
to io-map any such MMIO range (regardless of page size).


In fact, you can do this for *all* PCIe root ports at once:

  -global pcie-root-port.io-reserve=0

This is what I have in one of my libvirt domain XMLs:

  <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
    <qemu:commandline>
      <qemu:arg value='-global'/>
      <qemu:arg value='pcie-root-port.io-reserve=0'/>
    </qemu:commandline>
  </domain>

... I have now tried this with one of my long-term aarch64 libvirt
domains (no pxb-pcie usage, just one root bridge with several root ports
on it), and *all* Io16 resources have disappeared from the firmware log.

Comment 5 Eric Auger 2021-06-24 07:48:06 UTC
Hi Laszlo, many thanks for your detailed answer here and in the separate email. Indeed, this works fine in my case at the QEMU level, and I no longer see any Io16 allocation in the EDK2 log.

Before submitting a new BZ at the libvirt level, I would like to double-check with all the stakeholders what the overall consequences of globally setting io-reserve=0 could be for the supported guest PCIe topology. Does anyone foresee any possible regression for supported PCIe/PCI devices?

Thanks

Eric

Comment 6 Eric Auger 2021-06-24 08:03:21 UTC
Actually we can reduce the scope of io-reserve=0 to the root port plugged onto the pcie expander bridge. Then an Io16 region is allocated once for the GPEX and none is attempted for the PXB. I checked this works too and this does not bring any regression on existing supported use cases.
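
As a rough illustration of that narrower scope, the sketch below builds the -device arguments so that only the root port behind the pcie expander bridge carries io-reserve=0. The helper root_port_arg is hypothetical; the bus, chassis, id and addr values are the ones from the command line in the description, and io-reserve is the pcie-root-port property discussed in comment 4.

```python
def root_port_arg(bus, chassis, port_id, addr=None, io_reserve=None):
    """Build the QEMU '-device pcie-root-port,...' argument pair.

    io_reserve is appended only when given, so other ports keep the
    firmware's default IO reservation.
    """
    props = ["pcie-root-port", f"bus={bus}", f"chassis={chassis}",
             f"id={port_id}"]
    if addr is not None:
        props.append(f"addr={addr}")
    if io_reserve is not None:
        props.append(f"io-reserve={io_reserve}")
    return ["-device", ",".join(props)]


args = []
# Root port behind the pxb-pcie ("bridge"): reserve no IO port space.
args += root_port_arg("bridge", 4, "pcie.11", io_reserve=0)
# Root ports on pcie.0 (GPEX) are left unchanged.
args += root_port_arg("pcie.0", 1, "pcie.1", addr="0x2.0x0")
args += root_port_arg("pcie.0", 2, "pcie.2", addr="0x3.0x0")
args += root_port_arg("pcie.0", 3, "pcie.3", addr="0x4.0x0")

print(" ".join(args))
```

The resulting list can be spliced into the rest of the command line in place of the original pcie-root-port entries.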

So I am going to close this bug as NOTABUG and add this information to the associated libvirt BZ.

