Bug 2192989
| Summary: | Cannot enable PCIe Resizable BAR on Intel ARC GPU [DG2] | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Alex Williamson <alex.williamson> |
| Component: | kernel | Assignee: | Myron Stowe <mstowe> |
| kernel sub component: | PCI subsystem | QA Contact: | William Gomeringer <wgomerin> |
| Status: | NEW --- | Docs Contact: | |
| Severity: | unspecified | ||
| Priority: | unspecified | CC: | pragyansri.pathi, prd-fedora |
| Version: | 9.3 | Keywords: | Triaged |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | Bug | |
I recently got an Intel Arc GPU (A750). I put it in my system in a PCIe x4 slot, alongside my AMD 6800XT in the first PCIe slot. The system seems to work fine. However, just a few days after installing the Arc GPU, I was reading the logs to see if there were any interesting messages about it, and I noticed this message:

kernel: i915 0000:07:00.0: [drm] Failed to resize BAR2 to 8192M (-ENOSPC)

I thought that was odd, because I thought ReBAR was kind of an old thing and I hadn't heard about it in a while. So, I poked around my BIOS and discovered I had never enabled ReBAR. This is surprising because I think my system performs very well and is very smooth. But I turned it on anyway. Immediately, I noticed that the mouse pointer was jittery when just moving around the desktop and moving windows around. It was more noticeable with BOINC distributed computing running on CPU+GPU, but still noticeable with no applications running at all. I turned off ReBAR, and the symptom disappeared.

Looking at the journal, I see drm complaining that ReBAR isn't available when I have it disabled in the BIOS, and I see i915 complaining about it too, and pcieport reports "failure to assign ..." and "no space for...". All that makes sense. But I don't see AMDGPU complaining about it. When ReBAR is enabled in the BIOS, i915 doesn't seem to complain about it, and I see drm saying "BAR=16384M" instead of "256M".

I'm just reporting this here because it seems somewhat related and I'm curious if OP has any update. It seems like my Arc is able to use ReBAR when enabled, but there is this performance issue that comes with it. I don't know quite what to make of it.

I just wanted to report that the issue I reported earlier seems to have resolved itself. I currently have ReBAR enabled and that symptom is gone. It seems really odd since, as I think I mentioned, I don't think there have been any ReBAR issues in a long time, and I assume nothing having to do with ReBAR changed recently in the kernel. Maybe Intel driver updates? Whatever.

(In reply to Paul DeStefano from comment #2)
> Looking at the journal, I see drm complaining that ReBAR isn't available
> when I have it disabled in the BIOS, and I see i915 complaining about it
> too, and pcieport reports "failure to assign ..." and "no space for...". All
> that makes sense. But, I don't see AMDGPU complaining about it. When ReBAR
> is enabled in the BIOS, i915 doesn't seem to complain about it, and I see
> drm saying "BAR=16384M" instead of "256M".

This is the nuance of ReBAR support in the Intel ARC implementation that this bug is intended to track. The ARC GPUs make use of a PCIe switch which consumes the same type of MMIO resource as the resizable BAR on the GPU, which makes runtime resizing of the BAR by the OS difficult. This does not affect BIOS-enabled ReBAR support, assuming the BIOS sizes the BAR to the maximum supported value. Specific anomalies in ARC GPU behavior outside of this resource conflict are not intended to be tracked by this issue.

As indicated in comment 0, AMD GPUs do not have this issue with runtime resizing because they do not introduce a resource conflict with their own PCIe switch.
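The resource conflict described in the reply above can be checked mechanically from boot-time dmesg output. The following is only a rough sketch, not anything the kernel does: it scans the `pci ...: reg 0x..` lines quoted in the description of this bug and reports BARs that belong to a bridge (header type 01) and sit in the 64-bit prefetchable pool. The function name `bridge_bars_in_pref64_pool` and the two excerpt constants are made up for illustration; the log lines themselves are copied from this report.

```python
import re

# Excerpts of the dmesg lines quoted in this report.
INTEL_DMESG = """\
pci 0000:5e:00.0: [8086:4fa1] type 01 class 0x060400
pci 0000:5e:00.0: reg 0x10: [mem 0xbff0000000-0xbff07fffff 64bit pref]
pci 0000:60:00.0: [8086:56a5] type 00 class 0x030000
pci 0000:60:00.0: reg 0x10: [mem 0xb9000000-0xb9ffffff 64bit]
pci 0000:60:00.0: reg 0x18: [mem 0xbfe0000000-0xbfefffffff 64bit pref]
"""

AMD_DMESG = """\
pci 0000:b1:00.0: [1002:1478] type 01 class 0x060400
pci 0000:b1:00.0: reg 0x10: [mem 0xe1200000-0xe1203fff]
pci 0000:b3:00.0: [1002:7312] type 00 class 0x030000
pci 0000:b3:00.0: reg 0x10: [mem 0xefe0000000-0xefefffffff 64bit pref]
pci 0000:b3:00.0: reg 0x18: [mem 0xeff0000000-0xeff01fffff 64bit pref]
"""

# "pci <dev>: [vendor:device] type NN ..." announces a device and its header type.
HEADER = re.compile(r"pci (\S+): \[[0-9a-f]{4}:[0-9a-f]{4}\] type (\d\d)")
# "pci <dev>: reg 0xNN: [mem start-end flags]" describes one memory BAR.
BAR = re.compile(
    r"pci (\S+): reg (0x[0-9a-f]+): \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)([^\]]*)\]"
)


def bridge_bars_in_pref64_pool(dmesg):
    """Return (device, reg, size_in_bytes) for every 64-bit prefetchable
    memory BAR that belongs to a bridge (header type 01)."""
    header_type = {}
    hits = []
    for line in dmesg.splitlines():
        m = HEADER.match(line)
        if m:
            header_type[m.group(1)] = m.group(2)
            continue
        m = BAR.match(line)
        if m:
            dev, reg, start, end, flags = m.groups()
            flags = flags.split()
            if header_type.get(dev) == "01" and "64bit" in flags and "pref" in flags:
                hits.append((dev, reg, int(end, 16) - int(start, 16) + 1))
    return hits


if __name__ == "__main__":
    print("Intel:", bridge_bars_in_pref64_pool(INTEL_DMESG))
    print("AMD:  ", bridge_bars_in_pref64_pool(AMD_DMESG))
```

On the Intel excerpt this flags the 8MB BAR of the upstream switch port at 5e:00.0; on the AMD excerpt it flags nothing, because the BAR at b1:00.0 is 32-bit and non-prefetchable.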
Description of problem:

I have an Intel DG2 in the following configuration:

+-[0000:5d]-+-00.0-[5e-61]----00.0-[5f-61]--+-01.0-[60]----00.0 Intel Corporation DG2 [Arc A380]
                                            \-04.0-[61]----00.0 Intel Corporation DG2 Audio Controller

/proc/iomem reports the following resources:

b8800000-c5ffffff : PCI Bus 0000:5d
  b9000000-ba0fffff : PCI Bus 0000:5e
    b9000000-ba0fffff : PCI Bus 0000:5f
      b9000000-b9ffffff : PCI Bus 0000:60
        b9000000-b9ffffff : 0000:60:00.0
      ba000000-ba0fffff : PCI Bus 0000:61
        ba000000-ba003fff : 0000:61:00.0
b000000000-bfffffffff : PCI Bus 0000:5d
  bfe0000000-bff07fffff : PCI Bus 0000:5e
    bfe0000000-bfefffffff : PCI Bus 0000:5f
      bfe0000000-bfefffffff : PCI Bus 0000:60
        bfe0000000-bfefffffff : 0000:60:00.0
    bff0000000-bff07fffff : 0000:5e:00.0

From dmesg, resource assignments:

PCI host bridge to bus 0000:5d
pci_bus 0000:5d: root bus resource [io 0x8000-0x9fff window]
pci_bus 0000:5d: root bus resource [mem 0xb8800000-0xc5ffffff window]
pci_bus 0000:5d: root bus resource [mem 0xb000000000-0xbfffffffff window]
pci_bus 0000:5d: root bus resource [bus 5d-7f]
pci 0000:5e:00.0: [8086:4fa1] type 01 class 0x060400
pci 0000:5e:00.0: reg 0x10: [mem 0xbff0000000-0xbff07fffff 64bit pref]
pci 0000:5d:00.0: PCI bridge to [bus 5e-61]
pci 0000:5d:00.0: bridge window [mem 0xb9000000-0xba0fffff]
pci 0000:5d:00.0: bridge window [mem 0xbfe0000000-0xbff07fffff 64bit pref]
pci 0000:5e:00.0: PCI bridge to [bus 5f-61]
pci 0000:5e:00.0: bridge window [mem 0xb9000000-0xba0fffff]
pci 0000:5e:00.0: bridge window [mem 0xbfe0000000-0xbfefffffff 64bit pref]
pci 0000:60:00.0: [8086:56a5] type 00 class 0x030000
pci 0000:60:00.0: reg 0x10: [mem 0xb9000000-0xb9ffffff 64bit]
pci 0000:60:00.0: reg 0x18: [mem 0xbfe0000000-0xbfefffffff 64bit pref]
pci 0000:60:00.0: reg 0x30: [mem 0xffe00000-0xffffffff pref]
pci 0000:5f:01.0: PCI bridge to [bus 60]
pci 0000:5f:01.0: bridge window [mem 0xb9000000-0xb9ffffff]
pci 0000:5f:01.0: bridge window [mem 0xbfe0000000-0xbfefffffff 64bit pref]
pci 0000:61:00.0: [8086:4f92] type 00 class 0x040300
pci 0000:61:00.0: reg 0x10: [mem 0xba000000-0xba003fff 64bit]
pci 0000:5f:04.0: PCI bridge to [bus 61]
pci 0000:5f:04.0: bridge window [mem 0xba000000-0xba0fffff]

And finally, lspci:

0000:5d:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev 07) (prog-if 00 [Normal decode])
    Bus: primary=5d, secondary=5e, subordinate=61, sec-latency=0
    I/O behind bridge: 0000f000-00000fff [disabled]
    Memory behind bridge: b9000000-ba0fffff [size=17M]
    Prefetchable memory behind bridge: 000000bfe0000000-000000bff07fffff [size=264M]

0000:5e:00.0 PCI bridge: Intel Corporation Device 4fa1 (rev 01) (prog-if 00 [Normal decode])
    Region 0: Memory at bff0000000 (64-bit, prefetchable) [size=8M]
    Bus: primary=5e, secondary=5f, subordinate=61, sec-latency=0
    I/O behind bridge: 0000f000-00000fff [disabled]
    Memory behind bridge: b9000000-ba0fffff [size=17M]
    Prefetchable memory behind bridge: 000000bfe0000000-000000bfefffffff [size=256M]

0000:5f:01.0 PCI bridge: Intel Corporation Device 4fa4 (prog-if 00 [Normal decode])
    Bus: primary=5f, secondary=60, subordinate=60, sec-latency=0
    I/O behind bridge: 0000f000-00000fff [disabled]
    Memory behind bridge: b9000000-b9ffffff [size=16M]

0000:5f:04.0 PCI bridge: Intel Corporation Device 4fa4 (prog-if 00 [Normal decode])
    Bus: primary=5f, secondary=61, subordinate=61, sec-latency=0
    I/O behind bridge: 0000f000-00000fff [disabled]
    Memory behind bridge: ba000000-ba0fffff [size=1M]
    Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff [disabled]

0000:60:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A380] (rev 05) (prog-if 00 [VGA controller])
    Region 0: Memory at b9000000 (64-bit, non-prefetchable) [disabled] [size=16M]
    Region 2: Memory at bfe0000000 (64-bit, prefetchable) [disabled] [size=256M]
    Expansion ROM at <ignored> [disabled]
    ...
    Capabilities: [420 v1] Physical Resizable BAR
        BAR 2: current size: 256MB, supported: 256MB 512MB 1GB 2GB 4GB 8GB

0000:61:00.0 Audio device: Intel Corporation DG2 Audio Controller
    Region 0: Memory at ba000000 (64-bit, non-prefetchable) [size=16K]

From iomem and dmesg, we can see that the root port at 0000:5d:00.0 has a 64GB [0xb000000000-0xbfffffffff] 64-bit, prefetchable aperture available to it. Only 264MB of that aperture is programmed via the BIOS (system does not support BIOS enabled ReBAR). Of that 264MB, 8MB is allocated to the upstream switch port at 5e:00.0, the remaining 256MB is available in the downstream aperture of this bridge and is allocated to the DG2 GPU at 60:00.0.

Therefore, in order to make use of PCIe Resizable BARs, not only does the PCI subsystem need to release the GPU resources, but it also needs to release the upstream switch BAR resources. Linux currently refuses to do this:

# cat /sys/bus/pci/devices/0000\:60\:00.0/resource2_resize
0000000000003f00
# echo 9 > /sys/bus/pci/devices/0000\:60\:00.0/resource2_resize
-bash: echo: write error: No space left on device

dmesg reports:

pci 0000:60:00.0: BAR 2: releasing [mem 0xbfe0000000-0xbfefffffff 64bit pref]
pcieport 0000:5f:01.0: BAR 15: releasing [mem 0xbfe0000000-0xbfefffffff 64bit pref]
pcieport 0000:5e:00.0: BAR 15: releasing [mem 0xbfe0000000-0xbfefffffff 64bit pref]
pcieport 0000:5e:00.0: BAR 15: no space for [mem size 0x20000000 64bit pref]
pcieport 0000:5e:00.0: BAR 15: failed to assign [mem size 0x20000000 64bit pref]
pcieport 0000:5f:01.0: BAR 15: no space for [mem size 0x20000000 64bit pref]
pcieport 0000:5f:01.0: BAR 15: failed to assign [mem size 0x20000000 64bit pref]
pci 0000:60:00.0: BAR 2: no space for [mem size 0x20000000 64bit pref]
pci 0000:60:00.0: BAR 2: failed to assign [mem size 0x20000000 64bit pref]
pcieport 0000:5d:00.0: PCI bridge to [bus 5e-61]
pcieport 0000:5d:00.0: bridge window [mem 0xb9000000-0xba0fffff]
pcieport 0000:5d:00.0: bridge window [mem 0xbfe0000000-0xbff07fffff 64bit pref]
pcieport 0000:5e:00.0: PCI bridge to [bus 5f-61]
pcieport 0000:5e:00.0: bridge window [mem 0xb9000000-0xba0fffff]
pcieport 0000:5e:00.0: bridge window [mem 0xbfe0000000-0xbfefffffff 64bit pref]
pcieport 0000:5f:01.0: PCI bridge to [bus 60]
pcieport 0000:5f:01.0: bridge window [mem 0xb9000000-0xb9ffffff]
pcieport 0000:5f:01.0: bridge window [mem 0xbfe0000000-0xbfefffffff 64bit pref]
pci 0000:60:00.0: BAR 2: assigned [mem 0xbfe0000000-0xbfefffffff 64bit pref]

Meanwhile, an AMD GPU in the same system can trivially be resized:

+-[0000:b0]-+-00.0-[b1-b3]----00.0-[b2-b3]----00.0-[b3]--+-00.0 Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon Pro W5700]
                                                         +-00.1 Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio
                                                         +-00.2 Advanced Micro Devices, Inc. [AMD/ATI] Device 7316
                                                         \-00.3 Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 USB

PCI host bridge to bus 0000:b0
pci_bus 0000:b0: root bus resource [io 0xc000-0xdfff window]
pci_bus 0000:b0: root bus resource [mem 0xe1000000-0xee7fffff window]
pci_bus 0000:b0: root bus resource [mem 0xe000000000-0xefffffffff window]
pci_bus 0000:b0: root bus resource [bus b0-d7]
pci 0000:b1:00.0: [1002:1478] type 01 class 0x060400
pci 0000:b1:00.0: reg 0x10: [mem 0xe1200000-0xe1203fff]
pci 0000:b0:00.0: PCI bridge to [bus b1-b3]
pci 0000:b0:00.0: bridge window [io 0xc000-0xcfff]
pci 0000:b0:00.0: bridge window [mem 0xe1000000-0xe12fffff]
pci 0000:b0:00.0: bridge window [mem 0xefe0000000-0xeff01fffff 64bit pref]
pci 0000:b1:00.0: PCI bridge to [bus b2-b3]
pci 0000:b1:00.0: bridge window [io 0xc000-0xcfff]
pci 0000:b1:00.0: bridge window [mem 0xe1000000-0xe11fffff]
pci 0000:b1:00.0: bridge window [mem 0xefe0000000-0xeff01fffff 64bit pref]
pci 0000:b3:00.0: [1002:7312] type 00 class 0x030000
pci 0000:b3:00.0: reg 0x10: [mem 0xefe0000000-0xefefffffff 64bit pref]
pci 0000:b3:00.0: reg 0x18: [mem 0xeff0000000-0xeff01fffff 64bit pref]
pci 0000:b3:00.0: reg 0x20: [io 0xc000-0xc0ff]
pci 0000:b3:00.0: reg 0x24: [mem 0xe1100000-0xe117ffff]
pci 0000:b3:00.0: reg 0x30: [mem 0xfffe0000-0xffffffff pref]
pci 0000:b3:00.1: [1002:ab38] type 00 class 0x040300
pci 0000:b3:00.1: reg 0x10: [mem 0xe1184000-0xe1187fff]
pci 0000:b3:00.2: [1002:7316] type 00 class 0x0c0330
pci 0000:b3:00.2: reg 0x10: [mem 0xe1000000-0xe10fffff 64bit]
pci 0000:b3:00.3: [1002:7314] type 00 class 0x0c8000
pci 0000:b3:00.3: reg 0x10: [mem 0xe1180000-0xe1183fff 64bit]
pci 0000:b2:00.0: PCI bridge to [bus b3]
pci 0000:b2:00.0: bridge window [io 0xc000-0xcfff]
pci 0000:b2:00.0: bridge window [mem 0xe1000000-0xe11fffff]
pci 0000:b2:00.0: bridge window [mem 0xefe0000000-0xeff01fffff 64bit pref]

0000:b0:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev 07) (prog-if 00 [Normal decode])
    Bus: primary=b0, secondary=b1, subordinate=b3, sec-latency=0
    I/O behind bridge: 0000c000-0000cfff [size=4K]
    Memory behind bridge: e1000000-e12fffff [size=3M]
    Prefetchable memory behind bridge: 000000efe0000000-000000eff01fffff [size=258M]

0000:b1:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (prog-if 00 [Normal decode])
    Region 0: Memory at e1200000 (32-bit, non-prefetchable) [size=16K]
    Bus: primary=b1, secondary=b2, subordinate=b3, sec-latency=0
    I/O behind bridge: 0000c000-0000cfff [size=4K]
    Memory behind bridge: e1000000-e11fffff [size=2M]
    Prefetchable memory behind bridge: 000000efe0000000-000000eff01fffff [size=258M]

0000:b2:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch (prog-if 00 [Normal decode])
    Bus: primary=b2, secondary=b3, subordinate=b3, sec-latency=0
    I/O behind bridge: 0000c000-0000cfff [size=4K]
    Memory behind bridge: e1000000-e11fffff [size=2M]
    Prefetchable memory behind bridge: 000000efe0000000-000000eff01fffff [size=258M]

0000:b3:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon Pro W5700] (prog-if 00 [VGA controller])
    Region 0: Memory at efe0000000 (64-bit, prefetchable) [disabled] [size=256M]
    Region 2: Memory at eff0000000 (64-bit, prefetchable) [disabled] [size=2M]
    Region 4: I/O ports at c000 [disabled] [size=256]
    Region 5: Memory at e1100000 (32-bit, non-prefetchable) [disabled] [size=512K]
    Expansion ROM at e11a0000 [disabled] [size=128K]
    ...
    Capabilities: [200 v1] Physical Resizable BAR
        BAR 0: current size: 256MB, supported: 256MB 512MB 1GB 2GB 4GB 8GB
        BAR 2: current size: 2MB, supported: 2MB 4MB 8MB 16MB 32MB 64MB 128MB 256MB

0000:b3:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio
    Region 0: Memory at e1184000 (32-bit, non-prefetchable) [size=16K]

0000:b3:00.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 7316 (prog-if 30 [XHCI])
    Region 0: Memory at e1000000 (64-bit, non-prefetchable) [size=1M]

0000:b3:00.3 Serial bus controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 USB
    Region 0: Memory at e1180000 (64-bit, non-prefetchable) [size=16K]

The layout is ALMOST identical, but the minor difference is that the upstream switch port BAR at b1:00.0 makes use of the 32-bit, non-prefetchable range. The ONLY consumer of the 64-bit, prefetchable range downstream from the root port is the GPU itself. This means Linux will happily make use of PCI Resizable BARs:

# cat /sys/bus/pci/devices/0000\:b3\:00.0/resource0_resize
0000000000003f00
# echo 9 > /sys/bus/pci/devices/0000\:b3\:00.0/resource0_resize
# echo $?
0

dmesg:

pci 0000:b3:00.0: BAR 0: releasing [mem 0xefe0000000-0xefefffffff 64bit pref]
pci 0000:b3:00.0: BAR 2: releasing [mem 0xeff0000000-0xeff01fffff 64bit pref]
pcieport 0000:b2:00.0: BAR 15: releasing [mem 0xefe0000000-0xeff01fffff 64bit pref]
pcieport 0000:b1:00.0: BAR 15: releasing [mem 0xefe0000000-0xeff01fffff 64bit pref]
pcieport 0000:b0:00.0: BAR 15: releasing [mem 0xefe0000000-0xeff01fffff 64bit pref]
pcieport 0000:b0:00.0: BAR 15: assigned [mem 0xe000000000-0xe02fffffff 64bit pref]
pcieport 0000:b1:00.0: BAR 15: assigned [mem 0xe000000000-0xe02fffffff 64bit pref]
pcieport 0000:b2:00.0: BAR 15: assigned [mem 0xe000000000-0xe02fffffff 64bit pref]
pci 0000:b3:00.0: BAR 0: assigned [mem 0xe000000000-0xe01fffffff 64bit pref]
pci 0000:b3:00.0: BAR 2: assigned [mem 0xe020000000-0xe0201fffff 64bit pref]
pcieport 0000:b0:00.0: PCI bridge to [bus b1-b3]
pcieport 0000:b0:00.0: bridge window [io 0xc000-0xcfff]
pcieport 0000:b0:00.0: bridge window [mem 0xe1000000-0xe12fffff]
pcieport 0000:b0:00.0: bridge window [mem 0xe000000000-0xe02fffffff 64bit pref]
pcieport 0000:b1:00.0: PCI bridge to [bus b2-b3]
pcieport 0000:b1:00.0: bridge window [io 0xc000-0xcfff]
pcieport 0000:b1:00.0: bridge window [mem 0xe1000000-0xe11fffff]
pcieport 0000:b1:00.0: bridge window [mem 0xe000000000-0xe02fffffff 64bit pref]
pcieport 0000:b2:00.0: PCI bridge to [bus b3]
pcieport 0000:b2:00.0: bridge window [io 0xc000-0xcfff]
pcieport 0000:b2:00.0: bridge window [mem 0xe1000000-0xe11fffff]
pcieport 0000:b2:00.0: bridge window [mem 0xe000000000-0xe02fffffff 64bit pref]

The decision of the Intel DG2 to make use of an upstream switch port with a BAR in the same resource pool as the GPU resizable BAR requires enhancements to the Linux PCI core to be able to handle this scenario.

Version-Release number of selected component (if applicable):
The issue is present in the upstream Linux kernel as of v6.3. The above is from 5.14.0-306.el9.x86_64.

How reproducible:
100%

Steps to Reproduce:
1. As outlined above

Actual results:
Resizable BARs cannot be configured from the OS on Intel DG2 GPUs, but they can be on AMD GPUs.

Expected results:
Better hardware choices or enhancement to the Linux PCI resource subsystem to manage the upstream switch resource BAR.

Additional info:
Do we need a pci=reassign option to move 64-bit, prefetchable resources for non-endpoints to the 32-bit, non-prefetchable address space?
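
The size arithmetic used throughout this report can be sketched in a few lines. This is illustrative only and the helper names `supported_sizes` and `span_bytes` are made up for the example. It assumes the sysfs convention documented for `resourceN_resize`, where bit n of the bitmask stands for a BAR size of 2^n MB and the value written to the file is that bit position.

```python
MIB = 1 << 20


def supported_sizes(mask_hex):
    """Decode a resourceN_resize bitmask into {bit_position: size_in_MiB}.

    Assumes bit n set means a BAR size of 2**n MiB is supported.
    """
    mask = int(mask_hex, 16)
    return {n: 1 << n for n in range(mask.bit_length()) if mask >> n & 1}


def span_bytes(start, end):
    """Size in bytes of an inclusive address range as printed by dmesg."""
    return end - start + 1


if __name__ == "__main__":
    # Bitmask read from resource2_resize on the DG2 (and resource0_resize on the W5700).
    print(supported_sizes("0000000000003f00"))

    # "echo 9" requests 2**9 MiB, the 0x20000000 seen in the failed assignments.
    print(hex((1 << 9) * MIB))

    # Root port prefetchable window programmed by the BIOS on the Intel side:
    # 256 MiB for the GPU BAR plus 8 MiB for the upstream switch port BAR.
    window = span_bytes(0xBFE0000000, 0xBFF07FFFFF)
    gpu_bar = span_bytes(0xBFE0000000, 0xBFEFFFFFFF)
    switch_bar = span_bytes(0xBFF0000000, 0xBFF07FFFFF)
    print(window // MIB, gpu_bar // MIB, switch_bar // MIB)
```

The decoded sizes match the "supported: 256MB 512MB 1GB 2GB 4GB 8GB" list reported by lspci, and the window breakdown matches the 264MB = 256MB + 8MB split described above.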