Bug 1437113
Summary: | PCIe: Allow configuring Generic PCIe Root Ports MMIO Window | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Marcel Apfelbaum <marcel> |
Component: | qemu-kvm-rhev | Assignee: | Marcel Apfelbaum <marcel> |
Status: | CLOSED ERRATA | QA Contact: | jingzhao <jinzhao> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 7.4 | CC: | ailan, chayang, jinzhao, juzhang, lersek, marcel, michen, mtessun, virt-maint |
Target Milestone: | rc | Keywords: | FutureFeature |
Target Release: | --- | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
Fixed In Version: | qemu-kvm-rhev-2.10.0-7.el7 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2018-04-11 00:16:25 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1344299, 1434747 |
Description
Marcel Apfelbaum
2017-03-29 14:05:16 UTC
By default, the Generic PCIe Root Port exposes a 2M MMIO window. If we want to attach a physical device to a VM, this is not enough for modern PCIe devices, which may require more.

Gerd: The best way to communicate window size hints would be to use a vendor-specific PCI capability (instead of setting the desired size on reset). The information will always be available then and we don't run into initialization order issues.

Hi Marcel

Could you share with more info about this bz.

As QE know, checked the device that attached to pcie-root-port in guest

# cat /proc/iomem

[root@localhost ~]# lspci
.......
01:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)

[root@localhost ~]# cat /proc/iomem
..........
fc200000-fc23ffff : 0000:01:00.0
fc240000-fc240fff : 0000:01:00.0
..............

Am I right?

Thanks
Jing

(In reply to jingzhao from comment #5)
> Hi Marcel
>
> Could you share with more info about this bz.
>
> As QE know, checked the device that attached to pcie-root-port in guest
>
> # cat /proc/iomem
>
> [root@localhost ~]# lspci
> .......
> 01:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
>
> [root@localhost ~]# cat /proc/iomem
> ..........
> fc200000-fc23ffff : 0000:01:00.0
> fc240000-fc240fff : 0000:01:00.0
> ..............
>

Hi Jing,

> Am I right?
>

I am afraid it is a little more complicated.
This is about reserving more or less MMIO or IO than the default values.
You run QEMU with:
    -device pcie-root-port,id=p1,io-reserve=0x2000,mem-reserve=0x400000,pref32-reserve=0x400000
and check the lspci output in the guest (if Linux) or Device Manager in Windows, and see that the values are passed correctly.

And this is also not enough... you need an updated firmware that supports the above hints.
You should use the latest OVMF rebase for RHEL-7.5 (thanks Laszlo!):
 * https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=608934
   ovmf-20171011-1.git92d07e48907f.el7

Thanks,
Marcel

> Thanks
> Jing

Thanks marcel and lazslo

I had tried it with ("https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=14539582") and ovmf (OVMF-20171011-1.git92d07e48907f.el7.noarch)

-device pcie-root-port,bus=pcie.0,id=root0,io-reserve=4K,mem-reserve=8M,pref32-reserve=16M \

the test result:

ovmf log:

PciBus: Discovered PPB @ [00|03|00]
GetResourcePadding: Address=00:03.0 DevicePath=PciRoot(0x0)/Pci(0x3,0x0)
GetResourcePadding: BusNumbers=0xFFFFFFFF Io=0x1000 NonPrefetchable32BitMmio=0x800000
GetResourcePadding: Prefetchable32BitMmio=0x1000000 Prefetchable64BitMmio=0xFFFFFFFFFFFFFFFF
Padding: Type = PMem32; Alignment = 0xFFFFFF; Length = 0x1000000
  ^^^^^^^^^^^^^^^^(size=16M)
Padding: Type = Mem32; Alignment = 0x7FFFFF; Length = 0x800000
  ^^^^^^^^^(size=8M)
Padding: Type = Io; Alignment = 0xFFF; Length = 0x1000
  ^^^^^^^^^^(size=4K)
BAR[0]: Type = Mem32; Alignment = 0xFFF; Length = 0x1000; Offset = 0x10

PciBus: Resource Map for Bridge [00|03|00]
Type = Io16; Base = 0x7000; Length = 0x1000; Alignment = 0xFFF
   Base = Padding; Length = 0x1000; Alignment = 0xFFF
Type = Mem32; Base = 0x98000000; Length = 0x2000000; Alignment = 0xFFFFFF
   Base = Padding; Length = 0x1000000; Alignment = 0xFFFFFF
   Base = Padding; Length = 0x800000; Alignment = 0x7FFFFF
   Base = 0x98000000; Length = 0x1000; Alignment = 0xFFF; Owner = PCI [01|00|00:14]
Type = Mem32; Base = 0x9A205000; Length = 0x1000; Alignment = 0xFFF
  ^^^^^^^^^^(confused about above log, how can I check it?)
Type = PMem64; Base = 0x800000000; Length = 0x100000; Alignment = 0xFFFFF
   Base = 0x800000000; Length = 0x4000; Alignment = 0x3FFF; Owner = PCI [01|00|00:20]

lspci result:

I/O behind bridge: 00007000-00007fff (size=4K)
Memory behind bridge: 98000000-99ffffff (size = 64M) ? confused, didn't fixed it or other issues?
Prefetchable memory behind bridge: 0000000800000000-00000008000fffff (size = 16M)

Marcel, could you help to confirm it?

Thanks
Jing

Hi Jing,

here's an explanation for the log snippets you see:

(In reply to jingzhao from comment #12)
> Thanks marcel and lazslo
>
> I had tried it with
> ("https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=14539582")
> and ovmf (OVMF-20171011-1.git92d07e48907f.el7.noarch)
>
> -device pcie-root-port,bus=pcie.0,id=root0,io-reserve=4K,mem-reserve=8M,pref32-reserve=16M \
>
>
> the test result:
>
> ovmf log:
>
> PciBus: Discovered PPB @ [00|03|00]
> GetResourcePadding: Address=00:03.0 DevicePath=PciRoot(0x0)/Pci(0x3,0x0)
> GetResourcePadding: BusNumbers=0xFFFFFFFF Io=0x1000 NonPrefetchable32BitMmio=0x800000
> GetResourcePadding: Prefetchable32BitMmio=0x1000000 Prefetchable64BitMmio=0xFFFFFFFFFFFFFFFF

These lines show that the PCI capability with the resource reservation hints has been parsed correctly.

> Padding: Type = PMem32; Alignment = 0xFFFFFF; Length = 0x1000000
>   ^^^^^^^^^^^^^^^^(size=16M)
>
> Padding: Type = Mem32; Alignment = 0x7FFFFF; Length = 0x800000
>
>   ^^^^^^^^^(size=8M)
> Padding: Type = Io; Alignment = 0xFFF; Length = 0x1000
>   ^^^^^^^^^^(size=4K)
> BAR[0]: Type = Mem32; Alignment = 0xFFF; Length = 0x1000; Offset = 0x10

So this part basically lists the BARs ("resources needed") by the root port. The hints from the PCI capability are correctly turned into "Padding" pseudo-resources. The BAR#0 resource is a real one: IIRC, it is from the SHPC (Standard HotPlug Controller) BAR for the root port. "Offset" means the address of the BAR (base address register) itself in the config space of the device.

> PciBus: Resource Map for Bridge [00|03|00]

OK, this is printed after the enumeration and resource assignment have completed. This part will list the actual addresses allocated.
> Type = Io16; Base = 0x7000; Length = 0x1000; Alignment = 0xFFF
>    Base = Padding; Length = 0x1000; Alignment = 0xFFF

This corresponds to io-reserve=4K; the reservation has been allocated at IO port 0x7000, for 0x1000 ports.

> Type = Mem32; Base = 0x98000000; Length = 0x2000000; Alignment = 0xFFFFFF
>    Base = Padding; Length = 0x1000000; Alignment = 0xFFFFFF
>    Base = Padding; Length = 0x800000; Alignment = 0x7FFFFF
>    Base = 0x98000000; Length = 0x1000; Alignment = 0xFFF; Owner = PCI [01|00|00:14]
> Type = Mem32; Base = 0x9A205000; Length = 0x1000; Alignment = 0xFFF
>
>   ^^^^^^^^^^(confused about above log, how can I check it?)
> Type = PMem64; Base = 0x800000000; Length = 0x100000; Alignment = 0xFFFFF
>    Base = 0x800000000; Length = 0x4000; Alignment = 0x3FFF; Owner = PCI [01|00|00:20]

So this is more tricky, and indeed I see a bug here (in the firmware).

First, OVMF's PciHostBridgeLib passes the EFI_PCI_HOST_BRIDGE_COMBINE_MEM_PMEM flag to PciHostBridgeDxe. This means that the same root complex-level MMIO aperture will be used for allocating both prefetchable and non-prefetchable MMIO. This is why you see the Paddings for both mem-reserve=8M and pref32-reserve=16M under "Type = Mem32" -- PciBusDxe degrades the PMem32 resource request to Mem32.

However, this degradation does not mean that the *maximum* of both will be taken, for resource reservation. The maximum *would* be taken if the resource requests were *originally* of the same type. Because they are originally different types, here they are handled separately after degradation, so they are added (rather than taking their maximum). IOW, we'll have a summed padding (reservation) for 32-bit MMIO of 24MB (0x180_0000).

Second, there's an allocation at 0x98000000. This is for the sake of BAR#1 (offset 0x14) of the device that is plugged into the root port.
This is marked as "Owner = PCI [01|00|00:14]": bus 1, slot 0 -- slot is guaranteed to be 0 for devices plugged into pcie root ports --, function 0, offset 0x14 (i.e. BAR#1). Note that this allocation is not in *addition* to the 24MB described above; instead it is *within* either the 16MB or the 8MB reservation (I can't tell without seeing more of the log, but this is a side topic anyway.)

Third, you see an allocation at 0x9A205000 for the root port's own SHPC BAR. This is separate from the reservations, for good reason: resources needed by the port itself are accounted for in the aperture of the port's *parent* bus (i.e., the root complex in this case).

Fourth, you see a 64-bit MMIO allocation at 0x8_0000_0000, namely for BAR#4 (= offset 0x20) of the same PCI device that is plugged into this pcie root port ("Owner = PCI [01|00|00:...]").

Notice "Length = 0x100000" near "Type = PMem64". While it doesn't seem to follow from the BAR size (0x4000) of the device behind the root port, this is in fact an expected rounding-up. Section 3.2.5.9. Prefetchable Memory Base Register and Prefetchable Memory Limit Register in the PCI-to-PCI Bridge Architecture Specification says,

    Thus, the bottom of the defined prefetchable memory address range will be aligned to a 1 MB boundary and the top of the defined memory address range will be one less than a 1 MB boundary.

OK, so what about the bug I mentioned above: if you look at the top line, it says "Length = 0x2000000". That's wrong, it means 32MB, but it should be 24MB ("Length = 0x1800000"). This is fixed by the following upstream edk2 commit -- I just tested it --, which I will have to backport:

* 6e3287442774 ("MdeModulePkg/PciBus: Fix bug that PCI BUS claims too much resource", 2017-10-20)

> lspci result:

Here you mis-calculated a few values, but your basic question is right:

> I/O behind bridge: 00007000-00007fff (size=4K)

Correct calculation on your part, and the result is correct too.
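The reservation arithmetic described in this analysis can be reproduced with a short calculation. This is a minimal sketch that uses only the values quoted in the log; the helper names are hypothetical and do not correspond to any edk2 or QEMU function.

```python
MB = 1 << 20


def summed_mem32_padding(mem_reserve: int, pref32_reserve: int) -> int:
    """Combine the two 32-bit MMIO hints the way the analysis describes.

    The PMem32 hint is degraded to Mem32, but because the two requests
    were originally of different types they are added, not max'ed.
    """
    return mem_reserve + pref32_reserve


def round_up(size: int, granularity: int = MB) -> int:
    """Round a size up to the bridge window granularity (1 MB here)."""
    return -(-size // granularity) * granularity


# mem-reserve=8M plus pref32-reserve=16M
print(hex(summed_mem32_padding(8 * MB, 16 * MB)))  # 0x1800000, i.e. 24 MB

# A 0x4000-byte prefetchable BAR behind the port still needs a 1 MB window
print(hex(round_up(0x4000)))  # 0x100000, i.e. 1 MB
```

The first result is the length the firmware should report for the Mem32 window (0x1800000 rather than 0x2000000); the second matches the "Length = 0x100000" shown next to "Type = PMem64".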
> Memory behind bridge: 98000000-99ffffff (size = 64M) ? confused, didn't fixed it or other issues?

In fact (0x99ffffff + 1 - 0x98000000) equals 0x200_0000, i.e., 32MB, not 64MB. Here lspci should report 24MB. This issue is a consequence of the above-mentioned firmware bug (where the firmware should set "Length = 0x1800000").

> Prefetchable memory behind bridge: 0000000800000000-00000008000fffff (size = 16M)

Another mis-calculation on your part; (0x8000fffff + 1 - 0x800000000) equals 0x10_0000; that is, 1MB. Again, this is an expected result; see the rounding I mentioned above, from section 3.2.5.9. of the PCI bridge spec.

> Marcel, could you help to confirm it?

I won't clear the NEEDINFO just yet so Marcel can agree with or dispute my above analysis.

Thanks,
Laszlo

(In reply to Laszlo Ersek from comment #13)
> OK, so what about the bug I mentioned above: if you look at the top line, it
> says "Length = 0x2000000". That's wrong, it means 32MB, but it should be
> 24MB ("Length = 0x1800000"). This is fixed by the following upstream edk2
> commit -- I just tested it --, which I will have to backport:
>
> * 6e3287442774 ("MdeModulePkg/PciBus: Fix bug that PCI BUS claims too much
> resource", 2017-10-20)

Now tracked by bug 1514105.

Fix included in qemu-kvm-rhev-2.10.0-7.el7

1. Tested it with kernel-3.10.0-820.el7.x86_64 & OVMF-20171011-4.git92d07e48907f.el7.noarch & qemu-kvm-rhev-2.10.0-12.el7.x86_64

checked lspci in guest and ovmf log

ovmf log:

PciBus: Discovered PPB @ [00|03|00]
GetResourcePadding: Address=00:03.0 DevicePath=PciRoot(0x0)/Pci(0x3,0x0)
GetResourcePadding: BusNumbers=0xFFFFFFFF Io=0x1000 NonPrefetchable32BitMmio=0x800000
GetResourcePadding: Prefetchable32BitMmio=0x1000000 Prefetchable64BitMmio=0xFFFFFFFFFFFFFFFF
Padding: Type = PMem32; Alignment = 0xFFFFFF; Length = 0x1000000
Padding: Type = Mem32; Alignment = 0x7FFFFF; Length = 0x800000
Padding: Type = Io; Alignment = 0xFFF; Length = 0x1000

lspci result:

I/O behind bridge: 00007000-00007fff (size=4K)
Memory behind bridge: 98000000-997fffff (size=24M)
Prefetchable memory behind bridge: 0000000800000000-00000008000fffff (size=1M)

According to comment 13, it is the expected result

2. Tested it with kernel-3.10.0-820.el7.x86_64 & seabios-1.11.0-1.el7.x86_64 & qemu-kvm-rhev-2.10.0-12.el7.x86_64

checked lspci result in guest:

I/O behind bridge: 0000c000-0000cfff (size = 4K)
Memory behind bridge: fc000000-fc7fffff (size = 8M )
Prefetchable memory behind bridge: 00000000fd800000-00000000fe7ffff (size = 16M)

Is it the expected behavior with seabios?

Could you help to check it?

Following the key command of qemu:

-device pcie-root-port,bus=pcie.0,id=root0,io-reserve=4K,mem-reserve=8M,pref32-reserve=16M \
-device virtio-net-pci,netdev=tap10,mac=9a:6a:6b:6c:6d:6e,bus=root0 -netdev tap,id=tap10 \

Thanks
Jing

(In reply to jingzhao from comment #20)
> 1.
> Tested it with kernel-3.10.0-820.el7.x86_64 &
> OVMF-20171011-4.git92d07e48907f.el7.noarch &
> qemu-kvm-rhev-2.10.0-12.el7.x86_64
>
> checked lspci in guest and ovmf log
>
> ovmf log:
>
> PciBus: Discovered PPB @ [00|03|00]
> GetResourcePadding: Address=00:03.0 DevicePath=PciRoot(0x0)/Pci(0x3,0x0)
> GetResourcePadding: BusNumbers=0xFFFFFFFF Io=0x1000 NonPrefetchable32BitMmio=0x800000
> GetResourcePadding: Prefetchable32BitMmio=0x1000000 Prefetchable64BitMmio=0xFFFFFFFFFFFFFFFF
> Padding: Type = PMem32; Alignment = 0xFFFFFF; Length = 0x1000000
> Padding: Type = Mem32; Alignment = 0x7FFFFF; Length = 0x800000
> Padding: Type = Io; Alignment = 0xFFF; Length = 0x1000
>
> lspci result:
>
> I/O behind bridge: 00007000-00007fff (size=4K)
> Memory behind bridge: 98000000-997fffff (size=24M)
> Prefetchable memory behind bridge: 0000000800000000-00000008000fffff (size=1M)
>
> According to comment 13, it is the expected result
>
> 2. Tested it with kernel-3.10.0-820.el7.x86_64 & seabios-1.11.0-1.el7.x86_64
> & qemu-kvm-rhev-2.10.0-12.el7.x86_64
>
>
> checked lspci result in guest:
>
> I/O behind bridge: 0000c000-0000cfff (size = 4K)
> Memory behind bridge: fc000000-fc7fffff (size = 8M )
> Prefetchable memory behind bridge: 00000000fd800000-00000000fe7ffff (size = 16M)
>
> Is it the expected behavior with seabios?
>
> Could you help to check it?
>
> Following the key command of qemu:
>
> -device pcie-root-port,bus=pcie.0,id=root0,io-reserve=4K,mem-reserve=8M,pref32-reserve=16M \
> -device virtio-net-pci,netdev=tap10,mac=9a:6a:6b:6c:6d:6e,bus=root0 -netdev tap,id=tap10 \
>

Exactly as expected, thanks.
One small thing about io-reserve=4K: this is the default size, so you should use 8K to check it.

>
> Thanks
> Jing

Tested again with "io-reserve=8K,mem-reserve=8M,pref32-reserve=16M" with kernel-3.10.0-820.el7.x86_64 & OVMF-20171011-4.git92d07e48907f.el7.noarch & qemu-kvm-rhev-2.10.0-12.el7.x86_64 & seabios-1.11.0-1.el7.x86_64

1. test result of ovmf & q35 machine type:

lspci result:

I/O behind bridge: 00006000-00007fff (size=8k)
Memory behind bridge: 98000000-997fffff
Prefetchable memory behind bridge: 0000000800000000-00000008000fffff

ovmf log:

PciBus: Discovered PPB @ [00|03|00]
GetResourcePadding: Address=00:03.0 DevicePath=PciRoot(0x0)/Pci(0x3,0x0)
GetResourcePadding: BusNumbers=0xFFFFFFFF Io=0x2000 NonPrefetchable32BitMmio=0x800000
GetResourcePadding: Prefetchable32BitMmio=0x1000000 Prefetchable64BitMmio=0xFFFFFFFFFFFFFFFF
Padding: Type = PMem32; Alignment = 0xFFFFFF; Length = 0x1000000
Padding: Type = Mem32; Alignment = 0x7FFFFF; Length = 0x800000
Padding: Type = Io; Alignment = 0x1FFF; Length = 0x2000 (size=8k)
BAR[0]: Type = Mem32; Alignment = 0xFFF; Length = 0x1000; Offset = 0x1

According to comment 13, it is the expected behavior

2. test result of q35 & seabios:

lspci result:

I/O behind bridge: 0000c000-0000dfff (size=8k)
Memory behind bridge: fc000000-fc7fffff
Prefetchable memory behind bridge: 00000000fd800000-00000000fe7ffff

According to comment 21, it is the expected behavior

Thanks
Jing

According to comments 13, 20, 21, and 22, the issue is verified; changed to VERIFIED status.

Thanks
Jing

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104
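
The window sizes compared throughout this thread follow from the "limit + 1 - base" rule that comment 13 applies to the lspci ranges. The sketch below re-derives a few of them from ranges copied out of the comments; the function names are hypothetical helpers for illustration, not part of lspci or any other tool.

```python
def window_size(bridge_range: str) -> int:
    """Return the size in bytes of an lspci 'base-limit' range (hex, inclusive)."""
    base, limit = (int(part, 16) for part in bridge_range.split("-"))
    return limit + 1 - base


def pretty(size: int) -> str:
    """Format a byte count with the largest unit (M or K) that divides it evenly."""
    for suffix, unit in (("M", 1 << 20), ("K", 1 << 10)):
        if size % unit == 0:
            return f"{size // unit}{suffix}"
    return f"{size}B"


# Ranges quoted in the comments above
for label, rng in [
    ("Memory, before the firmware fix", "98000000-99ffffff"),
    ("Memory, after the firmware fix", "98000000-997fffff"),
    ("Prefetchable memory (OVMF)", "0000000800000000-00000008000fffff"),
    ("I/O with io-reserve=8K (OVMF)", "00006000-00007fff"),
    ("Memory (SeaBIOS)", "fc000000-fc7fffff"),
]:
    print(f"{label}: {pretty(window_size(rng))}")
```

Under these assumptions the five ranges come out as 32M, 24M, 1M, 8K and 8M respectively, which agrees with the corrected figures in comment 13 and with the later verification runs.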