Bug 1529618
| Summary: | [Q35] MMIO hint is not passed to the guest OS when set mem-reserve=4G | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | jingzhao <jinzhao> |
| Component: | ovmf | Assignee: | Laszlo Ersek <lersek> |
| Status: | CLOSED NOTABUG | QA Contact: | FuXiangChun <xfu> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.5 | CC: | chayang, jinzhao, juzhang, marcel, virt-maint |
| Target Milestone: | rc | | |
| Target Release: | 7.4 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Enhancement |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1390346 | Environment: | |
| Last Closed: | 2018-01-05 12:40:04 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Hi Laszlo
I have summarized the test results below; I hope they are clear.
1. Used the default value ("-device pcie-root-port,id=root.3,slot=4") and hot-plugged memory to the pcie-root-port:
{'execute': 'object-add', 'arguments': {'id': 'shmmem-shmem0', 'qom-type': 'memory-backend-ram', 'props': {'policy': 'default', 'size': 4294967296}}}
{"return": {}}
{'execute': 'device_add', 'arguments':{'id': 'shmem0','driver': 'ivshmem-plain', 'memdev': 'shmmem-shmem0', 'bus':'root.3'}}
{"return": {}}
test result:
[root@localhost ~]# lspci -vvv -t
-[0000:00]-+-00.0 Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
+-01.0 Red Hat, Inc. QXL paravirtual graphic card
+-02.0-[01]----00.0 Red Hat, Inc. Virtio block device
+-03.0-[02-06]----00.0-[03-06]--+-00.0-[04]--
| +-01.0-[05]--
| \-02.0-[06]--
+-04.0-[07]----00.0 Red Hat, Inc. Virtio network device
+-05.0-[08]----00.0 Red Hat, Inc. Inter-VM shared memory
+-1f.0 Intel Corporation 82801IB (ICH9) LPC Interface Controller
+-1f.2 Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode]
\-1f.3 Intel Corporation 82801I (ICH9 Family) SMBus Controller
[root@localhost ~]# lspci -v -s 08:00.0
08:00.0 RAM memory: Red Hat, Inc. Inter-VM shared memory (rev 01)
Subsystem: Red Hat, Inc. QEMU Virtual Machine
Physical Slot: 4
Flags: fast devsel
Memory at 98000000 (32-bit, non-prefetchable) [size=256]
Memory at <unassigned> (64-bit, prefetchable)
Kernel modules: virtio_pci
2. Used "mem-reserve=1G" (-device pcie-root-port,id=root.3,slot=4,mem-reserve=1G) and performed the same hotplug operation:
[root@localhost ~]# lspci -v -s 08:00.0
08:00.0 RAM memory: Red Hat, Inc. Inter-VM shared memory (rev 01)
Subsystem: Red Hat, Inc. QEMU Virtual Machine
Physical Slot: 4
Flags: fast devsel
Memory at 98b00000 (32-bit, non-prefetchable) [size=256]
Memory at <unassigned> (64-bit, prefetchable)
Kernel modules: virtio_pci
[root@localhost ~]# lspci -vvv -t
-[0000:00]-+-00.0 Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
+-01.0 Red Hat, Inc. QXL paravirtual graphic card
+-02.0-[01]----00.0 Red Hat, Inc. Virtio block device
+-03.0-[02-06]----00.0-[03-06]--+-00.0-[04]--
| +-01.0-[05]--
| \-02.0-[06]--
+-04.0-[07]----00.0 Red Hat, Inc. Virtio network device
+-05.0-[08]----00.0 Red Hat, Inc. Inter-VM shared memory
+-1f.0 Intel Corporation 82801IB (ICH9) LPC Interface Controller
+-1f.2 Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode]
\-1f.3 Intel Corporation 82801I (ICH9 Family) SMBus Controller
3. Used "mem-reserve=4G" (-device pcie-root-port,id=root.3,slot=4,mem-reserve=4G) and performed the same hotplug operation:
[root@localhost ~]# lspci -vvv -t
-[0000:00]-+-00.0 Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
+-01.0 Red Hat, Inc. QXL paravirtual graphic card
+-02.0-[01]----00.0 Red Hat, Inc. Virtio block device
+-03.0-[02-06]----00.0-[03-06]--+-00.0-[04]--
| +-01.0-[05]--
| \-02.0-[06]--
+-04.0-[07]----00.0 Red Hat, Inc. Virtio network device
+-05.0-[08]----00.0 Red Hat, Inc. Inter-VM shared memory
+-1f.0 Intel Corporation 82801IB (ICH9) LPC Interface Controller
+-1f.2 Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode]
\-1f.3 Intel Corporation 82801I (ICH9 Family) SMBus Controller
[root@localhost ~]# lspci -v -s 08:00.0
08:00.0 RAM memory: Red Hat, Inc. Inter-VM shared memory (rev 01)
Subsystem: Red Hat, Inc. QEMU Virtual Machine
Physical Slot: 4
Flags: fast devsel
Memory at 98b00000 (32-bit, non-prefetchable) [size=256]
Memory at <unassigned> (64-bit, prefetchable)
Kernel modules: virtio_pci
For the detailed dmesg and OVMF logs, please see the attachments.
Created attachment 1376671 [details]
dmesg of default parameter
Created attachment 1376672 [details]
dmesg log of mem reserve 1G
Created attachment 1376673 [details]
dmesg of mem reserve 4G
Created attachment 1376674 [details]
ovmf log of default
Created attachment 1376675 [details]
ovmf log of mem reserve 1G
Created attachment 1376676 [details]
ovmf log of mem reserve 4G
Hi Jing,

thank you for summarizing the steps in comment 3. I have executed the same test (with ivshmem) earlier, successfully. And, I don't even need to check the logs in comments 4 through 9; I can see the problem in comment 3 already. Let me explain:

(1) The ivshmem device has two BARs:
- a small, fixed size (256 byte) MMIO BAR that is non-prefetchable,
- a large MMIO BAR that is prefetchable; this BAR size is controlled by the end-user (on the QEMU command line or in the monitor command)

(2) A "prefetchable" BAR means that it can be read by the system (not just the CPU, but also by other system components) at any time, without side effects to the device. A "non-prefetchable" BAR may only be read by the system if the CPU actually wants the device to do something.

Therefore, a "prefetchable" BAR may be placed in both prefetched and non-prefetched apertures. If it is allocated from a prefetched aperture, then the system might perform spurious reads from the BAR, but the device is fine with that. If the BAR is allocated from a non-prefetched aperture, then the system will simply not perform spurious reads.

Conversely, a "non-prefetchable" BAR may only be placed in non-prefetched apertures. Otherwise, the system might perform a spurious read to the BAR (through the prefetched aperture), and the device would *not* like that.

This means that the small BAR of the ivshmem device may only be allocated from non-prefetched aperture (*all* reads to this BAR will have an effect on the device). Whereas the large BAR of the ivshmem device (which is actually used for inter-VM memory sharing) may be allocated from both prefetched and non-prefetched apertures -- the device couldn't care less about "spurious" reads to the shared guest RAM.

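The placement rule in (2) can be sketched in a few lines of Python. This is only an illustration of the rule as stated above; the function name `allowed_apertures` and the aperture labels are assumptions made for this sketch, not identifiers from QEMU, OVMF, or the Linux kernel.

```python
def allowed_apertures(bar_prefetchable):
    """Return the kinds of bridge aperture a BAR may be allocated from.

    Hypothetical helper: a prefetchable BAR tolerates spurious reads, so it
    fits in either kind of aperture; a non-prefetchable BAR does not, so it
    must stay in a non-prefetched aperture.
    """
    if bar_prefetchable:
        return {"prefetched", "non-prefetched"}
    return {"non-prefetched"}


# The two ivshmem BARs described in (1).
small_bar = allowed_apertures(bar_prefetchable=False)  # 256-byte MMIO BAR
large_bar = allowed_apertures(bar_prefetchable=True)   # shared-memory BAR

print(sorted(small_bar))
print(sorted(large_bar))
```

The sketch deliberately ignores address width; the 32-bit versus 64-bit restriction is a separate constraint that comes from the reservation properties themselves.
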
(3) The "pcie-root-port" device has 3 properties that control MMIO aperture reservations:

(a) mem-reserve: reserve non-prefetched MMIO aperture, 32-bit *only*
(b) pref32-reserve: reserve prefetched MMIO aperture, 32-bit
(c) pref64-reserve: reserve prefetched MMIO aperture, 64-bit

There are two rules about them:
- each one of the three reservation hints is optional,
- the "pref32-reserve" and "pref64-reserve" hints are mutually exclusive.

(4) Given that you want to make the "large BAR" of the ivshmem device 4GB in size, you have to reserve *at least* 4GB aperture on the PCI Express root port level. Furthermore, given the >=4GB reservation size, *only* the "pref64-reserve" reservation hint is suitable.

NEEDINFO: Therefore, please replace the "mem-reserve" property on your QEMU command line, with "pref64-reserve", and repeat the test with the original (unchanged) QMP commands.

The expected results are:
- The non-prefetchable "small BAR" of the ivshmem device will be allocated from the non-prefetched, 32-bit only, MMIO aperture that OVMF reserves by default for the PCI Express root port -- 2MB in size;
- The prefetchable "large BAR" of the device will be allocated from the prefetched, 64-bit MMIO aperture that OVMF will reserve due to the "pref64-reserve" property.

(5) Side topic: let's say you only want to use a 256MB ivshmem device. This means we need to reserve at least 256MB prefetched or non-prefetched aperture for the large BAR, and 256B non-prefetched aperture for the small BAR. Any of the following would work for that:
- mem-reserve=512M: the non-prefetched aperture reservation would contain both BARs
- pref32-reserve=256M: the small BAR would go into the default 2MB non-pref aperture, the large BAR would go into the 256MB 32-bit pref reservation
- pref64-reserve=256M: the small BAR would go into the default 2MB non-pref aperture, the large BAR would go into the 256MB 64-bit pref reservation

Thanks!

Hi Laszlo
Thanks for your detailed explanation.
I tested again with "pref64-reserve=4G"; the detailed test results follow.
1. Boot guest with qemu command line
/usr/libexec/qemu-kvm \
-M q35 \
-cpu SandyBridge \
-nodefaults -rtc base=utc \
-m 4G \
-smp 2,sockets=2,cores=1,threads=1 \
-enable-kvm \
-name rhel7.4 \
-uuid 990ea161-6b67-47b2-b803-19fb01d30d12 \
-k en-us \
-serial unix:/tmp/console,server,nowait \
-boot menu=on \
-qmp tcp::8887,server,nowait \
-vga qxl \
-drive file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0,readonly=on \
-drive file=/home/test/OVMF_VARS.fd,if=pflash,format=raw,unit=1,readonly=on \
-spice port=5932,disable-ticketing \
-debugcon file:/home/ovmf.log \
-global isa-debugcon.iobase=0x402 \
-device pcie-root-port,id=root.0,slot=1 \
-drive file=/home/test/rhel75-ovmf-bk.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop \
-device virtio-blk-pci,bus=root.0,drive=drive-virtio-disk0,id=virtio-disk0,disable-legacy=on,disable-modern=off,bootindex=1 \
-device pcie-root-port,id=root.1,slot=2 \
-device x3130-upstream,bus=root.1,id=upstream1 \
-device xio3130-downstream,bus=upstream1,id=downstream1,chassis=1 \
-device xio3130-downstream,bus=upstream1,id=downstream2,chassis=2 \
-device xio3130-downstream,bus=upstream1,id=downstream3,chassis=3 \
-device pcie-root-port,id=root.2,slot=3 \
-netdev tap,id=hostnet1 \
-device virtio-net-pci,netdev=hostnet1,id=net1,mac=54:52:00:B6:40:22,bus=root.2 \
-device pcie-root-port,id=root.3,slot=4,pref64-reserve=4G \
-monitor stdio
2. Hotplug memory to root.3
3. Check the result in the guest
[root@localhost home]# lspci
00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
00:01.0 VGA compatible controller: Red Hat, Inc. QXL paravirtual graphic card (rev 04)
00:02.0 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:03.0 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:04.0 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:05.0 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
01:00.0 SCSI storage controller: Red Hat, Inc. Virtio block device (rev 01)
02:00.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Upstream) (rev 02)
03:00.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Downstream) (rev 01)
03:01.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Downstream) (rev 01)
03:02.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Downstream) (rev 01)
07:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
08:00.0 RAM memory: Red Hat, Inc. Inter-VM shared memory (rev 01)
[root@localhost home]# lspci -vvv -t
-[0000:00]-+-00.0 Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
+-01.0 Red Hat, Inc. QXL paravirtual graphic card
+-02.0-[01]----00.0 Red Hat, Inc. Virtio block device
+-03.0-[02-06]----00.0-[03-06]--+-00.0-[04]--
| +-01.0-[05]--
| \-02.0-[06]--
+-04.0-[07]----00.0 Red Hat, Inc. Virtio network device
+-05.0-[08]----00.0 Red Hat, Inc. Inter-VM shared memory
+-1f.0 Intel Corporation 82801IB (ICH9) LPC Interface Controller
+-1f.2 Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode]
\-1f.3 Intel Corporation 82801I (ICH9 Family) SMBus Controller
[root@localhost home]# lspci -v -s 08:00.0
08:00.0 RAM memory: Red Hat, Inc. Inter-VM shared memory (rev 01)
Subsystem: Red Hat, Inc. QEMU Virtual Machine
Physical Slot: 4
Flags: fast devsel
Memory at 98000000 (32-bit, non-prefetchable) [size=256]
Memory at 800000000 (64-bit, prefetchable) [size=4G]
Kernel modules: virtio_pci
For the dmesg and OVMF logs, please see the attachments.
Created attachment 1377359 [details]
dmesg with pref64-reserve=4G
Created attachment 1377360 [details]
ovmf log with pref64-reserve=4G
* From the OVMF log:

The hint is parsed fine in OVMF (see "Prefetchable64BitMmio"):

> PciBus: Discovered PPB @ [00|05|00]
> GetResourcePadding: Address=00:05.0 DevicePath=PciRoot(0x0)/Pci(0x5,0x0)
> GetResourcePadding: BusNumbers=0xFFFFFFFF Io=0xFFFFFFFFFFFFFFFF NonPrefetchable32BitMmio=0xFFFFFFFF
> GetResourcePadding: Prefetchable32BitMmio=0xFFFFFFFF Prefetchable64BitMmio=0x100000000
> Padding: Type = Mem32; Alignment = 0x1FFFFF; Length = 0x200000
> Padding: Type = Io; Alignment = 0x1FF; Length = 0x200
> Padding: Type = PMem64; Alignment = 0xFFFFFFFF; Length = 0x100000000
> BAR[0]: Type = Mem32; Alignment = 0xFFF; Length = 0x1000; Offset = 0x10

You can see the default, non-pref, 32-bit only, aperture reservation, 2MB in size ("Mem32"). Similarly the default 512B IO-port reservation, "Io" (which would be disabled by "io-reserve=0" in other use cases). The "PMem64" padding stands for the 4GB reservation from "pref64-reserve=4G". The final entry ("BAR[0]") is the root port's SHPC (standard hot plug controller) BAR.

> PciBus: Resource Map for Bridge [00|05|00]
> Type = Io16; Base = 0x6000; Length = 0x200; Alignment = 0xFFF
> Base = Padding; Length = 0x200; Alignment = 0x1FF
> Type = Mem32; Base = 0x98000000; Length = 0x200000; Alignment = 0x1FFFFF
> Base = Padding; Length = 0x200000; Alignment = 0x1FFFFF
> Type = Mem32; Base = 0x98C03000; Length = 0x1000; Alignment = 0xFFF
> Type = PMem64; Base = 0x800000000; Length = 0x100000000; Alignment = 0xFFFFFFFF
> Base = Padding; Length = 0x100000000; Alignment = 0xFFFFFFFF

This shows where the aperture reservations were actually allocated. The non-pref 32-bit-only 2MB reservation is allocated at 0x9800_0000. The pref 64-bit 4GB reservation is allocated at 0x8_0000_0000.

* From the guest kernel dmesg:

> [ 54.047108] pciehp 0000:00:05.0:pcie004: Slot(4): Attention button pressed
> [ 54.047117] pciehp 0000:00:05.0:pcie004: Slot(4): Card present
> [ 54.047143] pciehp 0000:00:05.0:pcie004: Slot(4) Powering on due to button press
> [ 55.149156] pci 0000:08:00.0: [1af4:1110] type 00 class 0x050000
> [ 55.149198] pci 0000:08:00.0: reg 0x10: [mem 0x00000000-0x000000ff]
> [ 55.149262] pci 0000:08:00.0: reg 0x18: [mem 0x00000000-0xffffffff 64bit pref]
> [ 55.149888] pci 0000:08:00.0: BAR 2: assigned [mem 0x800000000-0x8ffffffff 64bit pref]
> [ 55.161699] pci 0000:08:00.0: BAR 0: assigned [mem 0x98000000-0x980000ff]
> [ 55.162091] pcieport 0000:00:05.0: PCI bridge to [bus 08]
> [ 55.162098] pcieport 0000:00:05.0: bridge window [io 0x6000-0x6fff]
> [ 55.162681] pcieport 0000:00:05.0: bridge window [mem 0x98000000-0x981fffff]
> [ 55.176399] pcieport 0000:00:05.0: bridge window [mem 0x800000000-0x8ffffffff 64bit pref]

This shows -- matching the last "lspci" output from comment 11 -- that Linux allocates the ivshmem device's "small BAR" from the root port's 2MB reservation at 0x9800_0000, and the "large BAR" from the root port's 4GB reservation at 0x8_0000_0000.

This matches the "expected results" from my comment 10 bullet (4).

Thus, everything's fine. Closing this BZ as NOTABUG -- using the "mem-reserve=4G" property was the issue; "pref64-reserve=4G" proved OK.

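As a quick cross-check of the addresses quoted from dmesg above, the following Python sketch recomputes the window and BAR sizes and confirms that each assigned BAR lies inside the corresponding bridge window. The address ranges are copied from the log lines; the helper names `size_of` and `contains` are hypothetical and exist only for this illustration.

```python
MIB = 1 << 20
GIB = 1 << 30


def size_of(start, end):
    """Size in bytes of an inclusive address range, as printed by the kernel."""
    return end - start + 1


def contains(window, bar):
    """True if the inclusive range `bar` lies entirely inside `window`."""
    return window[0] <= bar[0] and bar[1] <= window[1]


# Bridge windows of root port 00:05.0, from the dmesg lines above.
non_pref_window = (0x98000000, 0x981FFFFF)
pref64_window = (0x800000000, 0x8FFFFFFFF)

# BARs assigned to the ivshmem device 08:00.0, from the same log.
small_bar = (0x98000000, 0x980000FF)    # BAR 0
large_bar = (0x800000000, 0x8FFFFFFFF)  # BAR 2

print(size_of(*non_pref_window) // MIB, "MiB non-prefetchable window")
print(size_of(*pref64_window) // GIB, "GiB 64-bit prefetchable window")
print(contains(non_pref_window, small_bar), contains(pref64_window, large_bar))
```

The window sizes work out to 2 MiB and 4 GiB, which agrees with the "Mem32" and "PMem64" padding lengths (0x200000 and 0x100000000) in the OVMF log.
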
Hi,

an OVMF debug log does not seem to be attached to this bug report, but I think I can respond anyway.

The "mem-reserve=4G" property setting does not do what you think it does.

* First, this property controls the non-prefetchable MMIO reservation size. Non-prefetchable MMIO can only be allocated in 32-bit address space. Given that the size of the entire 32-bit address space is 4GB -- and that includes the low RAM as well, which is 2GB for Q35 --, it makes no sense to reserve 4GB there.

* Second, in technical terms, the "mem-reserve" property is exposed to the guest in a uint32_t field; see "include/hw/pci/pci_bridge.h" in the QEMU tree:

    typedef struct PCIBridgeQemuCap {
        ...
        uint32_t mem;       /* Non-prefetchable memory to reserve */
        ...
    } PCIBridgeQemuCap;

  Setting it to 2^32 means setting it to 0. And when OVMF sees 0 in this field, OVMF interprets the request as "no reservation needed" -- it disables the (otherwise default) 2MB non-prefetchable MMIO reservation [OvmfPkg/PciHotPlugInitDxe/PciHotPlugInit.c]:

    //
    // (c) Reserve non-prefetchable MMIO space (32-bit only).
    //
    switch (ReservationHint.NonPrefetchable32BitMmio) {
    case 0:
      //
      // No reservation needed, disable our built-in.
      //
      DefaultMmio = FALSE;
      break;

Comment 0 is very long, and I can't discern the exact use case / goal. To my current understanding, this is NOTABUG for OVMF. We can either close the BZ as such, or else please repeat the test with a *lot* smaller non-prefetchable MMIO reservation (a few tens or maybe hundreds of MBs).

--------*--------

... If we are being very strict, we can consider this a bug in QEMU. The property is defined like this, in "hw/pci-bridge/gen_pcie_root_port.c":

    DEFINE_PROP_SIZE("mem-reserve", GenPCIERootPort, mem_reserve, -1),

and the underlying property member is:

    typedef struct GenPCIERootPort {
        ...
        uint64_t mem_reserve;
        ...
    };

However, the mem_reserve member is truncated silently to 32 bits when setting up the reservation hint capability in config space, in the gen_rp_realize() function:

    int rc = pci_bridge_qemu_reserve_cap_init(d, 0, grp->bus_reserve,
            grp->io_reserve, grp->mem_reserve, grp->pref32_reserve,
            grp->pref64_reserve, errp);

This happens because the fifth parameter is declared as "uint32_t mem_non_pref_reserve".

So, if we want QEMU to reject the invalid "mem-reserve=4G" property up-front, then we should change the parameter's type to uint64_t in pci_bridge_qemu_reserve_cap_init(), and use error_setg() if the value is out of the 32-bit range.

Setting NEEDINFO for:
- reassignment to QEMU (dependent on what Marcel thinks about the property parsing, see above),
- clarification of the use case and re-trial with a much smaller reservation size.
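
The silent truncation described above is easy to reproduce with plain integer arithmetic. The Python sketch below is an illustration only, not QEMU code: `truncate_to_uint32` mimics what passing a uint64_t value through a uint32_t parameter does, and `checked_mem_reserve` is a hypothetical up-front range check of the kind proposed above. Treating the all-ones value (the property default of -1) as "no hint" is an assumption of this sketch.

```python
UINT32_MAX = 0xFFFFFFFF
UINT64_MAX = 0xFFFFFFFFFFFFFFFF  # the property default of -1, viewed as uint64_t
GIB = 1 << 30


def truncate_to_uint32(value):
    """Mimic passing a 64-bit value through a uint32_t parameter."""
    return value & UINT32_MAX


def checked_mem_reserve(value):
    """Hypothetical validation: reject sizes that do not fit in 32 bits.

    Assumption: the all-ones default means "no hint given" and is passed
    through as the 32-bit all-ones value.
    """
    if value == UINT64_MAX:
        return UINT32_MAX
    if value > UINT32_MAX:
        raise ValueError("mem-reserve out of 32-bit range: %#x" % value)
    return value


print(hex(truncate_to_uint32(4 * GIB)))  # 0x0: "mem-reserve=4G" becomes 0
print(hex(truncate_to_uint32(1 * GIB)))  # 0x40000000: 1G survives intact
```

With such a check in place, a 4G request would fail loudly at device creation time instead of reaching the guest firmware as 0, which OVMF reads as "no reservation needed".
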