Bug 2113840
Summary: [RHEL9.2] Memory mapping optimization for virt machine

Product: Red Hat Enterprise Linux 9
Component: qemu-kvm
qemu-kvm sub component: General
Version: 9.1
Hardware: aarch64
OS: Linux
Status: CLOSED ERRATA
Severity: low
Priority: low
Keywords: Triaged
Target Milestone: rc
Target Release: 9.2
Fixed In Version: qemu-kvm-7.2.0-3.el9
Target Upstream Version: qemu v8.0
Reporter: Zhenyu Zhang <zhenyzha>
Assignee: Guowen Shan <gshan>
QA Contact: Zhenyu Zhang <zhenyzha>
CC: ailan, alexander.lougovski, cohuck, coli, eauger, eric.auger, gshan, jinzhao, juzhang, lijin, virt-maint, yihyu
Last Closed: 2023-05-09 07:20:04 UTC
Description (Zhenyu Zhang, 2022-08-02 06:31:16 UTC)
I took a close look into the implementation of address assignment for those 3 high memory regions. I don't think the implementation is entirely correct. Several issues exist there:

- For one particular high memory region, it can be disabled by the user or due to the IPA limit. However, the address range for the high memory region is always counted, no matter whether it's disabled.

- One particular high memory region can be disabled silently due to the IPA limit. That's not good, and I believe we need warn_report() to warn users.

I will propose something to improve it for upstream qemu.

(In reply to Guowen Shan from comment #1)
> - For one particular high memory region, it can be disabled by the user
>   or due to the IPA limit. However, the address range for the high memory
>   region is always counted, no matter whether it's disabled.

As soon as a high mem region does not fit, vms->highest_gpa is not increased anymore. The redists and high MMIO cannot be disabled. Only the ECAM can be disabled, but for server configs it won't be. I wonder if it is worth taking care of that marginal case.

> - One particular high memory region can be disabled silently due to the
>   IPA limit. That's not good, and I believe we need warn_report() to
>   warn users.

Don't we get errors on the guest side? The problem is that if we add warnings, other users will complain that warnings are emitted for something they do not care about. arm virt does not commit to providing those memory ranges; it provides them if the IPA range is large enough.

> I will propose something to improve it for upstream qemu.

(In reply to Eric Auger from comment #2)

VIRT_HIGH_PCIE_ECAM isn't a big concern here, but why can't VIRT_HIGH_{GIC_REDIST2, PCIE_MMIO} be disabled?

GIC_REDIST2 covers PPI and SGI, and its address space depends on the number of available CPUs. VIRT_GIC_REDIST is 2*64kB*123 in size, meaning we don't need VIRT_HIGH_GIC_REDIST2 if the number of vCPUs doesn't exceed 123.

VIRT_PCIE_MMIO is ~508MB in size. If we don't need a larger PCI MMIO space, can VIRT_HIGH_PCIE_MMIO be disabled?

Yes, we should see error messages on the guest side when we're running out of PCI MMIO space to accommodate (backup) one particular 64-bit PCI BAR, though there are other causes for that. Without the warning message provided by QEMU, users or developers need to dig into the details to figure out that qemu doesn't provide enough PCI memory space. So I think it's worth having this sort of message in QEMU. However, such messages are not expected to appear frequently, because all the cases we're discussing are corner cases.
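To make the redistributor arithmetic above concrete, here is a minimal C sketch, assuming the two 64 KiB frames (RD_base + SGI_base) per vCPU defined by GICv3; the names and the 123-vCPU capacity come from the comment above, not from QEMU's sources.

#include <stdbool.h>
#include <stdint.h>

/* Two 64 KiB frames (RD_base + SGI_base) per vCPU, per the GICv3 spec. */
#define REDIST_FRAME_SIZE    (2 * 64 * 1024)
/* Capacity of the low VIRT_GIC_REDIST region quoted above: 123 vCPUs. */
#define LOW_REDIST_CAPACITY  123

/* VIRT_HIGH_GIC_REDIST2 only matters once the low region overflows. */
bool needs_high_redist2(unsigned int vcpus)
{
    return vcpus > LOW_REDIST_CAPACITY;
}

/* 123 vCPUs * 0x20000 bytes/vCPU = 0xF60000 bytes for the low region. */
uint64_t redist_bytes(unsigned int vcpus)
{
    return (uint64_t)vcpus * REDIST_FRAME_SIZE;
}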
(In reply to Guowen Shan from comment #3)
> Yes, we should see error messages on the guest side when we're running
> out of PCI MMIO space to accommodate (backup) one particular 64-bit PCI
> BAR, though there are other causes for that. Without the warning message
> provided by QEMU, users or developers need to dig into the details to
> figure out that qemu doesn't provide enough PCI memory space. So I think
> it's worth having this sort of message in QEMU. However, such messages
> are not expected to appear frequently, because all the cases we're
> discussing are corner cases.

Well, the arm virt address space is not supposed to be that 'dynamic'. Of course you could introduce options to disable redist2 or mmio, but it would add extra complexity. Personally, I don't think it is worth it. If you want more RAM in the guest, I think it is fair to require the host to support a larger IPA space.

(In reply to Eric Auger from comment #4)
> [...]
> Well, the arm virt address space is not supposed to be that 'dynamic'. Of
> course you could introduce options to disable redist2 or mmio, but it
> would add extra complexity. Personally, I don't think it is worth it. If
> you want more RAM in the guest, I think it is fair to require the host to
> support a larger IPA space.

Thanks, Eric. That's also what I thought. It would introduce extra complexity, degrade the user experience, and raise migration compatibility issues. I don't want to add options to disable the redist2 and mmio regions. As we discussed in the upstream community, I won't add warning messages when the redist2/mmio/ecam regions are disabled. So let's use this bug to track the optimization, or we can close it as NOTABUG.

OK. Nevertheless, that's a good point that silently skipping highmem regions is not the ideal solution and may lead to customer complaints. Maybe we should start thinking about documenting that kind of behavior somewhere, maybe in "RHEL stories" or articles, so that potential customers are not trapped by such a case. We have not written anything of that kind anywhere. Let's investigate the best form to document this kind of thing.

This is actually upstream work, meaning we need to come up with something for upstream first of all. Besides, we'd like to use this bug to track the memory mapping optimization for the virt machine, instead of adding warning messages when high memory regions are disabled. The subject has been changed accordingly to make it indicative.

If the upstream work is completed/merged before qemu-7.2, feel free to use the qemu-7.2 rebase bug 2135806 as the dependent bz, update the devel whiteboard with a message like "resolved by upstream qemu-7.2 commit id ###<hash>###", and of course move to POST.

The latest series (v7) has received the needed reviews, but it won't be merged into upstream QEMU v7.2 because v7.2 is already in a frozen state. It means I need to post v8 to enable 'compact-highmem' for the virt-8.0 machine type when upstream QEMU 8.0 is ready.

(v7): https://lists.nongnu.org/archive/html/qemu-arm/2022-10/msg00693.html

The series has been merged into upstream qemu v8.0:

6a48c64eec hw/arm/virt: Add properties to disable high memory regions
f40408a9fe hw/arm/virt: Add 'compact-highmem' property
4a4ff9edc6 hw/arm/virt: Improve high memory region address assignment
a5cb1350b1 hw/arm/virt: Introduce virt_get_high_memmap_enabled() helper
fa245799b9 hw/arm/virt: Introduce variable region_base in virt_set_high_memmap()
370bea9d1c hw/arm/virt: Rename variable size to region_size in virt_set_high_memmap()
4af6b6edec hw/arm/virt: Introduce virt_set_high_memmap() helper

I will post a MR to backport them after the RHEL9.2.0 machine type is available in our downstream QEMU, which is handled by MR-241. MR-241 needs some time to be merged, as Mirek said.
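For readers following along, a minimal C sketch of the idea behind 'compact-highmem' as described in this discussion: in the compact layout, a disabled high memory region no longer reserves address space, so later regions can slide down and may fit within the IPA limit. The type and function names are illustrative, not QEMU's actual virt_set_high_memmap() code.

#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for QEMU's high memmap entries. */
typedef struct {
    uint64_t size;
    bool     enabled;  /* cleared via property or when past the IPA limit */
    uint64_t base;     /* assigned below when the region fits */
} HighMemRegion;

/* Returns the highest GPA actually used; "compact" selects the new layout. */
uint64_t set_high_memmap(HighMemRegion *r, int n, uint64_t base,
                         uint64_t ipa_limit, bool compact)
{
    uint64_t highest_gpa = base;

    for (int i = 0; i < n; i++) {
        bool fits = r[i].enabled && base + r[i].size <= ipa_limit;

        if (fits) {
            r[i].base = base;
            highest_gpa = base + r[i].size;
        } else {
            r[i].enabled = false;  /* dropped, silently */
        }
        if (fits || !compact) {
            /* Legacy layout reserves the range even when it is unused. */
            base += r[i].size;
        }
    }
    return highest_gpa;
}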
The code works as expected:

# /usr/libexec/qemu-kvm -version
QEMU emulator version 7.2.0 (qemu-kvm-7.2.0-1.el9.gwshan202212170657)

With -machine virt-rhel9.2.0:

1) with -machine virt-rhel9.2.0,compact-highmem=on,highmem-ecam=off,highmem-mmio=on,highmem_redists=off
# lspci -vvvvs 06:00.0 | grep "Region "
Region 1: Memory at 10800000 (32-bit, non-prefetchable) [size=4K]
Region 4: Memory at 8000800000 (64-bit, prefetchable) [size=16K]

2) with -machine virt-rhel9.2.0,compact-highmem=off,highmem-ecam=off,highmem-mmio=on,highmem_redists=off
# lspci -vvvvs 06:00.0 | grep "Region "
Region 1: Memory at 11000000 (32-bit, non-prefetchable) [size=4K]
Region 4: Memory at 11200000 (64-bit, prefetchable) [size=16K]

3) with -machine virt-rhel9.2.0,compact-highmem=on,highmem-ecam=off,highmem-mmio=off,highmem_redists=off
# lspci -vvvvs 06:00.0 | grep "Region "
Region 1: Memory at 11000000 (32-bit, non-prefetchable) [size=4K]
Region 4: Memory at 11200000 (64-bit, prefetchable) [size=16K]

With -machine virt-rhel9.0.0:

1) with -machine virt-rhel9.0.0,compact-highmem=on,highmem-ecam=off,highmem-mmio=on,highmem_redists=off
# lspci -vvvvs 06:00.0 | grep "Region "
Region 1: Memory at 10800000 (32-bit, non-prefetchable) [size=4K]
Region 4: Memory at 8000800000 (64-bit, prefetchable) [size=16K]

2) with -machine virt-rhel9.0.0,compact-highmem=off,highmem-ecam=off,highmem-mmio=on,highmem_redists=off
# lspci -vvvvs 06:00.0 | grep "Region "
Region 1: Memory at 11000000 (32-bit, non-prefetchable) [size=4K]
Region 4: Memory at 11200000 (64-bit, prefetchable) [size=16K]

3) with -machine virt-rhel9.0.0,compact-highmem=on,highmem-ecam=off,highmem-mmio=off,highmem_redists=off
# lspci -vvvvs 06:00.0 | grep "Region "
Region 1: Memory at 11000000 (32-bit, non-prefetchable) [size=4K]
Region 4: Memory at 11200000 (64-bit, prefetchable) [size=16K]

I also cross-tested memory hotplug, migration, and then memory unplug; all passed:
1) with the 'compact-highmem=on,highmem-ecam=off,highmem-mmio=on,highmem_redists=off' options
2) with the 'compact-highmem=on,highmem-ecam=off,highmem-mmio=on,highmem_redists=off' options
The overall test looks relatively stable, which is OK for me.

[root@ampere-mtsnow-altramax-15 ~]# /usr/libexec/qemu-kvm -version
QEMU emulator version 7.2.0 (qemu-kvm-7.2.0-3.el9)
Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers

[root@ampere-mtsnow-altramax-15 ~]# /usr/libexec/qemu-kvm -cpu host -machine virt-rhel9.2.0,? | grep mem
dump-guest-core=<bool> - Include guest memory in a core dump
highmem=<bool> - Set on/off to enable/disable using physical address space above 32 bits
mem-merge=<bool> - Enable/disable memory merge support
memory-backend=<link<memory-backend>> - Set RAM backend. Valid value is ID of hostmem based backend
memory-encryption=<string> - Set memory encryption object to use
memory=<MemorySizeConfiguration> - Memory size configuration
ras=<bool> - Set on/off to enable/disable reporting host memory errors to a KVM guest using ACPI and guest external abort exceptions

The following options are all missing: compact-highmem, highmem-ecam, highmem-mmio, highmem_redists. Test results are not as expected.

Change status to ASSIGNED according to comment 17.

After a discussion with Gavin: currently in downstream, those properties are hidden.
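One way to read the lspci output above: with compact-highmem=on and highmem-mmio=on, the 64-bit BAR lands at 0x8000800000, inside the high PCIe MMIO window, while the other runs place it at 0x11200000 in the low 32-bit window. A tiny C checker for that, with the 512 GiB window base inferred from the test output above rather than taken from QEMU's memmap tables:

#include <stdint.h>
#include <stdio.h>

/* Base of the high PCIe MMIO window as inferred from the BARs above
 * (0x8000000000 = 512 GiB); illustrative, not a QEMU constant. */
#define HIGH_MMIO_BASE 0x8000000000ULL

int main(void)
{
    const uint64_t bars[] = { 0x8000800000ULL, 0x11200000ULL };

    for (int i = 0; i < 2; i++) {
        printf("BAR at 0x%llx -> %s MMIO window\n",
               (unsigned long long)bars[i],
               bars[i] >= HIGH_MMIO_BASE ? "high (64-bit)" : "low (32-bit)");
    }
    return 0;
}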
So the following test results are as expected; restoring bug status to MODIFIED.

On an IPA 40 host:

/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox on \
-blockdev node-name=file_aavmf_code,driver=file,filename=/usr/share/edk2/aarch64/QEMU_EFI-silent-pflash.raw,auto-read-only=on,discard=unmap \
-blockdev node-name=drive_aavmf_code,driver=raw,read-only=on,file=file_aavmf_code \
-blockdev node-name=file_aavmf_vars,driver=file,filename=/root/avocado/data/avocado-vt/avocado-vt-vm1_rhel920-aarch64-virtio-scsi_qcow2_filesystem_VARS.fd,auto-read-only=on,discard=unmap \
-blockdev node-name=drive_aavmf_vars,driver=raw,read-only=off,file=file_aavmf_vars \
-machine virt-rhel9.0.0,gic-version=host,pflash0=drive_aavmf_code,pflash1=drive_aavmf_vars \
-device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
-device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
-nodefaults \
-m 4096,maxmem=511G,slots=4 \
-object memory-backend-ram,size=2048M,id=mem-memN0 \
-object memory-backend-ram,size=2048M,id=mem-memN1 \
-smp 4,maxcpus=4,cores=2,threads=1,clusters=1,sockets=2 \
-numa node,memdev=mem-memN0,nodeid=0,cpus=0-1 \
-numa node,memdev=mem-memN1,nodeid=1,cpus=2-3 \
-cpu 'host' \
-serial unix:'/var/tmp/serial-serial0',server=on,wait=off \
-device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
-device qemu-xhci,id=usb1,bus=pcie-root-port-2,addr=0x0 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-3,addr=0x0 \
-blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel920-aarch64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
-device pcie-root-port,id=pcie-root-port-4,port=0x4,addr=0x1.0x4,bus=pcie.0,chassis=5 \
-device virtio-net-pci,mac=9a:58:24:6b:36:9d,id=idupBJUC,netdev=idBh7vgm,bus=pcie-root-port-4,addr=0x0 \
-netdev tap,id=idBh7vgm,vhost=on \
-vnc :20 \
-enable-kvm \
-monitor stdio

with -machine virt-rhel9.2.0 -m 4096,maxmem=512G
lspci -vvvvs 05:00.0 | grep "Region"
Region 1: Memory at 10c00000 (32-bit, non-prefetchable) [size=4K]
Region 4: Memory at 10e00000 (64-bit, prefetchable) [size=16K]

with -machine virt-rhel9.0.0 -m 4096,maxmem=512G
lspci -vvvvs 05:00.0 | grep "Region"
Region 1: Memory at 10c00000 (32-bit, non-prefetchable) [size=4K]
Region 4: Memory at 10e00000 (64-bit, prefetchable) [size=16K]

with -machine virt-rhel9.2.0 -m 4096,maxmem=511G
lspci -vvvvs 05:00.0 | grep "Region"
Region 1: Memory at 10c00000 (32-bit, non-prefetchable) [size=4K]
Region 4: Memory at 10e00000 (64-bit, prefetchable) [size=16K]

with -machine virt-rhel9.0.0 -m 4096,maxmem=511G
lspci -vvvvs 05:00.0 | grep "Region"
Region 1: Memory at 10c00000 (32-bit, non-prefetchable) [size=4K]
Region 4: Memory at 10e00000 (64-bit, prefetchable) [size=16K]

Change bug status to VERIFIED according to comment 19.
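As an aside on the IPA-40 results above, a back-of-the-envelope check of why neither maxmem=511G nor maxmem=512G leaves room for the high MMIO window, so the 64-bit BAR falls back to the low window in all four runs. The RAM base and region sizes here are assumptions for illustration (RAM at 1 GiB, redist2 64 MiB, high ECAM 256 MiB, high MMIO 512 GiB), not exact QEMU constants, and alignment padding is ignored:

#include <stdint.h>
#include <stdio.h>

#define GIB (1ULL << 30)
#define MIB (1ULL << 20)

int main(void)
{
    const uint64_t ipa_limit    = 1ULL << 40;  /* IPA 40 host: 1 TiB */
    const uint64_t ram_base     = 1 * GIB;     /* assumed virt RAM base */
    const uint64_t redist2_sz   = 64 * MIB;    /* assumed region sizes */
    const uint64_t high_ecam_sz = 256 * MIB;
    const uint64_t high_mmio_sz = 512 * GIB;
    const uint64_t maxmems[]    = { 511 * GIB, 512 * GIB };

    for (int i = 0; i < 2; i++) {
        /* End of the high regions if laid out right above device memory. */
        uint64_t end = ram_base + maxmems[i] + redist2_sz + high_ecam_sz
                     + high_mmio_sz;
        printf("maxmem=%lluG: high MMIO %s\n",
               (unsigned long long)(maxmems[i] / GIB),
               end <= ipa_limit ? "fits" : "exceeds the 1 TiB IPA limit");
    }
    return 0;
}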
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2162