Bug 2113840 - [RHEL9.2] Memory mapping optimization for virt machine
Summary: [RHEL9.2] Memory mapping optimization for virt machine
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: 9.1
Hardware: aarch64
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: 9.2
Assignee: Guowen Shan
QA Contact: Zhenyu Zhang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-08-02 06:31 UTC by Zhenyu Zhang
Modified: 2023-05-09 07:43 UTC
CC: 12 users

Fixed In Version: qemu-kvm-7.2.0-3.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-09 07:20:04 UTC
Type: ---
Target Upstream Version: qemu v8.0
Embargoed:


Attachments: none


Links:
- Gitlab redhat/centos-stream/src/qemu-kvm, merge request 126: "hw/arm/virt: Optimize high memory region address assignment" (opened; last updated 2022-12-19 05:07:57 UTC)
- Red Hat Issue Tracker RHELPLAN-129865 (last updated 2022-08-02 06:41:42 UTC)
- Red Hat Product Errata RHSA-2023:2162 (last updated 2023-05-09 07:20:28 UTC)

Description Zhenyu Zhang 2022-08-02 06:31:16 UTC
Description of problem:
When a specific high memory region is disabled due to the PA limit, it would be better to warn the user about it. The warning message helps identify the cause in such cases.

Version-Release number of selected component (if applicable):
qemu-kvm-7.0.0-9.el9


Actual results:
No warning message is displayed.

Expected results:
qemu-kvm: Disabled PCIE_MMIO high memory region due to PA limit
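
For context, a minimal sketch of how this situation arises (the 40-bit host IPA limit and the maxmem value mirror the verification setup in comment 19; all other details are illustrative):

# On a host whose IPA limit is 40 bits (1 TiB of guest PA space), a large
# maxmem pushes the high memory regions toward the PA limit; qemu-kvm-7.0
# then drops the regions that no longer fit, without any message:
/usr/libexec/qemu-kvm \
    -machine virt -cpu host -enable-kvm -nographic \
    -m 4096,maxmem=512G,slots=4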

Comment 1 Guowen Shan 2022-08-02 06:47:48 UTC
I took a close look at the implementation of address assignment for
those 3 high memory regions. I don't think the implementation is entirely
correct. Several issues exist there.

- For one particular high memory region, it can be disabled by the user
  or due to the IPA limit. However, the address range for the high memory
  region is always counted, no matter whether it's disabled.

- One particular high memory region can be disabled silently due to the
  IPA limit. That's not good, and I believe we need a warn_report() call
  to warn users.

I will propose something to improve this for upstream QEMU.

Comment 2 Eric Auger 2022-08-02 10:59:55 UTC
(In reply to Guowen Shan from comment #1)
> I took a close look at the implementation of address assignment for
> those 3 high memory regions. I don't think the implementation is entirely
> correct. Several issues exist there.
> 
> - For one particular high memory region, it can be disabled by the user
>   or due to the IPA limit. However, the address range for the high memory
>   region is always counted, no matter whether it's disabled.

As soon as a high mem region does not fit, vms->highest_gpa is not increased anymore. The redists and high mmio regions cannot be disabled; only ecam can be disabled, and for server configs it won't be. I wonder if it is worth taking care of that marginal case.

> 
> - One particular high memory region can be disabled silently due to the
>   IPA limit. That's not good, and I believe we need a warn_report() call
>   to warn users.

Don't we already get errors on the guest side? The problem is that if we add warnings, some other users will complain that warnings are emitted for something they do not care about. The arm virt machine does not commit to providing those memory ranges; it provides them only if the IPA range is large enough.

> 
> I will propose something to improve this for upstream QEMU.

Comment 3 Guowen Shan 2022-08-03 00:40:02 UTC
(In reply to Eric Auger from comment #2)

VIRT_HIGH_PCIE_ECAM isn't a big concern here, but why can't
VIRT_HIGH_{GIC_REDIST2, PCIE_MMIO} be disabled?

GIC_REDIST2 covers PPIs and SGIs, and its address space depends on the
number of available CPUs. VIRT_GIC_REDIST is 2*64kB*123 in size, meaning we
don't need VIRT_HIGH_GIC_REDIST2 if the number of vCPUs doesn't exceed 123.

VIRT_PCIE_MMIO is ~508MB in size. If we don't need a larger PCI MMIO space,
can't VIRT_HIGH_PCIE_MMIO be disabled?

Yes, we should see error messages on the guest side when we run out of PCI
MMIO space to accommodate (as fallback) one particular 64-bit PCI BAR, but
there are other possible causes for those errors. Without a warning message
from QEMU, users or developers need to dig into the details to figure out
that QEMU doesn't provide enough PCI memory space. So I think it's worth
having this sort of message in QEMU. However, we don't expect to see these
messages frequently, because all the cases we're discussing are corner
cases.
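
For reference, the arithmetic behind the 123-vCPU figure (a sketch based on the sizes quoted above; assumes each GICv3 redistributor occupies two 64 KiB frames):

  per-vCPU redistributor: 2 * 64 KiB = 128 KiB
  VIRT_GIC_REDIST size:   123 * 128 KiB ~= 15.4 MiB
  => VIRT_HIGH_GIC_REDIST2 is only needed once the guest has more than 123 vCPUs.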

Comment 4 Eric Auger 2022-08-03 09:12:54 UTC
(In reply to Guowen Shan from comment #3)
> (In reply to Eric Auger from comment #2)
> 
> VIRT_HIGH_PCIE_ECAM isn't a big concern here, but why can't
> VIRT_HIGH_{GIC_REDIST2, PCIE_MMIO} be disabled?
> 
> GIC_REDIST2 covers PPIs and SGIs, and its address space depends on the
> number of available CPUs. VIRT_GIC_REDIST is 2*64kB*123 in size, meaning
> we don't need VIRT_HIGH_GIC_REDIST2 if the number of vCPUs doesn't exceed
> 123.
> 
> VIRT_PCIE_MMIO is ~508MB in size. If we don't need a larger PCI MMIO
> space, can't VIRT_HIGH_PCIE_MMIO be disabled?
> 
> Yes, we should see error messages on the guest side when we run out of PCI
> MMIO space to accommodate (as fallback) one particular 64-bit PCI BAR, but
> there are other possible causes for those errors. Without a warning
> message from QEMU, users or developers need to dig into the details to
> figure out that QEMU doesn't provide enough PCI memory space. So I think
> it's worth having this sort of message in QEMU. However, we don't expect
> to see these messages frequently, because all the cases we're discussing
> are corner cases.

Well, the arm virt address space is not supposed to be that 'dynamic'. Of course you could introduce options to disable redist2 or mmio, but it would add extra complexity. Personally I don't think it is worth it. If you want more RAM in the guest, I think it is fair to require the host to support a larger IPA space.

Comment 5 Guowen Shan 2022-08-03 10:22:46 UTC
(In reply to Eric Auger from comment #4)
> 
> [...]
> 
> Well, the arm virt address space is not supposed to be that 'dynamic'. Of
> course you could introduce options to disable redist2 or mmio, but it
> would add extra complexity. Personally I don't think it is worth it. If
> you want more RAM in the guest, I think it is fair to require the host to
> support a larger IPA space.
>

Thanks, Eric. That's also what I thought. It would introduce extra complexity,
degrade the user experience, and raise migration compatibility issues. I don't
want to add options to disable the redist2 and mmio regions.

As discussed in the upstream community, I won't add warning messages
when the redist2/mmio/ecam regions are disabled. So let's use this bug to
track the optimization, or we can close it as NOTABUG.

Comment 6 Eric Auger 2022-08-03 10:27:38 UTC
OK. Nevertheless, that's a good point that silently skipping highmem regions is not the ideal solution and may lead to customer complaints. Maybe we should start thinking about documenting that kind of behavior somewhere, perhaps in "RHEL stories" or articles, so that potential customers aren't trapped by such a case. We have not written anything of that kind anywhere yet. Let's investigate the best form for documenting this kind of thing.

Comment 7 Guowen Shan 2022-08-05 10:25:46 UTC
This is actually upstream work, meaning we need to come up with something
for upstream first of all. Besides, we'd like to use this bug to
track the memory mapping optimization for the virt machine, instead of
the addition of warning messages when high memory regions are disabled.
The subject has been changed accordingly to make it indicative.

Comment 9 John Ferlan 2022-10-18 16:15:44 UTC
If the upstream work is completed and merged before qemu-7.2, then feel free to use the qemu-7.2 rebase bug 2135806 as the dependent bz, update the devel whiteboard with a message like "resolved by upstream qemu-7.2 commit id ###<hash>###", and of course move to POST.

Comment 10 Guowen Shan 2022-11-11 06:32:59 UTC
The latest series (v7) has gained the needed reviews, but it won't be merged
into upstream QEMU v7.2 because that release is in its frozen state. This
means I need to post a v8 to enable 'compact-highmem' for the virt-8.0
machine type when upstream QEMU 8.0 development opens.

  (v7): https://lists.nongnu.org/archive/html/qemu-arm/2022-10/msg00693.html

Comment 13 Guowen Shan 2022-12-16 10:56:00 UTC
The series has been merged into upstream QEMU v8.0:

  6a48c64eec hw/arm/virt: Add properties to disable high memory regions
  f40408a9fe hw/arm/virt: Add 'compact-highmem' property
  4a4ff9edc6 hw/arm/virt: Improve high memory region address assignment
  a5cb1350b1 hw/arm/virt: Introduce virt_get_high_memmap_enabled() helper
  fa245799b9 hw/arm/virt: Introduce variable region_base in virt_set_high_memmap()
  370bea9d1c hw/arm/virt: Rename variable size to region_size in virt_set_high_memmap()
  4af6b6edec hw/arm/virt: Introduce virt_set_high_memmap() helper

I will post an MR to backport them once the RHEL 9.2.0 machine type is
available in our downstream QEMU, which is handled by MR-241. MR-241 needs
some time to be merged, as Mirek said.
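
For reference, with a build that includes this series, the new machine properties should show up in the machine help output (a sketch; the binary name, machine type, and exact property list are assumptions based on the commits above):

  qemu-system-aarch64 -machine virt-8.0,? | grep highmem
  # expected to list, among others (names assumed from the series):
  #   highmem, compact-highmem, highmem-ecam, highmem-mmio, highmem-redists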

Comment 14 Zhenyu Zhang 2022-12-19 07:59:30 UTC
The code behaves as expected:
# /usr/libexec/qemu-kvm -version
QEMU emulator version 7.2.0 (qemu-kvm-7.2.0-1.el9.gwshan202212170657)

with -machine virt-rhel9.2.0
1) with -machine virt-rhel9.2.0,compact-highmem=on,highmem-ecam=off,highmem-mmio=on,highmem_redists=off
# lspci -vvvvs 06:00.0 | grep "Region "
Region 1: Memory at 10800000 (32-bit, non-prefetchable) [size=4K]
Region 4: Memory at 8000800000 (64-bit, prefetchable) [size=16K]
2) with -machine virt-rhel9.2.0,compact-highmem=off,highmem-ecam=off,highmem-mmio=on,highmem_redists=off
# lspci -vvvvs 06:00.0 | grep "Region "
Region 1: Memory at 11000000 (32-bit, non-prefetchable) [size=4K]
Region 4: Memory at 11200000 (64-bit, prefetchable) [size=16K]
3) with -machine virt-rhel9.2.0,compact-highmem=on,highmem-ecam=off,highmem-mmio=off,highmem_redists=off
# lspci -vvvvs 06:00.0 | grep "Region "
Region 1: Memory at 11000000 (32-bit, non-prefetchable) [size=4K]
Region 4: Memory at 11200000 (64-bit, prefetchable) [size=16K]

with -machine virt-rhel9.0.0
1) with -machine virt-rhel9.0.0,compact-highmem=on,highmem-ecam=off,highmem-mmio=on,highmem_redists=off
# lspci -vvvvs 06:00.0 | grep "Region "
Region 1: Memory at 10800000 (32-bit, non-prefetchable) [size=4K]
Region 4: Memory at 8000800000 (64-bit, prefetchable) [size=16K]
2) with -machine virt-rhel9.0.0,compact-highmem=off,highmem-ecam=off,highmem-mmio=on,highmem_redists=off
# lspci -vvvvs 06:00.0 | grep "Region "
Region 1: Memory at 11000000 (32-bit, non-prefetchable) [size=4K]
Region 4: Memory at 11200000 (64-bit, prefetchable) [size=16K]
3) with -machine virt-rhel9.0.0,compact-highmem=on,highmem-ecam=off,highmem-mmio=off,highmem_redists=off
# lspci -vvvvs 06:00.0 | grep "Region "
Region 1: Memory at 11000000 (32-bit, non-prefetchable) [size=4K]
Region 4: Memory at 11200000 (64-bit, prefetchable) [size=16K]

I also cross-tested memory hotplug, then migration, then memory unplug; all
passed (a sketch of this flow follows below):
1) With 'compact-highmem=on,highmem-ecam=off,highmem-mmio=on,highmem_redists=off' options
2) With 'compact-highmem=on,highmem-ecam=off,highmem-mmio=on,highmem_redists=off' options

The overall test looks relatively stable, which is ok for me.
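
For reference, a sketch of that hotplug/migration/unplug flow via the HMP monitor (the object and device IDs and the destination URI are illustrative, not taken from the original test):

(qemu) object_add memory-backend-ram,id=mem2,size=1G
(qemu) device_add pc-dimm,id=dimm2,memdev=mem2
(qemu) migrate -d tcp:DEST_HOST:4444
# after migration completes, on the destination:
(qemu) device_del dimm2
(qemu) object_del mem2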

Comment 17 Zhenyu Zhang 2023-01-05 03:48:40 UTC
[root@ampere-mtsnow-altramax-15 ~]# /usr/libexec/qemu-kvm -version
QEMU emulator version 7.2.0 (qemu-kvm-7.2.0-3.el9)
Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers


[root@ampere-mtsnow-altramax-15 ~]# /usr/libexec/qemu-kvm -cpu host -machine virt-rhel9.2.0,? | grep mem
  dump-guest-core=<bool> - Include guest memory in a core dump
  highmem=<bool>         - Set on/off to enable/disable using physical address space above 32 bits
  mem-merge=<bool>       - Enable/disable memory merge support
  memory-backend=<link<memory-backend>> - Set RAM backend. Valid value is ID of hostmem based backend
  memory-encryption=<string> - Set memory encryption object to use
  memory=<MemorySizeConfiguration> - Memory size configuration
  ras=<bool>             - Set on/off to enable/disable reporting host memory errors to a KVM guest using ACPI and guest external abort exceptions


The following options are all missing:
compact-highmem,
highmem-ecam,
highmem-mmio,
highmem_redists

The test results are not as expected.

Comment 18 Zhenyu Zhang 2023-01-05 04:03:31 UTC
Changing status to ASSIGNED according to comment 17.

Comment 19 Zhenyu Zhang 2023-01-05 05:57:50 UTC
After a discussion with Gavin: currently those properties are hidden in the
downstream build, so the following test results are as expected. Restoring
the bug status to MODIFIED.

On an IPA 40 host:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1'  \
-sandbox on  \
-blockdev node-name=file_aavmf_code,driver=file,filename=/usr/share/edk2/aarch64/QEMU_EFI-silent-pflash.raw,auto-read-only=on,discard=unmap \
-blockdev node-name=drive_aavmf_code,driver=raw,read-only=on,file=file_aavmf_code \
-blockdev node-name=file_aavmf_vars,driver=file,filename=/root/avocado/data/avocado-vt/avocado-vt-vm1_rhel920-aarch64-virtio-scsi_qcow2_filesystem_VARS.fd,auto-read-only=on,discard=unmap \
-blockdev node-name=drive_aavmf_vars,driver=raw,read-only=off,file=file_aavmf_vars \
-machine virt-rhel9.0.0,gic-version=host,pflash0=drive_aavmf_code,pflash1=drive_aavmf_vars \
-device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
-device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
-nodefaults \
-m 4096,maxmem=511G,slots=4 \
-object memory-backend-ram,size=2048M,id=mem-memN0 \
-object memory-backend-ram,size=2048M,id=mem-memN1  \
-smp 4,maxcpus=4,cores=2,threads=1,clusters=1,sockets=2  \
-numa node,memdev=mem-memN0,nodeid=0,cpus=0-1  \
-numa node,memdev=mem-memN1,nodeid=1,cpus=2-3  \
-cpu 'host' \
-serial unix:'/var/tmp/serial-serial0',server=on,wait=off \
-device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
-device qemu-xhci,id=usb1,bus=pcie-root-port-2,addr=0x0 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-3,addr=0x0 \
-blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel920-aarch64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
-device pcie-root-port,id=pcie-root-port-4,port=0x4,addr=0x1.0x4,bus=pcie.0,chassis=5 \
-device virtio-net-pci,mac=9a:58:24:6b:36:9d,id=idupBJUC,netdev=idBh7vgm,bus=pcie-root-port-4,addr=0x0  \
-netdev tap,id=idBh7vgm,vhost=on  \
-vnc :20  \
-enable-kvm \
-monitor stdio 

with -machine virt-rhel9.2.0 -m 4096,maxmem=512G
 lspci -vvvvs 05:00.0 | grep "Region"
	Region 1: Memory at 10c00000 (32-bit, non-prefetchable) [size=4K]
	Region 4: Memory at 10e00000 (64-bit, prefetchable) [size=16K]

with -machine virt-rhel9.0.0 -m 4096,maxmem=512G
lspci -vvvvs 05:00.0 | grep "Region"
	Region 1: Memory at 10c00000 (32-bit, non-prefetchable) [size=4K]
	Region 4: Memory at 10e00000 (64-bit, prefetchable) [size=16K]

with -machine virt-rhel9.2.0 -m 4096,maxmem=511G
lspci -vvvvs 05:00.0 | grep "Region"
	Region 1: Memory at 10c00000 (32-bit, non-prefetchable) [size=4K]
	Region 4: Memory at 10e00000 (64-bit, prefetchable) [size=16K]

with -machine virt-rhel9.0.0 -m 4096,maxmem=511G
lspci -vvvvs 05:00.0 | grep "Region"
	Region 1: Memory at 10c00000 (32-bit, non-prefetchable) [size=4K]
	Region 4: Memory at 10e00000 (64-bit, prefetchable) [size=16K]

Comment 20 Zhenyu Zhang 2023-01-05 06:05:15 UTC
Change bug status to VERIFIED according to comment 19

Comment 24 Zhenyu Zhang 2023-01-10 12:34:30 UTC
Change bug status to VERIFIED according to comment 19

Comment 26 errata-xmlrpc 2023-05-09 07:20:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2162

