Bug 1888677 - VM stuck when started with q35 + UEFI + maxmem >= 16 GB
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 8.3
Assignee: Eduardo Habkost
QA Contact: jingzhao
URL:
Whiteboard:
Depends On:
Blocks: 1885632
 
Reported: 2020-10-15 13:32 UTC by Milan Zamazal
Modified: 2021-01-26 20:19 UTC (History)
7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:


Attachments
QEMU command line (6.47 KB, text/plain)
2020-10-15 13:32 UTC, Milan Zamazal

Description Milan Zamazal 2020-10-15 13:32:22 UTC
Created attachment 1721842 [details]
QEMU command line

Description of problem:

When a VM is started with the q35 chipset, UEFI firmware, and a maxmem of at least 16 GB on certain hardware, the QEMU process starts but the VM doesn't boot.

Version-Release number of selected component (if applicable):

qemu-kvm-5.1.0-10.module+el8.3.0+8254+568ca30d.x86_64
kernel-4.18.0-240.el8.x86_64

How reproducible:

The problem has been observed, and is always reproducible, on only two machines with an Intel(R) Xeon(R) CPU E3-1230 V2 and 8 GB RAM. It couldn't be reproduced elsewhere.

Steps to Reproduce:
1. Start a VM with an installed guest OS from RHV, see the attached qemu-kvm command line.
2. Connect to the VM using SPICE -- the VM is stuck with a black screen and doesn't boot. Pinging the VM also doesn't work.

Actual results:

The VM gets stuck immediately after starting; it doesn't even reach the bootloader screen, and the qemu-kvm process consumes 100% CPU.

Expected results:

The VM starts normally.

Additional info:

When maxmem is reduced to e.g. 12 GB, the VM starts normally on the same machine. With a legacy (non-UEFI) BIOS and 16 GB maxmem, it starts and reaches the BIOS screen. When the same VM is started the same way on a different kind of host with the same amount of RAM, it starts normally.
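For context, maxmem is the ceiling on hotpluggable guest memory, set via QEMU's -m option; it determines how far the guest's memory address space may extend even before any memory is hotplugged. A sketch of the relevant fragment (the sizes and slot count below are illustrative, not copied from the attached command line):

```shell
# Illustrative only: 4 GiB of initial RAM, with up to 16 GiB total
# reachable via memory hotplug. It is the maxmem value, not the
# initial size, that triggers the hang described above.
-m 4096,slots=16,maxmem=16G
```

Lowering maxmem (e.g. maxmem=12G) shrinks the reserved address range, which matches the observation that the same host boots fine with 12 GB.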

Comment 4 John Ferlan 2020-10-27 11:52:29 UTC
Assigned to Amnon for initial triage per bz process and age of bug created or assigned to virt-maint without triage.

This looks like an issue tied to a specific combination of machine type and memory size. It's not clear which host models would allow the boot to continue, or whether changing specific parameters would make a difference.

Comment 6 Eduardo Habkost 2020-11-10 19:58:39 UTC
This is weirdly similar to the issues we had when trying to go beyond ~700 VCPUs when testing the BZs related to bug 1788991.  I will investigate.

Comment 7 Milan Zamazal 2021-01-04 12:58:18 UTC
Eduardo, did you find anything out?

Comment 8 Eduardo Habkost 2021-01-26 14:00:19 UTC
The issue seems unrelated to the ones on bug 1788991.  The CPUs where this bug can be reproduced have a small physical address size (36 bits), and I believe that's the root cause.

Laszlo, any suggestion on where to look?  Do you think OVMF might be using more than 36 bits of physical address space somehow?
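The host CPU's physical address width can be checked directly on a Linux host; the affected machines should report "36 bits physical" (a 36-bit limit means the CPU can only address 64 GiB of physical address space, and firmware placing MMIO or memory windows above that would fail):

```shell
# Print the host CPU's address widths, e.g.:
#   address sizes : 36 bits physical, 48 bits virtual
grep -m1 'address sizes' /proc/cpuinfo
```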

Comment 9 Laszlo Ersek 2021-01-26 20:19:03 UTC
(1) General correction for the QEMU command line (not related to this particular symptom, but required for actually securing Secure Boot) -- the following option *must* be appended:

  -global driver=cfi.pflash01,property=secure,value=on \

(2) Regarding the specific symptom, please capture the OVMF debug log, and attach it to this BZ. QEMU options for that:

  -chardev file,id=debugfile,path=ovmf.log \
  -device isa-debugcon,iobase=0x402,chardev=debugfile \

Once you have the OVMF debug log attached, please set needinfo on me again. Thanks.
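Putting both suggestions together, a minimal OVMF invocation might look like the sketch below. This is a hypothetical example, not the attached command line; the firmware paths and memory sizes are assumptions and will differ per distribution:

```shell
# Hypothetical sketch only -- paths and sizes are assumptions.
qemu-system-x86_64 \
  -machine q35,smm=on \
  -m 4096,maxmem=16G,slots=4 \
  -drive if=pflash,format=raw,unit=0,readonly=on,file=/usr/share/OVMF/OVMF_CODE.secboot.fd \
  -drive if=pflash,format=raw,unit=1,file=OVMF_VARS.fd \
  -global driver=cfi.pflash01,property=secure,value=on \
  -chardev file,id=debugfile,path=ovmf.log \
  -device isa-debugcon,iobase=0x402,chardev=debugfile
```

The -global line locks the pflash chip into secure mode (required for Secure Boot to actually be secure), and the -chardev/-device pair routes OVMF's debug output on I/O port 0x402 into ovmf.log.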

