Bug 1888677

Summary: VM stuck when started with q35 + UEFI + maxmem >= 16 GB
Product: Red Hat Enterprise Linux 8
Reporter: Milan Zamazal <mzamazal>
Component: qemu-kvm
Assignee: Eduardo Habkost <ehabkost>
qemu-kvm sub component: Machine Types
QA Contact: jingzhao <jinzhao>
Status: CLOSED INSUFFICIENT_DATA
Docs Contact:
Severity: medium
Priority: medium
CC: coli, ehabkost, jinzhao, juzhang, lersek, virt-maint, yiwei
Version: 8.3
Keywords: Triaged
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-09-09 10:48:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1885632    
Attachments:
Description Flags
QEMU command line none

Description Milan Zamazal 2020-10-15 13:32:22 UTC
Created attachment 1721842: QEMU command line

Description of problem:

When a VM is started with the q35 chipset, UEFI BIOS, and maxmem of at least 16 GB on certain hardware, the qemu-kvm process starts but the guest never boots.

Version-Release number of selected component (if applicable):

qemu-kvm-5.1.0-10.module+el8.3.0+8254+568ca30d.x86_64
kernel-4.18.0-240.el8.x86_64

How reproducible:

Always reproducible, but only on two machines with Intel(R) Xeon(R) CPU E3-1230 V2 and 8 GB RAM. It could not be reproduced elsewhere.

Steps to Reproduce:
1. Start a VM with an installed guest OS from RHV, see the attached qemu-kvm command line.
2. Connect to the VM using SPICE -- the VM is stuck with a black screen and doesn't boot. Pinging the VM also doesn't work.

Actual results:

The VM gets stuck immediately after starting: it doesn't even reach the bootloader screen, and the qemu-kvm process consumes 100% CPU.

Expected results:

The VM starts normally.

Additional info:

When maxmem is reduced to e.g. 12 GB, the VM starts normally on the same machine. With a non-UEFI BIOS and 16 GB maxmem, it starts and reaches BIOS. When the same VM is started the same way on a different kind of host with the same amount of RAM, it starts normally.

Comment 4 John Ferlan 2020-10-27 11:52:29 UTC
Assigned to Amnon for initial triage per bz process and age of bug created or assigned to virt-maint without triage.

Looks like an issue specific to a particular machine type and memory size. It is not clear which model(s) would allow boot to continue, or whether changing some specific parameter(s) would make a difference.

Comment 6 Eduardo Habkost 2020-11-10 19:58:39 UTC
This is weirdly similar to the issues we had when trying to go beyond ~700 VCPUs when testing the BZs related to bug 1788991.  I will investigate.

Comment 7 Milan Zamazal 2021-01-04 12:58:18 UTC
Eduardo, did you find out anything?

Comment 8 Eduardo Habkost 2021-01-26 14:00:19 UTC
Issue seems unrelated to the ones on bug 1788991.  The CPUs where this bug can be reproduced have a small physical address size (36 bits), and I believe that's the root cause.

Laszlo, any suggestion on where to look?  Do you think OVMF might be using more than 36 bits of physical address space somehow?
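
For anyone checking whether another host falls into the same category: on x86 Linux the physical address width is reported on the "address sizes" line of /proc/cpuinfo. The following is only a minimal sketch of reading that value and converting it to addressable space; the function names phys_bits and phys_space_gib are made up for illustration and are not part of any tool mentioned in this BZ.

```shell
# Extract the physical address width (in bits) from an "address sizes" line.
# Expected input format (x86 /proc/cpuinfo):
#   address sizes   : 36 bits physical, 48 bits virtual
phys_bits() {
    printf '%s\n' "$1" | sed -n 's/.*: *\([0-9][0-9]*\) bits physical.*/\1/p'
}

# Addressable physical space in GiB for a given width: 2^bits / 2^30.
# Assumes the width is at least 30 bits.
phys_space_gib() {
    echo $(( 1 << ($1 - 30) ))
}

# Possible usage on a live host (not executed here):
#   line=$(grep -m1 '^address sizes' /proc/cpuinfo)
#   bits=$(phys_bits "$line")
#   echo "$bits bits physical, $(phys_space_gib "$bits") GiB addressable"
```

With 36 bits this works out to 64 GiB of physical address space, which has to hold guest RAM, the memory hotplug range up to maxmem, and any 64-bit MMIO windows the firmware sets up. Whether that limit is actually exceeded here is not confirmed in this BZ.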

Comment 9 Laszlo Ersek 2021-01-26 20:19:03 UTC
(1) General correction for the QEMU command line (not related to this particular symptom, but required for actually securing Secure Boot) -- the following option *must* be appended:

  -global driver=cfi.pflash01,property=secure,value=on \

(2) Regarding the specific symptom, please capture the OVMF debug log, and attach it to this BZ. QEMU options for that:

  -chardev file,id=debugfile,path=ovmf.log \
  -device isa-debugcon,iobase=0x402,chardev=debugfile \

Once you have the OVMF debug log attached, please set needinfo on me again. Thanks.
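
To make the two requests above easier to apply, here is a small sketch that prints the extra options so they can be appended to an existing command line. The helper name ovmf_extra_args is hypothetical, and the base command line is assumed to be the one attached to this BZ; only the option strings themselves come from this comment.

```shell
# Print the extra QEMU options from this comment, one word per line.
# $1 is the path of the OVMF debug log (defaults to ovmf.log).
ovmf_extra_args() {
    log=${1:-ovmf.log}
    printf '%s\n' \
        -global driver=cfi.pflash01,property=secure,value=on \
        -chardev "file,id=debugfile,path=$log" \
        -device isa-debugcon,iobase=0x402,chardev=debugfile
}

# Possible usage (not executed here); relies on word splitting, so the
# log path must not contain whitespace:
#   qemu-kvm <original options> $(ovmf_extra_args /tmp/ovmf.log)
```

After the guest hangs, the file given as the log path is what should be attached to the BZ.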

Comment 10 John Ferlan 2021-09-08 19:08:22 UTC
Bulk update: Move RHEL-AV bugs to RHEL8

Comment 11 Laszlo Ersek 2021-09-09 10:48:26 UTC
No update in ~7 months after I requested the OVMF debug log, closing as "insufficient data". Reopen if necessary please (with the debug log provided).