Bug 1888677 - VM stuck when started with q35 + UEFI + maxmem >= 16 GB
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 8.3
Assignee: Eduardo Habkost
QA Contact: jingzhao
URL:
Whiteboard:
Depends On:
Blocks: 1885632
 
Reported: 2020-10-15 13:32 UTC by Milan Zamazal
Modified: 2021-01-26 20:19 UTC (History)
7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:


Attachments
QEMU command line (6.47 KB, text/plain)
2020-10-15 13:32 UTC, Milan Zamazal

Description Milan Zamazal 2020-10-15 13:32:22 UTC
Created attachment 1721842 [details]
QEMU command line

Description of problem:

When a VM is started with the q35 chipset, UEFI firmware, and a maxmem of at least 16 GB on certain hardware, the QEMU process starts but the VM doesn't boot.

Version-Release number of selected component (if applicable):

qemu-kvm-5.1.0-10.module+el8.3.0+8254+568ca30d.x86_64
kernel-4.18.0-240.el8.x86_64

How reproducible:

The problem has been observed, and is always reproducible, on only two machines with an Intel(R) Xeon(R) CPU E3-1230 V2 and 8 GB RAM. It couldn't be reproduced elsewhere.

Steps to Reproduce:
1. Start a VM with an installed guest OS from RHV, see the attached qemu-kvm command line.
2. Connect to the VM using SPICE -- the VM is stuck with a black screen and doesn't boot. Pinging the VM also doesn't work.

Actual results:

The VM gets stuck immediately after starting; it doesn't even reach the bootloader screen, and the qemu-kvm process consumes 100% CPU.

Expected results:

The VM starts normally.

Additional info:

When maxmem is reduced to e.g. 12 GB, the VM starts normally on the same machine. With a legacy (non-UEFI) BIOS and 16 GB maxmem, it starts and reaches the BIOS screen. When the same VM is started the same way on a different kind of host with the same amount of RAM, it starts normally.
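For context, maxmem is the ceiling on hotpluggable guest memory, set via QEMU's -m option; it determines how far the guest's memory address space may extend even before any memory is hotplugged. A sketch of the relevant fragment (the sizes and slot count below are illustrative, not copied from the attached command line):

```shell
# Illustrative only: 4 GiB of initial RAM, with up to 16 GiB total
# reachable via memory hotplug. It is the maxmem value, not the
# initial size, that triggers the hang described above.
-m 4096,slots=16,maxmem=16G
```

Lowering maxmem (e.g. maxmem=12G) shrinks the reserved address range, which matches the observation that the same host boots fine with 12 GB.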

Comment 4 John Ferlan 2020-10-27 11:52:29 UTC
Assigned to Amnon for initial triage per bz process and age of bug created or assigned to virt-maint without triage.

This looks like an issue tied to a specific combination of machine type and memory size. It's not clear which host models would allow the boot to continue, or whether changing specific parameters would make a difference.

Comment 6 Eduardo Habkost 2020-11-10 19:58:39 UTC
This is weirdly similar to the issues we had when trying to go beyond ~700 VCPUs when testing the BZs related to bug 1788991.  I will investigate.

Comment 7 Milan Zamazal 2021-01-04 12:58:18 UTC
Eduardo, did you find anything out?

Comment 8 Eduardo Habkost 2021-01-26 14:00:19 UTC
The issue seems unrelated to the ones on bug 1788991.  The CPUs where this bug can be reproduced have a small physical address size (36 bits), and I believe that's the root cause.

Laszlo, any suggestion on where to look?  Do you think OVMF might be using more than 36 bits of physical address space somehow?
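The host CPU's physical address width can be checked directly on a Linux host; the affected machines should report "36 bits physical" (a 36-bit limit means the CPU can only address 64 GiB of physical address space, and firmware placing MMIO or memory windows above that would fail):

```shell
# Print the host CPU's address widths, e.g.:
#   address sizes : 36 bits physical, 48 bits virtual
grep -m1 'address sizes' /proc/cpuinfo
```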

Comment 9 Laszlo Ersek 2021-01-26 20:19:03 UTC
(1) General correction for the QEMU command line (not related to this particular symptom, but required for actually securing Secure Boot) -- the following option *must* be appended:

  -global driver=cfi.pflash01,property=secure,value=on \

(2) Regarding the specific symptom, please capture the OVMF debug log, and attach it to this BZ. QEMU options for that:

  -chardev file,id=debugfile,path=ovmf.log \
  -device isa-debugcon,iobase=0x402,chardev=debugfile \

Once you have the OVMF debug log attached, please set needinfo on me again. Thanks.
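Putting both suggestions together, a minimal OVMF invocation might look like the sketch below. This is a hypothetical example, not the attached command line; the firmware paths and memory sizes are assumptions and will differ per distribution:

```shell
# Hypothetical sketch only -- paths and sizes are assumptions.
qemu-system-x86_64 \
  -machine q35,smm=on \
  -m 4096,maxmem=16G,slots=4 \
  -drive if=pflash,format=raw,unit=0,readonly=on,file=/usr/share/OVMF/OVMF_CODE.secboot.fd \
  -drive if=pflash,format=raw,unit=1,file=OVMF_VARS.fd \
  -global driver=cfi.pflash01,property=secure,value=on \
  -chardev file,id=debugfile,path=ovmf.log \
  -device isa-debugcon,iobase=0x402,chardev=debugfile
```

The -global line locks the pflash chip into secure mode (required for Secure Boot to actually be secure), and the -chardev/-device pair routes OVMF's debug output on I/O port 0x402 into ovmf.log.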

