Bug 2075486 - VM with Q35 UEFI and 64 CPUs is running but without boot screen, console and network.
Summary: VM with Q35 UEFI and 64 CPUs is running but without boot screen, console and network.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 4.5.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.5.0-1
Target Release: 4.5.0.5
Assignee: Milan Zamazal
QA Contact: Nisim Simsolo
URL:
Whiteboard:
Duplicates: 2074149
Depends On:
Blocks:
 
Reported: 2022-04-14 11:07 UTC by Nisim Simsolo
Modified: 2022-05-03 06:46 UTC
CC List: 6 users

Fixed In Version: ovirt-engine-4.5.0.5
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-03 06:46:58 UTC
oVirt Team: Virt
Embargoed:
pm-rhel: ovirt-4.5?
lsvaty: blocker+


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github oVirt ovirt-engine pull 293 0 None Draft Handle many vCPUs 2022-04-20 12:11:26 UTC
Github oVirt ovirt-engine pull 315 0 None open Handle many vCPUs -- backport 2022-04-26 11:01:53 UTC
Red Hat Issue Tracker RHV-45667 0 None None None 2022-04-14 11:08:26 UTC

Description Nisim Simsolo 2022-04-14 11:07:46 UTC
Description of problem:
- When running a RHV VM with Q35/UEFI and more than 48 CPUs (the RHV host has 64 CPUs), there is no VM console (black screen, no TianoCore screen when booting) and the VM gets no IP address.
- When running the same VM with the CPU count reduced to 48, the VM runs properly, with console, network address, etc.
- Running a VM with Q35/BIOS and 64 CPUs also works properly, with console and network address.
- This issue was also reproduced on a different setup with a different host; there it was encountered with more than 28 CPUs (RHV host with 48 CPUs).

QEMU bug related to this issue: https://bugzilla.redhat.com/show_bug.cgi?id=2074149
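
For orientation, "Q35/UEFI with 64 CPUs" corresponds to a domain XML along these lines (a minimal illustrative sketch; the machine type, firmware path and vCPU maximum are examples, not values taken from the attached XML):

  <os>
    <type arch='x86_64' machine='pc-q35-rhel8.6.0'>hvm</type>
    <loader readonly='yes' secure='no' type='pflash'>/usr/share/OVMF/OVMF_CODE.secboot.fd</loader>
  </os>
  <!-- 'current' is what the VM boots with; the engine sets the maximum much higher (illustrative value) -->
  <vcpu current='64'>710</vcpu>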

Version-Release number of selected component (if applicable):
ovirt-engine-4.5.0-0.237.el8ev
vdsm-4.50.0.10-1.el8ev.x86_64
qemu-kvm-6.2.0-11.module+el8.6.0+14707+5aa4b42d.x86_64
libvirt-daemon-8.0.0-5.module+el8.6.0+14480+c0a3aa0f.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Using a host with 64 CPUs, try to run a VM with 64 CPUs (2 virtual sockets, 16 cores per socket and 2 threads per core) and the Q35 chipset with UEFI.

Actual results:
The VM is running, but there is no boot screen (TianoCore), the console is black and the VM has no IP addresses.

Expected results:
The VM should boot with a working console, network, etc.

Additional info:
vdsm.log attached (VM with 48 CPUs run at 2022-04-11 10:51:42,151-0400, VM with 64 CPUs run at 2022-04-11 10:58:01,881-0400)
engine.log, libvirt/qemu.log and VMs domain XML attached.
VM names: rhel8_VM_48_CPUs and rhel8_VM_64_CPUs

Comment 7 Milan Zamazal 2022-04-14 14:26:13 UTC
Platform discussion happens in BZ 2074149. There is also a nice explanatory summary in https://bugzilla.redhat.com/show_bug.cgi?id=1469338#c30.

What we need to do in oVirt is:

- Not setting the maximum number of vCPUs unnecessarily high. Let's make it at most a configurable multiple of the initial vCPU count (somewhat similar to how we limit maximum memory).

- Adding 

  <features>
    <smm state='on'>
      <tseg unit='MiB'>48</tseg>
    </smm>
  </features>

  to the domain XML when the maximum number of vCPUs exceeds a certain limit (e.g. 255?) and UEFI is used (see also the `smm` documentation at https://libvirt.org/formatdomain.html#hypervisor-features). We may hit a similar problem with large RAM.

This is sort of guesswork but hopefully it should cover most of the typical cases.
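
For illustration, with both changes in place a 64-vCPU UEFI VM could end up with domain XML along these lines (a sketch only; the 4x multiplier behind the 256 maximum, the threshold and the TSEG size are assumptions here, all of them meant to be configurable):

  <!-- maximum clamped to a configurable multiple of the initial 64 vCPUs -->
  <vcpu current='64'>256</vcpu>
  <features>
    <smm state='on'>
      <tseg unit='MiB'>48</tseg>
    </smm>
  </features>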

Comment 8 Laszlo Ersek 2022-04-18 15:40:58 UTC
*** Bug 2074149 has been marked as a duplicate of this bug. ***

Comment 9 Michal Skrivanek 2022-04-19 14:28:59 UTC
It doesn't sound like too much overhead; why not just set it high enough to cover our maximums?

Comment 10 Milan Zamazal 2022-04-19 14:48:16 UTC
You mean TSEG? Let's not be nasty to small VMs. If I understand the libvirt documentation correctly, this value is taken out of the RAM available to the guest, and there are already other things that eat guest memory here and there. There is no need to waste it unnecessarily on small VMs.
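
To illustrate the trade-off (values illustrative; the point is that TSEG memory comes out of guest RAM, so small VMs are best left with the hypervisor default):

  <!-- small VM: rely on the hypervisor's default TSEG size -->
  <vcpu current='4'>16</vcpu>
  <features>
    <smm state='on'/>
  </features>

  <!-- large UEFI VM: enlarge TSEG explicitly, at the cost of guest RAM -->
  <vcpu current='64'>256</vcpu>
  <features>
    <smm state='on'>
      <tseg unit='MiB'>48</tseg>
    </smm>
  </features>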

Comment 17 Nisim Simsolo 2022-05-02 12:09:49 UTC
Verified:
ovirt-engine-4.5.0.5-0.7.el8ev
vdsm-4.50.0.13-1.el8ev.x86_64
qemu-kvm-6.2.0-11.module+el8.6.0+14707+5aa4b42d.x86_64
libvirt-daemon-8.0.0-5.module+el8.6.0+14480+c0a3aa0f.x86_64

Verification scenario:
1. Run the following VMs with 64 virtual CPUs (2 virtual sockets, 16 cores per virtual socket and 2 threads per core):
- RHEL8 VM Q35/SecureBoot
- RHEL8 VM Q35/UEFI
- RHEL8 VM Q35/BIOS
- RHEL9 VM Q35/SecureBoot
- RHEL9 VM Q35/UEFI
- RHEL9 VM Q35/BIOS
- RHEL8 VM I440FX/BIOS
- Windows VM Q35/SecureBoot
- Windows VM Q35/UEFI
- Windows VM Q35/BIOS
- Windows VM I440FX/BIOS
2. For each running VM, verify that the console shows the boot screen and, after boot, the VM OS; that the mouse and keyboard function; and, inside the VM, that the correct CPU configuration (64 CPUs, 2/16/2 topology, as sketched below) is set.
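
For reference, the 2/16/2 topology being verified corresponds to a domain XML fragment along these lines (an illustrative sketch, not taken from the tested VMs):

  <vcpu current='64'>64</vcpu>
  <cpu>
    <topology sockets='2' cores='16' threads='2'/>
  </cpu>

Inside the guest, 2 sockets x 16 cores x 2 threads should be visible as 64 logical CPUs.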

