Bug 2124143

Summary: ovmf must consider max cpu count not boot cpu count for apic mode [rhel-9]
Product: Red Hat Enterprise Linux 9 Reporter: Yiqian Wei <yiwei>
Component: edk2 Assignee: Gerd Hoffmann <kraxel>
Status: CLOSED ERRATA QA Contact: Xueqiang Wei <xuwei>
Severity: medium Docs Contact:
Priority: medium    
Version: 9.1 CC: berrange, coli, imammedo, jinzhao, juzhang, kraxel, lersek, pbonzini, ppolawsk, vgoyal, virt-maint, xuwei
Target Milestone: rc Keywords: Triaged
Target Release: --- Flags: pm-rhel: mirror+
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: edk2-20230524-2.el9 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-11-07 08:24:29 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Yiqian Wei 2022-09-05 02:49:31 UTC
Description of problem:
Failed to hot-plug 448 CPUs into a rhel9.1.0 guest with q35 + OVMF.

Version-Release number of selected component (if applicable):
host version:
kernel-5.14.0-160.el9.x86_64
qemu-kvm-7.0.0-12.el9.x86_64
edk2-ovmf-20220526git16779ede2d36-3.el9.noarch
guest: rhel9.1.0

How reproducible:
100%

Steps to Reproduce:
1. Boot the guest with the "cpus_hotplug.sh" script
# sh cpus_hotplug.sh ovmf

2. After the cpus_hotplug.sh script completes, check the CPU count and the dmesg output in the guest.
# lscpu
# dmesg
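
The check in step 2 can also be scripted inside the guest. The following is a minimal sketch, not part of the original reproducer: it reads the standard Linux sysfs files under /sys/devices/system/cpu and counts the possible, present and online CPUs. The helper names (parse_cpu_list, cpu_counts) are made up for illustration.

```python
from pathlib import Path

# Standard Linux sysfs location for the CPU masks (possible/present/online).
CPU_SYSFS = Path("/sys/devices/system/cpu")


def parse_cpu_list(text):
    """Expand a kernel CPU list such as "0-3,8,10-11" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a file containing only a newline
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def cpu_counts(base=CPU_SYSFS):
    """Return {"possible": n, "present": n, "online": n} for the files that exist."""
    counts = {}
    for name in ("possible", "present", "online"):
        path = Path(base) / name
        if path.exists():
            counts[name] = len(parse_cpu_list(path.read_text()))
    return counts


if __name__ == "__main__":
    for name, count in cpu_counts().items():
        print(f"{name}: {count}")
```

If "possible" is already capped at 224 right after boot, the guest kernel cannot accept further hot-plugged CPUs, whatever QEMU offers later.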

Actual results: 
After step 2, the guest has only 224 vCPUs, and the following messages appear in dmesg:
[ 2021.441073] APIC: NR_CPUS/possible_cpus limit of 224 reached. Processor 670/0x1f7 ignored.
[ 2021.442429] ACPI: Unable to map lapic to logical cpu number
[ 2021.443277] acpi ACPI0007:df: Enumeration failure
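
To relate the first message to the APIC mode named in the summary, here is a small sketch that pulls the numbers out of that line. It assumes the hexadecimal value after the slash is the APIC ID of the rejected processor, and it relies on the fact that xAPIC mode uses 8-bit APIC IDs with 0xFF reserved for broadcast. The function names are hypothetical.

```python
import re

# xAPIC uses 8-bit APIC IDs and reserves 0xFF for broadcast, so any ID
# above 0xFE is only addressable in x2APIC mode.
XAPIC_MAX_ID = 0xFE

# Pattern for the kernel message quoted in this report.
LIMIT_RE = re.compile(
    r"NR_CPUS/possible_cpus limit of (?P<limit>\d+) reached\. "
    r"Processor (?P<cpu>\d+)/0x(?P<apic_id>[0-9a-fA-F]+) ignored"
)

SAMPLE = (
    "[ 2021.441073] APIC: NR_CPUS/possible_cpus limit of 224 reached. "
    "Processor 670/0x1f7 ignored."
)


def parse_limit_message(line):
    """Return limit, processor number and APIC ID from the line, or None if it does not match."""
    match = LIMIT_RE.search(line)
    if match is None:
        return None
    return {
        "limit": int(match["limit"]),
        "cpu": int(match["cpu"]),
        # Assumption: the hex field after the slash is the APIC ID.
        "apic_id": int(match["apic_id"], 16),
    }


def needs_x2apic(apic_id):
    """True if the APIC ID does not fit the 8-bit xAPIC ID space."""
    return apic_id > XAPIC_MAX_ID


if __name__ == "__main__":
    info = parse_limit_message(SAMPLE)
    print(info, "needs x2APIC:", needs_x2apic(info["apic_id"]))
```

Under that assumption, 0x1f7 is 503 in decimal, which is beyond what xAPIC mode can address. That is consistent with the summary's point that the APIC mode has to be chosen from the maximum CPU count rather than the boot CPU count.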

Expected results:
The guest gets 448 vCPUs.

Additional info:
host name: lenovo-sr950-01.khw2.lab.eng.bos.redhat.com
host cpu:
CPU(s):                  448
  On-line CPU(s) list:   0-447
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        Intel(R) Corporation
  Model name:            Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
    BIOS Model name:     Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
    CPU family:          6
    Model:               85
    Thread(s) per core:  2
    Core(s) per socket:  28
    Socket(s):           8

Comment 2 Igor Mammedov 2022-12-14 08:56:41 UTC
As QE noted, this is the same bug as
  https://bugzilla.redhat.com/show_bug.cgi?id=2150267
Moving it to edk2 and CCing the firmware folks.

Comment 4 Yiqian Wei 2023-02-06 08:27:50 UTC
Reproduced this bug with a rhel9.2.0 guest on a rhel9.2.0 host.

host version:
kernel-5.14.0-244.el9.x86_64
qemu-kvm-7.2.0-5.el9.x86_64
edk2-ovmf-20221207gitfff6d81270b5-4.el9.noarch
guest: rhel9.2.0

Comment 5 Gerd Hoffmann 2023-03-07 16:24:57 UTC
https://edk2.groups.io/g/devel/message/100801

Comment 6 Gerd Hoffmann 2023-06-01 09:16:19 UTC
No upstream fix in edk2-stable202305.
Will use comment 5 patch for downstream 9.3 (after rebase).
Switch to upstream fix for 9.4 (hopefully).

Comment 7 Gerd Hoffmann 2023-06-29 11:47:45 UTC
scratch build: https://kojihub.stream.centos.org/koji/taskinfo?taskID=2425119

Comment 9 Yanan Fu 2023-07-11 09:15:01 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 15 Xueqiang Wei 2023-07-24 01:51:45 UTC
Thank you, Yiqian and Gerd. Based on Comment 13 and Comment 14, setting the status to VERIFIED; the new issue is tracked in bug 2224509. Thank you very much.

Comment 17 errata-xmlrpc 2023-11-07 08:24:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: edk2 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6330