Bug 2124143

Summary: ovmf must consider max cpu count not boot cpu count for apic mode [rhel-9]
Product: Red Hat Enterprise Linux 9
Reporter: Yiqian Wei <yiwei>
Component: edk2
Assignee: Gerd Hoffmann <kraxel>
Status: VERIFIED
QA Contact: Xueqiang Wei <xuwei>
Severity: medium
Priority: medium
Version: 9.1
CC: berrange, coli, imammedo, jinzhao, juzhang, kraxel, lersek, pbonzini, ppolawsk, vgoyal, virt-maint, xuwei
Target Milestone: rc
Keywords: Triaged
Hardware: x86_64
OS: Linux
Fixed In Version: edk2-20230524-2.el9

Description Yiqian Wei 2022-09-05 02:49:31 UTC
Description of problem:
Failed to hot-plug 448 CPUs into a RHEL 9.1.0 guest with q35 + OVMF.

Version-Release number of selected component (if applicable):
host version:
kernel-5.14.0-160.el9.x86_64
qemu-kvm-7.0.0-12.el9.x86_64
edk2-ovmf-20220526git16779ede2d36-3.el9.noarch
guest: rhel9.1.0

How reproducible:
100%

Steps to Reproduce:
1. Boot the guest with the "cpus_hotplug.sh" script:
# sh cpus_hotplug.sh ovmf

2. After the cpus_hotplug.sh script completes, check the CPU count and the dmesg messages in the guest:
# lscpu
# dmesg
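
The CPU count check in step 2 can also be scripted. The following is a minimal sketch, assuming the count is derived from a kernel-style CPU list such as the "0-447" shown in the lscpu "On-line CPU(s) list" field. The helper name count_cpus is hypothetical and is not part of cpus_hotplug.sh.

```python
def count_cpus(cpu_list: str) -> int:
    """Count the CPUs in a kernel-style CPU list such as "0-3,8,10-11"."""
    total = 0
    for part in cpu_list.strip().split(","):
        if not part:
            continue
        if "-" in part:
            # A range "first-last" is inclusive on both ends.
            first, last = part.split("-", 1)
            total += int(last) - int(first) + 1
        else:
            int(part)  # raises ValueError if the entry is not a number
            total += 1
    return total


if __name__ == "__main__":
    # Expected list for this bug once all vCPUs are online.
    print(count_cpus("0-447"))
```

A guest that stops at 224 vCPUs would report a list of "0-223" instead.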

Actual results: 
After step 2, the guest has only 224 vCPUs, and the following messages appear in dmesg:
[ 2021.441073] APIC: NR_CPUS/possible_cpus limit of 224 reached. Processor 670/0x1f7 ignored.
[ 2021.442429] ACPI: Unable to map lapic to logical cpu number
[ 2021.443277] acpi ACPI0007:df: Enumeration failure
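
To spot this failure automatically, the limit message can be matched in the dmesg output. This is a rough sketch based only on the message format quoted above; the function name parse_cpu_limit is hypothetical, and reading the hexadecimal field as the APIC ID of the rejected processor is an assumption.

```python
import re

# Pattern derived from the dmesg line quoted in this report.
LIMIT_RE = re.compile(
    r"NR_CPUS/possible_cpus limit of (?P<limit>\d+) reached\.\s+"
    r"Processor (?P<processor>\d+)/(?P<apic_id>0x[0-9a-fA-F]+) ignored"
)


def parse_cpu_limit(line: str):
    """Return the limit, processor number and APIC ID, or None if no match."""
    match = LIMIT_RE.search(line)
    if match is None:
        return None
    return {
        "limit": int(match["limit"]),
        "processor": int(match["processor"]),
        # Assumption: the hex field is the APIC ID of the ignored processor.
        "apic_id": int(match["apic_id"], 16),
    }


if __name__ == "__main__":
    sample = (
        "[ 2021.441073] APIC: NR_CPUS/possible_cpus limit of 224 reached. "
        "Processor 670/0x1f7 ignored."
    )
    print(parse_cpu_limit(sample))
```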

Expected results:
The guest has 448 vCPUs.

Additional info:
host name: lenovo-sr950-01.khw2.lab.eng.bos.redhat.com
host cpu:
CPU(s):                  448
  On-line CPU(s) list:   0-447
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        Intel(R) Corporation
  Model name:            Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
    BIOS Model name:     Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
    CPU family:          6
    Model:               85
    Thread(s) per core:  2
    Core(s) per socket:  28
    Socket(s):           8
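
The host topology above multiplies out to the 448 CPUs this test tries to reach. The sketch below illustrates the idea in the bug summary: choosing the APIC mode from the maximum possible CPU count rather than the boot CPU count. Everything in it is an assumption for illustration and not the actual edk2 code: the function names, the example boot count of 4, and the threshold of 255 CPUs (taken from 8-bit xAPIC IDs with 0xFF reserved for broadcast).

```python
# Assumption: xAPIC IDs are 8 bits wide and 0xFF is the broadcast ID,
# so more than 255 CPUs need x2APIC mode.
XAPIC_MAX_CPUS = 255


def total_cpus(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Logical CPU count for a given topology."""
    return sockets * cores_per_socket * threads_per_core


def apic_mode(cpu_count: int) -> str:
    """APIC mode needed to address cpu_count CPUs."""
    return "x2apic" if cpu_count > XAPIC_MAX_CPUS else "xapic"


def choose_apic_mode(boot_cpus: int, max_cpus: int) -> str:
    """Pick the mode from the larger count so hot-plugged CPUs still fit."""
    return apic_mode(max(boot_cpus, max_cpus))


if __name__ == "__main__":
    max_cpus = total_cpus(sockets=8, cores_per_socket=28, threads_per_core=2)
    boot_cpus = 4  # hypothetical boot count, not taken from this report
    print(max_cpus)
    print(apic_mode(boot_cpus))                    # boot count only
    print(choose_apic_mode(boot_cpus, max_cpus))   # boot and max count
```

With only the boot count considered, a small guest would stay in xAPIC mode even though CPUs hot-plugged later need IDs that only x2APIC can address.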

Comment 2 Igor Mammedov 2022-12-14 08:56:41 UTC
As QE noted, it's the same bug as
  https://bugzilla.redhat.com/show_bug.cgi?id=2150267
Moving it to edk2 and CCing firmware folks.

Comment 4 Yiqian Wei 2023-02-06 08:27:50 UTC
Reproduced this bug with a RHEL 9.2.0 guest on a RHEL 9.2.0 host.

host version:
kernel-5.14.0-244.el9.x86_64
qemu-kvm-7.2.0-5.el9.x86_64
edk2-ovmf-20221207gitfff6d81270b5-4.el9.noarch
guest: rhel9.2.0

Comment 5 Gerd Hoffmann 2023-03-07 16:24:57 UTC
https://edk2.groups.io/g/devel/message/100801

Comment 6 Gerd Hoffmann 2023-06-01 09:16:19 UTC
No upstream fix in edk2-stable202305.
Will use comment 5 patch for downstream 9.3 (after rebase).
Switch to upstream fix for 9.4 (hopefully).

Comment 7 Gerd Hoffmann 2023-06-29 11:47:45 UTC
scratch build: https://kojihub.stream.centos.org/koji/taskinfo?taskID=2425119

Comment 9 Yanan Fu 2023-07-11 09:15:01 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 15 Xueqiang Wei 2023-07-24 01:51:45 UTC
Thank you, Yiqian and Gerd. According to Comment 13 and Comment 14, setting the status to VERIFIED; the new issue is tracked in bug 2224509. Thank you very much.