Bug 2058503

Summary: BSOD occurs during hotplug max vcpus to win2022 guest
Product: Red Hat Enterprise Linux 9
Reporter: Yiqian Wei <yiwei>
Component: qemu-kvm
Assignee: Marek Kedzierski <mkedzier>
qemu-kvm sub component: General
QA Contact: Yiqian Wei <yiwei>
Status: NEW
Docs Contact:
Severity: unspecified
Priority: unspecified
CC: aodaki, chayang, coli, jinzhao, juzhang, mkedzier, phou, qizhu, virt-maint, ymankad
Version: 9.0
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Windows
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Yiqian Wei 2022-02-25 07:20:56 UTC
Description of problem:
BSOD occurs during hotplug max vcpus to win2022 guest

Version-Release number of selected component (if applicable):
host version:
kernel-5.14.0-67.el9.x86_64
qemu-kvm-6.2.0-9.el9.x86_64
virtio-win-prewhql-0.1-215.iso
en-us_windows_server_2022_x64_dvd_620d7eac.iso
guest: win2022

How reproducible:
100%

Steps to Reproduce:
1. Execute the "hotplug_max_vcpus.sh" script (a minimal sketch of what such a script does is shown after the command below).
# sh hotplug_max_vcpus.sh pc
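
The script itself is not attached to this bug, so the following is only a minimal sketch, under assumptions, of what a "hotplug max vcpus" test typically does: boot the guest with one vCPU online and a large maxcpus ceiling, then device_add the remaining vCPUs through QMP. The vCPU count, image path, and topology values below are assumptions, not taken from the report.

#!/bin/sh
# Minimal sketch of a "hotplug max vcpus" test -- the real hotplug_max_vcpus.sh
# is not attached to this bug, so the vCPU count, paths and topology are assumptions.
# Usage: sh hotplug_max_vcpus.sh pc     (or: sh hotplug_max_vcpus.sh q35)

MACHINE=${1:-pc}

# Boot win2022 with 1 vCPU online and 63 more available for hotplug.
# 4 sockets x 2 threads per core matches the topology described in comment 13.
/usr/libexec/qemu-kvm \
    -machine "$MACHINE" -accel kvm -cpu host -m 8G \
    -smp 1,maxcpus=64,sockets=4,cores=8,threads=2 \
    -blockdev driver=qcow2,node-name=disk0,file.driver=file,file.filename=win2022-64-virtio-scsi.qcow2 \
    -device virtio-scsi-pci,id=scsi0 -device scsi-hd,drive=disk0 \
    -qmp unix:/tmp/qmp.sock,server=on,wait=off \
    -vnc :0 &

sleep 180   # give the guest time to boot to the desktop

# query-hotpluggable-cpus (over the QMP socket) lists the free socket/core/thread
# slots and the CPU type to pass to device_add; each remaining vCPU is then
# plugged one at a time, e.g.:
#   { "execute": "device_add",
#     "arguments": { "driver": "host-x86_64-cpu", "id": "cpu1",
#                    "socket-id": 0, "core-id": 0, "thread-id": 1 } }
# The BSOD appears in the guest while these device_add calls are being issued.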

Actual results:
Win2022 guest hits a BSOD during vCPU hotplug.

Expected results:
Win2022 guest hotplugs the vCPUs successfully, with no BSOD.

Additional info:
1) The same issue is also hit with the q35 machine type and OVMF.

Comment 3 Yiqian Wei 2022-03-11 08:40:07 UTC
Hit this issue on a RHEL 8.6.0 host as well.

host version:
qemu-kvm-6.2.0-8.module+el8.6.0+14324+050a5215.x86_64
kernel-4.18.0-369.el8.x86_64
guest: win2022-64-virtio-scsi.qcow2


No memory dump file is generated when virtio_scsi is used as the system disk.
A memory dump file is generated when virtio_blk is used as the system disk; please check the attachment: Memory.dmp
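
For reference, a hedged illustration of how the two system-disk variants differ on the QEMU command line (the exact command lines are not in this bug, so the image names and device options below are assumptions):

# virtio_scsi as the system disk (no Memory.dmp is left behind after the BSOD):
#   -blockdev driver=qcow2,node-name=disk0,file.driver=file,file.filename=win2022-64-virtio-scsi.qcow2 \
#   -device virtio-scsi-pci,id=scsi0 -device scsi-hd,drive=disk0,bootindex=0

# virtio_blk as the system disk (Memory.dmp is produced; see the attachment):
#   -blockdev driver=qcow2,node-name=disk0,file.driver=file,file.filename=win2022-64-virtio-blk.qcow2 \
#   -device virtio-blk-pci,drive=disk0,bootindex=0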

Comment 7 Yiqian Wei 2022-09-05 10:31:25 UTC
Hit this bug on a RHEL 9.1.0 host with a win2022 guest

host version:
kernel-5.14.0-160.el9.x86_64
qemu-kvm-7.0.0-12.el9.x86_64
edk2-ovmf-20220526git16779ede2d36-3.el9.noarch
guest: win2022

Comment 12 Yiqian Wei 2023-07-27 06:14:33 UTC
Can reproduce this bug on a RHEL 9.3.0 host with a win2022 guest

reproduce version:
kernel-5.14.0-344.el9.x86_64
qemu-kvm-8.0.0-9.el9.x86_64
edk2-ovmf-20230524-2.el9.noarch
guest: win2022

Comment 13 Akihiko Odaki 2023-08-02 05:39:25 UTC
We already have a few tickets for cases where a Windows guest runs with many cores. Here is the list:

https://bugzilla.redhat.com/show_bug.cgi?id=1848878
Having more than 128 cores in one socket fails. The reproduction case here has 4 sockets with the cores evenly distributed across them, so it is probably unrelated.
However, I also found that Windows does something peculiar when the core count is not a power of two. There are physical machines whose core count is not a power of two (e.g. 96-core AMD EPYC), so it is unlikely to cause a real problem.

https://bugzilla.redhat.com/show_bug.cgi?id=2169904
SMBIOS is corrupted in some cases, but that should already have been fixed.

https://bugzilla.redhat.com/show_bug.cgi?id=2172167
Processor hotplug failure. I have not managed to reproduce it so far, and the output of !analyze -v also implies it is a distinct problem.

To me, it looks suspicious that the reproduction case uses two threads per core. I remember that my Windows 11 guest saw fewer threads than expected with an 8 cores / 2 threads configuration.
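
As a hedged illustration of the topology suspicion above (these -smp settings are an assumption for comparison purposes, not taken from the reproduction script), the SMT and non-SMT layouts for the same vCPU count would be:

-smp 16,sockets=1,cores=8,threads=2     # 8 cores x 2 threads (the suspicious SMT layout)
-smp 16,sockets=1,cores=16,threads=1    # 16 single-threaded cores (control layout for comparison)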