Bug 2172167
| Summary: | [Windows][QEMU7.2] Windows 2022 vm gets BSOD 'PROCESSOR_START_TIMEOUT (1df)' after hotplugging max cores with with_hugepages sometimes | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | liunana <nanliu> |
| Component: | qemu-kvm | Assignee: | Yvugenfi <yvugenfi> |
| qemu-kvm sub component: | Devices | QA Contact: | liunana <nanliu> |
| Status: | CLOSED MIGRATED | Docs Contact: | |
| Severity: | high | ||
| Priority: | unspecified | CC: | aodaki, chayang, coli, jinzhao, mdean, virt-maint, yama, yvugenfi |
| Version: | 9.2 | Keywords: | CustomerScenariosInitiative, MigratedToJIRA |
| Target Milestone: | rc | Flags: | pm-rhel:
mirror+
|
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Windows | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-08-15 11:31:53 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Memory.dmp analysis logs: Microsoft (R) Windows Debugger Version 10.0.22621.755 X86 Copyright (c) Microsoft Corporation. All rights reserved. Loading Dump File [C:\Windows\Memory.dmp] Kernel Bitmap Dump File: Kernel address space is available, User address space may not be available. ************* Path validation summary ************** Response Time (ms) Location Deferred SRV*c:\symbols*http://msdl.microsoft.com/download/symbols Symbol search path is: SRV*c:\symbols*http://msdl.microsoft.com/download/symbols Executable search path is: Unable to load image \SystemRoot\system32\ntoskrnl.exe, Win32 error 0n2 Windows 10 Kernel Version 20348 MP (3 procs) Free x64 Product: Server, suite: TerminalServer DataCenter SingleUserTS Edition build lab: 20348.1.amd64fre.fe_release.210507-1500 Machine Name: Kernel base = 0xfffff804`29419000 PsLoadedModuleList = 0xfffff804`2a04c940 Debug session time: Mon Feb 20 22:44:15.269 2023 (UTC + 8:00) System Uptime: 0 days 0:01:28.053 Loading Kernel Symbols ............................................................... ..Page 1f73 not present in the dump file. Type ".hh dbgerr004" for details .............................................................. ............. Loading User Symbols Loading unloaded module list .............. For analysis of this file, run !analyze -v 1: kd> !analyze -v ******************************************************************************* * * * Bugcheck Analysis * * * ******************************************************************************* PROCESSOR_START_TIMEOUT (1df) A processor failed to start in the allowed time. Arguments: Arg1: ffffad00334ee390, Virtual address of the processor state. Arg2: 0000000000000003, Processor start state. Arg3: 0000000000000003, NT processor number. Arg4: 0000000000000003, Local unit ID for the processor. Debugging Details: ------------------ KEY_VALUES_STRING: 1 Key : Analysis.CPU.mSec Value: 13046 Key : Analysis.DebugAnalysisManager Value: Create Key : Analysis.Elapsed.mSec Value: 30734 Key : Analysis.Init.CPU.mSec Value: 5468 Key : Analysis.Init.Elapsed.mSec Value: 225216 Key : Analysis.Memory.CommitPeak.Mb Value: 95 Key : WER.OS.Branch Value: fe_release Key : WER.OS.Timestamp Value: 2021-05-07T15:00:00Z Key : WER.OS.Version Value: 10.0.20348.1 FILE_IN_CAB: Memory.dmp BUGCHECK_CODE: 1df BUGCHECK_P1: ffffad00334ee390 BUGCHECK_P2: 3 BUGCHECK_P3: 3 BUGCHECK_P4: 3 BLACKBOXBSD: 1 (!blackboxbsd) BLACKBOXNTFS: 1 (!blackboxntfs) BLACKBOXPNP: 1 (!blackboxpnp) BLACKBOXWINLOGON: 1 PROCESS_NAME: System LOCK_ADDRESS: fffff8042a069340 -- (!locks fffff8042a069340) Resource @ nt!PiEngineLock (0xfffff8042a069340) Exclusively owned Contention Count = 5 NumberOfExclusiveWaiters = 1 Threads: ffffd68109fb5040-01<*> Threads Waiting On Exclusive Access: ffffd6810c1760c0 1 total locks PNP_TRIAGE_DATA: Lock address : 0xfffff8042a069340 Thread Count : 1 Thread address: 0xffffd68109fb5040 Thread wait : 0x1603 STACK_TEXT: ffffad00`334ee138 fffff804`29d5091c : 00000000`000001df ffffad00`334ee390 00000000`00000003 00000000`00000003 : nt!KeBugCheckEx ffffad00`334ee140 fffff804`29d515e9 : ffffad00`334eea70 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiStartDynamicProcessor+0x6ec ffffad00`334ee9a0 fffff804`2ad8827a : fffff804`2adbfdc8 ffffd681`0c3b0670 ffffd681`0c3b0670 ffffd681`09109400 : nt!KeStartDynamicProcessor+0x69 ffffad00`334ee9d0 fffff804`2ad5d473 : ffffd681`0e390000 00000000`00000003 ffffad00`00000003 ffffd681`0e39ae10 : ACPI!ACPIProcessorStartDeviceWorker+0x2adea ffffad00`334eea60 fffff804`2ad51263 : ffffd681`0e39af28 ffffd681`0e39ae10 00000000`00000000 00000000`00000000 : ACPI!ACPIProcessorStartDevice+0x33 ffffad00`334eea90 fffff804`297ba207 : 00000000`00000007 ffffd681`0e39ae10 00000000`00000000 ffffd681`0c3b0670 : ACPI!ACPIDispatchIrp+0x253 ffffad00`334eeb10 fffff804`29e6c458 : ffffd681`0e39ae10 ffffd681`0c3b0670 ffffd681`0ed24968 00000000`00000000 : nt!IopfCallDriver+0x53 ffffad00`334eeb50 fffff804`298552cb : ffffd681`0ed24830 ffffd681`0e39ae10 ffff9a81`00000001 ffffd681`0ebe1e60 : nt!IovCallDriver+0x264 ffffad00`334eeb90 fffff804`2abbe648 : ffffd681`0ed24830 00000000`00000000 00000000`0000000e ffffad00`334eecf0 : nt!IofCallDriver+0xe77bb ffffad00`334eebd0 fffff804`2abc2d11 : ffffd681`0e39ae10 00000000`00000106 ffffad00`334eecf0 00000000`00000106 : Wdf01000!FxPkgFdo::PnpSendStartDeviceDownTheStackOverload+0x198 [minkernel\wdf\framework\shared\irphandlers\pnp\fxpkgfdo.cpp @ 1066] ffffad00`334eec50 fffff804`2abb8c13 : 00000000`00000106 00000000`00000105 ffffd681`0ed24830 ffffd681`0dcc35c0 : Wdf01000!FxPkgPnp::PnpEventInitStarting+0x11 [minkernel\wdf\framework\shared\irphandlers\pnp\pnpstatemachine.cpp @ 1328] ffffad00`334eec80 fffff804`2abb8d21 : 00000000`000000a0 ffffd681`0ed24830 ffffd681`0ed24830 00000000`00000000 : Wdf01000!FxPkgPnp::PnpEnterNewState+0x103 [minkernel\wdf\framework\shared\irphandlers\pnp\pnpstatemachine.cpp @ 1234] ffffad00`334eed10 fffff804`2abb8a8e : ffffd681`0ed24830 ffffad00`334eee00 ffffd681`0ed24990 ffffd681`0ed24968 : Wdf01000!FxPkgPnp::PnpProcessEventInner+0xc5 [minkernel\wdf\framework\shared\irphandlers\pnp\pnpstatemachine.cpp @ 1152] ffffad00`334eed80 fffff804`2abb887e : 00000000`00000000 ffffd681`0ed24830 ffffad00`334eeea0 00000000`00000000 : Wdf01000!FxPkgPnp::PnpProcessEvent+0x11e [minkernel\wdf\framework\shared\irphandlers\pnp\pnpstatemachine.cpp @ 933] ffffad00`334eee10 fffff804`2ab84ce9 : ffffd681`0ed24830 ffffad00`334eeea0 00000000`0000001b ffffd681`0e39ae10 : Wdf01000!FxPkgPnp::_PnpStartDevice+0x1e [minkernel\wdf\framework\shared\irphandlers\pnp\fxpkgpnp.cpp @ 2031] ffffad00`334eee40 fffff804`2ab85d89 : ffffd681`0e39ae10 ffffd681`0ed24020 ffffd681`0ed25420 fffff804`29e7891e : Wdf01000!FxPkgPnp::Dispatch+0xd9 [minkernel\wdf\framework\shared\irphandlers\pnp\fxpkgpnp.cpp @ 771] ffffad00`334eeeb0 fffff804`297ba207 : ffffd681`0e170120 ffffd681`0e873618 fffff804`2978d6f0 ffffd681`0e39ae10 : Wdf01000!FxDevice::DispatchWithLock+0x1f9 [minkernel\wdf\framework\shared\core\fxdevice.cpp @ 1447] ffffad00`334eef10 fffff804`29e6c458 : ffffd681`0e39ae10 ffffd681`0ed25420 ffffd681`0c3b0670 fffff804`29e787c1 : nt!IopfCallDriver+0x53 ffffad00`334eef50 fffff804`298552cb : ffffd681`0e39ae10 ffffad00`334ef0b0 ffffd681`0c3b0670 ffffd681`0e873560 : nt!IovCallDriver+0x264 ffffad00`334eef90 fffff804`29ac1242 : ffffd681`0c3b0670 ffffd681`0ec43bd0 00000000`00000007 00000000`00000000 : nt!IofCallDriver+0xe77bb ffffad00`334eefd0 fffff804`296359b6 : ffffd681`0c3b0670 00000000`00000001 ffffd681`0ec43bd0 fffff804`296346a0 : nt!PnpAsynchronousCall+0xe6 ffffad00`334ef010 fffff804`29634854 : 00000000`00000000 ffffd681`0c3b0670 fffff804`2978d6f0 fffff804`2978d6f0 : nt!PnpSendIrp+0x9e ffffad00`334ef080 fffff804`29bbd52b : ffffd681`0ed3a340 00000000`00000000 ffffd681`0ec43bd0 ffffd681`0e170120 : nt!PnpStartDevice+0x88 ffffad00`334ef110 fffff804`29bbd199 : ffffd681`0ed3a340 ffffad00`334ef248 00000000`00000000 ffffd681`0ed3a340 : nt!PnpStartDeviceNode+0xef ffffad00`334ef1a0 fffff804`29bc12b0 : ffffd681`0ed3a340 ffffd681`090ba9a0 ffffd681`00000000 00000000`00000001 : nt!PipProcessStartPhase1+0x5d ffffad00`334ef1e0 fffff804`29bb7102 : ffffd681`0e170101 ffffad00`334ef301 ffffad00`334ef2e0 fffff804`00000002 : nt!PipProcessDevNodeTree+0x394 ffffad00`334ef290 fffff804`2978c25d : 00000001`00000003 ffffd681`090ba9a0 ffffd681`0e170120 ffffd681`0e170120 : nt!PiProcessReenumeration+0x92 ffffad00`334ef2e0 fffff804`296808e1 : ffffd681`09fb5040 fffff804`2a156640 ffffd681`090aec60 ffffd681`00000000 : nt!PnpDeviceActionWorker+0x47d ffffad00`334ef3a0 fffff804`29701345 : ffffd681`09fb5040 fffff804`28cf8000 ffffd681`09fb5040 72006500`64006c00 : nt!ExpWorkerThread+0x161 ffffad00`334ef5b0 fffff804`29832ef8 : fffff804`28c0b180 ffffd681`09fb5040 fffff804`297012f0 20006500`68007400 : nt!PspSystemThreadStartup+0x55 ffffad00`334ef600 00000000`00000000 : ffffad00`334f0000 ffffad00`334e9000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x28 SYMBOL_NAME: ACPI!ACPIProcessorStartDeviceWorker+2adea MODULE_NAME: ACPI IMAGE_NAME: ACPI.sys STACK_COMMAND: .cxr; .ecxr ; kb BUCKET_ID_FUNC_OFFSET: 2adea FAILURE_BUCKET_ID: 0x1DF_VRF_ACPI!ACPIProcessorStartDeviceWorker OS_VERSION: 10.0.20348.1 BUILDLAB_STR: fe_release OSPLATFORM_TYPE: x64 OSNAME: Windows 10 FAILURE_ID_HASH: {8aaee8cf-9dcb-f04e-4e62-6eed422c50e8} Followup: MachineOwner --------- 1: kd> Implicit thread is now ffffd681`09fb5040 1: kd> Implicit thread is now ffffd681`09fb5040 ~ Will check if RHEL vm has a chance to reproduce this issue later. Didn't hit any error with ovmf rhel9.2 vm after repeat test 12 times. And I can't be sure if this is a regression issue since the recurrence rate is very low. Meirav - can someone from the Windows team look at this since it seems non reproducible using a Linux guest. May need to alter the component/subcomponent too. 1. Can you please upload the crash dump file? 2. What is the amount of physical CPUs on the host? Thank you. Hi, Thank you for memory dump. I had a look at it, and found apparently it's starting the 19th core, but only 18 cores are listed for WinDbg command: !running -i -t Can you provide the memory dump *not* converted with elf2dmp? I'd like to know if QEMU also says the core being enabled is offline or not. Also, I'm having a difficulty to reproduce this issue and have some questions. - It seems you had to add 18 extra cores to trigger the bug for the memory dump. How many processors do you have to add to reproduce the issue reliably? - Can you minimize the test case? In particular, remove all options for CPU (try just with -cpu Icelake-Server), extra disks, extra NICs, and TPM, and see if the issue can be reproduced. - Can you update your Windows Server installation with Windows Update? Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug. |
Description of problem: Windows 2022 gets BSOD 'PROCESSOR_START_TIMEOUT (1df)' after hotplugging max cores with with_hugepages sometimes Version-Release number of selected component (if applicable): Host: 5.14.0-268.el9.x86_64 qemu-kvm-7.2.0-9.el9.x86_64 edk2-ovmf-20221207gitfff6d81270b5-6.el9.noarch libvirt-9.0.0-6.el9.x86_64 dell-per750-25.lab.eng.pek2.redhat.com (Icelake) swtpm-0.8.0-1.el9.x86_64 Guest: Win2022 How reproducible: 1/6 Steps to Reproduce: 1. Set hugepages on host: # mount -t hugetlbfs -o pagesize=2048K none /mnt/kvm_hugepage 2. Boot windows 2022 vm with ' -smp 1,maxcpus=48,cores=48,threads=1,dies=1,sockets=1', Full qemu command line [1] 3. hotplug max cores: {'execute': 'device_add', 'arguments': OrderedDict([('id', 'vcpu47'), ('driver', 'Icelake-Server-x86_64-cpu'), ('socket-id', 0), ('die-id', 0), ('core-id', 47), ('thread-id', 0)]), 'id': 'kKNeBTZi'} 4. Check guest status. Actual results: Guest gets BSOD 'PROCESSOR_START_TIMEOUT (1df)' Expected results: Guest works well without any error. Additional info: [1] - full qemu command line. /usr/libexec/qemu-kvm \ -S \ -name 'avocado-vt-vm1' \ -sandbox on \ -blockdev '{"node-name": "file_ovmf_code", "driver": "file", "filename": "/usr/share/OVMF/OVMF_CODE.secboot.fd", "auto-read-only": true, "discard": "unmap"}' \ -blockdev '{"node-name": "drive_ovmf_code", "driver": "raw", "read-only": true, "file": "file_ovmf_code"}' \ -blockdev '{"node-name": "file_ovmf_vars", "driver": "file", "filename": "/root/avocado/data/avocado-vt/avocado-vt-vm1_win2022-64-virtio-scsi_qcow2_filesystem_VARS.fd", "auto-read-only": true, "discard": "unmap"}' \ -blockdev '{"node-name": "drive_ovmf_vars", "driver": "raw", "read-only": false, "file": "file_ovmf_vars"}' \ -machine q35,kernel-irqchip=split,memory-backend=mem-machine_mem,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars \ -device '{"id": "pcie-root-port-0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x1", "chassis": 1}' \ -device '{"id": "pcie-pci-bridge-0", "driver": "pcie-pci-bridge", "addr": "0x0", "bus": "pcie-root-port-0"}' \ -nodefaults \ -device '{"intremap": "on", "device-iotlb": true, "driver": "intel-iommu"}' \ -device '{"driver": "VGA", "bus": "pcie.0", "addr": "0x2"}' \ -m 62464 \ -object '{"size": 65498251264, "mem-path": "/mnt/kvm_hugepage", "id": "mem-machine_mem", "qom-type": "memory-backend-file"}' \ -smp 1,maxcpus=48,cores=48,threads=1,dies=1,sockets=1 \ -cpu 'Icelake-Server',ds=on,ss=on,dtes64=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,avx512ifma=on,sha-ni=on,rdpid=on,fsrm=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,hle=off,rtm=off,mpx=off,intel-pt=off,hv_stimer,hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,kvm_pv_unhalt=on \ -chardev socket,id=qmp_id_qmpmonitor1,server=on,path=/var/tmp/avocado_stf8vw0k/monitor-qmpmonitor1-20230220-130049-6JtlC9mT,wait=off \ -mon chardev=qmp_id_qmpmonitor1,mode=control \ -chardev socket,id=qmp_id_catch_monitor,server=on,path=/var/tmp/avocado_stf8vw0k/monitor-catch_monitor-20230220-130049-6JtlC9mT,wait=off \ -mon chardev=qmp_id_catch_monitor,mode=control \ -device '{"ioport": 1285, "driver": "pvpanic", "id": "idDCbx2r"}' \ -chardev socket,id=chardev_serial0,server=on,path=/var/tmp/avocado_stf8vw0k/serial-serial0-20230220-130049-6JtlC9mT,wait=off \ -device '{"id": "serial0", "driver": "isa-serial", "chardev": "chardev_serial0"}' \ -chardev socket,id=seabioslog_id_20230220-130049-6JtlC9mT,path=/var/tmp/avocado_stf8vw0k/seabios-20230220-130049-6JtlC9mT,server=on,wait=off \ -device isa-debugcon,chardev=seabioslog_id_20230220-130049-6JtlC9mT,iobase=0x402 \ -device '{"id": "pcie-root-port-1", "port": 1, "driver": "pcie-root-port", "addr": "0x1.0x1", "bus": "pcie.0", "chassis": 2}' \ -device '{"driver": "qemu-xhci", "id": "usb1", "bus": "pcie-root-port-1", "addr": "0x0"}' \ -device '{"driver": "usb-tablet", "id": "usb-tablet1", "bus": "usb1.0", "port": "1"}' \ -device '{"id": "pcie-root-port-2", "port": 2, "driver": "pcie-root-port", "addr": "0x1.0x2", "bus": "pcie.0", "chassis": 3}' \ -device '{"id": "virtio_scsi_pci0", "driver": "virtio-scsi-pci", "bus": "pcie-root-port-2", "addr": "0x0", "iommu_platform": true}' \ -blockdev '{"node-name": "file_image1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/images/win2022-64-virtio-scsi.qcow2", "cache": {"direct": true, "no-flush": false}}' \ -blockdev '{"node-name": "drive_image1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_image1"}' \ -device '{"driver": "scsi-hd", "id": "image1", "drive": "drive_image1", "write-cache": "on"}' \ -device '{"id": "pcie-root-port-3", "port": 3, "driver": "pcie-root-port", "addr": "0x1.0x3", "bus": "pcie.0", "chassis": 4}' \ -device '{"driver": "virtio-net-pci", "mac": "9a:c1:2d:9f:d9:b4", "id": "idU2n9WB", "netdev": "id01VFAQ", "bus": "pcie-root-port-3", "addr": "0x0", "iommu_platform": true}' \ -netdev tap,id=id01VFAQ,vhost=on \ -blockdev '{"node-name": "file_cd1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/iso/windows/winutils.iso", "cache": {"direct": true, "no-flush": false}}' \ -blockdev '{"node-name": "drive_cd1", "driver": "raw", "read-only": true, "cache": {"direct": true, "no-flush": false}, "file": "file_cd1"}' \ -device '{"driver": "scsi-cd", "id": "cd1", "drive": "drive_cd1", "write-cache": "on"}' \ -vnc :0 \ -rtc base=localtime,clock=host,driftfix=slew \ -boot menu=off,order=cdn,once=c,strict=off \ -chardev socket,id=char_vtpm_avocado-vt-vm1_tpm0,path=/root/avocado/data/avocado-vt/swtpm/avocado-vt-vm1_tpm0_swtpm.sock \ -tpmdev emulator,chardev=char_vtpm_avocado-vt-vm1_tpm0,id=emulator_vtpm_avocado-vt-vm1_tpm0 \ -device '{"id": "tpm-crb_vtpm_avocado-vt-vm1_tpm0", "tpmdev": "emulator_vtpm_avocado-vt-vm1_tpm0", "driver": "tpm-crb"}' \ -enable-kvm \ -monitor stdio \