Bug 1656705 - Win2019 guest BSOD during vcpu hotplug
Summary: Win2019 guest BSOD during vcpu hotplug
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: qemu-kvm
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 8.1
Assignee: Yan Vugenfirer
QA Contact: Yumei Huang
URL:
Whiteboard:
Depends On:
Blocks: 1659244
Reported: 2018-12-06 06:39 UTC by xiagao
Modified: 2020-01-20 06:38 UTC
CC List: 19 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-25 08:44:10 UTC
Type: Bug
Target Upstream Version:


Attachments
win2019 update history screenshot (140.54 KB, image/png)
2019-02-25 02:28 UTC, xiagao

Really updated on Yuri's machine (32.69 KB, image/png)
2019-02-25 05:31 UTC, ybendito

install the update by dism (126.03 KB, image/png)
2019-02-25 06:41 UTC, xiagao

Description xiagao 2018-12-06 06:39:41 UTC
Description of problem:
As in the summary: the Win2019 guest hits a BSOD during vCPU hotplug.

Version-Release number of selected component (if applicable):
qemu-kvm-3.0.0-2.module+el8+2208+e41b12e0.x86_64
kernel-4.18.0-40.el8.x86_64
seabios-bin-1.11.1-2.module+el8+2179+85112f94.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install a new Win2019 guest, without the virtio drivers installed, using this qemu command line:
/usr/libexec/qemu-kvm -name win2019 -enable-kvm -m 3G -smp 4,maxcpus=8,sockets=2,cores=2,threads=2 -nodefconfig -nodefaults \
-cpu 'Skylake-Server',+kvm_pv_unhalt,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time \
-rtc base=localtime,driftfix=none -boot order=cd,menu=on -vga std -vnc :19 -monitor stdio -qmp tcp:0:1239,server,nowait \
-drive file=win2019.qcow2,if=none,id=drive-ide0-0-0,format=qcow2,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
-netdev tap,script=/etc/qemu-ifup,downscript=no,id=hostnet0,vhost=on,queues=4 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:52:3b:35:86:09,mq=on,vectors=10 \
-device piix3-usb-uhci,id=usb -device usb-tablet,id=input0 \
-drive file=en_windows_server_2019_x64_dvd_4cb967d8.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,drive=drive-ide0-1-0,id=ide0-1-0 \
-cdrom /home/kvm_autotest_root/iso/windows/virtio-win.iso.el8 \

2. Hot-add a CPU:
(qemu) cpu-add 4
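The same hot-add can also be driven over the QMP socket that the command line above opens (-qmp tcp:0:1239,server,nowait). A minimal Python sketch, assuming the QMP cpu-add command with an integer "id" argument as shipped in QEMU 3.0 (later releases deprecate it in favour of device_add), and assuming the socket is reachable on localhost; qmp_message and hot_add_cpu are hypothetical helper names. With -smp 4, CPU indexes 0-3 already exist, so 4 is the first free index.

```python
import json
import socket


def qmp_message(command, arguments=None):
    """Serialize one QMP command as a CRLF-terminated JSON object."""
    msg = {"execute": command}
    if arguments is not None:
        msg["arguments"] = arguments
    return (json.dumps(msg) + "\r\n").encode("utf-8")


def _read_reply(stream):
    """Read lines until the greeting or a command reply arrives."""
    while True:
        line = stream.readline()
        if not line:
            raise ConnectionError("QMP socket closed")
        reply = json.loads(line)
        # Asynchronous events (they carry an "event" key) may be interleaved; skip them.
        if "event" not in reply:
            return reply


def hot_add_cpu(cpu_id, host="127.0.0.1", port=1239):
    """Hot-add vCPU `cpu_id` via the (assumed) QMP cpu-add command and return the reply."""
    with socket.create_connection((host, port)) as sock:
        with sock.makefile("rwb") as stream:
            _read_reply(stream)                      # {"QMP": {...}} greeting
            stream.write(qmp_message("qmp_capabilities"))
            stream.flush()
            _read_reply(stream)                      # leave capabilities negotiation
            stream.write(qmp_message("cpu-add", {"id": cpu_id}))
            stream.flush()
            return _read_reply(stream)
```

Calling hot_add_cpu(4) corresponds to the HMP command (qemu) cpu-add 4; a successful reply is {"return": {}}.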


Actual results:
The Win2019 guest BSODs with bugcheck IRQL_NOT_LESS_OR_EQUAL.

Expected results:
No BSOD during CPU hotplug.

Additional info:
1. Changing to the pc machine type hits the BSOD as well.
2. A RHEL 7.6 host also hits the BSOD.
3. Win2016 does not hit the BSOD.
4. WinDbg analysis of the dump file:

*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck A, {af0, ff, 0, fffff80423198034}

Probably caused by : ntkrnlmp.exe ( nt!KiInitializeXSave+74 )

Followup:     MachineOwner
---------

0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: 0000000000000af0, memory referenced
Arg2: 00000000000000ff, IRQL
Arg3: 0000000000000000, bitfield :
	bit 0 : value 0 = read operation, 1 = write operation
	bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
Arg4: fffff80423198034, address which referenced memory
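These four arguments can be decoded mechanically. A small Python sketch that follows only the field meanings printed by WinDbg above (Arg3 bit 0 = write, bit 3 = execute); the function name and the in_first_page field are illustrative additions, the latter flagging that 0xAF0 lies inside the first 4 KiB page, which is the pattern a NULL base pointer plus a structure offset typically produces.

```python
def decode_irql_not_less_or_equal(arg1, arg2, arg3, arg4):
    """Decode bugcheck 0xA arguments using the meanings shown by !analyze -v."""
    return {
        "memory_referenced": hex(arg1),
        "irql": arg2,
        # Arg3 bit 0: 0 = read operation, 1 = write operation
        "operation": "write" if arg3 & 0x1 else "read",
        # Arg3 bit 3: 1 = execute operation (only on chips that report it)
        "execute": bool(arg3 & 0x8),
        "faulting_ip": hex(arg4),
        # Small addresses usually mean a NULL base pointer plus a field offset
        "in_first_page": arg1 < 0x1000,
    }


# Values from "BugCheck A, {af0, ff, 0, fffff80423198034}"
info = decode_irql_not_less_or_equal(0xAF0, 0xFF, 0x0, 0xFFFFF80423198034)
for key, value in info.items():
    print(f"{key}: {value}")
```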

Debugging Details:
------------------


DUMP_CLASS: 1

DUMP_QUALIFIER: 401

BUILD_VERSION_STRING:  17763.1.amd64fre.rs5_release.180914-1434

SYSTEM_MANUFACTURER:  Red Hat

SYSTEM_PRODUCT_NAME:  KVM

SYSTEM_VERSION:  RHEL-7.6.0 PC (Q35 + ICH9, 2009)

BIOS_VENDOR:  SeaBIOS

BIOS_VERSION:  1.11.1-2.module+el8+2179+85112f94

BIOS_DATE:  04/01/2014

DUMP_TYPE:  1

BUGCHECK_P1: af0

BUGCHECK_P2: ff

BUGCHECK_P3: 0

BUGCHECK_P4: fffff80423198034

READ_ADDRESS:  0000000000000af0 

CURRENT_IRQL:  0

FAULTING_IP: 
nt!KiInitializeXSave+74
fffff804`23198034 8b88f00a0000    mov     ecx,dword ptr [rax+0AF0h]

CPU_COUNT: 4

CPU_MHZ: 82f

CPU_VENDOR:  GenuineIntel

CPU_FAMILY: 6

CPU_MODEL: 55

CPU_STEPPING: 4

CPU_MICROCODE: 6,55,4,0 (F,M,S,R)  SIG: 1'00000000 (cache) 1'00000000 (init)

DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT

BUGCHECK_STR:  AV

PROCESS_NAME:  System

ANALYSIS_SESSION_HOST:  WIN-3IORRL4PE1F

ANALYSIS_SESSION_TIME:  12-06-2018 13:38:18.0015

ANALYSIS_VERSION: 10.0.16299.15 amd64fre

LAST_CONTROL_TRANSFER:  from fffff80422e24f76 to fffff80422c243ff

STACK_TEXT:  
fffff804`2505f068 fffff804`22e24f76 : 00000000`00000000 ffffcf0e`980c7140 fffff804`21ef0180 ffffcf0e`980c7050 : hal!HalProcessorIdle+0xf
fffff804`2505f070 fffff804`22d1ed4b : 00000000`00000000 00000000`00989680 00000000`00000000 00000000`00005026 : nt!PpmIdleDefaultExecute+0x16
fffff804`2505f0a0 fffff804`22d1e4ff : 00000000`000034c6 00000000`00000002 00000000`00000001 01d48daa`4058ca9b : nt!PpmIdleExecuteTransition+0x6bb
fffff804`2505f3c0 fffff804`22e58a2c : 00000000`00000000 fffff804`21ef0180 fffff804`23185400 ffffcf0e`9d159080 : nt!PoIdle+0x33f
fffff804`2505f520 00000000`00000000 : fffff804`25060000 fffff804`25059000 00000000`00000000 00000000`00000000 : nt!KiIdleLoop+0x2c


THREAD_SHA1_HASH_MOD_FUNC:  a33bbf8d6c1e91853cb3b2e1d5021dc2d1902095

THREAD_SHA1_HASH_MOD_FUNC_OFFSET:  e9a583ada42ef56447e7f1044f0434c20ddafe0d

THREAD_SHA1_HASH_MOD:  e067c3ea0fc8d1e6685e7b95f2f08f6effa3525b

FOLLOWUP_IP: 
nt!KiInitializeXSave+74
fffff804`23198034 8b88f00a0000    mov     ecx,dword ptr [rax+0AF0h]

FAULT_INSTR_CODE:  af0888b

SYMBOL_NAME:  nt!KiInitializeXSave+74

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: nt

IMAGE_NAME:  ntkrnlmp.exe

DEBUG_FLR_IMAGE_TIMESTAMP:  0

STACK_COMMAND:  .thread ; .cxr ; kb

BUCKET_ID_FUNC_OFFSET:  74

FAILURE_BUCKET_ID:  AV_nt!KiInitializeXSave

BUCKET_ID:  AV_nt!KiInitializeXSave

PRIMARY_PROBLEM_CLASS:  AV_nt!KiInitializeXSave

TARGET_TIME:  2018-12-06T21:25:46.000Z

OSBUILD:  17763

OSSERVICEPACK:  0

SERVICEPACK_NUMBER: 0

OS_REVISION: 0

SUITE_MASK:  400

PRODUCT_TYPE:  3

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

OSEDITION:  Windows 10 Server TerminalServer DataCenter SingleUserTS

OS_LOCALE:  

USER_LCID:  0

OSBUILD_TIMESTAMP:  unknown_date

BUILDDATESTAMP_STR:  180914-1434

BUILDLAB_STR:  rs5_release

BUILDOSVER_STR:  10.0.17763.1.amd64fre.rs5_release.180914-1434

ANALYSIS_SESSION_ELAPSED_TIME:  5ad

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:av_nt!kiinitializexsave

FAILURE_ID_HASH:  {609d7d2b-4c21-e448-89bc-0907c873ab19}

Followup:     MachineOwner

Comment 1 xiagao 2018-12-06 06:43:40 UTC
The memory dump file can be found at the following link:
http://fileshare.englab.nay.redhat.com/pub/section2/coredump/bz1656705/MEMORY-q35-cpuadd.tar.gz

Comment 4 Eduardo Habkost 2018-12-12 19:54:30 UTC
Has vCPU hotplug ever worked before with Windows 2019 guests?

Comment 5 xiagao 2018-12-13 02:08:17 UTC
(In reply to Eduardo Habkost from comment #4)
> Has vCPU hotplug ever worked before with Windows 2019 guests?

This is a new Windows guest; this is the first test.

Comment 7 Igor Mammedov 2018-12-14 12:58:02 UTC
Looking at comment 1, it doesn't look ACPI-related, so it probably happens after the hotplug event has been handled.

I'd retry:
  * hotplug with 4G RAM (if I recall correctly, there were issues with CPU hotplug when RAM was less than 2G)
  * disable xsave in the CPU flags (since it crashes in KiInitializeXSave)
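Both retries map directly onto QEMU options. A Python sketch that builds them, assuming QEMU's feature=on|off syntax inside -cpu; cpu_arg and retry_args are hypothetical helpers. Turning off xsave alone may leave dependent features (for example avx) in a model such as Skylake-Server; that interaction is not verified here.

```python
def cpu_arg(model, enable=(), disable=()):
    """Build a QEMU -cpu value such as 'Skylake-Server,xsave=off'."""
    parts = [model]
    parts += [f"{flag}=on" for flag in enable]    # boolean features to force on
    parts += [f"{flag}=off" for flag in disable]  # boolean features to force off
    return ",".join(parts)


def retry_args(model="Skylake-Server", mem="4G"):
    """QEMU arguments for both retries above: 4G of guest RAM and xsave disabled."""
    return ["-m", mem, "-cpu", cpu_arg(model, disable=["xsave"])]
```

These would replace the -m 3G and -cpu options of the Description's command line; boolean hv_* enlightenments such as hv_relaxed can be passed through enable=[...], while valued ones such as hv_spinlocks=0x1fff need to be appended separately.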

Comment 12 liunana 2019-01-09 09:14:48 UTC
The error seems to be the same as before:


*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: 0000000000000af0, memory referenced
Arg2: 00000000000000ff, IRQL
Arg3: 0000000000000000, bitfield :
        bit 0 : value 0 = read operation, 1 = write operation
        bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
Arg4: fffff80554708034, address which referenced memory

Comment 13 liunana 2019-01-10 07:30:57 UTC
Full dump file log; it seems to have the same FAULTING_IP as before:


Debugging Details:
------------------


READ_ADDRESS: unable to get nt!MmSpecialPoolStart
unable to get nt!MmSpecialPoolEnd
unable to get nt!MmPagedPoolEnd
unable to get nt!MmNonPagedPoolStart
unable to get nt!MmSizeOfNonPagedPoolInBytes
 0000000000000af0

CURRENT_IRQL:  0

FAULTING_IP:
nt!KiInitializeXSave+74
fffff800`71910034 8b88f00a0000    mov     ecx,dword ptr [rax+0AF0h]

DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT

BUGCHECK_STR:  AV

PROCESS_NAME:  System

ANALYSIS_VERSION: 6.3.9600.16520 (debuggers(dbg).140127-0329) x86fre

DPC_STACK_BASE:  FFFFF80073865FB0

LAST_CONTROL_TRANSFER:  from fffff8007159cf76 to fffff80071e243ff

STACK_TEXT:
fffff800`7385e8e8 fffff800`7159cf76 : 00000000`00000000 ffffb98d`beb02140 fffff800`70389180 ffffb98d`beb02050 : hal!HalProcessorIdle+0xf
fffff800`7385e8f0 fffff800`71496d4b : 00000000`00000000 00000000`00989680 00000000`00000000 00000000`00000053 : nt!PpmIdleDefaultExecute+0x16
fffff800`7385e920 fffff800`714964ff : 00000000`000e0736 00000000`00000002 00000000`00000000 01d4a8f4`734faf97 : nt!PpmIdleExecuteTransition+0x6bb
fffff800`7385ec40 fffff800`715d0a2c : 00000000`00000000 fffff800`70389180 fffff800`718fd400 ffffb98d`c28b6040 : nt!PoIdle+0x33f
fffff800`7385eda0 00000000`00000000 : fffff800`7385f000 fffff800`73859000 00000000`00000000 00000000`00000000 : nt!KiIdleLoop+0x2c


STACK_COMMAND:  .bugcheck ; kb

FOLLOWUP_IP:
nt!KiInitializeXSave+74
fffff800`71910034 8b88f00a0000    mov     ecx,dword ptr [rax+0AF0h]

SYMBOL_NAME:  nt!KiInitializeXSave+74

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: nt

IMAGE_NAME:  ntkrnlmp.exe

DEBUG_FLR_IMAGE_TIMESTAMP:  0

BUCKET_ID_FUNC_OFFSET:  74

FAILURE_BUCKET_ID:  AV_nt!KiInitializeXSave

BUCKET_ID:  AV_nt!KiInitializeXSave

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:av_nt!kiinitializexsave

FAILURE_ID_HASH:  {609d7d2b-4c21-e448-89bc-0907c873ab19}

Comment 14 Guo, Zhiyi 2019-01-10 08:51:40 UTC
After changing the CPU model to Westmere-IBRS, which does not have the xsave flag enabled, I can still reproduce the issue.

For a simple reproducer, use this libvirt XML:
...
  <vcpu placement='static' current='1'>4</vcpu>
...
  <cpu mode='custom' match='exact' check='partial'>
    <model fallback='forbid'>Westmere-IBRS</model>
    <topology sockets='1' cores='2' threads='2'/>
  </cpu>
...

Then hotplug one vCPU:
virsh setvcpus win2019 2
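In this reproducer the guest boots with one of four vCPUs online (current='1' out of 4), and the topology of 1 socket, 2 cores and 2 threads provides exactly four slots. A Python sketch that reads those numbers from the domain XML before running virsh setvcpus; the <domain> wrapper, the helper names, and the simple range check are illustrative assumptions, not libvirt's own validation.

```python
import xml.etree.ElementTree as ET

# Fragment of the reproducer's domain XML, wrapped in <domain> so it parses.
DOMAIN_SNIPPET = """
<domain>
  <vcpu placement='static' current='1'>4</vcpu>
  <cpu mode='custom' match='exact' check='partial'>
    <model fallback='forbid'>Westmere-IBRS</model>
    <topology sockets='1' cores='2' threads='2'/>
  </cpu>
</domain>
"""


def vcpu_layout(domain_xml):
    """Return (current, maximum, topology_slots) from a libvirt domain XML."""
    root = ET.fromstring(domain_xml.strip())
    vcpu = root.find("vcpu")
    maximum = int(vcpu.text)
    # 'current' defaults to the maximum when the attribute is absent.
    current = int(vcpu.get("current", maximum))
    topo = root.find("cpu/topology")
    slots = int(topo.get("sockets")) * int(topo.get("cores")) * int(topo.get("threads"))
    return current, maximum, slots


def can_setvcpus(domain_xml, count):
    """True if the requested vCPU count stays within the configured maximum."""
    _, maximum, _ = vcpu_layout(domain_xml)
    return 1 <= count <= maximum
```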

The guest will BSOD. Log from the extracted dump file:
Microsoft (R) Windows Debugger Version 10.0.18303.1000 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [C:\Users\zhiyi\Downloads\MEMORY.DMP]
Kernel Bitmap Dump File: Kernel address space is available, User address space may not be available.


************* Path validation summary **************
Response                         Time (ms)     Location
Deferred                                       srv*
Symbol search path is: srv*
Executable search path is: 
Windows 10 Kernel Version 17763 MP (2 procs) Free x64
Product: Server, suite: TerminalServer DataCenter SingleUserTS
Built by: 17763.1.amd64fre.rs5_release.180914-1434
Machine Name:
Kernel base = 0xfffff807`434a6000 PsLoadedModuleList = 0xfffff807`438c59b0
Debug session time: Fri Jan 11 07:07:14.505 2019 (UTC + 8:00)
System Uptime: 0 days 0:03:32.243
Loading Kernel Symbols
....................................Page 20010ea46 too large to be in the dump file.
Page 20010ea45 too large to be in the dump file.
...................Page 200109de4 too large to be in the dump file.
........
......................................................
Loading User Symbols

Loading unloaded module list
.....
For analysis of this file, run !analyze -v
hal!HalpPmTimerQueryCounterIoPort+0x5:
fffff807`434138e5 8bc0            mov     eax,eax
0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: 0000000000000af0, memory referenced
Arg2: 00000000000000ff, IRQL
Arg3: 0000000000000000, bitfield :
	bit 0 : value 0 = read operation, 1 = write operation
	bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
Arg4: fffff8074399a034, address which referenced memory

Debugging Details:
------------------


KEY_VALUES_STRING: 1


PROCESSES_ANALYSIS: 1

STACKHASH_ANALYSIS: 1

TIMELINE_ANALYSIS: 1


DUMP_CLASS: 1

DUMP_QUALIFIER: 401

BUILD_VERSION_STRING:  17763.1.amd64fre.rs5_release.180914-1434

SYSTEM_MANUFACTURER:  Red Hat

SYSTEM_PRODUCT_NAME:  KVM

SYSTEM_VERSION:  RHEL-7.6.0 PC (Q35 + ICH9, 2009)

BIOS_VENDOR:  SeaBIOS

BIOS_VERSION:  1.11.1-3.module+el8+2327+339cd21f

BIOS_DATE:  04/01/2014

DUMP_TYPE:  1

BUGCHECK_P1: af0

BUGCHECK_P2: ff

BUGCHECK_P3: 0

BUGCHECK_P4: fffff8074399a034

READ_ADDRESS:  0000000000000af0 

CURRENT_IRQL:  0

FAULTING_IP: 
nt!KiInitializeXSave+74
fffff807`4399a034 8b88f00a0000    mov     ecx,dword ptr [rax+0AF0h]

CPU_COUNT: 2

CPU_MHZ: a20

CPU_VENDOR:  GenuineIntel

CPU_FAMILY: 6

CPU_MODEL: 2c

CPU_STEPPING: 1

CPU_MICROCODE: 6,2c,1,0 (F,M,S,R)  SIG: 1'00000000 (cache) 1'00000000 (init)

BLACKBOXBSD: 1 (!blackboxbsd)


DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT

BUGCHECK_STR:  AV

PROCESS_NAME:  System

ANALYSIS_SESSION_HOST:  DESKTOP-P3MP83L

ANALYSIS_SESSION_TIME:  01-10-2019 15:52:36.0963

ANALYSIS_VERSION: 10.0.18303.1000 amd64fre

LOCK_ADDRESS:  fffff807438dfd40 -- (!locks fffff807438dfd40)

Resource @ nt!PiEngineLock (0xfffff807438dfd40)    Exclusively owned
    Contention Count = 3
     Threads: ffffcc0e5d36a080-01<*> 
1 total locks

PNP_TRIAGE_DATA: 
	Lock address  : 0xfffff807438dfd40
	Thread Count  : 1
	Thread address: 0xffffcc0e5d36a080
	Thread wait   : 0x350f

LAST_CONTROL_TRANSFER:  from fffff807434104d7 to fffff807434138e5

STACK_TEXT:  
fffff20e`f7f64a28 fffff807`434104d7 : fffffcfe`7f3ef9b8 fffffcfe`7f3f9f78 00000000`00000000 fffff807`435ba513 : hal!HalpPmTimerQueryCounterIoPort+0x5
fffff20e`f7f64a30 fffff807`4341897c : 00000000`00000001 00000000`02000000 00000000`00000000 fffff20e`f7f64b10 : hal!KeStallExecutionProcessor+0x97
fffff20e`f7f64ab0 fffff807`4348d9a5 : 00000000`00000001 00000000`00000000 fffff20e`f7f64b10 00000000`00000000 : hal!HalpApicStartProcessor+0xbc
fffff20e`f7f64ae0 fffff807`4343c2d0 : 00000000`00000002 fffff20e`00000002 fffff20e`f7f64db0 00000000`00000002 : hal!HalpInterruptStartProcessor+0x129
fffff20e`f7f64b70 fffff807`43c6c101 : 00000000`00000000 fffff20e`f7f64ca0 fffff807`43985240 ffffa500`17ddf180 : hal!HalStartDynamicProcessor+0xa0
fffff20e`f7f64ba0 fffff807`43c6cd29 : fffff20e`f7f65440 00000000`00000000 00000000`00000000 00002000`00000000 : nt!KiStartDynamicProcessor+0x3b5
fffff20e`f7f653c0 fffff801`85832499 : ffffcc0e`5d31dbc0 ffffcc0e`610b0ba0 ffffcc0e`610b0ba0 fffff801`8586e27a : nt!KeStartDynamicProcessor+0x69
fffff20e`f7f653f0 fffff801`8580120c : ffffcc0e`5d31dbc0 ffffcc0e`5dc6a128 fffff801`85560000 ffffcc0e`00000002 : ACPI!ACPIProcessorStartDevice+0x256b9
fffff20e`f7f65480 fffff807`43561189 : 00000000`00000007 ffffcc0e`60c448d0 ffffcc0e`5dc6a010 00000000`00000000 : ACPI!ACPIDispatchIrp+0x1fc
fffff20e`f7f65500 fffff801`8555719c : ffffcc0e`60c448d0 00000000`00000000 00000000`00000015 fffff801`854e1a31 : nt!IofCallDriver+0x59
fffff20e`f7f65540 fffff801`8554d011 : ffffcc0e`5dc6a010 ffffcc0e`60c448d0 fffff20e`f7f65650 00000000`00000000 : Wdf01000!FxPkgFdo::PnpSendStartDeviceDownTheStackOverload+0x27c [minkernel\wdf\framework\shared\irphandlers\pnp\fxpkgfdo.cpp @ 1069] 
fffff20e`f7f655b0 fffff801`8554ca87 : ffffcc0e`60c448d0 00000000`00000105 00000000`00000106 00000000`00000000 : Wdf01000!FxPkgPnp::PnpEventInitStarting+0x11 [minkernel\wdf\framework\shared\irphandlers\pnp\pnpstatemachine.cpp @ 1328] 
fffff20e`f7f655e0 fffff801`8554e842 : ffffcc0e`60c44a00 ffffcc0e`60c448d0 ffffcc0e`60c448d0 00000000`00000002 : Wdf01000!FxPkgPnp::PnpEnterNewState+0x17b [minkernel\wdf\framework\shared\irphandlers\pnp\pnpstatemachine.cpp @ 1234] 
fffff20e`f7f65670 fffff801`8554e5f2 : 00000000`00000000 ffffcc0e`60c44a28 ffffcc0e`60c44a00 00000000`0000000c : Wdf01000!FxPkgPnp::PnpProcessEventInner+0x1e6 [minkernel\wdf\framework\shared\irphandlers\pnp\pnpstatemachine.cpp @ 1152] 
fffff20e`f7f656f0 fffff801`85555e3e : 00000000`00000000 ffffcc0e`60c448d0 00000000`0000001b 00000000`00000288 : Wdf01000!FxPkgPnp::PnpProcessEvent+0x19a [minkernel\wdf\framework\shared\irphandlers\pnp\pnpstatemachine.cpp @ 933] 
fffff20e`f7f65780 fffff801`854d2ef4 : ffffcc0e`60c448d0 00000000`0000001b 00000000`00000288 00000000`00000000 : Wdf01000!FxPkgPnp::_PnpStartDevice+0x1e [minkernel\wdf\framework\shared\irphandlers\pnp\fxpkgpnp.cpp @ 1999] 
fffff20e`f7f657b0 fffff801`854d1b73 : ffffcc0e`5dc6a010 ffffcc0e`60ea9840 ffffcc0e`5d4738f0 00000000`00000000 : Wdf01000!FxPkgPnp::Dispatch+0xb4 [minkernel\wdf\framework\shared\irphandlers\pnp\fxpkgpnp.cpp @ 745] 
fffff20e`f7f65820 fffff807`43561189 : fffff20e`f7f659a0 ffffcc0e`5d4738f0 00000000`00000200 ffffcc0e`610258c0 : Wdf01000!FxDevice::DispatchWithLock+0x113 [minkernel\wdf\framework\shared\core\fxdevice.cpp @ 1430] 
fffff20e`f7f65880 fffff807`43b0954e : ffffcc0e`610b0ba0 ffffcc0e`610258c0 00000000`00000001 00000000`00000000 : nt!IofCallDriver+0x59
fffff20e`f7f658c0 fffff807`43531f01 : ffffcc0e`610b0ba0 00000000`00000000 ffffcc0e`610258c0 ffffcc0e`610258c0 : nt!PnpAsynchronousCall+0xea
fffff20e`f7f65900 fffff807`435faf58 : 00000000`00000000 ffffcc0e`610b0ba0 fffff807`435fa5e0 fffff807`435fa5e0 : nt!PnpSendIrp+0x95
fffff20e`f7f65970 fffff807`43af7f47 : ffffcc0e`5dd0fcc0 ffffcc0e`610258c0 00000000`00000000 ffffcc0e`5dd0fcc0 : nt!PnpStartDevice+0x88
fffff20e`f7f65a00 fffff807`43af812f : ffffcc0e`5dd0fcc0 00000000`00000000 00000000`00000001 fffff807`435faaba : nt!PnpStartDeviceNode+0xdb
fffff20e`f7f65a90 fffff807`43af3278 : ffffcc0e`5dd0fcc0 fffff20e`f7f65b58 00000000`00000002 00000000`00000001 : nt!PipProcessStartPhase1+0x6f
fffff20e`f7f65ae0 fffff807`43b08c96 : ffffcc0e`5d9de600 fffff807`43562f01 fffff20e`f7f65bf0 ffffcc0e`00000002 : nt!PipProcessDevNodeTree+0x3dc
fffff20e`f7f65ba0 fffff807`435ffe3d : ffffcc01`00000003 ffffcc0e`5d363be0 ffffcc0e`5d9de680 00000000`00000000 : nt!PiProcessReenumeration+0x82
fffff20e`f7f65bf0 fffff807`4350311a : ffffcc0e`5d36a080 fffff807`438de5e0 ffffcc0e`5d272a60 ffffcc0e`00000000 : nt!PnpDeviceActionWorker+0x1dd
fffff20e`f7f65cb0 fffff807`435c76c5 : ffffcc0e`5d36a080 ffffcc0e`5d272080 ffffcc0e`5d36a080 00000000`00000000 : nt!ExpWorkerThread+0x16a
fffff20e`f7f65d50 fffff807`4365e49c : fffff807`427c2180 ffffcc0e`5d36a080 fffff807`435c7670 00000000`00000000 : nt!PspSystemThreadStartup+0x55
fffff20e`f7f65da0 00000000`00000000 : fffff20e`f7f66000 fffff20e`f7f60000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x1c


THREAD_SHA1_HASH_MOD_FUNC:  f08da0e576ee31f034a87c00580b6cd3fa5e22b2

THREAD_SHA1_HASH_MOD_FUNC_OFFSET:  72d43f2a9685c63760de5a42aafd062a2d2be78e

THREAD_SHA1_HASH_MOD:  71176d6242493fece88e1acc7508848070ee89bb

FOLLOWUP_IP: 
ACPI!ACPIProcessorStartDevice+256b9
fffff801`85832499 0f1f440000      nop     dword ptr [rax+rax]

FAULT_INSTR_CODE:  441f0f

SYMBOL_STACK_INDEX:  7

SYMBOL_NAME:  ACPI!ACPIProcessorStartDevice+256b9

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: ACPI

IMAGE_NAME:  ACPI.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  0

STACK_COMMAND:  .thread ; .cxr ; kb

BUCKET_ID_FUNC_OFFSET:  256b9

FAILURE_BUCKET_ID:  AV_ACPI!ACPIProcessorStartDevice

BUCKET_ID:  AV_ACPI!ACPIProcessorStartDevice

PRIMARY_PROBLEM_CLASS:  AV_ACPI!ACPIProcessorStartDevice

TARGET_TIME:  2019-01-10T23:07:14.000Z

OSBUILD:  17763

OSSERVICEPACK:  0

SERVICEPACK_NUMBER: 0

OS_REVISION: 0

SUITE_MASK:  400

PRODUCT_TYPE:  3

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

OSEDITION:  Windows 10 Server TerminalServer DataCenter SingleUserTS

OS_LOCALE:  

USER_LCID:  0

OSBUILD_TIMESTAMP:  unknown_date

BUILDDATESTAMP_STR:  180914-1434

BUILDLAB_STR:  rs5_release

BUILDOSVER_STR:  10.0.17763.1.amd64fre.rs5_release.180914-1434

ANALYSIS_SESSION_ELAPSED_TIME:  c647

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:av_acpi!acpiprocessorstartdevice

FAILURE_ID_HASH:  {61e49226-f01f-aa3c-f6bd-56108140cc16}

Followup:     MachineOwner
---------
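Because the kernel is loaded at a different base address on each boot, the raw FAULTING_IP addresses differ between the dumps in this bug, while the symbolized location nt!KiInitializeXSave+74 stays the same. In this dump the failure bucket moves to ACPI!ACPIProcessorStartDevice, apparently because the analysis follows the current thread's stack (the PnP worker starting the new processor) rather than the faulting instruction. A Python sketch for comparing dumps by those two fields; analyze_fields is a hypothetical helper that only handles the text layout seen here.

```python
import re


def analyze_fields(text):
    """Pull the FAULTING_IP symbol and FAILURE_BUCKET_ID out of !analyze -v text."""
    fields = {}
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("FAULTING_IP:") and i + 1 < len(lines):
            # The symbol (module!function+offset) is on the line after the key.
            fields["faulting_ip"] = lines[i + 1].strip()
        match = re.match(r"FAILURE_BUCKET_ID:\s*(\S+)", line)
        if match:
            fields["bucket"] = match.group(1)
    return fields


# Excerpts from the Description's dump and from this comment's dump.
DESCRIPTION_DUMP = """FAULTING_IP:
nt!KiInitializeXSave+74
fffff804`23198034 8b88f00a0000    mov     ecx,dword ptr [rax+0AF0h]
FAILURE_BUCKET_ID:  AV_nt!KiInitializeXSave
"""

COMMENT14_DUMP = """FAULTING_IP:
nt!KiInitializeXSave+74
fffff807`4399a034 8b88f00a0000    mov     ecx,dword ptr [rax+0AF0h]
FAILURE_BUCKET_ID:  AV_ACPI!ACPIProcessorStartDevice
"""
```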

Comment 16 Igor Mammedov 2019-01-10 17:07:38 UTC
I've run it with the ACPI debugger on, and it looks fine on the ACPI side:
the guest sees the CPU as enabled and gets a MADT entry for it,
and a KVM trace confirms that the hotplugged CPU receives a SIPI and is running.

The symptoms look like a NULL pointer dereference:
...
Arg1: 0000000000000af0, memory referenced
Arg2: 00000000000000ff, IRQL
Arg3: 0000000000000000, bitfield :
	bit 0 : value 0 = read operation, 1 = write operation
	bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
Arg4: fffff8074399a034, address which referenced memory
...
fffff807`4399a034 8b88f00a0000    mov     ecx,dword ptr [rax+0AF0h]

We should probably bounce this bug to MS; perhaps they broke CPU hotplug again.
(The other option left to us is to find a bare-metal system with hotpluggable CPUs and verify that CPU hotplug works there at all.)
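The NULL-pointer reading follows from the faulting instruction. Its bytes 8b88f00a0000 encode MOV r32, r/m32 (opcode 0x8B) with ModRM 0x88 (mod=10: base register plus 32-bit displacement, reg=ecx, rm=rax) and the little-endian displacement 0x00000AF0. Since the effective address is rax + 0xAF0 and Arg1 reports 0xAF0, rax must have been 0. A Python sketch of that arithmetic; it decodes only this one encoding (no REX prefix, no SIB byte) and is not a general x86 decoder.

```python
import struct

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_r32_disp32(code):
    """Decode `8B /r` with mod=10 (register + disp32) in 64-bit mode, no REX/SIB."""
    if code[0] != 0x8B:
        raise ValueError("not MOV r32, r/m32")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0x7, modrm & 0x7
    if mod != 0b10 or rm == 0b100:        # rm=100 would require a SIB byte
        raise ValueError("unsupported ModRM form")
    (disp,) = struct.unpack_from("<i", code, 2)   # signed 32-bit, little-endian
    return REG32[reg], REG64[rm], disp


dest, base, disp = decode_mov_r32_disp32(bytes.fromhex("8b88f00a0000"))
arg1 = 0xAF0                    # "memory referenced" from the bugcheck
base_value = arg1 - disp        # effective address = base + disp
print(f"mov {dest}, dword ptr [{base}+{disp:#x}] -> {base} = {base_value:#x}")
```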

Comment 20 ybendito 2019-02-15 09:07:23 UTC
Some investigation notes:
1. On the same setup with upstream QEMU, 2019 crashes during CPU hotplug while 2016 works (with the respective registry fixes).
2. The setup is the simplest possible (host CPU).
3. The new CPU is already present in the PnP tree and is 'starting' from the PnP perspective.
4. nt!KiInitializeXSave: CPU hotplug enters this procedure in 2016 as well, but the 2016 and 2019 code cannot be compared because they are very different.
5. VirtualBox with 2019 has exactly the same behavior, i.e. it crashes at the same place.
6. After the crash/reboot, the added CPU works as usual (on any setup).
This seems to be a 2019 bug.

Comment 21 ybendito 2019-02-18 11:39:08 UTC
Submitted technical support request to Microsoft.

Comment 22 ybendito 2019-02-22 17:46:44 UTC
There is a fresh (Feb 20) cumulative update for Win10/2019.
It will be available via Windows Update soon.
I've tried it from
https://www.tenforums.com/windows-10-news/127582-cumulative-update-kb4482887-windows-10-rp-build-17763-346-feb-20-a.html
For installation, use dism as described on that page.
Please check it with several typical processor models, with and without xsave.
To keep the support ticket from being closed, we'd like a response ASAP.

Comment 23 xiagao 2019-02-25 02:28:08 UTC
After applying the cumulative update for Win2019, I tried several typical processor models with and without xsave; all of them hit the BSOD.

CPU flags in the qemu command line:
/usr/libexec/qemu-kvm -name win2019 -enable-kvm -m 3G -smp 4,maxcpus=8,sockets=8,cores=1,threads=1 -nodefaults -cpu 'Skylake-Server',+kvm_pv_unhalt,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,xsave -rtc base=localtime,driftfix=none -boot order=cd,menu=on -monitor stdio -qmp tcp:0:1234,server,nowait -M q35 -vga std -vnc :10 \
-device pcie-root-port,id=pcie-root-port-6,slot=6,chassis=6,bus=pcie.0  \
-object secret,id=sec0,data=xiagao \
-blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=win2019.luks,node-name=system_disk_file \
-blockdev driver=luks,key-secret=sec0,node-name=system_disk,file=system_disk_file \
-device virtio-blk-pci,bus=pcie-root-port-6,drive=system_disk,id=disk_system,werror=stop,rerror=stop,serial=MYDISK-1 \
-device pcie-root-port,id=pcie-root-port-7,slot=7,chassis=7,addr=0x7,bus=pcie.0  \
-device virtio-net-pci,mac=9a:d0:d1:d2:d3:d4,id=net1,vectors=4,netdev=hostnet1,bus=pcie-root-port-7,addr=0x0  \
-netdev tap,id=hostnet1,vhost=on \


A screenshot of the Windows Update history is attached.
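Both command lines in this bug leave four vCPUs for hotplug: the Description uses -smp 4,maxcpus=8,sockets=2,cores=2,threads=2 and this one uses -smp 4,maxcpus=8,sockets=8,cores=1,threads=1. A Python sketch that parses an -smp value and counts the free slots; the helper names are hypothetical, and treating sockets*cores*threads != maxcpus as an error mirrors what newer QEMU releases enforce rather than anything stated in this bug.

```python
def parse_smp(value):
    """Parse a QEMU -smp value such as '4,maxcpus=8,sockets=8,cores=1,threads=1'."""
    opts = {}
    for i, item in enumerate(value.split(",")):
        if "=" in item:
            key, val = item.split("=", 1)
        elif i == 0:
            key, val = "cpus", item      # a leading bare number is the boot CPU count
        else:
            raise ValueError(f"unexpected -smp item: {item!r}")
        opts[key] = int(val)
    return opts


def hotplug_slots(value):
    """Return how many vCPUs can still be hot-added, after checking the topology."""
    opts = parse_smp(value)
    topology = opts["sockets"] * opts["cores"] * opts["threads"]
    if topology != opts["maxcpus"]:
        raise ValueError(f"sockets*cores*threads={topology} != maxcpus={opts['maxcpus']}")
    return opts["maxcpus"] - opts["cpus"]
```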

Comment 24 xiagao 2019-02-25 02:28:58 UTC
Created attachment 1538274 [details]
win2019 update history screenshot

Comment 25 ybendito 2019-02-25 05:31:47 UTC
Created attachment 1538307 [details]
Really updated on Yuri's machine

Comment 26 ybendito 2019-02-25 05:35:09 UTC
I do not see KB4482887 in your list of updates.
Please refer to the attachment in comment #25, where the update appears.
See comment #22 for the link to the CAB file, and read the instructions on the download page on how to install it.
After installing, restart the system.
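The DISM installation described in comments #22 and #26 can be scripted. A Python sketch, assuming it runs from an elevated prompt inside the guest; the CAB path is hypothetical because the actual file name is only given on the download page, and the helper names are illustrative.

```python
import subprocess


def dism_add_package_cmd(cab_path):
    """Build the DISM command line that adds a .cab update to the running OS."""
    return ["dism", "/online", "/add-package", f"/packagepath:{cab_path}"]


def install_update(cab_path):
    """Run DISM; requires elevation, and the guest must be restarted afterwards."""
    return subprocess.run(dism_add_package_cmd(cab_path), check=True)


# Hypothetical path: substitute the CAB downloaded from the page in comment #22.
EXAMPLE_CAB = r"C:\updates\kb4482887.cab"
```

The restart from comment #26 is still required after install_update(EXAMPLE_CAB) returns.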

Comment 27 ybendito 2019-02-25 06:14:43 UTC
If you're absolutely sure the KB is installed (Control Panel - Programs and Features - View installed updates), please provide the zipped dump file.
Note that on my system, where the crash was easily reproduced before, this update solves the problem.
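Besides the Control Panel view, the list of installed updates can be queried from a script. A Python sketch, assuming PowerShell's Get-HotFix cmdlet is available in the guest; it reads Win32_QuickFixEngineering, which may not list every package type, so the Control Panel check above remains the reference. The helper names are illustrative.

```python
import subprocess


def hotfix_installed(kb_id, hotfix_ids):
    """Check a KB number against a list of installed hotfix IDs (case-insensitive)."""
    wanted = kb_id.strip().upper()
    return any(h.strip().upper() == wanted for h in hotfix_ids)


def installed_hotfix_ids():
    """List installed hotfix IDs via PowerShell's Get-HotFix (Windows only)."""
    out = subprocess.run(
        ["powershell", "-NoProfile", "-Command",
         "Get-HotFix | Select-Object -ExpandProperty HotFixID"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]
```

On the guest, hotfix_installed("KB4482887", installed_hotfix_ids()) should be True before re-running the hotplug test.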

Comment 28 xiagao 2019-02-25 06:41:58 UTC
Created attachment 1538311 [details]
install the update by dism

After installing the KB4482887 update via the dism command, I tried several typical processor models with and without xsave; none of them hit the BSOD.

-cpu 'Skylake-Server',+kvm_pv_unhalt,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,xsave

Comment 29 ybendito 2019-02-25 06:53:43 UTC
Can you see that the CPUs are added as expected?

Comment 30 xiagao 2019-02-25 07:06:53 UTC
(In reply to ybendito from comment #29)
> Can you see that the CPUs are added as expected?

Yes, they are added as expected.

Comment 31 Yan Vugenfirer 2019-02-25 09:46:49 UTC
(In reply to xiagao from comment #30)
> (In reply to ybendito from comment #29)
> > Can you see that the CPUs are added as expected?
> 
> Yes, they are added as expected.

Thanks a lot for the tests!

Because we want to be completely sure we don't have any issues with Windows Server 2019 and CPU hotplug before we close the case with MS, could you please test on other hosts as well?
If possible, please test on at least one AMD and one Intel host.

Comment 32 xiagao 2019-02-25 10:30:08 UTC
No crash dump, and the CPUs are added as expected with and without the xsave flag.

AMD machine CPU info:

processor	: 63
vendor_id	: AuthenticAMD
cpu family	: 23
model		: 1
model name	: AMD EPYC 7351 16-Core Processor
stepping	: 2
microcode	: 0x8001227
cpu MHz		: 2462.128
cache size	: 512 KB
physical id	: 1
siblings	: 32
core id		: 29
cpu cores	: 16
apicid		: 123
initial apicid	: 123
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca
bugs		: sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips	: 4767.74
TLB size	: 2560 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]

Intel machine CPU info:
processor	: 39
vendor_id	: GenuineIntel
cpu family	: 6
model		: 79
model name	: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
stepping	: 1
microcode	: 0xb00001e
cpu MHz		: 2385.103
cache size	: 25600 KB
physical id	: 1
siblings	: 20
core id		: 12
cpu cores	: 10
apicid		: 57
initial apicid	: 57
fpu		: yes
fpu_exception	: yes
cpuid level	: 20
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips	: 4399.36
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:
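Which XSAVE-family features each host exposes can be read from the flags lines above: the AMD EPYC host lists xsave, xsaveopt, xsavec, xgetbv1, xsaves and xsaveerptr, the Intel E5-2630 v4 host only xsave and xsaveopt. A Python sketch that extracts them from /proc/cpuinfo text; the flag list is taken from these two outputs, not from a complete specification, and the helper name is illustrative.

```python
XSAVE_FAMILY = ("xsave", "xsaveopt", "xsavec", "xsaves", "xgetbv1", "xsaveerptr")


def xsave_flags(cpuinfo_text):
    """Return the XSAVE-related flags of the first 'flags' line in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags = set(value.split())
            return [f for f in XSAVE_FAMILY if f in flags]
    return []
```

On a Linux host, xsave_flags(open("/proc/cpuinfo").read()) gives the same summary for the local CPU.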

Comment 33 Yan Vugenfirer 2019-02-25 11:04:16 UTC
Thanks!

Comment 35 xiagao 2019-03-04 02:13:15 UTC
Verified this bug with KB4482887.

