Bug 1445603

Summary: Windows 2016 guest will crash after hot plug one vcpu
Product: Red Hat Enterprise Linux 8 Reporter: Guo, Zhiyi <zhguo>
Component: qemu-kvm Assignee: ybendito
qemu-kvm sub component: Devices QA Contact: Yumei Huang <yuhuang>
Status: CLOSED WONTFIX Docs Contact:
Severity: high    
Priority: high CC: ailan, chayang, imammedo, jinzhao, juzhang, knappch, knoel, lijin, mkalinin, phou, rbalakri, virt-maint, ybendito, yvugenfi
Version: 8.0 Keywords: Reopened
Target Milestone: rc   
Target Release: 8.0   
Hardware: x86_64   
OS: Windows   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-12-01 07:28:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1377155    
Bug Blocks: 1473046, 1558351, 1649160, 1746622    

Description Guo, Zhiyi 2017-04-26 06:04:20 UTC
Description of problem:
Boot a Windows 2016 guest with 2 GB of memory or less; the guest crashes after one vCPU is hot-plugged.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.9.0-1.el7.x86_64
3.10.0-655.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot a Windows 2016 guest with the following command line:
/usr/libexec/qemu-kvm -name win2016 -m 2G -machine pc,accel=kvm \
        -S \
        -cpu qemu64,enforce \
        -smp 1,maxcpus=4 \
        -vnc :0 \
        -monitor stdio \
        -device VGA \
        -serial unix:/tmp/console,server,nowait \
        -drive file=/home/test1.qcow2,if=none,id=drive-scsi-disk0,format=qcow2,cache=none,werror=stop,rerror=stop -device ide-drive,drive=drive-scsi-disk0 \
        -netdev tap,id=idinWyYp,vhost=on -device e1000,mac=42:ce:a9:d2:4d:d7,id=idlbq7eA,netdev=idinWyYp \
        -qmp tcp:0:4444,server,nowait

2. After the guest boots, hot plug one vCPU through QMP:
{ "execute": "qmp_capabilities" }
{ "execute": "device_add","arguments":{"driver":"qemu64-x86_64-cpu","core-id": 0, "thread-id":0, "socket-id": 1,"id":"core1"}}
3. Check the number of vCPUs inside the guest.
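
The QMP exchange in step 2 can also be scripted. The following is only a minimal sketch, assuming the QMP socket from step 1 (tcp port 4444) is reachable on localhost and that QMP sends one JSON object per line (the default, non-pretty mode). The function names are hypothetical helpers, not part of QEMU; only the standard library is used.

```python
import json
import socket


def build_device_add(driver, socket_id, core_id=0, thread_id=0, dev_id="core1"):
    """Build the QMP device_add message used in step 2 to hot plug one vCPU."""
    return {
        "execute": "device_add",
        "arguments": {
            "driver": driver,
            "socket-id": socket_id,
            "core-id": core_id,
            "thread-id": thread_id,
            "id": dev_id,
        },
    }


def qmp_command(stream, message):
    """Send one QMP message and return the first reply that is not an event."""
    stream.write(json.dumps(message) + "\n")
    stream.flush()
    while True:
        line = stream.readline()
        if not line:
            raise ConnectionError("QMP connection closed")
        reply = json.loads(line)
        if "event" not in reply:  # asynchronous events are skipped here
            return reply


def hotplug_vcpu(host="127.0.0.1", port=4444):
    """Negotiate capabilities, then hot plug the vCPU from step 2."""
    with socket.create_connection((host, port)) as sock:
        stream = sock.makefile("rw", encoding="utf-8")
        json.loads(stream.readline())  # consume the QMP greeting
        qmp_command(stream, {"execute": "qmp_capabilities"})
        return qmp_command(
            stream, build_device_add("qemu64-x86_64-cpu", socket_id=1)
        )
```

Calling hotplug_vcpu() against a running guest should return the reply to device_add. The valid driver name and socket/core/thread values for a given machine can be checked beforehand with the query-hotpluggable-cpus QMP command.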

Actual results:
Guest will reboot immediately.

Expected results:
No reboot happens after CPU hotplug.

Additional info:
The issue does not happen if the guest is booted with 4G of RAM or more. The issue does not happen with Windows 10.

Comment 2 Igor Mammedov 2017-04-26 16:22:14 UTC
One probably needs to apply the workaround for WS2016's CPU hotplug, which is broken by default,
 https://bugzilla.redhat.com/show_bug.cgi?id=1377155#c17
to trigger the crash; otherwise Windows won't even try to online the hotplugged CPU.

Comment 4 Igor Mammedov 2017-04-26 16:48:32 UTC
The bug reproduces in both KVM and TCG modes. According to the KVM trace, the hotplugged CPU wakes up, but during bring-up it triple-faults and the guest reboots.

Googling also shows that the same regression happens on VMware hosts.

Comment 8 ybendito 2019-06-13 08:53:04 UTC
The latest cumulative update for 2016, KB4503267, was announced on June 11.
It was probably expected to solve this problem, and the reboot no longer happens upon cpu-add.
But the CPU does not work, the PnP operation does not finish, and the system stops working correctly.
I'm running qemu with '-smp 2,maxcpus=4,sockets=4,cores=1,threads=1', then adding the 3rd CPU with 'cpu-add 2'.
msinfo32 does not work, taskmgr does not show tasks, and shutdown/reboot gets stuck.
All this happens when the memory size is set to 2G (2048M).
When it is set to 2080M, the CPU is added correctly.
Note that the same thing happens with the 'core' server (without Desktop Experience), which does not declare 2G as the minimum amount of memory.
I'm going to open a support ticket with Microsoft.
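
The boundary between the failing size (2048M) and the working size (2080M) could be narrowed down with a bisection. This is only an illustrative sketch, not something that was run for this bug. It assumes that hotplug fails at every size below some threshold and works at every size above it. hotplug_works is a hypothetical callback that boots the guest with the given memory size, runs the hotplug, and reports whether the guest stayed healthy.

```python
def qemu_memory_args(mem_mib):
    """Command-line fragment for one trial, using the topology from this comment."""
    return [
        "-m", f"{mem_mib}M",
        "-smp", "2,maxcpus=4,sockets=4,cores=1,threads=1",
    ]


def smallest_working_memory(hotplug_works, failing_mib=2048, working_mib=2080):
    """Bisect for the smallest memory size (in MiB) at which hotplug works.

    Assumes hotplug_works(failing_mib) is False, hotplug_works(working_mib)
    is True, and the result changes only once in between.
    """
    lo, hi = failing_mib, working_mib
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if hotplug_works(mid):
            hi = mid  # threshold is at mid or below
        else:
            lo = mid  # threshold is above mid
    return hi
```

Each call to hotplug_works means a full guest boot, so bisecting the 32M window needs about five trials instead of up to 31 sequential ones.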

Comment 9 ybendito 2019-06-13 09:21:52 UTC
Support request 119061321000566

Comment 10 ybendito 2019-07-04 13:17:58 UTC
According to Microsoft feedback: 
"the issue initially reported is in effect by a bug that affect Windows 2016 (it was solved in Windows 2019 in the KB4482887) that needs to be solved as soon as possible. According my notes from the develop team the solution for this bug is planned to be published with the last hotfix KB next month of August"
So we will put this on hold until August and check it with the next cumulative update for 2016.

Comment 12 Igor Mammedov 2019-07-23 13:13:37 UTC
Reopening against RHEL 8 to keep track of a fix from the Microsoft side.

Comment 13 Marina Kalinin 2019-09-06 19:56:07 UTC
Is it even a realistic scenario for a Windows machine to have only 2G of RAM? I see they recommend a minimum of 512M. But in my experience, it usually takes 4G+ to make things work.

Comment 15 Ademar Reis 2020-02-05 22:43:32 UTC
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Comment 18 Yumei Huang 2020-11-19 02:53:48 UTC
The issue still exists on 8.3-av. 
(In reply to ybendito from comment #10)
> According to Microsoft feedback: 
> "the issue initially reported is in effect by a bug that affect Windows 2016
> (it was solved in Windows 2019 in the KB4482887) that needs to be solved as
> soon as possible. According my notes from the develop team the solution for
> this bug is planned to be published with the last hotfix KB next month of
> August"
> So, we will put this on hold till August and will check it with next
> cumulative update of 2016.

Hi Yuri, 

It seems KB4482887 is only provided for Windows 10 and 2019, according to [1]. Would you please double-check whether they will fix Windows 2016? Thanks.


[1] https://www.catalog.update.microsoft.com/Search.aspx?q=KB4482887

Comment 19 RHEL Program Management 2020-12-01 07:28:38 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 21 Yvugenfi@redhat.com 2020-12-21 13:15:40 UTC
As we cannot force MS to release a hotfix for Windows Server 2016, closing based on comment 18.

Comment 22 Peixiu Hou 2021-01-08 03:14:17 UTC
I also hit this issue with -m 14336 (mem > 4G) on a win2016 guest VM.

1. Boot win2016 with qemu commands:
MALLOC_PERTURB_=1  /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2 \
    -m 14336  \
    -smp 2,maxcpus=4,cores=2,threads=1,dies=1,sockets=2  \
    -cpu 'Skylake-Server',hv_stimer,hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0xfff,hv_vapic,hv_time,hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,+kvm_pv_unhalt \
    -device pvpanic,ioport=0x505,id=idASHu6b \
    -device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,num_queues=4,bus=pci.0,addr=0x4 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/win2016-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device virtio-net-pci,mac=9a:cf:62:20:54:41,id=ida7fdGT,netdev=idsWmwuh,bus=pci.0,addr=0x5  \
    -netdev tap,id=idsWmwuh,vhost=on \
    -blockdev node-name=file_cd1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/iso/windows/winutils.iso,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_cd1,driver=raw,read-only=on,cache.direct=on,cache.no-flush=off,file=file_cd1 \
    -device scsi-cd,id=cd1,drive=drive_cd1,write-cache=on  \
    -vnc :0  \
    -rtc base=localtime,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -qmp tcp:0:4445,server,nowait \
    -enable-kvm \
    -monitor stdio

2. Hot plug a vCPU through QMP:
{'execute': 'qmp_capabilities', 'id': 'i7jIHH13'}
{"return": {}, "id": "i7jIHH13"}
{'execute': 'device_add', 'arguments': {'id':'vcpu1','driver': 'Skylake-Server-x86_64-cpu', 'socket-id':1, 'die-id': 0, 'core-id':0, 'thread-id':0},'id': 'UBjvk5E2'}
{"return": {}, "id": "UBjvk5E2"}
{"timestamp": {"seconds": 1610025648, "microseconds": 249453}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1610025648, "microseconds": 262512}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1610025663, "microseconds": 181241}, "event": "RTC_CHANGE", "data": {"offset": 30512}}
{"timestamp": {"seconds": 1610025663, "microseconds": 181566}, "event": "RTC_CHANGE", "data": {"offset": 42932}}

3. Check the guest VM: the guest resets immediately after a CPU is hot-added.
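
The guest reset in step 2 can be detected automatically from the QMP output. The following is a minimal sketch, assuming the QMP output has been captured as text with one message per line. guest_resets is a hypothetical helper name; the event and field names are taken from the log above.

```python
import json


def guest_resets(lines):
    """Return the RESET events triggered by the guest from captured QMP output."""
    resets = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. the single-quoted request lines are not strict JSON
        if not isinstance(msg, dict):
            continue
        if msg.get("event") == "RESET" and msg.get("data", {}).get("guest"):
            resets.append(msg)
    return resets
```

A non-empty result after device_add means the guest reset itself, which matches the "reason": "guest-reset" events shown in the log.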

Versions used:
kernel-4.18.0-240.5.1.el8_3.x86_64
qemu-kvm-5.1.0-15.module+el8.3.1+8772+a3fdeccd.x86_64
virtio-win-1.9.15-0.el8
seabios-1.14.0-1.module+el8.3.0+7638+07cf13d2.x86_64

Best Regards~
Peixiu

Comment 23 Yumei Huang 2021-01-08 03:44:35 UTC
Hi Yuri, 

Would you please have a look at comment 22? The guest memory is more than 4G. Is it the same issue in Windows 2016? Thanks!

Comment 24 ybendito 2021-01-08 12:59:05 UTC
Peixiu, do you have the VM fully updated?

Comment 25 Peixiu Hou 2021-01-12 14:20:38 UTC
(In reply to ybendito from comment #24)
> Peixiu, do you have the VM fully updated?

Hi Yuri,

Sorry for the late reply. The result in comment #22 was obtained without all updates installed.
I checked for the latest updates and installed them today. Three updates were installed: KB4593226, KB4576750, and KB4049065. After rerunning the job, the guest VM hangs after the CPU is hotplugged.

Thanks~
Peixiu

Comment 26 Peixiu Hou 2021-01-12 14:26:44 UTC
The qemu command line is:
/usr/libexec/qemu-kvm \
    -S  \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2 \
    -m 14336  \
    -smp 2,maxcpus=4,cores=2,threads=1,dies=1,sockets=2  \
    -cpu 'Skylake-Server',hv_stimer,hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0xfff,hv_vapic,hv_time,hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,+kvm_pv_unhalt \
    -chardev socket,nowait,path=/tmp/avocado_vjgk4b08/monitor-qmpmonitor1-20210112-090715-W7219jUP,id=qmp_id_qmpmonitor1,server  \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,nowait,path=/tmp/avocado_vjgk4b08/monitor-catch_monitor-20210112-090715-W7219jUP,id=qmp_id_catch_monitor,server  \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idMxL9VE \
    -chardev socket,nowait,path=/tmp/avocado_vjgk4b08/serial-serial0-20210112-090715-W7219jUP,id=chardev_serial0,server \
    -device isa-serial,id=serial0,chardev=chardev_serial0  \
    -chardev socket,id=seabioslog_id_20210112-090715-W7219jUP,path=/tmp/avocado_vjgk4b08/seabios-20210112-090715-W7219jUP,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20210112-090715-W7219jUP,iobase=0x402 \
    -device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,num_queues=4,bus=pci.0,addr=0x4 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/win2016-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device virtio-net-pci,mac=9a:3b:a1:89:70:38,id=idckNDqD,netdev=idZlnS4u,bus=pci.0,addr=0x5  \
    -netdev tap,id=idZlnS4u,vhost=on,vhostfd=20,fd=14 \
    -blockdev node-name=file_cd1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/iso/windows/winutils.iso,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_cd1,driver=raw,read-only=on,cache.direct=on,cache.no-flush=off,file=file_cd1 \
    -device scsi-cd,id=cd1,drive=drive_cd1,write-cache=on  \
    -vnc :1  \
    -rtc base=localtime,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm

Comment 27 ybendito 2021-01-12 14:45:25 UTC
Please open a BZ for that. This BZ is for hotplug with a small memory size.
For the new BZ, please specify the qemu version; we'll need to check whether this is a qemu regression or not.
Such a test was probably done in the past for 2016.

Comment 28 Peixiu Hou 2021-01-13 10:28:00 UTC
(In reply to ybendito from comment #27)
> Please open a BZ for that. This BZ is for hotplug with small memory size.
> For new BZ please specify the qemu version, we'll need to check whether this
> is a regression of qemu or not.
> Probably such test was done in the past for 2016.

OK, filed a new bug: Bug 1915715 - Windows 2016 guest will reboot/hang/quit after hot plug a vcpu with large memory

Thanks a lot~
Peixiu