Bug 1915715
| Summary: | Windows 2016 guest will reboot/hang/quit after hot plug a vcpu with large memory | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Peixiu Hou <phou> |
| Component: | qemu-kvm | Assignee: | Vadim Rozenfeld <vrozenfe> |
| qemu-kvm sub component: | Devices | QA Contact: | liunana <nanliu> |
| Status: | CLOSED WONTFIX | Docs Contact: | Jiri Herrmann <jherrman> |
| Severity: | high | ||
| Priority: | high | CC: | ailan, chayang, imammedo, jherrman, lijin, phou, virt-maint, vrozenfe, ybendito, yuhuang, yvugenfi |
| Version: | unspecified | Keywords: | Triaged |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Windows | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Known Issue | |
| Doc Text: |
.Windows Server 2016 VMs sometimes stop working after hot-plugging a vCPU
Currently, assigning a vCPU to a running virtual machine (VM) with a Windows Server 2016 guest operating system might cause a variety of problems, such as the VM terminating unexpectedly, becoming unresponsive, or rebooting. There is currently no workaround for this issue.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-07-13 07:28:26 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Reproduced with qemu-kvm-5.2.0-2.module+el8.4.0+9186+ec44380f.
Igor, would you please help confirm if it's qemu issue or win2016 issue?

(In reply to Yumei Huang from comment #1)
> Reproduced with qemu-kvm-5.2.0-2.module+el8.4.0+9186+ec44380f.
>
> Igor, would you please help confirm if it's qemu issue or win2016 issue?
I'll try to run it under debugger to see if it's ACPI issue.
Meanwhile could you test several other combinations?
1. did the same guest image works with RHEL8.0?
2. could you try to test with vanilla win2016 (without updates on RHEL8.0 and the latest QEMU)

Do we have the same problem with all hyper-v flags disabled?
Thanks,
Vadim.

(In reply to Igor Mammedov from comment #2)
> (In reply to Yumei Huang from comment #1)
> > Reproduced with qemu-kvm-5.2.0-2.module+el8.4.0+9186+ec44380f.
> >
> > Igor, would you please help confirm if it's qemu issue or win2016 issue?
>
> I'll try to run it under debugger to see if it's ACPI issue.
>
> Meanwhile could you test several other combinations?
> 1. did the same guest image works with RHEL8.0?
I got same results on RHEL8.0. After hotplug one vcpu,
1) sometimes guest hang immediately.
2) sometimes guest keep working, the number of processors in device manager are correct, but in task manager and cmd 'echo %NUMBER_OF_PROCESSORS%', the number is same as before hotplug. Then hotplug another one vcpu, guest still looks fine, but the processors number doesn't change in either device manager or task manager or cmd. A few minutes later, guest may hit bsod, stop code is IRQL NOT LESS OR EQUAL.
qemu-kvm-3.1.0-20.module+el8.0.0+3273+6bc1ee54.1
kernel-4.18.0-80.el8.x86_64

(In reply to Igor Mammedov from comment #2)
> (In reply to Yumei Huang from comment #1)
> 2. could you try to test with vanilla win2016 (without updates on RHEL8.0
> and the latest QEMU)
What do you mean by vanilla win2016? In my test, the iso is en_windows_server_2016_updated_feb_2018_x64_dvd_11636692.iso, it includes KB4048953 and KB4049065 two updates, and they can't be uninstalled.
As to the latest QEMU, do you mean the latest package for rhel8.0 or rhel8.4?

(In reply to Vadim Rozenfeld from comment #4)
> Do we have the same problem with all hyper-v flags disabled?
>
> Thanks,
> Vadim.
Yes, reproduced with no hyper-v flags.

(In reply to Yumei Huang from comment #6)
> (In reply to Igor Mammedov from comment #2)
> > (In reply to Yumei Huang from comment #1)
> > 2. could you try to test with vanilla win2016 (without updates on RHEL8.0
> > and the latest QEMU)
>
> What do you mean by vanilla win2016? In my test, the iso is
> en_windows_server_2016_updated_feb_2018_x64_dvd_11636692.iso, it includes
> KB4048953 and KB4049065 two updates, and they can't be uninstalled.
I meant win2016 without updates (use first released ISO image, not updated one)
> As to the latest QEMU, do you mean the latest package for rhel8.0 or rhel8.4?
8.4

(In reply to Igor Mammedov from comment #8)
> (In reply to Yumei Huang from comment #6)
> > (In reply to Igor Mammedov from comment #2)
> > > (In reply to Yumei Huang from comment #1)
> > > 2. could you try to test with vanilla win2016 (without updates on RHEL8.0
> > > and the latest QEMU)
> >
> > What do you mean by vanilla win2016? In my test, the iso is
> > en_windows_server_2016_updated_feb_2018_x64_dvd_11636692.iso, it includes
> > KB4048953 and KB4049065 two updates, and they can't be uninstalled.
>
> I meant win2016 without updates (use first released ISO image, not updated
> one)
>
> > As to the latest QEMU, do you mean the latest package for rhel8.0 or rhel8.4?
>
> 8.4
Still hit the same issue.
ISO: en_windows_server_2016_x64_dvd_9327751.iso
Windows 2016 os build version: 10.0.14393
Host kernel: 4.18.0-80.el8.x86_64
qemu-kvm-5.2.0-3.module+el8.4.0+9499+42e58f08

Amnon - moving from the virt-maint untriaged backlog into your queue for assignment

(In reply to Peixiu Hou from comment #0)
...
> Steps to Reproduce:
...
> 2. Try to install 'HID Button over Interrupt Driver':
> D:\devcon\win7_amd64\devcon.exe install C:\Windows\INF\machine.inf
> "ACPI\VEN_ACPI&DEV_0010"
why do you do that?
it doesn't reproduce on my host.
it's running 4.18.0-249.el8.x86_64 kernel and host CPU is older than skylake.
Can you try with that kernel and also try to change CPU model to haswell to see if that helps.
If it doesn't help, I'll need access to the host where it reproduces.

(In reply to Yumei Huang from comment #13)
> (In reply to Igor Mammedov from comment #12)
> >
> > (In reply to Peixiu Hou from comment #0)
> > ...
> > > Steps to Reproduce:
> > ...
> > > 2. Try to install 'HID Button over Interrupt Driver':
> > > D:\devcon\win7_amd64\devcon.exe install C:\Windows\INF\machine.inf
> > > "ACPI\VEN_ACPI&DEV_0010"
> >
> > why do you do that?
>
> It's a workaround for windows 2016. "Processors" only appears in device
> manager after install the driver.
So I've tried several images on several AMD hosts.
above command might work or might not depending on version (and installed updates)
Instead I've used
https://bugzilla.redhat.com/show_bug.cgi?id=1377155#c33
or alternatively CLI Yan's
https://bugzilla.redhat.com/show_bug.cgi?id=1377155#c17
> > it doesn't reproduce on my host.
> > it's running 4.18.0-249.el8.x86_64 kernel and host CPU is older than skylake.
> >
> >
> > Can you try with that kernel and also try to change CPU model to haswell to
> > see if that helps.
> > If it doesn't help, I'll need access to the host where it reproduces.
>
> Tried with kernel -249, it's reproducible. And I tested with other cpu
> models before, including haswell, it's not helping.
> ...
I'm done with testing on your host.
As mentioned previously it worked for me on RHEL8.4 Intel host, yours is AMD though. So I've tried clean install using en_windows_server_2016_x64_dvd_9718492.iso on your host and another AMD host.
Both worked fine wrt CPU hotplug (after replacing HID driver with 'Generic Bus')
Then I've updated guest with latest updates from MS, and even after that it still worked as expected. (modulo updates returned HID driver, so I had to replace driver again)
Then I've copied your guest image to Intel host, and it still has reboot problem.
At this point I'd rule out QEMU and KVM as culprit, it seems that your guest image is the problem.
I can suggest to start testing from clean install and then repeating all configuration steps with testing after each step. (perhaps there is an issue with installed software/drivers)
PS:
You are testing on AMD host an Intel CPU model (it might work but it also might confuse guest, I'd advise not doing so) and what you get only approximation of that CPU model as many feature bits aren't available (as you can see at QEMU start up) so request CPU model isn't actually tested.
PS:
your guest images crashes in a way that it doesn't trap into attached debugger (so I have no idea where it crashes), perhaps windows drivers team has more experience debugging Windows.
Vadim, can you take this BZ (or reassign it to someone in your team), to see if it's virtio-win or Windows issue.

Can QE try dumping the VM memory into a file with dump-guest-memory command the next time when VM hangs out? Can I also see the system event log for review?
Thanks,
Vadim.

(In reply to Igor Mammedov from comment #14)
> (In reply to Yumei Huang from comment #13)
> > (In reply to Igor Mammedov from comment #12)
> > >
> > > (In reply to Peixiu Hou from comment #0)
> > > ...
> > > > Steps to Reproduce:
> > > ...
> > > > 2. Try to install 'HID Button over Interrupt Driver':
> > > > D:\devcon\win7_amd64\devcon.exe install C:\Windows\INF\machine.inf
> > > > "ACPI\VEN_ACPI&DEV_0010"
> > >
> > > why do you do that?
> >
> > It's a workaround for windows 2016. "Processors" only appears in device
> > manager after install the driver.
>
> So I've tried several images on several AMD hosts.
>
> above command might work or might not depending on version (and installed
> updates)
> Instead I've used
> https://bugzilla.redhat.com/show_bug.cgi?id=1377155#c33
> or alternatively CLI Yan's
> https://bugzilla.redhat.com/show_bug.cgi?id=1377155#c17
>
> > > it doesn't reproduce on my host.
> > > it's running 4.18.0-249.el8.x86_64 kernel and host CPU is older than skylake.
> > >
> > >
> > > Can you try with that kernel and also try to change CPU model to haswell to
> > > see if that helps.
> > > If it doesn't help, I'll need access to the host where it reproduces.
> >
> > Tried with kernel -249, it's reproducible. And I tested with other cpu
> > models before, including haswell, it's not helping.
> >
> ...
>
> I'm done with testing on your host.
>
> As mentioned previously it worked for me on RHEL8.4 Intel host,
> yours is AMD though. So I've tried clean install using
> en_windows_server_2016_x64_dvd_9718492.iso
> on your host and another AMD host.
> Both worked fine wrt CPU hotplug (after replacing HID driver with 'Generic
> Bus')
> Then I've updated guest with latest updates from MS, and even after that it
> still worked as expected. (modulo updates returned HID driver, so I had to
> replace driver again)
>
> Then I've copied your guest image to Intel host,
> and it still has reboot problem.
>
> At this point I'd rule out QEMU and KVM as culprit,
> it seems that your guest image is the problem.
>
> I can suggest to start testing from clean install and then repeating
> all configuration steps with testing after each step.
> (perhaps there is an issue with installed software/drivers)
>
> PS:
> You are testing on AMD host an Intel CPU model (it might work but it also
> might confuse guest, I'd advise not doing so)
> and what you get only approximation of that CPU model as many feature bits
> aren't available (as you can see at QEMU start up)
> so request CPU model isn't actually tested.
Thanks for all the work. Just wanna clarify that I tested on both Intel and AMD hosts with corresponding CPU models. This host is just one of them, AMD cpu model is used when testing on this host. Thanks.
>
> PS:
> your guest images crashes in a way that it doesn't trap into attached
> debugger (so I have no idea where it crashes),
> perhaps windows drivers team has more experience debugging Windows.

(In reply to Vadim Rozenfeld from comment #16)
> Can QE try dumping the VM memory into a file with dump-guest-memory
> command the next time when VM hangs out? Can I also see the system event
> log for review?
>
> Thanks,
> Vadim.
Hi Peixiu,
Would you please take this bug and help handle the request? It seems to be virtio-win or Windows issue, and you have more experience in windows.
Thanks!

Hit the same issue on Hyper_v on 2016,
pkg:
kernel-4.18.0-325.el8.x86_64
qemu-kvm-4.2.0-56.module+el8.5.0+12039+0434c559.x86_64
seabios-bin-1.13.0-2.module+el8.3.0+7353+9de0a3cc.noarch
virtio-win-prewhql-205
RHEL-8.5.0-20210730.n.0
How reproducible:
10/10
autocase:
hv_cpu_hotplug
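
Comment 16 asks QE to capture guest memory with dump-guest-memory the next time the VM hangs. As a rough sketch only, the QMP form of that request could be built as below and sent over the same QMP socket used in the reproduction steps (`-qmp tcp:0:4445`). The helper name `build_guest_dump` and the output path are hypothetical; `paging`, `protocol`, and `detach` are the QMP arguments of `dump-guest-memory`, and the default ELF format is assumed.

```python
import json


def build_guest_dump(path, paging=False, detach=False):
    """Build a QMP dump-guest-memory command that writes to a host file.

    path   -- host file that receives the dump (hypothetical location)
    paging -- False dumps raw guest-physical memory, which does not rely
              on the (possibly corrupted) guest page tables
    detach -- True lets QMP return immediately while the dump continues
    """
    arguments = {"paging": paging, "protocol": "file:" + path}
    if detach:
        arguments["detach"] = True
    return {"execute": "dump-guest-memory", "arguments": arguments}


# Hypothetical dump location; choose a host path with enough free space
# for the full guest RAM (14336 MiB in this report).
dump_cmd = build_guest_dump("/var/tmp/win2016-hang.dump")
print(json.dumps(dump_cmd))
```

With `detach=True`, progress can be polled with the QMP command `query-dump`.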
Hit the same issue on vioscsi on win2012,
pkg:
kernel-4.18.0-330.el8.x86_64
qemu-kvm-6.0.0-28.module+el8.5.0+12271+fffa967b.x86_64
seabios-bin-1.13.0-2.module+el8.3.0+7353+9de0a3cc.noarch
RHEL-8.5.0-20210816.n.0
How reproducible:
100%

Hit the same issue when running vioscsi function testing on 2016,
pkg:
qemu-kvm-6.0.0-28.module+el8.5.0+12271+fffa967b.x86_64
kernel-4.18.0-330.el8.x86_64
seabios-1.14.0-1.module+el8.4.0+8855+a9e237a9.x86_64
RHEL-8.5.0-20210816.n.0
case:
cpu_device_hotpluggable.hotplug.single_vcpu.with_reboot.shell_reboot
(In reply to ChenNana from comment #26)
> Hit the same issue on vioscsi on win2012,
>
> pkg:
> kernel-4.18.0-330.el8.x86_64
> qemu-kvm-6.0.0-28.module+el8.5.0+12271+fffa967b.x86_64
> seabios-bin-1.13.0-2.module+el8.3.0+7353+9de0a3cc.noarch
> RHEL-8.5.0-20210816.n.0
>
> How reproducible:
> 100%
Reproduced it three times manually, and this problem occurred (guest will hang after plugging). The three versions are:
1.virtio-win-prewhql-0.1-207.iso
2.virtio-win-prewhql-0.1-204.iso
3.virtio-win-1.9.17-4.el8_4.iso

(In reply to ChenNana from comment #28)
> (In reply to ChenNana from comment #26)
> > Hit the same issue on vioscsi on win2012,
> >
> > pkg:
> > kernel-4.18.0-330.el8.x86_64
> > qemu-kvm-6.0.0-28.module+el8.5.0+12271+fffa967b.x86_64
> > seabios-bin-1.13.0-2.module+el8.3.0+7353+9de0a3cc.noarch
> > RHEL-8.5.0-20210816.n.0
> >
> > How reproducible:
> > 100%
>
> Reproduced it three times manually, and this problem occurred (guest will
> hang after plugging). The three versions are:
> 1.virtio-win-prewhql-0.1-207.iso
> 2.virtio-win-prewhql-0.1-204.iso
> 3.virtio-win-1.9.17-4.el8_4.iso
Please ignore this comment, my problem is not the same as this one

Bulk update: Move RHEL-AV bugs to RHEL9. If necessary to resolve in RHEL8, then clone to the current RHEL8 release.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Didn't reproduce this issue with win2022.
Test Env:
dell-per750-22.lab.eng.pek2.redhat.com
5.14.0-162.el9.x86_64
qemu-kvm-7.1.0-1.el9.x86_64
edk2-ovmf-20220526git16779ede2d36-3.el9.noarch
windows_server_2022_x64_testsigned_enable_dvd.iso
Hi Peixiu,
Did you meet such issue with newer windows version than win2016?
Thanks.
Best regards
Nana Liu

(In reply to liunana from comment #37)
> Didn't reproduce this issue with win2022.
>
> Test Env:
> dell-per750-22.lab.eng.pek2.redhat.com
> 5.14.0-162.el9.x86_64
> qemu-kvm-7.1.0-1.el9.x86_64
> edk2-ovmf-20220526git16779ede2d36-3.el9.noarch
> windows_server_2022_x64_testsigned_enable_dvd.iso
>
> Hi Peixiu,
>
> Did you meet such issue with newer windows version than win2016?
> Thanks.
>
Hi Nana,
Not hit this issue on newer windows version~
Thanks~
Peixiu
>
> Best regards
> Nana Liu

Hi Gabi and Jiri,
Looks good to me, thanks a lot~
Best Regards~
Peixiu |
Description of problem:
Tested with -m 14336 on win2016 guest vm.
1. Windows 2016 guest will reboot immediately after hot plug a vcpu, checked the installed windows updates as: KB4048953, KB4049065.
2. Checked the available updates and install them, then tried to test again, windows 2016 guest will hang after hot plug a vcpu, checked the installed windows updates as: KB4576750, KB4593226, KB4598243.
3. Checked the available updates again, found there has a 2021-1-11 updated patch "2021-01 Cumulative Update for Windows Server 2016 for x64-based Systems (KB4598243)" and some others, downloaded and installed them all, tried to test again, after hot plug a vcpu, wait a few minutes or try to reboot the guest, windows 2016 guest will hang or quit. Checked the installed windows updates as: KB4598243,KB4535680,KB4576750,KB4049065

Version-Release number of selected component (if applicable):
kernel-4.18.0-240.5.1.el8_3.x86_64
qemu-kvm-5.1.0-15.module+el8.3.1+8772+a3fdeccd.x86_64
virtio-win-1.9.15-0.el8
seabios-1.14.0-1.module+el8.3.0+7638+07cf13d2.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot the guest with qemu commands:
---------------------------------------------------------------------------------------------
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -sandbox on \
    -machine pc \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2 \
    -m 14336 \
    -smp 2,maxcpus=4,cores=2,threads=1,dies=1,sockets=2 \
    -cpu 'Skylake-Server',hv_stimer,hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0xfff,hv_vapic,hv_time,hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,+kvm_pv_unhalt \
    -device pvpanic,ioport=0x505,id=idASHu6b \
    -device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,num_queues=4,bus=pci.0,addr=0x4 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/win2016-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device virtio-net-pci,mac=45:df:62:20:54:41,id=ida7fdGT,netdev=idsWmwuh,bus=pci.0,addr=0x5 \
    -netdev tap,id=idsWmwuh,vhost=on \
    -blockdev node-name=file_cd1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/iso/windows/winutils.iso,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_cd1,driver=raw,read-only=on,cache.direct=on,cache.no-flush=off,file=file_cd1 \
    -device scsi-cd,id=cd1,drive=drive_cd1,write-cache=on \
    -vnc :0 \
    -rtc base=localtime,clock=host,driftfix=slew \
    -boot menu=off,order=cdn,once=c,strict=off \
    -qmp tcp:0:4445,server,nowait \
    -enable-kvm \
    -monitor stdio \
    -cdrom /home/kvm_autotest_root/iso/windows/virtio-win-1.9.15-0.el8.iso
2. Try to install 'HID Button over Interrupt Driver':
D:\devcon\win7_amd64\devcon.exe install C:\Windows\INF\machine.inf "ACPI\VEN_ACPI&DEV_0010"
3. #telnet host_ip 4445
{'execute': 'qmp_capabilities', 'id': 'i7jIHH13'}
{"return": {}, "id": "i7jIHH13"}
{'execute': 'device_add', 'arguments': {'id':'vcpu1','driver': 'Skylake-Server-x86_64-cpu', 'socket-id':1, 'die-id': 0, 'core-id':0, 'thread-id':0},'id': 'UBjvk5E2'}
4. Check vcpu number inside guest.
5. Check the guest status (may need a few minutes or can try to reboot the guest), will reboot/hang/quit.

Actual results:
Windows 2016 reboot/hang/quit after hot plug a vcpu

Expected results:
No reboot/hang/quit happen after cpu hotplug

Additional info:
Tried on latest versions, also hit this issue. versions info as:
kernel-4.18.0-240.11.1.el8_3.x86_64
qemu-kvm-5.1.0-17.module+el8.3.1+9213+7ace09c3.x86_64
seabios-bin-1.14.0-1.module+el8.3.0+7638+07cf13d2.noarch
virtio-win-1.9.15-0.el8
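
Step 3 of the reproduction types the QMP commands by hand over telnet. As a minimal sketch, assuming the same `-smp 2,maxcpus=4,cores=2,threads=1,dies=1,sockets=2` topology from step 1, the same exchange could be generated with a small Python helper. The names `build_vcpu_hotplug` and `to_wire` are illustrative and not part of any QEMU tooling. The `query-hotpluggable-cpus` command is an addition not present in the original steps; it is included only as a way to confirm which socket/core slots are still free before calling `device_add`.

```python
import json


def build_vcpu_hotplug(vcpu_id, driver, socket_id, core_id, thread_id=0, die_id=0):
    """Build the QMP device_add command that hot-plugs a single vCPU.

    The topology properties must name a slot that is not yet populated;
    with -smp 2,...,cores=2,sockets=2 the boot vCPUs occupy socket 0,
    so socket 1 / core 0 is free, matching the command in the report.
    """
    return {
        "execute": "device_add",
        "arguments": {
            "id": vcpu_id,
            "driver": driver,
            "socket-id": socket_id,
            "die-id": die_id,
            "core-id": core_id,
            "thread-id": thread_id,
        },
    }


def to_wire(command):
    """Serialize one QMP command as a single JSON line."""
    return json.dumps(command) + "\n"


# QMP requires capability negotiation before any other command.
handshake = {"execute": "qmp_capabilities"}
# Optional sanity check (not in the original steps): list free vCPU slots.
list_slots = {"execute": "query-hotpluggable-cpus"}
# Same device_add as step 3 of the report.
hotplug = build_vcpu_hotplug("vcpu1", "Skylake-Server-x86_64-cpu",
                             socket_id=1, core_id=0)

for command in (handshake, list_slots, hotplug):
    print(to_wire(command), end="")
```

The printed lines can be pasted into the telnet session on port 4445, or written to a socket connected to the same address.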