Bug 1915715 - Windows 2016 guest will reboot/hang/quit after hot-plugging a vCPU with large memory
Summary: Windows 2016 guest will reboot/hang/quit after hot-plugging a vCPU with large memory
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: unspecified
Hardware: x86_64
OS: Windows
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Vadim Rozenfeld
QA Contact: liunana
Docs Contact: Jiri Herrmann
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-01-13 10:15 UTC by Peixiu Hou
Modified: 2023-05-03 16:54 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
.Windows Server 2016 VMs sometimes stop working after hot-plugging a vCPU
Currently, assigning a vCPU to a running virtual machine (VM) with a Windows Server 2016 guest operating system might cause a variety of problems, such as the VM terminating unexpectedly, becoming unresponsive, or rebooting.
Clone Of:
Environment:
Last Closed: 2022-07-13 07:28:26 UTC
Type: ---
Target Upstream Version:
Embargoed:



Description Peixiu Hou 2021-01-13 10:15:32 UTC
Description of problem:
Tested with -m 14336 (14 GiB of guest memory) on a win2016 guest VM.
1. The Windows 2016 guest reboots immediately after hot-plugging a vCPU. Installed Windows updates: KB4048953, KB4049065.
2. After checking for available updates, installing them, and testing again, the Windows 2016 guest hangs after hot-plugging a vCPU. Installed Windows updates: KB4576750, KB4593226, KB4598243.
3. Checking for available updates again, I found a patch updated on 2021-01-11, "2021-01 Cumulative Update for Windows Server 2016 for x64-based Systems (KB4598243)", and some others. After downloading and installing them all and testing again, the Windows 2016 guest hangs or quits after hot-plugging a vCPU, either after waiting a few minutes or when trying to reboot it. Installed Windows updates: KB4598243, KB4535680, KB4576750, KB4049065.

Version-Release number of selected component (if applicable):
kernel-4.18.0-240.5.1.el8_3.x86_64
qemu-kvm-5.1.0-15.module+el8.3.1+8772+a3fdeccd.x86_64
virtio-win-1.9.15-0.el8
seabios-1.14.0-1.module+el8.3.0+7638+07cf13d2.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot the guest with the following QEMU command line:
---------------------------------------------------------------------------------------------
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2 \
    -m 14336  \
    -smp 2,maxcpus=4,cores=2,threads=1,dies=1,sockets=2  \
    -cpu 'Skylake-Server',hv_stimer,hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0xfff,hv_vapic,hv_time,hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,+kvm_pv_unhalt \
    -device pvpanic,ioport=0x505,id=idASHu6b \
    -device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,num_queues=4,bus=pci.0,addr=0x4 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/win2016-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device virtio-net-pci,mac=45:df:62:20:54:41,id=ida7fdGT,netdev=idsWmwuh,bus=pci.0,addr=0x5  \
    -netdev tap,id=idsWmwuh,vhost=on \
    -blockdev node-name=file_cd1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/iso/windows/winutils.iso,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_cd1,driver=raw,read-only=on,cache.direct=on,cache.no-flush=off,file=file_cd1 \
    -device scsi-cd,id=cd1,drive=drive_cd1,write-cache=on  \
    -vnc :0  \
    -rtc base=localtime,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -qmp tcp:0:4445,server,nowait \
    -enable-kvm \
    -monitor stdio \
    -cdrom /home/kvm_autotest_root/iso/windows/virtio-win-1.9.15-0.el8.iso

2. Try to install 'HID Button over Interrupt Driver':
D:\devcon\win7_amd64\devcon.exe install C:\Windows\INF\machine.inf "ACPI\VEN_ACPI&DEV_0010"

3. Connect to the QMP port and hot-plug a vCPU:
# telnet host_ip 4445
{'execute': 'qmp_capabilities', 'id': 'i7jIHH13'}
{"return": {}, "id": "i7jIHH13"}
{'execute': 'device_add', 'arguments': {'id':'vcpu1','driver': 'Skylake-Server-x86_64-cpu', 'socket-id':1, 'die-id': 0, 'core-id':0, 'thread-id':0},'id': 'UBjvk5E2'}

4. Check the vCPU count inside the guest.
5. Check the guest status (this may take a few minutes, or try rebooting the guest): the guest reboots, hangs, or quits.
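Steps 3 and 4 can also be driven from a script instead of telnet. The Python sketch below is a hypothetical helper: the names `cpu_slots`, `first_free_slot`, `device_add_cmd`, and `qmp_send` are illustrative and not part of avocado-vt or QEMU. It assumes QEMU's x86 convention of filling boot CPUs with thread-id varying fastest and socket-id slowest. Under that assumption, `-smp 2,maxcpus=4,cores=2,threads=1,dies=1,sockets=2` leaves socket-id 1, core-id 0 as the first free slot, which matches the `device_add` arguments above. QEMU's own `query-hotpluggable-cpus` QMP command remains the authoritative source for free slots.

```python
import itertools
import json
import socket


def cpu_slots(sockets, dies, cores, threads):
    """Enumerate (socket, die, core, thread) slots, thread varying fastest.

    Assumption: this mirrors the order in which QEMU's x86 machines assign
    boot CPUs; verify with the query-hotpluggable-cpus QMP command.
    """
    return list(itertools.product(range(sockets), range(dies),
                                  range(cores), range(threads)))


def first_free_slot(smp, sockets, dies, cores, threads):
    """Return the first slot not occupied by the `smp` boot CPUs."""
    slots = cpu_slots(sockets, dies, cores, threads)
    if smp >= len(slots):
        raise ValueError("no free slot: smp already equals maxcpus")
    return slots[smp]


def device_add_cmd(driver, slot, dev_id, cmd_id):
    """Build the QMP device_add command used in step 3."""
    socket_id, die_id, core_id, thread_id = slot
    return {
        "execute": "device_add",
        "arguments": {"id": dev_id, "driver": driver,
                      "socket-id": socket_id, "die-id": die_id,
                      "core-id": core_id, "thread-id": thread_id},
        "id": cmd_id,
    }


def qmp_send(host, port, commands):
    """Minimal QMP client: read the greeting, negotiate capabilities,
    then send each command and return its reply (events are skipped)."""
    replies = []
    with socket.create_connection((host, port)) as sock:
        with sock.makefile("rw", encoding="utf-8", newline="\n") as stream:

            def read_message():
                line = stream.readline()
                if not line:
                    raise ConnectionError("QMP connection closed")
                return json.loads(line)

            read_message()  # server greeting: {"QMP": {...}}
            for cmd in [{"execute": "qmp_capabilities"}] + commands:
                stream.write(json.dumps(cmd) + "\n")
                stream.flush()
                msg = read_message()
                while "event" in msg:  # skip asynchronous events
                    msg = read_message()
                replies.append(msg)
    return replies


# Topology taken from the -smp option in step 1.
slot = first_free_slot(2, sockets=2, dies=1, cores=2, threads=1)
cmd = device_add_cmd("Skylake-Server-x86_64-cpu", slot, "vcpu1", "UBjvk5E2")
print(json.dumps(cmd))
```

To actually send the command, call `qmp_send("host_ip", 4445, [cmd])` against the `-qmp tcp:0:4445,server,nowait` socket from step 1. Reading one JSON object per line relies on QMP's default compact (non-pretty) output.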

Actual results:
The Windows 2016 guest reboots, hangs, or quits after a vCPU is hot-plugged.

Expected results:
No reboot, hang, or quit occurs after CPU hotplug.
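Because the command line in step 1 includes a `pvpanic` device and a QMP socket, the expected result can be partly checked from the host by watching QMP events. A minimal sketch, assuming the events were already collected from the QMP stream after the hotplug: `RESET`, `SHUTDOWN`, and `GUEST_PANICKED` are real QMP event names, but mapping them to this report's reboot/quit/crash wording, the helper names, and the sample events are illustrative. A hang emits no event, so it still needs a guest-side liveness check (for example, a shell command with a timeout).

```python
# Map QMP event names to the failure modes described in this report.
# The mapping is an interpretation, not something QEMU defines.
FAILURE_EVENTS = {
    "RESET": "reboot",
    "SHUTDOWN": "quit",
    "GUEST_PANICKED": "crash",
}


def failures(events):
    """Return the failure modes found in a list of QMP event messages."""
    return [FAILURE_EVENTS[msg["event"]]
            for msg in events
            if msg.get("event") in FAILURE_EVENTS]


def hotplug_passed(events):
    """True if no reboot/quit/crash event followed the hotplug.

    A guest hang produces no QMP event, so a passing result here does
    not rule out a hang.
    """
    return not failures(events)


# Hypothetical events collected after the device_add in step 3.
sample = [
    {"event": "RTC_CHANGE", "data": {"offset": 0}},
    {"event": "RESET", "data": {"guest": True}},
]
print(failures(sample), hotplug_passed(sample))
```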

Additional info:
Also reproduced with the latest versions:
kernel-4.18.0-240.11.1.el8_3.x86_64
qemu-kvm-5.1.0-17.module+el8.3.1+9213+7ace09c3.x86_64
seabios-bin-1.14.0-1.module+el8.3.0+7638+07cf13d2.noarch
virtio-win-1.9.15-0.el8

Comment 1 Yumei Huang 2021-01-18 11:07:58 UTC
Reproduced with qemu-kvm-5.2.0-2.module+el8.4.0+9186+ec44380f.

Igor, would you please help confirm if it's qemu issue or win2016 issue?

Comment 2 Igor Mammedov 2021-01-18 19:49:43 UTC
(In reply to Yumei Huang from comment #1)
> Reproduced with qemu-kvm-5.2.0-2.module+el8.4.0+9186+ec44380f.
> 
> Igor, would you please help confirm if it's qemu issue or win2016 issue?

I'll try to run it under debugger to see if it's ACPI issue.

Meanwhile could you test several other combinations?
1. did the same guest image works with RHEL8.0?
2. could you try to test with vanilla win2016 (without updates on RHEL8.0 and the latest QEMU)

Comment 4 Vadim Rozenfeld 2021-01-20 10:42:31 UTC
Do we have the same problem with all hyper-v flags disabled? 

Thanks,
Vadim.

Comment 5 Yumei Huang 2021-01-21 07:11:50 UTC
(In reply to Igor Mammedov from comment #2)
> (In reply to Yumei Huang from comment #1)
> > Reproduced with qemu-kvm-5.2.0-2.module+el8.4.0+9186+ec44380f.
> > 
> > Igor, would you please help confirm if it's qemu issue or win2016 issue?
> 
> I'll try to run it under debugger to see if it's ACPI issue.
> 
> Meanwhile could you test several other combinations?
> 1. did the same guest image works with RHEL8.0?

I got same results on RHEL8.0. 

After hot-plugging one vCPU:
1) Sometimes the guest hangs immediately.
2) Sometimes the guest keeps working, and the number of processors in Device Manager is correct, but Task Manager and the cmd command 'echo %NUMBER_OF_PROCESSORS%' still show the same number as before the hotplug. After hot-plugging another vCPU, the guest still looks fine, but the processor count doesn't change in Device Manager, Task Manager, or cmd. A few minutes later, the guest may hit a BSOD with stop code IRQL NOT LESS OR EQUAL.

qemu-kvm-3.1.0-20.module+el8.0.0+3273+6bc1ee54.1 
kernel-4.18.0-80.el8.x86_64
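The mismatch described in 2) can be made explicit by comparing the host's view of the vCPUs with each guest-side count. A sketch, assuming the host count comes from QEMU's `query-cpus-fast` QMP command (which returns one entry per vCPU) and the guest counts are entered by hand; the helper name and the simplified sample reply are hypothetical (real entries carry more fields).

```python
def mismatched_counts(query_cpus_fast_reply, guest_counts):
    """Return the guest-side sources whose CPU count differs from the host.

    query_cpus_fast_reply: QMP reply dict, {"return": [one entry per vCPU]}.
    guest_counts: mapping of source name to observed count in the guest.
    """
    host_count = len(query_cpus_fast_reply["return"])
    return {source: count for source, count in guest_counts.items()
            if count != host_count}


# Hypothetical reply after hot-plugging one vCPU into a 2-vCPU guest.
reply = {"return": [{"cpu-index": i} for i in range(3)]}
observed = {"Device Manager": 3, "Task Manager": 2, "NUMBER_OF_PROCESSORS": 2}
print(mismatched_counts(reply, observed))
```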

Comment 6 Yumei Huang 2021-01-21 07:29:34 UTC
(In reply to Igor Mammedov from comment #2)
> (In reply to Yumei Huang from comment #1)
> 2. could you try to test with vanilla win2016 (without updates on RHEL8.0
> and the latest QEMU)

What do you mean by vanilla win2016?  In my test, the iso is en_windows_server_2016_updated_feb_2018_x64_dvd_11636692.iso, it includes KB4048953 and KB4049065 two updates, and they can't be uninstalled.
As to the latest QEMU, do you mean the latest package for rhel8.0 or rhel8.4?

Comment 7 Yumei Huang 2021-01-21 07:34:30 UTC
(In reply to Vadim Rozenfeld from comment #4)
> Do we have the same problem with all hyper-v flags disabled? 
> 
> Thanks,
> Vadim.

Yes, reproduced with no hyper-v flags.

Comment 8 Igor Mammedov 2021-01-21 09:47:39 UTC
(In reply to Yumei Huang from comment #6)
> (In reply to Igor Mammedov from comment #2)
> > (In reply to Yumei Huang from comment #1)
> > 2. could you try to test with vanilla win2016 (without updates on RHEL8.0
> > and the latest QEMU)
> 
> What do you mean by vanilla win2016?  In my test, the iso is
> en_windows_server_2016_updated_feb_2018_x64_dvd_11636692.iso, it includes
> KB4048953 and KB4049065 two updates, and they can't be uninstalled.

I meant win2016 without updates (use first released ISO image, not updated one)

> As to the latest QEMU, do you mean the latest package for rhel8.0 or rhel8.4?

8.4

Comment 9 Yumei Huang 2021-01-22 05:53:30 UTC
(In reply to Igor Mammedov from comment #8)
> (In reply to Yumei Huang from comment #6)
> > (In reply to Igor Mammedov from comment #2)
> > > (In reply to Yumei Huang from comment #1)
> > > 2. could you try to test with vanilla win2016 (without updates on RHEL8.0
> > > and the latest QEMU)
> > 
> > What do you mean by vanilla win2016?  In my test, the iso is
> > en_windows_server_2016_updated_feb_2018_x64_dvd_11636692.iso, it includes
> > KB4048953 and KB4049065 two updates, and they can't be uninstalled.
> 
> I meant win2016 without updates (use first released ISO image, not updated
> one)
> 
> > As to the latest QEMU, do you mean the latest package for rhel8.0 or rhel8.4?
> 
> 8.4

Still hit the same issue.

ISO: en_windows_server_2016_x64_dvd_9327751.iso 
Windows 2016 os build version: 10.0.14393
Host kernel: 4.18.0-80.el8.x86_64
qemu-kvm-5.2.0-3.module+el8.4.0+9499+42e58f08

Comment 10 John Ferlan 2021-01-22 13:26:45 UTC
Amnon - moving from the virt-maint untriaged backlog into your queue for assignment

Comment 12 Igor Mammedov 2021-01-26 00:58:13 UTC

(In reply to Peixiu Hou from comment #0)
...
> Steps to Reproduce:
...
> 2. Try to install 'HID Button over Interrupt Driver':
> D:\devcon\win7_amd64\devcon.exe install C:\Windows\INF\machine.inf
> "ACPI\VEN_ACPI&DEV_0010"

why do you do that?

it doesn't reproduce on my host.
it's running 4.18.0-249.el8.x86_64 kernel and host CPU is older than skylake.


Can you try with that kernel and also try to change CPU model to haswell to see if that helps.
If it doesn't help, I'll need access to the host where it reproduces.

Comment 14 Igor Mammedov 2021-01-29 17:35:46 UTC
(In reply to Yumei Huang from comment #13)
> (In reply to Igor Mammedov from comment #12)
> > 
> > (In reply to Peixiu Hou from comment #0)
> > ...
> > > Steps to Reproduce:
> > ...
> > > 2. Try to install 'HID Button over Interrupt Driver':
> > > D:\devcon\win7_amd64\devcon.exe install C:\Windows\INF\machine.inf
> > > "ACPI\VEN_ACPI&DEV_0010"
> > 
> > why do you do that?
> 
> It's a workaround for windows 2016. "Processors" only appears in device
> manager after install the driver.

So I've tried several images on several AMD hosts.

above command might work or might not depending on version (and installed updates)
Instead I've used 
https://bugzilla.redhat.com/show_bug.cgi?id=1377155#c33
or alternatively Yan's CLI approach:
https://bugzilla.redhat.com/show_bug.cgi?id=1377155#c17


> > it doesn't reproduce on my host.
> > it's running 4.18.0-249.el8.x86_64 kernel and host CPU is older than skylake.
> > 
> > 
> > Can you try with that kernel and also try to change CPU model to haswell to
> > see if that helps.
> > If it doesn't help, I'll need access to the host where it reproduces.
> 
> Tried with kernel -249, it's reproducible. And I tested with other cpu
> models before, including haswell, it's not helping.
> 
...

I'm done with testing on your host.

As mentioned previously it worked for me on RHEL8.4 Intel host,
yours is AMD though. So I've tried a clean install using
   en_windows_server_2016_x64_dvd_9718492.iso
on your host and another AMD host.
Both worked fine wrt CPU hotplug (after replacing HID driver with 'Generic Bus')
Then I've updated guest with latest updates from MS, and even after that it
still worked as expected. (modulo updates returned HID driver, so I had to replace driver again)

Then I've copied your guest image to Intel host,
and it still has reboot problem.

At this point I'd rule out QEMU and KVM as culprit,
it seems that your guest image is the problem.

I can suggest to start testing from clean install and then repeating
all configuration steps with testing after each step.
(perhaps there is an issue with installed software/drivers)
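The suggested procedure (clean install, then re-apply the configuration steps one at a time with a test after each) is a linear search for the first step that breaks hotplug. A minimal sketch with hypothetical callbacks: `apply_step` and `hotplug_ok` stand in for whatever actually reconfigures the guest and runs the reproduction steps, and the step names are placeholders.

```python
def first_breaking_step(steps, apply_step, hotplug_ok):
    """Re-apply configuration steps in order, testing hotplug after each.

    Returns None if the clean install and every step pass, the string
    "clean install" if the baseline already fails, or the first step
    after which the hotplug test fails.
    """
    if not hotplug_ok():
        return "clean install"
    for step in steps:
        apply_step(step)
        if not hotplug_ok():
            return step
    return None


# Simulated run in which the hypothetical "step B" breaks hotplug.
applied = []
result = first_breaking_step(
    ["step A", "step B", "step C"],
    apply_step=applied.append,
    hotplug_ok=lambda: "step B" not in applied,
)
print(result, applied)
```

A linear search fits here better than bisection because the steps are cumulative: later steps may depend on earlier ones, and bisecting would require rebuilding an arbitrary prefix of the configuration from scratch.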

PS:
You are testing an Intel CPU model on an AMD host (it might work, but it might also confuse the guest; I'd advise against doing so),
and what you get is only an approximation of that CPU model, as many feature bits aren't available (as you can see at QEMU startup),
so the requested CPU model isn't actually tested.
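One way to avoid the mismatch described above is to pick the CPU model from the host vendor (or to use `-cpu host`). A sketch, assuming a Linux host whose `/proc/cpuinfo` has a `vendor_id` line; the vendor-to-model table is a hypothetical example. `Skylake-Server` and `EPYC` are QEMU model names, but which model fits a given host still has to be checked, for example with `/usr/libexec/qemu-kvm -cpu help`.

```python
# Hypothetical vendor-to-model table; adjust to models the host supports.
MODEL_BY_VENDOR = {
    "GenuineIntel": "Skylake-Server",
    "AuthenticAMD": "EPYC",
}


def host_vendor(cpuinfo_text):
    """Return the first vendor_id value from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "vendor_id":
            return value.strip()
    return None


def matching_cpu_model(cpuinfo_text):
    """Pick a CPU model from the same vendor as the host."""
    vendor = host_vendor(cpuinfo_text)
    if vendor not in MODEL_BY_VENDOR:
        raise ValueError(f"unknown host vendor: {vendor!r}")
    return MODEL_BY_VENDOR[vendor]


# Sample excerpt in /proc/cpuinfo format from an AMD host.
sample_cpuinfo = "processor\t: 0\nvendor_id\t: AuthenticAMD\ncpu family\t: 23\n"
print(matching_cpu_model(sample_cpuinfo))
```

On a real host, pass the contents of `/proc/cpuinfo` instead of the sample string.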

PS:
your guest image crashes in a way that doesn't trap into the attached debugger (so I have no idea where it crashes);
perhaps the Windows drivers team has more experience debugging Windows.

Comment 15 Igor Mammedov 2021-01-29 17:41:23 UTC
Vadim,

can you take this BZ (or reassign it to someone in your team),
to see if it's virtio-win or Windows issue.

Comment 16 Vadim Rozenfeld 2021-01-30 04:14:23 UTC
Can QE try dumping the VM memory into a file with dump-guest-memory
command the next time the VM hangs? Can I also see the system event
log for review?

Thanks,
Vadim.
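For the memory dump requested above, the QMP form of `dump-guest-memory` takes `paging` and `protocol` (`file:<path>`), plus optional arguments such as `detach` and `format`. A sketch that only builds the command, assuming it is then sent over the QMP socket opened with `-qmp tcp:0:4445,server,nowait`; the helper name and dump path are hypothetical. The accepted `format` values depend on the QEMU version (newer versions include `win-dmp`, a Windows crash-dump format that may need additional guest-side setup), so check the QMP reference for the build in use.

```python
import json

# Format names accepted by recent QEMU versions (not exhaustive).
KNOWN_FORMATS = {"elf", "kdump-zlib", "kdump-lzo", "kdump-snappy", "win-dmp"}


def dump_guest_memory_cmd(path, fmt=None, detach=True):
    """Build a QMP dump-guest-memory command that writes to a host file.

    paging stays False because QEMU refuses paging with the non-ELF
    formats. With detach=True the command returns immediately, and
    progress can be polled with the query-dump command.
    """
    arguments = {"paging": False, "protocol": "file:" + path, "detach": detach}
    if fmt is not None:
        if fmt not in KNOWN_FORMATS:
            raise ValueError(f"unexpected format: {fmt}")
        arguments["format"] = fmt
    return {"execute": "dump-guest-memory", "arguments": arguments}


print(json.dumps(dump_guest_memory_cmd("/var/tmp/win2016-hang.dump")))
```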

Comment 17 Yumei Huang 2021-02-01 03:07:41 UTC
(In reply to Igor Mammedov from comment #14)
> (In reply to Yumei Huang from comment #13)
> > (In reply to Igor Mammedov from comment #12)
> > > 
> > > (In reply to Peixiu Hou from comment #0)
> > > ...
> > > > Steps to Reproduce:
> > > ...
> > > > 2. Try to install 'HID Button over Interrupt Driver':
> > > > D:\devcon\win7_amd64\devcon.exe install C:\Windows\INF\machine.inf
> > > > "ACPI\VEN_ACPI&DEV_0010"
> > > 
> > > why do you do that?
> > 
> > It's a workaround for windows 2016. "Processors" only appears in device
> > manager after install the driver.
> 
> So I've tried several images on several AMD hosts.
> 
> above command might work or might not depending on version (and installed
> updates)
> Instead I've used 
> https://bugzilla.redhat.com/show_bug.cgi?id=1377155#c33
> or alternatively Yan's CLI approach:
> https://bugzilla.redhat.com/show_bug.cgi?id=1377155#c17
> 
> 
> > > it doesn't reproduce on my host.
> > > it's running 4.18.0-249.el8.x86_64 kernel and host CPU is older than skylake.
> > > 
> > > 
> > > Can you try with that kernel and also try to change CPU model to haswell to
> > > see if that helps.
> > > If it doesn't help, I'll need access to the host where it reproduces.
> > 
> > Tried with kernel -249, it's reproducible. And I tested with other cpu
> > models before, including haswell, it's not helping.
> > 
> ...
> 
> I'm done with testing on your host.
> 
> As mentioned previously it worked for me on RHEL8.4 Intel host,
> yours is AMD though. So I've tried a clean install using
>    en_windows_server_2016_x64_dvd_9718492.iso
> on your host and another AMD host.
> Both worked fine wrt CPU hotplug (after replacing HID driver with 'Generic
> Bus')
> Then I've updated guest with latest updates from MS, and even after that it
> still worked as expected. (modulo updates returned HID driver, so I had to
> replace driver again)
> 
> Then I've copied your guest image to Intel host,
> and it still has reboot problem.
> 
> At this point I'd rule out QEMU and KVM as culprit,
> it seems that your guest image is the problem.
> 
> I can suggest to start testing from clean install and then repeating
> all configuration steps with testing after each step.
> (perhaps there is an issue with installed software/drivers)
> 
> PS:
> You are testing an Intel CPU model on an AMD host (it might work, but it
> might also confuse the guest; I'd advise against doing so),
> and what you get is only an approximation of that CPU model, as many feature
> bits aren't available (as you can see at QEMU startup),
> so the requested CPU model isn't actually tested.

Thanks for all the work. Just wanna clarify that I tested on both Intel and AMD hosts with corresponding CPU models. This host is just one of them, AMD cpu model is used when testing on this host.

Thanks.

> 
> PS:
> your guest image crashes in a way that doesn't trap into the attached
> debugger (so I have no idea where it crashes);
> perhaps the Windows drivers team has more experience debugging Windows.

Comment 18 Yumei Huang 2021-02-01 03:10:05 UTC
(In reply to Vadim Rozenfeld from comment #16)
> Can QE try dumping the VM memory into a file with dump-guest-memory
> command the next time the VM hangs? Can I also see the system event
> log for review?
> 
> Thanks,
> Vadim.

Hi Peixiu,

Would you please take this bug and help handle the request? It seems to be a virtio-win or Windows issue, and you have more experience with Windows. Thanks!

Comment 25 yimsong 2021-08-08 15:51:59 UTC
Hit the same issue in the Hyper-V tests on Windows 2016:

pkg:
    kernel-4.18.0-325.el8.x86_64
    qemu-kvm-4.2.0-56.module+el8.5.0+12039+0434c559.x86_64
    seabios-bin-1.13.0-2.module+el8.3.0+7353+9de0a3cc.noarch
    virtio-win-prewhql-205
    RHEL-8.5.0-20210730.n.0

How reproducible:
10/10

autocase:
hv_cpu_hotplug

Comment 26 ChenNana 2021-08-30 09:01:47 UTC
Hit the same issue on vioscsi on win2012,

pkg:
   kernel-4.18.0-330.el8.x86_64
   qemu-kvm-6.0.0-28.module+el8.5.0+12271+fffa967b.x86_64
   seabios-bin-1.13.0-2.module+el8.3.0+7353+9de0a3cc.noarch
   RHEL-8.5.0-20210816.n.0

How reproducible:
100%

Comment 27 yimsong 2021-08-30 11:48:36 UTC
Hit the same issue when running vioscsi functional tests on Windows 2016:

pkg:
    qemu-kvm-6.0.0-28.module+el8.5.0+12271+fffa967b.x86_64
    kernel-4.18.0-330.el8.x86_64
    seabios-1.14.0-1.module+el8.4.0+8855+a9e237a9.x86_64
    RHEL-8.5.0-20210816.n.0
case:
    cpu_device_hotpluggable.hotplug.single_vcpu.with_reboot.shell_reboot

Comment 28 ChenNana 2021-08-30 12:37:41 UTC
(In reply to ChenNana from comment #26)
> Hit the same issue on vioscsi on win2012,
> 
> pkg:
>    kernel-4.18.0-330.el8.x86_64
>    qemu-kvm-6.0.0-28.module+el8.5.0+12271+fffa967b.x86_64
>    seabios-bin-1.13.0-2.module+el8.3.0+7353+9de0a3cc.noarch
>    RHEL-8.5.0-20210816.n.0
> 
> How reproducible:
> 100%

Reproduced it three times manually, and the problem occurred (the guest hangs after hot-plugging). The three virtio-win versions used were:
1. virtio-win-prewhql-0.1-207.iso
2. virtio-win-prewhql-0.1-204.iso
3. virtio-win-1.9.17-4.el8_4.iso

Comment 29 ChenNana 2021-09-06 09:23:00 UTC
(In reply to ChenNana from comment #28)
> (In reply to ChenNana from comment #26)
> > Hit the same issue on vioscsi on win2012,
> > 
> > pkg:
> >    kernel-4.18.0-330.el8.x86_64
> >    qemu-kvm-6.0.0-28.module+el8.5.0+12271+fffa967b.x86_64
> >    seabios-bin-1.13.0-2.module+el8.3.0+7353+9de0a3cc.noarch
> >    RHEL-8.5.0-20210816.n.0
> > 
> > How reproducible:
> > 100%
> 
> Reproduced it three times manually, and this problem occurred (guest will
> hang after plugging). The three versions are:
> 1.virtio-win-prewhql-0.1-207.iso
> 2.virtio-win-prewhql-0.1-204.iso 
> 3.virtio-win-1.9.17-4.el8_4.iso

Please ignore this comment; my problem is not the same as this one.

Comment 30 Eric Hadley 2021-09-08 16:56:59 UTC
Bulk update: Move RHEL-AV bugs to RHEL9. If necessary to resolve in RHEL8, then clone to the current RHEL8 release.

Comment 34 RHEL Program Management 2022-07-13 07:28:26 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 37 liunana 2022-09-14 15:55:16 UTC
Didn't reproduce this issue with win2022.

Test Env:
  dell-per750-22.lab.eng.pek2.redhat.com
  5.14.0-162.el9.x86_64
  qemu-kvm-7.1.0-1.el9.x86_64
  edk2-ovmf-20220526git16779ede2d36-3.el9.noarch
  windows_server_2022_x64_testsigned_enable_dvd.iso

Hi Peixiu,

Did you meet such issue with newer windows version than win2016?
Thanks.


Best regards
Nana Liu

Comment 39 Peixiu Hou 2022-09-21 07:24:16 UTC
(In reply to liunana from comment #37)
> Didn't reproduce this issue with win2022.
> 
> Test Env:
>   dell-per750-22.lab.eng.pek2.redhat.com
>   5.14.0-162.el9.x86_64
>   qemu-kvm-7.1.0-1.el9.x86_64
>   edk2-ovmf-20220526git16779ede2d36-3.el9.noarch
>   windows_server_2022_x64_testsigned_enable_dvd.iso
> 
> Hi Peixiu,
> 
> Did you meet such issue with newer windows version than win2016?
> Thanks.
> 

Hi Nana,

Not hit this issue on newer windows version~

Thanks~
Peixiu
> 
> Best regards
> Nana Liu

Comment 43 Peixiu Hou 2022-11-03 01:48:26 UTC
Hi Gabi and Jiri,

Looks good to me, thanks a lot~

Best Regards~
Peixiu

