Bug 1455488
Summary: | [Virtio-win][vioser][ovmf] Guest hit a BSOD when hot-unplugging virtio-serial-pci. | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | xiagao |
Component: | virtio-win | Assignee: | Amnon Ilan <ailan> |
virtio-win sub component: | virtio-win-prewhql | QA Contact: | xiagao |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | high | CC: | ailan, lijin, michen, mtessun, phou, vrozenfe, wyu, xiagao |
Version: | 7.4 | ||
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2018-04-10 06:28:08 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1473046 |
Description
xiagao
2017-05-25 10:19:44 UTC
Did not hit this bug under pc and SeaBIOS.

(In reply to xiagao from comment #3)
Also did not hit it under q35 and SeaBIOS.

The driver failed to initialize because it couldn't set the MSI vector for the 809th virtqueue (STATUS_DEVICE_BUSY is only set on set_queue_vector failure). The error path has a bug and accesses potentially uninitialized memory, which triggered the BSOD.

The fix is simple, but it's not clear why QEMU would fail to set the vector (we use the same vector for all queues, and QEMU doesn't allocate or do anything fail-worthy there). It's also not clear why the driver would be initializing on *unplug*.

Hi,

(In reply to xiagao from comment #0)
> Description of problem:
> Guest hit a BSOD when hot-unplugging virtio-serial-pci.
>
> Version-Release number of selected component (if applicable):
> kernel-3.10.0-671.el7.x86_64
> qemu-kvm-rhev-2.9.0-5.el7.x86_64
> virtio-win-prewhql-0.1-137
>
> How reproducible:
> 9/10
>
> Steps to Reproduce:
> 1. Boot a win2016 guest with a serial device in a q35 + OVMF environment.
> 2. Install the vioser driver.
> 3. Hot-unplug the port.
> 4. Hot-unplug virtio-serial-pci.
>
> Actual results:
> Guest BSOD
>
> Expected results:
> No BSOD

Would it be possible to get access to the VM and host? I haven't been able to reproduce this locally. Thanks!

Fix for the BSOD has been committed:
https://github.com/virtio-win/kvm-guest-drivers-windows/commit/09e62053b315ad7e09eaddf0431f76ab694c65da

(In reply to Ladi Prosek from comment #10)
So the reason virtqueue initialization fails is that the device simply disappears: setting the queue MSI vector to 1 fails because the default 0 is read back. To recap:

1. virtio-serial-pci is hot-unplugged and the driver shuts down correctly.
2. For an unknown reason, the device reappears and Windows loads the driver again.
3. While the driver is initializing, the device disappears, which triggers the BSOD.

So far we have fixed the BSOD on the second device removal, but something is still wrong with hot-unplug, likely only when the device is connected via PCI Express.

Should be fixed in build 139:
http://download.eng.bos.redhat.com/brewroot/work/tasks/1396/13321396/virtio-win-prewhql-0.1.zip

Opened bug 1457920 to track the driver reload issue.

Tested this issue with virtio-win-prewhql-139 on a newly created image: could not reproduce the bug, no BSOD, passed 10/10. Also tried to reproduce it with virtio-win-prewhql-137: could not reproduce it either, no BSOD in 10 attempts.

Steps as in comment #0.

Versions used:
kernel-3.10.0-679.el7.x86_64
qemu-kvm-rhev-2.9.0-10.el7.x86_64
seabios-bin-1.10.2-1.el7.noarch

Best Regards~
Peixiu Hou

(In reply to Peixiu Hou from comment #20)

Thanks. Yes, the timing has to be just right to hit this bug. It may not reproduce on other hosts or with a freshly installed Windows. I would try different vCPU counts: -smp 4, -smp 2, and -smp 1.

Any chance you can use the host and VM where this was originally found?
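The failure sequence recapped in the reply to comment #10 (the vector write is read back as 0 once the device has vanished, and the error path must then unwind without touching uninitialized state) can be modeled in C. This is a minimal sketch under stated assumptions, not the change in the linked commit: `fake_device`, `init_queues`, and the `vanish_at` parameter are hypothetical, `set_queue_vector` here is only a model named after the function mentioned in comment #10, the device is a plain struct rather than real PCI registers, and -1 stands in for STATUS_DEVICE_BUSY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_QUEUES 16u

/* Hypothetical model of a device's per-queue MSI vector registers. */
typedef struct {
    bool present;                       /* false once the device has been removed */
    uint16_t queue_vector[MAX_QUEUES];
} fake_device;

typedef struct {
    void *ring;                         /* hypothetical stand-in for queue ring memory */
} queue;

static void write_vector(fake_device *d, unsigned q, uint16_t v)
{
    if (d->present)
        d->queue_vector[q] = v;         /* writes to a vanished device go nowhere */
}

static uint16_t read_vector(const fake_device *d, unsigned q)
{
    /* Per the analysis in this bug, the default 0 is read back after removal. */
    return d->present ? d->queue_vector[q] : 0;
}

/* Model only: write the vector, then read it back to confirm it was accepted. */
static int set_queue_vector(fake_device *d, unsigned q, uint16_t v)
{
    assert(q < MAX_QUEUES);
    write_vector(d, q, v);
    return read_vector(d, q) == v ? 0 : -1;   /* -1 stands in for STATUS_DEVICE_BUSY */
}

/* Set up n queues; the device "disappears" just before queue vanish_at. */
static int init_queues(fake_device *d, queue *qs, unsigned n, unsigned vanish_at)
{
    unsigned i;

    assert(n <= MAX_QUEUES);
    for (i = 0; i < n; i++) {
        if (i == vanish_at)
            d->present = false;         /* simulate the device vanishing mid-init */
        qs[i].ring = malloc(64);
        if (qs[i].ring == NULL)
            goto fail;
        if (set_queue_vector(d, i, 1) != 0) {
            free(qs[i].ring);
            qs[i].ring = NULL;
            goto fail;
        }
    }
    return 0;

fail:
    /* Unwind only queues [0, i) that were fully initialized; entries past i
       were never written and may hold uninitialized memory, so skip them. */
    while (i-- > 0) {
        free(qs[i].ring);
        qs[i].ring = NULL;
    }
    return -1;
}
```

The design choice that matters is that the unwind loop counts down from the index that failed, so queue entries the setup loop never reached are never read or freed.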
(In reply to Ladi Prosek from comment #21)

Yes, the tests mentioned in comment #20 were executed on the original host, but the original VM image has been deleted. I can also try with different vCPU counts and will post any results here. Thank you so much~

(In reply to Peixiu Hou from comment #22)

On the original host:
Tried with "-smp 4": did not reproduce the bug with either build 137 or 139, passed 6/6.
Tried with "-smp 2": did not reproduce the bug with either build 137 or 139, passed 5/5.
Tried with "-smp 1": did not reproduce the bug with either build 137 or 139, passed 5/5.

Best Regards~
Peixiu

I'd like to change the status to VERIFIED, as there was no BSOD after many attempts. Feel free to reopen it if anyone hits it again.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0657