Bug 613949 - [WHQL] Win2k8-R2 guest got BSOD when running virtio-serial Plug and Play Driver Test
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: virtio-win
Version: 6.0
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Vadim Rozenfeld
QA Contact: Virtualization Bugs
Keywords: Regression, TestBlocker
Depends On:
Blocks:
Reported: 2010-07-13 06:44 EDT by Qunfang Zhang
Modified: 2013-01-09 17:51 EST (History)

See Also:
Fixed In Version: 1.1.12-0
Doc Type: Bug Fix
Last Closed: 2010-11-11 10:01:38 EST


Attachments
BSOD screenshot (22.21 KB, image/png)
2010-07-13 06:44 EDT, Qunfang Zhang
Description Qunfang Zhang 2010-07-13 06:44:07 EDT
Description of problem:
While running the virtio-serial WHQL tests, the guest hit a BSOD. It first occurred in the "CHAOS - Concurrent Hardware And OS Test (WDF Preview)" job, which failed on the "Run Pnpdtest" case; it occurred a second time in the "Plug and Play Driver Test" job. Both crashes show the same BSOD error code. (A screenshot and memory.dmp will be attached.)

Command line:
/usr/libexec/qemu-kvm -m 6G -smp 4 -cpu qemu64,+x2apic -usbdevice tablet \
  -drive file=win2k8-R2-serial.qcow2,if=none,id=drive-ide0-0-0,werror=stop,rerror=stop,cache=none,serial=win2k8-r2-230 \
  -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
  -netdev tap,id=hostnet0,script=/etc/qemu-ifup \
  -device e1000,netdev=hostnet0,mac=00:10:1a:4a:20:1e,bus=pci.0,addr=0x4,id=net0 \
  -boot c -uuid 2b942d25-8e9d-4212-947e-c9e81d59f580 -rtc-td-hack -no-kvm-pit-reinjection \
  -monitor stdio -name win2k8-R2-serial-230 -vnc :11 \
  -device virtio-serial-pci,id=virtio-serial0,max_ports=16,vectors=4,bus=pci.0 \
  -chardev pty,id=channel0 \
  -device virtserialport,chardev=channel0,name=org.linux-kvm.port.0,bus=virtio-serial0.0


Version-Release number of selected component (if applicable):
virtio-win-1.1.7-2
qemu-kvm-0.12.1.2-2.91.el6.x86_64
2.6.32-44.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Boot a win2k8-32 guest and run the virtio-serial test jobs "CHAOS - Concurrent Hardware And OS Test (WDF Preview)" and "Plug and Play Driver Test".

Actual results:
Guest got BSOD.

Expected results:
Pass the test.

Additional info:
Comment 1 Qunfang Zhang 2010-07-13 06:44:39 EDT
Created attachment 431412
BSOD screenshot
Comment 2 Qunfang Zhang 2010-07-13 06:46:53 EDT
Memory dump file:
http://10.66.65.120/mem-dump/MEMORY-2k832-serial.DMP
Comment 3 Qunfang Zhang 2010-07-13 06:50:57 EDT
Confirmed with szhou: this issue does not occur with virtio-win-1.1.0, so it is a regression.
Comment 5 Qunfang Zhang 2010-07-13 07:09:36 EDT
> Steps to Reproduce:
> 1.Boot a win2k8-32 guest and run virtio-serial test, "CHAOS - Concurrent
> Hardware And OS Test (WDF Preview)" job and "Plug and Play Driver Test" job.
> 2.
Sorry, the guest I tested is win2k8-R2.
Comment 7 Yaniv Kaul 2010-07-13 07:50:40 EDT
(In reply to comment #2)
> Memory dump file:
> http://10.66.65.120/mem-dump/MEMORY-2k832-serial.DMP    

Why not analyze the dump file? Please do, and post the stack.
Comment 8 Qunfang Zhang 2010-07-13 23:36:12 EDT
Microsoft (R) Windows Debugger Version 6.10.0003.233 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [C:\Users\DTMLLUAdminUser\Desktop\MEMORY-2k.DMP]
Kernel Summary Dump File: Only kernel address space is available

Symbol search path is: SRV*C:\symbols;SRV*C:\virtio
Executable search path is: 
Windows 7 Kernel Version 7600 MP (4 procs) Free x64
Product: Server, suite: TerminalServer DataCenter SingleUserTS
Built by: 7600.16539.amd64fre.win7_gdr.100226-1909
Machine Name:
Kernel base = 0xfffff800`01409000 PsLoadedModuleList = 0xfffff800`01646e50
Debug session time: Tue Jul 13 08:26:23.703 2010 (GMT-7)
System Uptime: 0 days 0:02:10.109
Loading Kernel Symbols
...............................................................
............................................................
Loading User Symbols

Loading unloaded module list
.......
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck D1, {fffff88004842010, 2, 0, fffff880043ef5c4}

*** ERROR: Module load completed but symbols could not be loaded for vioser.sys
Probably caused by : vioser.sys ( vioser+25c4 )

Followup: MachineOwner
---------

2: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: fffff88004842010, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
Arg4: fffff880043ef5c4, address which referenced memory

Debugging Details:
------------------


READ_ADDRESS:  fffff88004842010 Paged pool

CURRENT_IRQL:  2

FAULTING_IP: 
vioser+25c4
fffff880`043ef5c4 488b4310        mov     rax,qword ptr [rbx+10h]

DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT

BUGCHECK_STR:  0xD1

PROCESS_NAME:  System

TRAP_FRAME:  fffff88001c8cc10 -- (.trap 0xfffff88001c8cc10)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000002087 rbx=0000000000000058 rcx=fffffa800637c4e8
rdx=0000057ff95dfa48 rsi=0000000000000000 rdi=0000057ff9409f78
rip=fffff880043ef5c4 rsp=fffff88001c8cda0 rbp=0000057ff9409f78
 r8=fffff880043f2230  r9=0000000000000000 r10=fffffa8006663680
r11=fffff88001c8cdb0 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei ng nz na po nc
vioser+0x25c4:
fffff880`043ef5c4 488b4310        mov     rax,qword ptr [rbx+10h] ds:0001:00000000`00000068=????????????????
Resetting default scope

LAST_CONTROL_TRANSFER:  from fffff80001478b69 to fffff80001479600

STACK_TEXT:  
fffff880`01c8cac8 fffff800`01478b69 : 00000000`0000000a fffff880`04842010 00000000`00000002 00000000`00000000 : nt!KeBugCheckEx
fffff880`01c8cad0 fffff800`014777e0 : fffff880`0110e8d0 fffff880`04842000 fffff880`01c8cd00 fffff880`010c32dc : nt!KiBugCheckDispatch+0x69
fffff880`01c8cc10 fffff880`043ef5c4 : fffff880`043efa72 fffffa80`06a205b0 fffffa80`04f6ac50 fffffa80`06bf63f0 : nt!KiPageFault+0x260
fffff880`01c8cda0 fffff880`043efa7a : fffffa80`06a205b0 fffffa80`06d44000 0000057f`f9409f78 fffffa80`0637c4e8 : vioser+0x25c4
fffff880`01c8cdd0 fffff880`043ee889 : fffffa80`04f6ac50 00000000`00000000 fffff880`04984000 00000000`00000002 : vioser+0x2a7a
fffff880`01c8ce20 fffff880`010fb8a7 : 00000000`00000008 0000057f`f9c85b18 00000000`1be05741 00000000`12ef75f9 : vioser+0x1889
fffff880`01c8ced0 fffff800`01484cdc : fffff880`01c5d180 00000000`00002087 fffff880`0116f280 00000000`00000086 : Wdf01000!FxInterrupt::_InterruptDpcThunk+0x8f
fffff880`01c8cf00 fffff800`0147f765 : 00000000`00000000 fffffa80`04efb770 00000000`00000000 fffff880`010fb818 : nt!KiRetireDpcList+0x1bc
fffff880`01c8cfb0 fffff800`0147f57c : 00000000`000000c6 fffff880`043ee777 fffffa80`0637a4e0 00000000`00000000 : nt!KyRetireDpcList+0x5
fffff880`01f32730 fffff800`014c4b13 : fffff800`01475436 fffff800`014754a2 00000000`000000c6 fffff880`01f32701 : nt!KiDispatchInterruptContinue
fffff880`01f32760 fffff800`014754a2 : 00000000`000000c6 fffff880`01f32701 fffffa80`05dfb9c0 00000000`00000013 : nt!KiDpcInterruptBypass+0x13
fffff880`01f32770 fffff800`0148746c : 00000000`00000000 fffff800`00000001 00000000`00000000 00000000`00000002 : nt!KiInterruptDispatch+0x212
fffff880`01f32900 fffff800`0142947b : fffff800`01605b40 00000000`00000000 00000000`00000000 fffff800`014ad9d1 : nt!KeFlushMultipleRangeTb+0x1cc
fffff880`01f329d0 fffff800`015802ba : fffff800`01605b40 fffff880`01f32be0 fffff780`00000000 fffff780`00000013 : nt! ?? ::FNODOBFM::`string'+0x203be
fffff880`01f32bc0 fffff800`015817a7 : fffffa80`04efb770 00000000`00000000 00000000`00000000 ffffffff`ffffffff : nt!MiEmptyWorkingSet+0x24a
fffff880`01f32c70 fffff800`0191a261 : 00000000`00000000 fffffa80`04efb770 00000000`00000080 00000000`00000080 : nt!MiTrimAllSystemPagableMemory+0x218
fffff880`01f32cd0 fffff800`0191a2d9 : fffffa80`04efb770 fffffa80`04ee1b30 00000000`00000000 00770066`00760000 : nt!MmVerifierTrimMemory+0xf1
fffff880`01f32d00 fffff800`0171ea86 : 00000000`00500060 00010000`000005c4 01cb1f5f`c52ae80e 01cb1f5f`c52ae80e : nt!ViKeTrimWorkerThreadRoutine+0x29
fffff880`01f32d40 fffff800`01457b06 : fffff880`01c5d180 fffffa80`04efb770 fffff880`01c67fc0 00010000`00005ae0 : nt!PspSystemThreadStartup+0x5a
fffff880`01f32d80 00000000`00000000 : fffff880`01f33000 fffff880`01f2d000 fffff880`01f32a50 00000000`00000000 : nt!KxStartSystemThread+0x16


STACK_COMMAND:  kb

FOLLOWUP_IP: 
vioser+25c4
fffff880`043ef5c4 488b4310        mov     rax,qword ptr [rbx+10h]

SYMBOL_STACK_INDEX:  3

SYMBOL_NAME:  vioser+25c4

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: vioser

IMAGE_NAME:  vioser.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  4c2e1e75

FAILURE_BUCKET_ID:  X64_0xD1_VRF_vioser+25c4

BUCKET_ID:  X64_0xD1_VRF_vioser+25c4

Followup: MachineOwner
---------
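The bugcheck summary line in the output above can be decoded mechanically. The following is a minimal sketch, not part of the original report: the helper names (`parse_bugcheck`, `describe_d1`, `module_base`) are hypothetical, and the argument meanings are taken only from the `!analyze -v` text for bug check 0xD1 shown above. It also subtracts the reported offset (`vioser+25c4`) from the faulting address to recover the load address of vioser.sys in this dump.

```python
import re

def parse_bugcheck(line):
    """Parse a WinDbg 'BugCheck XX, {a, b, c, d}' summary line."""
    m = re.match(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}", line.strip())
    if m is None:
        raise ValueError("not a BugCheck summary line: %r" % line)
    code = int(m.group(1), 16)
    args = [int(a.strip(), 16) for a in m.group(2).split(",")]
    return code, args

def describe_d1(args):
    """Label the four arguments of bug check 0xD1 as listed by !analyze -v."""
    memory, irql, access, ip = args
    # The debugger text above documents only 0 (read) and 1 (write).
    access_names = {0: "read", 1: "write"}
    return {
        "memory referenced": hex(memory),
        "IRQL": irql,
        "access type": access_names.get(access, "unknown"),
        "faulting instruction address": hex(ip),
    }

def module_base(address, offset):
    """Recover a module load address from 'module+offset' and the absolute address."""
    return address - offset

code, args = parse_bugcheck("BugCheck D1, {fffff88004842010, 2, 0, fffff880043ef5c4}")
info = describe_d1(args)
base = module_base(args[3], 0x25c4)  # offset taken from "vioser+25c4"
print(hex(code), info)
print("vioser.sys load address in this dump:", hex(base))
```

The sketch only restates what the debugger already printed: a read at IRQL 2 from an address the debugger classifies as paged pool, issued from inside vioser.sys.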
Comment 9 Shirley Zhou 2010-07-14 23:41:01 EDT
This bug also occurs when I hot-plug and hot-unplug a device on a Windows 2003 x86 guest, using the following steps:
1. Run a Windows 2003 x86 guest with virtio-serial:
-device virtio-serial -device spicevmc,id=spicevmc-2,debug=1,nr=0
2. Hot-unplug the device using QMP:
{"execute":"device_del","arguments":{"id":"spicevmc-2"}}
The guest hits a BSOD.
3. Reset the system from QMP:
{"execute":"system_reset"}
4. Hot-plug a device using QMP:
{"execute":"device_add","arguments":{"driver":"spicevmc","id":"spice-vmc1","debug":1,"nr":2}}
The guest hits a BSOD again.

The problem does not reproduce on a Windows XP guest.
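The QMP steps above can be scripted. The following is a minimal sketch that only builds the JSON lines for the sequence; the helper names (`qmp_command`, `hotplug_sequence`) are hypothetical, and the leading `qmp_capabilities` command is an addition not shown in the comment (QMP requires it once per session before other commands are accepted). How the lines reach QEMU (for example, a monitor socket) is left out, since the report does not say how the QMP session was opened.

```python
import json

def qmp_command(name, **arguments):
    """Serialize one QMP command as a single JSON line."""
    msg = {"execute": name}
    if arguments:
        msg["arguments"] = arguments
    return json.dumps(msg)

def hotplug_sequence():
    """Return the QMP lines for the unplug / reset / plug steps in this comment."""
    return [
        # Capability negotiation; assumed here, not part of the reported steps.
        qmp_command("qmp_capabilities"),
        # Step 2: hot-unplug the port created with id=spicevmc-2.
        qmp_command("device_del", id="spicevmc-2"),
        # Step 3: reset the guest after the first BSOD.
        qmp_command("system_reset"),
        # Step 4: hot-plug a new port, with the same arguments as reported.
        qmp_command("device_add", driver="spicevmc", id="spice-vmc1", debug=1, nr=2),
    ]

seq = hotplug_sequence()
for line in seq:
    print(line)
```

Keeping the sequence as data makes it easy to replay the same steps against different guests or driver builds.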
Comment 10 Dor Laor 2010-07-15 05:44:38 EDT
(In reply to comment #9)
> This bug happens when I did hot-plug and hot-unplug for windows 2003 x86 as
> following steps:
> 1.run windows 2003 x86 guest with virtio serial
> -device virtio-serial -device spicevmc,id=spicevmc-2,debug=1,nr=0
> 2.do hot-unplug using qmp
> {"execute":"device_del","arguments":{"id":"spicevmc-2"}}

Does the WHQL test require this, or is it unrelated to WHQL?
I ask because WHQL is a must-have, while virtio-serial is a nice-to-have.

> This guest become BSOD
> 3.do system reset from qmp
> {"execute":"system_reset"}
> 4.do hot-plug using qmp
> {"execute":"device_add","arguments":{"driver":"spicevmc","id":"spice-vmc1","debug":1,"nr":2}}
> This guest become bsod again
> 
> While there is no reproducer in windows xp guest.
Comment 11 Vadim Rozenfeld 2010-07-15 07:43:57 EDT
WHQL performs generic tests on unclassified devices. 
So it's nice to have. But IMO it must be tested anyway.
Comment 12 Shirley Zhou 2010-07-16 02:18:23 EDT
(In reply to comment #10)
> (In reply to comment #9)
> > This bug happens when I did hot-plug and hot-unplug for windows 2003 x86 as
> > following steps:
> > 1.run windows 2003 x86 guest with virtio serial
> > -device virtio-serial -device spicevmc,id=spicevmc-2,debug=1,nr=0
> > 2.do hot-unplug using qmp
> > {"execute":"device_del","arguments":{"id":"spicevmc-2"}}
> 
> Does the whql test demands it or it is unrelated to whql?
> The reason for me asking is that whql is a must have while virtio-serial is a
> nice to have.
Hi, Dor.
I think hot-plug and hot-unplug are related to the WHQL testing.
I tried hot-plug and hot-unplug with driver virtio-win-1.1.8.0 on Windows 2008 R2; the BSOD also occurs, with the same error code as in the WHQL testing.
> 
> > This guest become BSOD
> > 3.do system reset from qmp
> > {"execute":"system_reset"}
> > 4.do hot-plug using qmp
> > {"execute":"device_add","arguments":{"driver":"spicevmc","id":"spice-vmc1","debug":1,"nr":2}}
> > This guest become bsod again
> > 
> > While there is no reproducer in windows xp guest.
Comment 13 Qunfang Zhang 2010-07-16 02:45:11 EDT
Ran the WHQL virtio-serial test on virtio-win-1.1.8-0; the issue still occurs.
Comment 14 Qunfang Zhang 2010-07-26 22:43:27 EDT
Marking this issue as a test blocker: when testing win7-32 and win7-64, I hit the same BSOD right at the start of the serial jobs. On win7-32, the BSOD occurred after I installed the serial driver and rebooted the guest.
On win7-64, the BSOD occurred while the first job, "PCI Hardware Compliance Test (PCIHCT)", was running.
I cannot get the dump file because I cannot restart the guest and log in to the desktop again, but the error code is the same as in the attached BSOD screenshot.
Comment 15 Vadim Rozenfeld 2010-07-30 03:03:02 EDT
Please try the latest driver from virtio-win-1.1.8-0.
Comment 16 Qunfang Zhang 2010-07-30 06:04:18 EDT
(In reply to comment #15)
> please try the latest driver from virtio-win-1.1.8-0    

Maybe you mean virtio-win-1.1.9-0. :-)
Tested only a win2k8-R2 guest with virtio-win-1.1.9-0: ran the "Plug and Play Driver Test" job, and it passed.
Several guests are now running the virtio-serial tests on my hosts, and I will change the status to VERIFIED after collecting all the passing results.

Thanks~
Comment 17 Qunfang Zhang 2010-07-30 06:14:27 EDT
BSOD (with the same error code) happens when running "CHAOS - Concurrent Hardware And OS Test (WDF Preview)". Guest is still win2k8-R2, with virtio-win-1.1.8-0.
Comment 18 Qunfang Zhang 2010-07-30 06:18:16 EDT
(In reply to comment #17)
> BSOD (with the same error code) happens when running "CHAOS - Concurrent
> Hardware And OS Test (WDF Preview)". Guest is still win2k8-R2, with
> virtio-win-1.1.8-0.    

Sorry, the virtio version is virtio-win-1.1.9-0.
Comment 19 Qunfang Zhang 2010-08-03 03:05:00 EDT
Changed the status to ASSIGNED according to Comment 18.
Comment 20 Vadim Rozenfeld 2010-08-03 03:11:31 EDT
Hi, Qunfang.

Since we have two issues described here:
- crash due to performing a device_del / device_add sequence
- crashes while WHQL'ing.

Could you please recheck both of them and update with the results?
In case of a crash, please post a crash dump; it can speed up the
bug-fixing process dramatically.

Thanks & regards,
Vadim.
Comment 22 Qunfang Zhang 2010-08-03 07:30:34 EDT
(In reply to comment #20)
> Hi, Qunfang.
> 
> Since we have two issues described here:
> - crash due to performing device_del device_add sequence
Not reproduced this time using win2k8-R2. BTW, as Dor said in Comment 21, how about we track only the WHQL test in this bug? If we reproduce the hot-plug usability issue, I will file a separate bug with a non-urgent priority.

> - crashes while WHQL'ing.
Reproducible with virtio-win-1.1.10-0.
Memory dump:
http://10.66.65.120/mem-dump/MEMORY-win2k8-R2-serial-0xD1-BZ613949.DMP.gz
Dump analysis result:
http://10.66.65.120/mem-dump/win2k8-R2-serial-0xD1-bz613949.txt

> 
> Could you please recheck both of them and update with results.
> In case of crash please post a crash dump, it can speed-up the 
> bug-fixing process dramatically.
> 
> Thanks & regards,
> Vadim.
Comment 23 Qunfang Zhang 2010-08-03 07:41:01 EDT
BTW, when running the virtio-serial jobs, there are many errors in the QEMU monitor, like:
(qemu) qemu-kvm: virtio-serial-bus: Guest failure in adding device virtio-serial0.0
Comment 24 Vadim Rozenfeld 2010-08-03 08:59:54 EDT
(In reply to comment #23)
> BTW, when running virtio-serial jobs, there are many error in qemu
> monitor,like:
> (qemu) qemu-kvm: virtio-serial-bus: Guest failure in adding device
> virtio-serial0.0    

It is not an error, just a friendly warning from QEMU that the VIRTIO_CONSOLE_DEVICE_READY message
came with parameter 0, which is fine when unplugging a device but indicates some sort of
critical error when plugging a device in.

And thanks for the dump file.
Vadim.
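To make the warning concrete, the following is a minimal sketch of how a control message with event VIRTIO_CONSOLE_DEVICE_READY and value 0 could be packed and interpreted. It is an illustration, not the QEMU or vioser code: the 8-byte layout (32-bit id, 16-bit event, 16-bit value, little-endian) and the event number 0 are assumptions taken from the virtio console specification, and the helper names are hypothetical.

```python
import struct

# Assumed layout of a virtio console control message:
# 32-bit port id, 16-bit event, 16-bit value, little-endian.
CONTROL_FORMAT = "<IHH"

# Assumed event number for VIRTIO_CONSOLE_DEVICE_READY.
VIRTIO_CONSOLE_DEVICE_READY = 0

def pack_control(port_id, event, value):
    """Build the raw bytes of one control message."""
    return struct.pack(CONTROL_FORMAT, port_id, event, value)

def describe_control(raw):
    """Give a human-readable reading of a DEVICE_READY control message."""
    port_id, event, value = struct.unpack(CONTROL_FORMAT, raw)
    if event != VIRTIO_CONSOLE_DEVICE_READY:
        return "event %d (not decoded in this sketch)" % event
    if value == 0:
        # The case QEMU reports as "Guest failure in adding device".
        return "DEVICE_READY with value 0: guest is not ready (expected on unplug, a failure on plug)"
    return "DEVICE_READY with value %d: guest driver is ready" % value

msg = pack_control(0, VIRTIO_CONSOLE_DEVICE_READY, 0)
print(len(msg), describe_control(msg))
```

Read this way, the monitor message is noise during unplug-heavy WHQL jobs and only significant when it appears while a device is being plugged in.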
Comment 28 Qunfang Zhang 2010-08-23 04:25:20 EDT
Update:
The issue no longer occurs with virtio-win-1.1.12-0 on win2k8-R2.
I will change the status to VERIFIED after finishing all Windows guests.
Comment 29 Vadim Rozenfeld 2010-08-23 04:31:23 EDT
(In reply to comment #28)
> Update:
> The issue does not exist on virtio-win-1.1.12-0 using win2k8-R2.
> I will change the status to VERIFIED after finish all windows guests.

Great.
Thank you,
Vadim.
Comment 30 Qunfang Zhang 2010-08-23 04:45:20 EDT
(In reply to comment #29)
> (In reply to comment #28)
> > Update:
> > The issue does not exist on virtio-win-1.1.12-0 using win2k8-R2.
> > I will change the status to VERIFIED after finish all windows guests.
> 
> Great.
> Thank you,
> Vadim.

But Vadim, it seems another bug was introduced; see Bug 626333.
Comment 31 Qunfang Zhang 2010-08-26 21:11:36 EDT
Verified this bug with virtio-win-1.1.12-0 on win2k8-R2, win2k8-32&64, and win7-32&64; the issue no longer occurs.

Related package versions:
kernel-2.6.32-66.el6.x86_64
qemu-kvm-0.12.1.2-2.112.el6.x86_64

So, I will change the status to VERIFIED.
Comment 32 releng-rhel@redhat.com 2010-11-11 10:01:38 EST
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.
