Bug 1013336 - [virtio-win][netkvm] BSoD occurs when running NDISTest6.5 -[2 Machine] - MPE_Ethernet job on windows 2012 (Win10)
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: virtio-win
Version: 7.0
Hardware: Unspecified Unspecified
Priority: high, Severity: high
Target Milestone: rc
Target Release: 7.0
Assigned To: Yan Vugenfirer
Virtualization Bugs
Fixed_Not_Ship
Depends On: 1056934
Blocks: 1288337
Reported: 2013-09-29 07:35 EDT by Mike Cao
Modified: 2016-11-04 04:43 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
NO_DOCS
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-04 04:43:14 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
win10-32 bsod screenshot (32.78 KB, image/png)
2015-07-16 22:37 EDT, lijin
win10-64 bsod screenshot (31.98 KB, image/png)
2015-07-16 22:38 EDT, lijin
win7-32 job failed log (2.80 MB, application/zip)
2016-06-20 01:48 EDT, Peixiu Hou
Description Mike Cao 2013-09-29 07:35:16 EDT
Description of problem:


Version-Release number of selected component (if applicable):
virtio-win-prewhql-72
2.6.32-420.el6
qemu-kvm-rhev-0.12.1.2.405
seabios-0.6.1.2-28

How reproducible:
1/1

Steps to Reproduce:
1. Start VM
/usr/libexec/qemu-kvm -m 6G -smp 8,cores=8 -M rhel6.5.0 -cpu cpu64-rhel6,+x2apic,family=0xf -usb -device usb-tablet -netdev tap,sndbuf=0,id=hostnet0,script=/etc/qemu-ifup,downscript=no -device e1000,netdev=hostnet0,mac=00:52:26:33:33:33 -netdev tap,sndbuf=0,id=hostnet1,script=/etc/qemu-ifup-private,downscript=no,vhost=on -device virtio-net-pci,netdev=hostnet1,ctrl_guest_offloads=on,mac=00:52:36:01:02:03 -uuid fbf464a3-17a6-41fe-95a6-c87c180ff30c -vga cirrus -vnc :2 -name win2012-nic2-72 -no-kvm-pit-reinjection -chardev socket,id=111a,path=/tmp/monitor-2012-nic2-72,server,nowait -mon chardev=111a,mode=readline -rtc base=localtime,clock=host,driftfix=slew -drive file=win2012-nic2.raw,if=none,media=disk,format=raw,rerror=stop,werror=stop,cache=none,aio=native,id=scsi-disk0 -device ide-drive,drive=scsi-disk0,id=disk,bootindex=1 -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1

 #/usr/libexec/qemu-kvm -m 6G -smp 8,cores=8 -M rhel6.5.0 -cpu cpu64-rhel6,+x2apic,family=0xf -usb -device usb-tablet -netdev tap,sndbuf=0,id=hostnet0,script=/etc/qemu-ifup,downscript=no -device e1000,netdev=hostnet0,mac=00:52:e6:8e:22:66 -netdev tap,sndbuf=0,id=hostnet1,script=/etc/qemu-ifup-private,downscript=no,vhost=on -device virtio-net-pci,netdev=hostnet1,ctrl_guest_offloads=on,mac=00:52:36:77:66:55 -uuid e814de72-dfd3-4041-b5e3-5749ec737e17 -no-kvm-pit-reinjection -vga cirrus -vnc :1 -chardev socket,id=111a,path=/tmp/monitor-2012-nic1-72,server,nowait -mon chardev=111a,mode=readline -rtc base=localtime,clock=host,driftfix=slew -drive file=win2012-nic1.raw,if=none,media=disk,format=raw,rerror=stop,werror=stop,cache=none,aio=native,id=scsi-disk0 -device ide-drive,drive=scsi-disk0,id=disk,bootindex=1 -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1

2.Run NDISTest6.5 -[2 Machine] - MPE_Ethernet job
3.

Actual results:
Guest BSOD 

Expected results:
no BSOD occurs 

Additional info:
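The two qemu-kvm command lines in the steps above are nearly identical. As a rough sketch only (the helper name differing_args is hypothetical, and each command is abbreviated to a handful of the options quoted above), Python's shlex module can list the arguments that differ between the two guests:

```python
import shlex


def differing_args(cmd_a, cmd_b):
    """Return (only_in_a, only_in_b): arguments unique to each command, in order."""
    args_a, args_b = shlex.split(cmd_a), shlex.split(cmd_b)
    set_a, set_b = set(args_a), set(args_b)
    only_a = [arg for arg in args_a if arg not in set_b]
    only_b = [arg for arg in args_b if arg not in set_a]
    return only_a, only_b


# Abbreviated copies of the two invocations; the values are taken from the
# steps above, but most options are omitted for brevity.
VM_NIC2 = (
    "/usr/libexec/qemu-kvm -m 6G -smp 8,cores=8 "
    "-device virtio-net-pci,netdev=hostnet1,ctrl_guest_offloads=on,mac=00:52:36:01:02:03 "
    "-uuid fbf464a3-17a6-41fe-95a6-c87c180ff30c -vnc :2"
)
VM_NIC1 = (
    "/usr/libexec/qemu-kvm -m 6G -smp 8,cores=8 "
    "-device virtio-net-pci,netdev=hostnet1,ctrl_guest_offloads=on,mac=00:52:36:77:66:55 "
    "-uuid e814de72-dfd3-4041-b5e3-5749ec737e17 -vnc :1"
)

only_nic2, only_nic1 = differing_args(VM_NIC2, VM_NIC1)
for arg in only_nic2:
    print("nic2 only:", arg)
for arg in only_nic1:
    print("nic1 only:", arg)
```

On the abbreviated commands, only the virtio-net MAC address, the UUID, and the VNC display differ; the shared option names such as -device and -uuid are filtered out because they appear in both.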
Comment 1 Mike Cao 2013-09-29 07:37:00 EDT
Loading Dump File [I:\bcao\win2012-MPE-netkvm-72\MEMORY_MPE_Job.DMP]
Kernel Bitmap Dump File: Only kernel address space is available

WARNING: Inaccessible path: 'D:\symbolsI:\virtio-win-prewhql-0.1-72\win8\amd64\netkvm.pdb'
Symbol search path is: D:\symbolsI:\virtio-win-prewhql-0.1-72\win8\amd64\netkvm.pdb;SRV*D:\symbols\*http://msdl.microsoft.com/download/symbols
Executable search path is: 
Windows 8 Kernel Version 9200 MP (8 procs) Free x64
Product: Server, suite: TerminalServer DataCenter SingleUserTS
Built by: 9200.16628.amd64fre.win8_gdr.130531-1504
Machine Name:
Kernel base = 0xfffff801`8be73000 PsLoadedModuleList = 0xfffff801`8c13fa20
Debug session time: Sun Sep 29 17:30:09.804 2013 (UTC + 8:00)
System Uptime: 0 days 2:57:08.506
Loading Kernel Symbols
...............................................................
................................................................

Loading User Symbols

Loading unloaded module list
..............
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 7E, {ffffffff80000003, fffff8018bec7d78, fffff88006388948, fffff88006388180}

*** ERROR: Module load completed but symbols could not be loaded for NDProt630.sys
Probably caused by : NDProt630.sys ( NDProt630+8b86f )

Followup: MachineOwner
---------

6: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e)
This is a very common bugcheck.  Usually the exception address pinpoints
the driver/function that caused the problem.  Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: ffffffff80000003, The exception code that was not handled
Arg2: fffff8018bec7d78, The address that the exception occurred at
Arg3: fffff88006388948, Exception Record Address
Arg4: fffff88006388180, Context Record Address

Debugging Details:
------------------


EXCEPTION_CODE: (HRESULT) 0x80000003 (2147483651) - One or more arguments are invalid

FAULTING_IP: 
nt!DebugPrompt+18
fffff801`8bec7d78 c3              ret

EXCEPTION_RECORD:  fffff88006388948 -- (.exr 0xfffff88006388948)
ExceptionAddress: fffff8018bec7d78 (nt!DebugPrompt+0x0000000000000018)
   ExceptionCode: 80000003 (Break instruction exception)
  ExceptionFlags: 00000000
NumberParameters: 1
   Parameter[0]: 0000000000000002

CONTEXT:  fffff88006388180 -- (.cxr 0xfffff88006388180)
rax=0000000000000002 rbx=fffffa800c2cd600 rcx=fffff880067b5050
rdx=fffff88006380044 rsi=fffffa80050a2218 rdi=fffff88002bf0f40
rip=fffff8018bec7d77 rsp=fffff88006388b88 rbp=fffff88006388c30
 r8=fffff88006388c18  r9=0000000000000002 r10=0000000000000000
r11=0000000000000000 r12=0000000000000001 r13=0000000000000000
r14=fffff8800677ff00 r15=0000000000000000
iopl=0         nv up ei pl zr na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
nt!DebugPrompt+0x17:
fffff801`8bec7d77 cc              int     3
Resetting default scope

DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT

BUGCHECK_STR:  AV

PROCESS_NAME:  System

CURRENT_IRQL:  0

ERROR_CODE: (NTSTATUS) 0x80000003 - {EXCEPTION}  Breakpoint  A breakpoint has been reached.

EXCEPTION_PARAMETER1:  0000000000000002

LAST_CONTROL_TRANSFER:  from fffff8018c011319 to fffff8018bec7d77

STACK_TEXT:  
fffff880`06388b88 fffff801`8c011319 : fffff880`06388c30 fffff801`8bf3bc08 fffffa80`0c2cd600 fffffa80`050a2218 : nt!DebugPrompt+0x17
fffff880`06388b90 fffff880`0670786f : fffffa80`0501a220 fffff880`067bec50 fffff880`067b4ff0 81010101`01010100 : nt!DbgPrompt+0x35
fffff880`06388be0 fffff880`06729bc9 : fffff880`00000001 fffff880`067bec50 00000000`00000158 81010101`01010100 : NDProt630+0x8b86f
fffff880`06388c40 fffff880`0675b7f9 : fffffa80`050a21e0 fffff880`00000001 fffff880`001b7740 fffff880`06388cd0 : NDProt630+0xadbc9
fffff880`06388c80 fffff880`0677ff57 : fffffa80`050a21e0 0000065c`002850a1 fffff901`002098a0 000005a0`00000001 : NDProt630+0xdf7f9
fffff880`06388d00 fffff801`8be9ffd9 : fffffa80`050a2218 00000000`00000020 00000000`00000001 fffffa80`0a47db00 : NDProt630+0x103f57
fffff880`06388d50 fffff801`8bf547e6 : fffff880`02be5180 fffffa80`0c2cd600 fffff880`02bf0f40 fffffa80`04f2b380 : nt!PspSystemThreadStartup+0x59
fffff880`06388da0 00000000`00000000 : fffff880`06389000 fffff880`06383000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16


FOLLOWUP_IP: 
NDProt630+8b86f
fffff880`0670786f 8b442448        mov     eax,dword ptr [rsp+48h]

SYMBOL_STACK_INDEX:  2

SYMBOL_NAME:  NDProt630+8b86f

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: NDProt630

IMAGE_NAME:  NDProt630.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  5049c61b

STACK_COMMAND:  .cxr 0xfffff88006388180 ; kb

FAILURE_BUCKET_ID:  AV_NDProt630+8b86f

BUCKET_ID:  AV_NDProt630+8b86f

Followup: MachineOwner
---------
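Arg1 in the bugcheck above is a 32-bit NTSTATUS value sign-extended to 64 bits. The following is a minimal sketch (the function name decode_exception_arg and the three-entry lookup table are illustrative, not part of any debugger API) showing how the low 32 bits map to the breakpoint status that the exception record reports:

```python
# A tiny, deliberately incomplete lookup of NTSTATUS values.
NTSTATUS_NAMES = {
    0x80000003: "STATUS_BREAKPOINT",
    0x80000004: "STATUS_SINGLE_STEP",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def decode_exception_arg(arg1):
    """Map a sign-extended 64-bit bugcheck argument to an NTSTATUS name."""
    code = arg1 & 0xFFFFFFFF  # keep only the low 32 bits
    return NTSTATUS_NAMES.get(code, "unknown NTSTATUS 0x%08X" % code)


print(decode_exception_arg(0xFFFFFFFF80000003))  # Arg1 from the dump above
```

This agrees with the "Break instruction exception" and "A breakpoint has been reached" lines in the analysis, rather than with the HRESULT reading of the same number.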
Comment 4 Mike Cao 2013-09-29 22:53:21 EDT
The test passed when I ran it a second time. Removing the TestBlocker and Regression keywords and postponing to RHEL 6.6.0.
Comment 6 Mike Cao 2014-02-27 21:53:51 EST
Yan, Dima

Please check whether we can use Manual Errata 3162 to work around it.

Thanks,
Mike
Comment 8 Yan Vugenfirer 2014-03-02 06:35:33 EST
Hi,

 Can you please upload the dump file? I assume this is with mst's patch, right?


Thanks,
Yan.
Comment 14 Min Deng 2014-03-13 23:18:41 EDT
I uploaded a new dump file, but it should be the same as the old one.
Thanks
Min
Comment 19 lijin 2015-03-31 05:51:16 EDT
The job passes with build 101, so this issue has already been fixed.
Comment 25 lijin 2015-07-16 22:37:45 EDT
Created attachment 1052932
win10-32 bsod screenshot
Comment 26 lijin 2015-07-16 22:38:15 EDT
Created attachment 1052933
win10-64 bsod screenshot
Comment 27 lijin 2015-07-16 22:52:47 EDT
Strangely, win10-32/64 did not generate a dump file after the BSOD. I tried setting the paging file size to more than mem+200M, but that did not help.
I only got a screenshot during the BSOD; please check the attachments.
Comment 28 Yan Vugenfirer 2015-07-22 03:26:36 EDT
(In reply to lijin from comment #27)
> strangely win10-32/64 did not generate dump file after bsod,I tried set
> paging file size more than mem+200M,but that doesn't help.
> I only get the screenshot during bsod,please check the attachment

Hi,

Did you try this:

"Crash dumps: from Windows 7 and up, the OS will auto-delete large crash dumps. To keep crash dumps: Key: HKLM\System\CurrentControlSet\Control\CrashControl, Value: AlwaysKeepMemoryDump (DWORD), set to 1"

Best regards,
Yan.
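One way to apply the registry setting quoted above is the built-in Windows reg.exe tool. The sketch below only builds the command line and does not run it; the function name keep_memory_dump_command is hypothetical, and actually executing the command would need an elevated prompt inside the guest:

```python
CRASH_CONTROL_KEY = r"HKLM\System\CurrentControlSet\Control\CrashControl"


def keep_memory_dump_command():
    """Build the reg.exe argument list that sets AlwaysKeepMemoryDump to 1."""
    return [
        "reg", "add", CRASH_CONTROL_KEY,
        "/v", "AlwaysKeepMemoryDump",  # value name from the comment above
        "/t", "REG_DWORD",             # 32-bit integer value
        "/d", "1",                     # 1 = keep large dumps
        "/f",                          # overwrite without prompting
    ]


command = keep_memory_dump_command()
print(" ".join(command))
```

The printed line can be pasted into an administrator command prompt in the guest; a reboot may be needed before the next crash honours it, which is an assumption rather than something stated in this report.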
Comment 37 Yu Wang 2016-05-30 06:58:29 EDT
According to comment #36, changing status to "verified".
Comment 40 Peixiu Hou 2016-06-07 03:17:15 EDT
Analysis of the MPE_2012R2_BSOD_MEMORY dump is as follows:

SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e)
This is a very common bugcheck.  Usually the exception address pinpoints
the driver/function that caused the problem.  Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: ffffffff80000003, The exception code that was not handled
Arg2: fffff8026c965cd8, The address that the exception occurred at
Arg3: ffffd0002f853938, Exception Record Address
Arg4: ffffd0002f853140, Context Record Address

Debugging Details:
------------------


OVERLAPPED_MODULE: Address regions for 'NDProt630' and 'NDProt630.sys' overlap

EXCEPTION_CODE: (HRESULT) 0x80000003 (2147483651) - One or more arguments are invalid

FAULTING_IP: 
nt!DebugPrompt+18
fffff802`6c965cd8 c3              ret

EXCEPTION_RECORD:  ffffd0002f853938 -- (.exr 0xffffd0002f853938)
ExceptionAddress: fffff8026c965cd8 (nt!DebugPrompt+0x0000000000000018)
   ExceptionCode: 80000003 (Break instruction exception)
  ExceptionFlags: 00000000
NumberParameters: 1
   Parameter[0]: 0000000000000002

CONTEXT:  ffffd0002f853140 -- (.cxr 0xffffd0002f853140;r)
rax=0000000000000002 rbx=ffffe00002e24880 rcx=fffff8000245fa00
rdx=ffffd0002f850044 rsi=ffffe00002e24880 rdi=ffffe000000b5400
rip=fffff8026c965cd7 rsp=ffffd0002f853b78 rbp=0000000000000080
 r8=ffffd0002f853bf8  r9=0000000000000002 r10=0000000000000000
r11=0000000000000000 r12=0000000000000000 r13=0000000000007ffe
r14=ffffe00000c5f078 r15=fffff8000242a100
iopl=0         nv up ei pl zr na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
nt!DebugPrompt+0x17:
fffff802`6c965cd7 cc              int     3
Last set context:
rax=0000000000000002 rbx=ffffe00002e24880 rcx=fffff8000245fa00
rdx=ffffd0002f850044 rsi=ffffe00002e24880 rdi=ffffe000000b5400
rip=fffff8026c965cd7 rsp=ffffd0002f853b78 rbp=0000000000000080
 r8=ffffd0002f853bf8  r9=0000000000000002 r10=0000000000000000
r11=0000000000000000 r12=0000000000000000 r13=0000000000007ffe
r14=ffffe00000c5f078 r15=fffff8000242a100
iopl=0         nv up ei pl zr na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
nt!DebugPrompt+0x17:
fffff802`6c965cd7 cc              int     3
Resetting default scope

DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT

BUGCHECK_STR:  AV

PROCESS_NAME:  System

CURRENT_IRQL:  0

ERROR_CODE: (NTSTATUS) 0x80000003 - {EXCEPTION}  Breakpoint  A breakpoint has been reached.

EXCEPTION_PARAMETER1:  0000000000000002

ANALYSIS_VERSION: 6.3.9600.16384 (debuggers(dbg).130821-1623) amd64fre

LAST_CONTROL_TRANSFER:  from fffff8026ca14ea5 to fffff8026c965cd7

STACK_TEXT:  
ffffd000`2f853b78 fffff802`6ca14ea5 : 00000000`00000080 fffff802`6c90d654 ffffe000`02e24880 ffffe000`02e24880 : nt!DebugPrompt+0x17
ffffd000`2f853b80 fffff800`02385a76 : ffffe000`00c3dcc0 fffff800`02468fc0 fffff800`0245f9a0 81010101`01010100 : nt!DbgPrompt+0x35
ffffd000`2f853bd0 fffff800`023b35b4 : fffff800`00000001 fffff800`02468fc0 00000000`0000039d ffffd000`2f853c00 : NDProt630+0xa5a76
ffffd000`2f853c20 fffff800`0242a153 : ffffe000`00c5f040 00000000`00000000 ffffe000`02c50018 00000000`00000000 : NDProt630+0xd35b4
ffffd000`2f853cf0 fffff802`6c8f6664 : ffffe000`00c5f078 00000000`00000000 ffffd000`00000001 ffffd000`2f853dc8 : NDProt630+0x14a153
ffffd000`2f853d40 fffff802`6c9656c6 : ffffd000`20880180 ffffe000`02e24880 ffffe000`0323f140 ffffd000`2fd736c0 : nt!PspSystemThreadStartup+0x58
ffffd000`2f853da0 00000000`00000000 : ffffd000`2f854000 ffffd000`2f84e000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16


FOLLOWUP_IP: 
NDProt630+a5a76
fffff800`02385a76 8b442438        mov     eax,dword ptr [rsp+38h]

SYMBOL_STACK_INDEX:  2

SYMBOL_NAME:  NDProt630+a5a76

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: NDProt630

IMAGE_NAME:  NDProt630.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  550cea5c

STACK_COMMAND:  .cxr 0xffffd0002f853140 ; kb

FAILURE_BUCKET_ID:  AV_VRF_NDProt630+a5a76

BUCKET_ID:  AV_VRF_NDProt630+a5a76

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:av_vrf_ndprot630+a5a76

FAILURE_ID_HASH:  {d6e7f5d5-5abe-1d65-180b-ef9f5099ab2e}

Followup: MachineOwner
---------


************* Symbol Path validation summary **************
Response                         Time (ms)     Location
Deferred                                       SRV*c:\symbols\*http://msdl.microsoft.com/download/symbols
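Because no symbols were loaded for NDProt630.sys, the stack shows only module+offset call sites. As a small sketch (helper names parse_addr and module_base are hypothetical), subtracting each offset from the matching return address in the STACK_TEXT above should yield one consistent load address for the module:

```python
def parse_addr(text):
    """Convert a WinDbg-style address such as fffff800`02385a76 to an int."""
    return int(text.replace("`", ""), 16)


def module_base(return_address, offset):
    """Load address implied by an absolute address and a module-relative offset."""
    return return_address - offset


# (return address into NDProt630, offset shown in the call site), copied from
# the STACK_TEXT of the 2012 R2 dump above.
FRAMES = [
    ("fffff800`02385a76", 0xA5A76),
    ("fffff800`023b35b4", 0xD35B4),
    ("fffff800`0242a153", 0x14A153),
]

bases = {module_base(parse_addr(addr), off) for addr, off in FRAMES}
for base in sorted(bases):
    print("NDProt630 base: %016x" % base)
```

All three frames resolve to the same base, so the offsets are internally consistent. The offsets differ from those in comment 1 (for example +0x8b86f there), which fits the two dumps carrying different NDProt630.sys image timestamps (5049c61b and 550cea5c).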
Comment 42 Peixiu Hou 2016-06-07 03:57:06 EDT
Hi Dmitry,

The dump file MPE_2012R2_MEMORY.DMP.zip has been uploaded to this server:
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1013336/
Please take a look, thanks.

Best regards,
Peixiu Hou
Comment 50 Peixiu Hou 2016-06-20 01:48 EDT
Created attachment 1169674
win7-32 job failed log
Comment 51 Peixiu Hou 2016-06-20 01:49:59 EDT
Hi Yan,Dmitry,

I retested this case with virtio-win-prewhql-118, following the steps in comment #0. The results are as follows:

with e1000 as message device

win7-32 passed 1/4 (failed 3 times, passed 1 time)
win7-64 passed 1/1
win2012-r2 passed 1/1

with rtl8139 as message device

win7-32 failed 4/4
win7-64 failed 3/3
win2012-r2 failed 4/4
win8-64/win2012 passed 1/1

When I retested with virtio-win-prewhql-117, using rtl8139 as the message device, all tests passed.

The HCK logs for the failed win7-32 runs are attached.


Best Regards~
Peixiu Hou
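The prewhql-118 results above can be tallied per message device. This is only a sketch: the RESULTS table is transcribed by hand from the lines above, the names passes and summarize are hypothetical, and "failed N/M" is read as N failures out of M runs:

```python
import re

RESULTS = {
    "e1000": [
        "win7-32 passed 1/4",
        "win7-64 passed 1/1",
        "win2012-r2 passed 1/1",
    ],
    "rtl8139": [
        "win7-32 failed 4/4",
        "win7-64 failed 3/3",
        "win2012-r2 failed 4/4",
        "win8-64/win2012 passed 1/1",
    ],
}

LINE_RE = re.compile(r"^(\S+) (passed|failed) (\d+)/(\d+)$")


def passes(line):
    """Return (passed_runs, total_runs) for one result line."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError("unrecognized result line: %r" % line)
    count, total = int(match.group(3)), int(match.group(4))
    passed = count if match.group(2) == "passed" else total - count
    return passed, total


def summarize(results):
    """Sum passed and total runs for each message device."""
    summary = {}
    for device, lines in results.items():
        pairs = [passes(line) for line in lines]
        summary[device] = (sum(p for p, _ in pairs), sum(t for _, t in pairs))
    return summary


summary = summarize(RESULTS)
for device, (passed, total) in summary.items():
    print("%s: %d/%d runs passed" % (device, passed, total))
```

Under that reading, the e1000 configuration passed 3 of 6 runs and the rtl8139 configuration passed 1 of 12 with build 118.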
Comment 52 Yan Vugenfirer 2016-06-20 03:29:12 EDT
Thanks. We have additional fixes for MPE test that we will post today or tomorrow.
Comment 58 Peixiu Hou 2016-07-12 21:59:01 EDT
Hi Yan, 

I retested this case on Win2012R2 with the driver you provided, and the test passed.


Best Regards~
Peixiu Hou
Comment 70 Ladi Prosek 2016-07-26 05:51:25 EDT
Hi Peixiu Hou,

Would it be possible to retest with build 123? There's no need to capture the network traffic anymore, just run the MPE_Ethernet job as usual.

Thanks!
Ladi
Comment 71 Peixiu Hou 2016-07-28 01:44:53 EDT
(In reply to Ladi Prosek from comment #70)
> Hi Peixiu Hou,
> 
> Would it be possible to retest with build 123? There's no need to capture
> the network traffic anymore, just run the MPE_Ethernet job as usual.
> 
> Thanks!
> Ladi

Hi Ladi,

I retested this case with build 123 on Win2012-R2, the MPE_Ethernet job passed.


Best Regards~
Peixiu Hou
Comment 72 Ladi Prosek 2016-07-29 10:38:58 EDT
(In reply to Peixiu Hou from comment #71)
> (In reply to Ladi Prosek from comment #70)
> > Hi Peixiu Hou,
> > 
> > Would it be possible to retest with build 123? There's no need to capture
> > the network traffic anymore, just run the MPE_Ethernet job as usual.
> > 
> > Thanks!
> > Ladi
> 
> Hi Ladi,
> 
> I retested this case with build 123 on Win2012-R2, the MPE_Ethernet job
> passed.

Hi Peixiu Hou,

Sorry for not responding earlier and thanks for the good news. We should be good now then as 123 was built from the head of the branch.

Ladi
Comment 74 lijin 2016-09-02 04:23:51 EDT
The MPE job passed on all Windows guests with virtio-win-prewhql-126.

So I am changing the status to verified.
Comment 76 errata-xmlrpc 2016-11-04 04:43:14 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2609.html
