Bug 997368

Summary: [whql][netkvm]win8-64 NDIS server guest BSOD(1e) when running RSC job
Product: Red Hat Enterprise Linux 6 Reporter: Mike Cao <bcao>
Component: virtio-win Assignee: Dmitry Fleytman <dfleytma>
Status: CLOSED DEFERRED QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: high    
Version: 6.5 CC: acathrow, bcao, bsarathy, chayang, lijin, mdeng, michen, virt-bugs, yvugenfi
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-08-28 12:53:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Mike Cao 2013-08-15 10:03:53 UTC
Description of problem:


Version-Release number of selected component (if applicable):
kernel-2.6.32-393.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.390.el6.x86_64
seabios-0.6.1.2-27.el6.x86_64
sgabios-0-0.3.20110621svn.el6.x86_64
virtio-win-prewhql-66

How reproducible:
2/2

Steps to Reproduce:
1. On one host, run a win8-64 guest with -smp 8 as the NDIS client:
/usr/libexec/qemu-kvm -m 6G -smp 8,cores=8 -cpu cpu64-rhel6,+x2apic,+sep -usb -device usb-tablet -drive file=win8-64-nic1.raw,if=none,id=drive-ide0-0-0,werror=stop,rerror=stop,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,sndbuf=0,id=hostnet0,vhost=on,script=/etc/qemu-ifup,downscript=no -device virtio-net-pci,netdev=hostnet0,mac=00:32:45:22:51:12,bus=pci.0,addr=0x4,id=virtio-net-pci0,ctrl_guest_offloads=on -netdev tap,sndbuf=0,id=hostnet2,script=/etc/qemu-ifup,downscript=no -device e1000,netdev=hostnet2,mac=00:22:42:13:17:2c,bus=pci.0,addr=0x6 -uuid d7a34f4d-b09a-4547-b150-3aee68c2767a -no-kvm-pit-reinjection -chardev socket,id=111a,path=/tmp/monitor-win8-64-64-nic1,server,nowait -mon chardev=111a,mode=readline -vnc :1 -vga cirrus -name win8-64-nic1-65-HCK-new -rtc base=localtime,clock=host,driftfix=slew -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -monitor stdio
2. On another host, run a win8-64 guest with -smp 2 as the NDIS server.
CLI:
/usr/libexec/qemu-kvm -m 2G -smp 2,cores=2 -cpu cpu64-rhel6,+x2apic -usb -device usb-tablet -drive file=win8-64-nic2.raw,if=none,id=drive-ide0-0-0,werror=stop,rerror=stop,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,sndbuf=0,id=hostnet0,vhost=on,script=/etc/qemu-ifup,downscript=no -device virtio-net-pci,netdev=hostnet0,mac=00:22:22:34:22:01,bus=pci.0,addr=0x4,id=virtio-net-pci0,ctrl_guest_offloads=on -netdev tap,sndbuf=0,id=hostnet2,script=/etc/qemu-ifup,downscript=no -device e1000,netdev=hostnet2,mac=00:32:22:34:18:20,bus=pci.0,addr=0x6 -uuid 9740898f-cf07-4366-b575-793be7cde9f6 -no-kvm-pit-reinjection -chardev socket,id=111a,path=/tmp/monitor-win8-64-66-nic2,server,nowait -mon chardev=111a,mode=readline -vnc :2 -vga cirrus -name win8-64-nic2-66-HCK -rtc base=localtime,clock=host,driftfix=slew -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=0 -monitor stdio
3. Run the RSC (Receive Segment Coalescing) job.

Actual results:
The win8-64 guest used as the NDIS server hits a BSOD.

Expected results:


Additional info:

Comment 1 Mike Cao 2013-08-15 10:05:29 UTC
0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

KMODE_EXCEPTION_NOT_HANDLED (1e)
This is a very common bugcheck.  Usually the exception address pinpoints
the driver/function that caused the problem.  Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: ffffffffc0000420, The exception code that was not handled
Arg2: fffff8800515141a, The address that the exception occurred at
Arg3: 0000000000000001, Parameter 0 of the exception
Arg4: 000000000000dd86, Parameter 1 of the exception

Debugging Details:
------------------

*** ERROR: Module load completed but symbols could not be loaded for netkvm.sys

EXCEPTION_CODE: (NTSTATUS) 0xc0000420 - An assertion failure has occurred.

FAULTING_IP: 
spartadrv!SpartaReceiveNetBufferLists+7e
fffff880`0515141a cd2c            int     2Ch

EXCEPTION_PARAMETER1:  0000000000000001

EXCEPTION_PARAMETER2:  000000000000dd86

BUGCHECK_STR:  0x1E_c0000420

DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT

PROCESS_NAME:  System

CURRENT_IRQL:  2

TAG_NOT_DEFINED_c000000f:  FFFFF802313BEFB0

EXCEPTION_RECORD:  000000000000c4c0 -- (.exr 0xc4c0)
Cannot read Exception record @ 000000000000c4c0

LAST_CONTROL_TRANSFER:  from fffff80231776546 to fffff8023167dd40

STACK_TEXT:  
fffff802`313b6e38 fffff802`31776546 : 00000000`0000001e ffffffff`c0000420 fffff880`0515141a 00000000`00000001 : nt!KeBugCheckEx
fffff802`313b6e40 fffff802`316de85d : fffff880`018c9f3e fffffa80`02a2f290 fffff802`313b6fb0 fffffa80`02ee6980 : nt!KiFatalExceptionHandler+0x22
fffff802`313b6e80 fffff802`316e05f3 : 00000000`00000000 fffff802`313b3000 fffffa80`00008868 fffff802`313b9000 : nt!RtlpExecuteHandlerForException+0xd
fffff802`313b6eb0 fffff802`316fca3e : fffff802`313b7db8 fffff802`313b7af0 fffff802`313b7db8 fffffa80`04e1e480 : nt!RtlDispatchException+0x44b
fffff802`313b75c0 fffff802`3167d142 : 00000000`0000c4c0 00000000`00000000 fffffa80`02881510 fffff880`01b5f8ce : nt!KiDispatchException+0x455
fffff802`313b7c80 fffff802`3167c6ed : fffffa80`04cf4b30 00000000`00000000 00000000`00000001 fffff880`0485d730 : nt!KiExceptionDispatch+0xc2
fffff802`313b7e60 fffff880`0515141a : fffffa80`028b9000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiRaiseAssertion+0xed
fffff802`313b7ff0 fffff880`018cae2e : 00000000`00000001 fffff880`01d64875 00000000`00000000 fffff802`313b86e0 : spartadrv!SpartaReceiveNetBufferLists+0x7e
fffff802`313b8670 fffff880`018ca6db : fffff802`00000002 fffffa80`04e1dd00 fffffa80`00000000 fffffb80`00000001 : ndis!ndisMIndicateNetBufferListsToOpen+0x373
fffff802`313b8710 fffff880`018caa05 : fffffa80`02d6a1a0 00000000`00000000 00000000`00000000 00000000`00001f00 : ndis!ndisInvokeNextReceiveHandler+0x5db
fffff802`313b87e0 fffff880`03c903ab : fffffa80`02e56000 00000000`00000001 fffffa80`02e565e8 fffffa80`02e56000 : ndis!NdisMIndicateReceiveNetBufferLists+0xc5
fffff802`313b8860 fffff880`03c89e4b : fffffa80`02e56000 00000000`00000001 fffffa80`02e565e8 fffffa80`02e565e8 : netkvm+0xc3ab
fffff802`313b88a0 fffff880`03c87020 : fffffa80`02e56000 00000000`00000010 00000000`00000003 00000000`00000000 : netkvm+0x5e4b
fffff802`313b88f0 fffff880`03c8f90b : fffffa80`02cfa4e0 00000000`000003e7 fffff802`313b8a20 fffffa80`02f48000 : netkvm+0x3020
fffff802`313b8930 fffff880`018c9e2e : 00000000`00000000 fffffa80`02cb8010 000001ee`2fe9aa0f fffff802`31d8303f : netkvm+0xb90b
fffff802`313b89c0 fffff880`018c9f3e : fffffa80`02cfa6c8 00000000`ffffffff fffffa80`02cfa4e0 00000000`2fcbd749 : ndis!ndisMiniportDpc+0xfe
fffff802`313b8a60 fffff802`31674968 : fffff802`318fbf00 fffff802`318f9180 fffffa80`0185f118 fffff802`313b8ca0 : ndis!ndisInterruptDpc+0x9e
fffff802`313b8af0 fffff802`316a4bd0 : fffffa80`02cb8010 00000000`ffffffff fffffa80`02cb82e8 00000000`00000000 : nt!KiExecuteAllDpcs+0x198
fffff802`313b8c30 fffff802`316a93fa : fffff802`318f9180 fffff802`318f9180 00000000`00183de0 fffff802`31953880 : nt!KiRetireDpcList+0xd0
fffff802`313b8da0 00000000`00000000 : fffff802`313b9000 fffff802`313b3000 00000000`00000000 00000000`00000000 : nt!KiIdleLoop+0x5a


STACK_COMMAND:  kb

FOLLOWUP_IP: 
spartadrv!SpartaReceiveNetBufferLists+7e
fffff880`0515141a cd2c            int     2Ch

SYMBOL_STACK_INDEX:  7

SYMBOL_NAME:  spartadrv!SpartaReceiveNetBufferLists+7e

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: spartadrv

IMAGE_NAME:  spartadrv.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  5010f925

BUCKET_ID_FUNC_OFFSET:  7e

FAILURE_BUCKET_ID:  0x1E_c0000420_spartadrv!SpartaReceiveNetBufferLists

BUCKET_ID:  0x1E_c0000420_spartadrv!SpartaReceiveNetBufferLists

Followup: MachineOwner
---------
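In the output above, Arg1 reads ffffffffc0000420 because bugcheck arguments are 64-bit values and the 32-bit NTSTATUS exception code is sign-extended into them. Its low 32 bits are 0xC0000420 (STATUS_ASSERTION_FAILURE), which matches the debugger's "An assertion failure has occurred" and the `int 2Ch` at the faulting IP. A minimal Python sketch of that decoding, assuming only the documented NTSTATUS layout (the helper names are hypothetical, not part of any debugger API):

```python
# NTSTATUS value from ntstatus.h, defined here so the sketch has no dependencies.
STATUS_ASSERTION_FAILURE = 0xC0000420


def ntstatus_from_bugcheck_arg(arg: int) -> int:
    """Recover a 32-bit NTSTATUS from a sign-extended 64-bit bugcheck argument."""
    return arg & 0xFFFFFFFF


def ntstatus_severity(status: int) -> int:
    """Top two bits of an NTSTATUS: 0 success, 1 informational, 2 warning, 3 error."""
    return (status >> 30) & 0x3


arg1 = 0xFFFFFFFFC0000420  # Arg1 as printed by !analyze -v
status = ntstatus_from_bugcheck_arg(arg1)
print(f"NTSTATUS {status:#010x}, severity {ntstatus_severity(status)}, "
      f"assertion failure: {status == STATUS_ASSERTION_FAILURE}")
```

Running it prints `NTSTATUS 0xc0000420, severity 3, assertion failure: True`.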

Comment 3 Dmitry Fleytman 2013-08-18 14:26:02 UTC
Mike, what is NDIS server?
Is it a test machine or a support machine?

Thanks,
Dmitry

Comment 4 Mike Cao 2013-08-18 16:09:20 UTC
(In reply to Dmitry Fleytman from comment #3)
> Mike, what is NDIS server?
> Is it a test machine or a support machine?
> 
> Thanks,
> Dmitry

Support machine.
The test machine always runs as the NDIS client and the support machine always runs as the NDIS server, both inside guests.

Mike

Comment 5 Dmitry Fleytman 2013-08-19 07:58:13 UTC
Dump analysis with loaded debug symbols:

0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

KMODE_EXCEPTION_NOT_HANDLED (1e)
This is a very common bugcheck.  Usually the exception address pinpoints
the driver/function that caused the problem.  Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: ffffffffc0000420, The exception code that was not handled
Arg2: fffff8800515141a, The address that the exception occurred at
Arg3: 0000000000000001, Parameter 0 of the exception
Arg4: 000000000000dd86, Parameter 1 of the exception

Debugging Details:
------------------


EXCEPTION_CODE: (NTSTATUS) 0xc0000420 - An assertion failure has occurred.

FAULTING_IP: 
spartadrv!SpartaReceiveNetBufferLists+7e
fffff880`0515141a cd2c            int     2Ch

EXCEPTION_PARAMETER1:  0000000000000001

EXCEPTION_PARAMETER2:  000000000000dd86

BUGCHECK_STR:  0x1E_c0000420

DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT

PROCESS_NAME:  System

CURRENT_IRQL:  2

TAG_NOT_DEFINED_c000000f:  FFFFF802313BEFB0

EXCEPTION_RECORD:  000000000000c4c0 -- (.exr 0xc4c0)
Cannot read Exception record @ 000000000000c4c0

LAST_CONTROL_TRANSFER:  from fffff80231776546 to fffff8023167dd40

STACK_TEXT:  
fffff802`313b6e38 fffff802`31776546 : 00000000`0000001e ffffffff`c0000420 fffff880`0515141a 00000000`00000001 : nt!KeBugCheckEx
fffff802`313b6e40 fffff802`316de85d : fffff880`018c9f3e fffffa80`02a2f290 fffff802`313b6fb0 fffffa80`02ee6980 : nt!KiFatalExceptionHandler+0x22
fffff802`313b6e80 fffff802`316e05f3 : 00000000`00000000 fffff802`313b3000 fffffa80`00008868 fffff802`313b9000 : nt!RtlpExecuteHandlerForException+0xd
fffff802`313b6eb0 fffff802`316fca3e : fffff802`313b7db8 fffff802`313b7af0 fffff802`313b7db8 fffffa80`04e1e480 : nt!RtlDispatchException+0x44b
fffff802`313b75c0 fffff802`3167d142 : 00000000`0000c4c0 00000000`00000000 fffffa80`02881510 fffff880`01b5f8ce : nt!KiDispatchException+0x455
fffff802`313b7c80 fffff802`3167c6ed : fffffa80`04cf4b30 00000000`00000000 00000000`00000001 fffff880`0485d730 : nt!KiExceptionDispatch+0xc2
fffff802`313b7e60 fffff880`0515141a : fffffa80`028b9000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiRaiseAssertion+0xed
fffff802`313b7ff0 fffff880`018cae2e : 00000000`00000001 fffff880`01d64875 00000000`00000000 fffff802`313b86e0 : spartadrv!SpartaReceiveNetBufferLists+0x7e
fffff802`313b8670 fffff880`018ca6db : fffff802`00000002 fffffa80`04e1dd00 fffffa80`00000000 fffffb80`00000001 : ndis!ndisMIndicateNetBufferListsToOpen+0x373
fffff802`313b8710 fffff880`018caa05 : fffffa80`02d6a1a0 00000000`00000000 00000000`00000000 00000000`00001f00 : ndis!ndisInvokeNextReceiveHandler+0x5db
fffff802`313b87e0 fffff880`03c903ab : fffffa80`02e56000 00000000`00000001 fffffa80`02e565e8 fffffa80`02e56000 : ndis!NdisMIndicateReceiveNetBufferLists+0xc5
fffff802`313b8860 fffff880`03c89e4b : fffffa80`02e56000 00000000`00000001 fffffa80`02e565e8 fffffa80`02e565e8 : netkvm!ParaNdis_IndicateReceivedBatch+0x47 [c:\cygwin\tmp\build\source\internal-kvm-guest-drivers-windows\netkvm\wlh\parandis6-impl.c @ 936]
fffff802`313b88a0 fffff880`03c87020 : fffffa80`02e56000 00000000`00000010 00000000`00000003 00000000`00000000 : netkvm!ProcessReceiveQueue+0x1bf [c:\cygwin\tmp\build\source\internal-kvm-guest-drivers-windows\netkvm\common\parandis-common.c @ 2166]
fffff802`313b88f0 fffff880`03c8f90b : fffffa80`02cfa4e0 00000000`000003e7 fffff802`313b8a20 fffffa80`02f48000 : netkvm!ParaNdis_DPCWorkBody+0xb4 [c:\cygwin\tmp\build\source\internal-kvm-guest-drivers-windows\netkvm\common\parandis-common.c @ 2214]
fffff802`313b8930 fffff880`018c9e2e : 00000000`00000000 fffffa80`02cb8010 000001ee`2fe9aa0f fffff802`31d8303f : netkvm!MiniportMSIInterruptDpc+0x8f [c:\cygwin\tmp\build\source\internal-kvm-guest-drivers-windows\netkvm\wlh\parandis6-impl.c @ 381]
fffff802`313b89c0 fffff880`018c9f3e : fffffa80`02cfa6c8 00000000`ffffffff fffffa80`02cfa4e0 00000000`2fcbd749 : ndis!ndisMiniportDpc+0xfe
fffff802`313b8a60 fffff802`31674968 : fffff802`318fbf00 fffff802`318f9180 fffffa80`0185f118 fffff802`313b8ca0 : ndis!ndisInterruptDpc+0x9e
fffff802`313b8af0 fffff802`316a4bd0 : fffffa80`02cb8010 00000000`ffffffff fffffa80`02cb82e8 00000000`00000000 : nt!KiExecuteAllDpcs+0x198
fffff802`313b8c30 fffff802`316a93fa : fffff802`318f9180 fffff802`318f9180 00000000`00183de0 fffff802`31953880 : nt!KiRetireDpcList+0xd0
fffff802`313b8da0 00000000`00000000 : fffff802`313b9000 fffff802`313b3000 00000000`00000000 00000000`00000000 : nt!KiIdleLoop+0x5a


STACK_COMMAND:  kb

FOLLOWUP_IP: 
spartadrv!SpartaReceiveNetBufferLists+7e
fffff880`0515141a cd2c            int     2Ch

SYMBOL_STACK_INDEX:  7

SYMBOL_NAME:  spartadrv!SpartaReceiveNetBufferLists+7e

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: spartadrv

IMAGE_NAME:  spartadrv.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  5010f925

BUCKET_ID_FUNC_OFFSET:  7e

FAILURE_BUCKET_ID:  0x1E_c0000420_spartadrv!SpartaReceiveNetBufferLists

BUCKET_ID:  0x1E_c0000420_spartadrv!SpartaReceiveNetBufferLists

Followup: MachineOwner
---------

Comment 6 Dmitry Fleytman 2013-08-19 08:22:38 UTC
Triage results:

The crash is an assertion in the receive callback of the Microsoft test driver spartadrv.sys.
On packet receipt, spartadrv checks the packet's data size and asserts if it is bigger than 1514 bytes. This is exactly the case we observe: a 4314-byte packet was received by the device under test.

The packet itself doesn't look related to the test: it contains payload generated by Windows UPnP services. Also, the test operates on batches of short packets that stay shorter than 1514 bytes even after coalescing. This makes me think this is interference from hostile traffic.

According to HCK requirements, there should be no hostile traffic in the test devices' network segment. Is there any way to physically isolate the connection between the test devices, i.e. use a dedicated pair of physical adapters connected back-to-back and bridged with the test devices?
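The 1514-byte limit in the triage above is a standard Ethernet frame without FCS: a 1500-byte MTU payload plus a 14-byte header. spartadrv's source is not shown here, so the following is only a hypothetical Python stand-in for the size check described, showing why the 4314-byte UPnP packet trips it while the test's own short packets would not:

```python
ETH_HEADER_LEN = 14  # destination MAC + source MAC + EtherType
ETH_MTU = 1500       # standard Ethernet payload limit
MAX_FRAME_LEN = ETH_HEADER_LEN + ETH_MTU  # 1514, the limit named in the triage


def frame_fits_standard_mtu(frame_len: int) -> bool:
    """Hypothetical stand-in for the receive-path size check (not spartadrv's code)."""
    return frame_len <= MAX_FRAME_LEN


for observed in (1514, 4314):
    verdict = "ok" if frame_fits_standard_mtu(observed) else "would trip the assertion"
    print(f"{observed} bytes: {verdict}")
```

This prints `1514 bytes: ok` followed by `4314 bytes: would trip the assertion`.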

Comment 7 Dmitry Fleytman 2013-08-19 08:24:09 UTC
Hello Mike,

See my previous comment.
Is it possible to build the topology I suggested?

Thanks in advance,
Dmitry

Comment 8 Mike Cao 2013-08-19 14:33:51 UTC
(In reply to Dmitry Fleytman from comment #7)
> Hello Mike,
> 
> See my previous comment.
> Is it possible to build the topology I suggested?
> 
> Thanks in advance,
> Dmitry

It is not easy...
All the hosts are in a private subnet. How about powering off all the VMs on the other hosts in this subnet?

Mike

Comment 9 Dmitry Fleytman 2013-08-19 14:53:19 UTC
Yes, this could help.
You could also try experimenting with VLANs.

Comment 10 Min Deng 2013-08-20 09:38:29 UTC
Hi All,
   We powered off the other hosts in the same subnet and re-ran the RSC job. This time the guest running the NDIS server did not BSOD, but the job itself still failed. QE will upload the latest HCK files shortly. Thanks.
   Build info 
   kernel-2.6.32-413.el6.x86_64
   qemu-kvm-rhev-0.12.1.2-2.397.el6.x86_64
   seabios-0.6.1.2-28.el6.x86_64
   vgabios-0.6b-3.7.el6.noarch
   spice-server-0.12.4-2.el6.x86_64

Best Regards,
Min Deng

Comment 11 Min Deng 2013-08-20 09:46:17 UTC
Created attachment 788401 [details]
hck-log

Comment 12 Yvugenfi@redhat.com 2013-08-28 12:53:56 UTC
Closing this bug.

We will disable RSC support in production for RHEL 6.5 (BZ #1002073) until host side parts are developed (BZ #950611).