Bug 1401835

Summary: [virtio-win][netkvm] Win7-32 guest hits BSoD when running NDISTest 6.0 - [1 Machine] - 1c_Mini6Send
Product: Red Hat Enterprise Linux 7
Reporter: Peixiu Hou <phou>
Component: qemu-kvm-rhev
Assignee: ybendito
Status: CLOSED WONTFIX
QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified
Priority: unspecified
Version: 7.3
CC: knoel, lijin, michen, phou, virt-maint, xiagao, ybendito
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Last Closed: 2017-03-20 09:34:31 UTC
Type: Bug
Attachments:
dbgview log with virtio-win-prewhql-128
dbgview log with virtio-win-prewhql-126

Description Peixiu Hou 2016-12-06 08:36:23 UTC
Description of problem:
BSoD occurs when running NDISTest 6.0 - [1 Machine] - 1c_Mini6Send under q35.
Guest: win7-32

Version-Release number of selected component (if applicable):
kernel-3.10.0-524.el7.x86_64
qemu-kvm-rhev-2.6.0-27.el7.x86_64
seabios-1.9.1-5.el7.x86_64.rpm
virtio-win-prewhql-128

How reproducible:
5/5

Steps to Reproduce:
1. /usr/libexec/qemu-kvm -name 128NICWIN732CHE -enable-kvm -m 2G -smp 2 -uuid 92397d96-6189-41b4-b518-d2ef49baff62 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/tmp/128NICWIN732CHE,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime,driftfix=slew -boot order=cd,menu=on -device piix3-usb-uhci,id=usb -drive file=128NICWIN732CHE,if=none,id=drive-ide0-0-0,format=raw,serial=mike_cao,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive file=en_windows_7_ultimate_with_sp1_x86_dvd_u_677460.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=128NICWIN732CHE.vfd,if=floppy,id=drive-fdc0-0-0,format=raw,cache=none -netdev tap,script=/etc/qemu-ifup,downscript=no,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=00:52:73:18:c5:e8 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=isa_serial0 -device usb-tablet,id=input0 -vnc 0.0.0.0:0 -vga cirrus -M q35 -device ioh3420,bus=pcie.0,id=root1.0,slot=1 -netdev tap,script=/etc/qemu-ifup1,downscript=no,id=hostnet1,vhost=on,queues=2 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:52:19:62:aa:d3,bus=root1.0,mq=on,vectors=6
2. Run job NDISTest 6.0 - [1 Machine] - 1c_Mini6Send
3. Check the job status

Actual results:
BSOD

Expected results:
Pass

Additional info:
1. Reproduced with a single queue under q35.
2. Reproduced with the netkvm device attached to bus root1.0.
3. Reproduced with the netkvm device attached to bus pcie.0.
4. Memory dump file location:
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/virtio-win/1c_Mini6Send/
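The virtio-net part of the command line in step 1 follows the usual multiqueue pattern: a tap netdev with `queues=N` and a `virtio-net-pci` device with `mq=on,vectors=2N+2` (one MSI-X vector per RX queue and per TX queue, plus one for configuration changes and one for the control virtqueue), which gives `vectors=6` for `queues=2`. The 2N+2 rule is common QEMU guidance rather than something stated in this report. The minimal sketch below, with a hypothetical helper name `virtio_net_args`, regenerates those two arguments so the single-queue and bus variants from Additional info can be produced consistently; for one queue it assumes the `queues`, `mq` and `vectors` options are simply dropped.

```python
def virtio_net_args(queues: int, bus: str, mac: str,
                    netdev_id: str = "hostnet1", dev_id: str = "net1",
                    ifup: str = "/etc/qemu-ifup1") -> list[str]:
    """Build the -netdev/-device pair for the virtio-net NIC under test.

    Defaults mirror the ids and ifup script used in step 1; the helper
    itself is hypothetical and not part of any QEMU tooling.
    """
    if queues < 1:
        raise ValueError("queues must be >= 1")
    netdev = f"tap,script={ifup},downscript=no,id={netdev_id},vhost=on"
    device = f"virtio-net-pci,netdev={netdev_id},id={dev_id},mac={mac},bus={bus}"
    if queues > 1:
        netdev += f",queues={queues}"
        # One MSI-X vector per RX queue and per TX queue, plus one for
        # configuration changes and one for the control virtqueue: 2N + 2.
        device += f",mq=on,vectors={2 * queues + 2}"
    return ["-netdev", netdev, "-device", device]
```

For example, `virtio_net_args(2, "root1.0", "00:52:19:62:aa:d3")` reproduces the last two options of step 1, and `virtio_net_args(1, "pcie.0", "00:52:19:62:aa:d3")` gives a single-queue NIC on the root bus.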

Comment 2 Peixiu Hou 2016-12-07 03:05:36 UTC
The memory dump analysis is below; I will continue debugging to find where the issue happens.

1: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e)
This is a very common bugcheck.  Usually the exception address pinpoints
the driver/function that caused the problem.  Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: 80000003, The exception code that was not handled
Arg2: 9928d195, The address that the exception occurred at
Arg3: 933f5bf4, Exception Record Address
Arg4: 933f57d0, Context Record Address

Debugging Details:
------------------


EXCEPTION_CODE: (HRESULT) 0x80000003 (2147483651) - One or more arguments are invalid

FAULTING_IP: 
NDProt62+6c195
9928d195 cc              int     3

EXCEPTION_RECORD:  933f5bf4 -- (.exr 0xffffffff933f5bf4)
ExceptionAddress: 9928d195 (NDProt62+0x0006c195)
   ExceptionCode: 80000003 (Break instruction exception)
  ExceptionFlags: 00000000
NumberParameters: 3
   Parameter[0]: 00000000
   Parameter[1]: 86077d48
   Parameter[2]: 00000000

CONTEXT:  933f57d0 -- (.cxr 0xffffffff933f57d0;r)
eax=99360138 ebx=00000000 ecx=7785bb95 edx=00000000 esi=000036b0 edi=00000000
eip=9928d195 esp=933f5cbc ebp=933f5cbc iopl=0         nv up ei ng nz na po nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00000282
NDProt62+0x6c195:
9928d195 cc              int     3
Last set context:
eax=99360138 ebx=00000000 ecx=7785bb95 edx=00000000 esi=000036b0 edi=00000000
eip=9928d195 esp=933f5cbc ebp=933f5cbc iopl=0         nv up ei ng nz na po nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00000282
NDProt62+0x6c195:
9928d195 cc              int     3
Resetting default scope

DEFAULT_BUCKET_ID:  WIN7_DRIVER_FAULT

BUGCHECK_STR:  0x7E

PROCESS_NAME:  System

CURRENT_IRQL:  0

ERROR_CODE: (NTSTATUS) 0x80000003 - {EXCEPTION}  Breakpoint  A breakpoint has been reached.

EXCEPTION_PARAMETER1:  00000000

EXCEPTION_PARAMETER2:  86077d48

EXCEPTION_PARAMETER3:  00000000

ANALYSIS_VERSION: 6.3.9600.16384 (debuggers(dbg).130821-1623) amd64fre

LAST_CONTROL_TRANSFER:  from 99292923 to 9928d195

STACK_TEXT:  
WARNING: Stack unwind information not available. Following frames may be wrong.
933f5cbc 99292923 99336334 00000380 99336e1c NDProt62+0x6c195
933f5d30 992abf70 7627621a 00000000 00000100 NDProt62+0x71923
933f5d50 8280df5e 84701174 8fdd445d 00000000 NDProt62+0x8af70
933f5d90 826b5219 992abf30 84701174 00000000 nt!PspSystemThreadStartup+0x9e
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x19


FOLLOWUP_IP: 
NDProt62+6c195
9928d195 cc              int     3

SYMBOL_STACK_INDEX:  0

SYMBOL_NAME:  NDProt62+6c195

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: NDProt62

IMAGE_NAME:  NDProt62.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  550cdfe1

STACK_COMMAND:  .cxr 0xffffffff933f57d0 ; kb

FAILURE_BUCKET_ID:  0x7E_NDProt62+6c195

BUCKET_ID:  0x7E_NDProt62+6c195

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:0x7e_ndprot62+6c195

FAILURE_ID_HASH:  {96abaf59-83f2-e645-196f-0abb49232b07}

Followup: MachineOwner
---------
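The addresses in FAULTING_IP and STACK_TEXT can be cross-checked: subtracting each printed `NDProt62+offset` from its absolute address should give one and the same image base if the (possibly unreliable, per the unwind warning) stack really stays inside NDProt62.sys. It does, which fits the reading that the bugcheck is the NDIS test driver's own breakpoint (`int 3`, exception 0x80000003, i.e. STATUS_BREAKPOINT as the ERROR_CODE line says) reaching an unhandled state in a system thread. This only locates where the assertion fires; the condition it checks may still be caused by the miniport. A minimal Python sketch; the table and function names are hypothetical, and treating `992abf30` as the thread start routine is an assumption based on it being the first argument shown for nt!PspSystemThreadStartup.

```python
# (absolute address, offset the debugger printed as NDProt62+offset)
FRAMES = [
    (0x9928D195, 0x6C195),  # FAULTING_IP: the int 3 instruction
    (0x99292923, 0x71923),  # frame 1 (RetAddr column of frame 0)
    (0x992ABF70, 0x8AF70),  # frame 2 (RetAddr column of frame 1)
]


def implied_module_base(frames):
    """Return the single image base implied by all frames, or raise if they disagree."""
    bases = {addr - offset for addr, offset in frames}
    if len(bases) != 1:
        raise ValueError(f"frames imply different bases: {sorted(map(hex, bases))}")
    return bases.pop()


base = implied_module_base(FRAMES)
print(f"NDProt62 base: {base:#010x}")
# Assumed thread start routine (first argument of nt!PspSystemThreadStartup).
print(f"thread start: NDProt62+{0x992ABF30 - base:#x}")
```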

Comment 3 ybendito 2016-12-07 17:51:35 UTC
Debugging the MSFT test driver:
1: kd> !ndtkd.assertions
Debugging Module: ndprot62
NDPROT assertions hit so far: 1
Seq.#	 Message	 - 	Source control
000	 "pCurrentNetBufferList"
[testsrc\nettest\ndis\ndistest\commengine\legacy\simplesendcommmanager.cpp @ 896]
1: kd> !ndtkd.debugspew
Debugging Module: ndprot62
Available debugger log messages: 20
Seq.#		 Debugger log
010	 [CNDTCLEndPoint::OnNdisTestCommand] Processing CMD_ENDPOINT_WAIT_FOR_SEND_COMPLETION
009	 [CNDTCLEndPoint::OnNdisTestCommand] Processing CMD_ENDPOINT_SEND_NET_BUFFERS
008	 [CNDTCLEndPoint::OnNdisTestCommand] Processing CMD_ENDPOINT_SEND_RESULTS
007	 [CNDTCLEndPoint::OnNdisTestCommand] Processing CMD_ENDPOINT_WAIT_FOR_SEND_COMPLETION
006	 [CNDTCLEndPoint::OnNdisTestCommand] Processing CMD_ENDPOINT_SEND_NET_BUFFERS
005	 [CNDTCLEndPoint::OnNdisTestCommand] Processing CMD_ENDPOINT_SEND_RESULTS
004	 [CNDTCLEndPoint::OnNdisTestCommand] Processing CMD_ENDPOINT_WAIT_FOR_SEND_COMPLETION
003	 [CNDTCLEndPoint::OnNdisTestCommand] Processing CMD_ENDPOINT_SEND_NET_BUFFERS
002	 [CNDTCLEndPoint::OnNdisTestCommand] Processing CMD_ENDPOINT_SEND_RESULTS
001	 [CNDTCLEndPoint::OnNdisTestCommand] Processing CMD_ENDPOINT_WAIT_FOR_SEND_COMPLETION
000	 [CNDTCLEndPoint::OnNdisTestCommand] Processing CMD_ENDPOINT_SEND_NET_BUFFERS
019	 [CNDTCLEndPoint::OnNdisTestCommand] Processing CMD_ENDPOINT_SEND_RESULTS
018	 [CNDTCLEndPoint::OnNdisTestCommand] Processing CMD_ENDPOINT_WAIT_FOR_SEND_COMPLETION
017	 [CNDTCLEndPoint::OnNdisTestCommand] Processing CMD_ENDPOINT_SEND_NET_BUFFERS
016	 [CNDTCLEndPoint::OnNdisTestCommand] Processing CMD_ENDPOINT_SEND_RESULTS
015	 [CNDTCLEndPoint::OnNdisTestCommand] Processing CMD_ENDPOINT_WAIT_FOR_SEND_COMPLETION
014	 [CNDTCLEndPoint::OnNdisTestCommand] Processing CMD_ENDPOINT_SEND_NET_BUFFERS
013	 [CNDTCLEndPoint::OnNdisTestCommand] Processing CMD_ENDPOINT_SEND_RESULTS
012	 [CNDTCLEndPoint::OnNdisTestCommand] Processing CMD_ENDPOINT_WAIT_FOR_SEND_COMPLETION
011	 [CNDTCLEndPoint::OnNdisTestCommand] Processing CMD_ENDPOINT_SEND_NET_BUFFERS

Comment 4 Peixiu Hou 2016-12-19 08:46:54 UTC
Isolation:

1. With the following newer builds:
kernel-3.10.0-524.el7.x86_64
qemu-kvm-rhev-2.6.0-27.el7.x86_64 
seabios-1.9.1-5.el7.x86_64.rpm

1) Tried virtio-win-prewhql-128 with mq under pc: reproduced, BSoD occurred.
2) Tried virtio-win-prewhql-128 without mq under pc: reproduced, BSoD occurred.
3) Tried virtio-win-prewhql-126 with mq under pc: reproduced, BSoD occurred.
4) Tried virtio-win-prewhql-126 without mq under pc: reproduced, BSoD occurred.
 
2. After downgrading to the following older builds:
kernel-3.10.0-493.el7.x86_64
qemu-kvm-rhev-2.6.0-22.el7.x86_64 
seabios-1.9.1-4.el7.x86_64.rpm

1) Tried virtio-win-prewhql-126 with params 'disable-legacy=on,disable-modern=off' and w/o mq under pc: reproduced.
2) Tried virtio-win-prewhql-126/128 with params 'disable-legacy=off,disable-modern=off' and w/o mq under pc: reproduced.
3) Tried virtio-win-prewhql-126/128 with params 'disable-legacy=off,disable-modern=on' and w/o mq under pc: reproduced.
4) Tried with params 'disable-legacy=on,disable-modern=on' under pc. In this configuration the netkvm driver cannot be installed (Code 10 error) and only the e1000 device works in the guest; the job then passes with no BSoD.

We also find these results strange, since this job previously passed with build 126.


Best Regards~
Peixiu

Comment 5 ybendito 2016-12-20 15:08:42 UTC
Please provide data about the host CPU.
What physical platform was used for the test?
Is it possible to know on which platform this job passed previously?

Comment 6 ybendito 2016-12-20 16:35:08 UTC
Are there results of NDISTest on other operating systems with -M q35?

Comment 7 Peixiu Hou 2016-12-21 09:36:08 UTC
Hi,

Additional info for this issue:
1. I tried this case on 2 hosts with the same result. The platform and CPU info are as follows:
----------------------------------------------------------------------------
1). Hostname: dell-me02-pem620-01.lab.eng.pek2.redhat.com
CPU info:
processor	: 23
vendor_id	: GenuineIntel
cpu family	: 6
model		: 45
model name	: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
stepping	: 7
microcode	: 0x710
cpu MHz		: 1201.796
cache size	: 15360 KB
physical id	: 1
siblings	: 12
core id		: 5
cpu cores	: 6
apicid		: 43
initial apicid	: 43
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat pln pts dtherm tpr_shadow vnmi flexpriority ept vpid xsaveopt
bogomips	: 4003.85
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

2). Hostname: dell-me02-pem610-06.lab.eng.pek2.redhat.com
CPU info:
processor	: 15
vendor_id	: GenuineIntel
cpu family	: 6
model		: 26
model name	: Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
stepping	: 5
microcode	: 0x19
cpu MHz		: 2261.017
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 3
cpu cores	: 4
apicid		: 7
initial apicid	: 7
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 4521.29
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:
--------------------------------------------------------------------------------

2. The job previously passed on dell-me02-pem610-11.lab.eng.pek2.redhat.com. I reran it on that host with the same versions that passed before (kernel, qemu, seabios, virtio-win), and the BSoD still occurred. After uninstalling the netkvm driver and running with e1000 only, the job passes.

3. With -M q35, this job passed on win7-64, 2008-r2, win8-32/64 and win2012/r2; only win7-32 hits the BSoD. With virtio-win-prewhql-126 under q35 the BSoD also occurs, with either single queue or multi-queue.


Best Regards~
Peixiu Hou
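Comment 5 asked for host CPU data. A set difference over the two `flags` lines above is a quick way to see how far apart the two reproducing hosts are: every E5520 flag is also present on the E5-2620, which additionally has features such as avx, aes and xsave. Since both hosts reproduce the BSoD, those extra features do not appear to be required for it. A minimal Python sketch; the constant and function names are hypothetical.

```python
# Flags copied verbatim from the two /proc/cpuinfo excerpts above.
E5_2620_FLAGS = """
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc
arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni
pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca
sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat pln
pts dtherm tpr_shadow vnmi flexpriority ept vpid xsaveopt
"""

E5520_FLAGS = """
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc
arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64
monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm
ida dtherm tpr_shadow vnmi flexpriority ept vpid
"""


def flag_diff(a: str, b: str) -> tuple[set[str], set[str]]:
    """Return (flags only in a, flags only in b); whitespace separates flags."""
    set_a, set_b = set(a.split()), set(b.split())
    return set_a - set_b, set_b - set_a


only_2620, only_5520 = flag_diff(E5_2620_FLAGS, E5520_FLAGS)
print("only on E5-2620:", " ".join(sorted(only_2620)))
print("only on E5520:", " ".join(sorted(only_5520)) or "(none)")
```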

Comment 9 ybendito 2016-12-21 14:12:10 UTC
Please, without running the NDISTest job, log in to the machine under test (guest system), disable the Red Hat VirtIO Ethernet Adapter in Device Manager, run dbgview as administrator with 'Capture Kernel' on, enable the VirtIO Ethernet adapter again, then save the log collected by dbgview and attach it to the bug.
If the disable operation does not finish, please try 'set_link <virtio device id> off' in the QEMU monitor.

Thanks,
Yuri
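The guest in the description was started with `-mon chardev=charmonitor,id=monitor,mode=control`, i.e. a QMP (JSON) monitor on the Unix socket /tmp/128NICWIN732CHE, so the HMP-style `set_link <virtio device id> off` would be issued there as the QMP `set_link` command with `name` and `up` arguments. A minimal Python sketch, assuming that socket path and the device id `net1` from the command line; the helper names `qmp_command` and `set_link` are hypothetical.

```python
import json
import socket


def qmp_command(stream, command, arguments=None):
    """Send one QMP command on a line-oriented text stream and return its reply.

    Asynchronous events (objects with an "event" key) received before the
    reply are skipped.
    """
    msg = {"execute": command}
    if arguments is not None:
        msg["arguments"] = arguments
    stream.write(json.dumps(msg) + "\n")
    stream.flush()
    while True:
        line = stream.readline()
        if not line:
            raise ConnectionError("QMP connection closed")
        reply = json.loads(line)
        if "event" not in reply:
            return reply


def set_link(socket_path, device_id, up):
    """Toggle the link state of a NIC through the QMP set_link command."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        with sock.makefile("rw", encoding="utf-8", newline="\n") as stream:
            greeting = json.loads(stream.readline())
            if "QMP" not in greeting:
                raise RuntimeError(f"unexpected greeting: {greeting}")
            # Leave capabilities-negotiation mode before issuing commands.
            reply = qmp_command(stream, "qmp_capabilities")
            if "error" in reply:
                raise RuntimeError(reply["error"])
            return qmp_command(stream, "set_link", {"name": device_id, "up": up})
```

Usage for the guest above would be `set_link("/tmp/128NICWIN732CHE", "net1", False)`, and `True` to bring the link back up; a successful call returns `{"return": {}}`.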

Comment 11 Peixiu Hou 2016-12-22 03:34:47 UTC
Created attachment 1234603 [details]
dbgview log with virtio-win-prewhql-128

Comment 12 Peixiu Hou 2016-12-22 03:35:14 UTC
Created attachment 1234604 [details]
dbgview log with virtio-win-prewhql-126

Comment 13 Peixiu Hou 2016-12-27 02:28:23 UTC
Reproduced this issue with virtio-win-prewhql-129 under q35, both with 'disable-legacy=off,disable-modern=on' and with the default values.

Comment 14 ybendito 2017-01-08 15:57:57 UTC
Please retry without the CD-ROM device.

Comment 16 Peixiu Hou 2017-01-11 15:00:15 UTC
On win7-32: 
Reproduced this issue with virtio-win-prewhql-129 and multi-queue under pc.
Cannot reproduce this issue with virtio-win-prewhql-129 and single-queue under pc.

The dump file (bsod(7e)_1cmini6send_pc.DMP.zip) has been uploaded to the following location:
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/virtio-win/1c_Mini6Send/

Best Regards~
Peixiu Hou

Comment 21 xiagao 2017-03-22 04:32:58 UTC
Still fails with virtio-win-prewhql-134.

Comment 22 xiagao 2017-03-24 02:07:36 UTC
"NDISTest 6.0 - [2 Machine] - 2c_Mini6RSSSendRecv" also hits this issue on a win7-32 guest with virtio-win-prewhql-134.