Bug 1408771

Summary: [virtio-win][viorng] Guest win2008-32 hits BSoD when running job "WDF Logo Tests - Final" under q35
Product: Red Hat Enterprise Linux 7 Reporter: Peixiu Hou <phou>
Component: virtio-win Assignee: ybendito
virtio-win sub component: virtio-win-prewhql QA Contact: Virtualization Bugs <virt-bugs>
Status: CLOSED ERRATA Docs Contact:
Severity: unspecified    
Priority: unspecified CC: ailan, lijin, lmiksik, lprosek, phou, vrozenfe, yvugenfi
Version: 7.3   
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-01 12:55:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Peixiu Hou 2016-12-27 06:41:27 UTC
Description of problem:
Guest win2008-32 hits BSoD(7E) when running the job "WDF Logo Tests - Final" under q35.

Version-Release number of selected component (if applicable):
kernel-3.10.0-537.el7.x86_64
qemu-kvm-rhev-2.6.0-29.el7.x86_64
seabios-1.9.1-5.el7.x86_64
virtio-win-prewhql-129

How reproducible:
3/3

Steps to Reproduce:
1. Boot the guest with the following command line:
/usr/libexec/qemu-kvm -name 129RNG200832T8W -enable-kvm -m 4G -smp 4 -uuid 5a346a32-6590-4daa-a5b6-e735768e0c19 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/tmp/129RNG200832T8W,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime,driftfix=slew -boot order=cd,menu=on -device piix3-usb-uhci,id=usb -drive file=129RNG200832T8W,if=none,id=drive-ide0-0-0,format=raw,serial=mike_cao,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive file=en_windows_server_2008_datacenter_enterprise_standard_sp2_x86_dvd_342333.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=129RNG200832T8W.vfd,if=floppy,id=drive-fdc0-0-0,format=raw,cache=none -netdev tap,script=/etc/qemu-ifup,downscript=no,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=00:52:6b:5c:16:d1 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=isa_serial0 -device usb-tablet,id=input0 -vnc 0.0.0.0:0 -vga cirrus -M q35 -device ioh3420,bus=pcie.0,id=root1.0,slot=1 -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0
2. Run the job "WDF Logo Tests - Final"
3. Check the guest status

Actual results:
BSOD

Expected results:
The job passes without a BSOD.

Additional info:
1. Tried with -M q35 and 'disable-modern=on' on build 129, reproduced the issue.
2. Tried with -M pc on build 129, cannot reproduce the issue.
3. Host CPU info:
processor	: 15
vendor_id	: GenuineIntel
cpu family	: 6
model		: 26
model name	: Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
stepping	: 5
microcode	: 0x19
cpu MHz		: 2261.009
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 3
cpu cores	: 4
apicid		: 7
initial apicid	: 7
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 4521.29
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:

Comment 2 ybendito 2016-12-28 09:15:53 UTC
Please provide link to zipped memory.dmp file

Comment 3 Peixiu Hou 2016-12-29 07:09:39 UTC
Hi,

I ran this job again and also hit a different BSOD(D1). In total, BSOD(D1) was reproduced 2 times and BSOD(7E) 1 time.

Reproduced BSOD(D1) with the command "-M q35 -device ioh3420,bus=pcie.0,id=root1.0,slot=1 -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0".
Reproduced BSOD(D1) with the command "-M q35 -device ioh3420,bus=pcie.0,id=root1.0,slot=1 -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0,bus=root1.0".
Reproduced BSOD(7E) with the command "-M q35 -device ioh3420,bus=pcie.0,id=root1.0,slot=1 -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0,bus=root1.0"

The BSOD(D1) dump analysis is as follows:
0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 0b9f4640, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000001, value 0 = read operation, 1 = write operation
Arg4: 806d0c53, address which referenced memory

Debugging Details:
------------------


WRITE_ADDRESS:  0b9f4640 

CURRENT_IRQL:  2

FAULTING_IP: 
NDIS!ndisMTimerDpc+37
806d0c53 002400          add     byte ptr [eax+eax],ah

DEFAULT_BUCKET_ID:  CODE_CORRUPTION

BUGCHECK_STR:  0xD1

PROCESS_NAME:  System

ANALYSIS_VERSION: 6.3.9600.16384 (debuggers(dbg).130821-1623) amd64fre

TRAP_FRAME:  81937ad8 -- (.trap 0xffffffff81937ad8)
ErrCode = 00000002
eax=85cfa320 ebx=00000000 ecx=8197a302 edx=00000000 esi=858040e8 edi=81937b94
eip=806d0c53 esp=81937b4c ebp=81937b68 iopl=0         nv up ei pl zr na pe nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010246
NDIS!ndisMTimerDpc+0x37:
806d0c53 002400          add     byte ptr [eax+eax],ah      ds:0023:0b9f4640=??
Resetting default scope

LAST_CONTROL_TRANSFER:  from 806d0c53 to 8188ffb9

STACK_TEXT:  
81937ad8 806d0c53 badb0d00 00000000 818e8a00 nt!KiTrap0E+0x2e1
81937b68 818ec2eb 85cfa348 85cfa320 77140f5c NDIS!ndisMTimerDpc+0x37
81937c88 818ebf21 81937cd0 85785402 81937cd8 nt!KiTimerListExpire+0x367
81937ce8 818ec615 00000000 00000000 000010a4 nt!KiTimerExpiration+0x2a0
81937d50 818ea87d 00000000 0000000e 00000000 nt!KiRetireDpcList+0xba
81937d54 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0x49


STACK_COMMAND:  kb

CHKIMG_EXTENSION: !chkimg -lo 50 -d !NDIS
    806d0c48-806d0c59  18 bytes - NDIS!ndisMTimerDpc+2c
	[ 70 64 80 89 45 f0 89 55:f0 00 05 00 00 00 00 0a ]
18 errors : !NDIS (806d0c48-806d0c59)

MODULE_NAME: memory_corruption

IMAGE_NAME:  memory_corruption

FOLLOWUP_NAME:  memory_corruption

DEBUG_FLR_IMAGE_TIMESTAMP:  0

MEMORY_CORRUPTOR:  LARGE

FAILURE_BUCKET_ID:  MEMORY_CORRUPTION_LARGE

BUCKET_ID:  MEMORY_CORRUPTION_LARGE

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:memory_corruption_large

FAILURE_ID_HASH:  {e29154ac-69a4-0eb8-172a-a860f73c0a3c}

Followup: memory_corruption
---------
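
A note on the D1 dump: the faulting instruction `add byte ptr [eax+eax],ah` together with the trap-frame value of eax appears to explain the referenced address in Arg1. The small Python sketch below only redoes that arithmetic with values copied from the dump; the function names are illustrative and not part of any debugger API.

```python
# Values copied from the BSOD(D1) trap frame and bugcheck arguments above.
EAX = 0x85CFA320              # eax at the time of the fault
ARG1_REFERENCED = 0x0B9F4640  # Arg1: memory referenced
ARG3_OPERATION = 0x00000001   # Arg3: 0 = read operation, 1 = write operation


def effective_address(eax: int) -> int:
    """Address used by `add byte ptr [eax+eax],ah` in a 32-bit guest."""
    return (eax + eax) & 0xFFFFFFFF  # the sum wraps around at 32 bits


def operation_name(arg3: int) -> str:
    """Map Arg3 to the wording used in the analysis text."""
    return {0: "read", 1: "write"}.get(arg3, "unknown")


fault_address = effective_address(EAX)
matches_arg1 = fault_address == ARG1_REFERENCED
print(f"{operation_name(ARG3_OPERATION)} to {fault_address:#010x}, "
      f"matches Arg1: {matches_arg1}")
```

If the two values match, the bad address was computed by the overwritten instruction bytes themselves, which fits the CODE_CORRUPTION bucket reported above.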

And the BSOD(7E) dump analysis is as follows:
2: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e)
This is a very common bugcheck.  Usually the exception address pinpoints
the driver/function that caused the problem.  Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: 806ed848, The address that the exception occurred at
Arg3: 8ddeb5ac, Exception Record Address
Arg4: 8ddeb2a8, Context Record Address

Debugging Details:
------------------


EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

FAULTING_IP: 
NDIS!ndisRegisterMiniportDriver+108
806ed848 f0000500000000  lock add byte ptr ds:[0],al

EXCEPTION_RECORD:  8ddeb5ac -- (.exr 0xffffffff8ddeb5ac)
ExceptionAddress: 806ed848 (NDIS!ndisRegisterMiniportDriver+0x00000108)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 00000001
   Parameter[1]: 00000000
Attempt to write to address 00000000

CONTEXT:  8ddeb2a8 -- (.cxr 0xffffffff8ddeb2a8;r)
eax=85a00dd0 ebx=0000019c ecx=85a00d98 edx=00000000 esi=85a00f40 edi=8ddeb6dc
eip=806ed848 esp=8ddeb674 ebp=8ddeb694 iopl=0         nv up ei ng nz ac po nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00210292
NDIS!ndisRegisterMiniportDriver+0x108:
806ed848 f0000500000000  lock add byte ptr ds:[0],al        ds:0023:00000000=??
Last set context:
eax=85a00dd0 ebx=0000019c ecx=85a00d98 edx=00000000 esi=85a00f40 edi=8ddeb6dc
eip=806ed848 esp=8ddeb674 ebp=8ddeb694 iopl=0         nv up ei ng nz ac po nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00210292
NDIS!ndisRegisterMiniportDriver+0x108:
806ed848 f0000500000000  lock add byte ptr ds:[0],al        ds:0023:00000000=??
Resetting default scope

DEFAULT_BUCKET_ID:  CODE_CORRUPTION

PROCESS_NAME:  System

CURRENT_IRQL:  0

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

EXCEPTION_PARAMETER1:  00000001

EXCEPTION_PARAMETER2:  00000000

WRITE_ADDRESS:  00000000 

FOLLOWUP_IP: 
NDIS!ndisRegisterMiniportDriver+108
806ed848 f0000500000000  lock add byte ptr ds:[0],al

BUGCHECK_STR:  0x7E

ANALYSIS_VERSION: 6.3.9600.16384 (debuggers(dbg).130821-1623) amd64fre

LOCK_ADDRESS:  8196e600 -- (!locks 8196e600)

Resource @ nt!PiEngineLock (0x8196e600)    Exclusively owned
     Threads: 8549a2d8-01<*> 
1 total locks, 1 locks currently held

PNP_TRIAGE: 
	Lock address  : 0x8196e600
	Thread Count  : 1
	Thread address: 0x8549a2d8
	Thread wait   : 0x1ec

LAST_CONTROL_TRANSFER:  from 81a0fc87 to 81907b0d

STACK_TEXT:  
8ddeb694 806ed709 00000060 8ddeb6dc 00000060 NDIS!ndisRegisterMiniportDriver+0x108
8ddeb6bc 90c930f9 85a00f40 8ddeb6dc 00000060 NDIS!NdisMRegisterMiniport+0x7f
8ddeb740 819a5a68 859fe808 859ff000 8ddeba98 rasl2tp!DriverEntry+0xb6
8ddeb924 8199dcec 00000000 8ddeb900 8ddeb954 nt!IopLoadDriver+0x805
8ddeb968 81a0d2e1 8dbc7a38 00000001 8dbc7a24 nt!PipCallDriverAddDeviceQueryRoutine+0x309
8ddeb9a0 81a0d611 00000001 8ddeba98 8199d9e3 nt!RtlpCallQueryRegistryRoutine+0x28e
8ddeba0c 8199c4f4 40000000 80000048 8ddeba40 nt!RtlQueryRegistryValues+0x31b
8ddebaf0 8199ba27 00000000 8ddebd38 8196c550 nt!PipCallDriverAddDevice+0x2ff
8ddebcec 81846714 854733f8 859cb4d8 8ddebd38 nt!PipProcessDevNodeTree+0x15c
8ddebd44 818dfe22 00000000 00000000 8549a2d8 nt!PnpDeviceActionWorker+0x229
8ddebd7c 81a0fc42 00000000 d9d20bf3 00000000 nt!ExpWorkerThread+0xfd
8ddebdc0 81878efe 818dfd25 00000001 00000000 nt!PspSystemThreadStartup+0x9d
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16


CHKIMG_EXTENSION: !chkimg -lo 50 -d !NDIS
    806ed848-806ed859  18 bytes - NDIS!ndisRegisterMiniportDriver+108
	[ 57 50 e8 bf 1f f2 ff 83:f0 00 05 00 00 00 00 0a ]
18 errors : !NDIS (806ed848-806ed859)

MODULE_NAME: memory_corruption

IMAGE_NAME:  memory_corruption

FOLLOWUP_NAME:  memory_corruption

DEBUG_FLR_IMAGE_TIMESTAMP:  0

MEMORY_CORRUPTOR:  LARGE

STACK_COMMAND:  .cxr 0xffffffff8ddeb2a8 ; kb

FAILURE_BUCKET_ID:  MEMORY_CORRUPTION_LARGE

BUCKET_ID:  MEMORY_CORRUPTION_LARGE

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:memory_corruption_large

FAILURE_ID_HASH:  {e29154ac-69a4-0eb8-172a-a860f73c0a3c}

Followup: memory_corruption
---------
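
A note comparing the two dumps: both `!chkimg` reports list 18 modified bytes in NDIS, at different offsets. The sketch below parses the two diff lines copied from the dumps and checks whether the in-memory bytes are the same. It assumes the bytes left of the colon are the expected image bytes and the bytes right of it are what was found in memory; `parse_chkimg` is a hypothetical helper written for this comparison, not a debugger command.

```python
import re

# Diff lines copied from the CHKIMG_EXTENSION sections of the two dumps.
# Assumption: "expected image bytes : bytes found in memory".
CHKIMG_D1 = "[ 70 64 80 89 45 f0 89 55:f0 00 05 00 00 00 00 0a ]"
CHKIMG_7E = "[ 57 50 e8 bf 1f f2 ff 83:f0 00 05 00 00 00 00 0a ]"


def parse_chkimg(line: str) -> tuple[bytes, bytes]:
    """Return (expected, actual) bytes from one chkimg diff line."""
    match = re.fullmatch(r"\[\s*([0-9a-f ]+):([0-9a-f ]+)\s*\]", line.strip())
    if match is None:
        raise ValueError(f"unrecognized chkimg line: {line!r}")
    return bytes.fromhex(match.group(1)), bytes.fromhex(match.group(2))


expected_d1, actual_d1 = parse_chkimg(CHKIMG_D1)
expected_7e, actual_7e = parse_chkimg(CHKIMG_7E)

# Same overwrite pattern in both dumps, even though the original code differs.
same_pattern = actual_d1 == actual_7e
print("same in-memory bytes:", same_pattern)
print("pattern:", " ".join(f"{b:02x}" for b in actual_d1))
```

Only the first 8 of the 18 modified bytes are shown in each report, so the comparison covers just that prefix.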

Comment 5 Peixiu Hou 2016-12-30 07:10:54 UTC
The same issue occurred in the build 129 balloon WHQL test under q35: the guest hit BSOD(7E) when running the job "WDF Logo Tests - Final" on win2008-32.

Comment 8 ybendito 2017-01-03 18:18:36 UTC
If the CDROM is not mandatory in this test, please remove the CDROM from the command line and rerun the test. If the BSOD still happens, please provide the dump file.

Comment 13 Peixiu Hou 2017-03-24 06:17:06 UTC
Verified this issue with qemu-kvm-rhev-2.8.0-5.el7 under q35; the job passes.

kernel-3.10.0-612.el7.x86_64
qemu-kvm-rhev-2.8.0-5.el7
virtio-win-prewhql-129

Best Regards~
Peixiu Hou

Comment 14 lijin 2017-03-24 09:32:38 UTC
Changing status to VERIFIED according to comment #13.

Comment 15 lijin 2017-05-11 05:46:23 UTC
Hi Amnon,

Could you help to ack?

Thanks

Comment 18 errata-xmlrpc 2017-08-01 12:55:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2341