Bug 1079147

Summary: [WHQL][balloon][virtio-rng] Job named DPWLK-HotADD-Device Test- Verify driver support for Hot-Add CPU made win2k8-R2 BSOD (0x7E)
Product: Red Hat Enterprise Linux 7 Reporter: Min Deng <mdeng>
Component: qemu-kvm Assignee: Yvugenfi <yvugenfi>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: high    
Version: 7.0 CC: bcao, hhuang, juzhang, kraxel, mdeng, michen, mrezanin, rbalakri, virt-maint
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-03-05 08:05:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Min Deng 2014-03-21 04:54:40 UTC
Description of problem:
The job named DPWLK-HotADD-Device Test- Verify driver support for Hot-Add CPU made win2k8-R2 BSOD with 0x7E.

Version-Release number of selected component (if applicable):
Build info,
kernel-3.10.0-111.el7.x86_64
qemu-kvm-rhev-1.5.3-53.el7.x86_64
virtio-win-prewhql-0.1-75

How reproducible:
3/3
Steps to Reproduce:
1. Boot the guest with the following CLI:
  /usr/libexec/qemu-kvm -M pc -m 6G -smp 4 -cpu Nehalem,+x2apic,hv_spinlocks=0x1fff,hv_relaxed,hv_vapic,hv_time -usb -device usb-tablet -drive file=win2k8-R2-balloon.raw,format=raw,if=none,id=drive-ide0-0-0,werror=stop,rerror=stop,cache=none -device ide-drive,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,id=hostnet0,script=/etc/qemu-ifup1 -device e1000,netdev=hostnet0,mac=00:42:36:28:34:22,id=net0 -uuid 1cedf92f-cdd9-4b01-8487-35fb58dcc82e -rtc-td-hack -no-kvm-pit-reinjection -chardev socket,id=a,path=/tmp/monitor-win2k8R2-serial,server,nowait -mon chardev=a,mode=readline -name win2k8-R2-balloon -device virtio-balloon-pci,id=balloon0 -vnc :1 -vga cirrus -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -monitor stdio
2. Submit the job.

Actual results:
The job failed with a BSOD, which occurred while the (1102) AddCPUs sub-job was running.

Expected results:
The job passes.

Additional info:
MODULE_NAME: hidusb

FAULTING_MODULE: fffff80001418000 nt

DEBUG_FLR_IMAGE_TIMESTAMP:  4ce7a665

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

FAULTING_IP: 
hidusb+22bd
fffff880`02d9e2bd 41395110        cmp     dword ptr [r9+10h],edx

EXCEPTION_RECORD:  fffff88001f45678 -- (.exr 0xfffff88001f45678)
ExceptionAddress: fffff88002d9e2bd (hidusb+0x00000000000022bd)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 0000000000000000
   Parameter[1]: 0000000000000010
Attempt to read from address 0000000000000010

CONTEXT:  fffff88001f44ed0 -- (.cxr 0xfffff88001f44ed0;r)
rax=fffffa8006381b30 rbx=fffffa8005cb3c20 rcx=fffffa8006381db8
rdx=0000000000000000 rsi=fffffa8006381db8 rdi=fffffa8005cb3c20
rip=fffff88002d9e2bd rsp=fffff88001f458b8 rbp=fffffa80063819e0
 r8=0000000000000000  r9=0000000000000000 r10=000000000000002c
r11=fffff88001f458e0 r12=fffffa8006381db8 r13=fffffa8005ca16f0
r14=fffffa8005cb3dc8 r15=0000000000000006
iopl=0         nv up ei pl zr na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00010246
hidusb+0x22bd:
fffff880`02d9e2bd 41395110        cmp     dword ptr [r9+10h],edx ds:002b:00000000`00000010=????????
Last set context:
rax=fffffa8006381b30 rbx=fffffa8005cb3c20 rcx=fffffa8006381db8
rdx=0000000000000000 rsi=fffffa8006381db8 rdi=fffffa8005cb3c20
rip=fffff88002d9e2bd rsp=fffff88001f458b8 rbp=fffffa80063819e0
 r8=0000000000000000  r9=0000000000000000 r10=000000000000002c
r11=fffff88001f458e0 r12=fffffa8006381db8 r13=fffffa8005ca16f0
r14=fffffa8005cb3dc8 r15=0000000000000006
iopl=0         nv up ei pl zr na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00010246
hidusb+0x22bd:
fffff880`02d9e2bd 41395110        cmp     dword ptr [r9+10h],edx ds:002b:00000000`00000010=????????
Resetting default scope

DEFAULT_BUCKET_ID:  WIN7_DRIVER_FAULT

BUGCHECK_STR:  0x7E

CURRENT_IRQL:  0

ANALYSIS_VERSION: 6.3.9600.16384 (debuggers(dbg).130821-1623) amd64fre

LAST_CONTROL_TRANSFER:  from fffff88002d9e3a7 to fffff88002d9e2bd

STACK_TEXT:  
fffff880`01f458b8 fffff880`02d9e3a7 : fffffa80`062622b0 fffff880`02db5b2b fffffa80`7062000a fffff880`01f45938 : hidusb+0x22bd
fffff880`01f458c0 fffff880`02d9df5b : fffffa80`05cb3dc8 fffffa80`06381b01 fffff880`01f45980 fffffa80`063819e0 : hidusb+0x23a7
fffff880`01f45910 fffff880`02dae555 : fffffa80`06382901 fffffa80`06382960 fffffa80`05cb3c20 fffffa80`06381b50 : hidusb+0x1f5b
fffff880`01f45980 fffff880`02daee81 : 00000000`00000000 00000000`00000000 00000000`00000000 fffffa80`06cfa010 : HIDCLASS+0x4555
fffff880`01f459e0 fffff880`02db341d : fffffa80`06cfa100 00000000`00000004 fffffa80`06381b50 fffffa80`06cfa010 : HIDCLASS+0x4e81
fffff880`01f45a30 fffff800`014915d1 : fffffa80`06cfa1bb fffffa80`00000000 fffffa80`06cfa010 fffffa80`05c1c1a0 : HIDCLASS!HidNotifyPresence+0x3991
fffff880`01f45ac0 fffff880`02d1d60a : 00000000`00000001 fffffa80`00000000 fffffa80`00000001 00000000`00000000 : nt!KeWaitForMultipleObjects+0x1751
fffff880`01f45bb0 fffff880`02d0473f : fffffa80`05c1c1a0 00000000`00000000 00000000`c0007000 fffffa80`06cfa170 : usbhub+0x1f60a
fffff880`01f45c40 fffff800`01785af3 : fffffa80`05c1c050 fffffa80`054dc660 fffffa80`06933a00 fffffa80`054dc660 : usbhub+0x673f
fffff880`01f45c80 fffff800`01497261 : fffff800`01633200 fffff800`01785a00 fffffa80`054dc600 0843f600`00000005 : nt!NtWaitForSingleObject+0x673
fffff880`01f45cb0 fffff800`0172bbae : 40fff25b`51e80000 fffffa80`054dc660 00000000`00000080 fffffa80`05486990 : nt!KeReleaseInStackQueuedSpinLock+0x2f1
fffff880`01f45d40 fffff800`0147e8c6 : fffff800`01608e80 fffffa80`054dc660 fffffa80`054dcb50 7c894860`245c894c : nt!PsCreateSystemThread+0x1da
fffff880`01f45d80 00000000`00000000 : fffff880`01f46000 fffff880`01f40000 fffff880`01f45850 00000000`00000000 : nt!KeInitializeSemaphore+0x24a


FOLLOWUP_IP: 
hidusb+22bd
fffff880`02d9e2bd 41395110        cmp     dword ptr [r9+10h],edx

SYMBOL_STACK_INDEX:  0

SYMBOL_NAME:  hidusb+22bd

FOLLOWUP_NAME:  MachineOwner

IMAGE_NAME:  hidusb.sys

STACK_COMMAND:  .cxr 0xfffff88001f44ed0 ; kb

BUCKET_ID:  WRONG_SYMBOLS

FAILURE_BUCKET_ID:  WRONG_SYMBOLS

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:wrong_symbols

FAILURE_ID_HASH:  {70b057e8-2462-896f-28e7-ac72d4d365f8}

Followup: MachineOwner
---------

1: kd> .cxr 0xfffff88001f44ed0;r
rax=fffffa8006381b30 rbx=fffffa8005cb3c20 rcx=fffffa8006381db8
rdx=0000000000000000 rsi=fffffa8006381db8 rdi=fffffa8005cb3c20
rip=fffff88002d9e2bd rsp=fffff88001f458b8 rbp=fffffa80063819e0
 r8=0000000000000000  r9=0000000000000000 r10=000000000000002c
r11=fffff88001f458e0 r12=fffffa8006381db8 r13=fffffa8005ca16f0
r14=fffffa8005cb3dc8 r15=0000000000000006
iopl=0         nv up ei pl zr na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00010246
hidusb+0x22bd:
fffff880`02d9e2bd 41395110        cmp     dword ptr [r9+10h],edx ds:002b:00000000`00000010=????????
Last set context:
rax=fffffa8006381b30 rbx=fffffa8005cb3c20 rcx=fffffa8006381db8
rdx=0000000000000000 rsi=fffffa8006381db8 rdi=fffffa8005cb3c20
rip=fffff88002d9e2bd rsp=fffff88001f458b8 rbp=fffffa80063819e0
 r8=0000000000000000  r9=0000000000000000 r10=000000000000002c
r11=fffff88001f458e0 r12=fffffa8006381db8 r13=fffffa8005ca16f0
r14=fffffa8005cb3dc8 r15=0000000000000006
iopl=0         nv up ei pl zr na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00010246
hidusb+0x22bd:
fffff880`02d9e2bd 41395110        cmp     dword ptr [r9+10h],edx ds:002b:00000000`00000010=????????
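
The register dump above is enough to explain the access violation: the faulting instruction reads `dword ptr [r9+10h]` while r9 is zero, so the effective address is 0x10, which matches Parameter[1] of the exception record. That pattern is consistent with a NULL structure pointer whose field at offset 0x10 was read, although the dump alone does not prove which structure it was. Below is a minimal Python sketch of that arithmetic. The function names and the parsing regex are illustrative assumptions (they are not part of any debugger API); the register text is copied from the `.cxr` output in this report.

```python
import re

# Register lines copied verbatim from the .cxr output in this report.
CONTEXT_DUMP = """
rax=fffffa8006381b30 rbx=fffffa8005cb3c20 rcx=fffffa8006381db8
rdx=0000000000000000 rsi=fffffa8006381db8 rdi=fffffa8005cb3c20
rip=fffff88002d9e2bd rsp=fffff88001f458b8 rbp=fffffa80063819e0
 r8=0000000000000000  r9=0000000000000000 r10=000000000000002c
r11=fffff88001f458e0 r12=fffffa8006381db8 r13=fffffa8005ca16f0
r14=fffffa8005cb3dc8 r15=0000000000000006
"""


def parse_registers(dump):
    """Return {register_name: value} for every name=hex pair in the dump."""
    pairs = re.findall(r"\b(r\w+)=([0-9a-f]{16})\b", dump)
    return {name: int(value, 16) for name, value in pairs}


def effective_address(regs, base, displacement):
    """Compute base register + displacement, wrapped to 64 bits."""
    return (regs[base] + displacement) & 0xFFFFFFFFFFFFFFFF


regs = parse_registers(CONTEXT_DUMP)

# The faulting instruction is: cmp dword ptr [r9+10h],edx
fault_address = effective_address(regs, "r9", 0x10)
print(hex(fault_address))
```

The computed address can be compared by eye with the "Attempt to read from address" line in the exception record.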

Comment 3 Yvugenfi@redhat.com 2014-03-23 12:31:04 UTC
Looking at the stack in the description, I can tell that the crash is in the HID driver (used here for the USB tablet).

Comment 4 Yvugenfi@redhat.com 2014-03-23 15:58:23 UTC
Do you see the same problem without the hv_spinlocks=0x1fff,hv_relaxed flags?

Comment 5 Mike Cao 2014-03-24 01:52:52 UTC
Please retest it without -usb -device usb-tablet as well.

Comment 6 Min Deng 2014-03-24 06:10:18 UTC
Regarding comment 4 and comment 5: the job still got a BSOD.

Comment 7 Yvugenfi@redhat.com 2014-03-24 10:29:41 UTC
(In reply to dengmin from comment #6)
> refer to comment4 and comment5,job still got BSOD.

Can you provide the crash dump please?

Thanks,
Yan.

Comment 8 Min Deng 2014-03-25 03:12:25 UTC
(In reply to Yan Vugenfirer from comment #7)
> (In reply to dengmin from comment #6)
> > refer to comment4 and comment5,job still got BSOD.
> 
> Can you provide the crash dump please?
> 
> Thanks,
> Yan.

I've uploaded it; please see comment 1.

Comment 9 Min Deng 2014-03-25 08:26:03 UTC
  Sorry, my previous steps were wrong. Here are the corrected results for comment 4 and comment 5.
  For comment 4: still got a BSOD; the dump is in the same location (see comment 1).
  For comment 5: the job passes.
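
The two retest configurations can be derived mechanically from the original command line. The Python sketch below is one way to do that. The argument list is abbreviated from the CLI in the description, the helper names are hypothetical, and it assumes that the comment 4 run dropped only `hv_spinlocks` and `hv_relaxed` (the report does not say whether the other hv_* flags were also removed).

```python
# Abbreviated from the command line in the description; only the options
# relevant to comments 4 and 5 are kept.
BASE_ARGS = [
    "/usr/libexec/qemu-kvm", "-M", "pc", "-m", "6G", "-smp", "4",
    "-cpu", "Nehalem,+x2apic,hv_spinlocks=0x1fff,hv_relaxed,hv_vapic,hv_time",
    "-usb", "-device", "usb-tablet",
    "-device", "virtio-balloon-pci,id=balloon0",
]


def without_cpu_flags(args, flags_to_drop):
    """Return a copy of args with the named -cpu sub-options removed."""
    out = list(args)
    i = out.index("-cpu") + 1
    kept = [f for f in out[i].split(",") if f.split("=")[0] not in flags_to_drop]
    out[i] = ",".join(kept)
    return out


def without_usb_tablet(args):
    """Return a copy of args with '-usb' and '-device usb-tablet' removed."""
    out = []
    i = 0
    while i < len(args):
        if args[i] == "-usb":
            i += 1
            continue
        if args[i] == "-device" and i + 1 < len(args) and args[i + 1] == "usb-tablet":
            i += 2
            continue
        out.append(args[i])
        i += 1
    return out


# Outcomes as reported in this comment.
variants = {
    "comment 4 (no hv_spinlocks/hv_relaxed)": (
        without_cpu_flags(BASE_ARGS, {"hv_spinlocks", "hv_relaxed"}), "BSOD"),
    "comment 5 (no usb-tablet)": (without_usb_tablet(BASE_ARGS), "pass"),
}

for name, (args, outcome) in variants.items():
    print(f"{name}: {outcome}")
    print("  " + " ".join(args))
```

Since only the run without the USB tablet passes, these results point at the USB HID path rather than the Hyper-V enlightenment flags.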

Comment 10 Yvugenfi@redhat.com 2014-03-25 09:35:52 UTC
The second crash:

Loading Dump File [E:\temp\Yan\dumps\BZ1079147\MEMORYnew.DMP]
Kernel Summary Dump File: Only kernel address space is available

Symbol search path is: C:\Users\yan\symbols\local;SRV*C:\Users\yan\symbols\websymbols*http://msdl.microsoft.com/download/symbols
Executable search path is: 
Windows 7 Kernel Version 7601 (Service Pack 1) UP Free x64
Product: Server, suite: TerminalServer DataCenter SingleUserTS
Built by: 7601.18205.amd64fre.win7sp1_gdr.130708-1532
Machine Name:
Kernel base = 0xfffff800`01468000 PsLoadedModuleList = 0xfffff800`016ab6d0
Debug session time: Mon Mar 24 23:49:49.375 2014 (UTC + 2:00)
System Uptime: 0 days 0:01:45.265
Loading Kernel Symbols
...............................................................
...........................................................
Loading User Symbols
PEB is paged out (Peb.Ldr = 000007ff`fffdf018).  Type ".hh dbgerr001" for details
Loading unloaded module list
.....
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 51, {1, fffff8a000023010, 783000, 374}

Page 1adf7f not present in the dump file. Type ".hh dbgerr004" for details
Probably caused by : ntkrnlmp.exe ( nt! ?? ::NNGAKEGL::`string'+9dba )

Followup: MachineOwner
---------

kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

REGISTRY_ERROR (51)
Something has gone badly wrong with the registry.  If a kernel debugger
is available, get a stack trace. It can also indicate that the registry got
an I/O error while trying to read one of its files, so it can be caused by
hardware problems or filesystem corruption.
It may occur due to a failure in a refresh operation, which is used only
in by the security system, and then only when resource limits are encountered.
Arguments:
Arg1: 0000000000000001, (reserved)
Arg2: fffff8a000023010, (reserved)
Arg3: 0000000000783000, depends on where Windows bugchecked, may be pointer to hive
Arg4: 0000000000000374, depends on where Windows bugchecked, may be return code of
	HvCheckHive if the hive is corrupt.

Debugging Details:
------------------

Page 1adf7f not present in the dump file. Type ".hh dbgerr004" for details

DEFAULT_BUCKET_ID:  WIN7_DRIVER_FAULT

BUGCHECK_STR:  0x51

PROCESS_NAME:  services.exe

CURRENT_IRQL:  0

LAST_CONTROL_TRANSFER:  from fffff8000180eda8 to fffff800014ddb80

STACK_TEXT:  
fffff880`030f4438 fffff800`0180eda8 : 00000000`00000051 00000000`00000001 fffff8a0`00023010 00000000`00783000 : nt!KeBugCheckEx
fffff880`030f4440 fffff800`01779115 : 00000000`00060d95 00000000`00003c17 00000000`00003787 fffff8a0`00000004 : nt! ?? ::NNGAKEGL::`string'+0x9dba
fffff880`030f44a0 fffff800`01778eec : fffff8a0`00023010 fffff8a0`00023010 fffff8a0`00033a00 00000000`004d0074 : nt!HvMarkDirty+0x176
fffff880`030f4500 fffff800`0173a29b : 00000000`00000001 fffff8a0`00897230 fffff8a0`006ae444 fffff8a0`00023010 : nt!HvMarkCellDirty+0x150
fffff880`030f4550 fffff800`0173a074 : fffff8a0`006ae444 00000000`ffffffff fffff8a0`006ae444 fffff8a0`00023010 : nt!CmpMarkKeyValuesDirty+0x14b
fffff880`030f45f0 fffff800`0173977a : fffff8a0`00023010 00000000`ffffffff fffff8a0`006ae444 fffff8a0`00023010 : nt!CmpFreeKeyValues+0x24
fffff880`030f4620 fffff800`017394a8 : fffff8a0`00023010 00000000`003548a0 fffff8a0`006ae444 fffff8a0`006c1440 : nt!CmpSyncKeyValues+0x7a
fffff880`030f4700 fffff800`0173b76e : fffff8a0`09982000 00000000`0039f168 fffffa80`00000000 00000000`00000000 : nt!CmpCopySyncTree2+0x2a8
fffff880`030f47b0 fffff800`0173b687 : 00000000`00000000 00000000`00000002 fffff8a0`098a9710 fffff8a0`098fa1b0 : nt!CmpCopySyncTree+0x6e
fffff880`030f4800 fffff800`0173b256 : 00000000`00000000 00000000`00000000 00000000`00000001 00000000`00000000 : nt!CmpSaveBootControlSet+0x307
fffff880`030f49e0 fffff800`014dce13 : fffffa80`06414b50 00000000`00000000 fffff880`030f4ab0 00000000`00000001 : nt!NtInitializeRegistry+0xc6
fffff880`030f4a30 fffff800`014d93d0 : fffff800`0173b1ff 00000000`00000220 00000000`000ef848 00000000`000efb78 : nt!KiSystemServiceCopyEnd+0x13
fffff880`030f4bc8 fffff800`0173b1ff : 00000000`00000220 00000000`000ef848 00000000`000efb78 00000000`000a001f : nt!KiServiceLinkage
fffff880`030f4bd0 fffff800`014dce13 : fffffa80`06414b50 fffff880`030f4ca0 fffff880`030f4ca0 00000000`00000002 : nt!NtInitializeRegistry+0x6f
fffff880`030f4c20 00000000`772d205a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13
00000000`000efaf8 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x772d205a


STACK_COMMAND:  kb

FOLLOWUP_IP: 
nt! ?? ::NNGAKEGL::`string'+9dba
fffff800`0180eda8 cc              int     3

SYMBOL_STACK_INDEX:  1

SYMBOL_NAME:  nt! ?? ::NNGAKEGL::`string'+9dba

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: nt

IMAGE_NAME:  ntkrnlmp.exe

DEBUG_FLR_IMAGE_TIMESTAMP:  51db806a

FAILURE_BUCKET_ID:  X64_0x51_nt!_??_::NNGAKEGL::_string_+9dba

BUCKET_ID:  X64_0x51_nt!_??_::NNGAKEGL::_string_+9dba

Followup: MachineOwner
---------
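
Note that this second dump is a different bugcheck (0x51, REGISTRY_ERROR) from the 0x7E in the description. When comparing several dumps it can help to pull the code and arguments out of the debugger's one-line summary. The following is a minimal Python sketch; the function name and regex are illustrative assumptions, and the input string is the `BugCheck` line copied from the output above.

```python
import re

# Summary line copied from the debugger output in this comment.
BUGCHECK_LINE = "BugCheck 51, {1, fffff8a000023010, 783000, 374}"


def parse_bugcheck(line):
    """Split a 'BugCheck XX, {a, b, c, d}' line into (code, [args]) as ints."""
    match = re.match(r"BugCheck ([0-9A-Fa-f]+), \{([^}]*)\}", line)
    if match is None:
        raise ValueError("not a BugCheck summary line: %r" % line)
    code = int(match.group(1), 16)
    args = [int(field.strip(), 16) for field in match.group(2).split(",")]
    return code, args


code, args = parse_bugcheck(BUGCHECK_LINE)
print("bugcheck code: 0x%X" % code)
for index, value in enumerate(args, start=1):
    print("Arg%d: %016x" % (index, value))
```

The printed arguments can be checked against the Arg1 to Arg4 lines that `!analyze -v` reports above.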

Comment 13 Min Deng 2014-09-02 05:42:34 UTC
Verified the bug on build 90.
The job passes on win2008R2 for both the balloon and rng drivers.
build info,
virtio-win-prewhql-0.1-90
kernel-3.10.0-145.el7.x86_64
qemu-kvm-rhev-2.1.0-2.el7.x86_64

CLI,
/usr/libexec/qemu-kvm -M pc -m 6G -smp 4 -cpu Nehalem,+x2apic,hv_spinlocks=0x1fff,hv_relaxed,hv_vapic,hv_time -usb -device usb-tablet -drive file=090BLN2008R2NVC,format=raw,if=none,id=drive-ide0-0-0,werror=stop,rerror=stop,cache=none -device ide-drive,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,id=hostnet0,script=/etc/qemu-ifup -device e1000,netdev=hostnet0,mac=00:42:36:28:34:22,id=net0 -uuid 78460cce-a0ba-427c-a23f-1b04f3ec0c05 -rtc-td-hack -no-kvm-pit-reinjection -chardev socket,id=a,path=/tmp/monitor-win2k8R2-serial,server,nowait -mon chardev=a,mode=readline -name win2k8-R2-balloon -device virtio-balloon-pci,id=balloon0 -vnc :1 -vga cirrus -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -monitor stdio
/usr/libexec/qemu-kvm -M pc -m 6G -smp 4 -cpu Nehalem,+x2apic,hv_spinlocks=0x1fff,hv_relaxed,hv_vapic,hv_time -usb -device usb-tablet -drive file=090RNG2008R221A,format=raw,if=none,id=drive-ide0-0-0,werror=stop,rerror=stop,cache=none -device ide-drive,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,id=hostnet0,script=/etc/qemu-ifup -device e1000,netdev=hostnet0,mac=00:42:36:28:34:22,id=net0 -uuid 1cedf92f-cdd9-4b01-8487-35fb58dcc82e -rtc-td-hack -no-kvm-pit-reinjection -chardev socket,id=a,path=/tmp/monitor-win2k8R2-serial,server,nowait -mon chardev=a,mode=readline -name win2k8-R2-rng -vnc :2 -vga cirrus -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -monitor stdio -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0

Thanks
Min

Comment 14 Mike Cao 2014-09-02 05:46:14 UTC
Moving status to VERIFIED according to comment #13.

Comment 16 errata-xmlrpc 2015-03-05 08:05:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0349.html