Bug 1037949

Summary: [virtio-win][viostor]guest bsod(9F) when do s4 while guest running iozone
Product: Red Hat Enterprise Linux 7
Reporter: lijin <lijin>
Component: virtio-win
Assignee: Vadim Rozenfeld <vrozenfe>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 7.0
CC: ghammer, hhuang, juzhang, knoel, lijin, michen, rbalakri, virt-bugs, virt-maint
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: virtio-win-prewhql-89
Doc Type: Bug Fix
Doc Text: NO_DOCS
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-11-24 08:39:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description lijin 2013-12-04 06:24:36 UTC
Description of problem:
Performing S4 (hibernate) while iozone is running in a win8-64 guest takes a long time, and then the guest hits a BSOD.

Version-Release number of selected component (if applicable):
    virtio-win-1.6.7-2.el7.noarch
    kernel-3.10.0-53.el7.x86_64
    qemu-kvm-rhev-1.5.3-19.el7.x86_64
    seabios-1.7.2.2-4.el7.x86_64

How reproducible:
2/10
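
As a rough aid for sizing a retest, the arithmetic below treats the 2/10 figure as an independent per-attempt failure probability of 0.2. That independence is an assumption, not something this report establishes, and the function names are illustrative only.

```python
import math


def chance_of_clean_runs(failure_rate: float, attempts: int) -> float:
    """Probability that `attempts` independent tries all pass by chance."""
    return (1.0 - failure_rate) ** attempts


def attempts_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest number of clean tries that would be this unlikely by chance."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    rate = 2 / 10  # assumption: 2 failures in 10 attempts is the true rate
    print(f"5 clean runs by chance: {chance_of_clean_runs(rate, 5):.2f}")
    print(f"clean runs for 95% confidence: {attempts_needed(rate)}")
```

Under this simplified model, a short series of clean runs is weak evidence on its own, which is worth keeping in mind when judging an intermittent failure like this one.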

Steps to Reproduce:
1. Boot the win8-64 guest with:
/usr/libexec/qemu-kvm \
-M pc -m 2G -smp 2,cores=2 \
-drive file=win8-64.qcow3,format=qcow2,media=disk,if=none,cache=none,id=drive-blk,serial=blk1 \
-device virtio-blk-pci,physical_block_size=512,logical_block_size=512,drive=drive-blk,id=ide-blk-pci1,bootindex=1 \
-rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection \
-name win8-64-nic \
-global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 \
-usb -device usb-tablet \
-monitor stdio \
-vnc :1 -vga cirrus \
-fda /usr/share/virtio-win/virtio-win_amd64.vfd \
-netdev tap,id=hostnet1,script=/etc/qemu-ifup,downscript=no -device e1000,netdev=hostnet1,id=net1,mac=00:52:12:16:54:48,bus=pci.0 \
-cdrom /usr/share/virtio-win/virtio-win.iso \
-drive file=disk.raw,format=raw,media=disk,if=none,id=drive2,cache=none,serial=blk2 -device virtio-blk-pci,physical_block_size=4096,logical_block_size=512,drive=drive2,id=ide-blk-pci2
2. Run iozone in the guest, e.g.:
  iozone -az -b c:\aaaa -g 4g -y 32k -i 0 -i 1
3. Perform S4 on the guest.
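
Steps 2 and 3 could be scripted inside the guest along the following lines. This is a minimal sketch, not the procedure the reporter used: the report does not say how S4 was triggered, so the use of the stock Windows `shutdown /h` command is an assumption, as are the `run_repro` name and the warm-up delay.

```python
import subprocess
import time

# iozone arguments copied from step 2 of the report.
IOZONE_CMD = ["iozone", "-az", "-b", r"c:\aaaa", "-g", "4g",
              "-y", "32k", "-i", "0", "-i", "1"]

# Assumption: S4 is requested with the stock Windows `shutdown /h` command;
# the report does not say how hibernation was triggered.
HIBERNATE_CMD = ["shutdown", "/h"]


def run_repro(warmup_seconds: float = 60, runner=subprocess):
    """Start iozone, let it generate I/O for a while, then request S4."""
    io_load = runner.Popen(IOZONE_CMD)   # keeps running in the background
    time.sleep(warmup_seconds)           # hypothetical delay, not from the report
    runner.run(HIBERNATE_CMD, check=True)
    return io_load


if __name__ == "__main__":
    run_repro()
```

The `runner` parameter exists only so the command sequence can be checked without actually hibernating a machine.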

Actual results:
After a long wait, the guest hits a BSOD with bugcheck code 9F.

Expected results:
The guest completes S4 correctly, with no BSOD.

Additional info:
This issue occurred only twice; I could not reproduce it after that.
I will upload the dump analysis later.

Comment 1 lijin 2013-12-04 06:29:13 UTC
WinDbg analysis of the first dump:
0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

DRIVER_POWER_STATE_FAILURE (9f)
A driver has failed to complete a power IRP within a specific time (usually 10 minutes).
Arguments:
Arg1: 0000000000000003, A device object has been blocking an Irp for too long a time
Arg2: fffffa80024a3060, Physical Device Object of the stack
Arg3: fffff801ecb5eb30, nt!TRIAGE_9F_POWER on Win7, otherwise the Functional Device Object of the stack
Arg4: fffffa8003bbbbc0, The blocked IRP

Debugging Details:
------------------


DRVPOWERSTATE_SUBCODE:  3

IMAGE_NAME:  pci.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  5010ab1f

MODULE_NAME: pci

FAULTING_MODULE: fffff88001368000 pci

DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT

BUGCHECK_STR:  0x9F

PROCESS_NAME:  System

CURRENT_IRQL:  2

TAG_NOT_DEFINED_c000000f:  FFFFF801ECB5EFB0

STACK_TEXT:
fffff801`ecb5eaf8 fffff801`ecda941e : 00000000`0000009f 00000000`00000003 fffffa80`024a3060 fffff801`ecb5eb30 : nt!KeBugCheckEx
fffff801`ecb5eb00 fffff801`ecda9451 : fffffa80`02b76860 fffff801`eceb0c20 00000000`00000000 00000000`000000f3 : nt!PopIrpWatchdogBugcheck+0xe2
fffff801`ecb5eb60 fffff801`eccc08b4 : fffffa80`02b76898 00000000`00000000 fffff801`ecb5ee68 00000000`00000000 : nt!PopIrpWatchdog+0x32
fffff801`ecb5ebb0 fffff801`eccc0ed5 : fffff801`ecf11f00 fffff801`ecc8a4f8 fffff801`ecb5ee00 00000000`000000f3 : nt!KiProcessExpiredTimerList+0x214
fffff801`ecb5ecf0 fffff801`eccc0d88 : fffff801`ecf0f180 fffff801`ecf11f80 00000000`00000007 00000000`00017af3 : nt!KiExpireTimerTable+0xa9
fffff801`ecb5ed90 fffff801`eccbae76 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiTimerExpiration+0xc8
fffff801`ecb5ee40 fffff801`eccba745 : 00000000`00000000 fffff801`ecf0f180 fffff880`02f64a10 fffff801`ed3ad580 : nt!KiRetireDpcList+0x1f6
fffff801`ecb5efb0 fffff801`eccba549 : 0000056f`591493c8 fffff801`ecd554c1 00000000`01000010 00000000`00000282 : nt!KxRetireDpcList+0x5
fffff880`02f64950 fffff801`ecd554d5 : 00000000`017a3be9 fffff801`ecc8d523 00000005`37cf5e24 00000001`dc0060f6 : nt!KiDispatchInterruptContinue
fffff880`02f64980 fffff801`ecc8d523 : 00000005`37cf5e24 00000001`dc0060f6 fffff801`ed3ad580 fffff6fc`40019668 : nt!KiDpcInterruptBypass+0x25
fffff880`02f64990 fffff801`ed38cee5 : fffff801`ed37de51 fffff6fb`7e2000c8 fffff6fc`40019668 00000000`00000000 : nt!KiInterruptDispatchLBControl+0x243
fffff880`02f64b28 fffff801`ed37de51 : fffff6fb`7e2000c8 fffff6fc`40019668 00000000`00000000 fffff8a0`0028c140 : hal!HalpPmTimerQueryCounterIoPort+0x5
fffff880`02f64b30 fffff880`03287011 : fffffa80`01961098 00000000`00369e99 fffffa80`01961098 fffffa80`01961010 : hal!KeQueryPerformanceCounter+0x71
fffff880`02f64b60 fffff880`03286f8a : 00000000`00000000 00000000`00369e99 fffffa80`01961098 fffffa80`01961010 : dxgkrnl!BLTQUEUE::IssueCommand+0x21
fffff880`02f64ba0 fffff880`03286f09 : 00000000`00000000 00000000`00000011 00000000`00000000 00000000`00000002 : dxgkrnl!BLTQUEUE::Flush+0x6a
fffff880`02f64bd0 fffff880`03285b1a : fffffa80`035c4920 fffffa80`035c4920 00000000`00000011 00000000`00000000 : dxgkrnl!DXGDODPRESENT::Flush+0x25
fffff880`02f64c00 fffff880`032cd9d9 : fffffa80`035c48d0 fffff880`0c83ea00 00000000`00000000 00000000`00000000 : dxgkrnl!DXGADAPTER::AcquireCoreResourceExclusive+0xe2
fffff880`02f64c50 fffff880`0328b31c : 00000000`00000001 fffffa80`035c48d0 fffff880`032cd500 00000000`00000000 : dxgkrnl!DXGADAPTER::AcquireLocksForStop+0xf5
fffff880`02f64c80 fffff880`0328b23e : 00000000`00000006 fffff880`032cd500 fffff880`0c83ea02 fffff880`032cd500 : dxgkrnl!DXGADAPTER::AcquireCoreSync+0xbc
fffff880`02f64cc0 fffff880`032cd6dd : fffffa80`0c476190 fffffa80`00000000 fffff880`01a7b800 00000000`00000000 : dxgkrnl!DxgkAcquireAdapterCoreSync+0x22
fffff880`02f64cf0 fffff801`ecc3b521 : fffffa80`0c476a10 fffffa80`0c7fe540 00000000`00000080 fffffa80`0c476040 : dxgkrnl!DpiPowerArbiterThread+0x1a9
fffff880`02f64d50 fffff801`ecc79dd6 : fffff801`ecf0f180 fffffa80`0c7fe540 fffffa80`01892b00 fffffa80`018be740 : nt!PspSystemThreadStartup+0x59
fffff880`02f64da0 00000000`00000000 : fffff880`02f65000 fffff880`02f5f000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16


STACK_COMMAND:  kb

FOLLOWUP_NAME:  MachineOwner

FAILURE_BUCKET_ID:  0x9F_3_QxlWddm_IMAGE_pci.sys

BUCKET_ID:  0x9F_3_QxlWddm_IMAGE_pci.sys

Followup: MachineOwner
---------

Comment 2 lijin 2013-12-04 06:31:12 UTC
WinDbg analysis of the second dump:
0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

DRIVER_POWER_STATE_FAILURE (9f)
A driver has failed to complete a power IRP within a specific time (usually 10 minutes).
Arguments:
Arg1: 0000000000000003, A device object has been blocking an Irp for too long a time
Arg2: fffffa80024b1060, Physical Device Object of the stack
Arg3: fffff8039cc78920, nt!TRIAGE_9F_POWER on Win7, otherwise the Functional Device Object of the stack
Arg4: fffffa8003b8ce10, The blocked IRP

Debugging Details:
------------------


DRVPOWERSTATE_SUBCODE:  3

IMAGE_NAME:  pci.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  5010ab1f

MODULE_NAME: pci

FAULTING_MODULE: fffff880012be000 pci

DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT

BUGCHECK_STR:  0x9F

PROCESS_NAME:  System

CURRENT_IRQL:  2

TAG_NOT_DEFINED_c000000f:  FFFFF8039CC7EFB0

STACK_TEXT:
fffff803`9cc788e8 fffff803`9cf9e41e : 00000000`0000009f 00000000`00000003 fffffa80`024b1060 fffff803`9cc78920 : nt!KeBugCheckEx
fffff803`9cc788f0 fffff803`9cf9e451 : fffffa80`040b9230 fffff803`9cc78958 00000000`00000000 fffffa80`02dce060 : nt!PopIrpWatchdogBugcheck+0xe2
fffff803`9cc78950 fffff803`9ceb58b4 : fffffa80`040b9268 fffff803`00001000 fffff803`9cc78c58 00000000`00000f45 : nt!PopIrpWatchdog+0x32
fffff803`9cc789a0 fffff803`9ceb5ed5 : fffffa80`02d02010 fffff803`9ce7f4f8 fffff803`9cc78bf0 00000000`000000ac : nt!KiProcessExpiredTimerList+0x214
fffff803`9cc78ae0 fffff803`9ceb5d88 : fffff803`9d104180 fffff803`9d106f80 00000000`00000002 00000000`0000ce94 : nt!KiExpireTimerTable+0xa9
fffff803`9cc78b80 fffff803`9ceafe76 : fffffa80`02d02010 00000000`ffffffff fffffa80`02d022e8 00000000`00000000 : nt!KiTimerExpiration+0xc8
fffff803`9cc78c30 fffff803`9ceb457a : fffff803`9d104180 fffff803`9d104180 00000000`00183de0 fffff803`9d15e880 : nt!KiRetireDpcList+0x1f6
fffff803`9cc78da0 00000000`00000000 : fffff803`9cc79000 fffff803`9cc73000 00000000`00000000 00000000`00000000 : nt!KiIdleLoop+0x5a


STACK_COMMAND:  kb

FOLLOWUP_NAME:  MachineOwner

FAILURE_BUCKET_ID:  0x9F_3_E1G6032E_IMAGE_pci.sys

BUCKET_ID:  0x9F_3_E1G6032E_IMAGE_pci.sys

Followup: MachineOwner
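
Both dumps report the same bugcheck and image but differ in the token embedded in the bucket ID, which appears to name the blocking driver. A minimal sketch for pulling that token out follows; the pattern is inferred only from the two bucket IDs shown in this report and is not a documented WinDbg format, and `split_bucket_id` is an illustrative name.

```python
import re

# Pattern inferred from the two bucket IDs in these dumps only; this is an
# assumption about the layout, not a documented WinDbg format.
BUCKET_RE = re.compile(
    r"^0x(?P<code>[0-9A-Fa-f]+)_(?P<subcode>\d+)_(?P<driver>.+)_IMAGE_(?P<image>.+)$"
)


def split_bucket_id(bucket_id: str) -> dict:
    """Split a bucket ID into bugcheck code, subcode, driver token and image."""
    match = BUCKET_RE.match(bucket_id.strip())
    if match is None:
        raise ValueError(f"unrecognized bucket ID: {bucket_id!r}")
    return match.groupdict()


if __name__ == "__main__":
    for bucket in ("0x9F_3_QxlWddm_IMAGE_pci.sys",
                   "0x9F_3_E1G6032E_IMAGE_pci.sys"):
        parts = split_bucket_id(bucket)
        print(parts["driver"], parts["image"])
```

Comparing the driver token across dumps is a quick way to see whether repeated 9F failures point at the same device stack or at different ones.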

Comment 6 Vadim Rozenfeld 2014-07-04 10:25:14 UTC
(In reply to lijin from comment #3)
> the dump files located in
> \\smamit.eng.lab.tlv.redhat.com\win-team\Public\QE\bug1037949
> 
> It seems this bug is similar to
> https://bugzilla.redhat.com/show_bug.cgi?id=869116. Was the seabios patch in
> rhel6 merged into rhel7?

Yes, there were two different drivers failing: qxl and nic. I don't know whether the rhel6 BIOS changes were merged into rhel7, but we can try retesting this issue with the BIOS from rhel6 or upstream.

Thanks,
Vadim.

Comment 7 Ronen Hod 2014-08-07 08:06:34 UTC
QE, see comment 6

Comment 8 Mike Cao 2014-08-13 03:06:33 UTC
lijin, please retest the following scenarios with the latest virtio-win-prewhql build:
1. Retest with the latest seabios on RHEL7.
2. Retest with upstream seabios:
git clone git://git.seabios.org/seabios.git seabios

Comment 9 lijin 2014-08-14 05:20:56 UTC
(In reply to Mike Cao from comment #8)
> lijin, please retest the following scenarios with the latest virtio-win-prewhql build:
> 1. Retest with the latest seabios on RHEL7.
> 2. Retest with upstream seabios:
> git clone git://git.seabios.org/seabios.git seabios

Retested these scenarios with build 89 five times; the guest can do S4 and resume correctly.

package info:
qemu-kvm-rhev-1.5.3-60.el7ev_0.5.x86_64
kernel-3.10.0-133.el7.x86_64
virtio-win-prewhql-89

Comment 10 Vadim Rozenfeld 2014-11-03 08:22:53 UTC
Based on c#9, closing this bug as fixed.

Comment 11 Mike Cao 2014-11-03 09:42:59 UTC
Build 89 or a later build is not shipped via RHN; please keep it open.

Comment 12 Mike Cao 2014-11-03 09:43:44 UTC
Moving to Verified according to comment #9.

Comment 16 errata-xmlrpc 2015-11-24 08:39:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2513.html