Bug 885952

Summary: [whql][scsi][wlk][9F]BSOD occurs when running Sleep Stress With IO job
Product: Red Hat Enterprise Linux 6 Reporter: Mike Cao <bcao>
Component: virtio-win Assignee: Vadim Rozenfeld <vrozenfe>
Status: CLOSED NEXTRELEASE QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: high    
Version: 6.4 CC: acathrow, bcao, bsarathy, ddumas, jguo, juzhang, lijin, michen, pbonzini, rhod, virt-bugs
Target Milestone: rc Keywords: Reopened, TestBlocker
Target Release: ---   
Hardware: x86_64   
OS: Windows   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-08-04 08:44:50 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 896495    
Attachments:
Description Flags
screendump none

Description Mike Cao 2012-12-11 06:44:32 UTC
Description of problem:


Version-Release number of selected component (if applicable):
2.6.32-338.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.331.el6.x86_64
Windows 7 32-bit; all Win7 and Win2k8 guests seem to be involved. Did not reproduce on Win2k3.

How reproducible:
2/2

Steps to Reproduce:
1. Start a VM with virtio-scsi:
/usr/libexec/qemu-kvm -m 2G -smp 2 -cpu cpu64-rhel6,+x2apic,family=0xf -usb -device usb-tablet -device virtio-scsi-pci,id=scsi0 -drive file=win7-32-SCSI-49.raw,format=raw,if=none,serial=mike1,id=drive-virtio0,cache=none,werror=stop,rerror=stop -device scsi-hd,bus=scsi0.0,serial=devicemike1,drive=drive-virtio0,id=virtio-blk-pci0,bootindex=1 -netdev tap,sndbuf=0,id=hostnet0,script=/etc/qemu-ifup,downscript=no -device e1000,netdev=hostnet0,mac=00:52:20:17:21:c6 -uuid 7e6d2c23-f899-4176-a121-f72fb87d65e0 -no-kvm-pit-reinjection -chardev socket,id=111a,path=/tmp/monitor-win7-64-scsi,server,nowait -mon chardev=111a,mode=readline -vnc :1 -rtc base=localtime,clock=host,driftfix=slew -chardev socket,id=seabios_debug,path=/tmp/monitor-seabios,server,nowait -device isa-debugcon,iobase=0x402,chardev=seabios_debug -device virtio-scsi-pci,id=scsi1 -drive file=disk1.raw,if=none,serial=mike2,format=raw,cache=none,id=drive-virtio-data1,werror=stop,rerror=stop -device scsi-hd,serial=devicemike2,bus=scsi1.0,id=virtio-data1,drive=drive-virtio-data1 -device virtio-scsi-pci,id=scsi2 -drive file=disk2.raw,if=none,serial=mike3,cache=none,format=raw,id=drive-virtio-data2,werror=stop,rerror=stop -device scsi-hd,serial=devicemike3,bus=scsi2.0,id=virtio-data2,drive=drive-virtio-data2 -device virtio-scsi-pci,id=scsi3 -drive file=disk3.raw,cache=none,if=none,serial=mike4,format=raw,id=drive-virtio-data3,werror=stop,rerror=stop -device scsi-hd,serial=devicemike4,bus=scsi3.0,id=virtio-data3,drive=drive-virtio-data3 -device virtio-scsi-pci,id=scsi4 -drive file=disk4.raw,if=none,serial=mike5,format=raw,cache=none,id=drive-virtio-data4,werror=stop,rerror=stop -device scsi-hd,serial=devicemike5,bus=scsi4.0,id=virtio-data4,drive=drive-virtio-data4 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -M rhel6.4.0
2. Run the Sleep Stress with IO job.
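
The four data disks in step 1 follow one repeated pattern: a virtio-scsi-pci controller, a -drive, and a scsi-hd device. A minimal Python sketch that regenerates those argument groups is given here. The function name data_disk_args and its parameters are hypothetical, and the diskN.<format> file naming for non-raw images is an assumption; only the option strings are taken from the command line above. Option order inside each -drive is normalized to that of disk1.

```python
def data_disk_args(index, cache="none", fmt="raw"):
    """Build the qemu-kvm argument group for data disk `index` (1-based).

    Hypothetical helper: mirrors the disk1..disk4 pattern from the report.
    """
    drive_id = f"drive-virtio-data{index}"
    return [
        # One virtio-scsi controller per data disk, as in the report.
        "-device", f"virtio-scsi-pci,id=scsi{index}",
        "-drive",
        f"file=disk{index}.{fmt},if=none,serial=mike{index + 1},format={fmt},"
        f"cache={cache},id={drive_id},werror=stop,rerror=stop",
        "-device",
        f"scsi-hd,serial=devicemike{index + 1},bus=scsi{index}.0,"
        f"id=virtio-data{index},drive={drive_id}",
    ]


# Argument groups for disk1.raw .. disk4.raw with cache=none.
args = [arg for i in range(1, 5) for arg in data_disk_args(i)]
print(" ".join(args))
```

Passing cache="writeback" or fmt="qcow2" to the helper yields the variants tried later in this bug.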

  
Actual results:
Guest BSOD


Expected results:
No BSOD occurs


Additional info:
It is a test blocker for WHQL testing.

Comment 1 Mike Cao 2012-12-11 06:52:06 UTC
0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

DRIVER_POWER_STATE_FAILURE (9f)
A driver has failed to complete a power IRP within a specific time (usually 10 minutes).
Arguments:
Arg1: 00000003, A device object has been blocking an Irp for too long a time
Arg2: 871f72e8, Physical Device Object of the stack
Arg3: 82765ae0, nt!TRIAGE_9F_POWER on Win7, otherwise the Functional Device Object of the stack
Arg4: 8c2a2f00, The blocked IRP

Debugging Details:
------------------


DRVPOWERSTATE_SUBCODE:  3

IMAGE_NAME:  vioscsi.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  50b73b32

MODULE_NAME: vioscsi

FAULTING_MODULE: 849f4000 vioscsi

DEFAULT_BUCKET_ID:  WIN7_DRIVER_FAULT

BUGCHECK_STR:  0x9F

PROCESS_NAME:  System

CURRENT_IRQL:  2

STACK_TEXT:  
82765a94 82704637 0000009f 00000003 871f72e8 nt!KeBugCheckEx+0x1e
82765b00 827046b0 82765ba0 00000000 82772380 nt!PopCheckIrpWatchdog+0x1f5
82765b38 826b6799 827806e0 00000000 d182bbda nt!PopCheckForIdleness+0x73
82765b7c 826b673d 82768d20 82765ca8 00000001 nt!KiProcessTimerDpcTable+0x50
82765c68 826b65fa 82768d20 82765ca8 00000000 nt!KiProcessExpiredTimerList+0x101
82765cdc 826b478e 000101bf 845bfb50 82772380 nt!KiTimerExpiration+0x25c
82765d20 826b45b8 00000000 0000000e 00000000 nt!KiRetireDpcList+0xcb
82765d24 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0x38


STACK_COMMAND:  kb

FOLLOWUP_NAME:  MachineOwner

FAILURE_BUCKET_ID:  0x9F_VRF_3_disk_IMAGE_vioscsi.sys

BUCKET_ID:  0x9F_VRF_3_disk_IMAGE_vioscsi.sys

Followup: MachineOwner
---------
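
When comparing several dumps of this kind, it can help to pull the four bugcheck arguments out of the !analyze -v text programmatically. A minimal sketch, assuming the "ArgN: <hex>, <description>" layout shown above; the names ARG_LINE and parse_bugcheck_args are hypothetical.

```python
import re

# The argument lines exactly as they appear in the analysis above.
ANALYSIS = """\
Arg1: 00000003, A device object has been blocking an Irp for too long a time
Arg2: 871f72e8, Physical Device Object of the stack
Arg3: 82765ae0, nt!TRIAGE_9F_POWER on Win7, otherwise the Functional Device Object of the stack
Arg4: 8c2a2f00, The blocked IRP
"""

# Hypothetical pattern: argument number, hex value, free-text description.
ARG_LINE = re.compile(r"^Arg(\d): ([0-9a-fA-F]+), (.*)$", re.MULTILINE)


def parse_bugcheck_args(text):
    """Return {arg number: (integer value, description)} for each Arg line."""
    return {
        int(number): (int(value, 16), description)
        for number, value, description in ARG_LINE.findall(text)
    }


parsed = parse_bugcheck_args(ANALYSIS)
for number, (value, description) in sorted(parsed.items()):
    print(f"Arg{number} = 0x{value:08x}  {description}")
```

Arg2 (the physical device object) and Arg4 (the blocked IRP) are the values usually inspected next in the debugger, for example with !devstack and !irp.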

Comment 3 Ronen Hod 2012-12-11 10:19:47 UTC
We suggest the same as https://bugzilla.redhat.com/show_bug.cgi?id=876033#c5

Comment 4 Mike Cao 2012-12-11 11:03:17 UTC
(In reply to comment #3)
> We suggest the same as https://bugzilla.redhat.com/show_bug.cgi?id=876033#c5

This should be a virtio-scsi bug introduced recently. Win7 32/64-bit and Win2k8 32/64/R2
guests are all affected. We have never passed it in this round.

Note that the Common Scenario Stress with IO job also failed with a BSOD.

Mike

Comment 9 Paolo Bonzini 2012-12-19 15:28:00 UTC
Can you try it 3 or 4 more times?  If it fails under Windows only (not Linux), could it be a bug in the generic virtio implementation?

Comment 10 Mike Cao 2012-12-20 03:10:44 UTC
(In reply to comment #5)
> There is a brief summary for most (if not all) 9F BSOD and system stalls
> happening during viostor and vioscsi drivers WHQL testing on Win7/W2K8R2
> systems:
> 
fsFsdWrite and second is trying to perform NtfsCopyWrite operation.
> 
> I believe that if we want to narrow down the problem, we should try
> reproducing this problem on different setups, like different cache options -
> none vs. writeback, different disk image formats - raw vs. qcow2, and in
> case of
> virtio-scsi, try testing on a real drive vs. qemu.

Vadim, I tried cache=writeback.

The job got stuck at Start B(2).5; see the screendump. Any idea why this happens?
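
The setups suggested in the quoted summary (cache=none vs. writeback, raw vs. qcow2) form a small test matrix. A minimal Python sketch that enumerates it is given here. The helper drive_option, the drive name disk1 and the <name>.<format> file naming are assumptions for illustration; the option keys are the ones used in comment #0. The real-drive vs. qemu dimension is not covered.

```python
from itertools import product

# Dimensions named in the quoted suggestion.
CACHE_MODES = ["none", "writeback"]
IMAGE_FORMATS = ["raw", "qcow2"]


def drive_option(cache, fmt, name="disk1"):
    """Build a -drive option string for one combination (hypothetical helper)."""
    return (
        f"file={name}.{fmt},if=none,format={fmt},cache={cache},"
        f"id=drive-{name},werror=stop,rerror=stop"
    )


# One entry per (cache mode, image format) combination.
matrix = [
    (cache, fmt, drive_option(cache, fmt))
    for cache, fmt in product(CACHE_MODES, IMAGE_FORMATS)
]

for cache, fmt, option in matrix:
    print(f"cache={cache:<9} format={fmt:<5} -drive {option}")
```

Each printed -drive string can replace the corresponding data-disk -drive in the reproduction command, so every run differs from the original in exactly the two settings under test.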

Comment 11 Mike Cao 2012-12-20 03:12:54 UTC
Created attachment 666497 [details]
screendump

Comment 15 Mike Cao 2013-01-08 07:47:42 UTC
(In reply to comment #14)
> (In reply to comment #13)
> > (In reply to comment #12)
> > > Hi Mike,
> > > Could you try reproducing the problem on qcow2 volume, just to get the full
> > > picture?
> > > 
> > > Thank you,
> > > Vadim.
> > 
> > Vadim ,With cache=none or cache=writeback ?
> 
> If it 's possible - with both. If not, then only writeback.
> Thank you,
> Vadim.

Hi, Vadim

Tried a qcow2 image with both cache=none and cache=writethrough.
BSOD during the job run in both cases.

Comment 16 Vadim Rozenfeld 2013-01-08 08:26:14 UTC
(In reply to comment #15)
> (In reply to comment #14)
> > (In reply to comment #13)
> > > (In reply to comment #12)
> > > > Hi Mike,
> > > > Could you try reproducing the problem on qcow2 volume, just to get the full
> > > > picture?
> > > > 
> > > > Thank you,
> > > > Vadim.
> > > 
> > > Vadim ,With cache=none or cache=writeback ?
> > 
> > If it 's possible - with both. If not, then only writeback.
> > Thank you,
> > Vadim.
> 
> Hi, Vadim
> 
> Tried qcow2 image both with cache=none & cache=writethrough
> BSOD during job running

Thank you,
Vadim.

Comment 22 Ronen Hod 2013-07-29 09:34:51 UTC
QE,
Please re-test with the latest SeaBIOS.
In any case, postponed to 6.6.

Comment 23 lijin 2013-08-02 05:11:58 UTC
QE tested this issue on a RHEL 6.5 host with seabios-28.

Package version:
    * kernel-2.6.32-393.el6.x86_64    
    * qemu-img-rhev-0.12.1.2-2.377.el6.x86_64
    * virtio-win-prewhql-0.1-65
    * seabios-0.6.1.2-28.el6.x86_64
    * vgabios-0.6b-3.7.el6.noarch
    * sgabios-0-0.3.20110621svn.el6

Steps: same as comment #0

Actual result:
The Sleep Stress With IO job passes without any error.

Based on the above, this issue has been fixed already.