Bug 846519

Summary: [virtio-win][scsi]Guest BSOD (9F) during s3/s4 while guest running crystal benchmark
Product: Red Hat Enterprise Linux 6 Reporter: Mike Cao <bcao>
Component: seabios Assignee: Paolo Bonzini <pbonzini>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.4 CC: acathrow, bcao, bsarathy, ghammer, lnovich, michen, minovotn, pbonzini, qzhang, rhod, virt-maint, zhzhang
Target Milestone: rc Keywords: Reopened, TestOnly
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: seabios-0.6.1.2-27.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-11-21 21:14:19 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 846912, 969808, 969809    
Bug Blocks: 761491, 896495, 912287    

Description Mike Cao 2012-08-08 03:00:18 UTC
Description of problem:
Not sure how to reproduce it. I was just running iozone, and when I came back to work the next morning, the guest had hit a BSOD.

Version-Release number of selected component (if applicable):
# uname -r
2.6.32-294.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.302.el6.x86_64

How reproducible:
Only once

Steps to Reproduce:
1. Start a VM with virtio-scsi
eg:/usr/libexec/qemu-kvm -boot dc -m 4G -smp 2 -cpu Westmere -usb -device usb-tablet -netdev tap,sndbuf=0,id=hostnet2,script=/etc/qemdownscript=no -device e1000,netdev=hostnet2,mac=00:52:13:20:F5:22,bus=pci.0,addr=0x6 -uuid 7976cd92-6557-493d-86a3-7e2055a2d4cd -no-kvm-pit-reinjection -monitor stdio -rtc base=localtime,clock=host,driftfix=slew -device virtio-scsi-pci,id=bus1 -drive file=/home/win2k8-64.qcow2,if=none,media=disk,format=qcow2,rerror=stop,werror=stop,cache=writeback,aio=native,id=scsi-disk0 -device scsi-disk,drive=scsi-disk0,id=disk,bus=bus1.0,serial=miketest -vnc :3 -vga cirrus  -fda /home/virtio-win.vfd -bios /usr/share/seabios/bios-pm.bin -drive file=/hotadd.qcow2,if=none,werror=stop,readonly=on,cache=none,werror=stop,id=drive-hotadd -device virtio-scsi-pci,id=scsi-hotadd -device scsi-hd,drive=drive-hotadd,id=hotadd,bus=scsi-hotadd.0  -drive file=/home/hotadd2.qcow2,if=none,werror=stop,cache=none,werror=stop,id=drive-hotadd2,readonly=on -device virtio-scsi-pci,id=scsi-hotadd2 -device scsi-hd,drive=drive-hotadd2,id=hotadd2,bus=scsi-hotadd2.0
2. Run iozone
eg: iozone -az -b C:\aaaa -g 4g -y 32k -i 0 -i 1

  
Actual results:
Guest BSOD

Expected results:
Guest does not BSOD.

Additional info:
Not sure how to reproduce it; reporting it so that the bug is not missed.

Comment 1 Mike Cao 2012-08-08 03:01:33 UTC
0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

DRIVER_POWER_STATE_FAILURE (9f)
A driver has failed to complete a power IRP within a specific time (usually 10 minutes).
Arguments:
Arg1: 0000000000000003, A device object has been blocking an Irp for too long a time
Arg2: fffffa80039bc060, Physical Device Object of the stack
Arg3: fffffa8003b75060, nt!TRIAGE_9F_POWER on Win7, otherwise the Functional Device Object of the stack
Arg4: fffffa8004ba7440, The blocked IRP

Debugging Details:
------------------


DRVPOWERSTATE_SUBCODE:  3

IRP_ADDRESS:  fffffa8004ba7440

DEVICE_OBJECT: fffffa80039bc060

DRIVER_OBJECT: fffffa80039624d0

IMAGE_NAME:  vioscsi.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  501d41f6

MODULE_NAME: vioscsi

FAULTING_MODULE: fffffa6000ccb000 vioscsi

DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT

BUGCHECK_STR:  0x9F

PROCESS_NAME:  System

CURRENT_IRQL:  2

LAST_CONTROL_TRANSFER:  from fffff800016c876d to fffff8000166d450

STACK_TEXT:  
fffff800`026ec9f8 fffff800`016c876d : 00000000`0000009f 00000000`00000003 fffffa80`039bc060 fffffa80`03b75060 : nt!KeBugCheckEx
fffff800`026eca00 fffff800`016710dd : fffff800`026ecad8 00000000`00000000 00000000`00000001 fffff800`0178a000 : nt! ?? ::FNODOBFM::`string'+0x17cec
fffff800`026eca70 fffff800`016708d5 : fffff800`026eccd0 fffffa60`00fec702 fffff800`026eccc8 00000000`00000001 : nt!KiTimerListExpire+0x30d
fffff800`026ecca0 fffff800`0167172f : 000001bd`ab028e9d 00000000`00000000 00000000`00000001 fffff800`01789a80 : nt!KiTimerExpiration+0x295
fffff800`026ecd10 fffff800`016718e2 : fffff800`01786680 fffff800`01786680 00000000`00000000 fffff800`0178bb80 : nt!KiRetireDpcList+0x1df
fffff800`026ecd80 fffff800`0183e860 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiIdleLoop+0x62
fffff800`026ecdb0 00000000`fffff800 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!zzz_AsmCodeRange_End+0x4
fffff800`026e60b0 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00680000`00000000 : 0xfffff800


STACK_COMMAND:  kb

FOLLOWUP_NAME:  MachineOwner

FAILURE_BUCKET_ID:  X64_0x9F_disk.sys_CNVIRP_IMAGE_vioscsi.sys

BUCKET_ID:  X64_0x9F_disk.sys_CNVIRP_IMAGE_vioscsi.sys

Followup: MachineOwner
---------
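
For triage, the key fields of the `!analyze -v` output above can be pulled out with a short script. This is only a sketch: the sample text is copied from the log above, while the function name `parse_bugcheck` and both regular expressions are assumptions made for illustration and are not part of WinDbg or any debugger tooling.

```python
import re

# Lines copied from the bugcheck analysis above (trimmed to the relevant fields).
ANALYZE_OUTPUT = """\
DRIVER_POWER_STATE_FAILURE (9f)
Arg1: 0000000000000003, A device object has been blocking an Irp for too long a time
Arg2: fffffa80039bc060, Physical Device Object of the stack
Arg3: fffffa8003b75060, nt!TRIAGE_9F_POWER on Win7, otherwise the Functional Device Object of the stack
Arg4: fffffa8004ba7440, The blocked IRP
IMAGE_NAME:  vioscsi.sys
"""

# Hypothetical patterns: "ArgN: <hex>, <description>" and "IMAGE_NAME:  <file>".
ARG_RE = re.compile(r"^Arg([1-4]):\s+([0-9a-fA-F]+),\s*(.*)$", re.MULTILINE)
IMAGE_RE = re.compile(r"^IMAGE_NAME:\s+(\S+)", re.MULTILINE)


def parse_bugcheck(text):
    """Return ({arg_number: (value, description)}, image_name or None)."""
    args = {
        int(number): (int(value, 16), description.strip())
        for number, value, description in ARG_RE.findall(text)
    }
    match = IMAGE_RE.search(text)
    return args, (match.group(1) if match else None)


args, image = parse_bugcheck(ANALYZE_OUTPUT)
for number in sorted(args):
    value, description = args[number]
    print(f"Arg{number} = {value:#x}: {description}")
print(f"Image blamed by the debugger: {image}")
```

Arg1 = 3 is the subcode reported as DRVPOWERSTATE_SUBCODE in the log, and Arg4 is the address of the blocked power IRP.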

Comment 4 Mike Cao 2012-08-08 08:48:57 UTC
Reproduced this issue with the following steps:

1. Start the VM:
/usr/libexec/qemu-kvm -boot dc -m 4G -smp 2 -cpu Westmere -usb -device usb-tablet -netdev tap,sndbuf=0,id=hostnet2,script=/etc/qemu-ifup,downscript=no -device e1000,netdev=hostnet2,mac=00:52:13:20:F5:22,bus=pci.0,addr=0x6 -uuid 7976cd92-6557-493d-86a3-7e2055a2d4cd -no-kvm-pit-reinjection -monitor stdio -rtc base=localtime,clock=host,driftfix=slew -device virtio-scsi-pci,id=bus1 -drive file=/home/win2k8-64.qcow2,if=none,media=disk,format=qcow2,rerror=stop,werror=stop,cache=writethrough,aio=native,id=scsi-disk0 -device scsi-disk,drive=scsi-disk0,id=disk,bus=bus1.0,serial=miketest -spice port=5910,disable-ticketing -vga qxl  -fda /home/virtio-win.vfd -bios /usr/share/seabios/bios-pm.bin -drive file=/home/hotadd.qcow2,if=none,werror=stop,readonly=on,cache=none,werror=stop,id=drive-hotadd -device virtio-scsi-pci,id=scsi-hotadd -device scsi-hd,drive=drive-hotadd,id=hotadd,bus=scsi-hotadd.0  -drive file=/home/hotadd2.qcow2,if=none,werror=stop,cache=none,werror=stop,id=drive-hotadd2,readonly=on -device virtio-scsi-pci,id=scsi-hotadd2 -device scsi-hd,drive=drive-hotadd2,id=hotadd2,bus=scsi-hotadd2.0
2. Run crystal benchmark in the guest.
3. During the "write" stage, put the guest into S3.

Actual results:
Guest BSOD

Comment 5 Mike Cao 2012-08-08 10:16:33 UTC
FYI

1. The issue cannot be reproduced with cache=none.
2. I think that in comment #0, after crystal benchmark finished testing, the guest entered S3 automatically because it was idle, and the BSOD occurred during S3.

Comment 8 dawu 2013-03-11 05:38:42 UTC
Reproduced with the latest seabios, seabios-0.6.1.2-26.el6.x86_64. Verified that the fixed seabios from https://bugzilla.redhat.com/show_bug.cgi?id=912561 does not hit this issue. Details follow:

Steps:
1. Start guest with CLI:
/usr/libexec/qemu-kvm -m 2G -smp 2 -cpu Penryn,+x2apic,family=0xf -usb -device usb-tablet -drive file=win2k8-64-scsi-49.qcow2,format=qcow2,index=0,if=none,id=drive-virtio-disk1,media=disk,cache=writethrough,werror=stop,aio=native -device virtio-scsi-pci,id=bus0 -device scsi-hd,bus=bus0.0,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device e1000,netdev=hostnet0,mac=00:10:16:23:25:16,bus=pci.0,addr=0x4 -uuid dbde9e26-140c-4efb-8d58-4b1ca0251cdb -rtc base=localtime -no-kvm-pit-reinjection -monitor stdio -name win2k8-64-scsi -spice disable-ticketing,port=5931 -vga qxl -qmp tcp:0:4445,server,nowait -device virtio-scsi-pci,bus=pci.0,id=scsi0 -drive file=disk1.qcow2,format=qcow2,if=none,media=disk,aio=native,werror=stop,rerror=stop,id=scsi0 -device scsi-hd,bus=scsi0.0,drive=scsi0,id=scsi1 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -bios /home/bios.bin\

2. Run crystal benchmark.

3. The guest enters S3 automatically while crystal benchmark runs through to the finish.

Actual results:
With seabios-0.6.1.2-26.el6.x86_64, the guest hits a BSOD with error code 9F.
With the fixed seabios from https://bugzilla.redhat.com/show_bug.cgi?id=912561, the guest works well without any error.

Thanks 
Best Regards,
Dawn

Comment 9 Vadim Rozenfeld 2013-03-11 08:02:13 UTC
(In reply to comment #8)
> Can reproduce with the latest seabios of seabios-0.6.1.2-26.el6.x86_64,and
> verified with the fixed seabios from
> https://bugzilla.redhat.com/show_bug.cgi?id=912561,does not hit this issue,
> following is the details:
> 
> Steps:
> 1. Start guest with CLI:
> /usr/libexec/qemu-kvm -m 2G -smp 2 -cpu Penryn,+x2apic,family=0xf -usb
> -device usb-tablet -drive
> file=win2k8-64-scsi-49.qcow2,format=qcow2,index=0,if=none,id=drive-virtio-
> disk1,media=disk,cache=writethrough,werror=stop,aio=native -device
> virtio-scsi-pci,id=bus0 -device
> scsi-hd,bus=bus0.0,drive=drive-virtio-disk1,id=virtio-disk1 -netdev
> tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device
> e1000,netdev=hostnet0,mac=00:10:16:23:25:16,bus=pci.0,addr=0x4 -uuid
> dbde9e26-140c-4efb-8d58-4b1ca0251cdb -rtc base=localtime
> -no-kvm-pit-reinjection -monitor stdio -name win2k8-64-scsi -spice
> disable-ticketing,port=5931 -vga qxl -qmp tcp:0:4445,server,nowait -device
> virtio-scsi-pci,bus=pci.0,id=scsi0 -drive
> file=disk1.qcow2,format=qcow2,if=none,media=disk,aio=native,werror=stop,
> rerror=stop,id=scsi0 -device scsi-hd,bus=scsi0.0,drive=scsi0,id=scsi1
> -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -bios
> /home/bios.bin\
> 
> 2. run crystal benchmark.
> 
> 3. guest come into S3 status automatically during running crystal benchmark
> till to finish.
> 
> Actually results:
> With version seabios-0.6.1.2-26.el6.x86_64, guest hit BSOD with 9F error
> code.
> With the fixed seabios from
> https://bugzilla.redhat.com/show_bug.cgi?id=912561,guest works well without
> any error.

I can confirm that the problem is also not reproducible with seabios compiled from upstream sources.
Btw:
1. There is no need to run crystal benchmark to reproduce this issue. Just run a guest equipped with two virtio-scsi drives, hibernate the system, resume, and try to hibernate one more time -> BSOD
2. The problem is also reproducible with virtio-blk drives when running in IRQ mode (family != 0xf).

Best regards,
Vadim. 

> 
> Thanks 
> Best Regards,
> Dawn
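
The reduced reproducer from comment 9 (two virtio-scsi drives, S3/S4 enabled, no benchmark) could be assembled roughly as follows. This is a sketch under assumptions: every option is taken from the command line in comment 8, but the helper `build_repro_argv`, the bus and drive ids, and the image paths are placeholders invented for illustration. The script only prints the command; it does not start a guest.

```python
import shlex


def build_repro_argv(system_disk, extra_disk, bios=None):
    """Build a qemu-kvm argv with two virtio-scsi disks and S3/S4 enabled."""
    argv = ["/usr/libexec/qemu-kvm", "-m", "2G", "-smp", "2", "-monitor", "stdio"]

    # One virtio-scsi controller per disk, as in the command line in comment 8.
    disks = [("bus0", "drive0", system_disk), ("scsi0", "drive1", extra_disk)]
    for bus_id, drive_id, path in disks:
        argv += [
            "-device", f"virtio-scsi-pci,id={bus_id}",
            "-drive", f"file={path},format=qcow2,if=none,id={drive_id}",
            "-device", f"scsi-hd,bus={bus_id}.0,drive={drive_id}",
        ]

    # Expose S3 and S4 to the guest so it can sleep and hibernate.
    argv += ["-global", "PIIX4_PM.disable_s3=0",
             "-global", "PIIX4_PM.disable_s4=0"]

    # Optional BIOS image, to compare a fixed and an unfixed seabios build.
    if bios is not None:
        argv += ["-bios", bios]
    return argv


# Placeholder image names; substitute real guest images.
argv = build_repro_argv("guest-system.qcow2", "guest-data.qcow2", bios="/home/bios.bin")
print(shlex.join(argv))
```

With the guest running, the sequence from comment 9 is then: hibernate, resume, and hibernate a second time.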

Comment 10 Mike Cao 2013-03-12 07:07:33 UTC

*** This bug has been marked as a duplicate of bug 912561 ***

Comment 11 Paolo Bonzini 2013-03-12 12:11:54 UTC
*** Bug 846533 has been marked as a duplicate of this bug. ***

Comment 12 Paolo Bonzini 2013-03-12 12:13:12 UTC
Not a duplicate!!!

In fact, the BIOS update fixes this bug but not the alleged dup.

Comment 13 Gal Hammer 2013-03-14 15:47:38 UTC
(In reply to comment #12)
> Not a duplicate!!!
> 
> In fact, the BIOS update fixes this bug but not the alleged dup.

I don't understand. If the BIOS update fixed this bug then why isn't it a duplicate?

Comment 14 Paolo Bonzini 2013-03-15 09:26:43 UTC
Never mind, QE reported that the BIOS didn't fix bug 912561, but actually it does.  Still, if possible, testing both scenarios would be better.

Comment 15 Mike Cao 2013-03-15 09:36:07 UTC
(In reply to comment #14)
> Nevermind, QE reported that the BIOS didn't fix bug 912561, but actually it
> does.  Still, if possible, testing both scenarios would be better.

Actually, it is fixed. QE reached the wrong conclusion because the wrong seabios version was used.

Comment 23 zhonglinzhang 2013-09-06 03:40:52 UTC
Reproduced this issue with seabios-0.6.1.2-26.el6.x86_64

Steps to Reproduce:
1. Boot a win2008 guest:
/usr/libexec/qemu-kvm -M pc -enable-kvm -m 4G -smp 4,sockets=1,cores=4,threadse test -rtc base=localtime,clock=host,driftfix=slew -k en-us -boot menu=on -spice disable-ticketing,port=5930 -vga qxl -monitor stdio -device virtio-scsi-pci,id=bus1 -drive file=/home/win2008-64-virtio.qcow2,if=none,media=disk,format=qcow2,rerror=stop,werror=stop,cache=writeback,aio=native,id=scsi-disk0 -device scsi-disk,drive=scsi-disk0,id=disk,bus=bus1.0,serial=miketest -drive file=/home/hotadd2.qcow2,if=none,werror=stop,cache=none,werror=stop,id=drive-hotadd2,readonly=on -device virtio-scsi-pci,id=scsi-hotadd2 -device scsi-hd,drive=drive-hotadd2,id=hotadd2,bus=scsi-hotadd2.0  -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0  -vnc :1 -usb -device usb-tablet

2. 
First:
Hibernate the system, resume and try to hibernate one more time

Second:
Sleep the system, resume and try to sleep one more time

Actual Results:
After the guest hangs for a long time, both the first and second scenarios end in a BSOD (9F) in the guest.



Verified with seabios-0.6.1.2-28.el6.x86_64,
using the same steps as above.

Actual Results:
No BSOD after several hibernate and sleep cycles.
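
To check whether a given host build already carries the fix, the installed seabios build string can be compared against the "Fixed In Version" field of this bug (seabios-0.6.1.2-27.el6). The sketch below is a simplified comparison that assumes purely numeric version and release fields followed by `.el6`. It is not RPM's own version comparison algorithm, and the names `parse_build` and `has_fix` are hypothetical.

```python
import re

# Matches e.g. "seabios-0.6.1.2-26.el6.x86_64" or "seabios-0.6.1.2-27.el6".
NVR_RE = re.compile(r"^seabios-(?P<version>[0-9.]+)-(?P<release>\d+)\.el6")


def parse_build(nvr):
    """Split a seabios build string into (version tuple, release number)."""
    match = NVR_RE.match(nvr)
    if match is None:
        raise ValueError(f"unrecognized seabios build string: {nvr!r}")
    version = tuple(int(part) for part in match.group("version").split("."))
    return version, int(match.group("release"))


# "Fixed In Version" from the header of this bug.
FIXED_IN = parse_build("seabios-0.6.1.2-27.el6")


def has_fix(nvr):
    """True if the build is at or above the fixed version (simplified check)."""
    return parse_build(nvr) >= FIXED_IN


for build in ("seabios-0.6.1.2-26.el6.x86_64", "seabios-0.6.1.2-28.el6.x86_64"):
    print(build, "->", "fixed" if has_fix(build) else "affected")
```

Under these assumptions, the -26 build used to reproduce the BSOD is reported as affected and the -28 build used for verification as fixed.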

Comment 24 zhonglinzhang 2013-09-22 09:35:35 UTC
0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

DRIVER_POWER_STATE_FAILURE (9f)
A driver is causing an inconsistent power state.
Arguments:
Arg1: 0000000000000003, A device object has been blocking an Irp for too long a time
Arg2: fffffa8003a4a910, Physical Device Object of the stack
Arg3: fffffa8003d31630, Functional Device Object of the stack
Arg4: fffffa8004919a30, The blocked IRP

Debugging Details:
------------------


DRVPOWERSTATE_SUBCODE:  3

IRP_ADDRESS:  fffffa8004919a30

DEVICE_OBJECT: fffffa8003a4a910

DRIVER_OBJECT: fffffa8003a45920

IMAGE_NAME:  vioscsi.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  50b73b33

MODULE_NAME: vioscsi

FAULTING_MODULE: fffffa6000d2e000 vioscsi

DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT

BUGCHECK_STR:  0x9F

PROCESS_NAME:  System

CURRENT_IRQL:  2

LAST_CONTROL_TRANSFER:  from fffff800016d176d to fffff80001676450

STACK_TEXT:  
fffff800`026f59f8 fffff800`016d176d : 00000000`0000009f 00000000`00000003 fffffa80`03a4a910 fffffa80`03d31630 : nt!KeBugCheckEx
fffff800`026f5a00 fffff800`0167a0dd : fffff800`026f5ad8 00000000`00000000 00000000`00000002 fffffa60`00fd7b00 : nt! ?? ::FNODOBFM::`string'+0x17cec
fffff800`026f5a70 fffff800`016798d5 : fffff800`026f5cd0 fffffa60`02d18702 fffff800`026f5cc8 00000000`00000001 : nt!KiTimerListExpire+0x30d
fffff800`026f5ca0 fffff800`0167a72f : 00000203`73ab4b80 00000000`00000000 fffff800`00000001 fffff800`01792a80 : nt!KiTimerExpiration+0x295
fffff800`026f5d10 fffff800`0167a8e2 : fffff800`0178f680 fffff800`0178f680 00000000`00000000 fffff800`01794b80 : nt!KiRetireDpcList+0x1df
fffff800`026f5d80 fffff800`01847860 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiIdleLoop+0x62
fffff800`026f5db0 00000000`fffff800 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!zzz_AsmCodeRange_End+0x4
fffff800`026ef0b0 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00680000`00000000 : 0xfffff800


STACK_COMMAND:  kb

FOLLOWUP_NAME:  MachineOwner

FAILURE_BUCKET_ID:  X64_0x9F_disk.sys_CNVIRP_IMAGE_vioscsi.sys

BUCKET_ID:  X64_0x9F_disk.sys_CNVIRP_IMAGE_vioscsi.sys

Comment 25 Qunfang Zhang 2013-10-12 03:25:33 UTC
Setting to VERIFIED per comment 23 and comment 24. Comment 24 is the detailed debug analysis log from reproducing the issue on the unfixed version. With the latest seabios, the problem does not occur.

Comment 26 errata-xmlrpc 2013-11-21 21:14:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1655.html