Bug 846208 - [virtio-win][scsi]Guest BSOD during resume from s4 after hot plug scsi disks
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: virtio-win
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 7.0
Assigned To: Vadim Rozenfeld
QA Contact: Virtualization Bugs
Keywords: Reopened
Depends On:
Blocks: Virt-S3/S4-7.0 1105334
Reported: 2012-08-07 02:56 EDT by Mike Cao
Modified: 2015-05-27 21:39 EDT
CC List: 14 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-05-27 21:39:42 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Mike Cao 2012-08-07 02:56:21 EDT
Description of problem:


Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.302.el6.x86_64
2.6.32-294.el6.x86_64
virtio-win-prewhql-32

How reproducible:
2/2

Steps to Reproduce:
1. Start the VM:
CLI:/usr/libexec/qemu-kvm -boot dc -m 4G -smp 2 -cpu Westmere -usb -device usb-tablet -netdev tap,sndbuf=0,id=hostnet2,script=/etc/qemdownscript=no -device e1000,netdev=hostnet2,mac=00:52:13:20:F5:22,bus=pci.0,addr=0x6 -uuid 7976cd92-6557-493d-86a3-7e2055a2d4cd -no-kvm-pit-reinjection -monitor stdio -rtc base=localtime,clock=host,driftfix=slew -device virtio-scsi-pci,id=bus1 -drive file=/home/win2k8-64.qcow2,if=none,media=disk,format=qcow2,rerror=stop,werror=stop,cache=writeback,aio=native,id=scsi-disk0 -device scsi-disk,drive=scsi-disk0,id=disk,bus=bus1.0,serial=miketest -vnc :3 -vga cirrus  -fda /home/virtio-win.vfd -bios /usr/share/seabios/bios-pm.bin
2. Hot-plug SCSI disks, e.g.:
(qemu) __com.redhat_drive_add file=/hotadd.qcow2,werror=stop,cache=none,rerror=stop,id=drive-hotadd
(qemu) device_add virtio-scsi-pci,id=scsi-hotadd
(qemu) device_add scsi-hd,drive=drive-hotadd,id=hotadd,bus=scsi-hotadd.0
(qemu) __com.redhat_drive_add  file=/home/hotadd2.qcow2,werror=stop,cache=none,rerror=stop,id=drive-hotadd2
(qemu) device_add virtio-scsi-pci,id=scsi-hotadd2
(qemu) device_add scsi-hd,drive=drive-hotadd2,id=hotadd2,bus=scsi-hotadd2.0
3. Hibernate (S4) the guest.
4. Resume the guest with the same command line as in step 1.
  
Actual results:
Guest BSOD

Expected results:
No BSOD occurs.

Additional info:
Resuming from S4 after a hot-unplug does not hit this issue.
Comment 1 Mike Cao 2012-08-07 03:02:28 EDT
1: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 0000000000000008, memory referenced
Arg2: 0000000000000007, IRQL
Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
Arg4: fffffa6000ccca30, address which referenced memory

Debugging Details:
------------------


READ_ADDRESS:  0000000000000008 

CURRENT_IRQL:  7

FAULTING_IP: 
vioscsi!SynchronizedSRBRoutine+50 [c:\cygwin\tmp\build\source\internal-kvm-guest-drivers-windows\vioscsi\helper.c @ 43]
fffffa60`00ccca30 4c8b5108        mov     r10,qword ptr [rcx+8]

DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT

BUGCHECK_STR:  0xD1

PROCESS_NAME:  System

TRAP_FRAME:  fffffa6002e65270 -- (.trap 0xfffffa6002e65270)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000002 rbx=0000000000000000 rcx=0000000000000000
rdx=fffffa8004c88d00 rsi=0000000000000000 rdi=0000000000000000
rip=fffffa6000ccca30 rsp=fffffa6002e65400 rbp=fffffa8004c88d00
 r8=fffffa8004c88d00  r9=0000000000000001 r10=0000000000000000
r11=fffffa8004286010 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl zr na po nc
vioscsi!SynchronizedSRBRoutine+0x50:
fffffa60`00ccca30 4c8b5108        mov     r10,qword ptr [rcx+8] ds:00000000`00000008=????????????????
Resetting default scope

LAST_CONTROL_TRANSFER:  from fffff800016ba1ee to fffff800016ba450

STACK_TEXT:  
fffffa60`02e65128 fffff800`016ba1ee : 00000000`0000000a 00000000`00000008 00000000`00000007 00000000`00000000 : nt!KeBugCheckEx
fffffa60`02e65130 fffff800`016b90cb : 00000000`00000000 fffff800`016c773a 00000000`00000000 fffffa80`04b19018 : nt!KiBugCheckDispatch+0x6e
fffffa60`02e65270 fffffa60`00ccca30 : 00000000`00000004 fffffa80`03973360 fffffa80`03dfae30 fffffa60`02e65558 : nt!KiPageFault+0x20b
fffffa60`02e65400 fffffa60`00ce23bb : fffffa80`04b251b0 fffffa80`04b19018 fffffa80`04b19018 fffffa80`04c88d00 : vioscsi!SynchronizedSRBRoutine+0x50 [c:\cygwin\tmp\build\source\internal-kvm-guest-drivers-windows\vioscsi\helper.c @ 43]
fffffa60`02e65450 fffffa60`00ccb7e6 : 00000000`00000001 fffffa80`04b19018 fffffa80`04c88d00 fffffa80`04a94d78 : storport!StorPortSynchronizeAccess+0x5b
fffffa60`02e65490 fffffa60`00ce6930 : fffffa80`04b251b0 fffffa60`0451c010 00000000`00000000 fffffa80`04286010 : vioscsi!VioScsiStartIo+0xce [c:\cygwin\tmp\build\source\internal-kvm-guest-drivers-windows\vioscsi\vioscsi.c @ 347]
fffffa60`02e654d0 fffffa60`00cebe61 : fffffa80`04c88d00 fffffa80`04c88d00 00000000`00000000 fffffa80`04286010 : storport!RaidAdapterPostScatterGatherExecute+0x150
fffffa60`02e65530 fffffa60`00ce8402 : fffffa80`04c69680 fffffa80`04c69680 00000000`00000000 fffffa80`04286010 : storport!RaUnitStartIo+0xb1
fffffa60`02e65580 fffffa60`00ce8c2b : 00000000`00000004 fffffa60`00cf2101 fffffa80`04c88d00 00000000`00000000 : storport!RaidStartIoPacket+0xc2
fffffa60`02e655e0 fffffa60`00cec0f7 : fffffa80`04286010 fffffa80`04c88d00 fffffa80`04c69680 00000000`00000000 : storport!RaidUnitSubmitRequest+0xcb
fffffa60`02e65620 fffffa60`00cec9ea : fffffa80`04c69530 fffffa80`04c88790 fffffa60`00fd03e0 fffffa60`00cf2110 : storport!RaUnitScsiIrp+0x147
fffffa60`02e65660 fffffa60`00fcd62f : fffffa60`00fd03e0 fffffa80`04286128 fffffa80`04286010 00000000`00000000 : storport!RaDriverScsiIrp+0x7a
fffffa60`02e656a0 fffff800`016c35c2 : fffffa80`c0000016 fffffa60`00ce6052 fffffa80`04286010 fffff800`01804cc0 : CLASSPNP!ClasspPowerUpCompletion+0x2df
fffffa60`02e65720 fffffa60`00cdf7d9 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!IopfCompleteRequest+0x302
fffffa60`02e657d0 fffffa60`00ce93fc : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000001 : storport!RaidCompleteRequestEx+0x19
fffffa60`02e65800 fffffa60`00cea370 : fffffa80`04286010 fffffa60`00cf2110 fffffa80`04b251b0 fffff800`01703c28 : storport!RaidUnitProcessSetDevicePowerIrp+0xec
fffffa60`02e65850 fffffa60`00cea482 : 00000000`00000001 fffffa80`04286010 fffffa60`00cf2110 fffffa60`00ce7036 : storport!RaidUnitSetDevicePowerIrp+0x100
fffffa60`02e65890 fffffa60`00cea5de : 00000000`00000000 fffffa80`04286010 00000000`00000002 fffffa80`04c69680 : storport!RaidUnitSetPowerIrp+0xf2
fffffa60`02e658e0 fffffa60`00cec90a : fffffa60`00cf2110 fffffa80`04c88790 fffffa80`04286010 fffffa80`04c69530 : storport!RaUnitPowerIrp+0xde
fffffa60`02e65930 fffff800`0176c4f2 : fffffa80`04286010 fffffa80`04286128 fffffa80`04286010 fffffa80`04c88cd0 : storport!RaDriverPowerIrp+0x7a
fffffa60`02e65970 fffffa60`00fcd9cd : 00000000`00000001 fffffa60`00b54492 00000000`00000001 fffffa60`00cad72c : nt!IopPoHandleIrp+0x32
fffffa60`02e659a0 fffff800`016c35c2 : fffffa80`c0000016 fffff800`0175a743 fffffa80`04286010 fffffa60`00cb188c : CLASSPNP!ClasspPowerUpCompletion+0x67d
fffffa60`02e65a20 fffffa60`00cdf7d9 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!IopfCompleteRequest+0x302
fffffa60`02e65ad0 fffffa60`00cec1d3 : fffffa80`04c88790 fffffa80`04286010 00000000`00000000 00000000`00000000 : storport!RaidCompleteRequestEx+0x19
fffffa60`02e65b00 fffffa60`00cec9ea : fffffa80`04c69530 fffffa60`00fd3110 00000000`0000000e fffffa60`00cf2110 : storport!RaUnitScsiIrp+0x223
fffffa60`02e65b40 fffffa60`00fcdf17 : 00000000`0000000e fffffa80`04c88cd0 fffffa80`04286010 fffff800`0175ab8e : storport!RaDriverScsiIrp+0x7a
fffffa60`02e65b80 fffffa60`00fcb2f0 : fffffa80`04c88790 fffffa80`04c69530 fffffa80`04c888e0 fffffa80`042860e0 : CLASSPNP!ClasspPowerHandler+0x437
fffffa60`02e65bf0 fffff800`0176c4f2 : fffffa80`04286010 00000000`00000000 00000000`00000000 fffffa80`04286010 : CLASSPNP!ClassDispatchPower+0x80
fffffa60`02e65c20 fffffa60`00beb234 : 00000000`00000001 fffffa80`04286170 00000000`00000001 fffff800`0175ab8e : nt!IopPoHandleIrp+0x32
fffffa60`02e65c50 fffff800`0177c22a : 00000000`00000001 00000000`00000000 fffffa80`04c85040 fffffa60`02e65d00 : partmgr!PmPower+0xd4
fffffa60`02e65ca0 fffff800`018c4f37 : ffffffff`fa0a1f00 fffffa80`04ca2bb0 00000000`00000080 00000000`00000001 : nt!PopIrpWorker+0x3ca
fffffa60`02e65d50 fffff800`016f7616 : fffffa60`005ec180 fffffa80`04ca2bb0 fffffa80`04c8e040 fffffa60`005ecaa0 : nt!PspSystemThreadStartup+0x57
fffffa60`02e65d80 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxStartSystemThread+0x16


STACK_COMMAND:  kb

FOLLOWUP_IP: 
vioscsi!SynchronizedSRBRoutine+50 [c:\cygwin\tmp\build\source\internal-kvm-guest-drivers-windows\vioscsi\helper.c @ 43]
fffffa60`00ccca30 4c8b5108        mov     r10,qword ptr [rcx+8]

FAULTING_SOURCE_LINE:  c:\cygwin\tmp\build\source\internal-kvm-guest-drivers-windows\vioscsi\helper.c

FAULTING_SOURCE_FILE:  c:\cygwin\tmp\build\source\internal-kvm-guest-drivers-windows\vioscsi\helper.c

FAULTING_SOURCE_LINE_NUMBER:  43

SYMBOL_STACK_INDEX:  3

SYMBOL_NAME:  vioscsi!SynchronizedSRBRoutine+50

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: vioscsi

IMAGE_NAME:  vioscsi.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  501d41f6

FAILURE_BUCKET_ID:  X64_0xD1_vioscsi!SynchronizedSRBRoutine+50

BUCKET_ID:  X64_0xD1_vioscsi!SynchronizedSRBRoutine+50

Followup: MachineOwner
---------
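As a reading aid for the dump above: the faulting instruction reads from [rcx+8], the trap frame shows rcx as 0, and Arg1 reports address 8. Together these suggest a NULL structure pointer dereferenced at field offset 8. The short Python check below only redoes that arithmetic. The names effective_address and looks_like_null_deref and the one-page threshold are illustrative assumptions, not part of the debugger output, and the trap frame itself warns that some register values may be zeroed or incorrect.

```python
MASK64 = (1 << 64) - 1


def effective_address(base, displacement):
    """Address touched by a [base + displacement] operand on x64."""
    return (base + displacement) & MASK64


def looks_like_null_deref(address, limit=0x1000):
    """Heuristic (assumption): an address inside the first page usually
    comes from a NULL base pointer plus a small field offset."""
    return address < limit


# Values copied from the bugcheck output above.
arg1_memory_referenced = 0x0000000000000008
rcx = 0x0000000000000000      # trap frame: rcx=0000000000000000
displacement = 0x8            # mov r10, qword ptr [rcx+8]

address = effective_address(rcx, displacement)
print(hex(address), address == arg1_memory_referenced, looks_like_null_deref(address))
```

The computed address matches Arg1, which is consistent with the zeroed rcx in the trap frame being the real value rather than a debugger artifact.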
Comment 5 daiwei 2012-08-08 23:58:18 EDT
Hi Mike,

   Did you add the second disk to your command line when resuming the guest from S4?
Comment 6 Mike Cao 2012-08-09 00:56:12 EDT
(In reply to comment #5)
> Hi Mike,
> 
>    Did you add the second disk in your command line when you resuming guest
> from S4 ?

What you mean is another bug: 846202.
Comment 8 Amit Shah 2013-02-21 06:07:43 EST
This doesn't look right: you hot-add disks, then perform S4, and resume guests w/o the hot-added disks.  This is never guaranteed to work.

Try starting qemu with the command line from step 1 plus all the hot-added disks and drives; if there is still a problem, it'll be a valid bug.
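A minimal sketch of what "step 1 plus all the hot-added disks and drives" could look like, written as a Python helper that turns the monitor commands from step 2 into static command-line arguments. It assumes that the options of __com.redhat_drive_add map one-to-one onto -drive with if=none added, and that device_add options can be passed unchanged to -device; the helper name monitor_to_cli is hypothetical. It does not pin PCI addresses, so the guest-visible resources may still differ from the hot-plugged layout.

```python
import shlex


def monitor_to_cli(monitor_lines):
    """Translate hot-plug monitor commands into static qemu arguments."""
    args = []
    for line in monitor_lines:
        # Drop the "(qemu)" prompt if the line was copied from a console.
        line = line.replace("(qemu)", "", 1).strip()
        if not line:
            continue
        command, _, options = line.partition(" ")
        options = options.strip()
        if command == "__com.redhat_drive_add":
            # Assumption: a hot-added drive corresponds to "-drive ...,if=none".
            args += ["-drive", options + ",if=none"]
        elif command == "device_add":
            args += ["-device", options]
        else:
            raise ValueError(f"unsupported monitor command: {command}")
    return args


# First hot-plug sequence from step 2 of the description.
hotplug = [
    "(qemu) __com.redhat_drive_add file=/hotadd.qcow2,werror=stop,cache=none,rerror=stop,id=drive-hotadd",
    "(qemu) device_add virtio-scsi-pci,id=scsi-hotadd",
    "(qemu) device_add scsi-hd,drive=drive-hotadd,id=hotadd,bus=scsi-hotadd.0",
]

extra_args = monitor_to_cli(hotplug)
print(" ".join(shlex.quote(a) for a in extra_args))
```

The printed arguments would be appended to the step 1 command line before resuming.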
Comment 9 Mike Cao 2013-02-21 07:48:19 EST
(In reply to comment #8)
> This doesn't look right: you hot-add disks, then perform S4, and resume
> guests w/o the hot-added disks.  This is never guaranteed to work.

What you described is a separate bug; see https://bugzilla.redhat.com/show_bug.cgi?id=846202.

This is the only way I can work around it.
> 
> Try starting qemu with command line from step 1 plus all the hot-added disks
> and drives, if there still is a problem, it'll be a valid bug.
Comment 10 Amit Shah 2013-02-21 08:13:09 EST
(In reply to comment #9)
> (In reply to comment #8)
> > This doesn't look right: you hot-add disks, then perform S4, and resume
> > guests w/o the hot-added disks.  This is never guaranteed to work.
> 
> What you described is a Bug refering to
> https://bugzilla.redhat.com/show_bug.cgi?id=846202.
> 
> This is the only way I can workaround it 

You're working around a bug (for what?) with a completely wrong test case.

Closing as notabug.
Comment 11 Mike Cao 2013-02-21 08:31:24 EST
(In reply to comment #10)
> (In reply to comment #9)
> > (In reply to comment #8)
> > > This doesn't look right: you hot-add disks, then perform S4, and resume
> > > guests w/o the hot-added disks.  This is never guaranteed to work.
> > 
> > What you described is a Bug refering to
> > https://bugzilla.redhat.com/show_bug.cgi?id=846202.
> > 
> > This is the only way I can workaround it 
> 
> You're working around a bug (for what?) with a completely wrong test case.
> 
> Closing as notabug.

But the backtrace shows something wrong in the virtio-scsi driver.

So do you mean that without the fix for https://bugzilla.redhat.com/show_bug.cgi?id=846202, we cannot say we support PM + hot-plug for Windows guests, right?
Comment 12 Amit Shah 2013-02-21 10:12:34 EST
(In reply to comment #11)
> > You're working around a bug (for what?) with a completely wrong test case.
> > 
> > Closing as notabug.
> 
> But the backtrace shows sth wrong in virtio scsi driver ..
> 
> So do you mean w/o https://bugzilla.redhat.com/show_bug.cgi?id=846202 fix
> ,we can not say we support pm+hotplug for windows guest ,right ?

Right.
Comment 13 Vadim Rozenfeld 2013-02-21 16:34:16 EST
(In reply to comment #12)
> (In reply to comment #11)
> > > You're working around a bug (for what?) with a completely wrong test case.
> > > 
> > > Closing as notabug.
> > 
> > But the backtrace shows sth wrong in virtio scsi driver ..
> > 
> > So do you mean w/o https://bugzilla.redhat.com/show_bug.cgi?id=846202 fix
> > ,we can not say we support pm+hotplug for windows guest ,right ?
> 
> Right.

I will check it again, but what I remember from my previous investigation is that the resources allocated on hot-plug were not exactly the same as those allocated statically during resume. So if we can specify the same resources for both events (hot-plug and resume) and Windows does not reallocate those resources, it is a Windows driver bug.

Best regards,
Vadim.
Comment 14 Amit Shah 2013-02-22 00:38:01 EST
(In reply to comment #13)
> (In reply to comment #12)

> I will check it again, but what i remember from my previous investigation is
> that
> resources allocated on hot-plug were not exactly the same as allocated
> statically 
> during resume.

Yes, during resume, the hot-plugged disks were not allocated.

> So if we can specify the same resources for both events -
> hot-plug and resume, and Windows will not reallocate those resources - it is
> a Windows driver's bug.

If same resources are specified, there's another bug - bug 846202.  This one has a difference in the resource allocation during suspend and resume, which a guest isn't expected to handle.  If you believe that's not true, and is a bug, please reopen.
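The mismatch described here can be illustrated with a small set comparison. The device ids below are taken from steps 1 and 2 of the description; the function name missing_on_resume is hypothetical, and the sketch models only device presence, not the PCI resources each device receives.

```python
# Device ids from step 1 (static) and step 2 (hot-plugged).
static_devices = {"bus1", "disk"}
hotplugged_devices = {"scsi-hotadd", "hotadd", "scsi-hotadd2", "hotadd2"}


def missing_on_resume(at_suspend, at_resume):
    """Devices the guest saw at hibernate time that are absent on resume."""
    return sorted(at_suspend - at_resume)


at_suspend = static_devices | hotplugged_devices
at_resume = set(static_devices)  # step 4 reuses the step 1 command line

print(missing_on_resume(at_suspend, at_resume))
```

With the reproduction steps as reported, all four hot-plugged ids are absent on resume, which is the suspend/resume difference this comment refers to.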
Comment 15 Mike Cao 2013-02-22 08:18:21 EST
Reopening in case we are missing a bug.
Comment 19 Ronen Hod 2014-08-06 05:06:38 EDT
QE, please check again with build-88.
Comment 21 Vadim Rozenfeld 2015-05-27 21:39:42 EDT
Closing this bug as WONTFIX for the following reasons:
- We are not going to support S3/S4 except for WHQL testing;
- There is no safe workaround for this problem, since the PCI
configuration changes between hibernate and resume.
