Bug 1122364 - [virtio-win][vioser][rhel6]win2k8r2 guest bsod(7e) when do continuous hotplug/unplug during virtio serial in use with driver verifier enabled
Summary: [virtio-win][vioser][rhel6]win2k8r2 guest bsod(7e) when do continuous hotplu...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: virtio-win
Version: 7.2
Hardware: Unspecified
OS: Unspecified
medium
high
Target Milestone: rc
: 7.2
Assignee: Ladi Prosek
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2014-07-23 05:21 UTC by lijin
Modified: 2016-11-04 08:43 UTC (History)
13 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
NO_DOCS
Clone Of:
Environment:
Last Closed: 2016-11-04 08:43:32 UTC


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:2609 normal SHIPPED_LIVE virtio-win bug fix and enhancement update 2016-11-03 15:27:12 UTC

Description lijin 2014-07-23 05:21:41 UTC
Description of problem:
win2k8r2 guest bsod(7e) when do  continuous hotplug/unplug during virtio serial in use

Version-Release number of selected component (if applicable):
kernel-2.6.32-490.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.430.el6.x86_64
seabios-0.6.1.2-28.el6.x86_64
virtio-win-prewhql-86

How reproducible:
4/5

Steps to Reproduce:
1.boot guest with:
/usr/libexec/qemu-kvm -drive file=win2k8R2.qcow2,if=none,cache=none,media=disk,format=qcow2,id=drive-ide0-0-1 -device ide-drive,id=ide0-0-1,drive=drive-ide0-0-1,bootindex=0 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -usb -device usb-tablet -spice disable-ticketing,port=5900 -vga qxl -netdev tap,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=52:54:00:7f:f9:F6,bus=pci.0 -monitor unix:/tmp/tt,server,nowait -chardev file,path=/root/console.log,id=serial1 -device isa-serial,chardev=serial1,id=s1 -cpu SandyBridge -smp 2,maxcpus=2,cores=2,threads=1,sockets=1 -m 2G -enable-kvm -qmp tcp:0:4444,server,nowait -device virtio-serial-pci,id=virtio-serial0,max_ports=16 -chardev socket,path=/tmp/tt0,server,nowait,id=chardev0 -device virtserialport,chardev=chardev0,name=com.redhat.rhevm.vdsm,bus=virtio-serial0.0,id=port0,nr=1 -chardev socket,path=/tmp/tt1,server,nowait,id=chardev1 -device virtserialport,chardev=chardev1,name=com.redhat.rhevm.vdsm1,bus=virtio-serial0.0,id=port1
2.Transferring data from guest to host via port0 in a loop
eg: on the host # for ((;;)) ; do python serial-host-receive.py /tmp/tt0; sleep 1; done
in the guest #for ((;;)) ; do python VirtIOChannel_guest_send.py com.redhat.rhevm.vdsm;done
3.Transferring data form host to guest via port1 in a loop
eg:in the guest # for ((;;)) ;do python VirtIOChannel_guest_reieve.py com.redhat.rhevm.vdsm1;done
on the host # for ((;;)) ;do python serial-host-send.py /tmp/tt1 ; sleep 1; done
4.Continuous hotplug/unplug virtioserialport:
cat hotunplug.sh 
for ((I=1;I<=50;I++)) ;
do
echo device_del port0 | nc -U /tmp/tt ;
sleep 3;
echo device_del port1| nc -U /tmp/tt ;
sleep 3;
echo device_add virtserialport,chardev=chardev0,name=com.redhat.rhevm.vdsm,bus=virtio-serial0.0,id=port0 |nc -U /tmp/tt ;
sleep 3;
echo device_add virtserialport,chardev=chardev1,name=com.redhat.rhevm.vdsm1,bus=virtio-serial0.0,id=port1 | nc -U /tmp/tt ;
sleep 3;
done

Actual results:
guest bsod with :7e" while hotplug/unplug loop
also hit bug1025660 during this case

Expected results:
no bsod,serial device can be hotplug/unplug correctly and serial port can be used normally after this loop finished

Additional info:
16.1: kd:x86> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e)
This is a very common bugcheck.  Usually the exception address pinpoints
the driver/function that caused the problem.  Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: ffffffff80000003, The exception code that was not handled
Arg2: fffff80001490470, The address that the exception occurred at
Arg3: fffff88001fd8c68, Exception Record Address
Arg4: fffff88001fd84c0, Context Record Address

Debugging Details:
------------------


EXCEPTION_CODE: (HRESULT) 0x80000003 (2147483651) - One or more arguments are invalid

FAULTING_IP: 
nt!DbgBreakPoint+0
fffff800`01490470 cc              int     3

EXCEPTION_RECORD:  fffff88001fd8c68 -- (.exr 0xfffff88001fd8c68)
ExceptionAddress: 0000000000000000
   ExceptionCode: 80000003 (Break instruction exception)
  ExceptionFlags: 00000000
NumberParameters: 21562480
   Parameter[0]: fffffffffffff800
   Parameter[1]: 0000000000000001
   Parameter[2]: 0000000000000000
   Parameter[3]: 0000000000000000
   Parameter[4]: 0000000000000000
   Parameter[5]: 0000000000c5cb82
   Parameter[6]: 0000000000000000
   Parameter[7]: 0000000001fd8ea8
   Parameter[8]: fffffffffffff880
   Parameter[9]: 0000000000000000
   Parameter[10]: 0000000000000000
   Parameter[11]: 0000000000000000
   Parameter[12]: 0000000000000000
   Parameter[13]: 0000000000000000
   Parameter[14]: 0000000000000000

CONTEXT:  fffff88001fd84c0 -- (.cxr 0xfffff88001fd84c0)
eax=00e8a210 ebx=fffffa80 ecx=fffffa80 edx=02b66920 esi=03212b58 edi=fffff880
eip=00000001 esp=fffff800 ebp=fffff880 iopl=2     vif ov up di ng nz ac po nc
cs=0000  ss=0018  ds=8ea8  es=057f  fs=d7d8  gs=0000             efl=014d6990
00000000`00000001 ??              ???
Resetting default scope

DEFAULT_BUCKET_ID:  WIN7_DRIVER_FAULT

BUGCHECK_STR:  0x7E

CURRENT_IRQL:  0

LAST_CONTROL_TRANSFER:  from 0000000000000000 to 0000000000000001

STACK_TEXT:  
00000000 00000000 00000000 00000000 00000000 0x0


STACK_COMMAND:  kb

SYMBOL_NAME:  ANALYSIS_INCONCLUSIVE

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: Unknown_Module

IMAGE_NAME:  Unknown_Image

DEBUG_FLR_IMAGE_TIMESTAMP:  0

BUCKET_ID:  INVALID_KERNEL_CONTEXT

Followup: MachineOwner
---------

Comment 3 lijin 2014-07-23 09:12:26 UTC
guest hit this while enable driver verifier;
guest won't bsod with driver verifier disabled(try 3 times)

Comment 4 Gal Hammer 2014-07-24 11:51:57 UTC
Does it reproduce if you increase the all sleep times from 3 to 10?

Comment 5 Gal Hammer 2014-07-24 14:21:56 UTC
One more question. Does it also reproduce on a rhel-7 host?

Comment 6 lijin 2014-07-25 03:19:31 UTC
1.increase the all sleep times from 3 to 10 ---> still hit this issue
2.can reproduce on rhel7 host:
package info:
qemu-kvm-rhev-1.5.3-60.el7ev_0.2.x86_64
kernel-3.10.0-133.el7.x86_64
seabios-1.7.2.2-12.el7.x86_64

Comment 7 Mike Cao 2014-08-05 02:51:22 UTC
lijin, pls try whether you can reproduce it w/ virtio-win-1.7.1

Comment 8 lijin 2014-08-05 05:58:40 UTC
(In reply to Mike Cao from comment #7)
> lijin, pls try whether you can reproduce it w/ virtio-win-1.7.1

can reproduce it with virtio-win-1.7.1

Comment 9 Gal Hammer 2014-08-10 14:07:30 UTC
Is the verifier is set with the standard settings or all the flags are enabled (see bug 1113910)?

Is it reproducible when the serial ports are not in use?

Thanks.

Comment 10 lijin 2014-08-11 07:36:13 UTC
1.the verifier is set with the standard settings:# verifier.exe /standard /driver vioser.sys 
2.cannot reproduce this issue when the serial ports are not in use

Comment 11 lijin 2014-12-09 08:35:03 UTC
winxp hit the same issue with build94

How reproducible:
2/5

package info:
virtio-win-prewhql-94
qemu-kvm-rhev-2.1.2-12.el7.x86_64
kernel-3.10.0-208.el7.x86_64
seabios-1.7.5-5.el7.x86_64
spice-server-0.12.4-8.el7.x86_64
qxl-win-0.1-21

steps:
1.boot guest with:
/usr/libexec/qemu-kvm -drive file=winxp-new.raw,if=none,cache=none,media=disk,format=raw,id=drive-ide0-0-1 -device ide-drive,id=ide1,drive=drive-ide0-0-1,bus=ide.1,unit=1,bootindex=1 -spice port=5900,disable-ticketing -vga qxl -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -qmp tcp:0:4444,server,nowait -usb -device usb-tablet,id=tablet1 -monitor stdio -boot menu=on -netdev tap,script=/etc/qemu-ifup,id=hostnet1 -device rtl8139,netdev=hostnet1,id=net1,mac=00:52:54:fe:48:68 -cpu SandyBridge -M pc -smp 2 -m 2G -enable-kvm \
-device virtio-serial-pci,id=virtio-serial0,max_ports=31 -chardev socket,id=channel1,host=127.0.0.1,port=12345,server,nowait -device virtserialport,nr=1,chardev=channel1,name=com.redhat.rhevm.vdsm1,bus=virtio-serial0.0,id=port1 -chardev socket,id=channel2,host=127.0.0.1,port=12346,server,nowait -device virtserialport,nr=2,chardev=channel2,name=com.redhat.rhevm.vdsm2,bus=virtio-serial0.0,id=port2
2.Transferring data form host to guest via port0 in a loop
eg:in the guest # for ((;;)) ;do python VirtIOChannel_guest_reieve.py com.redhat.rhevm.vdsm1;done
on the host # for ((;;)) ;do python serial-host-send.py; sleep 1; done
3.Transferring data from guest to host via port1 in a loop
eg: on the host # for ((;;)) ; do python serial-host-receive.py; sleep 1; done
in the guest #for ((;;)) ; do python VirtIOChannel_guest_send.py com.redhat.rhevm.vdsm2;done
4.Continuous hotplug/unplug virtioserialport:
# cat hot-port.sh
#!/bin/bash
# some simply scripts for balloon device hotplug/unplug in a loop
let i=0
exec 3<>/dev/tcp/localhost/4444 #note modify this to qmp port
echo -e "{ 'execute': 'qmp_capabilities' }" >&3
read response <&3
echo $response
while [ $i -lt 100 ]
do
    echo -e "{ 'execute': 'device_del', 'arguments': {'id': 'port1' }}">&3 ;
    sleep 3 ;
    read response <&3 ;
    echo "$i: $response"
    echo -e "{ 'execute': 'device_del', 'arguments': {'id': 'port2' }}">&3 ;
    sleep 3 ;
    read response <&3 ;
    echo "$i: $response"
    sleep 3 ;
    echo -e "{'execute':'device_add','arguments':{'driver':'virtserialport','chardev':'channel1','name':'com.redhat.rhevm.vdsm1','bus':'virtio-serial0.0','id':'port1'}}">&3 ;
    sleep 3 ;
    read response <&3
    echo "$i: $response"
    echo -e "{'execute':'device_add','arguments':{'driver':'virtserialport','chardev':'channel2','name':'com.redhat.rhevm.vdsm2','bus':'virtio-serial0.0','id':'port2'}}">&3 ;
    sleep 3 ;
    read response <&3
    echo "$i: $response"
    let i=$i+1
done

actual results:
during step4,guest bsod with 7E

windbg info:
1: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e)
This is a very common bugcheck.  Usually the exception address pinpoints
the driver/function that caused the problem.  Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: 80000003, The exception code that was not handled
Arg2: 8052b5d0, The address that the exception occurred at
Arg3: bad037a8, Exception Record Address
Arg4: bad034a4, Context Record Address

Debugging Details:
------------------

*** No owner thread found for resource 8055b4e0
*** No owner thread found for resource 8055b4e0
*** No owner thread found for resource 8055b4e0

EXCEPTION_CODE: (HRESULT) 0x80000003 (2147483651) - One or more arguments are invalid

FAULTING_IP: 
nt!DbgBreakPoint+0
8052b5d0 cc              int     3

EXCEPTION_RECORD:  bad037a8 -- (.exr 0xffffffffbad037a8)
ExceptionAddress: 8052b5d0 (nt!DbgBreakPoint)
   ExceptionCode: 80000003 (Break instruction exception)
  ExceptionFlags: 00000000
NumberParameters: 3
   Parameter[0]: 00000000
   Parameter[1]: 8052b704
   Parameter[2]: 00000061

CONTEXT:  bad034a4 -- (.cxr 0xffffffffbad034a4;r)
eax=89770b70 ebx=00000000 ecx=8052b704 edx=00000061 esi=804faea6 edi=00000102
eip=8052b5d0 esp=bad03870 ebp=bad0388c iopl=0         nv up ei pl nz na po nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00000202
nt!DbgBreakPoint:
8052b5d0 cc              int     3
Last set context:
eax=89770b70 ebx=00000000 ecx=8052b704 edx=00000061 esi=804faea6 edi=00000102
eip=8052b5d0 esp=bad03870 ebp=bad0388c iopl=0         nv up ei pl nz na po nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00000202
nt!DbgBreakPoint:
8052b5d0 cc              int     3
Resetting default scope

DEFAULT_BUCKET_ID:  DRIVER_FAULT

BUGCHECK_STR:  0x7E

PROCESS_NAME:  System

ERROR_CODE: (NTSTATUS) 0x80000003 - {EXCEPTION}  Breakpoint  A breakpoint has been reached.

EXCEPTION_PARAMETER1:  00000000

EXCEPTION_PARAMETER2:  8052b704

EXCEPTION_PARAMETER3:  00000061

ANALYSIS_VERSION: 6.3.9600.16384 (debuggers(dbg).130821-1623) amd64fre

LOCK_ADDRESS:  8055b560 -- (!locks 8055b560)

Resource @ nt!IopDeviceTreeLock (0x8055b560)    Shared 1 owning threads
     Threads: 89bb43c8-01<*> 
1 total locks, 1 locks currently held

PNP_TRIAGE: 
	Lock address  : 0x8055b560
	Thread Count  : 1
	Thread address: 0x89bb43c8
	Thread wait   : 0x9751

LAST_CONTROL_TRANSFER:  from 805cff2a to 804f9f33

STACK_TEXT:  
bad0386c b1847c57 8977ffb8 8977fe00 00000000 nt!DbgBreakPoint
bad0388c b18374af 8977ffb8 b185dd18 768801f8 wdf01000!_FX_DRIVER_GLOBALS::WaitForSignal+0x5e
bad038ac b18392a7 00000000 b185ffc8 89a54d00 wdf01000!FxIoQueue::StopProcessingForPower+0xcf
bad038cc b1830e61 00000001 bad038f4 b184ecdc wdf01000!FxPkgIo::StopProcessingForPower+0xbd
bad038d8 b184ecdc 89628150 00000001 b18602f8 wdf01000!FxDeviceToMx::FxPkgIo_StopProcessingForPower+0x16
bad038f4 b184dd81 89a54d00 89a54e0c 89a54d00 wdf01000!FxPkgPnp::PowerGotoD3Stopped+0x47
bad0397c b184ebb2 00000314 89a54e0c 89a54d00 wdf01000!FxPkgPnp::PowerEnterNewState+0x11c
bad039a0 b184f5bb bad039b8 b18621d4 89a54d00 wdf01000!FxPkgPnp::PowerProcessEventInner+0x171
bad039c4 b1854e73 00000001 bad03a58 b1854832 wdf01000!FxPkgPnp::PowerProcessEvent+0x15c
bad039d0 b1854832 89a54d00 89a54e7c 89a54d00 wdf01000!FxPkgPnp::PowerPolStopping+0x1a
bad03a58 b1855716 0000055b 89a54e7c 89a54d00 wdf01000!FxPkgPnp::PowerPolicyEnterNewState+0x11c
bad03a7c b1856388 bad03a94 00000008 89a54d00 wdf01000!FxPkgPnp::PowerPolicyProcessEventInner+0x185
bad03aa0 b1852b74 00000001 00000000 b185309c wdf01000!FxPkgPnp::PowerPolicyProcessEvent+0x172
bad03aac b185309c bad03adc b1852484 89a54d00 wdf01000!FxPkgPnp::PnpPowerPolicySurpriseRemove+0xc
bad03ab4 b1852484 89a54d00 89a54da8 89a54d00 wdf01000!FxPkgPnp::PnpEventFailedIoStarting+0xd
bad03adc b1852db2 00000127 89a54da8 89a54d00 wdf01000!FxPkgPnp::PnpEnterNewState+0x104
bad03b00 b185347a bad03b18 89770b70 89a54d00 wdf01000!FxPkgPnp::PnpProcessEventInner+0x149
bad03b24 b184c540 00000400 00000000 89a54d00 wdf01000!FxPkgPnp::PnpProcessEvent+0x13e
bad03b38 b185067a bad03b70 89a54d00 bad03b68 wdf01000!FxPkgPnp::PnpSurpriseRemoval+0x29
bad03b48 b184be02 89a54d00 bad03b70 8a406f68 wdf01000!FxPkgPdo::_PnpSurpriseRemoval+0x1e
bad03b68 b1828a3f 8a406f68 bad03b90 b1828c63 wdf01000!FxPkgPnp::Dispatch+0x207
bad03b74 b1828c63 89a23830 8a406f68 89a23830 wdf01000!FxDevice::Dispatch+0x7f
bad03b90 804ef18f 89a23830 8a406f68 806e6428 wdf01000!FxDevice::DispatchWithLock+0x7b
bad03ba0 80658128 8a406ffc bad03c40 8a406f68 nt!IopfCallDriver+0x31
bad03bc4 80592b63 89a23830 89a23830 00000017 nt!IovCallDriver+0xa0
bad03bf0 80592dc5 89a23830 bad03c1c 00000000 nt!IopSynchronousCall+0xb7
bad03c44 805938e4 89a23830 00000017 e1549ee8 nt!IopRemoveDevice+0x93
bad03c60 80594786 00000308 e1549ee8 00000000 nt!IopSurpriseRemoveLockedDeviceNode+0x90
bad03c74 805947fd 896096f8 00000003 e1549ee8 nt!IopDeleteLockedDeviceNode+0x24
bad03ca8 8059a1d2 89a23830 02549ee8 00000003 nt!IopDeleteLockedDeviceNodes+0x3f
bad03d3c 8059a72e bad03d78 806e6974 e1d88c28 nt!PiProcessQueryRemoveAndEject+0x4d2
bad03d58 8059a874 bad03d78 89a22d80 8056485c nt!PiProcessTargetDeviceEvent+0x2a
bad03d7c 8053876d 89a22d80 00000000 89bb43c8 nt!PiWalkDeviceList+0xea
bad03dac 805cff64 89a22d80 00000000 00000000 nt!ExpWorkerThread+0xef
bad03ddc 805460de 8053867e 00000001 00000000 nt!PspSystemThreadStartup+0x34
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16


FOLLOWUP_IP: 
wdf01000!_FX_DRIVER_GLOBALS::WaitForSignal+5e
b1847c57 8d45f4          lea     eax,[ebp-0Ch]

SYMBOL_STACK_INDEX:  1

SYMBOL_NAME:  wdf01000!_FX_DRIVER_GLOBALS::WaitForSignal+5e

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: wdf01000

IMAGE_NAME:  wdf01000.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  4a5bbf28

STACK_COMMAND:  .cxr 0xffffffffbad034a4 ; kb

FAILURE_BUCKET_ID:  0x7E_BADMEMREF_wdf01000!_FX_DRIVER_GLOBALS::WaitForSignal+5e

BUCKET_ID:  0x7E_BADMEMREF_wdf01000!_FX_DRIVER_GLOBALS::WaitForSignal+5e

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:0x7e_badmemref_wdf01000!_fx_driver_globals::waitforsignal+5e

FAILURE_ID_HASH:  {7f9ce2e3-e036-29c0-4327-6bd86a1c7722}

Followup: MachineOwner
---------

Comment 13 lijin 2014-12-10 09:42:28 UTC
(In reply to lijin from comment #11)
> # cat hot-port.sh
> #!/bin/bash
> # some simply scripts for balloon device hotplug/unplug in a loop
> let i=0
> exec 3<>/dev/tcp/localhost/4444 #note modify this to qmp port
> echo -e "{ 'execute': 'qmp_capabilities' }" >&3
> read response <&3
> echo $response
> while [ $i -lt 100 ]
> do
>     echo -e "{ 'execute': 'device_del', 'arguments': {'id': 'port1' }}">&3 ;
>     sleep 3 ;
>     read response <&3 ;
>     echo "$i: $response"
>     echo -e "{ 'execute': 'device_del', 'arguments': {'id': 'port2' }}">&3 ;
>     sleep 3 ;
>     read response <&3 ;
>     echo "$i: $response"
>     sleep 3 ;
>     echo -e
> "{'execute':'device_add','arguments':{'driver':'virtserialport','chardev':
> 'channel1','name':'com.redhat.rhevm.vdsm1','bus':'virtio-serial0.0','id':
> 'port1'}}">&3 ;
>     sleep 3 ;
>     read response <&3
>     echo "$i: $response"
>     echo -e
> "{'execute':'device_add','arguments':{'driver':'virtserialport','chardev':
> 'channel2','name':'com.redhat.rhevm.vdsm2','bus':'virtio-serial0.0','id':
> 'port2'}}">&3 ;
>     sleep 3 ;
>     read response <&3
>     echo "$i: $response"
>     let i=$i+1
> done

modify all "sleep 3" to "sleep 7" to give guest enough time to unplug,still hit this issue(tried three times,guest bsod once)

Comment 15 lijin 2015-03-31 08:25:20 UTC
still hit this issue on build 102

package info:
kernel-2.6.32-540.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.454.el6.x86_64
seabios-0.6.1.2-29.el6.x86_64
spice-server-0.12.4-12.el6.x86_64
virtio-win-prewhql-102

steps same as comment #0
    
Actual Results:
on build 86 and 102,guest bsod with 7e when hotunplug/hotplug in loop while serial ports in use;

Based on above,this issue has NOT been fixed.

Comment 17 lijin 2015-05-12 06:57:09 UTC
win2012R2 guest also hit this issue,dump file is located in the same place as comment#16

Comment 21 Ladi Prosek 2015-12-17 16:07:13 UTC
Taking a look.

Comment 22 Ladi Prosek 2015-12-22 15:23:59 UTC
I was able to repro this. It seems that only one port and pushing data through it from guest to host is enough to hit it.

With verifier off, this manifests as the write call hanging forever after the port is unplugged. Maybe an IRP is never completed? Digging deeper...

Comment 23 lijin 2015-12-23 03:25:07 UTC
win2012R2 still hit this issue with following  package:
virtio-win-1.7.5-0.el6.noarch
qemu-kvm-rhev-0.12.1.2-2.482.el6.x86_64
kernel-2.6.32-595.el6.x86_64
seabios-0.6.1.2-30.el6.x86_64

Comment 24 Gal Hammer 2015-12-23 07:42:21 UTC
(In reply to Ladi Prosek from comment #22)
> I was able to repro this. It seems that only one port and pushing data
> through it from guest to host is enough to hit it.

Good to hear. I was unable to reproduce it on my setup.

Do you have a window crash dump file to share? Thanks.
 
> With verifier off, this manifests as the write call hanging forever after
> the port is unplugged. Maybe an IRP is never completed? Digging deeper...

Comment 25 Ladi Prosek 2015-12-23 09:36:24 UTC
Yes, I have a dump. I'll start copying it over but I suspect that it will be the same thing that lijin shared earlier. At least the blame callstack shown by !analyze is identical.

Here's something potentially more useful - verbose debug output of the driver just before the hang. One thing I still want to verify is that the last write in the log actually corresponds to the userspace write that hangs.

00102727	62.04384613	VIOSerialPortCreate Port id = 3
00102728	62.04384995	--> VIOSerialReclaimConsumedBuffers
00102729	62.04385757	<-- VIOSerialReclaimConsumedBuffers Full: 0
00102730	62.04386139	--> VIOSerialSendCtrlMsg vq = FFFFD0016E41D000
00102731	62.04387283	<-- VIOSerialSendCtrlMsg cnt = 0
00102732	62.04388046	<-- VIOSerialPortCreate
00102733	62.04390335	--> VIOSerialPortDeviceControl
00102734	62.04391098	<-- VIOSerialPortDeviceControl
00102735	62.04421997	--> VIOSerialPortWrite Request: 00001FFF2ADCF7E8 Length: 16384
00102736	62.04423141	--> VIOSerialWillWriteBlock
00102737	62.04425049	--> VIOSerialReclaimConsumedBuffers
00102738	62.04426193	<-- VIOSerialReclaimConsumedBuffers Full: 0
00102739	62.04426575	<-- VIOSerialWillWriteBlock
00102740	62.04428864	--> VIOSerialSendBuffers Buffer: FFFFE000D4400000 Length: 16384
00102741	62.04431152	<-- VIOSerialSendBuffers
00102742	62.04431534	<-- VIOSerialPortWrite
00102743	62.04757309	--> VIOSerialInterruptIsr
00102744	62.04758835	<-- VIOSerialInterruptIsr Serviced: 1
00102745	62.04761887	--> VIOSerialInterruptIsr
00102746	62.04762268	--> VIOSerialInterruptDpc
00102747	62.04762650	<-- VIOSerialInterruptIsr Serviced: 1
00102748	62.04763031	--> VIOSerialCtrlWorkHandler
00102749	62.04763412	--> VIOSerialQueuesInterruptDpc
00102750	62.04764175	<-- VIOSerialCtrlWorkHandler
00102751	62.04764557	--> VIOSerialGetInBuf
00102752	62.04764938	<-- VIOSerialInterruptDpc
00102753	62.04765320	<-- VIOSerialGetInBuf
00102754	62.04765701	--> VIOSerialReclaimConsumedBuffers
00102755	62.04767990	<-- VIOSerialReclaimConsumedBuffers Full: 0
00102756	62.04768372	<-- VIOSerialQueuesInterruptDpc
00102757	62.04793549	--> VIOSerialPortClose
00102758	62.04793930	--> VIOSerialSendCtrlMsg vq = FFFFD0016E41D000
00102759	62.04795456	<-- VIOSerialSendCtrlMsg cnt = 0
00102760	62.04796219	--> VIOSerialDiscardPortDataLocked
00102761	62.04796600	<-- VIOSerialDiscardPortDataLocked
00102762	62.04797363	--> VIOSerialReclaimConsumedBuffers
00102763	62.04798126	<-- VIOSerialReclaimConsumedBuffers Full: 0
00102764	62.04798508	<-- VIOSerialPortClose
00102765	62.04801941	VIOSerialPortCreate Port id = 3
00102766	62.04803085	--> VIOSerialReclaimConsumedBuffers
00102767	62.04803467	<-- VIOSerialReclaimConsumedBuffers Full: 0
00102768	62.04803848	--> VIOSerialSendCtrlMsg vq = FFFFD0016E41D000
00102769	62.04804611	<-- VIOSerialSendCtrlMsg cnt = 0
00102770	62.04805756	<-- VIOSerialPortCreate
00102771	62.04807281	--> VIOSerialPortDeviceControl
00102772	62.04807663	<-- VIOSerialPortDeviceControl
00102773	62.04811859	--> VIOSerialInterruptIsr
00102774	62.04812241	<-- VIOSerialInterruptIsr Serviced: 1
00102775	62.04813004	--> VIOSerialInterruptDpc
00102776	62.04813766	--> VIOSerialCtrlWorkHandler
00102777	62.04814529	<-- VIOSerialCtrlWorkHandler
00102778	62.04814911	<-- VIOSerialInterruptDpc
00102779	62.04817963	--> VIOSerialInterruptIsr
00102780	62.04819107	<-- VIOSerialInterruptIsr Serviced: 1
00102781	62.04819489	--> VIOSerialInterruptDpc
00102782	62.04819870	--> VIOSerialCtrlWorkHandler
00102783	62.04820633	<-- VIOSerialCtrlWorkHandler
00102784	62.04821396	<-- VIOSerialInterruptDpc
00105619	62.22101593	--> VIOSerialInterruptIsr
00105620	62.22103119	<-- VIOSerialInterruptIsr Serviced: 1
00105621	62.22104263	--> VIOSerialInterruptIsr
00105622	62.22104263	<-- VIOSerialInterruptIsr Serviced: 1
00105623	62.22105789	--> VIOSerialInterruptDpc
00105624	62.22105789	--> VIOSerialCtrlWorkHandler
00105625	62.22106171	--> VIOSerialHandleCtrlMsg
00105626	62.22106552	VIOSerialFindPortById  port = 3
00105627	62.22107315	VIOSerialFindPortById  id = 3 port = 0xFFFFE000D6B5B198
00105628	62.22107315	VIRTIO_CONSOLE_PORT_REMOVE id = 3
00105629	62.22108078	--> VIOSerialRemovePort DeviceId: 0 PortId: 3
00105630	62.22108459	--> VIOSerialEvtChildListIdentificationDescriptionCompare
00105631	62.22108841	<-- VIOSerialEvtChildListIdentificationDescriptionCompare
00105632	62.22109222	<-- VIOSerialRemovePort
00105633	62.22109604	<-- VIOSerialHandleCtrlMsg
00105634	62.22109985	--> VIOSerialAddInBuf  buf = FFFFE000D4E01110
00105635	62.22110748	<-- VIOSerialAddInBuf
00105636	62.22111130	<-- VIOSerialCtrlWorkHandler
00105637	62.22111511	<-- VIOSerialInterruptDpc
00105638	62.22111893	--> VIOSerialQueuesInterruptDpc
00105639	62.22111893	<-- VIOSerialQueuesInterruptDpc
00105640	68.22449493	--> VIOSerialInterruptIsr
00105641	68.22449493	<-- VIOSerialInterruptIsr Serviced: 1
00105642	68.22450256	--> VIOSerialInterruptDpc
00105643	68.22450256	--> VIOSerialCtrlWorkHandler
00105644	68.22451019	--> VIOSerialHandleCtrlMsg
00105645	68.22451782	VIOSerialFindPortById  port = 3
00105646	68.22451782	<-- VIOSerialFindPortById
00105647	68.22452545	Invlid index 3 in control packet
00105648	68.22452545	VIRTIO_CONSOLE_PORT_OPEN invalid id = 3
00105649	68.22453308	<-- VIOSerialHandleCtrlMsg
00105650	68.22454071	--> VIOSerialAddInBuf  buf = FFFFE000D4E28FD0
00105651	68.22454071	<-- VIOSerialAddInBuf
00105652	68.22454834	--> VIOSerialHandleCtrlMsg
00105653	68.22454834	VIOSerialFindPortById  port = 3
00105654	68.22454834	<-- VIOSerialFindPortById
00105655	68.22455597	VIOSerialAddPort  DeviceId = 0 :: PortId = 3
00105656	68.22455597	--> VIOSerialEvtChildListIdentificationDescriptionCompare
00105657	68.22456360	<-- VIOSerialEvtChildListIdentificationDescriptionCompare
00105658	68.22457123	--> VIOSerialEvtChildListIdentificationDescriptionDuplicate
00105659	68.22457123	<-- VIOSerialEvtChildListIdentificationDescriptionDuplicate
00105660	68.22457123	WdfChildListAddOrUpdateChildDescriptionAsPresent = 0x0.
00105661	68.22457886	<-- VIOSerialHandleCtrlMsg
00105662	68.22457886	--> VIOSerialAddInBuf  buf = FFFFE000D4E28F90
00105663	68.22458649	<-- VIOSerialAddInBuf
00105664	68.22459412	<-- VIOSerialCtrlWorkHandler
00105665	68.22459412	<-- VIOSerialInterruptDpc

Comment 26 Ladi Prosek 2015-12-23 09:48:23 UTC
And this is what happened after I killed the hanging process. Only now was the D0 exit delivered to the driver.

00105666	1474.66308594	--> VIOSerialPortWriteRequestCancel Request: 0x00001FFF2ADCF7E8
00105667	1474.66308594	<-- VIOSerialPortWriteRequestCancel
00105668	1474.66308594	--> VIOSerialPortEvtDeviceD0Exit TargetState: 5
00105669	1474.66308594	--> VIOSerialDisableInterruptQueue disable
00105670	1474.66308594	<-- VIOSerialDisableInterruptQueue
00105671	1474.66308594	--> VIOSerialDiscardPortDataLocked
00105672	1474.66308594	<-- VIOSerialDiscardPortDataLocked
00105673	1474.66308594	--> VIOSerialReclaimConsumedBuffers
00105674	1474.66308594	<-- VIOSerialReclaimConsumedBuffers Full: 1
00105675	1474.66308594	--> VIOSerialFreeBuffer  buf = FFFFE000D4FE9ED0, buf->va_buf = FFFFE000D4FEE000
00105676	1474.66308594	<-- VIOSerialFreeBuffer
...
00105931	1474.66406250	<-- VIOSerialPortEvtDeviceD0Exit

Comment 28 Ladi Prosek 2015-12-23 16:18:01 UTC
This is what I believe is happening:
1) Upon receiving the VIRTIO_CONSOLE_PORT_REMOVE command, VIOSerialRemovePort() marks the port as "missing"
2) When the inflight write completes (QEMU explicitly flushes out all unconsumed buffers from the queue after port removal), the driver is notified as usual but fails to handle the event because VIOSerialQueuesInterruptDpc() enumerates only "present" ports
3) Therefore VIOSerialReclaimConsumedBuffers() is never called and the write request is never completed

I have verified that immediately completing the inflight write request in VIOSerialRemovePort() solves the problem but it may not be the right way to fix this. I'll continue investigating next week and will send out a patch once I properly wrap my head around all this :)

Comment 29 lijin 2015-12-24 05:15:45 UTC
Change product from rhel6 to rhel7 for the following reasons:
1.virtio-win share the same stream on rhel6 and rhel7;
2.QE and Dev have an agreement that bug fix of virtio-win on rhel7 should also work on rhel6,we will track virtio-win bug via rhel7 product.

Comment 36 Peixiu Hou 2016-03-23 03:03:36 UTC
On rhel6: 

This issue can be reproduced with virtio-win-prewhql-102.

Verified this issue with virtio-win-prewhql-115, not hit BSOD.

On rhel7: 

This issue cannot be reproduced with virtio-win-prewhql-102.

Verified this issue with virtio-win-prewhql-115, not hit BSOD.


Based on above,this issue has been fixed.

Comment 37 Yu Wang 2016-03-23 06:16:04 UTC
According to comment#36, the bug has been fixed ,change status to verified.

Thanks
wyu

Comment 40 errata-xmlrpc 2016-11-04 08:43:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2609.html


Note You need to log in before you can comment on or make changes to this bug.