Bug 768795 - [virtio-win][balloon]BSOD when exercising Balloon Driver
Summary: [virtio-win][balloon]BSOD when exercising Balloon Driver
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: virtio-win
Version: 6.0
Hardware: x86_64
OS: Windows
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Vadim Rozenfeld
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-12-18 23:25 UTC by Michael Hines
Modified: 2013-02-21 10:37 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
In the previous version, all inflate/deflate requests were performed by work items, which made it possible for several requests to execute simultaneously. This led to a trivial race condition and, as a result, a BSOD. The problem was fixed by changing the design from work items to a dedicated thread that processes the inflate/deflate requests sequentially.
Clone Of:
Environment:
Last Closed: 2013-02-21 10:37:53 UTC
Target Upstream Version:


Attachments
Example windows balloon driver crash (29.39 KB, image/png)
2011-12-18 23:25 UTC, Michael Hines
Another example crash screen (29.04 KB, image/png)
2011-12-18 23:26 UTC, Michael Hines
First example crash dump (262.31 KB, application/octet-stream)
2011-12-20 16:50 UTC, Michael Hines
First example crash screenshot (29.36 KB, image/png)
2011-12-20 16:50 UTC, Michael Hines
Second example crash dump (262.31 KB, text/plain)
2011-12-20 16:52 UTC, Michael Hines
Second example crash screenshot (29.88 KB, image/png)
2011-12-20 16:52 UTC, Michael Hines
Third example crash dump (262.31 KB, application/octet-stream)
2011-12-20 16:53 UTC, Michael Hines
Third example crash screenshot (30.68 KB, image/png)
2011-12-20 16:53 UTC, Michael Hines


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2013:0441 0 normal SHIPPED_LIVE virtio-win bug fix and enhancement update 2013-02-20 20:48:13 UTC

Description Michael Hines 2011-12-18 23:25:46 UTC
Created attachment 548447 [details]
Example windows balloon driver crash

Description of problem: 

The Windows balloon driver crashes very frequently when exercised repeatedly over a short period of time (10 to 20 minutes). We frequently use libvirt to move the Windows balloon up and down at runtime. These crashes happen even when the VM has sufficient free memory, and they are very reproducible.


Version-Release number of selected component (if applicable):

virtio-win-0.1-15.iso (From linux-kvm.org)

How reproducible:

It is very easily reproducible. 


Steps to Reproduce:
1. Create a Windows 7 64-bit virtual machine with 4 GB of memory. Inside the VM, a batch file plays a 2-minute media file (a cartoon video in the VLC media player) in an infinite loop: each time the video exits, it is played again.
2. While the VM is running, periodically invoke the "virsh setmem" command with an assortment of values, waiting 20 seconds between invocations.
3. Open virt-manager and wait for the blue screen of death (BSOD) to appear.
  
Actual results:

BSOD occurs after about 10 minutes.

Here is an example series of "virsh setmem" commands issued while the VM is running:

virsh setmem domain 3984588
virsh setmem domain 3786137
virsh setmem domain 3786137
virsh setmem domain 3597414
virsh setmem domain 3597414
virsh setmem domain 3418419
virsh setmem domain 3418419
virsh setmem domain 3248179
virsh setmem domain 3248179
virsh setmem domain 3086694
virsh setmem domain 3086694
virsh setmem domain 2932992
virsh setmem domain 2932992
virsh setmem domain 2787072
virsh setmem domain 2787072
virsh setmem domain 2726297
virsh setmem domain 2726297
virsh setmem domain 2863257
virsh setmem domain 3007334
virsh setmem domain 3007334
virsh setmem domain 3157862
virsh setmem domain 3157862
virsh setmem domain 3315916
virsh setmem domain 3315916
virsh setmem domain 3482572
virsh setmem domain 3482572
virsh setmem domain 3656755
virsh setmem domain 3308492
virsh setmem domain 3449548
virsh setmem domain 3449548
virsh setmem domain 3277363
virsh setmem domain 3277363
virsh setmem domain 3113932
virsh setmem domain 3113932
virsh setmem domain 2958284
virsh setmem domain 2958284
virsh setmem domain 2810419
virsh setmem domain 2810419
virsh setmem domain 2726297
virsh setmem domain 2726297
virsh setmem domain 2863257
virsh setmem domain 2863257
virsh setmem domain 3007334
virsh setmem domain 3007334
virsh setmem domain 3157862
virsh setmem domain 3157862
virsh setmem domain 3315916
virsh setmem domain 3315916
virsh setmem domain 3482572
virsh setmem domain 3482572
virsh setmem domain 3656755
virsh setmem domain 3656755
virsh setmem domain 3308492
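
The sequence above can be scripted rather than typed by hand. The following is a minimal sketch, not the script used for this report: the function name replay_setmem and its argument order are hypothetical, while "virsh setmem", the domain name "domain", and the 20-second wait come from the steps above.

```shell
# Sketch only: replay a list of balloon targets (in KiB) against a libvirt domain.
# replay_setmem is a hypothetical helper name, not part of libvirt.
# Usage: replay_setmem DOMAIN WAIT_SECONDS TARGET_KIB...
replay_setmem() {
    domain=$1
    wait_secs=$2
    shift 2
    for target in "$@"; do
        # Stop at the first failure, e.g. when the guest has already crashed.
        virsh setmem "$domain" "$target" || return 1
        sleep "$wait_secs"
    done
}
```

For example, "replay_setmem domain 20 3984588 3786137 3597414" would issue the first three distinct targets from the list with 20 seconds between them.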


Expected results:

VM should continue running as long as there is sufficient free memory available. 

Additional info:

At no point during this exercise did the Windows "Resource Monitor" report less than 1 GB of "Available Memory". I kept the Resource Monitor running while the VM was running, using virt-manager, to make sure that the balloon values were not causing any emergency low-memory situations.

Comment 1 Michael Hines 2011-12-18 23:26:54 UTC
Created attachment 548448 [details]
Another example crash screen

Comment 4 Michael Hines 2011-12-19 20:24:19 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
virtio-win-0.1-15.iso
qemu version: 0.15 stable

Comment 5 Michael Hines 2011-12-19 20:25:46 UTC
I repeated the same test with an upgraded version of QEMU (0.15), just to rule out the possibility that the BSOD/virtio crash was due to the old qemu from RHEL 6.0, but that did not help. The crash still happens.

Comment 6 Yan Vugenfirer 2011-12-20 10:56:40 UTC
Can you provide the crash dump that was created during the crash?

It should be in one of the following locations, depending on the configuration: c:\windows\memory.dmp, or the location specified in the window that Windows opens after a crash to ask you to send the report to Microsoft.

Comment 7 Michael Hines 2011-12-20 16:49:08 UTC
I've gathered several "minidump" files. The larger dumps don't seem to be available.

Also, I'm happy to attach GDB to QEMU to try to capture more information, if that helps. I'm very familiar with debugging driver crashes in Linux.

There are 6 attachments following this comment:

- Three "minidumps"
- Three screenshots

Please do let me know if you need me to provide you with any more sophisticated low-level information to assist in pinning down the problem.

Comment 8 Michael Hines 2011-12-20 16:50:10 UTC
Created attachment 548884 [details]
First example crash dump

Comment 9 Michael Hines 2011-12-20 16:50:56 UTC
Created attachment 548885 [details]
First example crash screenshot

Comment 10 Michael Hines 2011-12-20 16:52:01 UTC
Created attachment 548886 [details]
Second example crash dump

Comment 11 Michael Hines 2011-12-20 16:52:42 UTC
Created attachment 548887 [details]
Second example crash screenshot

Comment 12 Michael Hines 2011-12-20 16:53:19 UTC
Created attachment 548888 [details]
Third example crash dump

Comment 13 Michael Hines 2011-12-20 16:53:56 UTC
Created attachment 548889 [details]
Third example crash screenshot

Comment 22 Mike Cao 2012-02-28 05:36:43 UTC
Reproduced this issue with win2k8R2 on virtio-win-prewhql-23.

Steps:
1. In the guest, play a concert video on www.youku.com.
2. Run virsh setmem in a loop.

Actual results:
The video hung at first, then a BSOD happened.

Comment 23 Mike Cao 2012-02-28 06:04:38 UTC
Besides the BSOD, qemu-kvm sometimes quits with "virtio: trying to map MMIO memory".

Vadim,
do I need to report a new bug for that?

Thanks,
Mike

Comment 25 Mike Cao 2012-02-28 06:18:10 UTC
Full steps to reproduce this bug via the qemu-kvm command line:

1. Start the guest with a balloon device and -monitor unix:/tmp/tt,server,nowait
eg:/usr/libexec/qemu-kvm -M rhel6.3.0 -enable-kvm -m 4096 -smp 2,sockets=2,cores=1,threads=1 -name win08R2 -uuid 3c9151e0-f85d-da58-c63a-8c30065eee3e -nodefconfig -nodefaults  -rtc base=localtime,driftfix=slew -no-shutdown -drive file=/home/win08R2,if=none,id=drive-ide0-0-0,format=raw,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive file=/root/en_windows_server_2008_r2_standard_enterprise_datacenter_and_web_with_sp1_x64_dvd_617601.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:b7:20:34,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -device usb-tablet,id=input0 -vnc 0:0 -vga std -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -monitor unix:/tmp/tt,server,nowait
2. Play an online video in the guest (I searched for a concert video on www.youku.com and played it).
3. During step 2, run the following loop:
for ((;;))
do
sh balloon.sh 
done

#cat balloon.sh
echo balloon `echo 3984588/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3786137/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3786137/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3597414/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3597414/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3418419/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3418419/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3248179/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3248179/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3086694/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3086694/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 2932992/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 2932992/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 2787072/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 2787072/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 2726297/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 2726297/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 2863257/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3007334/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3007334/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3157862/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3157862/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3315916/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3315916/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3482572/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3482572/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3656755/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3308492/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3449548/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3449548/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3277363/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3277363/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3113932/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3113932/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 2958284/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 2958284/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 2810419/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 2810419/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 2726297/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 2726297/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 2863257/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 2863257/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3007334/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3007334/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3157862/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3157862/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3315916/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3315916/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3482572/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3482572/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3656755/1024 | bc ` |nc -U /tmp/tt 
echo balloon `echo 3656755/1024 | bc ` |nc -U /tmp/tt 
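
The script above converts each "virsh setmem" value from KiB to MiB (the QEMU monitor "balloon" command takes MiB) and writes it to the monitor socket. A more compact form is sketched below. The helper names kib_to_mib and send_balloon_targets and the MONITOR_SOCK variable are hypothetical; the socket path /tmp/tt and the use of "nc -U" are taken from the script above. Shell integer division truncates in the same way as bc at its default scale.

```shell
# Sketch only: compact equivalent of balloon.sh.
# MONITOR_SOCK, kib_to_mib and send_balloon_targets are illustrative names.
MONITOR_SOCK=${MONITOR_SOCK:-/tmp/tt}

# Convert a KiB value to whole MiB (truncating, like `echo N/1024 | bc`).
kib_to_mib() {
    echo $(( $1 / 1024 ))
}

# Send one "balloon <MiB>" command to the QEMU monitor per KiB argument.
send_balloon_targets() {
    for kib in "$@"; do
        echo "balloon $(kib_to_mib "$kib")" | nc -U "$MONITOR_SOCK"
    done
}
```

For example, "send_balloon_targets 3984588 3786137 3786137" would reproduce the first three lines of balloon.sh.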

Actual results:
Within 10 minutes, sometimes the guest hits a BSOD and sometimes qemu-kvm quits with "virtio: trying to map MMIO memory".

Comment 26 Mike Cao 2012-02-28 06:29:46 UTC
(In reply to comment #25)
> Full steps to reproduce this bug via qemu-kvm comamndline 
> 
> 1.start guest with balloon and -monitor unix:/tmp/tt,server,nowait

> 2.play a online video in the guest (I search one concert video in www.youku.com
> and play it )
> 3.during steps2 ,run following steps .
> for ((;;))
> do
> sh balloon.sh 
> done
> 

One more step should be added:
4. Leave step 3 running for 5 minutes, then kill the script.
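
The time limit in this extra step could also be automated. The following is a sketch under the assumption that the GNU coreutils "timeout" utility is available on the host; the helper name run_loop_for is hypothetical and was not part of the original test.

```shell
# Sketch only: repeat a command in a loop and kill the loop after a time limit.
# run_loop_for is an illustrative name; `timeout` is assumed to be GNU coreutils.
# Usage: run_loop_for SECONDS COMMAND [ARGS...]
run_loop_for() {
    duration=$1
    shift
    # The inner shell receives the command as "$@" and reruns it forever;
    # timeout terminates that shell once the duration has elapsed.
    timeout "$duration" sh -c 'while :; do "$@"; done' sh "$@"
}
```

For example, "run_loop_for 300 sh balloon.sh" would keep rerunning balloon.sh for 5 minutes. With GNU timeout, an exit status of 124 indicates that the time limit was reached.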

> Actual Results :
> wait less than 10mins ,sometimes guest BOSD ,sometimes qemu-kvm quit with
> "virtio: trying to map MMIO memory"

Comment 27 Vadim Rozenfeld 2012-02-28 07:16:26 UTC
(In reply to comment #23)
> Not only BSOD happened ,sometime qemu-kvm quit with "virtio: trying to map MMIO
> memory"
> 
> Vadim ,
> need I report a new Bug for that ?

Hi Mike,
Let's keep it as a single bug, until we find
out the reason for this problem.

Best regards,
Vadim.

Comment 28 Michael Hines 2012-02-28 15:52:28 UTC
Thanks for your help, guys. Let me know if you would like me to do any testing of any kind.

Comment 29 Ronen Hod 2012-03-06 09:13:40 UTC
We will not make it for 6.3.
We would like to solve ASAP since it is a QEMU issue too (sometimes)

Comment 30 Vadim Rozenfeld 2012-03-14 13:03:13 UTC
(In reply to comment #29)
> We will not make it for 6.3.
> We would like to solve ASAP since it is a QEMU issue too (sometimes)

Could you please explain what makes you think that it is a QEMU issue?
TIA,
Vadim.

Comment 31 Vadim Rozenfeld 2012-06-19 18:17:31 UTC
Hi Mike,

please check the most recent balloon driver, from
http://download.devel.redhat.com/brewroot/packages/virtio-win-prewhql/0.1/29/win/virtio-win-prewhql-0.1.zip

Thank you,
Vadim.

Comment 33 Yang Zhao 2012-06-27 09:25:29 UTC
Reproduced it with win2k8-r2 on virtio-win-prewhql-0.1-23. For the steps, please refer to comments 25 and 26.
Verified it with win2k8-r2 on virtio-win-prewhql-0.1-29, following the same steps from comments 25 and 26.
Based on the above, the bug is fixed in virtio-win-prewhql-0.1-29.


Thanks
Yang Zhao

Comment 34 Mike Cao 2012-08-03 02:37:37 UTC
According to comment #33, moving status to VERIFIED.
Vadim, could you grant devel_ack, since this bug has already been fixed? Thanks.

Comment 37 errata-xmlrpc 2013-02-21 10:37:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0441.html

