Bug 803950

Summary: [virtio-win][balloon] Guest BSOD when evicting memory and suspending (S4) the guest at the same time
Product: Red Hat Enterprise Linux 6 Reporter: Mike Cao <bcao>
Component: virtio-win Assignee: Vadim Rozenfeld <vrozenfe>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.3 CC: acathrow, bcao, bsarathy, dawu, mdeng, michen, rhod, syeghiay, vrozenfe
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Shutting down or suspending the guest to S3/S4 while the memory balloon is inflating or deflating can result in a BSOD.
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-06-20 11:58:36 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Mike Cao 2012-03-16 05:14:37 UTC
Description of problem:


Version-Release number of selected component (if applicable):
2.6.32-252.el6.x86_64
qemu-kvm-0.12.1.2-2.248.el6rhev.x86_64
virtio-win-prewhql-23
win2k8R2 guests

How reproducible:
Reproduced only once.

Steps to Reproduce:
1. Start a win2k8r2 guest with -m 8G
CLI: /usr/libexec/qemu-kvm -M rhel6.3.0 -enable-kvm -m 8192 -smp
4,sockets=4,cores=1,threads=1 -name win2k8-R2 -uuid
e2eaca3e-e764-f57b-22f0-74f4ab8c4965 -monitor stdio -rtc
base=localtime,driftfix=slew -device
piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
file=/test/win2k8r2,if=none,id=drive-ide0-0-0,format=raw,cache=none -device
ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive
file=/root/en_windows_server_2008_r2_standard_enterprise_datacenter_and_web_with_sp1_x64_dvd_617601.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw
-device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev
tap,script=/etc/qemu-ifup,downscript=no,id=hostnet0 -device
rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:15:af:6a,bus=pci.0,addr=0x3
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0
-device usb-tablet,id=input0 -spice port=5910,disable-ticketing -vga qxl
-device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device
hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5  -bios /bios-tttt.bin
2. (qemu) balloon 1024
3. While the guest memory is ballooning, suspend the guest (S4).
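
For scale, the amount of memory the balloon has to evict in step 2 can be estimated with a small C sketch. It assumes that "balloon 1024" sets a 1024 MiB target for the 8192 MiB guest and that the balloon counts memory in 4 KiB pages. The helper name balloon_pages_to_inflate is hypothetical and is not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the balloon accounts for memory in 4 KiB units (shift of 12). */
#define BALLOON_PAGE_SHIFT 12u

/*
 * Hypothetical helper: number of 4 KiB pages the guest must give up to
 * shrink from current_mib to target_mib. Precondition: this is an inflate,
 * so the target must not exceed the current size.
 */
static uint64_t balloon_pages_to_inflate(uint64_t current_mib, uint64_t target_mib)
{
    assert(target_mib <= current_mib);

    uint64_t bytes = (current_mib - target_mib) << 20; /* MiB -> bytes */
    return bytes >> BALLOON_PAGE_SHIFT;                /* bytes -> pages */
}
```

For this reproducer, balloon_pages_to_inflate(8192, 1024) gives 1835008 pages (7168 MiB), which suggests why the inflate can still be in progress when the S4 request arrives.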


Actual results:
The guest hits a BSOD during suspend.

Expected results:


Additional info:

Comment 3 Mike Cao 2012-03-20 06:50:22 UTC
A winxp guest hit the same issue:

*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e)
This is a very common bugcheck.  Usually the exception address pinpoints
the driver/function that caused the problem.  Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: bab910ce, The address that the exception occurred at
Arg3: bacfbbf8, Exception Record Address
Arg4: bacfb8f4, Context Record Address

Debugging Details:
------------------


EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

FAULTING_IP: 
BALLOON!BalloonTellHost+9c [c:\cygwin\tmp\build\source\internal-kvm-guest-drivers-windows\balloon\sys\balloon.c @ 347]
bab910ce 8b4304          mov     eax,dword ptr [ebx+4]

EXCEPTION_RECORD:  bacfbbf8 -- (.exr 0xffffffffbacfbbf8)
ExceptionAddress: bab910ce (BALLOON!BalloonTellHost+0x0000009c)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 00000000
   Parameter[1]: 00000004
Attempt to read from address 00000004

CONTEXT:  bacfb8f4 -- (.cxr 0xffffffffbacfb8f4)
eax=00001000 ebx=00000000 ecx=bacfbce8 edx=00000000 esi=00000000 edi=00001000
eip=bab910ce esp=bacfbcc0 ebp=bacfbd00 iopl=0         nv up ei pl nz ac pe nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010216
BALLOON!BalloonTellHost+0x9c:
bab910ce 8b4304          mov     eax,dword ptr [ebx+4] ds:0023:00000004=????????
Resetting default scope

PROCESS_NAME:  System

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

EXCEPTION_PARAMETER1:  00000000

EXCEPTION_PARAMETER2:  00000004

READ_ADDRESS:  00000004 

FOLLOWUP_IP: 
BALLOON!BalloonTellHost+9c [c:\cygwin\tmp\build\source\internal-kvm-guest-drivers-windows\balloon\sys\balloon.c @ 347]
bab910ce 8b4304          mov     eax,dword ptr [ebx+4]

BUGCHECK_STR:  0x7E

DEFAULT_BUCKET_ID:  NULL_CLASS_PTR_DEREFERENCE

LAST_CONTROL_TRANSFER:  from bab914b4 to bab910ce

STACK_TEXT:  
bacfbd00 bab914b4 8a3f3ec0 00000000 8a4b6244 BALLOON!BalloonTellHost+0x9c [c:\cygwin\tmp\build\source\internal-kvm-guest-drivers-windows\balloon\sys\balloon.c @ 347]
bacfbd28 bab90910 75c0c300 00000400 806e6900 BALLOON!BalloonFill+0x244 [c:\cygwin\tmp\build\source\internal-kvm-guest-drivers-windows\balloon\sys\balloon.c @ 258]
bacfbd40 ba50f042 7831d090 8a3f0330 87ce2f68 BALLOON!FillLeakWorkItem+0x90 [c:\cygwin\tmp\build\source\internal-kvm-guest-drivers-windows\balloon\sys\device.c @ 382]
bacfbd5c ba50f0aa 8a3dca80 bacfbd7c 80576ad5 wdf01000!FxWorkItem::WorkItemHandler+0xad
bacfbd68 80576ad5 8a3dca80 87ce2f68 8056485c wdf01000!FxWorkItem::WorkItemThunk+0x19
bacfbd7c 8053876d 8a3f0330 00000000 8a4b3b30 nt!IopProcessWorkItem+0x13
bacfbdac 805cff64 8a3f0330 00000000 00000000 nt!ExpWorkerThread+0xef
bacfbddc 805460de 8053867e 00000001 00000000 nt!PspSystemThreadStartup+0x34
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16


FAULTING_SOURCE_CODE:  
   343: 
   344:     sg.physAddr = MmGetPhysicalAddress(drvCtx->pfns_table);
   345:     sg.ulSize = sizeof(drvCtx->pfns_table[0]) * drvCtx->num_pfns;
   346: 
>  347:     if(0 > vq->vq_ops->add_buf(vq, &sg, 1, 0, devCtx, NULL, 0))
   348:     {
   349:         TraceEvents(TRACE_LEVEL_ERROR, DBG_HW_ACCESS, "<-> %s :: Cannot add buffer\n", __FUNCTION__);
   350:         return;
   351:     }
   352:     vq->vq_ops->kick(vq);


SYMBOL_STACK_INDEX:  0

SYMBOL_NAME:  BALLOON!BalloonTellHost+9c

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: BALLOON

IMAGE_NAME:  BALLOON.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  4f666145

STACK_COMMAND:  .cxr 0xffffffffbacfb8f4 ; kb

FAILURE_BUCKET_ID:  0x7E_BALLOON!BalloonTellHost+9c

BUCKET_ID:  0x7E_BALLOON!BalloonTellHost+9c

Followup: MachineOwner
---------
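
The faulting instruction reads [ebx+4] with ebx=00000000 at balloon.c line 347, which is consistent with a NULL pointer being dereferenced in the vq->vq_ops->add_buf call (for example, vq itself being NULL if vq_ops is the second pointer-sized field; that layout is an assumption). The following C sketch shows what a defensive guard around that call could look like. All types here are simplified stand-ins, add_buf takes fewer arguments than in the driver, and tell_host_guarded is a hypothetical name. This is not the change shipped in any build mentioned in this report.

```c
#include <assert.h>
#include <stddef.h>

struct virtqueue;

/* Simplified stand-in for the driver's scatter/gather entry (assumed layout). */
struct scatterlist_stub {
    unsigned long long physAddr;
    unsigned long ulSize;
};

/* Simplified stand-in for the queue operations table (assumed signatures). */
struct virtqueue_ops {
    int  (*add_buf)(struct virtqueue *vq, struct scatterlist_stub *sg,
                    unsigned out, unsigned in, void *data);
    void (*kick)(struct virtqueue *vq);
};

struct virtqueue {
    void *priv;
    const struct virtqueue_ops *vq_ops;
    int added;   /* counters used only by the stub below */
    int kicked;
};

enum tell_host_status { TELL_OK = 0, TELL_NO_QUEUE = -1, TELL_ADD_FAILED = -2 };

/* Hypothetical guarded variant: refuse to touch a queue that is already gone. */
static int tell_host_guarded(struct virtqueue *vq, struct scatterlist_stub *sg,
                             void *ctx)
{
    if (vq == NULL || vq->vq_ops == NULL)
        return TELL_NO_QUEUE;            /* queue torn down, e.g. on power-down */

    if (vq->vq_ops->add_buf(vq, sg, 1, 0, ctx) < 0)
        return TELL_ADD_FAILED;

    vq->vq_ops->kick(vq);
    return TELL_OK;
}

/* Stub operations so the sketch can be exercised without a device. */
static int stub_add_buf(struct virtqueue *vq, struct scatterlist_stub *sg,
                        unsigned out, unsigned in, void *data)
{
    assert(vq != NULL && sg != NULL);
    (void)out; (void)in; (void)data;
    vq->added++;
    return 0;
}

static void stub_kick(struct virtqueue *vq)
{
    vq->kicked++;
}

static const struct virtqueue_ops stub_ops = { stub_add_buf, stub_kick };
```

A check like this only narrows the window: the queue could still be torn down between the check and the call, so the work item would also need to be synchronized with the power transition.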

Comment 4 Ronen Hod 2012-03-25 14:16:36 UTC
Moving to 6.4
Added a tech note

Comment 5 Ronen Hod 2012-03-25 14:16:37 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Shutting down or suspending the guest to S3/S4 while the memory balloon is inflating or deflating can result in a BSOD.

Comment 6 Ronen Hod 2012-03-25 14:17:06 UTC
*** Bug 803940 has been marked as a duplicate of this bug. ***

Comment 10 Mike Cao 2012-04-09 08:28:59 UTC
Retested on virtio-win-prewhql-25.

Steps: the same as described in comment #0.

Actual Results:

The guest still hits a BSOD. I used !analyze -v to inspect the dump, and the results are exactly the same as in comment #3.

Based on the above, this issue still exists.
Re-assigning this issue.

Comment 11 Vadim Rozenfeld 2012-04-09 09:06:57 UTC
(In reply to comment #10)
> Retested on virtio-win-prewhql-25.
> 
> Steps: the same as described in comment #0.
> 
> Actual Results:
> 
> The guest still hits a BSOD. I used !analyze -v to inspect the dump, and the
> results are exactly the same as in comment #3.
> 
> Based on the above, this issue still exists.
> Re-assigning this issue.

Hi Mike,
Could you please upload the crash dump file?
Thank you,
Vadim.

Comment 13 Vadim Rozenfeld 2012-04-15 08:19:29 UTC
Hi Mike,
Could you give build 26 a try?
http://download.devel.redhat.com/brewroot/packages/virtio-win-prewhql/0.1/26/win/virtio-win-prewhql-0.1.zip

It is not a final fix for this problem, but rather a workaround that should minimize the chances of hitting this BSOD.

Thank you,
Vadim.

Comment 14 Mike Cao 2012-04-18 07:21:11 UTC
(In reply to comment #13)
> Hi Mike,
> Could you give build 26 a try?
> http://download.devel.redhat.com/brewroot/packages/virtio-win-prewhql/0.1/26/win/virtio-win-prewhql-0.1.zip
> 
> It is not a final fix for this problem, but rather a workaround that should
> minimize the chances of hitting this BSOD.
> 
> Thank you,
> Vadim.

Hi, Vadim

I tried 7 times with virtio-win-prewhql-26 and did not hit the issue described in comment #0.

Since this is not a final fix, how should we handle the bug's status?

Best Regards,
Mike

Comment 15 Vadim Rozenfeld 2012-04-18 08:09:46 UTC
Hi Mike,
The balloon driver will be slightly redesigned in 6.4,
including the piece of code which leads to the above problem.
If you cannot reproduce this problem, let's close it for now
and hope that we will not see it any more.
Best,
Vadim.

Comment 16 Mike Cao 2012-04-18 08:14:54 UTC
(In reply to comment #15)
> Hi Mike,
> The balloon driver will be slightly redesigned in 6.4,
> including the piece of code which leads to the above problem.
> If you cannot reproduce this problem, let's close it for now
> and hope that we will not see it any more.
> Best,
> Vadim.

Let's keep it open for now. QE will run a full round of virtio balloon testing soon. If we do not hit this issue and the version ack changes back to rhel6.3.0+, I will close this one.

Mike

Comment 20 errata-xmlrpc 2012-06-20 11:58:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0751.html