Bug 623903

Summary: query-balloon command didn't return on paused guest, causing virt-manager to hang
Product: Red Hat Enterprise Linux 6
Reporter: lihuang <lihuang>
Component: qemu-kvm
Assignee: Amit Shah <amit.shah>
Status: CLOSED CURRENTRELEASE
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Docs Contact:
Priority: high
Version: 6.0
CC: anderson, berrange, dwu, mkenneth, syeghiay, szhou, tao, tburke, virt-maint
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: qemu-kvm-0.12.1.2-2.113.el6
Doc Type: Bug Fix
Doc Text:
Due to issues caused when query-balloon hangs when it's unable to get the stats from a guest, this functionality has been temporarily disabled.
Story Points: ---
Clone Of:
: 626958 (view as bug list)
Environment:
Last Closed: 2010-11-10 21:27:07 UTC
Type: ---
Bug Blocks: 580953, 580954, 626958    

Description lihuang 2010-08-13 05:40:09 UTC
Description of problem:
Click the PAUSE button in virt-manager; virt-manager then hangs.

I checked the libvirt log: it was waiting for the QMP command "query-balloon" to return. The problem can therefore be reproduced directly from the command line:

(qemu) info status 
VM status: running
(qemu) 
(qemu) info balloon 
balloon: actual=4096,mem_swapped_in=0,minor_page_faults=463975,mem_swapped_out=0,free_mem=3949793280,major_page_faults=390,total_mem=4117966848
(qemu) stop
(qemu) info balloon 

^^^ no response  
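
The same check can be scripted so that the caller does not block indefinitely. The following is a minimal sketch, assuming the guest exposes a QMP monitor (control mode) on a UNIX socket. The helper names `qmp_command` and `query_balloon` and the 5-second default timeout are illustrative choices, not part of libvirt or QEMU.

```python
import json
import socket


def qmp_command(sock_file, sock, name):
    """Send one QMP command and return the first reply that is not an event."""
    sock.sendall(json.dumps({"execute": name}).encode() + b"\r\n")
    while True:
        line = sock_file.readline()
        if not line:
            raise ConnectionError("monitor closed the connection")
        reply = json.loads(line)
        if "event" not in reply:  # skip asynchronous events such as STOP
            return reply


def query_balloon(sock, timeout=5.0):
    """Return the query-balloon reply, or None if no reply arrives in time."""
    sock.settimeout(timeout)
    sock_file = sock.makefile("r")
    try:
        json.loads(sock_file.readline())              # QMP greeting banner
        qmp_command(sock_file, sock, "qmp_capabilities")
        return qmp_command(sock_file, sock, "query-balloon")
    except socket.timeout:
        return None  # the symptom described here: no answer from the monitor
```

To try it, connect an `AF_UNIX` stream socket to the guest's monitor socket path and pass it to `query_balloon`; a `None` result corresponds to the "no response" case shown above.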



Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.109.el6.x86_64


How reproducible:
100%


Comment 1 Dor Laor 2010-08-16 12:46:01 UTC
Can we get the full cmdline of qemu?
Does it happen with a specific guest or with all?

Comment 2 lihuang 2010-08-16 19:04:45 UTC
(In reply to comment #1)
> Can we get the full cmdline of qemu?
> Does it happen with a specific guest or with all?

This happens with a RHEL 6 guest. A RHEL 5 guest passes.

qemu     13738     1 75 14:26 ?        00:00:15 /usr/libexec/qemu-kvm -S -M rhel6.0.0 -enable-kvm -m 2048 -smp 2,sockets=2,cores=1,threads=1 -name s0811.2.86 -uuid fa03bb40-0892-507b-8ed3-c1eb3333d7df -nodefconfig -nodefaults -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/s0811.2.86.monitor,server,nowait -mon chardev=monitor,mode=control -rtc base=utc -boot c -drive file=/home/images/s0811.2.86.img,if=none,id=drive-virtio-disk0,boot=on,format=raw,cache=none -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:a3:3f:b7,bus=pci.0,addr=0x3 -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -vga cirrus -device AC97,id=sound0,bus=pci.0,addr=0x4 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6

Comment 4 Daniel Berrangé 2010-08-17 15:04:37 UTC
Yeah I think we should try to fix this. libvirt could avoid issuing 'info balloon' if it knows the VM is paused, but that has a built-in race condition if QEMU pauses the guest on I/O error or watchdog. Only QEMU can avoid the race - it should simply skip the extended statistics update if the CPUs are paused, and return the most recent previously obtained info instead.
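
The behaviour proposed here could look roughly like the following. This is an illustrative Python model, not QEMU code: `BalloonStats`, `vm_running` and `fetch_from_guest` are hypothetical names chosen for the sketch.

```python
class BalloonStats:
    """Toy model: serve cached balloon stats while the guest CPUs are paused."""

    def __init__(self, fetch_from_guest):
        self._fetch = fetch_from_guest  # hypothetical callable that asks the guest driver
        self._last = None               # most recent stats obtained from the guest
        self.vm_running = True

    def query(self):
        if self.vm_running:
            # The guest can answer, so refresh the cache with live statistics.
            self._last = self._fetch()
        # Paused or freshly refreshed: return what we have without blocking.
        return self._last
```

Because the running check and the cache live on the same side as the pause itself, a caller cannot race with a pause triggered by an I/O error or watchdog.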

Comment 5 Dor Laor 2010-08-18 10:16:24 UTC
Not sure it will be simple, since we'll need to asynchronously kill the ongoing balloon command. I agree we do need it.

Comment 7 Daniel Berrangé 2010-08-24 09:10:07 UTC
Bug 626544 describes a similar problem, except that it occurs when the guest OS itself has crashed rather than been paused by QEMU. Upstream, Anthony suggested that we need some kind of timeout instead of waiting forever for the guest to respond.
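
A timeout of the kind suggested upstream might be modelled as follows. This is only a sketch, written in Python rather than QEMU's C; `request_stats`, the 1-second default and the `fallback` value are assumptions made for illustration.

```python
import threading


def stats_with_timeout(request_stats, fallback, timeout=1.0):
    """Ask the guest for stats, but give up after `timeout` seconds.

    `request_stats(on_reply)` is a hypothetical function that forwards the
    request to the guest and calls `on_reply(stats)` if the guest responds.
    """
    done = threading.Event()
    result = {}

    def on_reply(stats):
        result["stats"] = stats
        done.set()

    request_stats(on_reply)
    if done.wait(timeout):
        return result["stats"]
    return fallback  # guest paused or crashed: do not wait forever
```

This covers both the paused and the crashed guest, at the cost of delaying every query against an unresponsive guest by up to `timeout` seconds.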

Comment 11 Bill Burns 2010-08-24 18:17:55 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Due to issues caused when query-balloon hangs when unable to get the stat from a guest this functionality has been temporarily disabled.

Comment 13 Bill Burns 2010-08-25 13:25:15 UTC
*** Bug 626544 has been marked as a duplicate of this bug. ***

Comment 15 Shirley Zhou 2010-08-26 04:55:48 UTC
Reproduced this bug in qemu-kvm-0.12.1.2-2.112.el6:
(qemu) info balloon 
balloon: actual=4096,mem_swapped_in=1396,minor_page_faults=4398,mem_swapped_out=0,free_mem=3649048576,major_page_faults=238,total_mem=4293603328
(qemu) stop
handle_dev_input: stop
(qemu) info balloon 
Then, no response.

Verified this bug in qemu-kvm-0.12.1.2-2.113.el6:
(qemu) info balloon 
balloon: actual=4096
(qemu) stop
handle_dev_input: stop
(qemu) info balloon 
balloon: actual=4096

This bug has been fixed.

However, for a Windows guest, shrinking memory does not work at all:
(qemu) info balloon 
balloon: actual=4096
(qemu) balloon 2048 // do evict memory
(qemu) info balloon 
balloon: actual=4092
(qemu) info balloon 
balloon: actual=4092
(qemu) info balloon 
balloon: actual=4092
(qemu) info balloon 
balloon: actual=4092
(qemu) info balloon 
balloon: actual=4092 

From https://bugzilla.redhat.com/show_bug.cgi?id=610787#c13, we know that memory is reduced by 2M after each balloon info query, and that this is what makes memory eviction work.

Dor, is it OK to track the above issue in this bug, or should I report a new bug?

Comment 16 Amit Shah 2010-08-26 05:16:58 UTC
Please report a new bug for the new issue. Please also mention if it works fine with Linux guests.

Comment 17 Amit Shah 2010-08-26 05:37:31 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-Due to issues caused when query-balloon hangs when unable to get the stat from a guest this functionality has been temporarily disabled.
+Due to issues caused when query-balloon hangs when it's unable to get the stats from a guest, this functionality has been temporarily disabled.

Comment 18 Shirley Zhou 2010-08-26 07:35:18 UTC
According to comment 15 and comment 16, this bug has been fixed; I will change the status to verified.

For a RHEL guest, virtio ballooning can both evict and enlarge memory.
For a Windows guest, ballooning does not work at all. We already have bug 610787 reporting balloon issues, and will clarify and keep tracking this new Windows issue in bug 610787.

Comment 19 Bill Burns 2010-08-31 13:49:49 UTC
*** Bug 628020 has been marked as a duplicate of this bug. ***

Comment 20 Issue Tracker 2010-09-08 17:32:15 UTC
Event posted on 09-08-2010 08:22am EDT by Glen Johnson

------- Comment From santwana.samantray.com 2010-09-08 08:19 EDT-------
Hello Redhat,

I verified this issue in RHEL6 Snap13 release(k.v-2.6.32-70.el6.x86_64),
and this seems to be fixed. Suspend/resume of the KVM guests is happening
fine using virsh. We can close this issue now.

Thanks,
Santwana


This event sent from IssueTracker by jkachuck 
 issue 1283623

Comment 22 zhanghaiyan 2010-09-10 08:43:01 UTC
*** Bug 631637 has been marked as a duplicate of this bug. ***

Comment 24 releng-rhel@redhat.com 2010-11-10 21:27:07 UTC
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.

Comment 25 Amit Shah 2011-02-02 12:59:56 UTC
*** Bug 616389 has been marked as a duplicate of this bug. ***