Bug 1140997

Summary: guest is stuck when setting balloon memory with large guest-stats-polling-interval
Product: Red Hat Enterprise Linux 7
Reporter: Jincheng Miao <jmiao>
Component: qemu-kvm-rhev
Assignee: Luiz Capitulino <lcapitulino>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Docs Contact:
Priority: medium
Version: 7.1
CC: dyuan, hhuang, huding, jiahu, jmiao, juzhang, lcapitulino, mrezanin, mzhan, rbalakri, virt-maint, xfu
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.1.2-2.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1142290
Environment:
Last Closed: 2015-03-05 09:55:20 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1142290    

Description Jincheng Miao 2014-09-12 08:06:15 UTC
Description of problem:
guest is stuck when setting balloon memory with large guest-stats-polling-interval

Version-Release number of selected component (if applicable):
libvirt-1.2.8-1.el7.x86_64
qemu-kvm-rhev-2.1.0-3.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Set the memballoon stats period to '21474836'

# virsh edit r7a
...
    <memballoon model='virtio'>
      <stats period='21474836'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </memballoon>
...

2. Start the guest
# virsh start r7a
[hung]

# ps -ef | grep qemu
qemu     28339     1 99 16:04 ?        00:00:18 /usr/libexec/qemu-kvm -name r7a -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid c4fa19e8-e8c9-49ab-b6bf-0427ed4e750e -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/r7a.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2014-09-12T08:04:13 -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/images/r7.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on


In the QEMU monitor event log:
6102.143 > 0x7fd9e8008510 {"execute":"qom-list","arguments":{"path":"//machine/i440fx/pci.0/child[9]"},"id":"libvirt-87"}
6102.144 < 0x7fd9e8008510 {"return": [{"name": "virtio-pci[0]", "type": "child<qemu:memory-region>"}, {"name": "virtio-bus", "type": "child<virtio-pci-bus>"}, {"name": "bus master[0]", "type": "child<qemu:memory-region>"}, {"name": "guest-stats-polling-interval", "type": "int"}, {"name": "guest-stats", "type": "guest statistics"}, {"name": "virtio-backend", "type": "child<virtio-balloon-device>"}, {"name": "parent_bus", "type": "link<bus>"}, {"name": "command_serr_enable", "type": "bool"}, {"name": "multifunction",
6102.144 > 0x7fd9e8008510 {"execute":"qom-set","arguments":{"path":"//machine/i440fx/pci.0/child[9]","property":"guest-stats-polling-interval","value":21474836},"id":"libvirt-88"}
6102.145 < 0x7fd9e8008510 {"return": {}, "id": "libvirt-88"}
6102.146 > 0x7fd9e8008510 {"execute":"balloon","arguments":{"value":1073741824},"id":"libvirt-89"}

Comment 2 Luiz Capitulino 2014-09-12 20:46:12 UTC
You meant that virsh got stuck, right?

I quickly tested the balloon stats feature in qemu-kvm-rhev and it seems to be working (although I did not set the delay to the value you set, of course). It seems to me that libvirt is stuck waiting for an event that will never come. Reassigning.

Comment 3 Jincheng Miao 2014-09-15 03:41:46 UTC
Actually, I also hit this problem using only qemu-kvm-rhev.
The important part is setting guest-stats-polling-interval to '21474836'.

Start the guest with virtio-balloon:
# /usr/libexec/qemu-kvm -name r7a -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid c4fa19e8-e8c9-49ab-b6bf-0427ed4e750e -no-user-config -nodefaults -qmp tcp:0:5555,server,nowait -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/r7a.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2014-09-15T03:33:57 -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/images/r7.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on

Execute some QMP commands:
# telnet 127.0.0.1 5555
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 1, "major": 2}, "package": " (qemu-kvm-rhev-2.1.0-3.el7)"}, "capabilities": []}}

{"execute":"qmp_capabilities"}
{"return": {}}

{"execute":"qom-list","arguments":{"path":"//machine/i440fx/pci.0/child[9]"},"id":"libvirt-87"}
{"return": [{"name": "virtio-pci[0]", "type": "child<qemu:memory-region>"}, {"name": "virtio-bus", "type": "child<virtio-pci-bus>"}, {"name": "bus master[0]", "type": "child<qemu:memory-region>"}, {"name": "guest-stats-polling-interval", "type": "int"}, {"name": "guest-stats", "type": "guest statistics"}, {"name": "virtio-backend", "type": "child<virtio-balloon-device>"}, {"name": "parent_bus", "type": "link<bus>"}, {"name": "command_serr_enable", "type": "bool"}, {"name": "multifunction", "type": "bool"}, {"name": "rombar", "type": "uint32"}, {"name": "romfile", "type": "str"}, {"name": "addr", "type": "int32"}, {"name": "legacy-addr", "type": "str"}, {"name": "event_idx", "type": "bool"}, {"name": "indirect_desc", "type": "bool"}, {"name": "class", "type": "uint32"}, {"name": "hotplugged", "type": "bool"}, {"name": "hotpluggable", "type": "bool"}, {"name": "realized", "type": "bool"}, {"name": "type", "type": "string"}], "id": "libvirt-87"}

{"execute":"qom-set","arguments":{"path":"//machine/i440fx/pci.0/child[9]","property":"guest-stats-polling-interval","value":21474836},"id":"libvirt-88"}
{"return": {}, "id": "libvirt-88"}

{"execute":"balloon","arguments":{"value":1073741824},"id":"libvirt-89"}

The 'balloon' command then never returns.

So I think this bug belongs to qemu-kvm-rhev. Could you check it again?
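
The manual telnet session above could also be scripted. The following is only a minimal sketch, not a tool used in this report: the helper names `build_repro_commands` and `first_unanswered_command` are hypothetical, and it assumes the QMP socket is reachable as in the `-qmp tcp:0:5555,server,nowait` example. It sends the same commands in the same order and reports which one receives no reply before the socket timeout expires.

```python
import json
import socket


def build_repro_commands(balloon_path, interval=21474836, target_bytes=1073741824):
    """Return the QMP commands from the report, in the order they were sent."""
    return [
        {"execute": "qmp_capabilities"},
        {"execute": "qom-set",
         "arguments": {"path": balloon_path,
                       "property": "guest-stats-polling-interval",
                       "value": interval}},
        {"execute": "balloon", "arguments": {"value": target_bytes}},
    ]


def first_unanswered_command(sock, commands):
    """Send each command on a connected socket that has a timeout set.

    Returns the name of the first command that gets no reply in time,
    or None if every command is answered.
    """
    reader = sock.makefile("r", encoding="utf-8")
    waiting_for = "greeting"
    try:
        reader.readline()  # discard the QMP greeting banner
        for cmd in commands:
            waiting_for = cmd["execute"]
            sock.sendall((json.dumps(cmd) + "\r\n").encode("utf-8"))
            while True:
                line = reader.readline()
                if line == "":
                    raise ConnectionError("QMP connection closed")
                if not line.strip():
                    continue  # ignore blank lines
                reply = json.loads(line)
                if "return" in reply or "error" in reply:
                    break  # anything else is an asynchronous event
    except socket.timeout:
        return waiting_for
    return None


# Example use against the guest started above (the QOM path is taken from
# this report and differs between guests):
#   sock = socket.create_connection(("127.0.0.1", 5555), timeout=10)
#   cmds = build_repro_commands("//machine/i440fx/pci.0/child[9]")
#   print(first_unanswered_command(sock, cmds))
```

On an affected build this sketch would be expected to report `balloon`, matching the hang seen in the telnet session. On a fixed build it should return `None`.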

Comment 4 Luiz Capitulino 2014-09-15 14:42:19 UTC
You're completely right. I'm able to reproduce now. It's not libvirt related, moving it back to me and qemu-kvm-rhev.

Will investigate it shortly.

Comment 6 Jincheng Miao 2014-09-16 03:18:50 UTC
Hi Luiz,

I have tested your build with libvirt, and it no longer blocks.

The QMP events are:
  2.634 > 0x7fba8800a930 {"execute":"qom-set","arguments":{"path":"//machine/i440fx/pci.0/child[11]","property":"guest-stats-polling-interval","value":21474836},"id":"libvirt-88"}
  2.635 < 0x7fba8800a930 {"return": {}, "id": "libvirt-88"}
  2.635 > 0x7fba8800a930 {"execute":"balloon","arguments":{"value":1073741824},"id":"libvirt-89"}
  2.636 < 0x7fba8800a930 {"return": {}, "id": "libvirt-89"}
  2.636 > 0x7fba8800a930 {"execute":"cont","id":"libvirt-90"}
  2.639 ! 0x7fba8800a930 {"timestamp": {"seconds": 1410837372, "microseconds": 846501}, "event": "RESUME"}
  2.639 < 0x7fba8800a930 {"return": {}, "id": "libvirt-90"}

Comment 7 Luiz Capitulino 2014-09-16 13:52:12 UTC
Thanks. This is an integer overflow in the virtio-balloon driver. I've posted the fix upstream and will backport it as soon as it's merged.
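
The overflowing expression itself is not shown in this report. As a rough illustration only, assume the polling interval in seconds is converted to milliseconds in a signed 32-bit C int (this is an assumption, not something stated in the comments). The hypothetical helper `to_int32` below mimics the usual two's-complement wraparound to show what the value from the reproducer would turn into:

```python
def to_int32(value):
    """Wrap an integer the way a signed 32-bit C int typically wraps."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


INT32_MAX = 2**31 - 1

interval_secs = 21474836             # value from the reproducer
interval_ms = interval_secs * 1000   # assumed seconds-to-milliseconds step

print(interval_ms > INT32_MAX)       # True: does not fit in 32 bits
print(to_int32(interval_ms))         # -480: wraps to a negative delay

# Largest interval that survives the assumed multiplication by 1000.
print(INT32_MAX // 1000)             # 2147483
```

Under that assumption the timer would be armed with a negative delay, which would be consistent with the qemu-kvm process sitting at 99% CPU in the `ps` output of the original description.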

Comment 9 Miroslav Rezanina 2014-10-10 07:34:08 UTC
Fix included in qemu-kvm-rhev-2.1.2-2.el7

Comment 11 juzhang 2014-10-27 04:23:43 UTC
Reproduce:
Version of components:
qemu-kvm-1.5.3-68.el7.x86_64

Steps:
1. Boot the guest with -S and "-device virtio-balloon-pci,id=balloon0,bus=pci.0", and leave it in the stopped state.
# /usr/libexec/qemu-kvm -m 1G -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -smp 2 -monitor stdio -spice port=5931,disable-ticketing -qmp tcp::8888,server,nowait -boot menu=on -drive file=/home/RHEL-Server-7.0-64-virtio.qcow2,if=none,id=drive-virtio-disk,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop -device ide-drive,bus=ide.0,unit=0,wwn=0x5000c50015ea71bb,drive=drive-virtio-disk,id=virtio-disk,bootindex=1 -netdev tap,id=tap0,script=/etc/qemu-ifup,vhost=on -device virtio-net-pci,netdev=tap0,id=net0,mac=24:be:05:14:11:11,mq=on -drive file=/home/kernel-3.10.0-184.el7.iso,if=none,id=hd,format=raw,media=cdrom,readonly=on,cache=none,werror=stop,rerror=stop -device ide-drive,bus=ide.0,unit=1,wwn=0x5000c50015ea71ad,drive=hd,id=cdrom \
-device virtio-balloon-pci,id=balloon0,bus=pci.0 -S

2. Run the following commands over QMP.
$ telnet 10.66.82.225 8888
Trying 10.66.82.225...
Connected to 10.66.82.225.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 3, "minor": 5, "major": 1}, "package": " (qemu-kvm-1.5.3-68.el7)"}, "capabilities": []}}
{"execute":"qmp_capabilities"}
{"return": {}}

{"execute":"qom-list","arguments":{"path":"//machine/i440fx/pci.0/child[6]"},"id":"libvirt-87"}

{"execute":"qom-set","arguments":{"path":"//machine/i440fx/pci.0/child[6]","property":"guest-stats-polling-interval","value":21474836},"id":"libvirt-88"}
{"return": {}, "id": "libvirt-88"}

{"execute":"balloon","arguments":{"value":1073741824},"id":"libvirt-89"}

Results:
After step 2, qemu-kvm hangs.

As shown above, this bz has been reproduced.
===================
Verify:
Version of components:
qemu-kvm-1.5.3-75.el7.x86_64

Same steps as above. After step 2:
{"execute":"qom-set","arguments":{"path":"//machine/i440fx/pci.0/child[6]","property":"guest-stats-polling-interval","value":21474836},"id":"libvirt-88"}
{"return": {}, "id": "libvirt-88"}

{"execute":"balloon","arguments":{"value":1073741824},"id":"libvirt-89"}
{"return": {}, "id": "libvirt-89"}

qemu-kvm works well, so this bz has been verified.
===================
Verified on qemu-kvm-rhev-2.1.2-3.el7.x86_64 with the same steps as above. After step 2, qemu-kvm and QMP both work well, with the following output:
# telnet 10.66.8.240 8888
Trying 10.66.8.240...
Connected to 10.66.8.240.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 2, "minor": 1, "major": 2}, "package": " (qemu-kvm-rhev-2.1.2-3.el7)"}, "capabilities": []}}
{"execute":"qmp_capabilities"} 
{"return": {}}

{"execute":"qom-set","arguments":{"path":"//machine/i440fx/pci.0/child[6]","property":"guest-stats-polling-interval","value":21474836},"id":"libvirt-88"}
{"return": {}, "id": "libvirt-88"}
{"execute":"balloon","arguments":{"value":1073741824},"id":"libvirt-89"}
{"return": {}, "id": "libvirt-89"}

Comment 14 errata-xmlrpc 2015-03-05 09:55:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0624.html