Red Hat Bugzilla – Bug 1140997
guest is stuck when setting balloon memory with large guest-stats-polling-interval
Last modified: 2015-03-05 04:55:20 EST
Description of problem:
The guest gets stuck when setting the balloon memory size with a large guest-stats-polling-interval.

Version-Release number of selected component (if applicable):
libvirt-1.2.8-1.el7.x86_64
qemu-kvm-rhev-2.1.0-3.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Set the balloon stats period to '21474836':
# virsh edit r7a
...
<memballoon model='virtio'>
  <stats period='21474836'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</memballoon>
...

2. Start the guest:
# virsh start r7a
[hung]

# ps -ef | grep qemu
qemu 28339 1 99 16:04 ? 00:00:18 /usr/libexec/qemu-kvm -name r7a -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid c4fa19e8-e8c9-49ab-b6bf-0427ed4e750e -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/r7a.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2014-09-12T08:04:13 -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/images/r7.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on

In the QMP monitor log:
6102.143 > 0x7fd9e8008510 {"execute":"qom-list","arguments":{"path":"//machine/i440fx/pci.0/child[9]"},"id":"libvirt-87"}
6102.144 < 0x7fd9e8008510 {"return": [{"name": "virtio-pci[0]", "type": "child<qemu:memory-region>"}, {"name": "virtio-bus", "type": "child<virtio-pci-bus>"}, {"name": "bus master[0]", "type": "child<qemu:memory-region>"}, {"name": "guest-stats-polling-interval", "type": "int"}, {"name": "guest-stats", "type": "guest statistics"}, {"name": "virtio-backend", "type": "child<virtio-balloon-device>"}, {"name": "parent_bus", "type": "link<bus>"}, {"name": "command_serr_enable", "type": "bool"}, {"name": "multifunction",
6102.144 > 0x7fd9e8008510 {"execute":"qom-set","arguments":{"path":"//machine/i440fx/pci.0/child[9]","property":"guest-stats-polling-interval","value":21474836},"id":"libvirt-88"}
6102.145 < 0x7fd9e8008510 {"return": {}, "id": "libvirt-88"}
6102.146 > 0x7fd9e8008510 {"execute":"balloon","arguments":{"value":1073741824},"id":"libvirt-89"}
You meant that virsh got stuck, right? I quickly tested the balloon stats feature in qemu-kvm-rhev and it seems to be working (although I did not set the delay to the value you used, of course). It seems to me that libvirt is stuck waiting for an event that will never come. Reassigning.
Actually, I also hit this problem using qemu-kvm-rhev alone; the key is setting guest-stats-polling-interval to '21474836'.

Start the guest with virtio-balloon:
# /usr/libexec/qemu-kvm -name r7a -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid c4fa19e8-e8c9-49ab-b6bf-0427ed4e750e -no-user-config -nodefaults -qmp tcp:0:5555,server,nowait -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/r7a.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2014-09-15T03:33:57 -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/images/r7.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on

Execute some QMP commands:
# telnet 127.0.0.1 5555
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 1, "major": 2}, "package": " (qemu-kvm-rhev-2.1.0-3.el7)"}, "capabilities": []}}
{"execute":"qmp_capabilities"}
{"return": {}}
{"execute":"qom-list","arguments":{"path":"//machine/i440fx/pci.0/child[9]"},"id":"libvirt-87"}
{"return": [{"name": "virtio-pci[0]", "type": "child<qemu:memory-region>"}, {"name": "virtio-bus", "type": "child<virtio-pci-bus>"}, {"name": "bus master[0]", "type": "child<qemu:memory-region>"}, {"name": "guest-stats-polling-interval", "type": "int"}, {"name": "guest-stats", "type": "guest statistics"}, {"name": "virtio-backend", "type": "child<virtio-balloon-device>"}, {"name": "parent_bus", "type": "link<bus>"}, {"name": "command_serr_enable", "type": "bool"}, {"name": "multifunction", "type": "bool"}, {"name": "rombar", "type": "uint32"}, {"name": "romfile", "type": "str"}, {"name": "addr", "type": "int32"}, {"name": "legacy-addr", "type": "str"}, {"name": "event_idx", "type": "bool"}, {"name": "indirect_desc", "type": "bool"}, {"name": "class", "type": "uint32"}, {"name": "hotplugged", "type": "bool"}, {"name": "hotpluggable", "type": "bool"}, {"name": "realized", "type": "bool"}, {"name": "type", "type": "string"}], "id": "libvirt-87"}
{"execute":"qom-set","arguments":{"path":"//machine/i440fx/pci.0/child[9]","property":"guest-stats-polling-interval","value":21474836},"id":"libvirt-88"}
{"return": {}, "id": "libvirt-88"}
{"execute":"balloon","arguments":{"value":1073741824},"id":"libvirt-89"}

This 'balloon' command never returns. So I think this bug belongs to qemu-kvm-rhev; could you check it again?
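The telnet session in the previous comment can also be scripted, which makes it easier to rerun against each build. This is an illustrative sketch, not something from the report: it assumes QMP listens on TCP port 5555 as in the command line above, the helper names `repro_commands` and `run` and the 10-second timeout are hypothetical, and it treats a missing reply to `balloon` as the hang. The QOM path index depends on the guest's PCI layout (this guest uses child[9], the later reproducer uses child[6]), so check it with qom-list first.

```python
import json
import socket

# Values copied from the reproducer; the QOM path index is guest-specific.
BALLOON_PATH = "//machine/i440fx/pci.0/child[9]"
POLL_INTERVAL = 21474836      # seconds; the value that triggers the hang
BALLOON_BYTES = 1073741824    # 1 GiB balloon target, as in the log


def repro_commands(path=BALLOON_PATH, interval=POLL_INTERVAL, size=BALLOON_BYTES):
    """Return the QMP commands from the reproducer, in order."""
    return [
        {"execute": "qmp_capabilities"},
        {"execute": "qom-set",
         "arguments": {"path": path,
                       "property": "guest-stats-polling-interval",
                       "value": interval}},
        {"execute": "balloon", "arguments": {"value": size}},
    ]


def run(host="127.0.0.1", port=5555, timeout=10.0, path=BALLOON_PATH):
    """Send each command and wait for its reply; return False if one never comes."""
    with socket.create_connection((host, port), timeout=timeout) as sock, \
            sock.makefile("rw", encoding="utf-8", newline="\n") as stream:
        stream.readline()  # discard the {"QMP": ...} greeting banner
        for cmd in repro_commands(path=path):
            stream.write(json.dumps(cmd) + "\n")
            stream.flush()
            try:
                # Skip asynchronous events until a "return" or "error" arrives.
                while True:
                    line = stream.readline()
                    if not line:
                        raise ConnectionError("QMP connection closed")
                    reply = json.loads(line)
                    if "return" in reply or "error" in reply:
                        break
            except socket.timeout:
                print("no reply to", cmd["execute"], "- QEMU appears hung")
                return False
    return True
```

For example, `run()` against the stopped guest above would be expected to return False on an affected build and True on a fixed one; that expectation is inferred from the transcripts in this bug, not from running the script.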
You're completely right. I'm able to reproduce it now. It's not libvirt-related; moving it back to me and qemu-kvm-rhev. I will investigate it shortly.
Hi Luiz, I have tested your build with libvirt, and no blocking happened. The QMP log is:
2.634 > 0x7fba8800a930 {"execute":"qom-set","arguments":{"path":"//machine/i440fx/pci.0/child[11]","property":"guest-stats-polling-interval","value":21474836},"id":"libvirt-88"}
2.635 < 0x7fba8800a930 {"return": {}, "id": "libvirt-88"}
2.635 > 0x7fba8800a930 {"execute":"balloon","arguments":{"value":1073741824},"id":"libvirt-89"}
2.636 < 0x7fba8800a930 {"return": {}, "id": "libvirt-89"}
2.636 > 0x7fba8800a930 {"execute":"cont","id":"libvirt-90"}
2.639 ! 0x7fba8800a930 {"timestamp": {"seconds": 1410837372, "microseconds": 846501}, "event": "RESUME"}
2.639 < 0x7fba8800a930 {"return": {}, "id": "libvirt-90"}
Thanks. This is an integer overflow in the virtio-balloon driver. I've posted the fix upstream and will backport it as soon as it's merged.
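The bug does not quote the faulty code, but the arithmetic behind the overflow is easy to check under one assumption: that the interval in seconds is converted to milliseconds in a 32-bit signed int. Then 21474836 × 1000 = 21,474,836,000, roughly ten times INT32_MAX (2,147,483,647), and the result wraps to -480. If that value were used as an offset from the current time, it would put the next stats-timer deadline in the past; that is a plausible link to the hang and to the 99% CPU of the qemu process in the first report, but it is an inference, not something stated here. The sketch below emulates the wrap-around in Python (in C, signed overflow is undefined behaviour, so -480 is only the typical result) and shows one possible shape of a fix; the function names and the `MAX_INTERVAL_SECS` bound are hypothetical, not QEMU's code.

```python
INT32_MAX = 2**31 - 1

# Hypothetical bound for the fixed path: any limit that keeps
# secs * 1000 inside a signed 64-bit integer would do.
MAX_INTERVAL_SECS = 2**32 - 1


def wrap_int32(value):
    """Emulate two's-complement wrap-around of a 32-bit signed integer."""
    return (value + 2**31) % 2**32 - 2**31


def interval_ms_32bit(secs):
    """Seconds -> milliseconds as if computed as `int secs * 1000` in C (assumed buggy path)."""
    return wrap_int32(secs * 1000)


def interval_ms_checked(secs):
    """Sketch of a fix: reject out-of-range input, then compute in 64-bit range."""
    if not 0 <= secs <= MAX_INTERVAL_SECS:
        raise ValueError(f"polling interval {secs}s is out of range")
    return secs * 1000  # at most about 4.3e12, far below INT64_MAX


print(interval_ms_32bit(21474836))    # -480
print(interval_ms_checked(21474836))  # 21474836000
print(INT32_MAX // 1000)              # 2147483
```

With a 32-bit millisecond value, the largest interval that survives the conversion is INT32_MAX // 1000 = 2147483 seconds (about 24.8 days), so the reporter's 21474836 is just over ten times too large.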
Fix included in qemu-kvm-rhev-2.1.2-2.el7
Reproduce:
Version of components: qemu-kvm-1.5.3-68.el7.x86_64
Steps:
1. Boot the guest with -S and "-device virtio-balloon-pci,id=balloon0,bus=pci.0", and leave it in the stopped state.
# /usr/libexec/qemu-kvm -m 1G -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -smp 2 -monitor stdio -spice port=5931,disable-ticketing -qmp tcp::8888,server,nowait -boot menu=on -drive file=/home/RHEL-Server-7.0-64-virtio.qcow2,if=none,id=drive-virtio-disk,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop -device ide-drive,bus=ide.0,unit=0,wwn=0x5000c50015ea71bb,drive=drive-virtio-disk,id=virtio-disk,bootindex=1 -netdev tap,id=tap0,script=/etc/qemu-ifup,vhost=on -device virtio-net-pci,netdev=tap0,id=net0,mac=24:be:05:14:11:11,mq=on -drive file=/home/kernel-3.10.0-184.el7.iso,if=none,id=hd,format=raw,media=cdrom,readonly=on,cache=none,werror=stop,rerror=stop -device ide-drive,bus=ide.0,unit=1,wwn=0x5000c50015ea71ad,drive=hd,id=cdrom \
-device virtio-balloon-pci,id=balloon0,bus=pci.0 -S
2. Run the following commands over QMP:
$ telnet 10.66.82.225 8888
Trying 10.66.82.225...
Connected to 10.66.82.225.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 3, "minor": 5, "major": 1}, "package": " (qemu-kvm-1.5.3-68.el7)"}, "capabilities": []}}
{"execute":"qmp_capabilities"}
{"return": {}}
{"execute":"qom-list","arguments":{"path":"//machine/i440fx/pci.0/child[6]"},"id":"libvirt-87"}
{"execute":"qom-set","arguments":{"path":"//machine/i440fx/pci.0/child[6]","property":"guest-stats-polling-interval","value":21474836},"id":"libvirt-88"}
{"return": {}, "id": "libvirt-88"}
{"execute":"balloon","arguments":{"value":1073741824},"id":"libvirt-89"}
Results:
After step 2, qemu-kvm hangs. As shown above, this bug has been reproduced.
===================
Verify:
Version of components: qemu-kvm-1.5.3-75.el7.x86_64
Same steps as above; after step 2:
{"execute":"qom-set","arguments":{"path":"//machine/i440fx/pci.0/child[6]","property":"guest-stats-polling-interval","value":21474836},"id":"libvirt-88"}
{"return": {}, "id": "libvirt-88"}
{"execute":"balloon","arguments":{"value":1073741824},"id":"libvirt-89"}
{"return": {}, "id": "libvirt-89"}
qemu-kvm works well, so this bug has been verified.
===================
Verified on qemu-kvm-rhev-2.1.2-3.el7.x86_64 with the same steps. After step 2, qemu-kvm works well and QMP responds as follows:
# telnet 10.66.8.240 8888
Trying 10.66.8.240...
Connected to 10.66.8.240.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 2, "minor": 1, "major": 2}, "package": " (qemu-kvm-rhev-2.1.2-3.el7)"}, "capabilities": []}}
{"execute":"qmp_capabilities"}
{"return": {}}
{"execute":"qom-set","arguments":{"path":"//machine/i440fx/pci.0/child[6]","property":"guest-stats-polling-interval","value":21474836},"id":"libvirt-88"}
{"return": {}, "id": "libvirt-88"}
{"execute":"balloon","arguments":{"value":1073741824},"id":"libvirt-89"}
{"return": {}, "id": "libvirt-89"}
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-0624.html