Clone for RHEL 7.3 component qemu-kvm-rhev.

+++ This bug was initially created as a clone of Bug #1371943 +++

Description of problem:
RHSA-2016-1756 breaks migration of instances. OpenStack instances that migrate to a new host are shut down. The error 'Virtqueue size exceeded' appears in /var/log/libvirt/qemu/instance-name.

Other reports about this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1358359
https://www.redhat.com/archives/libvir-list/2016-August/msg00406.html
https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg02666.html

Version-Release number of selected component (if applicable):
openstack-nova-compute-12.0.4-4.el7ost.noarch
qemu-img-rhev-2.3.0-31.el7_2.21.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.21.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Apply the patch mentioned above
2. Start instance migration
3. Notice failure

Actual results:
Instance migration fails with 'Virtqueue size exceeded' in the logs.

Expected results:
Instance migration succeeds.

Additional info:
As mentioned in the email thread above, this works with the cirros image but fails with a centos or ubuntu image.

--- Additional comment from Moshe Levi on 2016-08-31 10:44:08 EDT ---

It seems that this works with QEMU 2.6, but backporting it to older versions breaks things. Ubuntu has already reverted the patch in 14.04 and 16.04; see https://www.redhat.com/archives/libvir-list/2016-August/msg01287.html and https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1612089.

--- Additional comment from Sahid Ferdjaoui on 2016-09-02 09:08:27 EDT ---

It's not totally clear to me whether this issue occurs only when statistics are enabled for the balloon device. According to [1] that seems to be the case. A possible workaround would be to ask Nova not to enable that feature. For the libvirt driver, the config option 'mem_stats_period_seconds' can be set to 0:

mem_stats_period_seconds = 0

This issue is mostly related to the version of QEMU we are shipping for RHEL 7 [2]. We probably have to report a regression for that component since, at this stage of our understanding of the bug, the compute team can't really fix it.

[1] https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1612089
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1358359
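For anyone applying the workaround across several compute nodes, a quick check that the option is really set can help. The sketch below is only an illustration: the helper name balloon_stats_disabled is hypothetical, the sample text is made up, and it assumes the option lives in a [libvirt] section of an INI-style nova.conf.

```python
import configparser


def balloon_stats_disabled(conf_text):
    """Return True if [libvirt] mem_stats_period_seconds is explicitly 0.

    Assumption: the option sits in a [libvirt] section of an INI-style file.
    """
    # interpolation=None so literal '%' characters in other options do not
    # raise; strict=False tolerates duplicated sections or options.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    value = parser.get("libvirt", "mem_stats_period_seconds", fallback=None)
    return value is not None and value.strip() == "0"


# Made-up sample configuration, for illustration only.
SAMPLE = """
[DEFAULT]
debug = false

[libvirt]
virt_type = kvm
mem_stats_period_seconds = 0
"""

print(balloon_stats_disabled(SAMPLE))
```

An unset option is reported as not disabled, since the helper only looks for an explicit 0.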
Here's a brew build: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11696939

It backports two patches from 2.7:

virtio: decrement vq->inuse in virtqueue_discard()
virtio: recalculate vq->inuse after migration

Can we get confirmation on whether this fixes the issues?
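To make the second patch title more concrete, here is a simplified model of what recalculating the in-flight count from ring indices could look like. It is not QEMU code: the class and function names are hypothetical, and the 16-bit index wraparound and the bound check against the ring size are assumptions based on how virtio ring indices generally behave.

```python
from dataclasses import dataclass


@dataclass
class VirtQueueModel:
    """Toy stand-in for a virtqueue; field names are illustrative only."""
    num: int             # ring size (number of descriptors)
    last_avail_idx: int  # next avail-ring entry the device would pop
    used_idx: int        # next used-ring entry the device would fill
    inuse: int = 0       # elements popped but not yet returned to the guest


def recalc_inuse(vq):
    """Derive the in-flight count from the migrated ring indices.

    Assumption: indices are free-running 16-bit counters, so the
    difference is taken modulo 2**16.
    """
    vq.inuse = (vq.last_avail_idx - vq.used_idx) & 0xFFFF
    if vq.inuse > vq.num:
        # More elements in flight than the ring can hold means the
        # incoming state is inconsistent.
        raise ValueError("in-flight count exceeds ring size")
    return vq.inuse


# Indices that straddle the 16-bit wraparound point.
vq = VirtQueueModel(num=128, last_avail_idx=2, used_idx=65534)
print(recalc_inuse(vq))
```

The point of the model is only that a destination starting from inuse = 0 would disagree with the ring indices whenever requests were in flight at migration time.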
I have posted a backport for RHEL 7.2.z similar to Michael's: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11706790

Additional test scenarios:

1. virtio-balloon stats virtqueue test

$ qemu-img create -f qcow2 -b test.img test.qcow2
$ qemu-system-x86_64 -enable-kvm -m 1024 -cpu host -drive if=virtio,cache=none,format=qcow2,file=test.qcow2 -device virtio-balloon-pci,id=virtio-balloon0 -S
(qemu) qom-set virtio-balloon guest-stats-polling-interval 5
(qemu) c
...let it boot and log in on the console...
(qemu) savevm
(qemu) quit
$ qemu-system-x86_64 -enable-kvm -m 1024 -cpu host -drive if=virtio,cache=none,format=qcow2,file=test.qcow2 -device virtio-balloon-pci,id=virtio-balloon0 -S
(qemu) qom-set virtio-balloon guest-stats-polling-interval 5
(qemu) loadvm 1
(qemu) c
$ rm test.qcow2

Expected behavior: Guest state is loaded and resumes successfully.

Actual behavior: "Virtqueue size exceeded" error from QEMU and the guest is terminated after the 'c' monitor command is issued.

2. virtio-blk s->rq test

$ sudo qemu-img create -f qcow2 /dev/testvg/testlv 10G
shell1$ sudo qemu-system-x86_64 -enable-kvm -m 1024 -cpu host -drive if=virtio,cache=none,format=raw,file=rhel72.img -drive if=virtio,cache=none,format=qcow2,file=/dev/testvg/testlv,werror=stop
guest# dd if=/dev/zero of=/dev/vdb oflag=direct bs=4k
shell2$ sudo qemu-system-x86_64 -enable-kvm -m 1024 -cpu host -drive if=virtio,cache=none,format=raw,file=rhel72.img -drive if=virtio,cache=none,format=qcow2,file=/dev/testvg/testlv,werror=stop -incoming tcp::1234
(qemu1) migrate tcp:127.0.0.1:1234
$ sudo lvresize -L +4M /dev/testvg/testlv
(qemu2) c

Expected behavior: Guest resumes successfully after the 'c' monitor command is issued on the destination QEMU.

Actual behavior: "Virtqueue size exceeded" error from the destination QEMU and the guest is terminated after the 'c' monitor command is issued.
Fix included in qemu-kvm-rhev-2.6.0-25.el7
Thanks, Stefan.

Reproduced this bug using the following versions:
kernel-3.10.0-505.el7.x86_64
qemu-kvm-rhev-2.6.0-24.el7.x86_64

Steps to reproduce:

1. Create a 4M lv

# pvcreate /dev/sdg
# vgcreate testvg /dev/sdg
# lvcreate -L 4M -T testvg/testlv
# lvs
  LV     VG                  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  home   rhel_hp-dl380pg8-09 -wi-ao---- 212.61g
  root   rhel_hp-dl380pg8-09 -wi-ao----  50.00g
  swap   rhel_hp-dl380pg8-09 -wi-ao----  15.75g
  testlv testvg              twi-a-tz--   4.00m             0.00   0.88

2. Create a data disk image based on the above lv

# qemu-img create -f qcow2 /dev/testvg/testlv 10G

3. Boot a rhel7.3 guest with the above data disk image

# /usr/libexec/qemu-kvm \
-S \
-name 'rhel7.3' \
-machine pc-i440fx-rhel7.3.0 \
-m 4096 \
-smp 4,maxcpus=4,sockets=1,cores=4,threads=1 \
-cpu SandyBridge \
-rtc base=localtime,clock=host,driftfix=slew \
-nodefaults \
-boot menu=on \
-enable-kvm \
-monitor stdio \
-spice port=5900,disable-ticketing \
-drive file=/home/rhel7.3.qcow2,format=qcow2,id=drive_sysdisk,if=none,cache=none,aio=native,werror=stop,rerror=stop \
-device scsi-hd,drive=drive_sysdisk,bus=scsi_pci_bus0.0,id=device_sysdisk,bootindex=1 \
-drive if=none,cache=none,format=qcow2,file=/dev/testvg/testlv,werror=stop,id=drive-virtio-disk0 \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk0,id=virtio-disk0 \

4. On the same host, boot the rhel7.3 guest using the same command line plus "-incoming tcp:0:5800"

5. Inside the guest

# dd if=/dev/zero of=/dev/vdb oflag=direct bs=4k

6. After the guest is paused with io-error, do the migration

(qemu) info status
VM status: paused (io-error)
(qemu) migrate -d tcp:0:5800

7. On the host, grow the logical volume by 4 MB

# lvresize -L +4M /dev/testvg/testlv

8. On the destination, resume the guest

(qemu) c

After step 8, the destination QEMU reports "Virtqueue size exceeded" and qemu-kvm quits.
Verified this bug using the following versions:
kernel-3.10.0-505.el7.x86_64
qemu-kvm-rhev-2.6.0-25.el7.x86_64

Ran the above test. After step 8, the destination qemu-kvm did not quit and the guest resumed normally.
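When repeating this verification on several hosts, the check for the error message can be scripted. The following is a minimal sketch, assuming QEMU's output has been captured as text (for example from the per-instance libvirt log mentioned in the description). The function name is hypothetical and the sample lines are made-up placeholders, not real QEMU output.

```python
ERROR_TEXT = "Virtqueue size exceeded"


def find_virtqueue_errors(log_text):
    """Return (line_number, line) pairs for lines containing the error."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if ERROR_TEXT in line
    ]


# Placeholder text standing in for captured output; not a real log.
sample = "\n".join([
    "placeholder: guest started",
    "placeholder: Virtqueue size exceeded",
    "placeholder: shutting down",
])

for number, line in find_virtqueue_errors(sample):
    print(number, line)
```

An empty result after step 8 would be consistent with the fixed build. It does not prove that the guest resumed, so the guest state still needs checking separately.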
Based on comment #11, setting this bug to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-2673.html