Description of problem:
RHSA-2016-1756 breaks migration of instances. OpenStack instances that migrate to a new host are shut down. The error 'Virtqueue size exceeded' appears in /var/log/libvirt/qemu/instance-name.

Other reports about this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1358359
https://www.redhat.com/archives/libvir-list/2016-August/msg00406.html
https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg02666.html

Version-Release number of selected component (if applicable):
openstack-nova-compute-12.0.4-4.el7ost.noarch
qemu-img-rhev-2.3.0-31.el7_2.21.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.21.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Apply the patch mentioned above (RHSA-2016-1756)
2. Start an instance migration
3. Notice the failure

Actual results:
Instance migration fails with 'Virtqueue size exceeded' in the logs.

Expected results:
Instance migration succeeds.

Additional info:
As mentioned in the email thread above, this works with the cirros image but fails with a CentOS or Ubuntu image.
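To check whether a host hit this failure, the instance's QEMU log can be searched for the error string quoted in the report. The following is only a minimal sketch: the helper name and the sample log lines are hypothetical, and the only thing taken from the report is the message text 'Virtqueue size exceeded'.

```python
# Error text quoted in the bug report.
ERROR_TEXT = "Virtqueue size exceeded"


def find_virtqueue_errors(log_lines):
    """Return (line_number, text) pairs for lines containing the error."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(log_lines, start=1)
        if ERROR_TEXT in line
    ]


# Hypothetical log content, made up for illustration only.
SAMPLE_LOG = [
    "starting up libvirt\n",
    "Virtqueue size exceeded\n",
    "shutting down\n",
]

if __name__ == "__main__":
    for number, text in find_virtqueue_errors(SAMPLE_LOG):
        print(f"line {number}: {text}")
```

On a real compute node, the lines would come from the file under /var/log/libvirt/qemu/ for the affected instance, for example by passing an open file object to the function.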
It seems that the patch works with QEMU 2.6 but breaks things when backported to older versions. Ubuntu has already reverted the patch in 14.04 and 16.04; see https://www.redhat.com/archives/libvir-list/2016-August/msg01287.html and https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1612089.
It's not totally clear to me whether this issue occurs only when statistics are enabled for the balloon device. According to [1], that seems to be the case. A possible workaround would be to ask Nova not to enable that feature. For the libvirt driver, the config option 'mem_stats_period_seconds' can be set to 0:

mem_stats_period_seconds = 0

This issue is mostly related to the version of QEMU we are shipping for RHEL 7 [2]. We probably have to report a regression against that component, since at this stage of our understanding of the bug the compute team can't really fix it.

[1] https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1612089
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1358359
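A quick way to verify that the workaround is actually in place on a compute node is to parse the Nova configuration and check the option. This is a minimal sketch under assumptions: the option is assumed to live in the [libvirt] section of nova.conf, the helper name is hypothetical, and the sample configuration is made up for illustration.

```python
import configparser


def balloon_stats_disabled(conf_text):
    """Return True if mem_stats_period_seconds is set to 0 in [libvirt].

    Assumption: the option is read from the [libvirt] section.
    """
    # strict=False tolerates repeated options; interpolation=None keeps
    # any '%' characters in other values from being interpreted.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    value = parser.get("libvirt", "mem_stats_period_seconds", fallback=None)
    return value is not None and value.strip() == "0"


# Hypothetical nova.conf content showing the workaround applied.
SAMPLE_CONF = """
[DEFAULT]
debug = false

[libvirt]
mem_stats_period_seconds = 0
"""

if __name__ == "__main__":
    print(balloon_stats_disabled(SAMPLE_CONF))
```

On a real host, the text would be read from the node's nova.conf. The compute service normally needs a restart before a configuration change takes effect.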
I reproduced the issue on RHOS7. It is even more critical than reported, since the guest disappears entirely and there is no way to retrieve it from Nova. I can also confirm that the Nova workaround works: just disable statistics reporting for the memory balloon device.

mem_stats_period_seconds = 0
This is also a regression on Mitaka.
I can confirm that the packages provided in bug 1372763#3 fix the live migration issue in Nova (tested with RHOS7).
*** This bug has been marked as a duplicate of bug 1374364 ***
Never mind, verification should be done on bug 1374364.
Dup -- QE will decide about automating the original