Bug 1371943

Summary: RHSA-2016-1756 breaks migration of instances
Product: Red Hat OpenStack
Reporter: Jeremy <jmelvin>
Component: qemu-kvm-rhev
Assignee: Virtualization Maintenance <virt-maint>
Status: CLOSED DUPLICATE
QA Contact: Shai Revivo <srevivo>
Severity: high
Docs Contact:
Priority: high
Version: 8.0 (Liberty)
CC: amedeo.salvati, aperotti, berrange, blake.c.anderson, c.hendrickson09, dasmith, dh3, eglynn, furlongm, jherrman, kamfonik, kchamart, knoel, mburns, moshele, mst, rcernin, sbauza, sferdjao, sgordon, srevivo, stefanha, vaggarwa, virt-maint, vromanso
Target Milestone: ---
Keywords: Triaged
Target Release: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.6.0-25.el7
Doc Type: Bug Fix
Doc Text:
The fix for CVE-2016-5403 caused migrating guest instances to fail with a "Virtqueue size exceeded" error message. With this update, the value of the virtualization queue is recalculated after the migration, and the described problem no longer occurs.
Story Points: ---
Clone Of:
: 1372763 1374364 1374365 1374366 1374367 1374368 1374369
Environment:
Last Closed: 2016-10-13 13:42:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1372763, 1376542
Bug Blocks: 1374364, 1374365, 1374366, 1374367, 1374368, 1374369

Description Jeremy 2016-08-31 13:38:48 UTC
Description of problem: RHSA-2016-1756 breaks migration of instances.
OpenStack instances that migrate to a new host are shut down. The error 'Virtqueue size exceeded' appears in /var/log/libvirt/qemu/instance-name.
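To check a compute host for this failure, the per-instance QEMU logs can be scanned for the error string. The following is only a minimal sketch: the function name is hypothetical, and it assumes the logs are plain-text files named <instance-name>.log inside the directory passed in.

```python
from pathlib import Path

# Error string quoted in this report.
ERROR_MARKER = "Virtqueue size exceeded"


def find_virtqueue_errors(log_dir):
    """Return {log file name: [matching lines]} for every *.log file in log_dir."""
    hits = {}
    for log_path in sorted(Path(log_dir).glob("*.log")):
        # errors="replace" keeps the scan going if a log contains stray bytes.
        text = log_path.read_text(errors="replace")
        matches = [line for line in text.splitlines() if ERROR_MARKER in line]
        if matches:
            hits[log_path.name] = matches
    return hits
```

Calling find_virtqueue_errors("/var/log/libvirt/qemu") on an affected host should list the instances whose migration hit the error. Reading those logs usually requires root privileges.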

Other reports about this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1358359
https://www.redhat.com/archives/libvir-list/2016-August/msg00406.html
https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg02666.html

Version-Release number of selected component (if applicable):
openstack-nova-compute-12.0.4-4.el7ost.noarch 
qemu-img-rhev-2.3.0-31.el7_2.21.x86_64 
qemu-kvm-rhev-2.3.0-31.el7_2.21.x86_64 


How reproducible:
100%

Steps to Reproduce:
1. Apply the RHSA-2016-1756 update mentioned above.
2. Start an instance migration.
3. Observe the failure.

Actual results:
Instance migration fails with 'Virtqueue size exceeded' in the logs.

Expected results:
instance migration succeeds 

Additional info:
As mentioned in the email thread above, this works with the CirrOS image but fails with a CentOS or Ubuntu image.

Comment 3 Moshe Levi 2016-08-31 14:44:08 UTC
It seems that the patch works with QEMU 2.6, but backporting it to older versions breaks things.
Ubuntu has already reverted the patch in 14.04 and 16.04;
see
https://www.redhat.com/archives/libvir-list/2016-August/msg01287.html
and https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1612089

Comment 5 Sahid Ferdjaoui 2016-09-02 13:08:27 UTC
It's not entirely clear to me whether this issue occurs only when statistics are enabled for the balloon device. According to [1], that seems to be the case. A possible workaround would be to tell Nova not to enable that feature. For the libvirt driver, the config option 'mem_stats_period_seconds' can be set to 0.

  mem_stats_period_seconds = 0
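
For illustration, the setting could be applied with a short script. This is a rough sketch rather than a supported tool: the function name is hypothetical, it assumes the option belongs in the [libvirt] section of the compute node's nova.conf (commonly /etc/nova/nova.conf), and it uses Python's configparser, which rewrites the file without its comments and keeps only the last value of any repeated option.

```python
import configparser


def disable_balloon_stats(conf_path):
    """Set mem_stats_period_seconds = 0 in the [libvirt] section of conf_path."""
    # interpolation=None leaves '%' characters in other option values untouched;
    # strict=False tolerates duplicated sections or options in the input file.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # preserve the case of option names
    parser.read(conf_path)
    if not parser.has_section("libvirt"):
        parser.add_section("libvirt")
    parser.set("libvirt", "mem_stats_period_seconds", "0")
    with open(conf_path, "w") as conf_file:
        parser.write(conf_file)
```

Back up the file before trying this. The nova-compute service typically has to be restarted before the new value takes effect.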

This issue is mostly related to the version of QEMU we are shipping for RHEL 7 [2]. We probably have to report a regression against that component since, at this stage of our understanding of the bug, the compute team can't really fix it.

[1] https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1612089
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1358359

Comment 8 Sahid Ferdjaoui 2016-09-05 12:10:07 UTC
I reproduced this issue on RHOS7. The issue is even more critical since the guest disappears completely and there is no way to retrieve it from Nova.

I can also confirm that the workaround for Nova works: just disable statistics reporting for the memory balloon device.

  mem_stats_period_seconds = 0
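
To see whether a given guest still has balloon statistics polling configured, its libvirt domain XML (for example from virsh dumpxml) can be inspected for a <stats period='N'/> child of the memballoon device. The sketch below makes two assumptions: the function name and sample XML are hypothetical, and a missing <stats> element or a period of 0 is taken to mean that polling is disabled.

```python
import xml.etree.ElementTree as ET


def balloon_stats_period(domain_xml):
    """Return the memballoon stats period in seconds, or None if no <stats> element is present."""
    root = ET.fromstring(domain_xml)
    stats = root.find("./devices/memballoon/stats")
    if stats is None:
        return None
    return int(stats.get("period", "0"))


# Hypothetical, heavily trimmed domain XML used only for illustration.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <name>instance-00000001</name>
  <devices>
    <memballoon model='virtio'>
      <stats period='10'/>
    </memballoon>
  </devices>
</domain>
"""

print(balloon_stats_period(SAMPLE_DOMAIN_XML))  # prints 10
```

A result of None or 0 suggests the guest was defined with statistics polling turned off.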

Comment 9 Marcus Furlong 2016-09-06 06:48:18 UTC
This is also a regression on Mitaka.

Comment 10 Sahid Ferdjaoui 2016-09-07 08:27:55 UTC
I can confirm that the packages provided in bug 1372763#3 fix the issue in Nova for live migration (tested with RHOS7).

Comment 19 Mike Burns 2016-10-13 13:42:34 UTC

*** This bug has been marked as a duplicate of bug 1374364 ***

Comment 20 Mike Burns 2016-10-13 13:45:20 UTC
Never mind, verification should be on bug 1374364.

Comment 22 awaugama 2017-09-07 19:04:33 UTC
Dup -- QE will decide about automating the original