Bug 1371943 - RHSA-2016-1756 breaks migration of instances
Summary: RHSA-2016-1756 breaks migration of instances
Keywords:
Status: CLOSED DUPLICATE of bug 1374364
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 10.0 (Newton)
Assignee: Virtualization Maintenance
QA Contact: Shai Revivo
URL:
Whiteboard:
Depends On: 1372763 1376542
Blocks: 1374364 1374365 1374366 1374367 1374368 1374369
Reported: 2016-08-31 13:38 UTC by Jeremy
Modified: 2021-12-10 15:01 UTC (History)
CC List: 25 users

Fixed In Version: qemu-kvm-rhev-2.6.0-25.el7
Doc Type: Bug Fix
Doc Text:
The fix for CVE-2016-5403 caused migrating guest instances to fail with a "Virtqueue size exceeded" error message. With this update, the value of the virtualization queue is recalculated after the migration, and the described problem no longer occurs.
Clone Of:
Clones: 1372763 1374364 1374365 1374366 1374367 1374368 1374369
Environment:
Last Closed: 2016-10-13 13:42:34 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-7815 0 None None None 2021-12-10 15:01:46 UTC
Red Hat Knowledge Base (Solution) 2598111 0 None None None 2016-09-01 16:57:20 UTC

Description Jeremy 2016-08-31 13:38:48 UTC
Description of problem: RHSA-2016-1756 breaks migration of instances.
OpenStack instances that are migrated to a new host are shut down. The error 'Virtqueue size exceeded' appears in /var/log/libvirt/qemu/instance-name.

Other reports of this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1358359
https://www.redhat.com/archives/libvir-list/2016-August/msg00406.html
https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg02666.html

Version-Release number of selected component (if applicable):
openstack-nova-compute-12.0.4-4.el7ost.noarch 
qemu-img-rhev-2.3.0-31.el7_2.21.x86_64 
qemu-kvm-rhev-2.3.0-31.el7_2.21.x86_64 


How reproducible:
100%

Steps to Reproduce:
1. Apply the RHSA-2016-1756 update mentioned above
2. Start an instance migration
3. Observe that the migration fails

Actual results:
Instance migration fails with 'Virtqueue size exceeded' in the logs

Expected results:
instance migration succeeds 

Additional info:
As mentioned in the email thread above, migration works with the CirrOS image but fails with a CentOS or Ubuntu image.
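
To check whether a compute host has already hit this failure, the per-instance QEMU logs can be searched for the error string quoted in the description. The following is a minimal sketch, not part of the original report: the function name find_affected_logs is hypothetical, and the "*.log" file pattern is an assumption about how the files under /var/log/libvirt/qemu are named.

```python
from pathlib import Path

# Error string quoted in the bug description.
ERROR_TEXT = "Virtqueue size exceeded"


def find_affected_logs(log_dir="/var/log/libvirt/qemu"):
    """Return the sorted paths of log files in log_dir that contain ERROR_TEXT.

    Assumption: each instance has one "<instance-name>.log" file in log_dir.
    """
    affected = []
    for log_file in sorted(Path(log_dir).glob("*.log")):
        try:
            text = log_file.read_text(errors="replace")
        except OSError:
            # Unreadable file (for example, missing permissions): skip it.
            continue
        if ERROR_TEXT in text:
            affected.append(str(log_file))
    return affected


if __name__ == "__main__":
    for path in find_affected_logs():
        print(path)
```

Reading these logs usually requires root privileges on the compute host, on both the source and the destination of the migration.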

Comment 3 Moshe Levi 2016-08-31 14:44:08 UTC
It seems that this works with QEMU 2.6, but backporting the patch to older versions breaks things.
Ubuntu has already reverted the patch in 14.04 and 16.04. See:
https://www.redhat.com/archives/libvir-list/2016-August/msg01287.html
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1612089

Comment 5 Sahid Ferdjaoui 2016-09-02 13:08:27 UTC
It's not entirely clear to me whether this issue occurs only when statistics are enabled for the balloon device. According to [1], that seems to be the case. A possible workaround is to tell Nova not to enable that feature. For the libvirt driver, the config option 'mem_stats_period_seconds' can be set to 0.

  mem_stats_period_seconds = 0

This issue is mostly related to the version of QEMU we ship for RHEL7 [2]. We probably have to report a regression against that component since, at this stage of our understanding of the bug, the compute team can't really fix it.

[1] https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1612089
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1358359

Comment 8 Sahid Ferdjaoui 2016-09-05 12:10:07 UTC
I reproduced this issue on RHOS7. It is even more critical than reported, since the guest disappears entirely and there is no way to retrieve it from Nova.

I can also confirm that the workaround for Nova works: disable statistics reporting for the memory balloon device.

  mem_stats_period_seconds = 0
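
The workaround above can also be applied with a small script. The following is a minimal sketch, assuming the option belongs in the [libvirt] section of nova.conf, where the libvirt driver options live. The function name disable_balloon_stats is hypothetical. Note that configparser drops comments and collapses repeated options when it rewrites a file, so review the result before replacing a real nova.conf.

```python
import configparser
import io


def disable_balloon_stats(conf_text):
    """Return conf_text with [libvirt] mem_stats_period_seconds set to 0.

    Works on the file contents as a string, so the caller decides where to
    read the configuration from and where to write the result.
    """
    # interpolation=None keeps literal '%' characters intact;
    # strict=False tolerates repeated sections or options.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    if not parser.has_section("libvirt"):
        parser.add_section("libvirt")
    parser.set("libvirt", "mem_stats_period_seconds", "0")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    sample = "[libvirt]\nvirt_type = kvm\nmem_stats_period_seconds = 10\n"
    print(disable_balloon_stats(sample))
```

The function only transforms text. Writing the result back to the compute node's configuration and restarting the compute service so the new value is picked up are left to the operator.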

Comment 9 Marcus Furlong 2016-09-06 06:48:18 UTC
This is also a regression on Mitaka.

Comment 10 Sahid Ferdjaoui 2016-09-07 08:27:55 UTC
I can confirm that the packages provided in bug 1372763, comment 3, fix the live migration issue in Nova (tested with RHOS7).

Comment 19 Mike Burns 2016-10-13 13:42:34 UTC

*** This bug has been marked as a duplicate of bug 1374364 ***

Comment 20 Mike Burns 2016-10-13 13:45:20 UTC
Never mind, verification should be done on bug 1374364.

Comment 22 awaugama 2017-09-07 19:04:33 UTC
Dup -- QE will decide about automating the original

