Bug 1372763 - RHSA-2016-1756 breaks migration of instances
Summary: RHSA-2016-1756 breaks migration of instances
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Stefan Hajnoczi
QA Contact: huiqingding
URL:
Whiteboard:
Depends On:
Blocks: 1371943 1374364 1374365 1374366 1374367 1374368 1374369 1374623 1376542
 
Reported: 2016-09-02 15:23 UTC by Karen Noel
Modified: 2021-08-30 11:39 UTC
CC: 30 users

Fixed In Version: qemu-kvm-rhev-2.6.0-25.el7
Doc Type: Bug Fix
Doc Text:
The fix for CVE-2016-5403 caused migrating guest instances to fail with a "Virtqueue size exceeded" error message. With this update, the number of in-flight buffers in the virtqueue is recalculated after migration, and the described problem no longer occurs.
Clone Of: 1371943
: 1374623 1376542 (view as bug list)
Environment:
Last Closed: 2016-11-07 21:33:27 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 2598111 0 None None None 2016-09-02 15:23:39 UTC
Red Hat Product Errata RHBA-2016:2673 0 normal SHIPPED_LIVE qemu-kvm-rhev bug fix and enhancement update 2016-11-08 01:06:13 UTC

Description Karen Noel 2016-09-02 15:23:40 UTC
Clone for RHEL 7.3 component qemu-kvm-rhev.

+++ This bug was initially created as a clone of Bug #1371943 +++

Description of problem: RHSA-2016-1756 breaks migration of instances.
OpenStack instances that migrate to a new host are shut down. The error 'Virtqueue size exceeded' appears in /var/log/libvirt/qemu/instance-name.

Other reports about this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1358359
https://www.redhat.com/archives/libvir-list/2016-August/msg00406.html
https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg02666.html

Version-Release number of selected component (if applicable):
openstack-nova-compute-12.0.4-4.el7ost.noarch 
qemu-img-rhev-2.3.0-31.el7_2.21.x86_64 
qemu-kvm-rhev-2.3.0-31.el7_2.21.x86_64 


How reproducible:
100%

Steps to Reproduce:
1. apply patch mentioned above
2. start instance migration
3. notice failure

Actual results:
instance migration fails with 'Virtqueue size exceeded' in the logs

Expected results:
instance migration succeeds 

Additional info:
As mentioned in the email thread above, this works with the cirros image but fails with a centos or ubuntu image.

--- Additional comment from Moshe Levi on 2016-08-31 10:44:08 EDT ---

It seems that it works with QEMU 2.6, but back-porting the patch to older versions breaks things.
Ubuntu has already reverted the patch in 14.04 and 16.04; see
https://www.redhat.com/archives/libvir-list/2016-August/msg01287.html
and https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1612089.

--- Additional comment from Sahid Ferdjaoui on 2016-09-02 09:08:27 EDT ---

It's not totally clear to me whether this issue occurs only when statistics are enabled for the balloon device. According to [1] that seems to be the case. A possible workaround would be to ask Nova not to enable that feature. For the libvirt driver, the config option 'mem_stats_period_seconds' can be set to 0.

  mem_stats_period_seconds = 0

This issue is mostly related to the version of QEMU we are shipping for RHEL 7 [2]. We probably have to report a regression for that component since, at this stage of our understanding of the bug, the compute team can't really fix it.

[1] https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1612089
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1358359
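
The workaround Sahid describes can be sketched as a nova.conf fragment. This is an illustrative sketch, not configuration taken from this bug; the [libvirt] section is where Nova's libvirt driver options normally live:

```ini
# Illustrative nova.conf fragment for the workaround above:
# disable virtio-balloon memory statistics polling so the stats
# virtqueue is never exercised during migration.
[libvirt]
mem_stats_period_seconds = 0
```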

Comment 1 Michael S. Tsirkin 2016-09-02 16:03:43 UTC
Here's a brew build:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11696939

backporting two patches:
 virtio: decrement vq->inuse in virtqueue_discard()
 virtio: recalculate vq->inuse after migration
from 2.7.

can we get a confirmation on whether this fixes the issues?

Comment 3 Stefan Hajnoczi 2016-09-06 15:21:29 UTC
I have posted a backport for RHEL 7.2.z similar to Michael's:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11706790

Additional test scenarios:

1. virtio-balloon stats virtqueue test

$ qemu-img create -f qcow2 -b test.img test.qcow2
$ qemu-system-x86_64 -enable-kvm -m 1024 -cpu host -drive if=virtio,cache=none,format=qcow2,file=test.qcow2 -device virtio-balloon-pci,id=virtio-balloon0 -S
(qemu) qom-set virtio-balloon guest-stats-polling-interval 5
(qemu) c
...let it boot and log in on the console...
(qemu) savevm
(qemu) quit
$ qemu-system-x86_64 -enable-kvm -m 1024 -cpu host -drive if=virtio,cache=none,format=qcow2,file=test.qcow2 -device virtio-balloon-pci,id=virtio-balloon0 -S
(qemu) qom-set virtio-balloon guest-stats-polling-interval 5
(qemu) loadvm 1
(qemu) c
$ rm test.qcow2

Expected behavior:
Guest state is loaded and resumes successfully.

Actual behavior:
"Virtqueue size exceeded" error from QEMU and the guest is terminated after the
'c' monitor command is issued.


2. virtio-blk s->rq test

$ sudo qemu-img create -f qcow2 /dev/testvg/testlv 10G
shell1$ sudo qemu-system-x86_64 -enable-kvm -m 1024 -cpu host -drive if=virtio,cache=none,format=raw,file=rhel72.img -drive if=virtio,cache=none,format=qcow2,file=/dev/testvg/testlv,werror=stop
guest# dd if=/dev/zero of=/dev/vdb oflag=direct bs=4k
shell2$ sudo qemu-system-x86_64 -enable-kvm -m 1024 -cpu host -drive if=virtio,cache=none,format=raw,file=rhel72.img -drive if=virtio,cache=none,format=qcow2,file=/dev/testvg/testlv,werror=stop -incoming tcp::1234
(qemu1) migrate tcp:127.0.0.1:1234
$ sudo lvresize -L +4M /dev/testvg/testlv
(qemu2) c

Expected behavior:
Guest resumes successfully after 'c' monitor command is issued on destination
QEMU.

Actual behavior:
"Virtqueue size exceeded" error from destination QEMU and guest is terminated
after the 'c' monitor command is issued.

Comment 7 Miroslav Rezanina 2016-09-13 12:49:27 UTC
Fix included in qemu-kvm-rhev-2.6.0-25.el7

Comment 11 huiqingding 2016-09-18 05:00:57 UTC
Thanks, Stefan.

Reproduce this bug using the following version:
kernel-3.10.0-505.el7.x86_64
qemu-kvm-rhev-2.6.0-24.el7.x86_64

Reproduce steps:
1. create a 4M lv
# pvcreate /dev/sdg
# vgcreate testvg /dev/sdg
# lvcreate -L 4M -T testvg/testlv
# lvs
  LV     VG                  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  home   rhel_hp-dl380pg8-09 -wi-ao---- 212.61g                                                    
  root   rhel_hp-dl380pg8-09 -wi-ao----  50.00g                                                    
  swap   rhel_hp-dl380pg8-09 -wi-ao----  15.75g                                                    
  testlv testvg              twi-a-tz--   4.00m             0.00   0.88  

2. create a data disk image based on the above lv
# qemu-img create -f qcow2 /dev/testvg/testlv 10G

3. boot a rhel7.3 guest with the above data disk image
# /usr/libexec/qemu-kvm \
 -S \
 -name 'rhel7.3' \
 -machine pc-i440fx-rhel7.3.0 \
 -m 4096 \
 -smp 4,maxcpus=4,sockets=1,cores=4,threads=1 \
 -cpu SandyBridge \
 -rtc base=localtime,clock=host,driftfix=slew \
 -nodefaults \
 -boot menu=on \
 -enable-kvm \
 -monitor stdio \
 -spice port=5900,disable-ticketing \
 -drive file=/home/rhel7.3.qcow2,format=qcow2,id=drive_sysdisk,if=none,cache=none,aio=native,werror=stop,rerror=stop \
 -device scsi-hd,drive=drive_sysdisk,bus=scsi_pci_bus0.0,id=device_sysdisk,bootindex=1 \
  -drive if=none,cache=none,format=qcow2,file=/dev/testvg/testlv,werror=stop,id=drive-virtio-disk0 \
 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk0,id=virtio-disk0 \

4. on the same host, use the same command line with "-incoming tcp:0:5800", boot the rhel7.3 guest

5. inside guest
# dd if=/dev/zero of=/dev/vdb oflag=direct bs=4k

6. after guest is paused with io-error, do migration
(qemu) info status
VM status: paused (io-error)
(qemu) migrate -d tcp:0:5800

7. on host, grow the logical volume by 4 MB
# lvresize -L +4M /dev/testvg/testlv

8. in destination, resume the guest
(qemu) c

After step 8, destination QEMU reports "Virtqueue size exceeded" and qemu-kvm quits.

Verify this bug using the following version:
kernel-3.10.0-505.el7.x86_64
qemu-kvm-rhev-2.6.0-25.el7.x86_64

Repeating the above test, after step 8 the destination qemu-kvm did not quit and the guest resumed normally.

Comment 12 huiqingding 2016-09-18 05:02:17 UTC
Based on comment #11, set this bug to be verified.

Comment 14 errata-xmlrpc 2016-11-07 21:33:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html

