Bug 1372763 - RHSA-2016-1756 breaks migration of instances
Summary: RHSA-2016-1756 breaks migration of instances
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Stefan Hajnoczi
QA Contact: huiqingding
URL:
Whiteboard:
Depends On:
Blocks: 1371943 1374364 1374365 1374366 1374367 1374368 1374369 1374623 1376542
 
Reported: 2016-09-02 15:23 UTC by Karen Noel
Modified: 2020-01-17 15:55 UTC
CC List: 30 users

Fixed In Version: qemu-kvm-rhev-2.6.0-25.el7
Doc Type: Bug Fix
Doc Text:
The fix for CVE-2016-5403 caused the migration of guest instances to fail with a "Virtqueue size exceeded" error message. With this update, the in-use element count of each virtqueue (vq->inuse) is recalculated after the migration, and the described problem no longer occurs.
Clone Of: 1371943
Cloned As: 1374623 1376542
Environment:
Last Closed: 2016-11-07 21:33:27 UTC
Target Upstream Version:


Links
Red Hat Knowledge Base (Solution) 2598111 (Priority: None, Status: None, Summary: None), last updated 2016-09-02 15:23:39 UTC
Red Hat Product Errata RHBA-2016:2673 (Priority: normal, Status: SHIPPED_LIVE, Summary: "qemu-kvm-rhev bug fix and enhancement update"), last updated 2016-11-08 01:06:13 UTC

Description Karen Noel 2016-09-02 15:23:40 UTC
Clone for RHEL 7.3 component qemu-kvm-rhev.

+++ This bug was initially created as a clone of Bug #1371943 +++

Description of problem: RHSA-2016-1756 breaks migration of instances.
OpenStack instances that migrate to a new host are shut down. The error 'Virtqueue size exceeded' appears in /var/log/libvirt/qemu/instance-name.

Other reports about this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1358359
https://www.redhat.com/archives/libvir-list/2016-August/msg00406.html
https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg02666.html

Version-Release number of selected component (if applicable):
openstack-nova-compute-12.0.4-4.el7ost.noarch 
qemu-img-rhev-2.3.0-31.el7_2.21.x86_64 
qemu-kvm-rhev-2.3.0-31.el7_2.21.x86_64 


How reproducible:
100%

Steps to Reproduce:
1. Apply the RHSA-2016-1756 update mentioned above.
2. Start an instance migration.
3. Observe the failure.

Actual results:
Instance migration fails with 'Virtqueue size exceeded' in the logs.

Expected results:
Instance migration succeeds.

Additional info:
As mentioned in the email thread above, this works with the CirrOS image but fails with a CentOS or Ubuntu image.

--- Additional comment from Moshe Levi on 2016-08-31 10:44:08 EDT ---

It seems to work with QEMU 2.6, but backporting it to older versions breaks things.
Ubuntu has already reverted the patch in 14.04 and 16.04; see
https://www.redhat.com/archives/libvir-list/2016-August/msg01287.html
and https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1612089.

--- Additional comment from Sahid Ferdjaoui on 2016-09-02 09:08:27 EDT ---

It is not entirely clear to me whether the issue occurs only when statistics are enabled for the balloon device; according to [1], that seems to be the case. A possible workaround would be to ask Nova not to enable that feature: for the libvirt driver, the config option 'mem_stats_period_seconds' can be set to 0 in the [libvirt] section of nova.conf.

  [libvirt]
  mem_stats_period_seconds = 0
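Whether a compute node already has this workaround in place can be checked with a small helper. This is a hypothetical sketch that uses only the standard library's configparser; it assumes nova.conf is an INI-style file and only reports whether the option is explicitly set to 0. It does not change the setting or restart any service.

```python
import configparser


def balloon_stats_disabled(nova_conf_text: str) -> bool:
    """Return True if [libvirt] mem_stats_period_seconds is explicitly 0.

    Hypothetical helper: it only inspects the text it is given.
    """
    # strict=False tolerates repeated sections/keys; interpolation=None
    # leaves any '%' characters in values untouched.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(nova_conf_text)
    # fallback=None covers both a missing [libvirt] section and a missing key.
    period = parser.getint("libvirt", "mem_stats_period_seconds", fallback=None)
    return period == 0
```

On a compute node it could be called as `balloon_stats_disabled(open("/etc/nova/nova.conf").read())`. An unset option is reported as not disabled, because Nova's own default for this option is non-zero.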

This issue is mostly related to the version of QEMU we ship for RHEL 7 [2]. We probably have to report a regression against that component since, at this stage of our understanding of the bug, the Compute team cannot really fix it.

[1] https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1612089
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1358359

Comment 1 Michael S. Tsirkin 2016-09-02 16:03:43 UTC
Here's a brew build:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11696939

It backports two patches from QEMU 2.7:
 virtio: decrement vq->inuse in virtqueue_discard()
 virtio: recalculate vq->inuse after migration

Can we get confirmation of whether this fixes the issue?
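The accounting these two patches touch can be illustrated with a simplified Python model. This is not QEMU code: the class is hypothetical, its fields are only named after the C virtqueue state, and the recalculation `inuse = last_avail_idx - used_idx` (taken modulo 2^16, because virtio ring indices are free-running 16-bit counters) is an assumption about what "recalculate vq->inuse after migration" computes. It shows one way the "Virtqueue size exceeded" guard can fire when a device holds a popped element across migration and the destination starts counting from 0.

```python
RING_INDEX_MOD = 1 << 16   # virtio ring indices are free-running 16-bit counters
UINT_MASK = (1 << 32) - 1  # mimic a C 'unsigned int' counter wrapping on underflow


class VirtQueueModel:
    """Hypothetical, simplified model of per-virtqueue accounting (not QEMU code)."""

    def __init__(self, num: int) -> None:
        self.num = num            # ring size
        self.last_avail_idx = 0   # next available-ring entry the device pops
        self.used_idx = 0         # next used-ring entry the device fills
        self.inuse = 0            # popped but not yet returned to the guest

    def pop(self) -> None:
        # Guard of the kind added by the CVE-2016-5403 fix: the device may
        # never hold more elements than the ring can contain.
        if self.inuse >= self.num:
            raise RuntimeError("Virtqueue size exceeded")
        self.last_avail_idx = (self.last_avail_idx + 1) % RING_INDEX_MOD
        self.inuse += 1

    def push(self) -> None:
        # Complete one element: it moves to the used ring.
        self.used_idx = (self.used_idx + 1) % RING_INDEX_MOD
        self.inuse = (self.inuse - 1) & UINT_MASK

    def discard(self) -> None:
        # Un-pop an element. Decrementing inuse here models the
        # "decrement vq->inuse in virtqueue_discard()" patch.
        self.last_avail_idx = (self.last_avail_idx - 1) % RING_INDEX_MOD
        self.inuse = (self.inuse - 1) & UINT_MASK

    def migrate(self, recalc: bool) -> "VirtQueueModel":
        # Only the ring indices travel to the destination; inuse does not.
        dst = VirtQueueModel(self.num)
        dst.last_avail_idx = self.last_avail_idx
        dst.used_idx = self.used_idx
        if recalc:
            # Models "recalculate vq->inuse after migration" (assumed formula).
            dst.inuse = (dst.last_avail_idx - dst.used_idx) % RING_INDEX_MOD
        return dst


def resume_after_migration(recalc: bool) -> int:
    """Device holds one popped element (e.g. a failed request) at migration time."""
    src = VirtQueueModel(num=128)
    src.pop()                      # element held by the device on the source
    dst = src.migrate(recalc=recalc)
    dst.push()                     # destination completes the held element
    dst.pop()                      # next guest request after 'c'
    return dst.inuse
```

With `recalc=False` the destination's counter starts at 0, completing the held element wraps it to a huge unsigned value, and the next pop trips the guard. With `recalc=True` the count is rebuilt from the migrated indices and the queue keeps working. Taking the difference modulo 2^16 keeps the count correct after index wraparound: a queue with `last_avail_idx == 2` and `used_idx == 65535` has 3 elements in flight.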

Comment 3 Stefan Hajnoczi 2016-09-06 15:21:29 UTC
I have posted a backport for RHEL 7.2.z similar to Michael's:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11706790

Additional test scenarios:

1. virtio-balloon stats virtqueue test

$ qemu-img create -f qcow2 -b test.img test.qcow2
$ qemu-system-x86_64 -enable-kvm -m 1024 -cpu host -drive if=virtio,cache=none,format=qcow2,file=test.qcow2 -device virtio-balloon-pci,id=virtio-balloon0 -S
(qemu) qom-set virtio-balloon0 guest-stats-polling-interval 5
(qemu) c
...let it boot and log in on the console...
(qemu) savevm
(qemu) quit
$ qemu-system-x86_64 -enable-kvm -m 1024 -cpu host -drive if=virtio,cache=none,format=qcow2,file=test.qcow2 -device virtio-balloon-pci,id=virtio-balloon0 -S
(qemu) qom-set virtio-balloon0 guest-stats-polling-interval 5
(qemu) loadvm 1
(qemu) c
$ rm test.qcow2

Expected behavior:
Guest state is loaded and resumes successfully.

Actual behavior:
"Virtqueue size exceeded" error from QEMU and the guest is terminated after the
'c' monitor command is issued.


2. virtio-blk s->rq test

$ sudo qemu-img create -f qcow2 /dev/testvg/testlv 10G
shell1$ sudo qemu-system-x86_64 -enable-kvm -m 1024 -cpu host -drive if=virtio,cache=none,format=raw,file=rhel72.img -drive if=virtio,cache=none,format=qcow2,file=/dev/testvg/testlv,werror=stop
guest# dd if=/dev/zero of=/dev/vdb oflag=direct bs=4k
shell2$ sudo qemu-system-x86_64 -enable-kvm -m 1024 -cpu host -drive if=virtio,cache=none,format=raw,file=rhel72.img -drive if=virtio,cache=none,format=qcow2,file=/dev/testvg/testlv,werror=stop -incoming tcp::1234
(qemu1) migrate tcp:127.0.0.1:1234
$ sudo lvresize -L +4M /dev/testvg/testlv
(qemu2) c

Expected behavior:
Guest resumes successfully after 'c' monitor command is issued on destination
QEMU.

Actual behavior:
"Virtqueue size exceeded" error from destination QEMU and guest is terminated
after the 'c' monitor command is issued.

Comment 7 Miroslav Rezanina 2016-09-13 12:49:27 UTC
Fix included in qemu-kvm-rhev-2.6.0-25.el7

Comment 11 huiqingding 2016-09-18 05:00:57 UTC
Thanks, Stefan.

Reproduced this bug with the following versions:
kernel-3.10.0-505.el7.x86_64
qemu-kvm-rhev-2.6.0-24.el7.x86_64

Steps to reproduce:
1. Create a 4M LV:
# pvcreate /dev/sdg
# vgcreate testvg /dev/sdg
# lvcreate -L 4M -T testvg/testlv
# lvs
  LV     VG                  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  home   rhel_hp-dl380pg8-09 -wi-ao---- 212.61g                                                    
  root   rhel_hp-dl380pg8-09 -wi-ao----  50.00g                                                    
  swap   rhel_hp-dl380pg8-09 -wi-ao----  15.75g                                                    
  testlv testvg              twi-a-tz--   4.00m             0.00   0.88  

2. Create a qcow2 data disk image on the above LV:
# qemu-img create -f qcow2 /dev/testvg/testlv 10G

3. Boot a RHEL 7.3 guest with the above data disk image:
# /usr/libexec/qemu-kvm \
 -S \
 -name 'rhel7.3' \
 -machine pc-i440fx-rhel7.3.0 \
 -m 4096 \
 -smp 4,maxcpus=4,sockets=1,cores=4,threads=1 \
 -cpu SandyBridge \
 -rtc base=localtime,clock=host,driftfix=slew \
 -nodefaults \
 -boot menu=on \
 -enable-kvm \
 -monitor stdio \
 -spice port=5900,disable-ticketing \
 -drive file=/home/rhel7.3.qcow2,format=qcow2,id=drive_sysdisk,if=none,cache=none,aio=native,werror=stop,rerror=stop \
 -device scsi-hd,drive=drive_sysdisk,bus=scsi_pci_bus0.0,id=device_sysdisk,bootindex=1 \
 -drive if=none,cache=none,format=qcow2,file=/dev/testvg/testlv,werror=stop,id=drive-virtio-disk0 \
 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk0,id=virtio-disk0

4. On the same host, boot the destination RHEL 7.3 guest with the same command line plus "-incoming tcp:0:5800".

5. Inside the guest:
# dd if=/dev/zero of=/dev/vdb oflag=direct bs=4k

6. After the guest is paused with io-error, start the migration:
(qemu) info status
VM status: paused (io-error)
(qemu) migrate -d tcp:0:5800

7. On the host, grow the logical volume by 4 MB:
# lvresize -L +4M /dev/testvg/testlv

8. On the destination, resume the guest:
(qemu) c

After step 8, the destination QEMU reports a "Virtqueue size exceeded" error and qemu-kvm quits.

Verified this bug with the following versions:
kernel-3.10.0-505.el7.x86_64
qemu-kvm-rhev-2.6.0-25.el7.x86_64

Running the same test, after step 8 the destination qemu-kvm does not quit and the guest resumes normally.
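The reproducer relies on the qcow2 image's 10G virtual size being far larger than the 4M logical volume that stores it: the guest's dd runs out of host space, the write fails, werror=stop pauses the guest with io-error, and the lvresize in step 7 gives the destination room to retry. A back-of-the-envelope check of the sizes from the steps above (an upper bound only, since qcow2 metadata also occupies part of the LV):

```python
# Sizes taken from the reproduction steps above.
MIB = 1024 * 1024
GIB = 1024 * MIB

lv_size = 4 * MIB        # step 1: lvcreate -L 4M
virtual_size = 10 * GIB  # step 2: qemu-img create -f qcow2 ... 10G
block_size = 4 * 1024    # step 5: dd ... bs=4k

# At most this many 4 KiB guest writes can land on the LV before it is full.
# qcow2 metadata (header, L1/L2 and refcount tables) also lives on the LV,
# so the real figure is lower.
max_writes_on_lv = lv_size // block_size

# Number of 4 KiB writes dd would need to fill the whole virtual disk.
writes_to_fill_disk = virtual_size // block_size

print(max_writes_on_lv, writes_to_fill_disk)  # 1024 2621440
```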

Comment 12 huiqingding 2016-09-18 05:02:17 UTC
Based on comment 11, setting this bug to VERIFIED.

Comment 14 errata-xmlrpc 2016-11-07 21:33:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html

