Bug 1376542

Summary: RHSA-2016-1756 breaks migration of instances
Product: Red Hat Enterprise Linux 7
Reporter: Stefan Hajnoczi <stefanha>
Component: qemu-kvm
Assignee: Stefan Hajnoczi <stefanha>
Status: CLOSED ERRATA
QA Contact: huiqingding <huding>
Severity: high
Docs Contact:
Priority: urgent
Version: 7.3
CC: amedeo.salvati, aperotti, areis, berrange, blake.c.anderson, chayang, c.hendrickson09, cww, dasmith, eglynn, furlongm, huding, jherrman, jmelvin, juzhang, kamfonik, kchamart, knoel, lmiksik, mkolaja, moshele, qizhu, rbalakri, rbryant, sbauza, sferdjao, sgordon, srevivo, stefanha, virt-maint, vromanso
Target Milestone: rc
Keywords: Regression, ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-1.5.3-126.el7
Doc Type: Bug Fix
Doc Text:
The fix for CVE-2016-5403 caused migrating guest instances to fail with a "Virtqueue size exceeded" error message. With this update, the value of the virtualization queue is recalculated after the migration, and the described problem no longer occurs.
Story Points: ---
Clone Of: 1372763
: 1380306
Environment:
Last Closed: 2016-11-03 20:03:09 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1372763
Bug Blocks: 1371943, 1374364, 1374365, 1374366, 1374367, 1374368, 1374369, 1374623, 1380306

Comment 2 Stefan Hajnoczi 2016-09-16 08:50:45 UTC
Please verify as follows:

Create a 4 MB (!) LVM volume on the host.  I have called it /dev/testvg/testlv.
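
One possible way to create such a volume is sketched below. It assumes that a volume group named testvg already exists on the host (the comment only gives the resulting device path), and the variable names are illustrative. The script only prints the command so it can be reviewed before being run as root.

```shell
# Assumption: a volume group called "testvg" already exists on the host.
vg=testvg
lv=testlv
size=4M   # deliberately tiny so the qcow2 image runs out of space quickly

# Build the command as a string and print it for review; run it manually as root.
create_cmd="lvcreate -L $size -n $lv $vg"
echo "sudo $create_cmd"
echo "device: /dev/$vg/$lv"
```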

Now create qcow2 metadata on the LVM volume:

$ sudo qemu-img create -f qcow2 /dev/testvg/testlv 10G

(The reason for using an LVM volume instead of a regular file is that its size is fixed.  Since the LVM volume cannot grow automatically we can cause ENOSPC to happen when the guest writes to it.)
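
To get a rough feel for how quickly the volume fills up, the arithmetic can be sketched as follows. This assumes the qcow2 default cluster size of 64 KiB (no cluster_size option is given in the qemu-img command above) and ignores the few clusters that qcow2 metadata already occupies, so the result is only an upper bound.

```shell
# Assumption: qcow2 uses its default cluster size of 64 KiB.
volume_bytes=$((4 * 1024 * 1024))   # size of the LVM volume
cluster_bytes=$((64 * 1024))
write_bytes=4096                    # dd bs=4k

total_clusters=$((volume_bytes / cluster_bytes))
writes_per_cluster=$((cluster_bytes / write_bytes))
# Upper bound only: some clusters are already used by qcow2 metadata.
max_writes=$((total_clusters * writes_per_cluster))

echo "clusters on the volume: $total_clusters"
echo "upper bound on 4k writes before ENOSPC: $max_writes"
```

With these numbers the volume holds 64 clusters, so the sequential dd can complete at most 1024 writes of 4 KiB before an allocation fails.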

shell1$ sudo qemu-system-x86_64 -enable-kvm -m 1024 -cpu host -drive if=virtio,cache=none,format=raw,file=rhel72.img -drive if=virtio,cache=none,format=qcow2,file=/dev/testvg/testlv,werror=stop

guest# dd if=/dev/zero of=/dev/vdb oflag=direct bs=4k

The guest should be paused almost immediately because /dev/testvg/testlv runs out of space and returns an ENOSPC write error.

Now launch the destination QEMU for live migration (on the same host):

shell2$ sudo qemu-system-x86_64 -enable-kvm -m 1024 -cpu host -drive if=virtio,cache=none,format=raw,file=rhel72.img -drive if=virtio,cache=none,format=qcow2,file=/dev/testvg/testlv,werror=stop -incoming tcp::1234
(qemu1) migrate tcp:127.0.0.1:1234
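
The two command lines above differ only in the -incoming option, and the migrate command is typed in the monitor of the source QEMU. A small sketch that derives both from one shared option string makes that explicit; the variable names are illustrative and the script only prints the commands.

```shell
# Options shared by both QEMU processes, copied from the commands above.
common="qemu-system-x86_64 -enable-kvm -m 1024 -cpu host"
common="$common -drive if=virtio,cache=none,format=raw,file=rhel72.img"
common="$common -drive if=virtio,cache=none,format=qcow2,file=/dev/testvg/testlv,werror=stop"

port=1234
src_cmd="sudo $common"
dst_cmd="sudo $common -incoming tcp::$port"
migrate_cmd="migrate tcp:127.0.0.1:$port"   # typed in the source QEMU monitor

echo "shell1\$ $src_cmd"
echo "shell2\$ $dst_cmd"
echo "(qemu1) $migrate_cmd"
```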

After migration has completed the guest is still paused.  Let's grow the LVM volume so the failed write request can be retried:

$ sudo lvresize -L +4M /dev/testvg/testlv
(qemu2) c

Expected behavior:
The guest resumes successfully when the 'c' monitor command is issued on the destination QEMU.  Note that it will probably pause again very soon because the LVM volume runs out of space again.

Actual behavior:
The destination QEMU reports a "Virtqueue size exceeded" error and the guest is terminated after the 'c' monitor command is issued.
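
If the destination QEMU's stderr is captured to a file, the failure can be checked for mechanically. The helper below is a hypothetical sketch, and the log file name in the usage line is an assumption. Note that a missing error message does not prove the guest resumed, so the monitor should still be checked.

```shell
# Hypothetical helper: decide pass/fail from the destination QEMU's captured stderr.
# Assumption: stderr of the destination process was redirected to the given file.
check_destination_log() {
    if grep -q "Virtqueue size exceeded" "$1"; then
        echo "FAIL: bug reproduced"
        return 1
    fi
    echo "PASS: no virtqueue error logged"
}
```

Usage, assuming the log was written to qemu-dst.log: check_destination_log qemu-dst.log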

Comment 4 Miroslav Rezanina 2016-09-20 16:03:10 UTC
Fix included in qemu-kvm-1.5.3-126.el7
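
A quick way to tell whether a given build is at or past this version is sketched below. The helper is hypothetical and only understands package strings of the form qemu-kvm-1.5.3-N.el7...; z-stream builds use their own release numbering, so the comparison is not meaningful for them.

```shell
# Hypothetical helper: extract the numeric release from a package string such as
# "qemu-kvm-1.5.3-126.el7.x86_64" and compare it with the first fixed build (126).
has_fix() {
    release=${1#qemu-kvm-1.5.3-}   # strip the name-version prefix
    release=${release%%.*}         # keep only the leading release number
    [ "$release" -ge 126 ] 2>/dev/null
}

for pkg in qemu-kvm-1.5.3-125.el7.x86_64 qemu-kvm-1.5.3-126.el7.x86_64; do
    if has_fix "$pkg"; then
        echo "$pkg: contains the fix"
    else
        echo "$pkg: predates the fix"
    fi
done
```

On a real host the argument could come from the package database, for example has_fix "$(rpm -q qemu-kvm)".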

Comment 6 huiqingding 2016-09-21 03:21:58 UTC
Reproduced this bug with the following versions:
kernel-3.10.0-509.el7.x86_64
qemu-kvm-1.5.3-125.el7.x86_64

Ran the test from comment #2, following the detailed steps in bz1372763 comment #11. After migration, typing "c" in the destination qemu-kvm monitor does not resume the guest; the destination qemu-kvm quits with the error "Virtqueue size exceeded".


Verified this bug with the following versions:
kernel-3.10.0-509.el7.x86_64
qemu-kvm-1.5.3-126.el7.x86_64

Ran the test from comment #2, following the detailed steps in bz1372763 comment #11. After migration, typing "c" in the destination qemu-kvm monitor resumes the guest normally, and the guest is then paused with io-error.

Comment 7 huiqingding 2016-09-21 03:23:05 UTC
Based on comment 6, setting this bug to VERIFIED.

Comment 10 errata-xmlrpc 2016-11-03 20:03:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2585.html

Comment 11 errata-xmlrpc 2016-11-03 21:51:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2585.html