Bug 1374623

Summary: RHSA-2016-1756 breaks migration of instances
Product: Red Hat Enterprise Linux 7
Reporter: Marcel Kolaja <mkolaja>
Component: qemu-kvm-rhev
Assignee: Stefan Hajnoczi <stefanha>
Status: CLOSED ERRATA
QA Contact: huiqingding <huding>
Severity: high
Docs Contact:
Priority: urgent
Version: 7.3
CC: amedeo.salvati, aperotti, areis, berrange, blake.c.anderson, chayang, c.hendrickson09, cww, dasmith, eglynn, furlongm, huding, ipetrova, jboggs, jen, jherrman, jmelvin, juzhang, kamfonik, kchamart, knoel, lhh, lijin, lmiksik, moshele, qizhu, rbryant, sbauza, sferdjao, sgordon, sknauss, srevivo, stefanha, virt-maint, vromanso
Target Milestone: rc
Keywords: Regression, ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.3.0-31.el7_2.22
Doc Type: Bug Fix
Doc Text:
The fix for CVE-2016-5403 caused migrating guest instances to fail with a "Virtqueue size exceeded" error message. With this update, the value of the virtualization queue is recalculated after the migration, and the described problem no longer occurs.
Story Points: ---
Clone Of: 1372763
Environment:
Last Closed: 2016-11-17 15:01:23 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1372763, 1376542
Bug Blocks:

Description Marcel Kolaja 2016-09-09 08:50:04 UTC
This bug has been copied from bug #1372763 and has been proposed
to be backported to 7.2 z-stream (EUS).

Comment 4 Miroslav Rezanina 2016-09-13 12:49:44 UTC
Fix included in qemu-kvm-rhev-2.6.0-25.el7

Comment 5 Miroslav Rezanina 2016-09-14 08:08:16 UTC
Fix included in qemu-kvm-rhev-2.3.0-31.el7_2.22

Comment 7 lijin 2016-09-22 05:43:13 UTC
With qemu-kvm-rhev-2.3.0-31.el7_2.22.x86_64, I still hit the "Virtqueue size exceeded" error when migrating during disk (virtio-blk) I/O stress.

Steps:
1. Boot a win8-32 guest with a virtio-blk-pci device:
-object iothread,id=thread0 -drive file=win8-32-rhel7u2.raw,if=none,id=drive-ide0-0-0,format=raw,serial=mike_cao,cache=none -device virtio-blk-pci,iothread=thread0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 

2. Run CrystalDiskMark in the guest.

3. Start the migration.

Actual result:
The migration failed with:
src:(qemu) qemu-kvm: Virtqueue size exceeded
dst:(qemu) qemu-kvm: error while loading state section id 2(ram)
qemu-kvm: load of migration failed: Input/output error

Comment 8 Stefan Hajnoczi 2016-09-22 13:47:58 UTC
(In reply to lijin from comment #7)
> with qemu-kvm-rhev-2.3.0-31.el7_2.22.x86_64,I still hit "Virtqueue size
> exceeded" error when do migration during disk(blk) io stress.
> 
> steps:
> 1.boot win8-32 guest with virtio-blk-pci device:
> -object iothread,id=thread0 -drive
> file=win8-32-rhel7u2.raw,if=none,id=drive-ide0-0-0,format=raw,
> serial=mike_cao,cache=none -device
> virtio-blk-pci,iothread=thread0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 
> 
> 2.run CrystalDiskMark in guest
> 
> 3.do migration
> 
> actual result:
> migration failed with:
> src:(qemu) qemu-kvm: Virtqueue size exceeded
> dst:(qemu) qemu-kvm: error while loading state section id 2(ram)
> qemu-kvm: load of migration failed: Input/output error

I was able to reproduce this problem with a Linux guest running fio.  It is a different bug since the error happens on the source QEMU while the patch for this BZ fixes the destination QEMU.
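
When triaging similar reports, it can help to check which side logged the error, since that is what separates this BZ (destination) from the new failure (source). The following is only a minimal sketch: the function name and the idea of saving each side's monitor output to a log file are assumptions, not part of any QEMU tooling. The only string taken from this bug is the error message itself.

```shell
#!/bin/sh
# Hypothetical helper: report which side of a migration logged the
# "Virtqueue size exceeded" error. The log file paths are assumptions
# (for example, captured stderr/monitor output of each QEMU process).
virtqueue_error_side() {
    src_log=$1
    dst_log=$2
    pattern='Virtqueue size exceeded'
    # The source log is checked first, so "source" wins if both match.
    if grep -q "$pattern" "$src_log" 2>/dev/null; then
        echo source
    elif grep -q "$pattern" "$dst_log" 2>/dev/null; then
        echo destination
    else
        echo none
    fi
}
```

Called as `virtqueue_error_side src.log dst.log`, it prints `source`, `destination`, or `none`. For the logs in comment #7 it would print `source`, which is the case not covered by the patch for this BZ.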

Comment 9 lijin 2016-09-23 07:53:09 UTC
(In reply to Stefan Hajnoczi from comment #8)
> I was able to reproduce this problem with a Linux guest running fio.  It is
> a different bug since the error happens on the source QEMU while the patch
> for this BZ fixes the destination QEMU.

Thanks, I will file a new bug to track it.

Comment 10 lijin 2016-09-23 08:05:39 UTC
This issue reproduces on rhel7.2-z (qemu-kvm-rhev-2.3.0-31.el7_2.22.x86_64), but does NOT reproduce with the latest rhel7.3 version (qemu-kvm-rhev-2.6.0-26.el7.x86_64).

Since a z-stream bug should be cloned from a y-stream bug, I'm a little confused about how to handle this one.

Comment 11 Stefan Hajnoczi 2016-09-23 08:33:01 UTC
(In reply to lijin from comment #10)
> This issue is reproduced on
> rhel7.2-z(qemu-kvm-rhev-2.3.0-31.el7_2.22.x86_64),can NOT reproduce with
> rhel7.3 latest version(qemu-kvm-rhev-2.6.0-26.el7.x86_64)
> 
> As Z stream bug should clone from Y stream bug,I'm a little confused how to
> handle it?

Right, code inspection shows that RHEL 7.3 and upstream QEMU do not suffer from this race condition.  So we need a 7.2.z-only BZ.

I don't know the process either but I've asked on IRC.  Will update the BZ when I receive an answer.

Comment 12 Stefan Hajnoczi 2016-09-23 09:15:29 UTC
(In reply to Stefan Hajnoczi from comment #11)
> (In reply to lijin from comment #10)
> > This issue is reproduced on
> > rhel7.2-z(qemu-kvm-rhev-2.3.0-31.el7_2.22.x86_64),can NOT reproduce with
> > rhel7.3 latest version(qemu-kvm-rhev-2.6.0-26.el7.x86_64)
> > 
> > As Z stream bug should clone from Y stream bug,I'm a little confused how to
> > handle it?
> 
> Right, code inspection shows that RHEL 7.3 and upstream QEMU do not suffer
> from this race condition.  So we need a 7.2.z-only BZ.
> 
> I don't know the process either but I've asked on IRC.  Will update the BZ
> when I receive an answer.

09:46 < mrezanin> stefanha: As usual...create normal bz (for both z-stream and y-stream). Then we negotiate clone and after it y-stream is closed with proper marking and z-stream is solved
09:48 < stefanha> mrezanin: Does this mean: create a BZ with both y-stream and z-stream flags, then leave a comment saying it's relevant for z-stream only, close it as NOTABUG?
09:48 < mrezanin> stefanha: Yes...but close after cloning

I will create the BZ and CC you.

Comment 13 huiqingding 2016-10-10 03:06:34 UTC
Reproduced this bug with the following versions:
kernel-3.10.0-327.37.1.el7.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.21.x86_64

Steps to reproduce:
1. Create a 4M LV
# pvcreate /dev/sdg
# vgcreate testvg /dev/sdg
# lvcreate -L 4M -T testvg/testlv
# lvs
  LV     VG                  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  home   rhel_hp-dl380pg8-09 -wi-ao---- 212.61g                                                    
  root   rhel_hp-dl380pg8-09 -wi-ao----  50.00g                                                    
  swap   rhel_hp-dl380pg8-09 -wi-ao----  15.75g                                                    
  testlv testvg              twi-a-tz--   4.00m             0.00   0.88  

2. Create a data disk image on the above LV
# qemu-img create -f qcow2 /dev/testvg/testlv 10G

3. Boot a rhel7.3 guest with the above data disk image
# /usr/libexec/qemu-kvm \
 -S \
 -name 'rhel7.3' \
 -machine pc \
 -m 4096 \
 -smp 4,maxcpus=4,sockets=1,cores=4,threads=1 \
 -cpu SandyBridge \
 -rtc base=localtime,clock=host,driftfix=slew \
 -nodefaults \
 -boot menu=on \
 -enable-kvm \
 -monitor stdio \
 -drive file=/mnt/rhel7.3.raw,format=raw,id=drive_sysdisk,if=none,cache=none,aio=native,werror=stop,rerror=stop \
 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive_sysdisk,bootindex=1 \
 -drive if=none,cache=none,format=qcow2,file=/dev/testvg/testlv,werror=stop,id=drive-virtio-disk0 \
 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk0,id=virtio-disk0 \
 -vga qxl \
 -spice port=5900,disable-ticketing

4. On the same host, boot the rhel7.3 guest again as the migration destination, using the same command line plus "-incoming tcp:0:5800"

5. Inside the guest
# dd if=/dev/zero of=/dev/vdb oflag=direct bs=4k

6. After the guest is paused with io-error, start the migration
(qemu) info status
VM status: paused (io-error)
(qemu) migrate -d tcp:0:5800

7. On the host, grow the logical volume by 4 MB
# lvresize -L +4M /dev/testvg/testlv

8. On the destination, resume the guest
(qemu) c

After step 8, the destination QEMU reports a "Virtqueue size exceeded" error and qemu-kvm quits.

Verified this bug with the following versions:
kernel-3.10.0-327.37.1.el7.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.22.x86_64

Repeating the above test: after step 8, the destination qemu-kvm does not quit and the guest resumes normally.
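
Since the only difference between the reproducing and the verified setup is the qemu-kvm-rhev build (el7_2.21 versus el7_2.22), a host can be checked against the Fixed In Version before running the test. This is a rough sketch: the function name is hypothetical, and it relies on GNU `sort -V` as an approximation of RPM version ordering, so it is only meaningful for builds on the same 2.3.0 7.2.z stream.

```shell
#!/bin/sh
# "Fixed In Version" of this bug on the 7.2.z stream (version-release only).
FIXED_VR='2.3.0-31.el7_2.22'

# Hypothetical helper: succeed if the given version-release string is at
# or above FIXED_VR. Uses GNU sort -V, which only approximates rpm's own
# version comparison.
has_virtqueue_fix() {
    vr=$1
    # If the fixed version sorts first (or both are equal), the given
    # build is new enough.
    lowest=$(printf '%s\n%s\n' "$FIXED_VR" "$vr" | sort -V | head -n 1)
    [ "$lowest" = "$FIXED_VR" ]
}
```

On a host it could be used as `has_virtqueue_fix "$(rpm -q --qf '%{VERSION}-%{RELEASE}' qemu-kvm-rhev)" && echo fixed`. With the versions from this comment, `2.3.0-31.el7_2.21` fails the check and `2.3.0-31.el7_2.22` passes.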

Comment 14 huiqingding 2016-10-10 03:07:57 UTC
Based on comment #13, setting this bug to VERIFIED.

Comment 15 Irina Petrova 2016-10-27 09:48:49 UTC
Hey guys, 

Since qemu-kvm-rhev-2.3.0-31.el7_2.22.x86_64 has passed QA (c#13, c#14), when can we expect it to be released?

I have another customer asking for it.

Regards,
Irina

Comment 23 errata-xmlrpc 2016-11-17 15:01:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2803.html