Bug 1378788

Summary: Race condition during virtio-blk dataplane stop triggers "Virtqueue size exceeded"
Product: Red Hat Enterprise Linux 7
Reporter: Stefan Hajnoczi <stefanha>
Component: qemu-kvm-rhev
Assignee: Stefan Hajnoczi <stefanha>
Status: CLOSED ERRATA
QA Contact: lijin <lijin>
Severity: high
Docs Contact:
Priority: high
Version: 7.2
CC: chayang, huding, jentrena, jherrman, juzhang, lijin, mas-hatada, mrezanin, mst, virt-maint
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: QEMU 2.6
Doc Type: Bug Fix
Doc Text:
Due to a race condition in the virtio-blk dataplane, live migration of a guest in some cases failed with a "Virtqueue size exceeded" error message. This update prevents the race condition from occurring, and thus allows live migration to work more reliably.
Story Points: ---
Clone Of:
: 1380320
Environment:
Last Closed: 2016-11-07 21:36:23 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1380320    

Description Stefan Hajnoczi 2016-09-23 09:29:03 UTC
Description of problem:
It is possible to trigger the "Virtqueue size exceeded" error on the source QEMU during live migration due to a race condition in virtio-blk dataplane.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.3.0-31.el7_2.22

How reproducible:
Non-deterministic.  Try 5-10 times.

Steps to Reproduce:

1. Launch the source QEMU: qemu-kvm -enable-kvm -m 1024 -cpu host -object iothread,id=thread0 -drive file=test.raw,if=none,id=drive0,format=raw,cache=none -device virtio-blk-pci,iothread=thread0,drive=drive0,id=virtio-blk0,bootindex=1

2. Log into the Linux guest.

3. Launch the destination QEMU: qemu-kvm -enable-kvm -m 1024 -cpu host -object iothread,id=thread0 -drive file=test.raw,if=none,id=drive0,format=raw,cache=none -device virtio-blk-pci,iothread=thread0,drive=drive0,id=virtio-blk0,bootindex=1 -incoming tcp::1234

4. Run "fio fio.job" inside the guest.  The contents of the fio.job file are:
[global]
filename=/dev/vda
ioengine=libaio
direct=1
runtime=60
ramp_time=5
gtod_reduce=1

[job]
readwrite=randread
iodepth=8
numjobs=8

5. While fio is running, migrate to the destination QEMU:
(qemu) migrate tcp:127.0.0.1:1234
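
Because the failure is non-deterministic and needs several attempts, it can help to generate the command lines from one place so that the source and destination QEMU stay identical apart from -incoming. The following is only a rough sketch, not part of the original report: the function names and the port and host parameters are hypothetical, and it only prints the commands rather than launching QEMU or driving the monitor.

```python
import shlex

# Arguments shared by the source and destination QEMU, copied from the
# launch commands in the steps above.
COMMON_ARGS = [
    "qemu-kvm", "-enable-kvm", "-m", "1024", "-cpu", "host",
    "-object", "iothread,id=thread0",
    "-drive", "file=test.raw,if=none,id=drive0,format=raw,cache=none",
    "-device",
    "virtio-blk-pci,iothread=thread0,drive=drive0,id=virtio-blk0,bootindex=1",
]


def source_cmd():
    """Return the argument list for the source QEMU (hypothetical helper)."""
    return list(COMMON_ARGS)


def destination_cmd(port=1234):
    """Return the argument list for the destination QEMU.

    It is the source command plus -incoming; the port default matches
    the report, the parameter itself is an assumption of this sketch.
    """
    return COMMON_ARGS + ["-incoming", f"tcp::{port}"]


def migrate_monitor_cmd(host="127.0.0.1", port=1234):
    """Return the text to type at the (qemu) monitor prompt on the source."""
    return f"migrate tcp:{host}:{port}"


if __name__ == "__main__":
    # Print the commands only; starting QEMU and running fio in the
    # guest are still done by hand as described in the steps.
    print(shlex.join(source_cmd()))
    print(shlex.join(destination_cmd()))
    print("(qemu) " + migrate_monitor_cmd())
```

Each attempt then consists of starting both printed commands, running fio in the guest, and entering the monitor command on the source while fio is still running.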

Actual results:
Occasionally, the source QEMU terminates with the "Virtqueue size exceeded" error message.

Expected results:
Live migration is successful and the fio benchmark continues running in the guest.

Additional info:
This bug does not affect RHEL 7.3 or recent upstream versions.

Comment 1 Stefan Hajnoczi 2016-09-23 09:31:05 UTC
Note this bug only affects RHEL 7.2.z.  Please clone a z-stream bug.

Comment 4 huiqingding 2016-09-28 10:57:13 UTC
I tested this bug with the latest qemu-kvm-rhev:
qemu-kvm-rhev-2.6.0-28.el7.x86_64
3.10.0-510.el7.x86_64

The test steps are the same as in comment #0. I ran the migration more than 10 times and every run passed: live migration succeeded and the fio benchmark continued running in the guest.

Comment 9 errata-xmlrpc 2016-11-07 21:36:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html