Bug 1116729 - Backport qemu_bh_schedule() race condition fix
Summary: Backport qemu_bh_schedule() race condition fix
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.1
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: rc
: ---
Assignee: John Snow
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 1116728
Blocks:
TreeView+ depends on / blocked
 
Reported: 2014-07-07 07:53 UTC by Stefan Hajnoczi
Modified: 2015-03-05 09:47 UTC (History)
11 users (show)

Fixed In Version: qemu 2.1
Doc Type: Bug Fix
Doc Text:
Clone Of: 1116728
Environment:
Last Closed: 2015-03-05 09:47:42 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2015:0624 normal SHIPPED_LIVE Important: qemu-kvm-rhev security, bug fix, and enhancement update 2015-03-05 14:37:36 UTC

Description Stefan Hajnoczi 2014-07-07 07:53:43 UTC
+++ This bug was initially created as a clone of Bug #1116728 +++

There is a race condition in qemu_bh_schedule() that was fixed upstream:

commit 924fe1293c3e7a3c787bbdfb351e7f168caee3e9
Author: Stefan Hajnoczi <stefanha@redhat.com>
Date:   Tue Jun 3 11:21:01 2014 +0200

    aio: fix qemu_bh_schedule() bh->ctx race condition
    
    qemu_bh_schedule() is supposed to be thread-safe at least the first time
    it is called.  Unfortunately this is not quite true:
    
      bh->scheduled = 1;
      aio_notify(bh->ctx);
    
    Since another thread may run the BH callback once it has been scheduled,
    there is a race condition if the callback frees the BH before
    aio_notify(bh->ctx) has a chance to run.
    
    Reported-by: Stefan Priebe <s.priebe@profihost.ag>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
    Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
    Tested-by: Stefan Priebe <s.priebe@profihost.ag>

Upstream both block/rbd.c and block/gluster.c were affected by this race condition and users reported periodic crashes.  In rhev7 (based on QEMU 2.0.0) these block drivers are affected (unlike rhel7!).

Backporting this fix will improve the stability of guests using Gluster or RBD.  It may be hard to test this fix since the crash only happens rarely due to a race condition.

Comment 2 John Snow 2014-07-15 16:17:12 UTC
as noted, this was fixed upstream by 924fe1293c3e7a3c787bbdfb351e7f168caee3e9 so it will be included via the 2.1 rebase.

Comment 3 FuXiangChun 2014-07-22 10:03:34 UTC
Stefan,
would you please provide QE an effective way or reproducer to reproduce this bug? otherwise, QE don't know how to trigger/verify it.  Thanks!

Comment 4 Stefan Hajnoczi 2014-07-23 14:14:20 UTC
(In reply to FuXiangChun from comment #3)
> Stefan,
> would you please provide QE an effective way or reproducer to reproduce this
> bug? otherwise, QE don't know how to trigger/verify it.  Thanks!

There is no good test available for this race condition.  I think we'll have to rely on code review for this one.

Comment 5 FuXiangChun 2014-07-28 02:50:17 UTC
Stefan,
QE didn't find related patch from code. Can you confirm it?  This is my testing command. If it is wrong. please correct me. Thanks.

# rpm -qpi qemu-kvm-rhev-2.1.0-3.el7ev.preview.src.rpm --changelog|grep 1116729
nothing

Comment 6 Stefan Hajnoczi 2014-07-28 09:07:58 UTC
qemu-kvm-rhev-2.1.0-3.el7ev.preview.src.rpm is based on upstream an QEMU 2.1.0 release candidate.  It includes the relevant commit.  Since the commit comes directly from upstream instead of a backport, there is no Red Hat bugzilla information:

commit 924fe1293c3e7a3c787bbdfb351e7f168caee3e9
Author: Stefan Hajnoczi <stefanha@redhat.com>
Date:   Tue Jun 3 11:21:01 2014 +0200

    aio: fix qemu_bh_schedule() bh->ctx race condition

Looks good.

Comment 8 FuXiangChun 2014-08-28 13:41:25 UTC
verified with qemu-kvm-rhev-2.1.0-2.el7.src.rpm
double-check qemu_bh_schedule() in async.c source code.

void qemu_bh_schedule(QEMUBH *bh) 
{
    AioContext *ctx;

    if (bh->scheduled)
        return;
    ctx = bh->ctx;
    bh->idle = 0;
    /* Make sure that:
     * 1. idle & any writes needed by the callback are done before the
     *    locations are read in the aio_bh_poll.
     * 2. ctx is loaded before scheduled is set and the callback has a chance
     *    to execute.
     */
    smp_mb();
    bh->scheduled = 1;
    aio_notify(ctx);
}

According to comment6, This bug is fixed.

Comment 11 errata-xmlrpc 2015-03-05 09:47:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0624.html


Note You need to log in before you can comment on or make changes to this bug.