Bug 1116728 - Backport qemu_bh_schedule() race condition fix
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: John Snow
QA Contact: Virtualization Bugs
Depends On:
Blocks: 1116729
Reported: 2014-07-07 03:50 EDT by Stefan Hajnoczi
Modified: 2015-03-05 03:10 EST

See Also:
Fixed In Version: qemu-kvm-1.5.3-67.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1116729
Environment:
Last Closed: 2015-03-05 03:10:37 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2015:0349 normal SHIPPED_LIVE Important: qemu-kvm security, bug fix, and enhancement update 2015-03-05 07:27:34 EST

Description Stefan Hajnoczi 2014-07-07 03:50:27 EDT
There is a race condition in qemu_bh_schedule() that was fixed upstream:

commit 924fe1293c3e7a3c787bbdfb351e7f168caee3e9
Author: Stefan Hajnoczi <stefanha@redhat.com>
Date:   Tue Jun 3 11:21:01 2014 +0200

    aio: fix qemu_bh_schedule() bh->ctx race condition
    
    qemu_bh_schedule() is supposed to be thread-safe at least the first time
    it is called.  Unfortunately this is not quite true:
    
      bh->scheduled = 1;
      aio_notify(bh->ctx);
    
    Since another thread may run the BH callback once it has been scheduled,
    there is a race condition if the callback frees the BH before
    aio_notify(bh->ctx) has a chance to run.
    
    Reported-by: Stefan Priebe <s.priebe@profihost.ag>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
    Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
    Tested-by: Stefan Priebe <s.priebe@profihost.ag>

Upstream, both block/rbd.c and block/gluster.c were affected by this race condition, and users reported periodic crashes.  In RHEL7 (based on QEMU 1.5.3), these block drivers use a pipe instead of a QEMUBH to signal request completion, so they are unaffected.

This fix should still be backported since it is hard to detect the presence of this bug and future RHEL7 backports may bring in code that depends on qemu_bh_schedule()'s thread-safety.
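
The race described in the commit message can be replayed deterministically in a single thread. The following is a minimal sketch, not QEMU code: the types Ctx and BH, the freed flag (a stand-in for actually freeing the BH), and the helpers read_ctx(), poll_once(), schedule_racy() and schedule_fixed() are all hypothetical names invented for illustration. poll_once() plays the role of the other thread and is called at the point where the preemption would have to happen.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for AioContext and QEMUBH. */
typedef struct Ctx { int notified; } Ctx;
typedef struct BH {
    Ctx *ctx;
    bool scheduled;
    bool freed;                 /* models free(bh) so stale access is detectable */
    void (*cb)(struct BH *);
} BH;

static int stale_reads;         /* reads of bh->ctx after the BH was "freed" */

static Ctx *read_ctx(BH *bh)
{
    if (bh->freed) {
        stale_reads++;          /* in real code this is a use-after-free */
    }
    return bh->ctx;
}

/* A one-shot callback that deletes its own BH. */
static void one_shot_cb(BH *bh)
{
    bh->freed = true;
}

/* Stand-in for the other thread: run the callback if the BH is scheduled. */
static void poll_once(BH *bh)
{
    if (bh->scheduled) {
        bh->scheduled = false;
        bh->cb(bh);
    }
}

/* Original ordering: bh->ctx is read after scheduled is set. */
static void schedule_racy(BH *bh)
{
    bh->scheduled = true;
    poll_once(bh);              /* simulated preemption by the other thread */
    read_ctx(bh)->notified++;   /* aio_notify(bh->ctx): bh is already gone */
}

/* Fixed ordering: ctx is loaded before scheduled is set. */
static void schedule_fixed(BH *bh)
{
    Ctx *ctx = read_ctx(bh);    /* bh is still owned by the caller here */
    bh->scheduled = true;
    poll_once(bh);              /* same simulated preemption */
    ctx->notified++;            /* aio_notify(ctx): no access to bh */
}
```

With a callback that deletes its own BH, schedule_racy() reads bh->ctx after the BH is gone, while schedule_fixed() never touches the BH once scheduled has been set. The sketch only models the ordering of the accesses; it says nothing about the memory barrier that the real fix also needs.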
Comment 2 FuXiangChun 2014-07-22 06:04:15 EDT
Stefan,
Would you please provide QE with an effective way to reproduce this bug? Otherwise, QE doesn't know how to trigger or verify it.  Thanks!
Comment 3 Stefan Hajnoczi 2014-07-23 10:14:45 EDT
(In reply to FuXiangChun from comment #2)
> Would you please provide QE with an effective way to reproduce this bug?
> Otherwise, QE doesn't know how to trigger or verify it.  Thanks!

There is no good test available for this race condition.  I think we'll have to rely on code review for this one.
Comment 4 Jeff Nelson 2014-08-08 12:54:56 EDT
Fix included in qemu-kvm-1.5.3-67.el7
Comment 5 FuXiangChun 2014-08-13 02:01:25 EDT
Verified with qemu-kvm-1.5.3-67.el7.src.rpm:

# rpm -qpi qemu-kvm-1.5.3-67.el7.src.rpm --changelog|grep 1116728
- kvm-aio-fix-qemu_bh_schedule-bh-ctx-race-condition.patch [bz#1116728]

Contents of kvm-aio-fix-qemu_bh_schedule-bh-ctx-race-condition.patch:

diff --git a/async.c b/async.c
index 5ce3633..d7ec1ea 100644
--- a/async.c
+++ b/async.c
@@ -117,15 +117,21 @@ void qemu_bh_schedule_idle(QEMUBH *bh)

 void qemu_bh_schedule(QEMUBH *bh)
 {
+    AioContext *ctx;
+
     if (bh->scheduled)
         return;
+    ctx = bh->ctx;
     bh->idle = 0;
-    /* Make sure that idle & any writes needed by the callback are done
-     * before the locations are read in the aio_bh_poll.
+    /* Make sure that:
+     * 1. idle & any writes needed by the callback are done before the
+     *    locations are read in the aio_bh_poll.
+     * 2. ctx is loaded before scheduled is set and the callback has a chance
+     *    to execute.
      */
-    smp_wmb();
+    smp_mb();
     bh->scheduled = 1;
-    aio_notify(bh->ctx);
+    aio_notify(ctx);
 }
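
Point 2 of the new comment is the reason smp_wmb() is no longer enough: a write barrier only orders stores against stores, while the fix needs the load of bh->ctx ordered before the store to bh->scheduled. A rough sketch of the same ordering in portable C11 atomics follows. It assumes that a sequentially consistent fence is an acceptable stand-in for smp_mb(); Ctx, BH, notify() and bh_schedule() are simplified hypothetical versions, not the QEMU definitions.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for AioContext and QEMUBH. */
typedef struct Ctx { atomic_int notified; } Ctx;
typedef struct BH {
    Ctx *ctx;
    bool idle;
    atomic_bool scheduled;
} BH;

/* Stand-in for aio_notify(): just counts wake-ups. */
static void notify(Ctx *ctx)
{
    atomic_fetch_add(&ctx->notified, 1);
}

static void bh_schedule(BH *bh)
{
    Ctx *ctx;

    if (atomic_load_explicit(&bh->scheduled, memory_order_relaxed)) {
        return;
    }
    ctx = bh->ctx;              /* load ctx while the caller still owns bh */
    bh->idle = false;
    /* Full fence (assumed equivalent of smp_mb()): orders the load of ctx
     * and the store to idle before the store to scheduled. */
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(&bh->scheduled, true, memory_order_relaxed);
    notify(ctx);                /* bh must not be dereferenced from here on */
}
```

Calling bh_schedule() twice on the same BH without an intervening poll notifies the context only once, because the second call returns early on the scheduled check.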

# rpm -qpi qemu-kvm-rhev-2.1.0-1.el7.src.rpm --changelog|grep 1116728
(no output)

Stefan,
According to this test result, this bug isn't fixed for qemu-2.1, but it is fixed for qemu-kvm-1.5.3-67.el7. Can this bug be considered verified?
Comment 6 Stefan Hajnoczi 2014-08-13 10:09:37 EDT
(In reply to FuXiangChun from comment #5)
> Verified with qemu-kvm-1.5.3-67.el7.src.rpm:
> 
> # rpm -qpi qemu-kvm-1.5.3-67.el7.src.rpm --changelog|grep 1116728
> - kvm-aio-fix-qemu_bh_schedule-bh-ctx-race-condition.patch [bz#1116728]
> 
> Contents of kvm-aio-fix-qemu_bh_schedule-bh-ctx-race-condition.patch:
> 
> diff --git a/async.c b/async.c
> index 5ce3633..d7ec1ea 100644
> --- a/async.c
> +++ b/async.c
> @@ -117,15 +117,21 @@ void qemu_bh_schedule_idle(QEMUBH *bh)
> 
>  void qemu_bh_schedule(QEMUBH *bh)
>  {
> +    AioContext *ctx;
> +
>      if (bh->scheduled)
>          return;
> +    ctx = bh->ctx;
>      bh->idle = 0;
> -    /* Make sure that idle & any writes needed by the callback are done
> -     * before the locations are read in the aio_bh_poll.
> +    /* Make sure that:
> +     * 1. idle & any writes needed by the callback are done before the
> +     *    locations are read in the aio_bh_poll.
> +     * 2. ctx is loaded before scheduled is set and the callback has a chance
> +     *    to execute.
>       */
> -    smp_wmb();
> +    smp_mb();
>      bh->scheduled = 1;
> -    aio_notify(bh->ctx);
> +    aio_notify(ctx);
>  }
> 
> # rpm -qpi qemu-kvm-rhev-2.1.0-1.el7.src.rpm --changelog|grep 1116728
> (no output)
> 
> Stefan,
> According to this test result, this bug isn't fixed for qemu-2.1, but it is
> fixed for qemu-kvm-1.5.3-67.el7. Can this bug be considered verified?

The patch is part of QEMU 2.1 so no explicit backport was needed.  That's why you don't see 1116728 in the grep output.

If you want to double-check, look at async.c in the QEMU 2.1.0 source code used to build the qemu-kvm-rhev-2.1.0-1.el7 rpm.
Comment 8 FuXiangChun 2014-08-28 09:47:40 EDT
Verified this bug with qemu-kvm-rhev-2.1.0-2.el7.src.rpm:

void qemu_bh_schedule(QEMUBH *bh)
{
    AioContext *ctx;

    if (bh->scheduled)
        return;
    ctx = bh->ctx;
    bh->idle = 0;
    /* Make sure that:
     * 1. idle & any writes needed by the callback are done before the
     *    locations are read in the aio_bh_poll.
     * 2. ctx is loaded before scheduled is set and the callback has a chance
     *    to execute.
     */
    smp_mb();
    bh->scheduled = 1;
    aio_notify(ctx);
}

This follows https://bugzilla.redhat.com/show_bug.cgi?id=1116729#c6 and comment 6 above.

Based on this test result, this bug is fixed in qemu-kvm-rhev-2.1.0-2.el7.src.rpm, but it isn't fixed in qemu-kvm-1.5.3-67.el7.
Comment 9 juzhang 2014-09-01 00:50:43 EDT
(In reply to FuXiangChun from comment #8)
> Verified this bug with qemu-kvm-rhev-2.1.0-2.el7.src.rpm:
> 
> void qemu_bh_schedule(QEMUBH *bh)
> {
>     AioContext *ctx;
> 
>     if (bh->scheduled)
>         return;
>     ctx = bh->ctx;
>     bh->idle = 0;
>     /* Make sure that:
>      * 1. idle & any writes needed by the callback are done before the
>      *    locations are read in the aio_bh_poll.
>      * 2. ctx is loaded before scheduled is set and the callback has a chance
>      *    to execute.
>      */
>     smp_mb();
>     bh->scheduled = 1;
>     aio_notify(ctx);
> }
> 
> This follows https://bugzilla.redhat.com/show_bug.cgi?id=1116729#c6 and
> comment 6 above.
> 
> Based on this test result, this bug is fixed in
> qemu-kvm-rhev-2.1.0-2.el7.src.rpm, but it isn't fixed in
> qemu-kvm-1.5.3-67.el7.

Correction: the bug is fixed in both qemu-kvm-rhev-2.1.0-2.el7.src.rpm and qemu-kvm-1.5.3-67.el7.
Comment 12 errata-xmlrpc 2015-03-05 03:10:37 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0349.html
