Bug 804851

Summary: PATCH: MD RAID1 device hangs on certain workloads
Product:        Fedora
Component:      kernel
Version:        16
Hardware:       All
OS:             Linux
Status:         CLOSED ERRATA
Severity:       high
Priority:       unspecified
Reporter:       Ray Morris <support>
Assignee:       Kernel Maintainer List <kernel-maint>
QA Contact:     Fedora Extras Quality Assurance <extras-qa>
CC:             gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda
Doc Type:       Bug Fix
Bug Blocks:     835613
Last Closed:    2012-07-17 12:39:22 UTC

Attachments:    raid1 barrier deadlock fix

Description Ray Morris 2012-03-19 23:32:14 UTC
User-Agent:       Mozilla/5.0 (X11; Linux i686 on x86_64; rv:5.0) Gecko/20110706 Firefox/5.0 SeaMonkey/2.2

A simple bug fix recently posted to the linux-raid mailing list may be
appropriate for Fedora kernels without waiting for a release after 3.3,
when it is expected to be included in mainline. It fixes RAID1 devices
becoming unresponsive under certain workloads, including multiple LVM
snapshots. I am submitting it here on the advice of the maintainer:

> From: NeilBrown <neilb>
> To: Ray Morris <support>
> Cc: linux-raid.org

> I will probably submit this fix to Linus shortly after 3.3 is out,
> with a request for it to be included in other -stable releases.
>
> You seem to be using kernels from Redhat.  If you want them to include
> the patch you should probably raise it as an issue with them.

A full description of the problem and the tested patch is as follows:

From: NeilBrown <neilb>
To: Ray Morris <support>
Cc: linux-raid.org
Subject: Re: debugging md2_resync hang at raise_barrier
Date: Thu, 1 Mar 2012 12:34:18 +1100

It is kind of complicated and involves the magic code in
block/blk-core.c:generic_make_request, which turns recursive calls into tail
recursion.

The fs sends a request to dm.
dm splits it in two for some reason and sends them both to md.
This involves them getting queued in generic_make_request.
The first gets actioned by md/raid1 and converted into a request to the
underlying device (it must be a read request for this to happen - so just one
device).  This gets added to the queue and is counted in nr_pending.

At this point sync_request is called by another thread and it tries to
raise_barrier.  It gets past the first hurdle, increments ->barrier, and
waits for nr_pending to hit zero.

Now the second request from dm to md is passed to raid1.c:make_request where
it tries to wait_barrier.  This blocks because ->barrier is up, and we have a
deadlock - the request to the underlying device will not progress until this
md request progresses, and it is stuck.
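
For context, the other half of the barrier pair, raise_barrier() (called from
sync_request), looks roughly like the sketch below. This is paraphrased from
raid1.c of that era rather than copied verbatim, so treat the details as
approximate; the point is that the resync thread raises ->barrier and then
sleeps until ->nr_pending drops to zero, which can never happen here because
the read to the underlying device is still parked on the submitting task's
bio_list inside generic_make_request:

/* Paraphrased sketch of drivers/md/raid1.c (2.6.32 era), not a verbatim copy. */
static void raise_barrier(conf_t *conf)
{
        spin_lock_irq(&conf->resync_lock);

        /* First hurdle: wait until no normal IO is waiting on the barrier. */
        wait_event_lock_irq(conf->wait_barrier, !conf->nr_waiting,
                            conf->resync_lock,
                            raid1_unplug(conf->mddev->queue));

        /* Block any new IO from starting... */
        conf->barrier++;

        /* ...then wait for all pending IO to drain.  In the scenario above
         * this wait never finishes: nr_pending stays non-zero because the
         * request to the underlying device is still queued on the submitting
         * task's bio_list and cannot be issued until make_request returns,
         * but make_request is itself stuck in wait_barrier. */
        wait_event_lock_irq(conf->wait_barrier,
                            !conf->nr_pending && conf->barrier < RESYNC_DEPTH,
                            conf->resync_lock,
                            raid1_unplug(conf->mddev->queue));

        spin_unlock_irq(&conf->resync_lock);
}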

Patch:

===================================================================
--- linux-2.6.32-SLE11-SP1.orig/drivers/md/raid1.c      2012-03-01 12:28:05.000000000 +1100
+++ linux-2.6.32-SLE11-SP1/drivers/md/raid1.c   2012-03-01 12:28:22.427992913 +1100
@@ -695,7 +695,11 @@ static void wait_barrier(conf_t *conf)
        spin_lock_irq(&conf->resync_lock);
        if (conf->barrier) {
                conf->nr_waiting++;
-               wait_event_lock_irq(conf->wait_barrier, !conf->barrier,
+               wait_event_lock_irq(conf->wait_barrier,
+                                   !conf->barrier ||
+                                   (current->bio_tail &&
+                                    current->bio_list &&
+                                    conf->nr_pending),
                                    conf->resync_lock,
                                    raid1_unplug(conf->mddev->queue));
                conf->nr_waiting--;
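
For what it's worth, the added condition appears to be the escape hatch for
exactly the scenario described above: current->bio_list and current->bio_tail
are only set while the task is inside generic_make_request() with bios still
queued, so if such a task reaches wait_barrier() while requests are already
pending, it is let through instead of sleeping, allowing the queued request to
be issued and nr_pending to drain. (This patch is against the 2.6.32-era code;
the upstream fix referenced in the comments below expresses the same check
against the newer per-task bio_list representation.)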



Reproducible: Sometimes

Steps to Reproduce:
Create RAID1 device.
Create a lot of random IO. (LVM snapshots seem to be good for this).

Actual Results:  
Device stops responding due to RAID1 barrier.

Expected Results:  
Device responds.

Comment 1 Ray Morris 2012-03-19 23:33:58 UTC
Created attachment 571248 [details]
raid1 barrier deadlock fix

Comment 2 Josh Boyer 2012-03-20 01:47:03 UTC
(In reply to comment #1)
> Created attachment 571248 [details]
> raid1 barrier deadlock fix

That patch isn't really usable.  Not in the right format.

This is in linux-next here:

http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commitdiff;h=0736a32a30aa7a74f1f6589b1ec5addb5084362f

It's CC'd to stable, so it should wind up in both 3.2 and 3.3 shortly.

Comment 3 Dave Jones 2012-03-22 16:43:53 UTC
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 6 Josh Boyer 2012-07-17 12:39:22 UTC
This wound up going into the 3.4 kernel as commit d6b42dcb995e6acd7cc276774e751ffc9f0ef4bf.