Bug 804851 - PATCH: MD RAID1 device hangs on certain workloads
Summary: PATCH: MD RAID1 device hangs on certain workloads
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 16
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 835613
 
Reported: 2012-03-19 23:32 UTC by Ray Morris
Modified: 2012-07-17 12:39 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 835613 (view as bug list)
Environment:
Last Closed: 2012-07-17 12:39:22 UTC
Type: ---
Embargoed:


Attachments
raid1 barrier deadlock fix (929 bytes, patch)
2012-03-19 23:33 UTC, Ray Morris

Description Ray Morris 2012-03-19 23:32:14 UTC
User-Agent:       Mozilla/5.0 (X11; Linux i686 on x86_64; rv:5.0) Gecko/20110706 Firefox/5.0 SeaMonkey/2.2

A simple bug fix recently posted to the linux-raid mailing list
might be appropriate for Fedora kernels without waiting for some
version after 3.3, when it should be included in mainline.
It fixes RAID1 devices becoming unresponsive under certain loads,
including workloads with multiple LVM snapshots. I am submitting it here
on the advice of the maintainer:

> From: NeilBrown <neilb>
> To: Ray Morris <support>
> Cc: linux-raid.org

> I will probably submit this fix to Linus shortly after 3.3 is out,
> with a request for it to be included in other -stable releases.
>
> You seem to be using kernels from Redhat.  If you want them to include
> the patch you should probably raise it as an issue with them.

A full description of the problem and the tested patch is as follows:

From: NeilBrown <neilb>
To: Ray Morris <support>
Cc: linux-raid.org
Subject: Re: debugging md2_resync hang at raise_barrier
Date: Thu, 1 Mar 2012 12:34:18 +1100

It is kind of complicated and involves the magic code in
block/blk-core.c:generic_make_request, which turns recursive calls into tail
recursion.

The fs sends a request to dm.
dm splits it in two for some reason and sends both pieces to md.
This involves them getting queued in generic_make_request.
The first gets actioned by md/raid1 and converted into a request to the
underlying device (it must be a read request for this to happen - so just one
device).  This gets added to the queue and is counted in nr_pending.

At this point sync_request is called by another thread and it tries to
raise_barrier.  It gets past the first hurdle, increments ->barrier, and
waits for nr_pending to hit zero.

Now the second request from dm to md is passed to raid1.c:make_request where
it tries to wait_barrier.  This blocks because ->barrier is up, and we have a
deadlock - the request to the underlying device will not progress until this
md request progresses, and it is stuck.

Patch:

===================================================================
--- linux-2.6.32-SLE11-SP1.orig/drivers/md/raid1.c      2012-03-01 12:28:05.000000000 +1100
+++ linux-2.6.32-SLE11-SP1/drivers/md/raid1.c   2012-03-01 12:28:22.427992913 +1100
@@ -695,7 +695,11 @@ static void wait_barrier(conf_t *conf)
        spin_lock_irq(&conf->resync_lock);
        if (conf->barrier) {
                conf->nr_waiting++;
-               wait_event_lock_irq(conf->wait_barrier, !conf->barrier,
+               wait_event_lock_irq(conf->wait_barrier,
+                                   !conf->barrier ||
+                                   (current->bio_tail &&
+                                    current->bio_list &&
+                                    conf->nr_pending),
                                    conf->resync_lock,
                                    raid1_unplug(conf->mddev->queue));
                conf->nr_waiting--;
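
For reference, the raise_barrier() path that the resync thread is stuck in
looks roughly like the sketch below. This is a paraphrase of
drivers/md/raid1.c from kernels of that era, not the exact source, and is
included only to make both halves of the deadlock concrete: the resync
thread raises ->barrier and then waits for ->nr_pending to reach zero, while
the pending read cannot complete because the second dm request, still queued
in generic_make_request, is parked in wait_barrier().

/*
 * Simplified sketch of the resync side of the barrier (kernel-style C,
 * paraphrased -- not the exact upstream code).
 */
static void raise_barrier(conf_t *conf)
{
        spin_lock_irq(&conf->resync_lock);

        /* First hurdle: wait until no normal IO is waiting on the barrier. */
        wait_event_lock_irq(conf->wait_barrier, !conf->nr_waiting,
                            conf->resync_lock,
                            raid1_unplug(conf->mddev->queue));

        /* Block any new IO from starting. */
        conf->barrier++;

        /* Wait for all pending IO to drain (nr_pending == 0) -- this is the
         * wait that never finishes in the scenario described above. */
        wait_event_lock_irq(conf->wait_barrier,
                            !conf->nr_pending && conf->barrier < RESYNC_DEPTH,
                            conf->resync_lock,
                            raid1_unplug(conf->mddev->queue));

        spin_unlock_irq(&conf->resync_lock);
}

The patch breaks the cycle on the wait_barrier() side: when the caller is
inside generic_make_request with bios still queued (current->bio_tail and
current->bio_list are set) and there is already pending IO, the request is
allowed through instead of blocking on ->barrier.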



Reproducible: Sometimes

Steps to Reproduce:
Create RAID1 device.
Create a lot of random IO. (LVM snapshots seem to be good for this).

Actual Results:  
Device stops responding due to the RAID1 barrier deadlock.

Expected Results:  
Device responds.

Comment 1 Ray Morris 2012-03-19 23:33:58 UTC
Created attachment 571248 [details]
raid1 barrier deadlock fix

Comment 2 Josh Boyer 2012-03-20 01:47:03 UTC
(In reply to comment #1)
> Created attachment 571248 [details]
> raid1 barrier deadlock fix

That patch isn't really usable.  Not in the right format.

This is in linux-next here:

http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commitdiff;h=0736a32a30aa7a74f1f6589b1ec5addb5084362f

It's CC'd to stable, so it should wind up in both 3.2 and 3.3 shortly.

Comment 3 Dave Jones 2012-03-22 16:43:53 UTC
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 4 Dave Jones 2012-03-22 16:48:19 UTC
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 5 Dave Jones 2012-03-22 16:57:38 UTC
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 6 Josh Boyer 2012-07-17 12:39:22 UTC
This wound up going into the 3.4 kernel as commit d6b42dcb995e6acd7cc276774e751ffc9f0ef4bf.

