Bug 808943 - Raid check doesn't actually read from disks
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: mdadm
Version: 16
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Assigned To: Jes Sorensen
QA Contact: Fedora Extras Quality Assurance
Reported: 2012-04-01 18:50 EDT by Larkin Lowrey
Modified: 2012-05-31 04:51 EDT
CC List: 5 users

Doc Type: Bug Fix
Last Closed: 2012-05-31 04:51:17 EDT


Description Larkin Lowrey 2012-04-01 18:50:47 EDT
Description of problem:
Running a RAID check, either via raid-check or manually with echo check > sync_action, starts the check and runs it to completion without generating any disk I/O. The check runs at the full speed limit of 200MB/s, even though the member devices cannot read that fast. iostat shows zero I/O on the disks while /proc/mdstat reports a check in progress.

Version-Release number of selected component (if applicable):


How reproducible:
Every time.


Steps to Reproduce:
1. run raid-check
2. run iostat
3. confirm /proc/mdstat shows a check in progress at 200MB/s
4. confirm there is no disk I/O
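As a rough aid for step 4, a C sketch like the following could compare two snapshots of /proc/diskstats taken a few seconds apart while /proc/mdstat reports the check running. It reads the "sectors read" counter (the sixth field of each /proc/diskstats line) for each member disk and sums the change. The function names sectors_read and member_read_delta, and the member names sda/sdb used when trying it, are hypothetical; the snapshots are passed in as strings so the sketch does not depend on a live array.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the "sectors read" counter (6th field) for device `dev` from the
 * text of one /proc/diskstats snapshot, or -1 if the device is absent.
 * Each line looks like: major minor name reads merges sectors_read ...
 */
long long sectors_read(const char *diskstats, const char *dev)
{
    const char *line = diskstats;

    while (line != NULL && *line != '\0') {
        unsigned int major, minor;
        char name[64];
        unsigned long long reads, merges, sectors;

        if (sscanf(line, "%u %u %63s %llu %llu %llu",
                   &major, &minor, name, &reads, &merges, &sectors) == 6 &&
            strcmp(name, dev) == 0)
            return (long long)sectors;

        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return -1;
}

/*
 * Sum the change in sectors read across the member disks between two
 * snapshots. Returns -1 if any member is missing from either snapshot.
 * A result near zero while /proc/mdstat shows a check running matches
 * the symptom described in this bug.
 */
long long member_read_delta(const char *before, const char *after,
                            const char *const *members, size_t n)
{
    long long total = 0;

    for (size_t i = 0; i < n; i++) {
        long long b = sectors_read(before, members[i]);
        long long a = sectors_read(after, members[i]);

        if (b < 0 || a < 0)
            return -1;
        total += a - b;
    }
    return total;
}
```

In practice each snapshot would be the full text of /proc/diskstats read a few seconds apart; the delta is in 512-byte sectors and can be compared with the progress /proc/mdstat reports over the same interval.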
  
Actual results:
The check runs to completion but the array is not actually checked.

Expected results:
The check should actually check the consistency of the array.

Additional info:
I reported this to the linux-raid mailing list; a bug was identified and a patch has apparently been submitted.

I first noticed this phenomenon with kernel 3.3.0-4 and have confirmed it is still occurring with 3.3.0-8.

Here's the email I got from linux-raid:

From 4d79586ebffac308ba11b363d81525882fdf6abe Mon Sep 17 00:00:00 2001
From: majianpeng <majianpeng@gmail.com>
Date: Thu, 29 Mar 2012 11:12:59 +0800
Subject: [PATCH] md/raid5:Fix a bug about judging the operation is syncing or
 replaing in analyse_stripe().

When a raid5 array is created with --assume-clean and check or repair is
then written to sync_action, the component disks perform no I/O, yet the
check/resync runs faster than normal.
This is caused by the test in analyse_stripe():
		if (do_recovery ||
		    sh->sector >= conf->mddev->recovery_cp)
			s->syncing = 1;
		else
			s->replacing = 1;
During check or repair, recovery_cp == MaxSector, so syncing is 0 rather
than 1 (the stripe is marked as replacing instead).

Signed-off-by: majianpeng <majianpeng@gmail.com>
---
 drivers/md/raid5.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 23ac880..4d43ad3 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3276,12 +3276,14 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
 		/* If there is a failed device being replaced,
 		 *     we must be recovering.
 		 * else if we are after recovery_cp, we must be syncing
+		 * else if MD_RECOVERY_REQUESTED is set,we all in syning.
 		 * else we can only be replacing
 		 * sync and recovery both need to read all devices, and so
 		 * use the same flag.
 		 */
 		if (do_recovery ||
-		    sh->sector >= conf->mddev->recovery_cp)
+		    sh->sector >= conf->mddev->recovery_cp ||
+		    test_bit(MD_RECOVERY_REQUESTED, &(conf->mddev->recovery)))
 			s->syncing = 1;
 		else
 			s->replacing = 1;
--
1.7.5.4

--------------
majianpeng
2012-03-29
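The condition the patch changes can be modelled outside the kernel. The sketch below is a simplified, hypothetical model of the branch in analyse_stripe(); classify_old, classify_new and enum stripe_mode are illustrative names, not kernel code. sector_t is modelled as uint64_t and the MD_RECOVERY_REQUESTED bit as a plain bool. Before the patch, when recovery_cp equals MaxSector (as the patch notes it does during check or repair), a stripe falls through to the "replacing" branch; with the added MD_RECOVERY_REQUESTED test it is classified as syncing, the path the kernel comment says reads all devices.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model sector_t as a 64-bit count; MaxSector as defined in drivers/md/md.h. */
typedef uint64_t sector_t;
#define MaxSector (~(sector_t)0)

/* Hypothetical stand-in for setting s->syncing or s->replacing. */
enum stripe_mode { MODE_SYNCING, MODE_REPLACING };

/* Branch as it was before the patch. */
enum stripe_mode classify_old(bool do_recovery, sector_t sector,
                              sector_t recovery_cp)
{
    if (do_recovery || sector >= recovery_cp)
        return MODE_SYNCING;
    return MODE_REPLACING;
}

/* Branch after the patch: a user-requested check/repair also syncs. */
enum stripe_mode classify_new(bool do_recovery, sector_t sector,
                              sector_t recovery_cp, bool recovery_requested)
{
    if (do_recovery || sector >= recovery_cp || recovery_requested)
        return MODE_SYNCING;
    return MODE_REPLACING;
}
```

With recovery_cp == MaxSector, sector >= recovery_cp is never true for a real stripe, so classify_old(false, 0, MaxSector) yields MODE_REPLACING, while classify_new(false, 0, MaxSector, true) yields MODE_SYNCING.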
Comment 2 Jes Sorensen 2012-05-04 08:31:27 EDT
Larkin,

Can you provide me with details on how you created and re-created this array
for the error to occur?

I tried creating a raid5 array and re-creating it with --assume-clean here
but was not able to reproduce the problem you are reporting.

Thanks,
Jes
Comment 3 Jes Sorensen 2012-05-04 09:06:23 EDT
Larkin,

Actually ignore me - I can reproduce it, I was testing against the wrong
kernel :(

I checked the upstream kernel tree and the fix is in Linus' tree as
c6d2e084c7411f61f2b446d94989e5aaf9879b0f and I have just requested it
to go into stable-3.3. It should ripple into Fedora automatically after
that.

Cheers,
Jes
Comment 5 Benjamin S. Scarlet 2012-05-24 11:31:16 EDT
This seems to me to be fixed in 3.3.6-3.
Comment 6 Jes Sorensen 2012-05-31 04:51:17 EDT
Per Benjamin's comment, closing.
