429747 – [Stratus 5.2 bug] md resync "repair" functionality is broken

Bug 429747 - [Stratus 5.2 bug] md resync "repair" functionality is broken

Summary: [Stratus 5.2 bug] md resync "repair" functionality is broken

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.2
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	urgent
Target Milestone:	rc
Target Release:	---
Assignee:	Doug Ledford
QA Contact:	Martin Jenner
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	217119
TreeView+	depends on / blocked

Reported:	2008-01-22 21:01 UTC by Simon McGrath
Modified:	2009-06-20 02:09 UTC (History)
CC List:	4 users (show)
Fixed In Version:	RHBA-2008-0314
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2008-05-21 15:07:11 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
Upstream commits (4.26 KB, patch) 2008-01-29 19:40 UTC, Doug Ledford	no flags	Details \| Diff
View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2008:0314	0	normal	SHIPPED_LIVE	Updated kernel packages for Red Hat Enterprise Linux 5.2	2008-05-20 18:43:34 UTC

Description Simon McGrath 2008-01-22 21:01:22 UTC

Description of problem:

The existing RHEL 5.1 provides disk resync functionality for "check" and
"repair". The "check" functionality appears to work correctly. However, the
"repair" functionality does not correct a medium error on the disk. The medium
error is detected, but not corrected.

Version-Release number of selected component (if applicable):
RHEL 5.1 (2.6.18-53) has bug, upstream kernel 2.6.23-9 has the bug fix.

How reproducible:

This is reproducible every time (100%) a medium error is encountered on a disk.

Steps to Reproduce:
1. Bring an md device to full sync.
2. Use sg_write_long to inject a medium error onto one of the physical drives.
   e.g. sg_write_long --lba=1000 --xfer_len=580 /dev/<scsi_device>
3. Start the disk "repair" resync: echo repair > /sys/block/md7/md/sync_action
  
Actual results:

The medium error is detected (/sys/block/md7/md/mismatch_cnt is non-zero).
The system appears to hang for a few minutes before coming back. Performing
another "repair" detects the same medium error and does not correct the medium
error again.

Expected results:
The medium error should be corrected and the resync should complete within a
reasonable amount of time.

Additional info:
In raid1.c, function sync_request_write() needs to have code to perform a
"memcpy" to the correct page before the write I/O is performed to correct medium
error.

Patch:
--- /usr/src/kernels/linux-2.6.18.x86_64/drivers/md/raid1.c.orig	2008-01-21
10:27:01.000000000 -0500
+++ raid1.c	2008-01-21 10:16:55.000000000 -0500
@@ -1204,17 +1204,23 @@
 			}
 		r1_bio->read_disk = primary;
 		for (i=0; i<mddev->raid_disks; i++)
-			if (r1_bio->bios[i]->bi_end_io == end_sync_read &&
-			    test_bit(BIO_UPTODATE, &r1_bio->bios[i]->bi_flags)) {
+			if (r1_bio->bios[i]->bi_end_io == end_sync_read) {
 				int j;
 				int vcnt = r1_bio->sectors >> (PAGE_SHIFT- 9);
 				struct bio *pbio = r1_bio->bios[primary];
 				struct bio *sbio = r1_bio->bios[i];
-				for (j = vcnt; j-- ; )
-					if (memcmp(page_address(pbio->bi_io_vec[j].bv_page),
-						   page_address(sbio->bi_io_vec[j].bv_page),
-						   PAGE_SIZE))
-						break;
+				if (test_bit(BIO_UPTODATE, &sbio->bi_flags)) {
+					for (j = vcnt; j-- ; ) {
+						struct page *p, *s;
+						p = pbio->bi_io_vec[j].bv_page;
+						s = sbio->bi_io_vec[j].bv_page;
+						if (memcmp(page_address(p),
+							   page_address(s),
+							   PAGE_SIZE))
+							break;
+					}
+				} else
+					j = 0;
 				if (j >= 0)
 					mddev->resync_mismatches += r1_bio->sectors;
 				if (j < 0 || test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) {
@@ -1235,6 +1241,11 @@
 					sbio->bi_sector = r1_bio->sector +
 						conf->mirrors[i].rdev->data_offset;
 					sbio->bi_bdev = conf->mirrors[i].rdev->bdev;
+					for (j = 0; j < vcnt ; j++)
+						memcpy(page_address(sbio->bi_io_vec[j].bv_page),
+						       page_address(pbio->bi_io_vec[j].bv_page),
+						       PAGE_SIZE);
+
 				}
 			}
 	}

Comment 1 Andrius Benokraitis 2008-01-22 21:12:07 UTC

Simon, in the future you can attach the patch in the "attachment" section of
this BZ.

Comment 2 Andrius Benokraitis 2008-01-23 15:21:19 UTC

With Stratus' onsite engineer currently in flux, doug has graciously agreed to
push this through the devel process. I will keep an eye on this if any other
feedback is needed from Simon@Stratus in the meantime.

Comment 7 Doug Ledford 2008-01-29 19:40:39 UTC

Created attachment 293321 [details]
Upstream commits

OK, the patch listed in this report is a combination of two different upstream
git commits.  However, there also looked like there was a third relevant
commit.  So, I created this patch file which shows all three git commits (in
chronological order).  They apply to our current RHEL5 kernel (with a little
fuzz).	I need Stratus to build a kernel with this patch and verify that it
still solves their problem.  Thanks.

Comment 9 Andrius Benokraitis 2008-01-29 19:45:08 UTC

Simon, would you be able to take care of what Doug is looking for in Comment #7?

Comment 10 Andrius Benokraitis 2008-01-29 19:58:30 UTC

From Dave@Stratus in email:

--
Simon;

I can do this for you. I will build a kernel with this patch and do the
test. I will send back the results.

Dave

Comment 11 Doug Ledford 2008-01-29 20:01:35 UTC

Patch submitted internally for review.

Comment 12 Andrius Benokraitis 2008-01-29 20:11:57 UTC

Thanks Doug!

Comment 13 David Fairbanks 2008-01-30 20:58:13 UTC

I have patched raid1.c with the above patch and built a new kernel.

After much testing, the patch does indeed fix the "repair" functionality. The 
medium errors are detected and fixed correctly.

However, this patch changes the operation of the "check" functionality.
Prior to this patch the "check" operation did NOT fix any medium errors. It 
only detected them and incremented the mismatch_cnt sysfs variable.

The 3rd section of the supplied patch changes the functionality of the "check" 
operation. With this patch, the "check" operation "corrects" the medium errors 
in addition to just detecting them. 

I have included an excerpt from the md man page below that explains the use of 
the "check" and "repair" operations.

-----------------------------------
Excerpt from md man page:

 md/sync_action

  This can be used to monitor and control the resync/recovery process of  MD.
  In  particular,  writing "check" here will cause the array to read all data
  block and check that they are consistent (e.g. parity is  correct,  or  all
  mirror  replicas are the same).  Any discrepancies found are NOT corrected.

  A count of problems found will be stored in md/mismatch_count.

  Alternately, "repair" can be written which will cause the same check to  be
  performed, but any errors will be corrected.

--------------------------------

Question:
What is the correct functionality for the "check" operation?

If the "check" operation is NOT supposed to correct medium errors, then the 
enclosed patch should not perform the third section of the patch. (The first 
two sections are correct.)

If the "check" operation IS supposed to correct disk errors, then the enclosed 
patch works correctly.

Thanks;

Dave Fairbanks, Stratus

Comment 14 Doug Ledford 2008-01-30 21:34:57 UTC

Well, the third patch is from Neil Brown and is in current upstream kernels.  As
far as I can tell, the check pass now detects and records mismatched blocks, but
only on blocks that were able to be read successfully in the first place.

I think Neil's logic behind this decision is that if someone wanted to run a
check pass on an array, then finding an inconsistency in the array implies that
we have two possibly correct pieces of data, and by not correcting it
automatically the user can extract whichever data block is correct and make sure
that it is the data block copied to the other mirror.  For blocks that weren't
able to be read, there is no possibility that their data might be correct, and
so these are auto-corrected.

For all I know, that's what Neil intended in the first place.  The md man page
excerpt above is silent on the issue of failed reads and could be interpreted
either way.

My preference in a situation like this is to follow upstream, so I would say the
patch is correct.

Comment 15 David Fairbanks 2008-01-30 21:51:55 UTC

Sounds good to me. Let's go with it.

Thanks;

Dave Fairbanks, Stratus

Comment 16 Don Zickus 2008-02-06 20:55:59 UTC

in 2.6.18-78.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 18 John Poelstra 2008-03-21 03:53:11 UTC

Greetings Red Hat Partner,

A fix for this issue should be included in the latest packages contained in
RHEL5.2-Snapshot1--available now on partners.redhat.com.  

Please test and confirm that your issue is fixed.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to ASSIGNED.

If you are receiving this message in Issue Tracker, please reply with a message
to Issue Tracker about your results and I will update bugzilla for you.  If you
need assistance accessing ftp://partners.redhat.com, please contact your Partner
Manager.

Thank you

Comment 20 errata-xmlrpc 2008-05-21 15:07:11 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html

Note You need to log in before you can comment on or make changes to this bug.