Bug 429747 - [Stratus 5.2 bug] md resync "repair" functionality is broken
[Stratus 5.2 bug] md resync "repair" functionality is broken
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel (Show other bugs)
5.2
x86_64 Linux
high Severity urgent
: rc
: ---
Assigned To: Doug Ledford
Martin Jenner
: OtherQA
Depends On:
Blocks: 217119
  Show dependency treegraph
 
Reported: 2008-01-22 16:01 EST by Simon McGrath
Modified: 2009-06-19 22:09 EDT (History)
4 users (show)

See Also:
Fixed In Version: RHBA-2008-0314
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2008-05-21 11:07:11 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments (Terms of Use)
Upstream commits (4.26 KB, patch)
2008-01-29 14:40 EST, Doug Ledford
no flags Details | Diff

  None (edit)
Description Simon McGrath 2008-01-22 16:01:22 EST
Description of problem:

The existing RHEL 5.1 provides disk resync functionality for "check" and
"repair". The "check" functionality appears to work correctly. However, the
"repair" functionality does not correct a medium error on the disk. The medium
error is detected, but not corrected.

Version-Release number of selected component (if applicable):
RHEL 5.1 (2.6.18-53) has bug, upstream kernel 2.6.23-9 has the bug fix.

How reproducible:

This is reproducible every time (100%) a medium error is encountered on a disk.

Steps to Reproduce:
1. Bring an md device to full sync.
2. Use sg_write_long to inject a medium error onto one of the physical drives.
   e.g. sg_write_long --lba=1000 --xfer_len=580 /dev/<scsi_device>
3. Start the disk "repair" resync: echo repair > /sys/block/md7/md/sync_action
  
Actual results:

The medium error is detected (/sys/block/md7/md/mismatch_cnt is non-zero).
The system appears to hang for a few minutes before coming back. Performing
another "repair" detects the same medium error and does not correct the medium
error again.

Expected results:
The medium error should be corrected and the resync should complete within a
reasonable amount of time.

Additional info:
In raid1.c, function sync_request_write() needs to have code to perform a
"memcpy" to the correct page before the write I/O is performed to correct medium
error.

Patch:
--- /usr/src/kernels/linux-2.6.18.x86_64/drivers/md/raid1.c.orig	2008-01-21
10:27:01.000000000 -0500
+++ raid1.c	2008-01-21 10:16:55.000000000 -0500
@@ -1204,17 +1204,23 @@
 			}
 		r1_bio->read_disk = primary;
 		for (i=0; i<mddev->raid_disks; i++)
-			if (r1_bio->bios[i]->bi_end_io == end_sync_read &&
-			    test_bit(BIO_UPTODATE, &r1_bio->bios[i]->bi_flags)) {
+			if (r1_bio->bios[i]->bi_end_io == end_sync_read) {
 				int j;
 				int vcnt = r1_bio->sectors >> (PAGE_SHIFT- 9);
 				struct bio *pbio = r1_bio->bios[primary];
 				struct bio *sbio = r1_bio->bios[i];
-				for (j = vcnt; j-- ; )
-					if (memcmp(page_address(pbio->bi_io_vec[j].bv_page),
-						   page_address(sbio->bi_io_vec[j].bv_page),
-						   PAGE_SIZE))
-						break;
+				if (test_bit(BIO_UPTODATE, &sbio->bi_flags)) {
+					for (j = vcnt; j-- ; ) {
+						struct page *p, *s;
+						p = pbio->bi_io_vec[j].bv_page;
+						s = sbio->bi_io_vec[j].bv_page;
+						if (memcmp(page_address(p),
+							   page_address(s),
+							   PAGE_SIZE))
+							break;
+					}
+				} else
+					j = 0;
 				if (j >= 0)
 					mddev->resync_mismatches += r1_bio->sectors;
 				if (j < 0 || test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) {
@@ -1235,6 +1241,11 @@
 					sbio->bi_sector = r1_bio->sector +
 						conf->mirrors[i].rdev->data_offset;
 					sbio->bi_bdev = conf->mirrors[i].rdev->bdev;
+					for (j = 0; j < vcnt ; j++)
+						memcpy(page_address(sbio->bi_io_vec[j].bv_page),
+						       page_address(pbio->bi_io_vec[j].bv_page),
+						       PAGE_SIZE);
+
 				}
 			}
 	}
Comment 1 Andrius Benokraitis 2008-01-22 16:12:07 EST
Simon, in the future you can attach the patch in the "attachment" section of
this BZ.
Comment 2 Andrius Benokraitis 2008-01-23 10:21:19 EST
With Stratus' onsite engineer currently in flux, doug has graciously agreed to
push this through the devel process. I will keep an eye on this if any other
feedback is needed from Simon@Stratus in the meantime.
Comment 7 Doug Ledford 2008-01-29 14:40:39 EST
Created attachment 293321 [details]
Upstream commits

OK, the patch listed in this report is a combination of two different upstream
git commits.  However, there also looked like there was a third relevant
commit.  So, I created this patch file which shows all three git commits (in
chronological order).  They apply to our current RHEL5 kernel (with a little
fuzz).	I need Stratus to build a kernel with this patch and verify that it
still solves their problem.  Thanks.
Comment 9 Andrius Benokraitis 2008-01-29 14:45:08 EST
Simon, would you be able to take care of what Doug is looking for in Comment #7?
Comment 10 Andrius Benokraitis 2008-01-29 14:58:30 EST
From Dave@Stratus in email:

--
Simon;

I can do this for you. I will build a kernel with this patch and do the
test. I will send back the results.

Dave
Comment 11 Doug Ledford 2008-01-29 15:01:35 EST
Patch submitted internally for review.
Comment 12 Andrius Benokraitis 2008-01-29 15:11:57 EST
Thanks Doug!
Comment 13 David Fairbanks 2008-01-30 15:58:13 EST
I have patched raid1.c with the above patch and built a new kernel.

After much testing, the patch does indeed fix the "repair" functionality. The 
medium errors are detected and fixed correctly.

However, this patch changes the operation of the "check" functionality.
Prior to this patch the "check" operation did NOT fix any medium errors. It 
only detected them and incremented the mismatch_cnt sysfs variable.

The 3rd section of the supplied patch changes the functionality of the "check" 
operation. With this patch, the "check" operation "corrects" the medium errors 
in addition to just detecting them. 

I have included an excerpt from the md man page below that explains the use of 
the "check" and "repair" operations.

-----------------------------------
Excerpt from md man page:

 md/sync_action

  This can be used to monitor and control the resync/recovery process of  MD.
  In  particular,  writing "check" here will cause the array to read all data
  block and check that they are consistent (e.g. parity is  correct,  or  all
  mirror  replicas are the same).  Any discrepancies found are NOT corrected.

  A count of problems found will be stored in md/mismatch_count.

  Alternately, "repair" can be written which will cause the same check to  be
  performed, but any errors will be corrected.

--------------------------------

Question:
What is the correct functionality for the "check" operation?

If the "check" operation is NOT supposed to correct medium errors, then the 
enclosed patch should not perform the third section of the patch. (The first 
two sections are correct.)

If the "check" operation IS supposed to correct disk errors, then the enclosed 
patch works correctly.

Thanks;

Dave Fairbanks, Stratus
Comment 14 Doug Ledford 2008-01-30 16:34:57 EST
Well, the third patch is from Neil Brown and is in current upstream kernels.  As
far as I can tell, the check pass now detects and records mismatched blocks, but
only on blocks that were able to be read successfully in the first place.

I think Neil's logic behind this decision is that if someone wanted to run a
check pass on an array, then finding an inconsistency in the array implies that
we have two possibly correct pieces of data, and by not correcting it
automatically the user can extract whichever data block is correct and make sure
that it is the data block copied to the other mirror.  For blocks that weren't
able to be read, there is no possibility that their data might be correct, and
so these are auto-corrected.

For all I know, that's what Neil intended in the first place.  The md man page
excerpt above is silent on the issue of failed reads and could be interpreted
either way.

My preference in a situation like this is to follow upstream, so I would say the
patch is correct.
Comment 15 David Fairbanks 2008-01-30 16:51:55 EST
Sounds good to me. Let's go with it.

Thanks;

Dave Fairbanks, Stratus
Comment 16 Don Zickus 2008-02-06 15:55:59 EST
in 2.6.18-78.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
Comment 18 John Poelstra 2008-03-20 23:53:11 EDT
Greetings Red Hat Partner,

A fix for this issue should be included in the latest packages contained in
RHEL5.2-Snapshot1--available now on partners.redhat.com.  

Please test and confirm that your issue is fixed.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to ASSIGNED.

If you are receiving this message in Issue Tracker, please reply with a message
to Issue Tracker about your results and I will update bugzilla for you.  If you
need assistance accessing ftp://partners.redhat.com, please contact your Partner
Manager.

Thank you
Comment 20 errata-xmlrpc 2008-05-21 11:07:11 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html

Note You need to log in before you can comment on or make changes to this bug.