Description of problem: The existing RHEL 5.1 provides disk resync functionality for "check" and "repair". The "check" functionality appears to work correctly. However, the "repair" functionality does not correct a medium error on the disk. The medium error is detected, but not corrected. Version-Release number of selected component (if applicable): RHEL 5.1 (2.6.18-53) has bug, upstream kernel 2.6.23-9 has the bug fix. How reproducible: This is reproducible every time (100%) a medium error is encountered on a disk. Steps to Reproduce: 1. Bring an md device to full sync. 2. Use sg_write_long to inject a medium error onto one of the physical drives. e.g. sg_write_long --lba=1000 --xfer_len=580 /dev/<scsi_device> 3. Start the disk "repair" resync: echo repair > /sys/block/md7/md/sync_action Actual results: The medium error is detected (/sys/block/md7/md/mismatch_cnt is non-zero). The system appears to hang for a few minutes before coming back. Performing another "repair" detects the same medium error and does not correct the medium error again. Expected results: The medium error should be corrected and the resync should complete within a reasonable amount of time. Additional info: In raid1.c, function sync_request_write() needs to have code to perform a "memcpy" to the correct page before the write I/O is performed to correct medium error. Patch: --- /usr/src/kernels/linux-2.6.18.x86_64/drivers/md/raid1.c.orig 2008-01-21 10:27:01.000000000 -0500 +++ raid1.c 2008-01-21 10:16:55.000000000 -0500 @@ -1204,17 +1204,23 @@ } r1_bio->read_disk = primary; for (i=0; i<mddev->raid_disks; i++) - if (r1_bio->bios[i]->bi_end_io == end_sync_read && - test_bit(BIO_UPTODATE, &r1_bio->bios[i]->bi_flags)) { + if (r1_bio->bios[i]->bi_end_io == end_sync_read) { int j; int vcnt = r1_bio->sectors >> (PAGE_SHIFT- 9); struct bio *pbio = r1_bio->bios[primary]; struct bio *sbio = r1_bio->bios[i]; - for (j = vcnt; j-- ; ) - if (memcmp(page_address(pbio->bi_io_vec[j].bv_page), - page_address(sbio->bi_io_vec[j].bv_page), - PAGE_SIZE)) - break; + if (test_bit(BIO_UPTODATE, &sbio->bi_flags)) { + for (j = vcnt; j-- ; ) { + struct page *p, *s; + p = pbio->bi_io_vec[j].bv_page; + s = sbio->bi_io_vec[j].bv_page; + if (memcmp(page_address(p), + page_address(s), + PAGE_SIZE)) + break; + } + } else + j = 0; if (j >= 0) mddev->resync_mismatches += r1_bio->sectors; if (j < 0 || test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) { @@ -1235,6 +1241,11 @@ sbio->bi_sector = r1_bio->sector + conf->mirrors[i].rdev->data_offset; sbio->bi_bdev = conf->mirrors[i].rdev->bdev; + for (j = 0; j < vcnt ; j++) + memcpy(page_address(sbio->bi_io_vec[j].bv_page), + page_address(pbio->bi_io_vec[j].bv_page), + PAGE_SIZE); + } } }
Simon, in the future you can attach the patch in the "attachment" section of this BZ.
With Stratus' onsite engineer currently in flux, doug has graciously agreed to push this through the devel process. I will keep an eye on this if any other feedback is needed from Simon@Stratus in the meantime.
Created attachment 293321 [details] Upstream commits OK, the patch listed in this report is a combination of two different upstream git commits. However, there also looked like there was a third relevant commit. So, I created this patch file which shows all three git commits (in chronological order). They apply to our current RHEL5 kernel (with a little fuzz). I need Stratus to build a kernel with this patch and verify that it still solves their problem. Thanks.
Simon, would you be able to take care of what Doug is looking for in Comment #7?
From Dave@Stratus in email: -- Simon; I can do this for you. I will build a kernel with this patch and do the test. I will send back the results. Dave
Patch submitted internally for review.
Thanks Doug!
I have patched raid1.c with the above patch and built a new kernel. After much testing, the patch does indeed fix the "repair" functionality. The medium errors are detected and fixed correctly. However, this patch changes the operation of the "check" functionality. Prior to this patch the "check" operation did NOT fix any medium errors. It only detected them and incremented the mismatch_cnt sysfs variable. The 3rd section of the supplied patch changes the functionality of the "check" operation. With this patch, the "check" operation "corrects" the medium errors in addition to just detecting them. I have included an excerpt from the md man page below that explains the use of the "check" and "repair" operations. ----------------------------------- Excerpt from md man page: md/sync_action This can be used to monitor and control the resync/recovery process of MD. In particular, writing "check" here will cause the array to read all data block and check that they are consistent (e.g. parity is correct, or all mirror replicas are the same). Any discrepancies found are NOT corrected. A count of problems found will be stored in md/mismatch_count. Alternately, "repair" can be written which will cause the same check to be performed, but any errors will be corrected. -------------------------------- Question: What is the correct functionality for the "check" operation? If the "check" operation is NOT supposed to correct medium errors, then the enclosed patch should not perform the third section of the patch. (The first two sections are correct.) If the "check" operation IS supposed to correct disk errors, then the enclosed patch works correctly. Thanks; Dave Fairbanks, Stratus
Well, the third patch is from Neil Brown and is in current upstream kernels. As far as I can tell, the check pass now detects and records mismatched blocks, but only on blocks that were able to be read successfully in the first place. I think Neil's logic behind this decision is that if someone wanted to run a check pass on an array, then finding an inconsistency in the array implies that we have two possibly correct pieces of data, and by not correcting it automatically the user can extract whichever data block is correct and make sure that it is the data block copied to the other mirror. For blocks that weren't able to be read, there is no possibility that their data might be correct, and so these are auto-corrected. For all I know, that's what Neil intended in the first place. The md man page excerpt above is silent on the issue of failed reads and could be interpreted either way. My preference in a situation like this is to follow upstream, so I would say the patch is correct.
Sounds good to me. Let's go with it. Thanks; Dave Fairbanks, Stratus
in 2.6.18-78.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5
Greetings Red Hat Partner, A fix for this issue should be included in the latest packages contained in RHEL5.2-Snapshot1--available now on partners.redhat.com. Please test and confirm that your issue is fixed. After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following: 1) Change the *status* of this bug to VERIFIED. 2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified) If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to ASSIGNED. If you are receiving this message in Issue Tracker, please reply with a message to Issue Tracker about your results and I will update bugzilla for you. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager. Thank you
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0314.html