Description of problem:
A spare disk added to a raid1 array with the mdadm command is dropped on the next boot. If you add, remove, and re-add a spare and then reboot, the spare is lost due to mismatched superblock info.

Version-Release number of selected component (if applicable):
RHEL 5.3

How reproducible:
100%

Steps to Reproduce:
1. Create a raid1 array with a spare.
   + All three devices should have the same superblock event count (seen with mdadm --examine).
2. Use mdadm to remove the spare.
   + Verify with /proc/mdstat that it is removed.
3. Use mdadm to add the spare back to the same array.
   + Verify with /proc/mdstat that the spare is part of the array. You can use --examine to show that the spare's superblock has an event count that is two less than that of the md and the primary/secondary disks.
4. Reboot RHEL and the array will be autoassembled, either via a "nash raidautorun <dev>" command or via "/sbin/mdadm -A -s".
5. The spare is dropped because it is "non-fresh".

Actual results:
Added spare is dropped.

Expected results:
Results of mdadm commands such as --add should persist across boots; the spare should not be dropped.

Additional info:
Fixed by the following upstream commit in 2.6.19.3, which makes sure superblocks are properly refreshed. This code is already within the RHEL6 stream.

"
[PATCH] md: assorted md and raid1 one-liners
author    NeilBrown <neilb>  Sun, 10 Dec 2006 10:20:52 +0000 (02:20 -0800)
committer Linus Torvalds <torvalds.org>  Sun, 10 Dec 2006 17:57:21 +0000 (09:57 -0800)
commit    1757128438d41670ded8bc3bc735325cc07dc8f9
tree      e85679cbe949e337616ac53ab3b3fd1a3fa14a63
parent    c2b00852fbae4f8c45c2651530ded3bd01bde814

[PATCH] md: assorted md and raid1 one-liners

Fix a few bugs that meant that:
- superblocks weren't always written at exactly the right time (this
  could show up if the array was not written to - writing to the array
  causes lots of superblock updates and so hides these errors).
- restarting device recovery after a clean shutdown (version-1 metadata
  only) didn't work as intended (or at all).
1/ Ensure the superblock is updated when a new device is added. <<<<<<
2/ Remove an inappropriate test on MD_RECOVERY_SYNC in md_do_sync. The body
   of this 'if' takes one of two branches depending on whether
   MD_RECOVERY_SYNC is set, so testing it in the clause of the 'if' is wrong.
3/ Flag the superblock for updating after a resync/recovery finishes.
4/ If we find the need to restart a recovery in the middle (version-1
   metadata only), make sure a full recovery (not just as guided by
   bitmaps) does get done.
"

The patch consists of 4 changed lines: 3 additions and 1 removal.

The customer has applied just one line from the full patch and tested it to show that it addresses the immediate issue. The line added was in add_new_disk():

		if (err)
			export_rdev(rdev);
+		md_update_sb(mddev);
		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
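The Steps to Reproduce above map onto mdadm invocations roughly as follows. The device names (/dev/md0, /dev/sd[abc]1) are illustrative assumptions, and the --examine output parsed at the end is a mocked sample showing the mismatch pattern, not output captured from a real array:

```shell
# Steps 1-3 on a hypothetical array (the mdadm commands need root and
# real block devices, so they are shown as comments only):
#   mdadm --create /dev/md0 --level=1 --raid-devices=2 \
#         --spare-devices=1 /dev/sda1 /dev/sdb1 /dev/sdc1   # step 1
#   mdadm /dev/md0 --remove /dev/sdc1                        # step 2
#   mdadm /dev/md0 --add /dev/sdc1                           # step 3
#   cat /proc/mdstat                                         # verify membership
#
# Step 3's symptom: the re-added spare's superblock event count is two
# less than the active members'. Parse mocked --examine output to show it:
examine_sda='         Events : 42'
examine_sdb='         Events : 42'
examine_sdc='         Events : 40'

# Extract the event count from one device's --examine output.
events() { printf '%s\n' "$1" | awk '/Events/ {print $NF}'; }

printf 'sda1=%s sdb1=%s spare(sdc1)=%s\n' \
  "$(events "$examine_sda")" "$(events "$examine_sdb")" "$(events "$examine_sdc")"
```

On a real system the same comparison is done by running mdadm --examine against each member partition and comparing the Events lines.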
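Why a two-event lag gets the spare kicked: at assembly time md treats a member whose superblock event count lags the array's by more than a small slack as stale. A simplified model of that check follows; the one-event slack is an assumption inferred from the symptom described above (a lag of two triggers the drop), not the exact kernel logic:

```shell
# Simplified model of md's "non-fresh" check during autoassembly:
# a member whose event count lags the array's by more than one
# (assumed slack) is kicked from the array.
is_fresh() {  # usage: is_fresh <member_events> <array_events>
  [ $(( $1 + 1 )) -ge "$2" ]
}

is_fresh 42 42 && echo "matching member: kept"
is_fresh 41 42 && echo "one behind: still fresh"
is_fresh 40 42 || echo "re-added spare (two behind): kicked as non-fresh"
```

This is exactly why the one-line md_update_sb() call helps: writing the superblock at --add time keeps the spare's event count in step with the array, so the freshness check passes on the next assembly.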
From the SFDC ticket below: Any idea if this will make it into RHEL 5.7? Do we have any update on this case and on when it will be ported to RHEL 5? The customer is using a custom kernel with the upstream patch, and this works for them. Knowing an approximate time period would help in setting the SLA correctly.
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.7, and Red Hat does not plan to fix this issue in the currently developed update. Contact your manager or support representative in case you need to escalate this bug.
*** Bug 647274 has been marked as a duplicate of this bug. ***
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Patch(es) available in kernel-2.6.18-289.el5 You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5 Detailed testing feedback is always welcomed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2012-0150.html