Description of problem:
A spare disk added to a raid1 array with the mdadm command is dropped on the next boot. If you add, remove, and re-add a spare and then reboot, the spare is lost due to mismatched superblock info.

Version-Release number of selected component (if applicable):
RHEL 5.3

How reproducible:
100%

Steps to Reproduce:
1. Create a raid1 array with a spare.
   + All three devices should have the same superblock event count (seen with mdadm --examine).
2. Use mdadm to remove the spare.
   + Verify with /proc/mdstat that it is removed.
3. Use mdadm to add the spare back to the same array.
   + Verify with /proc/mdstat that the spare is part of the array. You can use --examine to show that the spare's superblock has an event count that is two less than that of the md and the primary/secondary disks.
4. Reboot RHEL and the array will be autoassembled, either via a "nash raidautorun <dev>" command or via "/sbin/mdadm -A -s".
5. The spare is dropped because it is "non-fresh".

Actual results:
Added spare is dropped.

Expected results:
Results of mdadm commands such as --add should persist across boots; the spare should not be dropped.

Additional info:
Fixed by the following upstream commit in 2.6.19.3, which makes sure superblocks are properly refreshed. This code is already within the RHEL6 stream.

"
[PATCH] md: assorted md and raid1 one-liners
author    NeilBrown <neilb>  Sun, 10 Dec 2006 10:20:52 +0000 (02:20 -0800)
committer Linus Torvalds <torvalds.org>  Sun, 10 Dec 2006 17:57:21 +0000 (09:57 -0800)
commit    1757128438d41670ded8bc3bc735325cc07dc8f9
tree      e85679cbe949e337616ac53ab3b3fd1a3fa14a63
parent    c2b00852fbae4f8c45c2651530ded3bd01bde814

[PATCH] md: assorted md and raid1 one-liners

Fix a few bugs that meant that:
- superblocks weren't always written at exactly the right time (this
  could show up if the array was not written to - writing to the array
  causes lots of superblock updates and so hides these errors).
- restarting device recovery after a clean shutdown (version-1 metadata
  only) didn't work as intended (or at all).
1/ Ensure the superblock is updated when a new device is added. <<<<<<
2/ Remove an inappropriate test on MD_RECOVERY_SYNC in md_do_sync. The body
   of this 'if' takes one of two branches depending on whether
   MD_RECOVERY_SYNC is set, so testing it in the clause of the 'if' is wrong.
3/ Flag the superblock for updating after a resync/recovery finishes.
4/ If we find the need to restart a recovery in the middle (version-1
   metadata only), make sure a full recovery (not just as guided by
   bitmaps) does get done.
"

The patch consists of 4 changed lines: 3 additions and 1 removal.

The customer has applied just one line from the full patch and tested it to show that it addresses the immediate issue. The line added was in add_new_disk():

		if (err)
			export_rdev(rdev);
+		md_update_sb(mddev);
		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
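The Steps to Reproduce above map onto mdadm invocations roughly as follows. The device names (/dev/md0, /dev/sd[abc]1) are illustrative assumptions, and the --examine output parsed at the end is a mocked sample showing the mismatch pattern, not output captured from a real array:

```shell
# Steps 1-3 on a hypothetical array (the mdadm commands need root and
# real block devices, so they are shown as comments only):
#   mdadm --create /dev/md0 --level=1 --raid-devices=2 \
#         --spare-devices=1 /dev/sda1 /dev/sdb1 /dev/sdc1   # step 1
#   mdadm /dev/md0 --remove /dev/sdc1                        # step 2
#   mdadm /dev/md0 --add /dev/sdc1                           # step 3
#   cat /proc/mdstat                                         # verify membership
#
# Step 3's symptom: the re-added spare's superblock event count is two
# less than the active members'. Parse mocked --examine output to show it:
examine_sda='         Events : 42'
examine_sdb='         Events : 42'
examine_sdc='         Events : 40'

# Extract the event count from one device's --examine output.
events() { printf '%s\n' "$1" | awk '/Events/ {print $NF}'; }

printf 'sda1=%s sdb1=%s spare(sdc1)=%s\n' \
  "$(events "$examine_sda")" "$(events "$examine_sdb")" "$(events "$examine_sdc")"
```

On a real system the same comparison is done by running mdadm --examine against each member partition and comparing the Events lines.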
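Why a two-event lag gets the spare kicked: at assembly time md treats a member whose superblock event count lags the array's by more than a small slack as stale. A simplified model of that check follows; the one-event slack is an assumption inferred from the symptom described above (a lag of two triggers the drop), not the exact kernel logic:

```shell
# Simplified model of md's "non-fresh" check during autoassembly:
# a member whose event count lags the array's by more than one
# (assumed slack) is kicked from the array.
is_fresh() {  # usage: is_fresh <member_events> <array_events>
  [ $(( $1 + 1 )) -ge "$2" ]
}

is_fresh 42 42 && echo "matching member: kept"
is_fresh 41 42 && echo "one behind: still fresh"
is_fresh 40 42 || echo "re-added spare (two behind): kicked as non-fresh"
```

This is exactly why the one-line md_update_sb() call helps: writing the superblock at --add time keeps the spare's event count in step with the array, so the freshness check passes on the next assembly.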
From the SFDC ticket below: Any idea if this will make it into RHEL 5.7? Do we have any update on this case and on when it will be ported to RHEL 5? The customer is using a custom kernel with the upstream patch, and this works for them. Knowing an approximate time period would help in setting the SLA correctly.
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.7, and Red Hat does not plan to fix this issue in the currently developed update. Contact your manager or support representative in case you need to escalate this bug.
*** Bug 647274 has been marked as a duplicate of this bug. ***
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Patch(es) available in kernel-2.6.18-289.el5 You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5 Detailed testing feedback is always welcomed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2012-0150.html