Bug 475634

Summary: disk re-added as spare in raid10
Product: Fedora
Reporter: Dimitri Maziuk <dmaziuk>
Component: mdadm
Assignee: Doug Ledford <dledford>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: low
Version: 9
CC: dledford
Hardware: x86_64
OS: Linux
Last Closed: 2008-12-09 21:24:56 UTC

Description Dimitri Maziuk 2008-12-09 21:08:59 UTC
Description of problem:

The replacement drive is added as a spare instead of as an active member of an already degraded RAID array, which means the next drive failure will kill the data.

Version-Release number of selected component (if applicable):

mdadm-2.6.7.1-1.fc9.x86_64

Steps to Reproduce:
1. fail and remove a drive (/dev/sde1 below) from the array (commands sketched after these steps)
2. replace the drive, create partition, and re-add it to the array
(mdadm /dev/md0 --re-add /dev/sde1)
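
For reference, step 1 presumably corresponds to commands along these lines (a
sketch using the device name from this report, not a transcript from the
affected machine):

   mdadm /dev/md0 --fail /dev/sde1
   mdadm /dev/md0 --remove /dev/sde1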

Actual results:

       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       2       8       49        2      active sync   /dev/sdd1
       3       0        0        3      removed
       4       8       81        4      active sync   /dev/sdf1
       5       8       97        5      active sync   /dev/sdg1

       6       8       65        -      spare   /dev/sde1


Expected results:

/dev/sde1 should be added as #3 and the array should be resyncing.
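
A resync in progress can be confirmed with, for example:

   cat /proc/mdstat
   mdadm --detail /dev/md0

both of which report recovery/rebuild progress.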

Additional info:
[root@octopus ~]# mdadm --stop /dev/md0
mdadm: stopped /dev/md0                
[root@octopus ~]# mdadm -v --assemble --update=resync /dev/md0 /dev/sdb1 
/dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1
mdadm: looking for devices for /dev/md0                                                                                   
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 0.                                                           
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 1.                                                           
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 2.                                                           
mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 6.                                                           
mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 4.                                                           
mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 5.                                                           
mdadm: added /dev/sdc1 to /dev/md0 as 1                                                                                   
mdadm: added /dev/sdd1 to /dev/md0 as 2                                                                                   
mdadm: no uptodate device for slot 3 of /dev/md0
mdadm: added /dev/sdf1 to /dev/md0 as 4
mdadm: added /dev/sdg1 to /dev/md0 as 5
mdadm: added /dev/sde1 to /dev/md0 as 6
mdadm: added /dev/sdb1 to /dev/md0 as 0
mdadm: /dev/md0 has been started with 5 drives (out of 6) and 1 spare.

Comment 1 Doug Ledford 2008-12-09 21:24:56 UTC
This is not a bug.  You are attempting to re-add a drive that wasn't part of the array in the first place.  Re-add is only for adding back an existing member of an array.  To add a totally new drive to an array and cause it to replace a missing drive, use add.
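
Concretely, the command being described here would be something like the
following (a sketch using the device name from the original report):

   mdadm /dev/md0 --add /dev/sde1

after which reconstruction onto the new drive should start immediately and be
visible in /proc/mdstat.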

Comment 2 Dimitri Maziuk 2008-12-10 17:07:18 UTC
Let me get this straight:

I have an array with a failed drive. It's in a degraded state; another drive failure will destroy all data. To prevent that (which is the whole point of raid), it should start resyncing as soon as another drive becomes available.

When the drive actually becomes available, it gets silently marked as a spare, there is no resync, and the array continues in a degraded state.

In what version of reality is this not a critical bug?

Comment 3 Doug Ledford 2008-12-10 17:30:22 UTC
(In reply to comment #2)
> Let me get this straight:
> 
> I have an array with a failed drive. It's in a degraded state; another drive
> failure will destroy all data. To prevent that (which is the whole point of
> raid), it should start resyncing as soon as another drive becomes available.
> 
> When the drive actually becomes available, it gets silently marked as a spare,
> there is no resync, and the array continues in a degraded state.
> 
> In what version of reality is this not a critical bug?

In the reality where you used the wrong command to add the new drive into the array.  If it bothers you so much to use the right command to get the actions you want, then I'm afraid computers aren't for you.  The re-add command specifically states it is used to add an already existing member of an array back into the array.  The add command is used to add a new drive into an array.

From the mdadm man page:
For Manage mode:
       -a, --add
              hot-add listed devices.

       --re-add
              re-add a device that was recently removed from an array.

In this context, the man page is referring to a physical drive with an already existing superblock for the given array when it says "a device that was recently removed from an array."  A totally new drive does not already contain a superblock for that array and therefore does not meet this criterion.  The add command is the proper command to use.  If you would like the --re-add command to automatically fall back to --add behaviour in the event that the given device has not previously been a member of the array, then that's a valid feature request for upstream, but as the --re-add command was never intended to be used in the way you are using it, and there *is* a specific command for what you are doing, the fact that it doesn't do what you want is not a bug.
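
For reference, whether a device already carries an md superblock for this array
can be checked with mdadm's examine mode, e.g.:

   mdadm --examine /dev/sde1

A freshly partitioned replacement drive should report that no md superblock is
present, which is why it does not qualify for --re-add.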

Comment 4 Dimitri Maziuk 2008-12-10 19:08:49 UTC
(In reply to comment #3)
 
> From the mdadm man page:
> For Manage mode:
>        -a, --add
>               hot-add listed devices.
> 
>        --re-add
>               re-add a device that was recently removed from an array.
> 
> In this context, the man page is referring to a physical drive with an already
> existing superblock for the given array when it says "a device that was
> recently removed from an array."  A totally new drive does not already contain
> a superblock for that array and therefore does not meet this criterion.  The add
> command is the proper command to use.

First of all, wouldn't it be nice if the manpage actually explained that so I would've used the right command to begin with.

>  If you would like the --re-add command
> to automatically fall back to --add behaviour in the event that the given
> device has not previously been a member of the array, then that's a valid
> feature request for upstream, but as the --re-add command was never intended to
> be used in the way you are using it, and there *is* a specific command for
> what you are doing, the fact that it doesn't do what you want is not a bug.

Secondly, if --re-add cannot add a brand new device, it should fail with "can't add /dev/xyz because it wasn't a member of the array" instead of doing what it does -- I'd say it's a bug.

Fundamentally, if a device is added to a *degraded* array, the array shouldn't care if the device is a "spare", "replacement", or "pink polka dots" -- it should make it active and start resyncing right away (assuming it's the right size, etc.). Your suggestion of making --re-add fall back to --add is one way to achieve that. Another way is to leave it as it is: add the device as a spare, but have mdadm start using the spare immediately.

Of course, the intuitive feature would be the "--replace-failed-drive-with-this-device-here" option. Ability to re-add the exact same drive can be useful while you're writing and debugging mdadm, but how often do you need it in production use?

Comment 5 Doug Ledford 2008-12-16 15:30:21 UTC
(In reply to comment #4)
> 
> First of all, wouldn't it be nice if the manpage actually explained that so I
> would've used the right command to begin with.

The --add command is right before this one in the option list, so I'm curious as to what you thought it was for if not for adding new devices into the array?

> >  If you would like the --re-add command
> > to automatically fall back to --add behaviour in the event that the given
> > device has not previously been a member of the array, then that's a valid
> > feature request for upstream, but as the --re-add command was never intended to
> > be used in the way you are using it, and there *is* a specific command for
> > what you are doing, the fact that it doesn't do what you want is not a bug.
> 
> Secondly, if --re-add cannot add a brand new device, it should fail with "can't
> add /dev/xyz because it wasn't a member of the array" instead of doing what it
> does -- I'd say it's a bug.

That very well may be true; I'll submit that upstream.

> Fundamentally, if a device is added to a *degraded* array, the array shouldn't
> care if the device is a "spare", "replacement", or "pink polka dots" -- it
> should make it active and start resyncing right away (assuming it's the right
> size, etc.). Your suggestion of making --re-add fall back to --add is one way
> to achieve that. Another way is to leave it as it is: add the device as a
> spare, but have mdadm start using the spare immediately.

--re-add does more than just add a device to an array.  When combined with a bitmap, it checks the bitmap on the re-added device versus the existing bitmap on the still-good devices and resyncs any sections that have been written to since the drive was kicked from the array.  It didn't start a resync because the new drive didn't have all the information necessary for the stack to figure out which parts have changed since the device was kicked.
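
For reference, the behaviour described above relies on the array having a
write-intent bitmap; an internal one can be created with something like:

   mdadm --grow /dev/md0 --bitmap=internal

With such a bitmap in place, a --re-add of a previously kicked member only
resyncs the regions written since it was removed.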

> Of course, the intuitive feature would be the
> "--replace-failed-drive-with-this-device-here" option. Ability to re-add the
> exact same drive can be useful while you're writing and debugging mdadm, but
> how often do you need it in production use?

It's actually quite useful in the event that a drive has a single-sector failure, gets kicked from the array, but is otherwise functional.  You can re-add the device to the array and it won't resync the entire drive, just the parts that have changed since it got kicked (which will include rewriting the bad sector, which usually fixes the problem since the drive automatically remaps the bad sector to a good one on write).  Or maybe you have an external USB drive and you accidentally unplugged the cable and the drive got kicked from the array.  Again, --re-add would be the right choice there.
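
In that USB example the recovery sequence would look roughly like this (device
names here are hypothetical):

   mdadm /dev/md0 --re-add /dev/sdh1
   cat /proc/mdstat   # should show a short partial recovery, not a full rebuild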

But I'm still brought back to the original question of what made you choose --re-add over --add?  We have --grow, which is what you use to actually reshape a device (for example, changing a 4 drive raid5 array to a 5 drive raid5 array).  So since we have --grow for reshaping an array, it wouldn't make any sense for --add to do that, and it doesn't (if it did, there would be additional options to --add, such as specifying what layout to grow the raid array to).  So --add is listed as adding drives to an array, but isn't used to reshape the array, so the added drives must be used for something.

If there are any drives missing, the new drive is first added as a spare (new drives always come in as a spare, they only get moved into the array proper *after* reconstruction has completed), then reconstruction is started on the spare, and when reconstruction is complete the drive is moved from the spare slot to the slot where the drive is missing; that signals completion of the reconstruction event and you no longer have a degraded array.  If the array wasn't degraded, then the new drive is just added as a hot spare so that if any drive ever does get kicked, reconstruction starts immediately on the spare drive.

Finally, you have --re-add, which, if it worked as you believed it did, would be a functional equivalent to --add, so it stands to reason that there is something special about --re-add or it wouldn't need to exist.  The man page certainly isn't what I would call crystal clear on what that something special is, but it did state a specific condition that needed to be met.  Unfortunately, that condition, "adding a drive back into an array", could be interpreted to mean "replace the missing drive with this one" when an array is degraded, instead of what it actually means, which is "this specific drive used to be part of this array and is not any longer, please attempt to reintegrate it back where it came from".
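
As an illustration of that division of labour (a sketch with hypothetical
device names, not taken from this system): growing a 4 drive raid5 array to 5
drives is a two step operation,

   mdadm /dev/md1 --add /dev/sdh1
   mdadm --grow /dev/md1 --raid-devices=5

whereas a bare --add on a degraded array brings the drive in as a spare and
immediately starts reconstruction, and on a healthy array simply leaves it as a
hot spare, exactly as described above.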

I don't think this issue is as serious as you make it sound, as this is the first bug report I've had since --re-add was implemented, meaning that either the rest of the users that had your situation simply used --add from the beginning, or tried --re-add and switched to --add when it didn't work as it didn't in your case.  I will, however, submit a patch upstream to make the man page clearer on the difference between --add and --re-add.

Comment 6 Dimitri Maziuk 2008-12-16 16:38:45 UTC
(In reply to comment #5)

> But I'm still brought back to the original question of what made you choose
> --re-add over --add?

I guess probably because when I did unix 101, a "device" meant /dev/xyz. (I know it's not true anymore, but old habits etc.) So if I were replacing /dev/sde1 with, say, /dev/sdf1, I'd use --add or --grow... or if I had to specify component devices via major/minor, I'd see they're different... I don't think choosing --re-add is that strange when you're replacing /dev/sde1 with /dev/sde1.

> If there
> are any drives missing, the new drive is first added as a spare (new drives
> always come in as a spare, they only get moved into the array proper *after*
> reconstruction has completed), then reconstruction is started on the spare,

So when does the reconstruction start? -- this is the reason I filed this report in the first place, because it should start immediately and it didn't.

> I don't think this issue is as serious as you make it sound, as this is the
> first bug report I've had since --re-add was implemented, meaning that either
> the rest of the users that had your situation simply used --add from the
> beginning, or tried --re-add and switched to --add when it didn't work as it
> didn't in your case.

Perhaps. Or perhaps people aren't using Fedora on production kit, so they're complaining somewhere else.

>  I will, however, submit a patch upstream to make the man
> page clearer on the difference between --add and --re-add.

Thanks, hopefully that'll stop someone else from making the same mistake.

Comment 7 Doug Ledford 2008-12-16 16:48:21 UTC
The reconstruction starts immediately if you use --add.

And although not too many people use Fedora for production kit, lots of people use RHEL and I get all those mdadm bug reports too.  The mdadm in RHEL has implemented --re-add for a while now.  So this is still a fairly rare occurrence.

Comment 8 Dimitri Maziuk 2008-12-16 18:15:53 UTC
(In reply to comment #7)
> The reconstruction starts immediately if you use --add.

My point is that it should start immediately regardless of how you add a device. Getting out of the degraded state should be the top priority: a degraded array with an unused spare, as my original listing shows, is just plain wrong, wouldn't you agree?

Comment 9 Doug Ledford 2008-12-16 18:22:49 UTC
No I wouldn't agree.  If a person *knows* what re-add is supposed to do, then the last thing you want to do is perform an add behind their back.  That could lead to destroyed data.  It's the element of least surprise that we need to follow here.  Taking it upon ourselves to treat the new device differently than the user requested we treat it is not good programming practice.  However, I would agree with your earlier statement that when the stack figures out that this device was not previously part of the array, then it should fail to perform a re-add operation and leave the drive alone, possible even suggesting that add is the correction action for a device that has not previously been part of the array.