Bug 194585

Summary: mdadm --grow -n 2 (old: 3) fails on particular raid1 devices
Product: Red Hat Enterprise Linux 4
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: urgent
Reporter: Peter Bieringer <pb>
Assignee: Doug Ledford <dledford>
CC: akarlsso, jplans, peterm, ralston
Keywords: Patch
Fixed In Version: RHSA-2008-0665
Doc Type: Bug Fix
Last Closed: 2008-07-24 19:11:18 UTC

Attachments: fix backported to RHEL4.6

Description Peter Bieringer 2006-06-14 08:32:26 UTC
Submitted yesterday as Bug 194966 (before the raid crash)

Description of problem:
While redesigning my hard disk layout, I want to decrease the number of member devices in several raid1 arrays.

Version-Release number of selected component (if applicable):
mdadm-1.6.0-3

How reproducible:
In special cases

Steps to Reproduce:

In the old layout, most of the raid1 arrays had 3 member devices.

Current layout:
# cat /proc/mdstat
Personalities : [raid1]
md4 : active raid1 hdc1[1] hda1[0]
      120384 blocks [2/2] [UU]

md1 : active raid1 hdc2[1] hda2[0]
      7823552 blocks [2/2] [UU]

md5 : active raid1 hdc5[0] hda5[2]
      3911680 blocks [3/2] [U_U]

md2 : active raid1 hdc6[0] hda6[2]
      7823552 blocks [3/2] [U_U]

md3 : active raid1 hdc7[0] hda7[2]
      7823552 blocks [3/2] [U_U]

md6 : active raid1 hdc9[0] hda9[2]
      78132032 blocks [3/2] [U_U]

md7 : active raid1 hdc10[0] hda10[1]
      7823552 blocks [2/2] [UU]

md0 : active raid1 hdc8[1] hda8[0]
      39069952 blocks [2/2] [UU]

unused devices: <none>

# mdadm --grow -n 2 /dev/md5
  
Actual results:
mdadm: Cannot set device size/shape for /dev/md5: Device or resource busy

Expected results:
The command succeeds, as it already did for other arrays (e.g. md4, md1)

Additional info:
Looks like the cause lies in the slot numbering of the raid members:

md4 (where grow -n 2 worked):
md4 : active raid1 hdc1[1] hda1[0]
      120384 blocks [2/2] [UU]

-> [0] [1]

md5 (where grow -n 2 is not working):
md5 : active raid1 hdc5[0] hda5[2]
      3911680 blocks [3/2] [U_U]

-> [0] [2]

Now the big question: is there any way to renumber the member slots to
[0] and [1] for all raid devices that currently have [0] and [2]?
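
(For reference, a quick way to list the arrays that would hit this, assuming the /proc/mdstat layout shown above, i.e. arrays created with 3 members but currently running with only 2 active, is to grep for the [3/2] pattern; in the listing above this picks out md5, md2, md3 and md6:)

# grep -B1 '\[3/2\]' /proc/mdstat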

Comment 1 James Ralston 2006-09-11 18:53:49 UTC
I'm seeing this problem as well.

I had a mirror with two devices (sda2 and sdb2).  I was attempting to "safely"
replace sdb2 with sdc2, as follows (the corresponding mdadm commands are sketched after the list):

1.  hotadd sdc2
2.  grow the number of raid disks from 2 to 3
3.  wait for resync to sdc2 to complete
4.  hotfail sdb2
5.  hotremove sdb2
6.  grow the number of raid disks from 3 to 2
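
(The six steps above correspond roughly to the following mdadm commands; this is a sketch only, using the array name /dev/md0 and the device names from this comment:)

$ mdadm /dev/md0 --add /dev/sdc2          # 1. hotadd sdc2
$ mdadm --grow /dev/md0 --raid-disks=3    # 2. grow from 2 to 3 raid disks
$ watch cat /proc/mdstat                  # 3. wait for the resync to sdc2 to finish
$ mdadm /dev/md0 --fail /dev/sdb2         # 4. hotfail sdb2
$ mdadm /dev/md0 --remove /dev/sdb2       # 5. hotremove sdb2
$ mdadm --grow /dev/md0 --raid-disks=2    # 6. grow from 3 to 2 raid disks (the step that fails)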

However, step 6 fails:

$ cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdc2[2] sda2[0]
      35037696 blocks [3/2] [U_U]

$ mdadm --detail /dev/md0
...
    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       0        0       -1      removed
       2       8       34        2      active sync   /dev/sdc2

$ mdadm --grow /dev/md0 --raid-disks=2
mdadm: Cannot set device size/shape for /dev/md0: Device or resource busy

It's looking like the only way to effectively shrink the mirror back down to
only 2 members is to destroy the mirror and reconstruct it anew.  Since this
array holds my root partition, this means I'm going to be forced to down the
machine and boot from rescue media.

This is a *really* nasty bug, because in my case, I could have trivially avoided
it as follows:

1.  hotadd sdc2
2.  hotfail sdb2
3.  wait for resync to sdc2 to complete
4.  hotremove sdb2

The reason I didn't just do this in the first place is that the "safe"
procedure (above) keeps at least 2 members in the mirror at all times, and the
man page gave me every indication that --grow would work for both increasing and
decreasing the number of array members.

If --grow mode doesn't work uniformly, it should be disabled entirely, to prevent
others from falling into this trap.

(This is with RHEL4 on i686, kernel 2.6.9-42.0.2.ELsmp.)


Comment 2 James Ralston 2006-09-11 19:03:16 UTC
Since we have a support contract with Red Hat, I've opened Service Request
1004974 to request servicing of this bug.


Comment 3 James Ralston 2006-09-11 21:01:50 UTC
Actually, I just realized that it's possible to work around this bug, as follows:

Instead of doing this:

1.  hotadd sdc2
2.  grow the number of raid disks from 2 to 3
3.  wait for resync to sdc2 to complete
4.  hotfail sdb2
5.  hotremove sdb2
6.  grow the number of raid disks from 3 to 2

Do this:

1.  hotadd sdc2
2.  grow the number of raid disks from 2 to 3
3.  wait for resync to sdc2 to complete
4.  hotfail sdb2
5.  hotremove sdb2
6.  hotfail sdc2
7.  hotremove sdc2
8.  grow the number of raid disks from 3 to 2
9.  hotadd sdc2

Of course, doing that isn't any better than doing this:

1.  hotadd sdc2
2.  hotfail sdb2
3.  wait for resync to sdc2 to complete
4.  hotremove sdb2

The only reason you'd want to perform the second series of steps is if you were
attempting the first series but were bitten by the bug.  In that case, you can
perform steps 6-9 of the second series to recover.
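
(For reference, recovery steps 6-9 correspond roughly to the following mdadm commands; a sketch only, using /dev/md0 and the device names from comment 1:)

$ mdadm /dev/md0 --fail /dev/sdc2         # 6. hotfail sdc2
$ mdadm /dev/md0 --remove /dev/sdc2       # 7. hotremove sdc2
$ mdadm --grow /dev/md0 --raid-disks=2    # 8. grow from 3 to 2 raid disks (succeeds once slot 2 is empty)
$ mdadm /dev/md0 --add /dev/sdc2          # 9. hotadd sdc2 again and let it resync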

Still, the bug with --grow mode should be fixed for RHEL.  (I just tested my FC5
box, and it seems like it's not affected by this bug.)


Comment 4 Doug Ledford 2007-01-31 19:20:41 UTC
The problem here is not an mdadm bug, but a kernel issue.  The raid subsystem
doesn't re-organize active array members from one slot to another, yet if the
array were grown to a smaller size, the last array member (/dev/sdc2) would be
left in a slot that is no longer valid.  In other words, if we could get /dev/sdc2
down into /dev/sdb2's slot in the output of /proc/mdstat, then you could grow
the array to a smaller size, but as it is, growing the array down would lop off
the slot /dev/sdc2 is occupying.  This will require kernel changes to resolve.
The process of swapping a disk from one slot in the array to another is a
race-prone operation, and if we happened to encounter an error at the same time,
things could get rather confused, so it's not a change I would undertake
lightly.  I'll have management review this for possible inclusion in a future
update.
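
(In practical terms, the shrink can only succeed while every active member occupies a slot below the new raid-disks count. A rough pre-check, assuming the mdadm --detail output format shown in comment 1; the awk field positions are an assumption about that layout:)

$ mdadm --detail /dev/md0 | awk '$6 == "sync" && $4 >= 2 { print $7 " occupies slot " $4 " and would block --raid-disks=2" }'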

Comment 9 Peter Martuccelli 2008-04-22 20:08:05 UTC
Devel ACK for R4.7, Doug will post a patch for review.

Comment 10 RHEL Program Management 2008-04-22 20:11:20 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 11 Vivek Goyal 2008-05-16 17:35:12 UTC
Committed in 70.EL.  RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Comment 15 errata-xmlrpc 2008-07-24 19:11:18 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html