Submitted yesterday as Bug 194966 (before the raid crash)

Description of problem:
During a redesign of my hard disk usage I want to decrease the number of devices in some raid1 arrays.

Version-Release number of selected component (if applicable):
mdadm-1.6.0-3

How reproducible:
In special cases

Steps to Reproduce:
The old layout was that most of the raid1 arrays had 3 devices. Current layout:

# cat /proc/mdstat
Personalities : [raid1]
md4 : active raid1 hdc1[1] hda1[0]
      120384 blocks [2/2] [UU]
md1 : active raid1 hdc2[1] hda2[0]
      7823552 blocks [2/2] [UU]
md5 : active raid1 hdc5[0] hda5[2]
      3911680 blocks [3/2] [U_U]
md2 : active raid1 hdc6[0] hda6[2]
      7823552 blocks [3/2] [U_U]
md3 : active raid1 hdc7[0] hda7[2]
      7823552 blocks [3/2] [U_U]
md6 : active raid1 hdc9[0] hda9[2]
      78132032 blocks [3/2] [U_U]
md7 : active raid1 hdc10[0] hda10[1]
      7823552 blocks [2/2] [UU]
md0 : active raid1 hdc8[1] hda8[0]
      39069952 blocks [2/2] [UU]
unused devices: <none>

# mdadm --grow -n 2 /dev/md5

Actual results:
mdadm: Cannot set device size/shape for /dev/md5: Device or resource busy

Expected results:
Works, as it already did for other arrays (e.g. md4, md1).

Additional info:
The cause appears to be the sequential numbering of the raid members:

md4 (where --grow -n 2 worked):
md4 : active raid1 hdc1[1] hda1[0]
      120384 blocks [2/2] [UU]
-> slots [0] and [1]

md5 (where --grow -n 2 does not work):
md5 : active raid1 hdc5[0] hda5[2]
      3911680 blocks [3/2] [U_U]
-> slots [0] and [2]

Now the big question: is there any way to renumber the slots to [0] and [1] for all raid arrays that currently use [0] and [2]?
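The failing arrays can be picked out mechanically from the slot numbers in /proc/mdstat. A minimal sketch (run here against a captured two-line copy of the output above, not the live file) that flags arrays whose member slots are not the contiguous range 0..N-1:

    #!/bin/sh
    # Toy check: flag raid1 arrays whose member slot numbers (the [N]
    # suffixes in /proc/mdstat) are not contiguous from 0 -- the
    # situation where "mdadm --grow -n 2" fails in this report.
    mdstat='md4 : active raid1 hdc1[1] hda1[0]
    md5 : active raid1 hdc5[0] hda5[2]'

    result=$(printf '%s\n' "$mdstat" | awk '{
        n = 0; max = -1
        for (i = 1; i <= NF; i++)
            if (match($i, /\[[0-9]+\]$/)) {
                slot = substr($i, RSTART + 1, RLENGTH - 2) + 0
                n++
                if (slot > max) max = slot
            }
        if (max > n - 1)
            print $1 ": slots not contiguous (max slot " max " with " n " members)"
    }')
    echo "$result"

On the layout above this flags md5 (slots 0 and 2) but not md4 (slots 0 and 1), matching which arrays the reporter could and could not shrink.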
I'm seeing this problem as well. I had a mirror with two devices (sda2 and sdb2). I was attempting to "safely" replace sdb2 with sdc2, as follows:

1. hotadd sdc2
2. grow the number of raid disks from 2 to 3
3. wait for the resync to sdc2 to complete
4. hotfail sdb2
5. hotremove sdb2
6. grow the number of raid disks from 3 to 2

However, step 6 fails:

$ cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdc2[2] sda2[0]
      35037696 blocks [3/2] [U_U]

$ mdadm --detail /dev/md0
...
    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       0        0       -1      removed
       2       8       34        2      active sync   /dev/sdc2

$ mdadm --grow /dev/md0 --raid-disks=2
mdadm: Cannot set device size/shape for /dev/md0: Device or resource busy

It looks like the only way to shrink the mirror back down to 2 members is to destroy the mirror and reconstruct it anew. Since this array holds my root partition, I'll be forced to take the machine down and boot from rescue media.

This is a *really* nasty bug, because in my case I could have trivially avoided it as follows:

1. hotadd sdc2
2. hotfail sdb2
3. wait for the resync to sdc2 to complete
4. hotremove sdb2

The reason I didn't just do this in the first place is that the "safe" procedure above keeps at least 2 members in the mirror at all times, and the man page gave me every indication that --grow would work for both increasing and decreasing the number of array members. If --grow mode doesn't work uniformly, it should be disabled entirely, to prevent others from falling into this trap.

(This is with RHEL4 on i686, kernel 2.6.9-42.0.2.ELsmp.)
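The "safe" procedure above maps onto mdadm invocations as in the following sketch. The `run` function is a stand-in that only echoes each command so the sequence can be shown without touching a live array; on a real system you would execute them as root, and step 6 is where the failure described above occurs:

    #!/bin/sh
    # Dry-run sketch of the "safe" member-replacement procedure.
    # `run` echoes instead of executing; device names are from the comment.
    run() { echo "+ $*"; }

    run mdadm /dev/md0 --add /dev/sdc2           # 1. hotadd sdc2
    run mdadm --grow /dev/md0 --raid-disks=3     # 2. grow from 2 to 3 disks
    run cat /proc/mdstat                         # 3. poll until resync completes
    run mdadm /dev/md0 --fail /dev/sdb2          # 4. hotfail sdb2
    run mdadm /dev/md0 --remove /dev/sdb2        # 5. hotremove sdb2
    run mdadm --grow /dev/md0 --raid-disks=2     # 6. fails: Device or resource busy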
Since we have a support contract with Red Hat, I've opened Service Request 1004974 to request servicing of this bug.
Actually, I just realized that it's possible to work around this bug. Instead of doing this:

1. hotadd sdc2
2. grow the number of raid disks from 2 to 3
3. wait for the resync to sdc2 to complete
4. hotfail sdb2
5. hotremove sdb2
6. grow the number of raid disks from 3 to 2

Do this:

1. hotadd sdc2
2. grow the number of raid disks from 2 to 3
3. wait for the resync to sdc2 to complete
4. hotfail sdb2
5. hotremove sdb2
6. hotfail sdc2
7. hotremove sdc2
8. grow the number of raid disks from 3 to 2
9. hotadd sdc2

Of course, doing that isn't any better than doing this:

1. hotadd sdc2
2. hotfail sdb2
3. wait for the resync to sdc2 to complete
4. hotremove sdb2

The only reason you'd perform the second series of steps is if you were attempting the first series and were bitten by the bug. In that case, you can perform steps 6-9 of the second series to recover. Still, the bug with --grow mode should be fixed for RHEL. (I just tested my FC5 box, and it seems not to be affected by this bug.)
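The recovery portion (steps 6-9) can be sketched the same way: drop sdc2 out of the array, shrink while the only remaining member sits in slot 0, then re-add sdc2. As before, `run` is a hypothetical wrapper that merely echoes the commands rather than executing them:

    #!/bin/sh
    # Dry-run sketch of the recovery steps 6-9 from the workaround above.
    # `run` echoes instead of executing; note the array runs on a single
    # disk between steps 7 and 9, which is why this is no safer than the
    # simple replacement procedure.
    run() { echo "+ $*"; }

    run mdadm /dev/md0 --fail /dev/sdc2          # 6. hotfail sdc2
    run mdadm /dev/md0 --remove /dev/sdc2        # 7. hotremove sdc2
    run mdadm --grow /dev/md0 --raid-disks=2     # 8. succeeds: only slot 0 is in use
    run mdadm /dev/md0 --add /dev/sdc2           # 9. hotadd sdc2; it lands in slot 1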
The problem here is not an mdadm bug but a kernel issue. The raid subsystem doesn't re-organize active array members from one slot to another, yet in order to shrink the array, the last array member (/dev/sdc2) would have to vacate a slot that is no longer valid. In other words, if we could move /dev/sdc2 down into /dev/sdb2's slot in the output of /proc/mdstat, then you could shrink the array; as it stands, shrinking the array would lop off the slot /dev/sdc2 is occupying. This will require kernel changes to resolve. Swapping a disk from one slot in the array to another is a race-prone operation, and if we happened to encounter an error at the same time, things could get rather confused, so it's not a change I would undertake lightly. I'll have management review this for possible inclusion in a future update.
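The constraint described above can be illustrated with a toy model: shrinking to N raid-disks discards every slot numbered N or higher, so a member parked in slot 2 cannot survive a shrink to 2 disks. This is only a sketch of the slot arithmetic, not the kernel code; the slot layout is taken from the md0 example earlier in this report:

    #!/bin/sh
    # Toy model: which occupied slots survive a shrink to $new_disks?
    # Slots 0 and 2 are occupied (sda2 and sdc2), as in the md0 example.
    slots="0 2"
    new_disks=2

    for s in $slots; do
        if [ "$s" -ge "$new_disks" ]; then
            echo "slot $s would be lopped off by --raid-disks=$new_disks"
        else
            echo "slot $s survives"
        fi
    done

With slots 0 and 1 instead (the md4 case from the first comment), nothing would be lopped off and the shrink goes through.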
Created attachment 298976 [details]
Fix backported to RHEL4.6

Upstream commit:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=6ea9c07c6c6d1c14d9757dd8470dc4c85bbe9f28
Devel ACK for R4.7, Doug will post a patch for review.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Committed in 70.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2008-0665.html