From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1

Description of problem:
mdadm /dev/md7 --fail /dev/hdd5
resulted in BOTH /dev/hdd5 AND /dev/hdc5 being marked as failed.

Situation: software RAID5 installation on IDE. Mobo is a Gigabyte 7VRX (has 4 IDE channels). Initial setup was hda, hdc, hdd (on IDE 1 & 2).

Strategy: move the third RAID device (hdd) to IDE 3.
Added in hdg -> mdadm /dev/md7 --add /dev/hdg1
mdstat confirmed the additional md device.
Attempted to fail out hdd5.
Result: both /dev/hdc5 and /dev/hdd5 marked (F)?

Is there any way to UNmark these devices?? OR, MORE IMPORTANTLY, to recover the "failed" devices?

Version-Release number of selected component (if applicable):
Recent - last up2date run 03/03/05

How reproducible:
Didn't try.

Steps to Reproduce:
Not game to attempt to reproduce such an error at this stage.

Additional info:
Moving a drive from one controller to another doesn't require removing/adding the drive from the array. You simply shut down the machine, move the drive, and at startup the kernel detects that the drive has moved and puts it back in the array from its new device location. If you want to change the physical disk that the data resides on, then you have to do what you tried to do.

However, a word of caution: IDE drives nowadays are, unfortunately, not what I would call high-reliability devices. Any time you move data from one drive to another like this, you are taking a device offline and forcing the array into degraded mode, at which point it is no longer fault tolerant, and then telling it to rebuild onto a different drive. The risk is that something will go wrong during that rebuild. For IDE drives, I recommend that prior to doing something like this, you always run something like

  dd if=/dev/hda of=/dev/null

for each drive in your current array, as a quick read test to make sure there are no bad blocks hiding in rarely/never used parts of the drive.

Now, to your specific case: when you added /dev/hdg1, it should have just become a hot spare. Once you then failed /dev/hdd5, it should have been marked as Failed, and reconstruction should have started on /dev/hdg1. At that point, the RAID subsystem would have to read every single block on /dev/hda5 and /dev/hdc5 in order to reconstruct /dev/hdg1, and if /dev/hdc5 had any bad sectors, then it would end up failing as a result and taking the array offline. I'm guessing that's what happened here. If you still can, check your logs for any error messages indicating I/O errors on /dev/hdc5.

If that's what happened, then your next option is to reboot into rescue mode and use mdadm to manually assemble the RAID5 array.
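The per-drive dd read test above can be wrapped in a small loop. A minimal sketch — the read_test helper is an illustrative addition (not part of the original advice), and the device names are just the ones from this report:

```shell
# Sequential read test for RAID member disks. dd exits non-zero if it
# hits an unreadable sector, which is exactly what would abort a rebuild.
read_test() {
  for dev in "$@"; do
    if dd if="$dev" of=/dev/null bs=1M 2>/dev/null; then
      echo "$dev: OK"
    else
      echo "$dev: READ ERRORS - do not rely on this disk for a rebuild"
    fi
  done
}

# Example: test every current member BEFORE any --fail/--add:
# read_test /dev/hda /dev/hdc /dev/hdd
```

If /dev/hdc5 really does have bad sectors, a test like this would have flagged them while the array was still fully redundant.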
To do that, do something like:

  mdadm -A /dev/md7 --force --run --update=summaries /dev/hda5 /dev/hdc5

I wouldn't try to add /dev/hdd5 back into the array; I would just try to get it back into the degraded state it was in before. However, if you know for certain that you didn't write to the array after failing /dev/hdd5, then you could bring the array back up with all three devices. The problem is, if the array was still active after you removed /dev/hdd5, then any writes that would have gone to /dev/hdd5 would instead have been stored in parity blocks on /dev/hda5 and /dev/hdc5, and if you bring /dev/hdd5 back into the array as a clean device, we'll read from it instead of the parity blocks and get stale data, possibly resulting in a corrupted filesystem. Instead, you have to re-add /dev/hdd5 as a new disk and let it get rebuilt (although since you have to rebuild a drive anyway, rebuilding onto /dev/hdg1 makes more sense than rebuilding onto /dev/hdd5 and having to start the move process over again). Hope that helps.
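The recovery sequence above can be sketched as a reviewable script. This is a sketch under assumptions, not a tested recovery procedure: the device names come from this report, and the DRY_RUN variable and run() helper are illustrative additions so the commands only print until you deliberately enable them.

```shell
# Sketch of the forced reassembly described above. With DRY_RUN=1
# (the default here) commands are only printed, never executed.
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "WOULD RUN: $*"
  else
    "$@"
  fi
}

# From rescue mode, force-assemble the degraded array from the two
# members believed good, leaving the stale /dev/hdd5 out:
run mdadm -A /dev/md7 --force --run --update=summaries /dev/hda5 /dev/hdc5

# Then watch reconstruction progress onto the spare:
run cat /proc/mdstat
```

Only after reviewing the printed commands would you re-run with DRY_RUN=0 from rescue mode.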
No activity in multiple months, closing.