Red Hat Bugzilla – Bug 132980
raid1 mirrors that lose a drive go into endless redirecting loop
Last modified: 2015-01-04 17:09:54 EST
Description of problem:
When a drive fails out of a RAID1 mirror, the array goes into an
endless "redirecting sector" loop that only a reboot will stop.
While the loop is running, the array is unusable and any process that
tries to access it hangs.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create a RAID1 mirror
2. mdadm /dev/mdX --set-faulty /dev/sdX
3. tail -f /var/log/messages
Actual Results: RAID goes into endless "redirecting sector" loop
(messages sent to klogd/console). RAID is unusable.
Expected Results: Drive fails out cleanly. RAID operation continues
on one drive.
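To tell the loop apart from a single burst of messages, one option is to
count the "redirecting sector" lines in the log twice and see whether the
count is still growing. This is only a rough sketch: the helper names
count_redirects and loop_is_active are made up for illustration, the match
is on the substring "redirecting sector" quoted above, and
/var/log/messages is assumed to be where klogd output ends up (as in
step 3).

```shell
#!/bin/sh
# Hypothetical helper: print how many lines of a log file mention
# "redirecting sector". Defaults to /var/log/messages (assumption).
count_redirects() {
    # grep -c prints the number of matching lines; a missing or
    # unreadable file is treated as zero matches.
    n=$(grep -c 'redirecting sector' "${1:-/var/log/messages}" 2>/dev/null) || true
    echo "${n:-0}"
}

# Hypothetical helper: succeed if the count grows between two samples
# taken INTERVAL seconds apart, which suggests the loop is still running.
loop_is_active() {
    log="${1:-/var/log/messages}"
    interval="${2:-5}"
    before=$(count_redirects "$log")
    sleep "$interval"
    after=$(count_redirects "$log")
    [ "$after" -gt "$before" ]
}
```

After step 2, something like `loop_is_active /var/log/messages 5 && echo "still looping"`
would report whether new messages are still arriving.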
A patch has been posted to the linux-raid mailing list that fixes this
METOO, but on a production system with a twist and a real hardware fault:
1) raid1 mirror /dev/hda2, /dev/hdc2
2) disk hda develops a hardware fault (unreadable sectors)
3) system freezes after the first "i/o error sector blah..." followed
by "raid1: disk failure hda2, disabling", "raid1: hda2 rescheduling
blah...", SYSTEM FROZEN SOLID.
4) reboot, hdc2 is no longer in the raid volume (out of sync, disabled
pending mirror rebuild?)
5) /dev/md0 uses the faulty disk hda until it hits the unreadable
sectors, then goes into the infinite loop described in this bug: "try
to read hda2, get i/o error", "raid1: hda2: reschedule blah...".
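After a reboot like the one in step 4, a quick way to see which arrays are
running with a missing member is to look for an underscore in the [UU]
status field of /proc/mdstat. The function below is a minimal sketch; the
name degraded_arrays is hypothetical, and it assumes the usual mdstat
layout where the device line ("md0 : active raid1 ...") is followed by a
line carrying the member status in brackets.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every md array whose member
# status field (e.g. [U_] or [_U]) shows at least one missing device.
# Reads /proc/mdstat unless another file is given as the first argument.
degraded_arrays() {
    awk '
        # Remember the array name from lines like "md0 : active raid1 ..."
        /^md[^ ]* :/ { dev = $1 }
        # A status field made only of U and _ with at least one _
        /\[[U_]*_[U_]*\]/ { if (dev != "") { print dev; dev = "" } }
    ' "${1:-/proc/mdstat}"
}
```

Running `degraded_arrays` with no arguments prints nothing when every
mirror has all of its members.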
Neil Brown, maintainer of the md module, released a series of patches
today that fix this problem and a couple of others. It would be great
if this could be merged into the next kernel release:
I'm also bumping the severity of this to high since I've personally
lost data when these machines go into their crash because of resulting
Fedora Core 2 has now reached end of life, and no further updates will be
provided by Red Hat. The Fedora legacy project will be producing further kernel
updates for security problems only.
If this bug has not been fixed in the latest Fedora Core 2 update kernel, please
try to reproduce it under Fedora Core 3, and reopen if necessary, changing the
product version accordingly.