From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20040913 Firefox/0.10

Description of problem:
When a drive fails out of a RAID1 mirror, the RAID goes into an endless "redirecting sector" loop that requires a reboot to stop. When this happens, the RAID becomes unusable and anything that tries to access it hangs.

Version-Release number of selected component (if applicable):
kernel-2.6.8-1.521

How reproducible:
Always

Steps to Reproduce:
1. Create a RAID1 mirror
2. mdadm /dev/mdX --set-faulty /dev/sdX
3. tail -f /var/log/messages

Actual Results:
RAID goes into an endless "redirecting sector" loop (messages sent to klogd/console). RAID is unusable.

Expected Results:
Drive fails out cleanly. RAID operation continues on one drive.

Additional info:
A patch has been posted to the linux-raid mailing list that fixes this problem:
http://marc.theaimsgroup.com/?l=linux-raid&m=109527014728404&w=2
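
For anyone trying to confirm the symptom from a saved log rather than by watching the console, a minimal sketch follows. It assumes only that each looping message contains the substring "redirecting sector" quoted in the report; the rest of the kernel's message format is not assumed. The function names and the threshold of 100 are hypothetical, chosen for illustration.

```python
from typing import Iterable

# Substring quoted in the report. The full kernel log format is NOT assumed.
MARKER = "redirecting sector"


def longest_marker_run(lines: Iterable[str], marker: str = MARKER) -> int:
    """Return the length of the longest run of consecutive lines containing marker."""
    longest = 0
    current = 0
    for line in lines:
        if marker in line:
            current += 1
            longest = max(longest, current)
        else:
            # Any unrelated line ends the current run.
            current = 0
    return longest


def looks_like_loop(lines: Iterable[str], threshold: int = 100,
                    marker: str = MARKER) -> bool:
    """Heuristic: treat `threshold` or more consecutive marker lines as the loop.

    The default threshold is an arbitrary cutoff, not a value taken from the kernel.
    """
    return longest_marker_run(lines, marker) >= threshold
```

Passing an open file object for /var/log/messages (or a saved copy of it) as `lines` should work, since the functions only iterate over lines. Counting consecutive matches rather than total matches is a deliberate choice: it keeps occasional isolated messages over a long log from being mistaken for the endless loop described here.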
METOO, but on a production system with a twist and a real hardware fault:

1) raid1 mirror /dev/hda2, /dev/hdc2
2) disk hda develops a hardware fault (unreadable sectors)
3) system freezes after the first "i/o error sector blah..." followed by "raid1: disk failure hda2, disabling", "raid1: hda2 rescheduling blah...", SYSTEM FROZEN SOLID.
4) reboot, hdc2 is no longer in the raid volume (out of sync, disabled pending mirror rebuild?)
5) /dev/md0 uses the faulty disk hda until it hits the unreadable sectors, then goes into the infinite loop described in this bug: "try to read hda2, get i/o error", "raid1: hda2: reschedule blah...". K.O.
Neil Brown, maintainer of the md module, released a series of patches today that fix this problem and a couple of others. It would be great if these could be merged into the next kernel release:

http://marc.theaimsgroup.com/?l=linux-raid&m=109824318228668&w=2
http://marc.theaimsgroup.com/?l=linux-raid&m=109824318202358&w=2
http://marc.theaimsgroup.com/?l=linux-raid&m=109824318216933&w=2
http://marc.theaimsgroup.com/?l=linux-raid&m=109824318110429&w=2
http://marc.theaimsgroup.com/?l=linux-raid&m=109824321013934&w=2
http://marc.theaimsgroup.com/?l=linux-raid&m=109824307111239&w=2

I'm also bumping the severity of this to high, since I've personally lost data to the filesystem inconsistencies that result when these machines crash this way.
Fedora Core 2 has now reached end of life, and no further updates will be provided by Red Hat. The Fedora legacy project will be producing further kernel updates for security problems only. If this bug has not been fixed in the latest Fedora Core 2 update kernel, please try to reproduce it under Fedora Core 3, and reopen if necessary, changing the product version accordingly. Thank you.