Bug 132980 - raid1 mirrors that lose a drive go into endless redirecting loop
Status: CLOSED NEXTRELEASE
Product: Fedora
Classification: Fedora
Component: kernel
Version: 2
Hardware: All
OS: Linux
Priority: medium
Severity: high
Assigned To: Dave Jones
QA Contact: Brian Brock
Reported: 2004-09-20 12:24 EDT by Hrunting Johnson
Modified: 2015-01-04 17:09 EST
CC List: 2 users

Doc Type: Bug Fix
Last Closed: 2005-04-16 00:36:54 EDT


Description Hrunting Johnson 2004-09-20 12:24:56 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3)
Gecko/20040913 Firefox/0.10

Description of problem:
When a drive fails out of a RAID1 mirror, the RAID goes into an
endless "redirecting sector" loop that requires a reboot to stop. 
When this happens, the RAID becomes unusable and anything that tries
to access it gets hung.

Version-Release number of selected component (if applicable):
kernel-2.6.8-1.521

How reproducible:
Always

Steps to Reproduce:
1. Create a RAID1 mirror.
2. Mark one member faulty: mdadm /dev/mdX --set-faulty /dev/sdX
3. Watch the log: tail -f /var/log/messages

Actual Results:  RAID goes into endless "redirecting sector" loop
(messages sent to klogd/console).  RAID is unusable.

Expected Results:  Drive fails out cleanly.  RAID operation continues
on one drive.

Additional info:

A patch has been posted to the linux-raid mailing list that fixes this
problem:  http://marc.theaimsgroup.com/?l=linux-raid&m=109527014728404&w=2
Comment 1 Konstantin Olchanski 2004-10-18 18:30:35 EDT
Me too, but on a production system, with a twist: a real hardware fault.
1) raid1 mirror /dev/hda2, /dev/hdc2
2) disk hda develops a hardware fault (unreadable sectors)
3) system freezes after the first "i/o error sector blah..." followed
by "raid1: disk failure hda2, disabling", "raid1: hda2 rescheduling
blah...", SYSTEM FROZEN SOLID.
4) reboot, hdc2 is no longer in the raid volume (out of sync, disabled
pending mirror rebuild?)
5) /dev/md0 uses the faulty disk hda until it hits the unreadable
sectors, then goes into the infinite loop described in this bug: "try
to read hda2, get i/o error", "raid1: hda2: reschedule blah...".
K.O.
Comment 2 Hrunting Johnson 2004-10-20 09:36:36 EDT
Neil Brown, maintainer of the md module, released a series of patches
today that fix this problem and a couple of others.  It would be great
if these could be merged into the next kernel release:

http://marc.theaimsgroup.com/?l=linux-raid&m=109824318228668&w=2
http://marc.theaimsgroup.com/?l=linux-raid&m=109824318202358&w=2
http://marc.theaimsgroup.com/?l=linux-raid&m=109824318216933&w=2
http://marc.theaimsgroup.com/?l=linux-raid&m=109824318110429&w=2
http://marc.theaimsgroup.com/?l=linux-raid&m=109824321013934&w=2
http://marc.theaimsgroup.com/?l=linux-raid&m=109824307111239&w=2

I'm also raising the severity to high, since I've personally lost
data to the resulting filesystem inconsistencies when these machines
crash.
Comment 3 Dave Jones 2005-04-16 00:36:54 EDT
Fedora Core 2 has now reached end of life, and no further updates will be
provided by Red Hat.  The Fedora legacy project will be producing further kernel
updates for security problems only.

If this bug has not been fixed in the latest Fedora Core 2 update kernel, please
try to reproduce it under Fedora Core 3, and reopen if necessary, changing the
product version accordingly.

Thank you.
