132980 – raid1 mirrors that lose a drive go into endless redirecting loop


Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Version:	2
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Dave Jones
QA Contact:	Brian Brock

Reported:	2004-09-20 16:24 UTC by Hrunting Johnson
Modified:	2015-01-04 22:09 UTC (History)
Last Closed:	2005-04-16 04:36:54 UTC
Type:	---

Description Hrunting Johnson 2004-09-20 16:24:56 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20040913 Firefox/0.10

Description of problem:
When a drive fails out of a RAID1 mirror, the array goes into an
endless "redirecting sector" loop that only a reboot can stop.
While the loop is running, the array is unusable and any process that
tries to access it hangs.

Version-Release number of selected component (if applicable):
kernel-2.6.8-1.521

How reproducible:
Always

Steps to Reproduce:
1. create raid1 mirror
2. mdadm /dev/mdX --set-faulty /dev/sdX
3. tail -f /var/log/messages
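
The steps above can be sketched as a script. This is only an illustration:
/dev/mdX, /dev/sdX and /dev/sdY are placeholders, the mdadm --create
invocation is an assumption (the report does not say how the mirror was
built), and run is a hypothetical helper that only prints each command
unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch of the reproduction steps. MD, DISK_A and DISK_B are placeholders
# (assumptions); substitute real devices before setting RUN=1.
MD=${MD:-/dev/mdX}
DISK_A=${DISK_A:-/dev/sdX}
DISK_B=${DISK_B:-/dev/sdY}

# Hypothetical helper: print the command unless RUN=1, then execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# 1. Create a two-disk RAID1 mirror (this destroys data on both members).
run mdadm --create "$MD" --level=1 --raid-devices=2 "$DISK_A" "$DISK_B"

# 2. Mark one member as failed.
run mdadm "$MD" --set-faulty "$DISK_A"

# 3. Watch the log for repeated "redirecting sector" messages.
run tail -f /var/log/messages
```

Run without RUN=1, the script only lists the commands, which makes it safe
to review before pointing it at real disks.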
    

Actual Results:  RAID goes into endless "redirecting sector" loop
(messages sent to klogd/console).  RAID is unusable.

Expected Results:  Drive fails out cleanly.  RAID operation continues
on one drive.
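
One rough way to tell the two outcomes apart from the log is to count the
lines containing the "redirecting sector" text. This is a sketch only: the
function name count_redirects and the default log path are assumptions. A
count that keeps growing between runs would suggest the loop, while a clean
failure should leave it stable.

```shell
#!/bin/sh
# count_redirects FILE: print how many lines of FILE contain
# "redirecting sector" (hypothetical helper, not part of mdadm).
count_redirects() {
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so mask the exit status.
    grep -c 'redirecting sector' "$1" || true
}

# Assumed default log location; override with LOG=/path/to/log.
LOG=${LOG:-/var/log/messages}
if [ -r "$LOG" ]; then
    echo "redirecting-sector messages in $LOG: $(count_redirects "$LOG")"
fi
```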

Additional info:

A patch has been posted to the linux-raid mailing list that fixes this
problem:  http://marc.theaimsgroup.com/?l=linux-raid&m=109527014728404&w=2

Comment 1 Konstantin Olchanski 2004-10-18 22:30:35 UTC

Me too, but on a production system, with a twist and a real hardware fault:
1) raid1 mirror of /dev/hda2 and /dev/hdc2
2) disk hda develops a hardware fault (unreadable sectors)
3) system freezes after the first "i/o error sector blah..." followed
by "raid1: disk failure hda2, disabling", "raid1: hda2 rescheduling
blah...", SYSTEM FROZEN SOLID.
4) after reboot, hdc2 is no longer in the raid volume (out of sync,
disabled pending mirror rebuild?)
5) /dev/md0 uses the faulty disk hda until it hits the unreadable
sectors, then goes into an infinite loop as described in this bug: "try
to read hda2, get i/o error", "raid1: hda2: reschedule blah...".
K.O.

Comment 2 Hrunting Johnson 2004-10-20 13:36:36 UTC

Neil Brown, maintainer of the md module, released a series of patches
today that fix this problem and a couple of others.  It would be great
if these could be merged into the next kernel release:

http://marc.theaimsgroup.com/?l=linux-raid&m=109824318228668&w=2
http://marc.theaimsgroup.com/?l=linux-raid&m=109824318202358&w=2
http://marc.theaimsgroup.com/?l=linux-raid&m=109824318216933&w=2
http://marc.theaimsgroup.com/?l=linux-raid&m=109824318110429&w=2
http://marc.theaimsgroup.com/?l=linux-raid&m=109824321013934&w=2
http://marc.theaimsgroup.com/?l=linux-raid&m=109824307111239&w=2

I'm also bumping the severity of this to high, since I've personally
lost data to the filesystem inconsistencies that result when these
machines hang.

Comment 3 Dave Jones 2005-04-16 04:36:54 UTC

Fedora Core 2 has now reached end of life, and no further updates will be
provided by Red Hat.  The Fedora legacy project will be producing further kernel
updates for security problems only.

If this bug has not been fixed in the latest Fedora Core 2 update kernel, please
try to reproduce it under Fedora Core 3, and reopen if necessary, changing the
product version accordingly.

Thank you.
