Bug 82815

Summary: initiating md RAID1 reconstruction causes Oops in mdrecoveryd
Product: [Retired] Red Hat Linux Reporter: James Ralston <ralston>
Component: kernel Assignee: Arjan van de Ven <arjanv>
Status: CLOSED WONTFIX QA Contact: Brian Brock <bbrock>
Severity: high Docs Contact:
Priority: medium    
Version: 8.0   
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2004-09-30 15:40:27 UTC Type: ---
Attachments:
Description                          Flags
latest Oops                          none
oops-2003-01-27T17:42:57-0500.txt    none
oops-2003-01-27T18:46:32-0500.txt    none

Description James Ralston 2003-01-27 06:54:02 UTC
Description of problem:

I'm setting up Red Hat Linux 8.0 on a Dell PowerEdge 2650.  The PE has 3 disks
on an AIC7XXX controller.  The first disk is the system disk; it's not using RAID
in any form.  I want to create a software RAID mirror using the second and third
disks and mount it on /data.

Figuring out how to create the mirror was easy enough.  But since this was my
first experience with Linux software RAID, I wanted to play around with it
before I tossed the box into production.  So I've spent the last week doing
things like simulating failures, performing replacements and rebuilds, etc.

I'm up-to-date with the latest errata packages.  Nonetheless, I've found the
Linux software RAID to be disturbingly brittle.  I've managed to make it Oops
3 times so far, and I'm not even trying particularly hard.

Are there known problems with software RAID in kernel-2.4.18-19.8.0 on RH8?  If
so, are there any work-arounds you can suggest?

Version-Release number of selected component (if applicable):

kernel-2.4.18-19.8.0
raidtools-1.00.2-3.3
mdadm-1.0.0-6

Comment 1 James Ralston 2003-01-27 06:59:20 UTC
Created attachment 89615
latest Oops

This is the ksymoops report for the latest Oops I generated.

(I'm not sure what the reason is for the "cannot stat" errors.  If you can tell
me how to correct that, I'll re-generate the report.)

Comment 2 James Ralston 2003-01-28 00:03:48 UTC
Ok, a little more information.

The oops is related to triggering recovery processes.  For example, the
following command (which simulates a failure and replacement) frequently causes
an oops:

$ mdadm /dev/md0 -f /dev/sdc1 -r /dev/sdc1 -a /dev/sdc1

(I can cause oopses to occur with the raidtools commands as well.)
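
For anyone trying to reproduce this, the same fail/remove/re-add cycle can be
spelled with mdadm's long options.  The following is only a minimal sketch: the
helper names run and cycle_member are made up for illustration, and the device
names are the ones from the command above.  Setting DRY_RUN=1 prints the
command instead of running it, which is the safe way to check it on a box you
care about.

```shell
#!/bin/sh
# Sketch only: cycle one member of an md mirror through fail/remove/add.
# run() and cycle_member() are hypothetical helper names, not part of mdadm.
# Set DRY_RUN=1 to print the command instead of executing it.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

cycle_member() {
    md=$1      # array device, e.g. /dev/md0
    part=$2    # member partition, e.g. /dev/sdc1
    # Long-option spelling of: mdadm $md -f $part -r $part -a $part
    run mdadm "$md" --fail "$part" --remove "$part" --add "$part"
}

# Example (needs root and a real array unless DRY_RUN=1):
#   DRY_RUN=1 cycle_member /dev/md0 /dev/sdc1
#   cat /proc/mdstat    # shows resync progress after the re-add
```

Re-adding the partition is what kicks off the reconstruction, so that is the
step to watch in /proc/mdstat when trying to catch the oops.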

I generated two more oops reports this afternoon; I'll attach them in a moment.


Comment 3 James Ralston 2003-01-28 00:06:11 UTC
Created attachment 89627
oops-2003-01-27T17:42:57-0500.txt

Comment 4 James Ralston 2003-01-28 00:08:22 UTC
Created attachment 89628
oops-2003-01-27T18:46:32-0500.txt

Uhhh... the previous oops report isn't a patch, obviously.  Oops.  :p

Comment 5 James Ralston 2003-01-28 00:30:01 UTC
Ok, from pondering the 3 oopses I've made so far, they're all clearly the same
problem, so I won't bother to attach any more.  (Unless I can get an oops
in a different location, that is.)

I've updated the summary of this bug to more accurately reflect the problem.

I've skimmed through /usr/src/linux-2.4.18-19.8.0/drivers/md/md.c, but alas, I
have little kernel hacking experience; whatever the bug, it isn't immediately
apparent to me.

I'm no stranger to building customized Red Hat kernels.  If this oops is a known
bug, and there's a patch, smack it in here and I'll go build my own kernel and
test it.

In the meantime, I'll compare md.c from 2.4.18-19.8.0 against Phoebe's kernel, and
against vanilla 2.4.20.  Perhaps something will leap out from the diffs...


Comment 6 Bugzilla owner 2004-09-30 15:40:27 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/