Red Hat Bugzilla – Bug 116679
crash on first (write?) access to raid 5 physical volume
Last modified: 2007-11-30 17:10:37 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040217
Description of problem:
The first write access to a physical volume on software raid 5 causes
the system to hang. A reboot afterwards crashes while replaying the
journal (better than 2.6.3-1.91, which would crash with a double page
fault during vgscan). I remember earlier 2.6 kernels (before
FC2test1) would crash when accessing a raid5 volume that needed a
resync. I'm still trying to obtain a stack trace of the hangs, but it
doesn't fit on the screen in the replay-journal case, and the hang has
always occurred with X running for me. I'll keep trying.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Access (write to?) a filesystem mounted off a logical volume on RAID
5 physical volumes
2.Reboot after the hang
Actual Results: 1. hang. 2. oops
Expected Results: Erhm... Should work
Created attachment 97980 [details]
Stack trace got while replaying the journal when a raid5 resync is needed
I've bitten the bullet and decided I'd trigger the problem on my desktop again,
since I was using the laptop anyway. Unfortunately, the picture I took after
the hang during the raid 5 write access was totally unreadable (yes, far worse
than this one :-( ). As soon as resync (on FC1) completes, I'll do it again.
I found out it's not just any write access that triggers the problem.
Apparently, it's necessary to write some big file. I don't know whether the
number of raid members (8, in this case) makes any difference, but the stack
trace didn't seem to indicate any relationship with the number of partitions.
Created attachment 97982 [details]
Stack trace obtained while copying files from fedora/linux/core/development/SRPMS from another box into an ext3 filesystem mounted on a physical volume mapped to raid5 devices
Problem still present in 2.6.3-1.100, but with different stack traces.
The pictures were totally unreadable, unfortunately.
The problem appears to only occur for raid 5 arrays with 5 or more
members. Duplicating it is *very* simple. Start with 5 partitions,
logical volumes, or block devices of any kind, all roughly the same size,
and run: mdadm -C /dev/md0 -l 5 -n 5 device[1-5]. Just after the
message to the console that the degraded array got the additional disk
and is beginning to resync, I get a stack trace. The stack trace
varies a lot, but it always has xor_p5_mmx_5. Sometimes, this
function is (oddly) the very top of the stack. In one of the several
times I tested this recipe, the machine didn't oops; it rebooted
instead. Very scary.
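For reference, the routine in the trace computes a running parity xor over five buffers. A plain-C, user-space equivalent (my own sketch for clarity, not the kernel's MMX implementation) is:

```c
#include <stddef.h>

/* User-space sketch of what a 5-operand xor routine like xor_p5_mmx_5()
 * computes: the first buffer accumulates the xor of the other four.
 * The kernel version does the same thing with MMX registers inside
 * kernel_fpu_begin()/kernel_fpu_end(); this is plain C for reference. */
static void xor_5(size_t bytes, unsigned long *p1, unsigned long *p2,
                  unsigned long *p3, unsigned long *p4, unsigned long *p5)
{
    for (size_t i = 0; i < bytes / sizeof(unsigned long); i++)
        p1[i] ^= p2[i] ^ p3[i] ^ p4[i] ^ p5[i];
}
```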
I've been staring at raid5.c, xor.c and i386/xor.h looking for
significant differences from the 2.4 code that worked, and all I
could find was that raid5.c now uses bio instead of buffer_head, and
xor.h now calls kernel_fpu_begin()/end() in the mmx variants, instead
of FPU_SAVE() and FPU_RESTORE(). Does it matter that I'm doing this
on an athlon? I'm going to try on a regular i686 as soon as I have a
chance.
Now here's one that might provide a clue: I created a degraded raid 5
device with 5 partitions, and everything was fine. As soon as I wrote
to it, I got 10 lines in the console like this:
kernel/module.c:1965: spin_lock(kernel/module.c:c03149c0) already
locked by kernel/module.c/1965
after these 10 lines:
Debug: sleeping function called from invalid context at
and then a hard freeze. No actual call trace.
Oh, the error above was with 2.6.3-1.106.
Still with 2.6.3-1.106, if I create the same degraded raid 5 array
with 5 members and then dd if=/dev/md0 of=/dev/null, I get:
double fault, gdt at c030f100 [255 bytes]
double fault, tss at c0387800
eip = c011d1df, esp = e28bdd30
eax = 000001b0, ebx = ee484690, ecx = c0101e1c, edx = 21c00000
esi = 00000000, edi = e28bf900
and nothing else.
And if I do dd if=/dev/zero of=/dev/md0, I get an instant reset. Nice!
Created attachment 98060 [details]
interesting crash in first xor_block() call while reading from degraded array
I patched xor_block() such that it would printk its arguments before calling
the actual xor_block function. After several reboots and stack traces that
wouldn't fit in the screen, I got this very interesting one when I tried to
read from a degraded raid5 array with 5 members (1 missing). The interesting
bit is that it shows it's the very first call to the xor_block() function that
fails, and the address of the failed paging request looks totally bogus to me.
I've analyzed the code of xor_p5_mmx_5(), the function that is called, and
couldn't find any problems with it, so it looks very much like the kernel is
failing to handle the page fault properly.
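The instrumentation was a printk added to xor_block(); in user space the same idea looks roughly like this (fprintf standing in for printk, all names mine):

```c
#include <stdio.h>

/* Sketch of the debugging wrapper described above: log xor_block()'s
 * arguments before dispatching to the selected routine.  In the kernel
 * the call goes to the xor template chosen at boot (e.g. p5_mmx); here
 * the routine is whatever function pointer the caller passes in. */
typedef unsigned long (*xor_routine)(unsigned long bytes, void **ptrs);

static unsigned long traced_xor_block(xor_routine real, unsigned long count,
                                      unsigned long bytes, void **ptrs)
{
    fprintf(stderr, "xor_block: count=%lu bytes=%lu", count, bytes);
    for (unsigned long i = 0; i < count; i++)
        fprintf(stderr, " p%lu=%p", i + 1, ptrs[i]);
    fprintf(stderr, "\n");
    return real(bytes, ptrs);   /* call the real routine with the logged args */
}
```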
Interesting... If I tweak check_xor() in raid5.c so that it acts when
count == 4, instead of at 5 (MAX_XOR_BLOCKS), then it works.
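For context, check_xor() in 2.6's raid5.c collects source pages and flushes them through xor_block() once the batch reaches MAX_XOR_BLOCKS (5); lowering the flush threshold means the 5-operand routine is never selected. A simplified user-space sketch of that batching (structure and names are mine, assuming the 2.6 logic):

```c
#include <stddef.h>

#define MAX_XOR_BLOCKS 5

/* Simplified user-space analogue of raid5.c's batching: ptr[0] is the
 * destination, ptr[1..count-1] are pending sources.  check_xor() flushes
 * the batch through the xor routine when it is full; the workaround
 * described above compares against 4 instead, so the 5-operand MMX
 * routine never runs. */
struct xor_batch {
    int count;                          /* destination plus pending sources */
    unsigned char *ptr[MAX_XOR_BLOCKS];
    size_t bytes;
};

static void do_xor(struct xor_batch *b)
{
    for (size_t i = 0; i < b->bytes; i++)
        for (int j = 1; j < b->count; j++)
            b->ptr[0][i] ^= b->ptr[j][i];
}

static void check_xor(struct xor_batch *b)
{
    if (b->count == MAX_XOR_BLOCKS) {   /* the workaround: compare with 4 */
        do_xor(b);
        b->count = 1;                   /* keep only the destination */
    }
}

static void add_source(struct xor_batch *b, unsigned char *src)
{
    b->ptr[b->count++] = src;
    check_xor(b);
}

static void flush_xor(struct xor_batch *b)
{
    if (b->count > 1) {
        do_xor(b);
        b->count = 1;
    }
}
```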
Created attachment 98080 [details]
Patch that fixes the bug
I guess we can't use push and pop in an inline asm that takes a general
operand. I could change +g to +r, like xor_pII_mmx_5, but this is safe and
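The constraint issue can be illustrated with a minimal, hypothetical example (x86-64 GNU C; add_one is my name, not kernel code): with a general ("g") operand the compiler may address the value relative to the stack pointer, and a push inside the asm moves the stack pointer out from under that address. Pinning the operand in a register, as the patch arranges, is safe:

```c
/* Minimal sketch of why push/pop inside inline asm is only safe with
 * register operands: a "g" (general) constraint lets the compiler pick a
 * stack-relative memory slot, and the push below would shift the stack
 * pointer so that slot's offset no longer points at x.  Here "+a" pins x
 * in %rax, so the temporary stack movement is harmless.  Hypothetical
 * example, not the kernel's xor code. */
static unsigned long add_one(unsigned long x)
{
#if defined(__x86_64__)
    __asm__("push %%rbx\n\t"            /* temporarily moves the stack pointer */
            "lea  1(%%rax), %%rbx\n\t"  /* rbx = x + 1 */
            "mov  %%rbx, %%rax\n\t"     /* store the result back into x */
            "pop  %%rbx"                /* restore rbx and the stack pointer */
            : "+a"(x));                 /* register operand: unaffected by push */
#else
    x += 1;                             /* portable fallback for non-x86 builds */
#endif
    return x;
}
```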