Red Hat Bugzilla – Bug 116679
crash on first (write?) access to raid 5 physical volume
Last modified: 2007-11-30 17:10:37 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040217
Description of problem:
The first write access to a physical volume on software raid 5 causes
the system to hang. A reboot afterwards crashes while replaying the
journal (better than 2.6.3-1.91, which would crash with a double page
fault during vgscan). I remember earlier 2.6 kernels (before
FC2test1) would crash when accessing a raid5 volume that needed a
resync. I'm still trying to obtain a stack trace of the hangs, but it
doesn't fit on the screen in the replay-journal case, and the hang has
always occurred with X running for me. I'll keep trying.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Access (write to?) a filesystem mounted off a logical volume on RAID
5 physical volumes
2.Reboot after the hang
Actual Results: 1. hang. 2. oops
Expected Results: Erhm... Should work
Created attachment 97980 [details]
Stack trace got while replaying the journal when a raid5 resync is needed
I've bitten the bullet and decided I'd trigger the problem on my desktop again,
since I was using the laptop anyway. Unfortunately, the picture I took after
the hang during the raid 5 write access was totally unreadable (yes, far worse
than this one :-( ). As soon as resync (on FC1) completes, I'll do it again.
I found out it's not just any write access that triggers the problem.
Apparently, it's necessary to write some big file. I don't know whether the
number of raid members (8, in this case) makes any difference, but the stack
trace didn't seem to indicate any relationship with the number of partitions.
Created attachment 97982 [details]
Stack trace obtained while copying files from fedora/linux/core/development/SRPMS from another box into an ext3 filesystem mounted on a physical volume mapped to raid5 devices
Problem still present in 2.6.3-1.100, but with different stack traces.
The pictures were totally unreadable, unfortunately.
The problem appears to only occur for raid 5 arrays with 5 or more
members. Duplicating it is *very* simple. Start with 5 partitions,
logical volumes, or block devices of any kind, all roughly the same size,
and run: mdadm -C /dev/md0 -l 5 -n 5 device[1-5]. Just after the
message to the console that the degraded array got the additional disk
and is beginning to resync, I get a stack trace. The stack trace
varies a lot, but it always has xor_p5_mmx_5. Sometimes, this
function is (oddly) the very top of the stack. In one of the several
times I tested this recipe, the machine didn't oops; it rebooted
instead. Very scary.
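For reference, the routine in the trace computes a running parity xor over five buffers. A plain-C, user-space equivalent (my own sketch for clarity, not the kernel's MMX implementation) is:

```c
#include <stddef.h>

/* User-space sketch of what a 5-operand xor routine like xor_p5_mmx_5()
 * computes: the first buffer accumulates the xor of the other four.
 * The kernel version does the same thing with MMX registers inside
 * kernel_fpu_begin()/kernel_fpu_end(); this is plain C for reference. */
static void xor_5(size_t bytes, unsigned long *p1, unsigned long *p2,
                  unsigned long *p3, unsigned long *p4, unsigned long *p5)
{
    for (size_t i = 0; i < bytes / sizeof(unsigned long); i++)
        p1[i] ^= p2[i] ^ p3[i] ^ p4[i] ^ p5[i];
}
```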
I've been staring at raid5.c, xor.c and i386/xor.h looking for
significant differences from the 2.4 code that worked, and all I
could find was that raid5.c now uses bio instead of buffer_head, and
xor.h now calls kernel_fpu_begin()/end() in the mmx variants, instead
of FPU_SAVE() and FPU_RESTORE(). Does it matter that I'm doing this
on an athlon? I'm going to try on a regular i686 as soon as I have a
chance.
Now here's one that might provide a clue: I created a degraded raid 5
device with 5 partitions, and everything was fine. As soon as I wrote
to it, I got 10 lines in the console like this:
kernel/module.c:1965: spin_lock(kernel/module.c:c03149c0) already
locked by kernel/module.c/1965
after these 10 lines:
Debug: sleeping function called from invalid context at
and then a hard freeze. No actual call trace.
Oh, the error above was with 2.6.3-1.106.
Still with 2.6.3-1.106, if I create the same degraded raid 5 array
with 5 members and then dd if=/dev/md0 of=/dev/null, I get:
double fault, gdt at c030f100 [255 bytes]
double fault, tss at c0387800
eip = c011d1df, esp = e28bdd30
eax = 000001b0, ebx = ee484690, ecx = c0101e1c, edx = 21c00000
esi = 00000000, edi = e28bf900
and nothing else.
And if I do dd if=/dev/zero of=/dev/md0, I get an instant reset. Nice!
Created attachment 98060 [details]
interesting crash in first xor_block() call while reading from degraded array
I patched xor_block() such that it would printk its arguments before calling
the actual xor_block function. After several reboots and stack traces that
wouldn't fit in the screen, I got this very interesting one when I tried to
read from a degraded raid5 array with 5 members (1 missing). The interesting
bit is that it shows it's the very first call to the xor_block() function that
fails, and the address of the failed paging request looks totally bogus to me.
I've analyzed the code of xor_p5_mmx_5(), the function that is called, and
couldn't find any problems with it, so it looks very much like the kernel is
failing to handle the page fault properly.
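The instrumentation was a printk added to xor_block(); in user space the same idea looks roughly like this (fprintf standing in for printk, all names mine):

```c
#include <stdio.h>

/* Sketch of the debugging wrapper described above: log xor_block()'s
 * arguments before dispatching to the selected routine.  In the kernel
 * the call goes to the xor template chosen at boot (e.g. p5_mmx); here
 * the routine is whatever function pointer the caller passes in. */
typedef unsigned long (*xor_routine)(unsigned long bytes, void **ptrs);

static unsigned long traced_xor_block(xor_routine real, unsigned long count,
                                      unsigned long bytes, void **ptrs)
{
    fprintf(stderr, "xor_block: count=%lu bytes=%lu", count, bytes);
    for (unsigned long i = 0; i < count; i++)
        fprintf(stderr, " p%lu=%p", i + 1, ptrs[i]);
    fprintf(stderr, "\n");
    return real(bytes, ptrs);   /* call the real routine with the logged args */
}
```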
Interesting... If I tweak check_xor() in raid5.c so that it acts when
count == 4, instead of at 5 (MAX_XOR_BLOCKS), then it works.
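For context, check_xor() in 2.6's raid5.c collects source pages and flushes them through xor_block() once the batch reaches MAX_XOR_BLOCKS (5); lowering the flush threshold means the 5-operand routine is never selected. A simplified user-space sketch of that batching (structure and names are mine, assuming the 2.6 logic):

```c
#include <stddef.h>

#define MAX_XOR_BLOCKS 5

/* Simplified user-space analogue of raid5.c's batching: ptr[0] is the
 * destination, ptr[1..count-1] are pending sources.  check_xor() flushes
 * the batch through the xor routine when it is full; the workaround
 * described above compares against 4 instead, so the 5-operand MMX
 * routine never runs. */
struct xor_batch {
    int count;                          /* destination plus pending sources */
    unsigned char *ptr[MAX_XOR_BLOCKS];
    size_t bytes;
};

static void do_xor(struct xor_batch *b)
{
    for (size_t i = 0; i < b->bytes; i++)
        for (int j = 1; j < b->count; j++)
            b->ptr[0][i] ^= b->ptr[j][i];
}

static void check_xor(struct xor_batch *b)
{
    if (b->count == MAX_XOR_BLOCKS) {   /* the workaround: compare with 4 */
        do_xor(b);
        b->count = 1;                   /* keep only the destination */
    }
}

static void add_source(struct xor_batch *b, unsigned char *src)
{
    b->ptr[b->count++] = src;
    check_xor(b);
}

static void flush_xor(struct xor_batch *b)
{
    if (b->count > 1) {
        do_xor(b);
        b->count = 1;
    }
}
```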
Created attachment 98080 [details]
Patch that fixes the bug
I guess we can't use push and pop in an inline asm that takes a general
operand. I could change +g to +r, like xor_pII_mmx_5, but this is safe and
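The constraint issue can be illustrated with a minimal, hypothetical example (x86-64 GNU C; add_one is my name, not kernel code): with a general ("g") operand the compiler may address the value relative to the stack pointer, and a push inside the asm moves the stack pointer out from under that address. Pinning the operand in a register, as the patch arranges, is safe:

```c
/* Minimal sketch of why push/pop inside inline asm is only safe with
 * register operands: a "g" (general) constraint lets the compiler pick a
 * stack-relative memory slot, and the push below would shift the stack
 * pointer so that slot's offset no longer points at x.  Here "+a" pins x
 * in %rax, so the temporary stack movement is harmless.  Hypothetical
 * example, not the kernel's xor code. */
static unsigned long add_one(unsigned long x)
{
#if defined(__x86_64__)
    __asm__("push %%rbx\n\t"            /* temporarily moves the stack pointer */
            "lea  1(%%rax), %%rbx\n\t"  /* rbx = x + 1 */
            "mov  %%rbx, %%rax\n\t"     /* store the result back into x */
            "pop  %%rbx"                /* restore rbx and the stack pointer */
            : "+a"(x));                 /* register operand: unaffected by push */
#else
    x += 1;                             /* portable fallback for non-x86 builds */
#endif
    return x;
}
```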