Red Hat Bugzilla – Bug 116679
crash on first (write?) access to raid 5 physical volume
Last modified: 2007-11-30 17:10:37 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040217
Description of problem:
The first write access to a physical volume on software raid 5 causes
the system to hang. A reboot afterwards crashes while replaying the
journal (better than 2.6.3-1.91, which would crash with a double page
fault during vgscan). I remember earlier 2.6 kernels (before
FC2test1) would crash when accessing a raid5 volume that needed a
resync. I'm still trying to obtain a stack trace of the hangs, but it
doesn't fit on the screen in the replay-journal case, and the hang has
always occurred with X running for me. I'll keep trying.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Access (write to?) a filesystem mounted off a logical volume on RAID
5 physical volumes
2.Reboot after the hang
Actual Results: 1. hang. 2. oops
Expected Results: Erhm... Should work
Created attachment 97980 [details]
Stack trace got while replaying the journal when a raid5 resync is needed
I've bitten the bullet and decided I'd trigger the problem on my desktop again,
since I was using the laptop anyway. Unfortunately, the picture I took after
the hang during the raid 5 write access was totally unreadable (yes, far worse
than this one :-( ). As soon as resync (on FC1) completes, I'll do it again.
I found out it's not just any write access that triggers the problem.
Apparently, it's necessary to write some big file. I don't know whether the
number of raid members (8, in this case) makes any difference, but the stack
trace didn't seem to indicate any relationship with the number of partitions.
Created attachment 97982 [details]
Stack trace obtained while copying files from fedora/linux/core/development/SRPMS from another box into an ext3 filesystem mounted on a physical volume mapped to raid5 devices
Problem still present in 2.6.3-1.100, but with different stack traces.
The pictures were totally unreadable, unfortunately.
The problem appears to only occur for raid 5 arrays with 5 or more
members. Duplicating it is *very* simple. Start with 5 partitions,
logical volumes, or block devices of any kind, all roughly the same size,
and run: mdadm -C /dev/md0 -l 5 -n 5 device[1-5]. Just after the
message to the console that the degraded array got the additional disk
and is beginning to resync, I get a stack trace. The stack trace
varies a lot, but it always has xor_p5_mmx_5. Sometimes, this
function is (oddly) the very top of the stack. In one of the several
times I tested this recipe, the machine didn't oops; it rebooted
instead. Very scary.
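For reference, the routine in the trace computes a running parity xor over five buffers. A plain-C, user-space equivalent (my own sketch for clarity, not the kernel's MMX implementation) is:

```c
#include <stddef.h>

/* User-space sketch of what a 5-operand xor routine like xor_p5_mmx_5()
 * computes: the first buffer accumulates the xor of the other four.
 * The kernel version does the same thing with MMX registers inside
 * kernel_fpu_begin()/kernel_fpu_end(); this is plain C for reference. */
static void xor_5(size_t bytes, unsigned long *p1, unsigned long *p2,
                  unsigned long *p3, unsigned long *p4, unsigned long *p5)
{
    for (size_t i = 0; i < bytes / sizeof(unsigned long); i++)
        p1[i] ^= p2[i] ^ p3[i] ^ p4[i] ^ p5[i];
}
```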
I've been staring at raid5.c, xor.c and i386/xor.h looking for
significant differences from the 2.4 code that worked, and all I
could find was that raid5.c now uses bio instead of buffer_head, and
xor.h now calls kernel_fpu_begin()/end() in the mmx variants, instead
of FPU_SAVE() and FPU_RESTORE(). Does it matter that I'm doing this
on an athlon? I'm going to try on a regular i686 as soon as I have a
chance.
Now here's one that might provide a clue: I created a degraded raid 5
device with 5 partitions, and everything was fine. As soon as I wrote
to it, I got 10 lines in the console like this:
kernel/module.c:1965: spin_lock(kernel/module.c:c03149c0) already
locked by kernel/module.c/1965
after these 10 lines:
Debug: sleeping function called from invalid context at
and then a hard freeze. No actual call trace.
Oh, the error above was with 2.6.3-1.106.
Still with 2.6.3-1.106, if I create the same degraded raid 5 array
with 5 members and then dd if=/dev/md0 of=/dev/null, I get:
double fault, gdt at c030f100 [255 bytes]
double fault, tss at c0387800
eip = c011d1df, esp = e28bdd30
eax = 000001b0, ebx = ee484690, ecx = c0101e1c, edx = 21c00000
esi = 00000000, edi = e28bf900
and nothing else.
And if I do dd if=/dev/zero of=/dev/md0, I get an instant reset. Nice!
Created attachment 98060 [details]
interesting crash in first xor_block() call while reading from degraded array
I patched xor_block() such that it would printk its arguments before calling
the actual xor_block function. After several reboots and stack traces that
wouldn't fit in the screen, I got this very interesting one when I tried to
read from a degraded raid5 array with 5 members (1 missing). The interesting
bit is that it shows it's the very first call to the xor_block() function that
fails, and the address of the failed paging request looks totally bogus to me.
I've analyzed the code of xor_p5_mmx_5(), the function that is called, and
couldn't find any problems with it, so it looks very much like the kernel is
failing to handle the page fault properly.
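The instrumentation was a printk added to xor_block(); in user space the same idea looks roughly like this (fprintf standing in for printk, all names mine):

```c
#include <stdio.h>

/* Sketch of the debugging wrapper described above: log xor_block()'s
 * arguments before dispatching to the selected routine.  In the kernel
 * the call goes to the xor template chosen at boot (e.g. p5_mmx); here
 * the routine is whatever function pointer the caller passes in. */
typedef unsigned long (*xor_routine)(unsigned long bytes, void **ptrs);

static unsigned long traced_xor_block(xor_routine real, unsigned long count,
                                      unsigned long bytes, void **ptrs)
{
    fprintf(stderr, "xor_block: count=%lu bytes=%lu", count, bytes);
    for (unsigned long i = 0; i < count; i++)
        fprintf(stderr, " p%lu=%p", i + 1, ptrs[i]);
    fprintf(stderr, "\n");
    return real(bytes, ptrs);   /* call the real routine with the logged args */
}
```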
Interesting... If I tweak check_xor() in raid5.c so that it acts when
count == 4, instead of at 5 (MAX_XOR_BLOCKS), then it works.
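For context, check_xor() in 2.6's raid5.c collects source pages and flushes them through xor_block() once the batch reaches MAX_XOR_BLOCKS (5); lowering the flush threshold means the 5-operand routine is never selected. A simplified user-space sketch of that batching (structure and names are mine, assuming the 2.6 logic):

```c
#include <stddef.h>

#define MAX_XOR_BLOCKS 5

/* Simplified user-space analogue of raid5.c's batching: ptr[0] is the
 * destination, ptr[1..count-1] are pending sources.  check_xor() flushes
 * the batch through the xor routine when it is full; the workaround
 * described above compares against 4 instead, so the 5-operand MMX
 * routine never runs. */
struct xor_batch {
    int count;                          /* destination plus pending sources */
    unsigned char *ptr[MAX_XOR_BLOCKS];
    size_t bytes;
};

static void do_xor(struct xor_batch *b)
{
    for (size_t i = 0; i < b->bytes; i++)
        for (int j = 1; j < b->count; j++)
            b->ptr[0][i] ^= b->ptr[j][i];
}

static void check_xor(struct xor_batch *b)
{
    if (b->count == MAX_XOR_BLOCKS) {   /* the workaround: compare with 4 */
        do_xor(b);
        b->count = 1;                   /* keep only the destination */
    }
}

static void add_source(struct xor_batch *b, unsigned char *src)
{
    b->ptr[b->count++] = src;
    check_xor(b);
}

static void flush_xor(struct xor_batch *b)
{
    if (b->count > 1) {
        do_xor(b);
        b->count = 1;
    }
}
```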
Created attachment 98080 [details]
Patch that fixes the bug
I guess we can't use push and pop in an inline asm that takes a general
operand. I could change +g to +r, like xor_pII_mmx_5, but this is safe and
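The constraint issue can be illustrated with a minimal, hypothetical example (x86-64 GNU C; add_one is my name, not kernel code): with a general ("g") operand the compiler may address the value relative to the stack pointer, and a push inside the asm moves the stack pointer out from under that address. Pinning the operand in a register, as the patch arranges, is safe:

```c
/* Minimal sketch of why push/pop inside inline asm is only safe with
 * register operands: a "g" (general) constraint lets the compiler pick a
 * stack-relative memory slot, and the push below would shift the stack
 * pointer so that slot's offset no longer points at x.  Here "+a" pins x
 * in %rax, so the temporary stack movement is harmless.  Hypothetical
 * example, not the kernel's xor code. */
static unsigned long add_one(unsigned long x)
{
#if defined(__x86_64__)
    __asm__("push %%rbx\n\t"            /* temporarily moves the stack pointer */
            "lea  1(%%rax), %%rbx\n\t"  /* rbx = x + 1 */
            "mov  %%rbx, %%rax\n\t"     /* store the result back into x */
            "pop  %%rbx"                /* restore rbx and the stack pointer */
            : "+a"(x));                 /* register operand: unaffected by push */
#else
    x += 1;                             /* portable fallback for non-x86 builds */
#endif
    return x;
}
```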