From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041111 Firefox/1.0

Description of problem:
I have a Dell PE 2650 (dual Xeon, 1 GB of memory) with several software RAID partitions. Its main duties are NFS, DHCP and Samba; there is no desktop. This system ran FC1 all of last year without problems and has just been upgraded to FC3. I had a similar panic recently with kernel-smp-2.6.9-1.681_FC3, but I did not have a serial console set up to capture the panic message.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.10-1.737_FC3

How reproducible:
Didn't try

Additional info:
Created attachment 109656: panic message
Created attachment 109657: boot messages
Created attachment 109736: A few days later, another panic

The panic occurred at 3:42 AM while nothing much was happening. The panic message is very similar to the last one; the diff is below. Any ideas about what else I could do? Enable some sort of debugging, maybe?

Here is the diff between the last panic and this one:

diff panic1 panic2
1c1
< Unable to handle kernel NULL pointer dereference at virtual address 00000038
---
> Unable to handle kernel paging request at virtual address 00010037
4c4
< *pde = 3746e001
---
> *pde = 36933001
8c8
< CPU: 1
---
> CPU: 3
12,13c12,13
< eax: 00000000 ebx: f7992400 ecx: f7974220 edx: 00000000
< esi: 00000018 edi: f7978980 ebp: f7992400 esp: c03abf18
---
> eax: 0000ffff ebx: f7a38600 ecx: f78ef540 edx: 00000000
> esi: 00000018 edi: f7a46e80 ebp: f7a38600 esp: c03adf18
15,18c15,18
< Process swapper (pid: 0, threadinfo=c03ab000 task=f7f58530)
< Stack: f1103f00 00001000 f8829381 00000000 c015643b 00001000 f1103f00 00000000
< c03abf60 c0217acf f74f37d4 00000000 00000000 00000000 00001000 f74f37d4
< f7d5002c f7dcfe00 00000001 f88435ec 00000001 f7941680 f74f37d4 f7dcfe00
---
> Process swapper (pid: 0, threadinfo=c03ad000 task=f7f5fa40)
> Stack: f3364180 00001000 f8829381 00000000 c015643b 00001000 f3364180 00000000
> c03adf60 c0217acf f6d2c8fc 00000000 00000000 00000000 00003000 f6d2c8fc
> f7d4c33c f7dc8e00 00000001 f88435ec 00000001 f7921080 f6d2c8fc f7dc8e00
I found a slab-corruption bug today and will be pushing out an update soon. Your problem(s) could simply be caused by that in-memory corruption.
Now running 2.6.10-1.741_FC3smp. No news is good news.
Created attachment 109797: panic with 2.6.10-1.741_FC3smp

Bad news: another panic, still in raid1_end_write_request. This time it was with the new 2.6.10-1.741_FC3smp kernel.
Created attachment 109837: Another panic with 2.6.10-1.741_FC3smp

I guess there are enough of these panic messages posted here now.
I just noticed that 2.6.10-ac10 includes:
* Fix bio free before reuse case for clones (Jens Axboe)
| Fixes assorted raid oops/crashes
I wonder if that will help?
Trying kernel-smp-2.6.10-1.747_FC3 (from http://people.redhat.com/davej/kernels/Fedora/FC3), which includes 2.6.10-ac10 according to the changelog. Thanks, Dave.
No panic this time, just a lock-up. There was nothing on the console(s), and the SysRq key did nothing (it is enabled). The Caps Lock and other lock-key LEDs were all off and would not come back on. The machine was not pingable, and there was nothing in the syslog messages. This system has an NMI button; is it worth enabling that? Is passing nmi_watchdog=1 on the kernel command line the only way to do that? Oh, and this is kernel 2.6.10-1.747_FC3smp. Sigh.
Kernel 2.6.10-1.747_FC3smp has been running with numerous panics and lockups, with nothing consistent between them. I have started to suspect hardware problems, but the Dell diagnostics picked up nothing. I thought I had a memory problem when I ran memtest86 from the FC3 rescue disk, but I discovered I had to turn off USB support in the BIOS, after which memtest86 ran OK. I am now running 2.6.10-1.747_FC3smp with USB disabled in the BIOS. Could the BIOS USB support have been causing these problems?
After disabling USB in the BIOS, 2.6.10-1.747_FC3smp has been running without problems for 2.5 days now. So was this problem a buggy BIOS crashing the kernel? Or have I just moved something around in memory so that the problem is now hidden? I guess I will never know. Anyway, I'm becoming happier as each hour of uptime passes. Since I seem to be the only person in the universe with this problem, I'll close this bug as NOTABUG after a few more days of uptime.
I spoke too soon. Another panic in raid1_end_write_request this morning.
It's been a stable few months now. I've been running FC3 kernels with acpi=off and it's been rock solid. That means I have no hyperthreading, but this system does not have a huge workload and it's doing its job. I seem to be the only one with this problem. Other people with similar systems tend to use hardware RAID, and they seem to have a whole new set of stability problems with the 2.6 kernel. Those problems (mostly) seem to go away when they update to the latest Dell firmware/BIOSes. So I guess I'll put this bug down to buggy Dell BIOSes, even though I have not tested that. Buggy BIOSes have been a theme on some of the other Dell systems I have as well. I'll close this bug as NOTABUG and do my bit to tidy up Bugzilla.