Bug 144869 - kernel panic in raid1_end_write_request
Summary: kernel panic in raid1_end_write_request
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 3
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Dave Jones
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2005-01-12 06:31 UTC by Norman Gaywood
Modified: 2015-01-04 22:15 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-04-14 04:17:56 UTC
Type: ---
Embargoed:


Attachments
panic message (1.97 KB, text/plain)
2005-01-12 06:35 UTC, Norman Gaywood
boot messages (19.64 KB, text/plain)
2005-01-12 06:36 UTC, Norman Gaywood
A few days later, another panic (1.88 KB, text/plain)
2005-01-13 20:17 UTC, Norman Gaywood
panic with 2.6.10-1.741_FC3smp (1.94 KB, text/plain)
2005-01-14 22:20 UTC, Norman Gaywood
Another panic with 2.6.10-1.741_FC3smp (1.91 KB, text/plain)
2005-01-16 04:54 UTC, Norman Gaywood

Description Norman Gaywood 2005-01-12 06:31:24 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041111 Firefox/1.0

Description of problem:
I have a Dell PowerEdge 2650 (dual Xeon, 1 GB of memory) with several software RAID partitions. Its main duties are NFS, DHCP and Samba. No desktop.

This system ran FC1 all of last year without problems. It has just been upgraded to FC3.

I had a similar panic recently with kernel-smp-2.6.9-1.681_FC3 but did not have a serial console set up to capture the panic message.


Version-Release number of selected component (if applicable):
kernel-smp-2.6.10-1.737_FC3

How reproducible:
Didn't try


Additional info:

Comment 1 Norman Gaywood 2005-01-12 06:35:11 UTC
Created attachment 109656 [details]
panic message

Comment 2 Norman Gaywood 2005-01-12 06:36:03 UTC
Created attachment 109657 [details]
boot messages

Comment 3 Norman Gaywood 2005-01-13 20:17:47 UTC
Created attachment 109736 [details]
A few days later, another panic

The panic occurred at 3:42 AM, while nothing much was happening. The panic message is
very similar; the diff is below.

Any ideas of what else I could do? Enable some sort of debugging maybe?

Here is the diff between the last panic and this panic:

diff panic1 panic2
1c1
< Unable to handle kernel NULL pointer dereference at virtual address 00000038
---
> Unable to handle kernel paging request at virtual address 00010037
4c4
< *pde = 3746e001
---
> *pde = 36933001
8c8
< CPU:	  1
---
> CPU:	  3
12,13c12,13
< eax: 00000000   ebx: f7992400   ecx: f7974220   edx: 00000000
< esi: 00000018   edi: f7978980   ebp: f7992400   esp: c03abf18
---
> eax: 0000ffff   ebx: f7a38600   ecx: f78ef540   edx: 00000000
> esi: 00000018   edi: f7a46e80   ebp: f7a38600   esp: c03adf18
15,18c15,18
< Process swapper (pid: 0, threadinfo=c03ab000 task=f7f58530)
< Stack: f1103f00 00001000 f8829381 00000000 c015643b 00001000 f1103f00 00000000
<	 c03abf60 c0217acf f74f37d4 00000000 00000000 00000000 00001000 f74f37d4
<	 f7d5002c f7dcfe00 00000001 f88435ec 00000001 f7941680 f74f37d4 f7dcfe00
---
> Process swapper (pid: 0, threadinfo=c03ad000 task=f7f5fa40)
> Stack: f3364180 00001000 f8829381 00000000 c015643b 00001000 f3364180 00000000
>	 c03adf60 c0217acf f6d2c8fc 00000000 00000000 00000000 00003000 f6d2c8fc
>	 f7d4c33c f7dc8e00 00000001 f88435ec 00000001 f7921080 f6d2c8fc f7dc8e00
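One detail in this diff may be worth noting: in both oopses the faulting virtual address equals eax + 0x38 (0x00000000 + 0x38 = 0x00000038, and 0x0000ffff + 0x38 = 0x00010037). That is consistent with, but does not prove, the crashing instruction reading a field at offset 0x38 from a bad pointer held in eax: NULL in the first panic, garbage in the second. The offset and the "field read" interpretation are assumptions drawn only from the register dumps; nothing in the report identifies the actual instruction or structure involved. A minimal sketch that checks the arithmetic:

```python
# Register values and faulting addresses copied from the two oops messages (panic1 vs panic2).
PANICS = {
    "panic1": {"eax": 0x00000000, "fault_addr": 0x00000038},
    "panic2": {"eax": 0x0000FFFF, "fault_addr": 0x00010037},
}

# Assumption (hypothetical): the faulting instruction dereferences eax + 0x38.
ASSUMED_OFFSET = 0x38


def check(panics, offset):
    """Return {name: bool} telling whether eax + offset equals the reported fault address."""
    results = {}
    for name, regs in panics.items():
        predicted = regs["eax"] + offset
        results[name] = predicted == regs["fault_addr"]
        print(f"{name}: eax 0x{regs['eax']:08x} + 0x{offset:x} = 0x{predicted:08x}, "
              f"reported 0x{regs['fault_addr']:08x}")
    return results


results = check(PANICS, ASSUMED_OFFSET)
```

Both entries match. A pointer value like 0x0000ffff looks more like overwritten data than a valid kernel address on i686 (kernel lowmem normally starts at 0xc0000000 with the default 3G/1G split), so the pattern fits a memory-corruption explanation, though on its own it cannot distinguish a kernel bug from faulty hardware or firmware.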

Comment 4 Dave Jones 2005-01-13 21:13:07 UTC
I found a slab-corruption bug today and will be pushing out an update soon.
Your problem(s) could simply be caused by that in-memory corruption.


Comment 5 Norman Gaywood 2005-01-14 05:10:02 UTC
Now running 2.6.10-1.741_FC3smp

No news is good news.

Comment 6 Norman Gaywood 2005-01-14 22:20:15 UTC
Created attachment 109797 [details]
panic with 2.6.10-1.741_FC3smp

Bad news. Another panic. Still in raid1_end_write_request. This time with the
new 2.6.10-1.741_FC3smp kernel.

Comment 7 Norman Gaywood 2005-01-16 04:54:57 UTC
Created attachment 109837 [details]
Another panic with 2.6.10-1.741_FC3smp

I guess there are enough of these panic messages posted here now.

Comment 8 Norman Gaywood 2005-01-17 01:27:12 UTC
Just noticed 2.6.10-ac10 with:

* Fix bio free before reuse case for clones (Jens Axboe)
| Fixes assorted raid oops/crashes

I wonder if that will help?

Comment 9 Norman Gaywood 2005-01-17 12:10:45 UTC
Trying kernel-smp-2.6.10-1.747_FC3 from
http://people.redhat.com/davej/kernels/Fedora/FC3

This has 2.6.10-ac10 according to the changelog.

Thanks Dave.

Comment 10 Norman Gaywood 2005-01-18 05:56:30 UTC
No panic this time, just a lock-up. Nothing on the console(s), and the SysRq key
did nothing (it is enabled). The Caps Lock and other keyboard LEDs were all off and
would not come back on. The machine did not respond to network pings.

Nothing in syslog messages.

This system has an NMI button. Is it worth enabling that? Is nmi_watchdog=1 on the
kernel command line the only way to do that?

Oh yes, this is kernel 2.6.10-1.747_FC3smp

Sigh.

Comment 11 Norman Gaywood 2005-01-24 04:22:46 UTC
Kernel 2.6.10-1.747_FC3smp has been running, but with numerous panics and
lock-ups. Nothing consistent. I have started to suspect hardware problems,
but the Dell diagnostics picked up nothing.

I thought I had a memory problem when I ran memtest86 from the FC3
rescue disk, but I discovered I had to turn off USB in the BIOS, and then
memtest86 ran OK.

Now running 2.6.10-1.747_FC3smp with USB disabled in the BIOS. Could the BIOS
USB support have been causing these problems?

Comment 12 Norman Gaywood 2005-01-27 23:55:41 UTC
After disabling USB in the BIOS, 2.6.10-1.747_FC3smp has been running without
problems for 2.5 days now.

So was this problem a buggy BIOS crashing the kernel? Or have I just moved
something around in memory and the problem is now hidden? I guess I will never know.

Anyway, I'm becoming happier as each hour of uptime passes. Since I seem to be
the only person in the universe with this problem, I'll close this bug as
NOTABUG after a few more days of uptime.

Comment 13 Norman Gaywood 2005-01-28 21:05:04 UTC
I spoke too soon. Another panic in raid1_end_write_request this morning.

Comment 14 Norman Gaywood 2005-04-14 04:17:56 UTC
It's been a stable few months now. I've been running FC3 kernels with acpi=off,
and it's been rock solid. I have no hyperthreading, but this system does not have
a huge workload and it's doing its job.

I seem to be the only one with this problem. Other people with similar
systems tend to have hardware RAID. They seem to have a whole new set of stability
problems with the 2.6 kernel. Those problems (mostly) seem to go away when you update
to the latest Dell firmware/BIOSes.

So I guess I'll put the bug down to buggy Dell BIOSes, even though I have not
tested this. This has been a theme with some other Dell systems I have as well.

I'll close this bug as NOTABUG and do my bit to tidy up bugzilla.


