Bug 60823

Summary: Kernel bug at inode.c:686!
Product: [Retired] Red Hat Linux
Component: kernel
Version: 7.2
Hardware: i686
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Reporter: Need Real Name <rovero>
Assignee: Arjan van de Ven <arjanv>
QA Contact: Brian Brock <bbrock>
CC: ajb, gberthet, marcel, sct
Doc Type: Bug Fix
Last Closed: 2003-06-08 02:00:43 UTC
Attachments: part of /var/log/messages

Description Need Real Name 2002-03-07 13:17:05 UTC
Description of Problem:

Each day since upgrading to the 2.4.9-31 kernel, my machine has
been "dead" when I arrive at work.  Inspection of /var/log/messages
reveals:
Mar  7 04:04:27 entropy kernel: ------------[ cut here ]------------
Mar  7 04:04:27 entropy kernel: kernel BUG at inode.c:686!
Mar  7 04:04:27 entropy kernel: invalid operand: 0000
Mar  7 04:04:27 entropy kernel: Kernel 2.4.9-31
Mar  7 04:04:27 entropy kernel: CPU:    0
Mar  7 04:04:27 entropy kernel: EIP:    0010:[prune_icache+153/304]    Not tainted
Mar  7 04:04:27 entropy kernel: EIP:    0010:[<c0149949>]    Not tainted
Mar  7 04:04:27 entropy kernel: EFLAGS: 00010286
Mar  7 04:04:27 entropy kernel: EIP is at prune_icache [kernel] 0x99 
Mar  7 04:04:27 entropy kernel: eax: 0000001b   ebx: ceec2e48   ecx: 00000001  
edx: 00001bbf
Mar  7 04:04:27 entropy kernel: esi: ceec2e40   edi: ceccd908   ebp: c1643fa8  
esp: c1643f84
Mar  7 04:04:27 entropy kernel: ds: 0018   es: 0018   ss: 0018
Mar  7 04:04:27 entropy kernel: Process kswapd (pid: 5, stackpage=c1643000)
Mar  7 04:04:27 entropy kernel: Stack: c022e752 000002ae 00000000 00008c83
ce3193c8 ce598208 000000fd 000000c0 
Mar  7 04:04:27 entropy kernel:        000000c0 0008e000 c0149a01 ffff737d
c012f513 00000000 000000c0 c1823130 
Mar  7 04:04:27 entropy kernel:        000000c0 000000c0 00000000 c1642000
00000006 c012f595 000000c0 00000000 
Mar  7 04:04:27 entropy kernel: Call Trace: [IRQ0x0f_interrupt+114610/138304]
.rodata.str1.1 [kernel] 0x288d 
Mar  7 04:04:28 entropy kernel: Call Trace: [<c022e752>] .rodata.str1.1 [kernel]
0x288d 
Mar  7 04:04:29 entropy kernel: [shrink_icache_memory+33/64]
shrink_icache_memory [kernel] 0x21 
Mar  7 04:04:29 entropy kernel: [<c0149a01>] shrink_icache_memory [kernel] 0x21 
Mar  7 04:04:29 entropy kernel: [do_try_to_free_pages+35/80]
do_try_to_free_pages [kernel] 0x23 
Mar  7 04:04:29 entropy kernel: [<c012f513>] do_try_to_free_pages [kernel] 0x23 
Mar  7 04:04:29 entropy kernel: [kswapd+85/240] kswapd [kernel] 0x55 
Mar  7 04:04:29 entropy kernel: [<c012f595>] kswapd [kernel] 0x55 
Mar  7 04:04:29 entropy kernel: [_stext+0/48] stext [kernel] 0x0 
Mar  7 04:04:29 entropy kernel: [<c0105000>] stext [kernel] 0x0 
Mar  7 04:04:29 entropy kernel: [kernel_thread+38/48] kernel_thread [kernel] 0x26 
Mar  7 04:04:29 entropy kernel: [<c0105726>] kernel_thread [kernel] 0x26 
Mar  7 04:04:29 entropy kernel: [kswapd+0/240] kswapd [kernel] 0x0 
Mar  7 04:04:29 entropy kernel: [<c012f540>] kswapd [kernel] 0x0 
Mar  7 04:04:29 entropy kernel: 
Mar  7 04:04:29 entropy kernel: 
Mar  7 04:04:29 entropy kernel: Code: 0f 0b 58 5a 8b 53 04 8b 03 89 50 04 89 02
8b 53 fc c7 43 04 
Mar  7 07:45:40 entropy syslogd 1.4.1: restart.
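
The oops above contains two internal cross-checks that can be verified by hand. The following is a minimal Python sketch, not part of the original report. It assumes that the bracketed EIP annotation is symbol+offset/size in decimal, and that the second word on the stack is the line-number argument left over from the printk call made by BUG(). Neither assumption is stated in the report itself.

```python
import re

# Lines copied from the oops above, with the syslog prefix removed.
eip_line = "EIP:    0010:[prune_icache+153/304]    Not tainted"
eip_at_line = "EIP is at prune_icache [kernel] 0x99"
stack_line = "Stack: c022e752 000002ae 00000000 00008c83"
bug_line = "kernel BUG at inode.c:686!"

# Assumed annotation format: [symbol+offset/size], both numbers in decimal.
symbol, offset, size = re.search(r"\[(\w+)\+(\d+)/(\d+)\]", eip_line).groups()

# The "EIP is at" line repeats the offset in hexadecimal.
hex_offset = int(eip_at_line.split()[-1], 16)

# Assumption: the second stack word is the line number passed to printk.
stack_words = stack_line.split()[1:]
line_from_stack = int(stack_words[1], 16)
line_from_message = int(re.search(r":(\d+)!", bug_line).group(1))

print(symbol, int(offset), hex_offset)     # the two offsets should agree
print(line_from_stack, line_from_message)  # the two line numbers should agree
```

If the two pairs agree, the message text, the stack contents and the EIP annotation all describe the same BUG() site, which makes it less likely that the log excerpt itself is garbled.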

Version-Release number of selected component (if applicable):
kernel 2.4.9-31

How Reproducible:
Just let system run overnight

Steps to Reproduce:
1. Reboot
2. Let system run
3. 

Actual Results:


Expected Results:


Additional Information:
Pentium III (Katmai) stepping 02.  Crash results in 
"unclean" / partition, requires full fsck on reboot.

Comment 1 Arjan van de Ven 2002-03-07 13:40:39 UTC
Which filesystem(s) are you using?

Comment 2 Need Real Name 2002-03-07 14:36:18 UTC
EXT3 FS 2.4-0.9.11, 3 Oct 2001 on ide0(3,2), internal journal
 hdd: hdd4
 hdd: hdd4
 hdd: hdd4
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.11, 3 Oct 2001 on ide0(3,1), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.11, 3 Oct 2001 on ide1(22,1), internal journal
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.

Comment 3 Stephen Tweedie 2002-03-07 15:01:31 UTC
A full fsck on reboot indicates that root is in fact ext2: could you show the
contents of /proc/mounts?

What other device drivers are you using?  Is the oops footprint the same every time?
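
One way to answer the footprint question is to reduce each oops in the log to a (BUG location, faulting function, process) triple and count the distinct triples. The sketch below is illustrative only: the function name `oops_footprints` and the regular expressions are assumptions based on the log format shown in this report, not an existing tool.

```python
import re
from collections import Counter

BUG_RE = re.compile(r"kernel BUG at (\S+?)!")
EIP_RE = re.compile(r"EIP is at (\w+)")
PROC_RE = re.compile(r"Process (\S+) \(pid: \d+")

def oops_footprints(lines):
    """Count (bug location, faulting function, process) triples in a syslog."""
    counts = Counter()
    current = None  # the oops currently being read, if any
    for line in lines:
        m = BUG_RE.search(line)
        if m:
            current = {"bug": m.group(1), "eip": None}
            continue
        if current is None:
            continue
        m = EIP_RE.search(line)
        if m:
            current["eip"] = m.group(1)
            continue
        m = PROC_RE.search(line)
        if m:
            # The "Process" line comes after the EIP lines, so it closes the triple.
            counts[(current["bug"], current["eip"], m.group(1))] += 1
            current = None
    return counts

# Three lines taken from the oops in the description.
sample = [
    "Mar  7 04:04:27 entropy kernel: kernel BUG at inode.c:686!",
    "Mar  7 04:04:27 entropy kernel: EIP is at prune_icache [kernel] 0x99 ",
    "Mar  7 04:04:27 entropy kernel: Process kswapd (pid: 5, stackpage=c1643000)",
]
for footprint, n in oops_footprints(sample).items():
    print(n, footprint)
```

On a real system the same function could be fed `open("/var/log/messages")`. A single dominant triple suggests one repeatable code path, while many unrelated triples point more towards random corruption.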

Comment 4 Need Real Name 2002-03-07 16:26:43 UTC
Contents of /proc/mounts - sure looks like ext3 to me..........

cat /proc/mounts

/dev/root / ext3 rw 0 0
/proc /proc proc rw 0 0
/dev/hda1 /boot ext3 rw 0 0
/dev/hdc1 /data ext3 rw 0 0
none /dev/pts devpts rw 0 0
none /dev/shm tmpfs rw 0 0
thunder:/USERS /home nfs
rw,v3,rsize=32768,wsize=32768,hard,udp,lock,addr=thunder 0 0
//LUCIFER/USERS /mnt/lucifer smbfs rw 0 0
automount(pid869) /wX autofs rw 0 0
automount(pid848) /common autofs rw 0 0
automount(pid800) /misc autofs rw 0 0
automount(pid802) /appbin autofs rw 0 0
automount(pid824) /project autofs rw 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
thunder:/APPBIN/frame /appbin/frame nfs
ro,v3,rsize=32768,wsize=32768,hard,intr,udp,lock,addr=thunder 0 0
dustdevil:/common/ccm /common/ccm nfs
rw,v3,rsize=32768,wsize=32768,hard,intr,udp,lock,addr=dustdevil 0 0
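
The listing above can also be checked mechanically. This is a minimal sketch that assumes unwrapped /proc/mounts lines with the usual six whitespace-separated fields; the NFS entries above were wrapped by the bug tracker and are left out of the sample. The helper name `mount_types` is hypothetical.

```python
def mount_types(text):
    """Map mount point -> filesystem type from /proc/mounts-style text."""
    types = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or wrapped lines rather than guess
        _device, mount_point, fstype = fields[:3]
        types[mount_point] = fstype
    return types

# Local-disk entries copied from the listing above.
sample = """\
/dev/root / ext3 rw 0 0
/proc /proc proc rw 0 0
/dev/hda1 /boot ext3 rw 0 0
/dev/hdc1 /data ext3 rw 0 0
none /dev/shm tmpfs rw 0 0
"""

types = mount_types(sample)
on_ext2 = sorted(mp for mp, fs in types.items() if fs == "ext2")
on_ext3 = sorted(mp for mp, fs in types.items() if fs == "ext3")
print("ext2:", on_ext2)
print("ext3:", on_ext3)
```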

Other drivers are:

3c59x.o, 
Uniform Multi-Platform E-IDE driver Revision: 6.31,
Serial driver version 5.05c (2001-07-08)

No special peripherals, no SCSI.



Comment 5 Stephen Tweedie 2002-03-07 16:50:28 UTC
If the crash produces a full fsck on ext3, then that is almost certainly because
ext3 has detected a fs error and recorded the need for a precautionary fsck in
the superblock.  It would help to know for sure --- if you could tell us the
exact message that fsck gives when announcing that the full fsck is forced, that
would help.

Are there any other kernel errors recorded in /var/log/messages?  Is /var/log on
the same disk as "/"?  If so, I'd expect ext3's error handling to take the
/var/log fs readonly if it ever found an error which required a forced fsck, so
you'd not see the ext3 full error text in the syslog (you can't log to a
readonly filesystem.)  It's important to try to work out what is the first error
to have occurred here.
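
Since syslog is written in chronological order, the first error is simply the first matching line in the file. The sketch below shows one way to find it. The pattern list is an illustrative assumption (it covers the BUG message seen in this report and the general shape of ext3 error lines) and would need extending for other failure types.

```python
import re

# Illustrative patterns only; extend as needed.
ERROR_PATTERNS = [
    re.compile(r"kernel: .*EXT3-fs error"),
    re.compile(r"kernel: kernel BUG at"),
    re.compile(r"kernel: .*Oops"),
]

def first_kernel_error(lines):
    """Return (line_number, line) for the first matching line, or None."""
    for number, line in enumerate(lines, start=1):
        if any(p.search(line) for p in ERROR_PATTERNS):
            return number, line.rstrip("\n")
    return None

# Lines copied from the description of this bug.
sample = [
    "Mar  7 04:04:27 entropy kernel: ------------[ cut here ]------------",
    "Mar  7 04:04:27 entropy kernel: kernel BUG at inode.c:686!",
    "Mar  7 04:04:27 entropy kernel: invalid operand: 0000",
]
print(first_kernel_error(sample))
```

Note that this only finds what actually reached the log: if the filesystem holding /var/log went read-only first, the earliest error may be missing from the file.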

Comment 6 Andreas J. Bathe 2002-03-20 12:33:19 UTC
Created attachment 49125 [details]
part of /var/log/messages

Comment 7 Andreas J. Bathe 2002-03-20 12:35:35 UTC
After "upgrading" to the errata kernel 2.4.9-31, I got two kernel bugs reported
(both triggered simply by running 'rpm -Va'):

kernel BUG at page_alloc.c:87!
and
kernel BUG at vmscan.c:496!

The interesting part of /var/log/messages is attached (see above).

Furthermore, I discovered that kswapd was a zombie afterwards, which I think
could be a good hint.

With the stock kernel 2.4.18 I can't reproduce this bug.

I read somewhere that the QA tests are so hard that the standard kernel doesn't
pass them, but I don't want to believe that this errata kernel would pass the
tests if they were that good...

By the way, I think that Red Hat should raise the quality of released errata to
a more reliable level, otherwise other distributions will take over. This is
just my personal point of view.

Andreas


Comment 8 Arjan van de Ven 2002-03-20 12:40:27 UTC
kernel: NVRM: loading NVIDIA NVdriver Kernel Module  1.0.2314  Fri Nov 30
19:33:20 PST 2001

Please try without that.

(And yes, we do test errata. A lot. But not with unsupported binary-only modules
that we can't fix anyway.)

Comment 9 Stephen Tweedie 2002-03-20 12:54:37 UTC
rovero: 

I'd still like to see your own log file for this, because the forced fsck after
reboot does imply that ext3 has detected a corruption that required it to mark
the filesystem as having errors.

Comment 10 Need Real Name 2002-04-19 07:09:15 UTC
I have a similar problem (see below). Which fix should I install?
--------------------------------------------------------------------
Apr 17 02:16:12 mibcentral kernel: kernel BUG at inode.c:514!
Apr 17 02:16:12 mibcentral kernel: invalid operand: 0000
Apr 17 02:16:12 mibcentral kernel: CPU:    0
Apr 17 02:16:12 mibcentral kernel: EIP:    0010:[clear_inode+48/304]    Not 
tainted
Apr 17 02:16:12 mibcentral kernel: EIP:    0010:[<c01492b0>]    Not tainted
Apr 17 02:16:12 mibcentral kernel: EFLAGS: 00010286
Apr 17 02:16:12 mibcentral kernel: EIP is at clear_inode [kernel] 0x30
Apr 17 02:16:12 mibcentral kernel: eax: 0000001b   ebx: d4939e40   ecx: 
00000001   edx: 00001eb5
Apr 17 02:16:12 mibcentral kernel: esi: c1c6bc00   edi: d4939e40   ebp: 
db301560   esp: c7d81e68
Apr 17 02:16:12 mibcentral kernel: ds: 0018   es: 0018   ss: 0018
Apr 17 02:16:12 mibcentral kernel: Process rm (pid: 17203, stackpage=c7d81000)
Apr 17 02:16:12 mibcentral kernel: Stack: c022e381 00000202 db301560 d12a8140 
00000000 00000000 d4939e40 e080de22
Apr 17 02:16:12 mibcentral kernel:        d4939e40 c7d81edc 00000000 00000000 
db301560 bfffdac8 e0811ce4 0018c5ac
Apr 17 02:16:12 mibcentral kernel:        e0811cf5 d12a8140 c7d80000 db301574 
c7d81ed8 d4939e40 c7d81ed8 d4939e40
Apr 17 02:16:12 mibcentral kernel: Call Trace: 
[IRQ0x0f_interrupt+114689/138400] .rodata.str1.1 [kernel] 0x287c
Apr 17 02:16:12 mibcentral kernel: Call Trace: [<c022e381>] .rodata.str1.1 
[kernel] 0x287c  
Apr 17 02:16:12 mibcentral kernel: 
[8139too:__insmod_8139too_O/lib/modules/2.4.9-21/kernel/drivers/net/+-
487902/96] __insmod_ext3_S.
text_L43040 [ext3] 0x1dc2  
Apr 17 02:16:12 mibcentral kernel: [<e080de22>] __insmod_ext3_S.text_L43040 
[ext3] 0x1dc2
Apr 17 02:16:12 mibcentral kernel: 
[8139too:__insmod_8139too_O/lib/modules/2.4.9-21/kernel/drivers/net/+-
471836/96] __insmod_ext3_S.
text_L43040 [ext3] 0x5c84  
Apr 17 02:16:12 mibcentral kernel: [<e0811ce4>] __insmod_ext3_S.text_L43040 
[ext3] 0x5c84
Apr 17 02:16:12 mibcentral kernel: 
[8139too:__insmod_8139too_O/lib/modules/2.4.9-21/kernel/drivers/net/+-
471819/96] __insmod_ext3_S.
text_L43040 [ext3] 0x5c95  
Apr 17 02:16:12 mibcentral kernel: [<e0811cf5>] __insmod_ext3_S.text_L43040 
[ext3] 0x5c95 
Apr 17 02:16:12 mibcentral kernel: 
[8139too:__insmod_8139too_O/lib/modules/2.4.9-21/kernel/drivers/net/+-
471577/96] __insmod_ext3_S.
text_L43040 [ext3] 0x5d87  
Apr 17 02:16:12 mibcentral kernel: [<e0811de7>] __insmod_ext3_S.text_L43040 
[ext3] 0x5d87
Apr 17 02:16:12 mibcentral kernel: 
[8139too:__insmod_8139too_O/lib/modules/2.4.9-21/kernel/drivers/net/+-
483737/96] __insmod_ext3_S.
text_L43040 [ext3] 0x2e07  
Apr 17 02:16:12 mibcentral kernel: [<e080ee67>] __insmod_ext3_S.text_L43040 
[ext3] 0x2e07
Apr 17 02:16:12 mibcentral kernel: 
[8139too:__insmod_8139too_O/lib/modules/2.4.9-21/kernel/drivers/net/+-
434208/96] __insmod_ext3_S.
data_L672 [ext3] 0x200 
Apr 17 02:16:12 mibcentral kernel: [<e081afe0>] __insmod_ext3_S.data_L672 
[ext3] 0x200
Apr 17 02:16:12 mibcentral kernel: 
[8139too:__insmod_8139too_O/lib/modules/2.4.9-21/kernel/drivers/net/+-
434208/96] __insmod_ext3_S.
data_L672 [ext3] 0x200
Apr 17 02:16:12 mibcentral kernel: [<e081afe0>] __insmod_ext3_S.data_L672 
[ext3] 0x200
Apr 17 02:16:12 mibcentral kernel: [iput_free+245/464] iput_free [kernel] 0xf5
Apr 17 02:16:12 mibcentral kernel: [<c0149f35>] iput_free [kernel] 0xf5
Apr 17 02:16:12 mibcentral kernel: 
[8139too:__insmod_8139too_O/lib/modules/2.4.9-21/kernel/drivers/net/+-
463916/96] __insmod_ext3_S.
text_L43040 [ext3] 0x7b74
Apr 17 02:16:12 mibcentral kernel: [<e0813bd4>] __insmod_ext3_S.text_L43040 
[ext3] 0x7b74
Apr 17 02:16:12 mibcentral kernel: [d_delete+76/128] d_delete [kernel] 0x4c
Apr 17 02:16:12 mibcentral kernel: [<c01481fc>] d_delete [kernel] 0x4c
Apr 17 02:16:12 mibcentral kernel: [vfs_permission+121/288] vfs_permission 
[kernel] 0x79
Apr 17 02:16:12 mibcentral kernel: [<c013f809>] vfs_permission [kernel] 0x79
Apr 17 02:16:12 mibcentral kernel: [vfs_unlink+335/400] vfs_unlink [kernel] 
0x14f
Apr 17 02:16:12 mibcentral kernel: [<c01418bf>] vfs_unlink [kernel] 0x14f
Apr 17 02:16:12 mibcentral kernel: [lookup_hash+106/144] lookup_hash [kernel] 
0x6a
Apr 17 02:16:12 mibcentral kernel: [<c014077a>] lookup_hash [kernel] 0x6a
Apr 17 02:16:12 mibcentral kernel: [sys_unlink+153/256] sys_unlink [kernel] 0x99
Apr 17 02:16:12 mibcentral kernel: [<c0141999>] sys_unlink [kernel] 0x99
Apr 17 02:16:12 mibcentral kernel: [system_call+51/56] system_call [kernel] 0x33
Apr 17 02:16:12 mibcentral kernel: [<c0106f3b>] system_call [kernel] 0x33
Apr 17 02:16:12 mibcentral kernel:
Apr 17 02:16:12 mibcentral kernel:
Apr 17 02:16:12 mibcentral kernel: Code: 0f 0b 59 58 8b 83 0c 01 00 00 a9 10 00 
00 00 75 19 68 04 02
Apr 17 02:19:35 mibcentral sshd(pam_unix)[9717]: session closed for user admin
Apr 17 02:19:37 mibcentral sshd(pam_unix)[15932]: session closed for user admin
Apr 17 02:56:37 mibcentral syslogd 1.4.1: restart.
Apr 17 02:56:37 mibcentral syslog: syslogd startup succeeded
Apr 17 02:56:37 mibcentral syslog: klogd startup succeeded
Apr 17 02:56:37 mibcentral kernel: klogd 1.4.1, log source = /proc/kmsg started.
Apr 17 02:56:37 mibcentral kernel: Inspecting /boot/System.map-2.4.9-21
.....

Upon reboot, my log file reports all filesystems clean except /var.
Other times, all filesystems are clean upon rebooting after the same bug.


Comment 11 Marcel Mol 2002-05-21 20:07:33 UTC
Hi, it seems quite silent on this bug...
On May 17th I upgraded to 2.4.9-31 and the

May 17 22:26:26 linserv kernel: kernel BUG at inode.c:686!

one appears regularly. I get this with several processes: kswapd, tar, http,
updatedb, mrtg, sadc. For the first few days it happened only once or twice.
But on May 20th at 16:10, and from then on, it occurred every 5 to 10 minutes
(with mrtg) until the 21st at 04:00, when the system came to a halt.

Here is the first one:

    May 17 22:26:26 linserv kernel: kernel BUG at inode.c:686!
    May 17 22:26:26 linserv kernel: invalid operand: 0000  
    May 17 22:26:26 linserv kernel: Kernel 2.4.9-31
    May 17 22:26:26 linserv kernel: CPU:    0
    May 17 22:26:26 linserv kernel: EIP:    0010:[prune_icache+153/304]    Not
tainted
    May 17 22:26:26 linserv kernel: EIP:    0010:[<c0148f79>]    Not tainted
    May 17 22:26:26 linserv kernel: EFLAGS: 00013286
    May 17 22:26:26 linserv kernel: EIP is at prune_icache [kernel] 0x99 
    May 17 22:26:26 linserv kernel: eax: 0000001b   ebx: dffc5908   ecx: 00000001  
    edx: 0000671c
    May 17 22:26:26 linserv kernel: esi: dffc5900   edi: de936048   ebp:    
dffdffa8  
    esp: dffdff84
    May 17 22:26:26 linserv kernel: ds: 0018   es: 0018   ss: 0018
    May 17 22:26:26 linserv kernel: Process kswapd (pid: 5, stackpage=dffdf000)
    May 17 22:26:26 linserv kernel: Stack: c0231633 000002ae 00000000 000039c8
    dfbf1ac8 da0cc908 000000b9 000000c0 
    May 17 22:26:26 linserv kernel:        000000c0 0008e000 c0149031 ffffc638
    c012ebb3 00000000 000000c0 c1c86130 
    May 17 22:26:26 linserv kernel:        000000c0 000000c0 00000000 dffde000
    00000006 c012ec35 000000c0 00000000 
    May 17 22:26:26 linserv kernel: Call Trace:
    [call_spurious_interrupt+122502/146195] .rodata.str1.1 [kernel] 0x2dce 
    May 17 22:26:26 linserv kernel: Call Trace: [<c0231633>] .rodata.str1.1    
[kernel]
    0x2dce 
    May 17 22:26:26 linserv kernel: [shrink_icache_memory+33/64]
    shrink_icache_memory [kernel] 0x21 
    May 17 22:26:26 linserv kernel: [<c0149031>] shrink_icache_memory [kernel] 0x21 
    May 17 22:26:26 linserv kernel: [do_try_to_free_pages+35/80]
    do_try_to_free_pages [kernel] 0x23 
    May 17 22:26:26 linserv kernel: [<c012ebb3>] do_try_to_free_pages [kernel] 
 0x23 
    May 17 22:26:26 linserv kernel: [kswapd+85/240] kswapd [kernel] 0x55 
    May 17 22:26:26 linserv kernel: [<c012ec35>] kswapd [kernel] 0x55 
    May 17 22:26:26 linserv kernel: [_stext+0/48] stext [kernel] 0x0 
    May 17 22:26:26 linserv kernel: [<c0105000>] stext [kernel] 0x0 
    May 17 22:26:26 linserv kernel: [kernel_thread+38/48] kernel_thread [kernel]
0x26 
    May 17 22:26:26 linserv kernel: [<c0105746>] kernel_thread [kernel] 0x26 
    May 17 22:26:26 linserv kernel: [kswapd+0/240] kswapd [kernel] 0x0 
    May 17 22:26:26 linserv kernel: [<c012ebe0>] kswapd [kernel] 0x0 
    May 17 22:26:26 linserv kernel: 
    May 17 22:26:26 linserv kernel: 
    May 17 22:26:26 linserv kernel: Code: 0f 0b 58 5a 8b 53 04 8b 03 89 50 04 89 02
    8b 53 fc c7 43 04 

And here is the last one:

May 21 04:00:00 linserv kernel: kernel BUG at inode.c:686!
May 21 04:00:00 linserv kernel: invalid operand: 0000
May 21 04:00:00 linserv kernel: Kernel 2.4.9-31
May 21 04:00:00 linserv kernel: CPU:    0
May 21 04:00:00 linserv kernel: EIP:    0010:[prune_icache+153/304]    Not tainted
May 21 04:00:00 linserv kernel: EIP:    0010:[<c0148f79>]    Not tainted
May 21 04:00:00 linserv kernel: EFLAGS: 00010286
May 21 04:00:00 linserv kernel: EIP is at prune_icache [kernel] 0x99 
May 21 04:00:00 linserv kernel: eax: 0000001b   ebx: dba09588   ecx: 00000001  
edx: 0004543f
May 21 04:00:00 linserv kernel: esi: dba09580   edi: df233c88   ebp: d6febe14  
esp: d6febdf0
May 21 04:00:00 linserv kernel: ds: 0018   es: 0018   ss: 0018
May 21 04:00:01 linserv kernel: Process mrtg (pid: 1625, stackpage=d6feb000)
May 21 04:00:01 linserv kernel: Stack: c0231633 000002ae 00000000 00000009
d7169e48 d7169748 00000000 000000d2 
May 21 04:00:01 linserv kernel:        00000000 000000d2 c0149031 fffffff7
c012ebb3 00000000 000000d2 c1c86130 
May 21 04:00:01 linserv kernel:        000000d2 000000d2 00000001 d6fea000
00000001 c012ed28 000000d2 00000001 
May 21 04:00:01 linserv kernel: Call Trace:
[call_spurious_interrupt+122502/146195] .rodata.str1.1 [kernel] 0x2dce 
May 21 04:00:01 linserv kernel: Call Trace: [<c0231633>] .rodata.str1.1 [kernel]
0x2dce 
May 21 04:00:01 linserv kernel: [shrink_icache_memory+33/64]
shrink_icache_memory [kernel] 0x21 
May 21 04:00:01 linserv kernel: [<c0149031>] shrink_icache_memory [kernel] 0x21 
May 21 04:00:01 linserv kernel: [do_try_to_free_pages+35/80]
do_try_to_free_pages [kernel] 0x23 
May 21 04:00:01 linserv kernel: [<c012ebb3>] do_try_to_free_pages [kernel] 0x23 
May 21 04:00:01 linserv kernel: [try_to_free_pages+40/64] try_to_free_pages
[kernel] 0x28 
May 21 04:00:01 linserv kernel: [<c012ed28>] try_to_free_pages [kernel] 0x28 
May 21 04:00:01 linserv kernel: [__alloc_pages+446/608] __alloc_pages [kernel]
0x1be 
May 21 04:00:01 linserv kernel: [<c012f8fe>] __alloc_pages [kernel] 0x1be 
May 21 04:00:01 linserv kernel: [do_anonymous_page+61/224] do_anonymous_page
[kernel] 0x3d 
May 21 04:00:01 linserv kernel: [<c01253bd>] do_anonymous_page [kernel] 0x3d 
May 21 04:00:01 linserv kernel:
[3c59x:__insmod_3c59x_O/lib/modules/2.4.9-31/kernel/drivers/net/3c+-733947/96]
__insmod_ext3_S.text_L43056 [ext3] 0x5ca5 
May 21 04:00:01 linserv kernel: [<e0848d05>] __insmod_ext3_S.text_L43056 [ext3]
0x5ca5 
May 21 04:00:01 linserv kernel:
[3c59x:__insmod_3c59x_O/lib/modules/2.4.9-31/kernel/drivers/net/3c+-781145/96]
journal_blocks_per_page_Rb3e23b75 [jbd] 0x77 
May 21 04:00:01 linserv kernel: [<e083d4a7>] journal_blocks_per_page_Rb3e23b75
[jbd] 0x77 
May 21 04:00:01 linserv kernel: [do_no_page+50/272] do_no_page [kernel] 0x32 
May 21 04:00:01 linserv kernel: [<c0125492>] do_no_page [kernel] 0x32 
May 21 04:00:01 linserv kernel: [handle_mm_fault+101/224] handle_mm_fault
[kernel] 0x65 
May 21 04:00:01 linserv kernel: [<c01255d5>] handle_mm_fault [kernel] 0x65 
May 21 04:00:01 linserv kernel: [file_read_actor+112/224] file_read_actor
[kernel] 0x70 
May 21 04:00:01 linserv kernel: [<c01283e0>] file_read_actor [kernel] 0x70 
May 21 04:00:01 linserv kernel: [__mark_inode_dirty+42/128] __mark_inode_dirty
[kernel] 0x2a 
May 21 04:00:01 linserv kernel: [<c014824a>] __mark_inode_dirty [kernel] 0x2a 
May 21 04:00:01 linserv kernel: [do_page_fault+0/1168] do_page_fault [kernel] 0x0 
May 21 04:00:01 linserv kernel: [<c01139c0>] do_page_fault [kernel] 0x0 
May 21 04:00:01 linserv kernel: [do_page_fault+378/1168] do_page_fault [kernel]
0x17a 
May 21 04:00:01 linserv kernel: [<c0113b3a>] do_page_fault [kernel] 0x17a 
May 21 04:00:01 linserv kernel: [do_munmap+100/608] do_munmap [kernel] 0x64 
May 21 04:00:01 linserv kernel: [<c01264c4>] do_munmap [kernel] 0x64 
May 21 04:00:01 linserv kernel: [generic_file_read+100/128] generic_file_read
[kernel] 0x64 
May 21 04:00:01 linserv kernel: [<c01284b4>] generic_file_read [kernel] 0x64 
May 21 04:00:01 linserv kernel: [file_read_actor+0/224] file_read_actor [kernel]
0x0 
May 21 04:00:01 linserv kernel: [<c0128370>] file_read_actor [kernel] 0x0 
May 21 04:00:01 linserv kernel: [do_brk+180/352] do_brk [kernel] 0xb4 
May 21 04:00:01 linserv kernel: [<c01267c4>] do_brk [kernel] 0xb4 
May 21 04:00:01 linserv kernel: [sys_brk+169/224] sys_brk [kernel] 0xa9 
May 21 04:00:01 linserv kernel: [<c01258f9>] sys_brk [kernel] 0xa9 
May 21 04:00:01 linserv kernel: [do_page_fault+0/1168] do_page_fault [kernel] 0x0 
May 21 04:00:01 linserv kernel: [<c01139c0>] do_page_fault [kernel] 0x0 
May 21 04:00:01 linserv kernel: [error_code+56/64] error_code [kernel] 0x38 
May 21 04:00:01 linserv kernel: [<c0107058>] error_code [kernel] 0x38 
May 21 04:00:01 linserv kernel: 
May 21 04:00:01 linserv kernel: 
May 21 04:00:01 linserv kernel: Code: 0f 0b 58 5a 8b 53 04 8b 03 89 50 04 89 02
8b 53 fc c7 43 04 

In total there were 151 instances before the kernel seemed to crash.

This is on an Athlon XP 1900+ with an ASUS A7v266EX motherboard.
All filesystems are ext3, and every filesystem (and swap) uses RAID 1 across
two IDE disks.
After reboot everything comes up fine. fsck finds orphaned inodes in /tmp and
/var, which are cleared.

Please let me know if more info is needed.

-Marcel

Comment 12 Stephen Tweedie 2002-05-22 20:06:48 UTC
marcel: Has this happened since the reboot?  It can be a sign of random memory
corruption (in fact, all the cases we have definitely been able to resolve have
come down either to the use of a buggy third-party binary-only kernel module, or
bad hardware).  Running memtest86 overnight is a good first step towards
checking that possibility.

Comment 13 Marcel Mol 2002-05-23 06:50:22 UTC
I checked the system again. I thought I had upgraded the kernel on the 17th, but
found that the system had been running kernel-2.4.9-31 since its installation.
On the 17th I only upgraded/downgraded a few packages. The system had been up
since May 6th. To summarise: on the morning of the 17th I downgraded from samba
2.2.4 to 2.2.3a (because 2.2.4 had problems with listing print queues), but the
BUGs were only reported from the evening of the 17th onwards.
I'll try to run memtest this week (I need to pick a proper time as it is a
production file server) and let you know the results.

Comment 14 Stephen Tweedie 2002-05-23 08:47:35 UTC
Update received in email (please use bugzilla if you can!) from "Gerard Berthet"
<gberthet>:

I am confirming that a similar problem on my server was indeed caused
by faulty memory (detected using memtest86). After we replaced the
memory stick, the problem did not reappear.

Gerard