From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.2) Gecko/20030208 Netscape/7.02

Description of problem:
uname -a
Linux pullasorsa 2.4.20-9 #1 Wed Apr 2 13:15:01 EST 2003 i586 i586 i386 GNU/Linux

Usually during heavy disk access, I get one of the following messages in /var/log/messages:

Unable to handle kernel NULL pointer dereference at virtual address 0000002b
 printing eip:
c0153c6a
*pde = 00000000
Oops: 0000
autofs 3c59x ipt_REJECT ipt_limit ipt_LOG ipt_state iptable_nat ip_conntrack iptable_filter ip_tables sg sr_mod ide-scsi scsi_mod ide-cd cdrom ext3 jbd
CPU: 0
EIP: 0060:[<c0153c6a>] Not tainted
EFLAGS: 00010217

EIP is at find_inode [kernel] 0x1a (2.4.20-9)
eax: 00000000 ebx: 00000003 ecx: 00003fff edx: 00000000
esi: 00000003 edi: c1afc888 ebp: 003068fb esp: c87c7e6c
ds: 0068 es: 0068 ss: 0068
Process updatedb (pid: 10716, stackpage=c87c7000)
Stack: 003068fb c1afc888 003068fb c1a6e400 c0153f83 c1a6e400 003068fb c1afc888
       00000000 00000000 003068fb c5e981a0 ceeee2a0 c5e981a0 d0820f88 c1a6e400
       003068fb 00000000 00000000 cbc34230 fffffff4 ceeee30c c014990a ceeee2a0
Call Trace: [<c0153f83>] iget4 [kernel] 0x43 (0xc87c7e7c))
[<d0820f88>] ext3_lookup [ext3] 0x58 (0xc87c7ea4))
[<c014990a>] real_lookup [kernel] 0x9a (0xc87c7ec4))
[<c0149de8>] link_path_walk [kernel] 0x3c8 (0xc87c7ee0))
[<c014a251>] path_lookup [kernel] 0x21 (0xc87c7f20))
[<c014a47a>] __user_walk [kernel] 0x2a (0xc87c7f30))
[<c0146537>] vfs_lstat [kernel] 0x17 (0xc87c7f44))
[<c0146a81>] sys_lstat64 [kernel] 0x11 (0xc87c7f70))
[<c0120001>] proc_dostring [kernel] 0x41 (0xc87c7f84))
[<c0109103>] system_call [kernel] 0x33 (0xc87c7fc0))

Code: 39 6b 28 75 f1 8b 44 24 14 39 83 94 00 00 00 75 e5 8b 44 24
---------------------------------------------------------------------------------------
Unable to handle kernel paging request at virtual address 240489ff
 printing eip:
c01523d1
*pde = 00000000
Oops: 0000
autofs 3c59x ipt_REJECT ipt_limit ipt_LOG ipt_state iptable_nat ip_conntrack iptable_filter ip_tables sg sr_mod ide-scsi scsi_mod ide-cd cdrom ext3 jbd
CPU: 0
EIP: 0060:[<c01523d1>] Not tainted
EFLAGS: 00010203

EIP is at d_lookup [kernel] 0x61 (2.4.20-9)
eax: cff92a58 ebx: 240489ff ecx: 0000000f edx: cff80000
esi: 240489ef edi: 00000000 ebp: b1da24f6 esp: c3fe9eac
ds: 0068 es: 0068 ss: 0068
Process updatedb (pid: 26777, stackpage=c3fe9000)
Stack: c1ac2bb8 240489ef cff92a58 c1a73000 00000006 c3fe9f00 c1a73006 00000000
       c3fe9f48 c0149820 c0c4b2a0 c3fe9f00 c1a73000 c0149dba c0c4b2a0 c3fe9f00
       00000000 00000008 00000000 c0fdb7c0 00000000 c1a73000 00000006 b1da24f6
Call Trace: [<c0149820>] cached_lookup [kernel] 0x10 (0xc3fe9ed0))
[<c0149dba>] link_path_walk [kernel] 0x39a (0xc3fe9ee0))
[<c014a251>] path_lookup [kernel] 0x21 (0xc3fe9f20))
[<c014a47a>] __user_walk [kernel] 0x2a (0xc3fe9f30))
[<c0146537>] vfs_lstat [kernel] 0x17 (0xc3fe9f44))
[<c012d726>] do_brk [kernel] 0xf6 (0xc3fe9f50))
[<c0146a81>] sys_lstat64 [kernel] 0x11 (0xc3fe9f70))
[<c012c749>] sys_brk [kernel] 0xd9 (0xc3fe9f9c))
[<c01155d0>] do_page_fault [kernel] 0x0 (0xc3fe9fb0))
[<c0109214>] error_code [kernel] 0x34 (0xc3fe9fb8))
[<c0109103>] system_call [kernel] 0x33 (0xc3fe9fc0))

Code: 8b 1b 39 6e 44 75 e8 8b 7c 24 28 39 7e 0c 75 df 8b 47 4c 85
---------------------------------------------------------------------------------------
Unable to handle kernel paging request at virtual address 240489ff
 printing eip:
c01523d1
*pde = 00000000
Oops: 0000
autofs 3c59x ipt_REJECT ipt_limit ipt_LOG ipt_state iptable_nat ip_conntrack iptable_filter ip_tables sg sr_mod ide-scsi scsi_mod ide-cd cdrom ext3 jbd
CPU: 0
EIP: 0060:[<c01523d1>] Not tainted
EFLAGS: 00010203

EIP is at d_lookup [kernel] 0x61 (2.4.20-9)
eax: cff92a58 ebx: 240489ff ecx: 0000000f edx: cff80000
esi: 240489ef edi: 00000000 ebp: e1bd7c66 esp: c87fdeac
ds: 0068 es: 0068 ss: 0068
Process smbd (pid: 25756, stackpage=c87fd000)
Stack: cf988006 240489ef cff92a58 cf98801d 00000013 c87fdf00 cf988030 00000000
       c87fdf48 c0149820 c04db9c0 c87fdf00 cf98801d c0149dba c04db9c0 c87fdf00
       00000000 00000009 00000000 c04ed860 00000000 cf98801d 00000013 e1bd7c66
Call Trace: [<c0149820>] cached_lookup [kernel] 0x10 (0xc87fded0))
[<c0149dba>] link_path_walk [kernel] 0x39a (0xc87fdee0))
[<c014a251>] path_lookup [kernel] 0x21 (0xc87fdf20))
[<c014a47a>] __user_walk [kernel] 0x2a (0xc87fdf30))
[<c01464e7>] vfs_stat [kernel] 0x17 (0xc87fdf44))
[<c0146a51>] sys_stat64 [kernel] 0x11 (0xc87fdf70))
[<c013564b>] activate_page_nolock [kernel] 0x18b (0xc87fdf90))
[<c013f233>] sys_close [kernel] 0x43 (0xc87fdfb0))
[<c0109f29>] math_state_restore [kernel] 0x19 (0xc87fdfb8))
[<c0109103>] system_call [kernel] 0x33 (0xc87fdfc0))

Code: 8b 1b 39 6e 44 75 e8 8b 7c 24 28 39 7e 0c 75 df 8b 47 4c 85
---------------------------------------------------------------------------------------

The process is usually updatedb, but it can also be Samba (smbd) or my backup script, so this usually happens when there is heavy disk access. When the process is updatedb, I get the following mail from the Cron Daemon:

-----------------------------------
/etc/cron.daily/slocate.cron: line 3: 22773 Segmentation fault /usr/bin/updatedb -f "nfs,smbfs,ncpfs,proc,devpts" -e "/tmp,/var/tmp,/usr/tmp,/afs,/net"
-----------------------------------
/etc/cron.daily/slocate.cron: line 3: 10716 Segmentation fault /usr/bin/updatedb -f "nfs,smbfs,ncpfs,proc,devpts" -e "/tmp,/var/tmp,/usr/tmp,/afs,/net"
-----------------------------------
/etc/cron.daily/slocate.cron: line 3: 26777 Segmentation fault /usr/bin/updatedb -f "nfs,smbfs,ncpfs,proc,devpts" -e "/tmp,/var/tmp,/usr/tmp,/afs,/net"
-------------------------------------

Version-Release number of selected component (if applicable):

How reproducible:
Sometimes

Steps to Reproduce:
Happens randomly, usually when there is a lot of disk access. The server might run fine for a few weeks, but eventually this happens.

Additional info:
This looks very much like hardware memory corruption. The places where you're hitting the oopses (find_inode and d_lookup) are locations where the kernel walks long lists of data structures, the inode and dentry hash chains, and those are exactly the places where you would expect random oopses if you have bad memory. Running memtest86 is the recommended next step: http://www.memtest86.com/
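The list-walking explanation can be checked against the oopses themselves by decoding the first instruction of each "Code:" line. In the find_inode oops, 39 6b 28 is cmp %ebp,0x28(%ebx), and with ebx = 00000003 the address touched is 0x3 + 0x28 = 0000002b, exactly the reported fault address. In the d_lookup oopses, 8b 1b is mov (%ebx),%ebx, which follows the next pointer of the current hash-chain entry, and ebx = 240489ff is the reported fault address. In both cases the instruction itself is ordinary, and the base register holds a garbage value that appears to have been read out of a hash chain. That pattern fits corrupted memory better than a logic error in those functions. The Python sketch below does only this arithmetic. It handles just the two ModRM forms seen here (register base with no displacement, or with an 8-bit displacement) and is not a general x86 decoder. The helper name fault_address is made up for this illustration.

```python
# Minimal sketch (hypothetical helper, not part of any kernel tool): recompute
# the address an x86 instruction touched from its ModRM byte and the register
# dump in an oops. Only mod=00 (no displacement) and mod=01 (8-bit
# displacement) with a plain register base are handled; SIB, disp32 and
# register-direct forms are rejected.

REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def fault_address(code_hex: str, regs: dict) -> int:
    """Return base register + displacement for an 'op r32, r/m32' instruction.

    code_hex is the start of an oops "Code:" line, e.g. "39 6b 28".
    regs maps register names to the values printed in the oops.
    """
    _opcode, modrm, *rest = [int(b, 16) for b in code_hex.split()]
    mod = modrm >> 6       # addressing mode (top two bits)
    rm = modrm & 0b111     # base register index (low three bits)
    if rm == 0b100 or (mod == 0b00 and rm == 0b101):
        raise ValueError("SIB and disp32 forms are not handled in this sketch")
    if mod == 0b00:
        disp = 0
    elif mod == 0b01:
        disp = rest[0] - 0x100 if rest[0] & 0x80 else rest[0]  # sign-extend disp8
    else:
        raise ValueError("only mod=00 and mod=01 are handled in this sketch")
    base = regs[REGS32[rm]]
    return (base + disp) & 0xFFFFFFFF  # keep the result within 32 bits


# Register values copied from the find_inode and d_lookup oopses above.
print(hex(fault_address("39 6b 28", {"ebx": 0x00000003})))  # 0x2b
print(hex(fault_address("8b 1b", {"ebx": 0x240489FF})))     # 0x240489ff
```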
I ran memtest86 for about 48 hours and it passed all the tests. I've now been up and running for about 11 days without any of these messages. However, I've now hit another bug which might be related to this one. I reported it as bug #92013.
Although memtest86 didn't find anything, I changed the memory module (256 -> 128MB). I also added a fan, just in case. Then I removed the swap partition and recreated it (mkswap -c ... didn't find anything). Today I got another kernel oops:

kernel: Unable to handle kernel paging request at virtual address a01cb0a9
kernel:  printing eip:
kernel: c0154248
kernel: *pde = 00000000
kernel: Oops: 0000
kernel: autofs 3c59x ipt_REJECT ipt_limit ipt_LOG ipt_state iptable_nat ip_conntrack iptable_filter ip_tables sg sr_mod ide-scsi scsi_mod ide-cd cdrom ext3 jbd
kernel: CPU: 0
kernel: EIP: 0060:[<c0154248>] Not tainted
kernel: EFLAGS: 00010282
kernel:
kernel: EIP is at iput [kernel] 0x28 (2.4.20-13.9)
kernel: eax: 00000000 ebx: c57929c0 ecx: c57929d0 edx: c57929d0
kernel: esi: a01cb089 edi: 00000000 ebp: 00000063 esp: c7feff94
kernel: ds: 0068 es: 0068 ss: 0068
kernel: Process kswapd (pid: 5, stackpage=c7fef000)
kernel: Stack: c4b51ad8 c4b51ac0 c57929c0 c0151f40 c57929c0 c7fee000 00000000 000001d0
kernel:        00000000 c01522c5 00000286 c0137480 00000006 000001d0 c7fee000 00000000
kernel:        00000002 00000000 c0137726 000001d0 c01376b0 00000000 00000000 c01072ad
kernel: Call Trace: [<c0151f40>] prune_dcache [kernel] 0xc0 (0xc7feffa0))
kernel: [<c01522c5>] shrink_dcache_memory [kernel] 0x25 (0xc7feffb8))
kernel: [<c0137480>] do_try_to_free_pages_kswapd [kernel] 0x10 (0xc7feffc0))
kernel: [<c0137726>] kswapd [kernel] 0x76 (0xc7feffdc))
kernel: [<c01376b0>] kswapd [kernel] 0x0 (0xc7feffe4))
kernel: [<c01072ad>] kernel_thread_helper [kernel] 0x5 (0xc7fefff0))
kernel:
kernel:
kernel: Code: 8b 46 20 85 c0 74 02 89 c7 85 ff 74 0b 8b 47 18 85 c0 0f 85

lsmod displays:

Module                  Size  Used by    Not tainted
autofs                 12148   0  (autoclean) (unused)
3c59x                  29392   1
ipt_REJECT              3736   1  (autoclean)
ipt_limit               1496   2  (autoclean)
ipt_LOG                 4120   4  (autoclean)
ipt_state               1048   5  (autoclean)
iptable_nat            20568   0  (autoclean) (unused)
ip_conntrack           26088   2  (autoclean) [ipt_state iptable_nat]
iptable_filter          2316   1  (autoclean)
ip_tables              14488   8  [ipt_REJECT ipt_limit ipt_LOG ipt_state iptable_nat iptable_filter]
sg                     34572   0  (autoclean)
sr_mod                 16856   0  (autoclean)
ide-scsi               11120   0
scsi_mod              103000   3  [sg sr_mod ide-scsi]
ide-cd                 33440   0
cdrom                  31040   0  [sr_mod ide-cd]
ext3                   64704   4
jbd                    47828   4  [ext3]

I'm not sure why the ide-scsi and scsi_mod modules are loaded, since I don't have any SCSI hardware. I only have three hard drives and a CD-R drive.

cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: MITSUMI Model: CR-2801TE Rev: 1.10
  Type: CD-ROM ANSI SCSI revision: 02

That is my CD-R drive, and as far as I know it is IDE, not SCSI.

cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 5
model           : 8
model name      : AMD-K6(tm) 3D processor
stepping        : 12
cpu MHz         : 451.017
cache size      : 64 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr mce cx8 pge mmx syscall 3dnow k6_mtrr
bogomips        : 901.12

cat /proc/pci
PCI devices found:
  Bus 0, device 0, function 0:
    Host bridge: ALi Corporation. [ALi] M1541 (rev 4).
      Master Capable. Latency=64.
      Non-prefetchable 32 bit memory at 0xe5000000 [0xe5ffffff].
  Bus 0, device 1, function 0:
    PCI bridge: ALi Corporation. [ALi] M1541 PCI to AGP Controller (rev 4).
      Master Capable. Latency=64.
  Bus 0, device 3, function 0:
    Bridge: ALi Corporation. [ALi] M7101 PMU (rev 0).
  Bus 0, device 7, function 0:
    ISA bridge: ALi Corporation. [ALi] M1533 PCI to ISA Bridge [Aladdin IV] (rev 195).
  Bus 0, device 10, function 0:
    Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 116).
      IRQ 5.
      Master Capable. Latency=64. Min Gnt=10.Max Lat=10.
      I/O at 0xd800 [0xd87f].
      Non-prefetchable 32 bit memory at 0xe4000000 [0xe400007f].
  Bus 0, device 12, function 0:
    VGA compatible controller: Tseng Labs Inc ET6000 (rev 96).
      IRQ 11.
      Non-prefetchable 32 bit memory at 0xe3000000 [0xe3ffffff].
      I/O at 0xd400 [0xd4ff].
  Bus 0, device 15, function 0:
    IDE interface: ALi Corporation. [ALi] M5229 IDE (rev 193).
      Master Capable. Latency=32. Min Gnt=2.Max Lat=4.
      I/O at 0xd000 [0xd00f].
Since this might be related to the hard drives (fsck didn't find anything), here is the information about them:

hdparm /dev/hda
 multcount    = 16 (on)
 IO_support   =  0 (default 16-bit)
 unmaskirq    =  0 (off)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    =  8 (on)
 geometry     = 3737/255/63, sectors = 60036480, start = 0

hdparm /dev/hdb
 multcount    = 16 (on)
 IO_support   =  0 (default 16-bit)
 unmaskirq    =  0 (off)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    =  8 (on)
 geometry     = 15017/255/63, sectors = 241254720, start = 0

hdparm /dev/hdd
 multcount    = 16 (on)
 IO_support   =  0 (default 16-bit)
 unmaskirq    =  0 (off)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    =  8 (on)
 geometry     = 9964/255/63, sectors = 160086528, start = 0
I've made the other bug, 92013, depend on this one: both are just different symptoms of the same underlying memory corruption, not separate bugs. This still looks like hardware to me. It could be a power supply that can't quite deliver clean power under heavy disk load, a motherboard problem when DMA and heavy CPU memory access happen at the same time, or any number of similar things.
Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/