Bug 90990
Summary: | Unable to handle kernel NULL pointer dereference/paging request | | |
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Toni Parviainen <tonitop> |
Component: | kernel | Assignee: | Arjan van de Ven <arjanv> |
Status: | CLOSED WONTFIX | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Priority: | medium |
Version: | 9 | CC: | sct |
Hardware: | i386 | OS: | Linux |
Doc Type: | Bug Fix | Last Closed: | 2004-09-30 15:40:56 UTC |
Bug Blocks: | 92013 | | |
Description
Toni Parviainen
2003-05-16 06:56:52 UTC
Looks very much like hardware memory corruption. The places you're hitting the OOPSes are locations where the kernel is walking long lists of data structures, and these are exactly the locations where you would expect to see random OOPSes if you've got bad memory. memtest86 is the advised next step.

http://www.memtest86.com/

I ran memtest86 for about 48 hours and it passed all the tests. Now I've been up and running for about 11 days without any of these messages. However, I have now hit another bug which might be related to this one. I reported it (bug id # 92013).

Although memtest86 didn't find anything, I changed the memory module (256 -> 128MB). I also added one fan just in case. Then I removed the swap partition and recreated it (mkswap -c ... didn't find anything). Today I got another kernel oops:

    kernel: Unable to handle kernel paging request at virtual address a01cb0a9
    kernel: printing eip:
    kernel: c0154248
    kernel: *pde = 00000000
    kernel: Oops: 0000
    kernel: autofs 3c59x ipt_REJECT ipt_limit ipt_LOG ipt_state iptable_nat ip_conntrack iptable_filter ip_tables sg sr_mod ide-scsi scsi_mod ide-cd cdrom ext3 jbd
    kernel: CPU: 0
    kernel: EIP: 0060:[<c0154248>] Not tainted
    kernel: EFLAGS: 00010282
    kernel:
    kernel: EIP is at iput [kernel] 0x28 (2.4.20-13.9)
    kernel: eax: 00000000 ebx: c57929c0 ecx: c57929d0 edx: c57929d0
    kernel: esi: a01cb089 edi: 00000000 ebp: 00000063 esp: c7feff94
    kernel: ds: 0068 es: 0068 ss: 0068
    kernel: Process kswapd (pid: 5, stackpage=c7fef000)
    kernel: Stack: c4b51ad8 c4b51ac0 c57929c0 c0151f40 c57929c0 c7fee000 00000000 000001d0
    kernel: 00000000 c01522c5 00000286 c0137480 00000006 000001d0 c7fee000 00000000
    kernel: 00000002 00000000 c0137726 000001d0 c01376b0 00000000 00000000 c01072ad
    kernel: Call Trace: [<c0151f40>] prune_dcache [kernel] 0xc0 (0xc7feffa0))
    kernel: [<c01522c5>] shrink_dcache_memory [kernel] 0x25 (0xc7feffb8))
    kernel: [<c0137480>] do_try_to_free_pages_kswapd [kernel] 0x10 (0xc7feffc0))
    kernel: [<c0137726>] kswapd [kernel] 0x76 (0xc7feffdc))
    kernel: [<c01376b0>] kswapd [kernel] 0x0 (0xc7feffe4))
    kernel: [<c01072ad>] kernel_thread_helper [kernel] 0x5 (0xc7fefff0))
    kernel:
    kernel:
    kernel: Code: 8b 46 20 85 c0 74 02 89 c7 85 ff 74 0b 8b 47 18 85 c0 0f 85

lsmod displays:

    Module Size Used by Not tainted
    autofs 12148 0 (autoclean) (unused)
    3c59x 29392 1
    ipt_REJECT 3736 1 (autoclean)
    ipt_limit 1496 2 (autoclean)
    ipt_LOG 4120 4 (autoclean)
    ipt_state 1048 5 (autoclean)
    iptable_nat 20568 0 (autoclean) (unused)
    ip_conntrack 26088 2 (autoclean) [ipt_state iptable_nat]
    iptable_filter 2316 1 (autoclean)
    ip_tables 14488 8 [ipt_REJECT ipt_limit ipt_LOG ipt_state iptable_nat iptable_filter]
    sg 34572 0 (autoclean)
    sr_mod 16856 0 (autoclean)
    ide-scsi 11120 0
    scsi_mod 103000 3 [sg sr_mod ide-scsi]
    ide-cd 33440 0
    cdrom 31040 0 [sr_mod ide-cd]
    ext3 64704 4
    jbd 47828 4 [ext3]

I'm not sure why the modules ide-scsi and scsi_mod are loaded, since I don't have any SCSI hardware. I only have 3 HDs and a CD-R.

cat /proc/scsi/scsi

    Attached devices:
    Host: scsi0 Channel: 00 Id: 00 Lun: 00
    Vendor: MITSUMI Model: CR-2801TE Rev: 1.10
    Type: CD-ROM ANSI SCSI revision: 02

That is the CD-R I have, and it is not SCSI, it is IDE?

cat /proc/cpuinfo

    processor : 0
    vendor_id : AuthenticAMD
    cpu family : 5
    model : 8
    model name : AMD-K6(tm) 3D processor
    stepping : 12
    cpu MHz : 451.017
    cache size : 64 KB
    fdiv_bug : no
    hlt_bug : no
    f00f_bug : no
    coma_bug : no
    fpu : yes
    fpu_exception : yes
    cpuid level : 1
    wp : yes
    flags : fpu vme de pse tsc msr mce cx8 pge mmx syscall 3dnow k6_mtrr
    bogomips : 901.12

cat /proc/pci

    PCI devices found:
    Bus 0, device 0, function 0:
    Host bridge: ALi Corporation. [ALi] M1541 (rev 4).
    Master Capable. Latency=64.
    Non-prefetchable 32 bit memory at 0xe5000000 [0xe5ffffff].
    Bus 0, device 1, function 0:
    PCI bridge: ALi Corporation. [ALi] M1541 PCI to AGP Controller (rev 4).
    Master Capable. Latency=64.
    Bus 0, device 3, function 0:
    Bridge: ALi Corporation. [ALi] M7101 PMU (rev 0).
    Bus 0, device 7, function 0:
    ISA bridge: ALi Corporation. [ALi] M1533 PCI to ISA Bridge [Aladdin IV] (rev 195).
    Bus 0, device 10, function 0:
    Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 116).
    IRQ 5.
    Master Capable. Latency=64. Min Gnt=10.Max Lat=10.
    I/O at 0xd800 [0xd87f].
    Non-prefetchable 32 bit memory at 0xe4000000 [0xe400007f].
    Bus 0, device 12, function 0:
    VGA compatible controller: Tseng Labs Inc ET6000 (rev 96).
    IRQ 11.
    Non-prefetchable 32 bit memory at 0xe3000000 [0xe3ffffff].
    I/O at 0xd400 [0xd4ff].
    Bus 0, device 15, function 0:
    IDE interface: ALi Corporation. [ALi] M5229 IDE (rev 193).
    Master Capable. Latency=32. Min Gnt=2.Max Lat=4.
    I/O at 0xd000 [0xd00f].

Since there is a possibility that this is related to the hard drives (fsck didn't find anything), here is the information about them.

hdparm /dev/hda

    multcount = 16 (on)
    IO_support = 0 (default 16-bit)
    unmaskirq = 0 (off)
    using_dma = 1 (on)
    keepsettings = 0 (off)
    readonly = 0 (off)
    readahead = 8 (on)
    geometry = 3737/255/63, sectors = 60036480, start = 0

hdparm /dev/hdb

    multcount = 16 (on)
    IO_support = 0 (default 16-bit)
    unmaskirq = 0 (off)
    using_dma = 1 (on)
    keepsettings = 0 (off)
    readonly = 0 (off)
    readahead = 8 (on)
    geometry = 15017/255/63, sectors = 241254720, start = 0

hdparm /dev/hdd

    multcount = 16 (on)
    IO_support = 0 (default 16-bit)
    unmaskirq = 0 (off)
    using_dma = 1 (on)
    keepsettings = 0 (off)
    readonly = 0 (off)
    readahead = 8 (on)
    geometry = 9964/255/63, sectors = 160086528, start = 0

I've made the other bug, 92013, depend on this one. The two look like different symptoms of the same underlying memory corruption rather than separate bugs.

This still looks like hardware to me. It could be an unclean power supply that can't quite cope under heavy disk load, a problem on the motherboard when doing DMA and heavy CPU memory access at the same time, or any number of things like that.

Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/
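
For anyone reading the oops in this report later, the register dump and the Code: line can be cross-checked against the faulting address. The sketch below is only an illustration, not part of the original analysis. It assumes the Code: bytes begin at the faulting instruction (which is how 2.4 kernels printed them) and that the kernel used the stock i386 3GB/1GB split with kernel addresses starting at 0xc0000000. The helper `decode_mov_load_disp8` is a hypothetical name and handles only the single instruction form seen here.

```python
# Register names in x86 ModRM encoding order (32-bit operand size).
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# Values copied from the oops in this report.
FAULT_ADDR = 0xA01CB0A9  # "paging request at virtual address a01cb0a9"
ESI = 0xA01CB089         # esi from the register dump
CODE = bytes.fromhex(
    "8b 46 20 85 c0 74 02 89 c7 85 ff 74 0b 8b 47 18 85 c0 0f 85"
)

# Assumption: stock i386 3GB/1GB split, kernel addresses start here.
PAGE_OFFSET = 0xC0000000


def decode_mov_load_disp8(code):
    """Decode 'mov r32, [r32 + disp8]' (opcode 0x8b, ModRM mod=01, no SIB).

    Hypothetical helper for this one encoding; not a general disassembler.
    Returns (destination register, base register, signed displacement).
    """
    opcode, modrm = code[0], code[1]
    if opcode != 0x8B:
        raise ValueError("not a 'mov r32, r/m32' instruction")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only the [reg + disp8] form without SIB is handled")
    disp = code[2] - 0x100 if code[2] >= 0x80 else code[2]
    return REGS[reg], REGS[rm], disp


dest, base, disp = decode_mov_load_disp8(CODE)
effective = (ESI + disp) & 0xFFFFFFFF

print(f"faulting instruction: mov {dest}, [{base}{disp:+#x}]")
print(f"effective address:    {effective:08x}")
print(f"matches fault addr:   {effective == FAULT_ADDR}")
print(f"esi below kernel map: {ESI < PAGE_OFFSET}")
```

Under those assumptions, the load goes through esi, and esi holds a value that is not a kernel address at all. That is consistent with a corrupted pointer in a structure reached from prune_dcache, but it cannot by itself tell bad hardware apart from a software bug.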