Bug 90990

Summary:	Unable to handle kernel NULL pointer dereference/paging request
Product:	[Retired] Red Hat Linux	Reporter:	Toni Parviainen <tonitop>
Component:	kernel	Assignee:	Arjan van de Ven <arjanv>
Status:	CLOSED WONTFIX	QA Contact:	Brian Brock <bbrock>
Severity:	high	Docs Contact:
Priority:	medium
Version:	9	CC:	sct
Hardware:	i386
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2004-09-30 15:40:56 UTC	Type:	---
Bug Blocks:	92013

Description Toni Parviainen 2003-05-16 06:56:52 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.2)
Gecko/20030208 Netscape/7.02

Description of problem:
uname -a
Linux pullasorsa 2.4.20-9 #1 Wed Apr 2 13:15:01 EST 2003 i586 i586 i386 GNU/Linux

Usually during heavy disk access, I get one of the following
messages in /var/log/messages:

Unable to handle kernel NULL pointer dereference at virtual address 0000002b
printing eip:
c0153c6a
*pde = 00000000
Oops: 0000
autofs 3c59x ipt_REJECT ipt_limit ipt_LOG ipt_state iptable_nat ip_conntrack
iptable_filter
ip_tables sg sr_mod ide-scsi scsi_mod ide-cd cdrom ext3 jbd
CPU:    0
EIP:    0060:[<c0153c6a>]    Not tainted
EFLAGS: 00010217

EIP is at find_inode [kernel] 0x1a (2.4.20-9)
eax: 00000000   ebx: 00000003   ecx: 00003fff   edx: 00000000
esi: 00000003   edi: c1afc888   ebp: 003068fb   esp: c87c7e6c
ds: 0068   es: 0068   ss: 0068
Process updatedb (pid: 10716, stackpage=c87c7000)
Stack: 003068fb c1afc888 003068fb c1a6e400 c0153f83 c1a6e400 003068fb c1afc888
00000000 00000000 003068fb c5e981a0 ceeee2a0 c5e981a0 d0820f88 c1a6e400
003068fb 00000000 00000000 cbc34230 fffffff4 ceeee30c c014990a ceeee2a0
Call Trace:   [<c0153f83>] iget4 [kernel] 0x43 (0xc87c7e7c))
[<d0820f88>] ext3_lookup [ext3] 0x58 (0xc87c7ea4))
[<c014990a>] real_lookup [kernel] 0x9a (0xc87c7ec4))
[<c0149de8>] link_path_walk [kernel] 0x3c8 (0xc87c7ee0))
[<c014a251>] path_lookup [kernel] 0x21 (0xc87c7f20))
[<c014a47a>] __user_walk [kernel] 0x2a (0xc87c7f30))
[<c0146537>] vfs_lstat [kernel] 0x17 (0xc87c7f44))
[<c0146a81>] sys_lstat64 [kernel] 0x11 (0xc87c7f70))
[<c0120001>] proc_dostring [kernel] 0x41 (0xc87c7f84))
[<c0109103>] system_call [kernel] 0x33 (0xc87c7fc0))

Code: 39 6b 28 75 f1 8b 44 24 14 39 83 94 00 00 00 75 e5 8b 44 24
---------------------------------------------------------------------------------------
Unable to handle kernel paging request at virtual address 240489ff
printing eip:
c01523d1
*pde = 00000000
Oops: 0000
autofs 3c59x ipt_REJECT ipt_limit ipt_LOG ipt_state iptable_nat ip_conntrack
iptable_filter ip_tables sg sr_mod ide-scsi scsi_mod ide-cd cdrom ext3 jbd
CPU:    0
EIP:    0060:[<c01523d1>]    Not tainted
EFLAGS: 00010203

EIP is at d_lookup [kernel] 0x61 (2.4.20-9)
eax: cff92a58   ebx: 240489ff   ecx: 0000000f   edx: cff80000
esi: 240489ef   edi: 00000000   ebp: b1da24f6   esp: c3fe9eac
ds: 0068   es: 0068   ss: 0068
Process updatedb (pid: 26777, stackpage=c3fe9000)
Stack: c1ac2bb8 240489ef cff92a58 c1a73000 00000006 c3fe9f00 c1a73006 00000000
c3fe9f48 c0149820 c0c4b2a0 c3fe9f00 c1a73000 c0149dba c0c4b2a0 c3fe9f00
00000000 00000008 00000000 c0fdb7c0 00000000 c1a73000 00000006 b1da24f6
Call Trace:   [<c0149820>] cached_lookup [kernel] 0x10 (0xc3fe9ed0))
[<c0149dba>] link_path_walk [kernel] 0x39a (0xc3fe9ee0))
[<c014a251>] path_lookup [kernel] 0x21 (0xc3fe9f20))
[<c014a47a>] __user_walk [kernel] 0x2a (0xc3fe9f30))
[<c0146537>] vfs_lstat [kernel] 0x17 (0xc3fe9f44))
[<c012d726>] do_brk [kernel] 0xf6 (0xc3fe9f50))
[<c0146a81>] sys_lstat64 [kernel] 0x11 (0xc3fe9f70))
[<c012c749>] sys_brk [kernel] 0xd9 (0xc3fe9f9c))
[<c01155d0>] do_page_fault [kernel] 0x0 (0xc3fe9fb0))
[<c0109214>] error_code [kernel] 0x34 (0xc3fe9fb8))
[<c0109103>] system_call [kernel] 0x33 (0xc3fe9fc0))

Code: 8b 1b 39 6e 44 75 e8 8b 7c 24 28 39 7e 0c 75 df 8b 47 4c 85
---------------------------------------------------------------------------------------
Unable to handle kernel paging request at virtual address 240489ff
printing eip:
c01523d1
*pde = 00000000
Oops: 0000
autofs 3c59x ipt_REJECT ipt_limit ipt_LOG ipt_state iptable_nat ip_conntrack
iptable_filter ip_tables sg sr_mod ide-scsi scsi_mod ide-cd cdrom ext3 jbd
CPU:    0
EIP:    0060:[<c01523d1>]    Not tainted
EFLAGS: 00010203

EIP is at d_lookup [kernel] 0x61 (2.4.20-9)
eax: cff92a58   ebx: 240489ff   ecx: 0000000f   edx: cff80000
esi: 240489ef   edi: 00000000   ebp: e1bd7c66   esp: c87fdeac
ds: 0068   es: 0068   ss: 0068
Process smbd (pid: 25756, stackpage=c87fd000)
Stack: cf988006 240489ef cff92a58 cf98801d 00000013 c87fdf00 cf988030 00000000
c87fdf48 c0149820 c04db9c0 c87fdf00 cf98801d c0149dba c04db9c0 c87fdf00
00000000 00000009 00000000 c04ed860 00000000 cf98801d 00000013 e1bd7c66
Call Trace:   [<c0149820>] cached_lookup [kernel] 0x10 (0xc87fded0))
[<c0149dba>] link_path_walk [kernel] 0x39a (0xc87fdee0))
[<c014a251>] path_lookup [kernel] 0x21 (0xc87fdf20))
[<c014a47a>] __user_walk [kernel] 0x2a (0xc87fdf30))
[<c01464e7>] vfs_stat [kernel] 0x17 (0xc87fdf44))
[<c0146a51>] sys_stat64 [kernel] 0x11 (0xc87fdf70))
[<c013564b>] activate_page_nolock [kernel] 0x18b (0xc87fdf90))
[<c013f233>] sys_close [kernel] 0x43 (0xc87fdfb0))
[<c0109f29>] math_state_restore [kernel] 0x19 (0xc87fdfb8))
[<c0109103>] system_call [kernel] 0x33 (0xc87fdfc0))
Code: 8b 1b 39 6e 44 75 e8 8b 7c 24 28 39 7e 0c 75 df 8b 47 4c 85
---------------------------------------------------------------------------------------

The process is usually updatedb, but it can also be Samba (smbd) or
my backup script, so this usually happens when there is heavy
disk access.
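
For reference when reading the traces above: the "Oops: 0000" value is the i386 page-fault error code. The sketch below decodes its three low bits (present/protection, read/write, kernel/user). The function name decode_oops_code is made up for illustration and is not part of any kernel tool.

```python
def decode_oops_code(code_hex):
    """Decode the low three bits of an i386 page-fault error code."""
    code = int(code_hex, 16)
    return {
        # bit 0: 0 = page not present, 1 = protection fault
        "cause": "protection fault" if code & 1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 2 else "read",
        # bit 2: 0 = fault raised in kernel mode, 1 = in user mode
        "mode": "user" if code & 4 else "kernel",
    }


# The value reported by every oops in this report.
print(decode_oops_code("0000"))
```

Read this way, each oops here is a kernel-mode read of an address with no page mapped, which fits the kernel following a bad pointer.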

When the process is updatedb, I get the following mail from
the Cron Daemon:

-----------------------------------
/etc/cron.daily/slocate.cron: line 3: 22773 Segmentation fault     
/usr/bin/updatedb -f "nfs,smbfs,ncpfs,proc,devpts"
-e "/tmp,/var/tmp,/usr/tmp,/afs,/net"
-----------------------------------
/etc/cron.daily/slocate.cron: line 3: 10716 Segmentation fault     
/usr/bin/updatedb -f "nfs,smbfs,ncpfs,proc,devpts"
-e "/tmp,/var/tmp,/usr/tmp,/afs,/net"
-----------------------------------
/etc/cron.daily/slocate.cron: line 3: 26777 Segmentation fault     
/usr/bin/updatedb -f "nfs,smbfs,ncpfs,proc,devpts"
-e "/tmp,/var/tmp,/usr/tmp,/afs,/net"
-------------------------------------
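
The PIDs in these cron mails can be matched against the "Process ... (pid: N, ...)" lines of the oops output to see which segfaults have a logged oops. The following is a minimal sketch of that cross-check; the helper names and the two sample strings (copied from the lines quoted in this report) are illustrative only.

```python
import re

# "Process updatedb (pid: 10716, stackpage=c87c7000)"
OOPS_PROCESS = re.compile(r"Process (\S+) \(pid: (\d+),")
# "/etc/cron.daily/slocate.cron: line 3: 22773 Segmentation fault"
CRON_SEGV = re.compile(r"line \d+: (\d+) Segmentation fault")


def oops_pids(log_text):
    """Map pid -> process name for every 'Process ...' line in oops output."""
    return {int(pid): name for name, pid in OOPS_PROCESS.findall(log_text)}


def segfault_pids(mail_text):
    """List the pids that cron reported as killed by a segmentation fault."""
    return [int(pid) for pid in CRON_SEGV.findall(mail_text)]


def unmatched(log_text, mail_text):
    """Return segfault pids that have no corresponding oops in the log text."""
    known = oops_pids(log_text)
    return [pid for pid in segfault_pids(mail_text) if pid not in known]


log_sample = """Process updatedb (pid: 10716, stackpage=c87c7000)
Process updatedb (pid: 26777, stackpage=c3fe9000)
Process smbd (pid: 25756, stackpage=c87fd000)"""

mail_sample = """/etc/cron.daily/slocate.cron: line 3: 22773 Segmentation fault
/etc/cron.daily/slocate.cron: line 3: 10716 Segmentation fault
/etc/cron.daily/slocate.cron: line 3: 26777 Segmentation fault"""

print(unmatched(log_sample, mail_sample))  # [22773]
```

With the excerpts quoted here, PIDs 10716 and 26777 match the two updatedb oopses, while 22773 has no oops included in this report.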


Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
Happens randomly, usually when there is a lot of disk access.
The server might run fine for a few weeks, but eventually this happens.
    

Additional info:

Comment 1 Stephen Tweedie 2003-05-16 09:39:21 UTC

Looks very much like hardware memory corruption.  The places you're hitting the
oopses are locations where the kernel is walking long lists of data structures,
and these are exactly the locations where you would expect to see random oopses
if you've got bad memory.  memtest86 is the advised next step.

http://www.memtest86.com/
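
One way to check the "walking a list with a bad pointer" reading against the first two oopses is to decode the first instruction on the "Code:" line and compare its memory operand with the reported faulting address. The sketch below assumes that the 2.4 "Code:" dump starts at the faulting EIP and handles only the simple [reg] and [reg+disp8] ModRM forms; effective_address is a hypothetical helper, not an existing tool.

```python
# x86 32-bit register numbering as used in the ModRM r/m field.
REGISTERS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def effective_address(code_hex, regs):
    """Compute the memory operand address of a one-byte-opcode instruction.

    Assumes byte 0 is the opcode and byte 1 is ModRM (true for 0x39 and
    0x8b, the opcodes seen here). Only mod=00 ([reg]) and mod=01
    ([reg+disp8]) without a SIB byte are handled.
    """
    code = bytes.fromhex(code_hex)
    modrm = code[1]
    mod, rm = modrm >> 6, modrm & 0b111
    if rm == 4 or mod > 1 or (mod == 0 and rm == 5):
        raise ValueError("addressing form not handled by this sketch")
    disp = 0
    if mod == 1:
        # disp8 is a signed byte
        disp = code[2] - 0x100 if code[2] >= 0x80 else code[2]
    return (regs[REGISTERS[rm]] + disp) & 0xFFFFFFFF


# find_inode oops: "Code: 39 6b 28 ...", ebx: 00000003
print(hex(effective_address("39 6b 28", {"ebx": 0x00000003})))  # 0x2b
# d_lookup oops: "Code: 8b 1b ...", ebx: 240489ff
print(hex(effective_address("8b 1b", {"ebx": 0x240489FF})))  # 0x240489ff
```

Both results equal the addresses in the corresponding "Unable to handle kernel ..." lines (0000002b and 240489ff), so in each case the faulting access went through the pointer held in ebx, which already contained a value that is not a valid kernel address.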

Comment 2 Toni Parviainen 2003-05-31 07:43:47 UTC

I ran memtest86 for about 48 hours and it passed all the tests.
I have now been up and running for about 11 days without any of these
messages. However, I have now hit another bug which might be related to
this one. I reported it as bug 92013.

Comment 3 Toni Parviainen 2003-06-04 06:26:55 UTC

Although memtest86 didn't find anything, I changed the memory module (256
-> 128MB). I also added one fan just in case. Then I removed the swap partition
and recreated it (mkswap -c ... didn't find anything).

Today I got another kernel oops:

kernel: Unable to handle kernel paging request at virtual address a01cb0a9
kernel:  printing eip:
kernel: c0154248
kernel: *pde = 00000000
kernel: Oops: 0000
kernel: autofs 3c59x ipt_REJECT ipt_limit ipt_LOG ipt_state iptable_nat ip_conntrack iptable_filter ip_tables sg sr_mod ide-scsi scsi_mod ide-cd cdrom ext3 jbd
kernel: CPU:    0
kernel: EIP:    0060:[<c0154248>]    Not tainted
kernel: EFLAGS: 00010282
kernel:
kernel: EIP is at iput [kernel] 0x28 (2.4.20-13.9)
kernel: eax: 00000000   ebx: c57929c0   ecx: c57929d0   edx: c57929d0
kernel: esi: a01cb089   edi: 00000000   ebp: 00000063   esp: c7feff94
kernel: ds: 0068   es: 0068   ss: 0068
kernel: Process kswapd (pid: 5, stackpage=c7fef000)
kernel: Stack: c4b51ad8 c4b51ac0 c57929c0 c0151f40 c57929c0 c7fee000 00000000 000001d0
kernel:        00000000 c01522c5 00000286 c0137480 00000006 000001d0 c7fee000 00000000
kernel:        00000002 00000000 c0137726 000001d0 c01376b0 00000000 00000000 c01072ad
kernel: Call Trace:   [<c0151f40>] prune_dcache [kernel] 0xc0 (0xc7feffa0))
kernel: [<c01522c5>] shrink_dcache_memory [kernel] 0x25 (0xc7feffb8))
kernel: [<c0137480>] do_try_to_free_pages_kswapd [kernel] 0x10 (0xc7feffc0))
kernel: [<c0137726>] kswapd [kernel] 0x76 (0xc7feffdc))
kernel: [<c01376b0>] kswapd [kernel] 0x0 (0xc7feffe4))
kernel: [<c01072ad>] kernel_thread_helper [kernel] 0x5 (0xc7fefff0))
kernel:
kernel:
kernel: Code: 8b 46 20 85 c0 74 02 89 c7 85 ff 74 0b 8b 47 18 85 c0 0f 85

lsmod displays:
Module                  Size  Used by    Not tainted
autofs                 12148   0  (autoclean) (unused)
3c59x                  29392   1
ipt_REJECT              3736   1  (autoclean)
ipt_limit               1496   2  (autoclean)
ipt_LOG                 4120   4  (autoclean)
ipt_state               1048   5  (autoclean)
iptable_nat            20568   0  (autoclean) (unused)
ip_conntrack           26088   2  (autoclean) [ipt_state iptable_nat]
iptable_filter          2316   1  (autoclean)
ip_tables              14488   8  [ipt_REJECT ipt_limit ipt_LOG ipt_state iptable_nat iptable_filter]
sg                     34572   0  (autoclean)
sr_mod                 16856   0  (autoclean)
ide-scsi               11120   0
scsi_mod              103000   3  [sg sr_mod ide-scsi]
ide-cd                 33440   0
cdrom                  31040   0  [sr_mod ide-cd]
ext3                   64704   4
jbd                    47828   4  [ext3]

I'm not sure why the modules ide-scsi and scsi_mod are loaded, since I don't
have any SCSI hardware. I only have 3 HDs and a CD-R.

cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: MITSUMI  Model: CR-2801TE        Rev: 1.10
  Type:   CD-ROM                           ANSI SCSI revision: 02

That is the CD-R I have, but it is an IDE drive, not a SCSI one.

cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 5
model           : 8
model name      : AMD-K6(tm) 3D processor
stepping        : 12
cpu MHz         : 451.017
cache size      : 64 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr mce cx8 pge mmx syscall 3dnow k6_mtrr
bogomips        : 901.12

cat /proc/pci
PCI devices found:
  Bus  0, device   0, function  0:
    Host bridge: ALi Corporation. [ALi] M1541 (rev 4).
      Master Capable.  Latency=64.
      Non-prefetchable 32 bit memory at 0xe5000000 [0xe5ffffff].
  Bus  0, device   1, function  0:
    PCI bridge: ALi Corporation. [ALi] M1541 PCI to AGP Controller (rev 4).
      Master Capable.  Latency=64.
  Bus  0, device   3, function  0:
    Bridge: ALi Corporation. [ALi] M7101 PMU (rev 0).
  Bus  0, device   7, function  0:
    ISA bridge: ALi Corporation. [ALi] M1533 PCI to ISA Bridge [Aladdin IV] (rev 195).
  Bus  0, device  10, function  0:
    Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 116).
      IRQ 5.
      Master Capable.  Latency=64.  Min Gnt=10.Max Lat=10.
      I/O at 0xd800 [0xd87f].
      Non-prefetchable 32 bit memory at 0xe4000000 [0xe400007f].
  Bus  0, device  12, function  0:
    VGA compatible controller: Tseng Labs Inc ET6000 (rev 96).
      IRQ 11.
      Non-prefetchable 32 bit memory at 0xe3000000 [0xe3ffffff].
      I/O at 0xd400 [0xd4ff].
  Bus  0, device  15, function  0:
    IDE interface: ALi Corporation. [ALi] M5229 IDE (rev 193).
      Master Capable.  Latency=32.  Min Gnt=2.Max Lat=4.
      I/O at 0xd000 [0xd00f].

Comment 4 Toni Parviainen 2003-06-04 07:17:38 UTC

Since there is a possibility that this is related to the hard drives
(fsck didn't find anything), here is the information about them.

hdparm /dev/hda
 multcount    = 16 (on)
 IO_support   =  0 (default 16-bit)
 unmaskirq    =  0 (off)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    =  8 (on)
 geometry     = 3737/255/63, sectors = 60036480, start = 0

hdparm /dev/hdb
 multcount    = 16 (on)
 IO_support   =  0 (default 16-bit)
 unmaskirq    =  0 (off)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    =  8 (on)
 geometry     = 15017/255/63, sectors = 241254720, start = 0

hdparm /dev/hdd
 multcount    = 16 (on)
 IO_support   =  0 (default 16-bit)
 unmaskirq    =  0 (off)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    =  8 (on)
 geometry     = 9964/255/63, sectors = 160086528, start = 0

Comment 5 Stephen Tweedie 2003-06-04 08:59:41 UTC

I've made the other bug, 92013, depend on this one --- both are just different
symptoms of the same underlying memory corruption, not separate bugs.

This still looks like hardware to me.  It could be an unclean power supply that
can't quite cope under heavy disk load, a problem on the motherboard when doing
DMA and heavy CPU memory access at the same time, or any number of things like that.

Comment 6 Bugzilla owner 2004-09-30 15:40:56 UTC

Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/