Bug 74434

Summary: Unable to handle NULL kernel pointer dereference at virtual address 00000008
Product: [Retired] Red Hat Linux
Reporter: Umaid Singh Rajpurohit <sunadm>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED NOTABUG
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 7.2
CC: the_end
Target Milestone: ---
Target Release: ---
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2002-09-24 09:21:54 UTC

Description Umaid Singh Rajpurohit 2002-09-24 09:21:47 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.78 [en] (X11; U; Linux 2.4.7-10 i686)

Description of problem:
The error is less frequent than on kernel 2.4.2-2smp, but it still occurs once
in a while. Occasionally the system stops responding to anything and needs a
hard reset. The current kernel is 2.4.9-13smp.

Version-Release number of selected component (if applicable):


How reproducible:
Couldn't Reproduce

Steps to Reproduce:
1. Unable to reproduce on demand, but it occurs quite frequently.

Additional info:

A. uname -a 
Linux linhost010 2.4.9-13smp #1 SMP Fri May 24 13:53:20 IST 2002 i686 unknown
B. Services (autofs,NIS)
C. Other software:

1. Rational ClearCase
#cleartool -ver
ClearCase version 4.2 (2001A.04.00) (Mon Jul 02 16:33:34 EDT 2001)
clearcase_p4.2-17 (Fri May 24 11:04:02 EDT 2002)
clearcase_p4.2-18 (Fri May 24 10:59:27 EDT 2002)
@(#) MVFS version 4.2+ (Sun Mar 17 14:38:59 EST 2002)
cleartool                         V4.2 (Mon Jun 25 11:57:13 EDT 2001)
db_server                         V4.2+ (Sun Mar 31 12:39:18 EST 2002)
VOB database schema version: 53

2. LSF (load sharing tool)

D. MESSAGES (/var/log/messages )

Sep 24 12:44:27 linhost010 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000008
Sep 24 12:44:27 linhost010 kernel:  printing eip:
Sep 24 12:44:27 linhost010 kernel: c01486a9
Sep 24 12:44:27 linhost010 kernel: *pde = 00000000
Sep 24 12:44:27 linhost010 kernel: Oops: 0000
Sep 24 12:44:27 linhost010 kernel: CPU:    1
Sep 24 12:44:27 linhost010 kernel: EIP:    0010:[path_walk+2089/2352]    Not tainted
Sep 24 12:44:27 linhost010 kernel: EIP:    0010:[<c01486a9>]    Not tainted
Sep 24 12:44:27 linhost010 kernel: EFLAGS: 00010246
Sep 24 12:44:27 linhost010 kernel: eax: 00000000   ebx: 00000000   ecx: c02dac38   edx: f4a4df9c
Sep 24 12:44:27 linhost010 kernel: esi: cfc969c0   edi: f4b253c0   ebp: f4a4df9c   esp: f4a4df28
Sep 24 12:44:27 linhost010 kernel: ds: 0018   es: 0018   ss: 0018
Sep 24 12:44:27 linhost010 kernel: Process fuser (pid: 9849, stackpage=f4a4d000)
Sep 24 12:44:27 linhost010 kernel: Stack: 00000009 00000000 cfc969c0 000041ed 0000001f 00000282 00000000 00000000
Sep 24 12:44:27 linhost010 kernel:        00001000 fffffff4 c3038000 c3038000 00000003 00241703 0804ae84 00000000
Sep 24 12:44:27 linhost010 kernel:        c3038000 f4a4df9c bfffd7a8 c0148d9a c3038000 f4a4df9c f4a4c000 00000295
Sep 24 12:44:27 linhost010 kernel: Call Trace: [__user_walk+58/96] __user_walk [kernel] 0x3a
Sep 24 12:44:27 linhost010 kernel: Call Trace: [<c0148d9a>] __user_walk [kernel] 0x3a
Sep 24 12:44:27 linhost010 kernel: [sys_stat64+19/112] sys_stat64 [kernel] 0x13
Sep 24 12:44:27 linhost010 kernel: [<c01454a3>] sys_stat64 [kernel] 0x13
Sep 24 12:44:27 linhost010 kernel: [system_call+51/56] system_call [kernel] 0x33
Sep 24 12:44:27 linhost010 kernel: [<c010719b>] system_call [kernel] 0x33
Sep 24 12:44:27 linhost010 kernel:
Sep 24 13:22:01 linhost010 syslogd 1.4.1: restart.
Sep 24 13:22:01 linhost010 syslog: syslogd startup succeeded
Sep 24 13:22:01 linhost010 kernel: klogd 1.4.1, log source = /proc/kmsg started.
There was no option but a hard reset. Please help me out.
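
For readers decoding the log above: each `[symbol+offset/size]` annotation gives the decimal offset into the function and the function's decimal size, so the function's start address is the raw address minus the offset (58 decimal matches the `0x3a` printed after `__user_walk`). The following is a minimal sketch of that arithmetic, not a replacement for ksymoops; the helper name `function_start` is hypothetical, and the entries are copied from the log.

```python
import re

# (klogd annotation, raw address) pairs copied from the oops above.
ENTRIES = [
    ("path_walk+2089/2352", 0xC01486A9),
    ("__user_walk+58/96", 0xC0148D9A),
    ("sys_stat64+19/112", 0xC01454A3),
    ("system_call+51/56", 0xC010719B),
]


def function_start(annotation, address):
    """Return (name, start address) for a 'symbol+offset/size' annotation.

    Hypothetical helper: offset and size are parsed as decimal, which is
    how they appear in this log.
    """
    match = re.fullmatch(r"(\w+)\+(\d+)/(\d+)", annotation)
    if match is None:
        raise ValueError(f"unrecognized annotation: {annotation!r}")
    name = match.group(1)
    offset, size = int(match.group(2)), int(match.group(3))
    if not 0 <= offset < size:
        raise ValueError(f"offset {offset} outside function of size {size}")
    return name, address - offset


for annotation, address in ENTRIES:
    name, start = function_start(annotation, address)
    print(f"{name:<12} starts at {start:08x}, faulting/return address {address:08x}")
```

The start addresses can then be compared against the System.map of the running kernel to confirm that the symbols were resolved against the right build.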

Comment 1 Arjan van de Ven 2002-09-24 09:24:58 UTC
Clearcase uses a binary-only kernel module and, as a result, is unsupported.
I also recommend that you upgrade to the latest erratum kernel we released for
7.2 (2.4.9-34 right now), since quite a few bugs have been fixed.


Comment 2 Need Real Name 2002-11-26 14:53:12 UTC
I have this same exact problem. 

Hardware: IBM x330, PIII 1.2 GHz
Red Hat 7.2
Rational Clearcase v4.2
LSF 4.10

Error message:

Unable to handle kernel NULL pointer dereference at virtual address 00000098
 printing eip:
f8d0c094
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<f8d0c094>]
EFLAGS: 00010286
eax: f01478b4   ebx: 00000000   ecx: 00000000   edx: f01478b4
esi: f01478b4   edi: f0147800   ebp: d5b39cb8   esp: d5b39ca0
ds: 0018   es: 0018   ss: 0018
Process vlib (pid: 2227, stackpage=d5b39000)
Stack: 00000000 e6e017a0 d5b39d2c c99a1be0 c013f830 ed62c5e0 d5b39d48 f8cec010
       00000000 e6e017a0 f01478b4 00000000 effa8da0 000000f0 d5b39cf8 f8d105ac
       00000000 f0147800 00000000 f8d0bb32 c99a1be0 00000000 00000000 00000000
Call Trace: [path_release+16/48] [<f8cec010>] [<f8d105ac>] [<f8d0bb32>] [<f8d264fc>]
Call Trace: [<c013f830>] [<f8cec010>] [<f8d105ac>] [<f8d0bb32>] [<f8d264fc>]
    [<f8d01ad4>] [<f8d25f84>] [<f8cec658>] [<f8cec7b1>] [<f8cf04b8>] [<f8ce6fae>]
    [<f8ce6f88>] [<f8cf02da>] [<f8d0b1c1>] [<f8d0b1b7>] [<f8ce6bed>] [<f8d08988>]
    [notify_change+94/288] [<f8d25fc8>] [do_truncate+107/160] [<f8d0a897>] [lookup_hash+106/144] [open_namei+1109/1456]
    [<c014a20e>] [<f8d25fc8>] [<c0133e0b>] [<f8d0a897>] [<c014063a>] [<c0140c75>]
    [<f8cce04d>] [filp_open+54/96] [sys_open+54/176] [system_call+51/56]
    [<f8cce04d>] [<c0134d16>] [<c0135006>] [<c0106f0b>]

Code: 83 bb 98 00 00 00 00 74 1a 8b 93 98 00 00 00 83 7a 34 00 74
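
The register dump and the Code: bytes above are consistent with each other, which a short calculation can show. The sketch below assumes the Code: dump begins at the faulting EIP (my understanding of 2.4-era i386 oops output, not something stated in this report) and decodes only the one instruction form seen here: opcode 0x83 /7 (`cmp r/m32, imm8`) with a 32-bit displacement and no SIB byte. The function name `decode_first_operand` is hypothetical.

```python
# Code bytes and registers copied from the oops above.
CODE = bytes.fromhex(
    "83 bb 98 00 00 00 00 74 1a 8b 93 98 00 00 00 83 7a 34 00 74"
)
REGS = {
    "eax": 0xF01478B4, "ebx": 0x00000000, "ecx": 0x00000000, "edx": 0xF01478B4,
    "esi": 0xF01478B4, "edi": 0xF0147800, "ebp": 0xD5B39CB8, "esp": 0xD5B39CA0,
}

# ModRM r/m field values 0..7 on 32-bit x86 (4 means "SIB byte follows").
REG_NAMES = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_first_operand(code, regs):
    """Return (base register, displacement, effective address).

    Handles only 0x83 /7 with mod=10 (disp32) and no SIB byte, which is
    the form at the start of this particular dump.
    """
    opcode, modrm = code[0], code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0x83 or reg != 7 or mod != 2 or rm == 4:
        raise ValueError("instruction form not covered by this sketch")
    disp = int.from_bytes(code[2:6], "little")
    base = REG_NAMES[rm]
    return base, disp, (regs[base] + disp) & 0xFFFFFFFF


base, disp, address = decode_first_operand(CODE, REGS)
print(f"cmpl $0x{CODE[6]:x}, 0x{disp:x}(%{base}) reads address {address:08x}")
```

Under that assumption the first instruction compares a field at offset 0x98 from ebx, and ebx is zero in the register dump, which yields the reported virtual address 00000098: a structure member read through a NULL pointer.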

I would like to add that ksymoops reports that this is indeed an issue with
the mvfs.o kernel module. Upgrading to the latest Red Hat errata kernel is not
an option for some sites, because the binary-only kernel module performs a
checksum check before loading. So unless they carefully craft their kernel, it
will not work.
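
As a rough cross-check that does not need ksymoops, the annotated and raw Call Trace lines can be paired up to see which frames klogd could name. This is a sketch only: the helper `split_trace` is hypothetical, only the line pairs that appear in both forms are used, and an unresolved address merely suggests code outside the core kernel symbol table, it does not identify the module.

```python
import re

# (annotated line, raw-address line) pairs copied from the trace above.
PAIRS = [
    ("[path_release+16/48] [<f8cec010>] [<f8d105ac>] [<f8d0bb32>] [<f8d264fc>]",
     "[<c013f830>] [<f8cec010>] [<f8d105ac>] [<f8d0bb32>] [<f8d264fc>]"),
    ("[notify_change+94/288] [<f8d25fc8>] [do_truncate+107/160] [<f8d0a897>] "
     "[lookup_hash+106/144] [open_namei+1109/1456]",
     "[<c014a20e>] [<f8d25fc8>] [<c0133e0b>] [<f8d0a897>] [<c014063a>] [<c0140c75>]"),
    ("[<f8cce04d>] [filp_open+54/96] [sys_open+54/176] [system_call+51/56]",
     "[<f8cce04d>] [<c0134d16>] [<c0135006>] [<c0106f0b>]"),
]

TOKEN = re.compile(r"\[([^\]]+)\]")


def split_trace(pairs):
    """Return ({address: annotation}, [unresolved addresses])."""
    resolved, unresolved = {}, []
    for annotated, raw in pairs:
        names, addrs = TOKEN.findall(annotated), TOKEN.findall(raw)
        if len(names) != len(addrs):
            raise ValueError("annotated and raw lines do not line up")
        for name, addr in zip(names, addrs):
            value = int(addr.strip("<>"), 16)
            if name == addr:
                unresolved.append(value)  # klogd printed the bare address
            else:
                resolved[value] = name
    return resolved, unresolved


resolved, unresolved = split_trace(PAIRS)
for value, name in sorted(resolved.items()):
    print(f"{value:08x}  {name}")
print("unresolved:", " ".join(f"{value:08x}" for value in unresolved))
```

In these lines every named frame sits at a c01xxxxx address and every unnamed frame at an f8xxxxxx address, including the faulting EIP f8d0c094.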

There have been reports of custom kernels being built and used with the mvfs.o
module (after rebuilding it). Specifically, one of our remote sites has these
combinations working:

2.4.9-13 with Clearcase Patch Clearcase_p4.2-17 and Clearcase_p4.2-18
2.4.9-31 with Clearcase Patch Clearcase_p4.2-17 and Clearcase_p4.2-18

These kernel/mvfs module combinations work and the bug does not seem to be
present in them.

regards,
Ladd Hebert
mailto: lhebert.ti.com

Comment 3 Arjan van de Ven 2002-11-26 14:57:27 UTC
the_end: your problem is that you are running an unsupported configuration.
Configurations that use binary-only kernel modules are not supported for several
reasons, one of which is that you can't move to a fixed kernel.