Bug 110245

Summary: Kernel panic: ext3 Unable to handle kernel NULL pointer dereference
Product: Red Hat Enterprise Linux 2.1 Reporter: Kambiz Aghaiepour <kambiz>
Component: kernel Assignee: Stephen Tweedie <sct>
Status: CLOSED WORKSFORME QA Contact: Brian Brock <bbrock>
Severity: high Docs Contact:
Priority: medium    
Version: 2.1 CC: petrides, riel
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2004-01-08 22:01:04 UTC Type: ---

Description Kambiz Aghaiepour 2003-11-17 15:09:01 UTC

Description of problem:
Any ideas?

<Nov/14 05:40 am>Red Hat Linux Advanced Server release 2.1AS/i686 (Pensacola)
<Nov/15 04:30 am>ccmdb1.redhat.com login: Unable to handle kernel NULL pointer dereference at virtual address 00000043
<Nov/15 04:30 am>*pde = 00000000
<Nov/15 04:30 am>Oops: 0002
<Nov/15 04:30 am>Kernel 2.4.9-e.27smp
<Nov/15 04:30 am>CPU:    1
<Nov/15 04:30 am>EIP:    0010:[<e1107e63>]    Not tainted
<Nov/15 04:30 am>EFLAGS: 00010286
<Nov/15 04:30 am>EIP is at ___strtok_Rsmp_29805c13 [] 0x20cf824f
<Nov/15 04:30 am>eax: 00000043   ebx: cc484320   ecx: c15bdf8f   edx: f02ca3c0
<Nov/15 04:30 am>esi: 00000001   edi: c15bdf90   ebp: 00000000   esp: f02e7e08
<Nov/15 04:30 am>ds: 0018   es: 0018   ss: 0018
<Nov/15 04:30 am>Process oracle (pid: 3651, stackpage=f02e7000)
<Nov/15 04:30 am>Stack: 00000043 f8870fe2 f887c68f 00000352 f0daef00 00000001 f0daef00 d59e0000
<Nov/15 04:30 am>       f02e7e38 00001000 0000001e f0daef00 00000070 00000014 f8867f30 f8864cd7
<Nov/15 04:30 am>       00000014 00000070 c15bdf90 f02ca3c0 cc486820 c15bdf90 c0149832 f02ca3c0
<Nov/15 04:30 am>Call Trace: [<f8870fe2>] ext3_getblk [ext3] 0x52
<Nov/15 04:30 am>[<f887c68f>] .LC9 [ext3] 0xaf
<Nov/15 04:30 am>[<f8867f30>] .LC13 [jbd] 0x0
<Nov/15 04:30 am>[<f8864cd7>] __jbd_kmalloc [jbd] 0x27
<Nov/15 04:30 am>[<c0149832>] block_prepare_write [kernel] 0x22
<Nov/15 04:30 am>[<f8870f20>] ext3_get_block [ext3] 0x0
<Nov/15 04:30 am>[<f88714b1>] ext3_prepare_write [ext3] 0xb1
<Nov/15 04:30 am>[<f8870f20>] ext3_get_block [ext3] 0x0
<Nov/15 04:30 am>[<c01309bb>] add_to_page_cache_unique [kernel] 0xcb
<Nov/15 04:30 am>[<c0133cc4>] generic_file_write [kernel] 0x434
<Nov/15 04:30 am>[<c0131b77>] generic_file_new_read [kernel] 0x67
<Nov/15 04:30 am>[<c01319f0>] file_read_actor [kernel] 0x0
<Nov/15 04:30 am>[<f886ecd2>] ext3_file_write [ext3] 0x22
<Nov/15 04:30 am>[<c014635a>] sys_pwrite [kernel] 0xba
<Nov/15 04:30 am>[<c010c990>] sys_ipc [kernel] 0x40
<Nov/15 04:30 am>[<c012048b>] sys_gettimeofday [kernel] 0x1b
<Nov/15 04:30 am>[<c01073c3>] system_call [kernel] 0x33


<Nov/15 04:30 am>Code: 00 00 00 00 00 05 09 a4 81 01 00 00 00 fb 00 00 00 fa 00 00
<Nov/15 04:30 am> <0>Kernel panic: not continuing


Version-Release number of selected component (if applicable):
kernel-smp-2.4.9-e.27

How reproducible:
Didn't try

Steps to Reproduce:
1. This is a production system.
2. Run it for a while, and sometimes it crashes.
    

Additional info:

Comment 1 Stephen Tweedie 2004-01-06 18:36:31 UTC
"sometimes it crashes."  That implies multiple crashes.  Could we see
some of the other oops messages? 

This oops looks like we've got a return address on the stack that
points at a "ud2a" undefined instruction, i.e. the middle of a BUG()
call.  That would imply that we should see other diagnostics about the
bug just prior to the oops; please include those.  But EIP is
pointing to totally bogus code: the 00 00 00 00 00 garbage in the
Code: line is interpreted as "add    %al,(%eax)", hence the oops
dereferencing %eax, which holds the faulting address 00000043.
That implies that the real problem is elsewhere, possibly hardware.
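
As a rough illustration of the reasoning above, the following Python sketch
decodes the oops error code and the first two Code: bytes from the report.
The helper names decode_page_fault_error and decode_add_rm8_r8 are
hypothetical, made up for this example. The bit meanings are the standard
i386 page-fault error code bits, and the decoder only handles opcode 0x00
with plain register-indirect addressing, which is all this oops needs.

```python
# Hypothetical helpers for illustration only; not part of any kernel tooling.

REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_page_fault_error(code: int) -> dict:
    """Decode the low three bits of an i386 page-fault error code."""
    return {
        "cause": "protection violation" if code & 0x1 else "page not present",
        "access": "write" if code & 0x2 else "read",
        "mode": "user" if code & 0x4 else "kernel",
    }


def decode_add_rm8_r8(code_bytes: bytes) -> str:
    """Decode opcode 0x00 (ADD r/m8, r8) into AT&T syntax.

    Only mod=00 with a plain base register is handled; SIB and
    displacement forms are deliberately left out of this sketch.
    """
    opcode, modrm = code_bytes[0], code_bytes[1]
    if opcode != 0x00:
        raise ValueError("not opcode 0x00")
    mod = modrm >> 6
    reg = (modrm >> 3) & 0x7
    rm = modrm & 0x7
    if mod != 0 or rm in (4, 5):
        raise ValueError("only plain (reg) addressing is handled")
    return f"add %{REG8[reg]},(%{REG32[rm]})"


# Values copied from the oops in the description.
oops_error_code = 0x0002
fault_address = 0x00000043
eax = 0x00000043
code_line = bytes.fromhex("0000000000")

fault = decode_page_fault_error(oops_error_code)
instruction = decode_add_rm8_r8(code_line)

print(fault)
print(instruction)
print("fault address equals eax:", fault_address == eax)
```

With the values from the oops, this reports a kernel-mode write to a page
that is not present, and the decoded instruction writes through %eax, which
matches the faulting address 00000043.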

More info will be needed to take this further.

Comment 2 Kambiz Aghaiepour 2004-01-06 19:40:37 UTC
After what seems like endless problems with this server, including
filesystem corruption and rpm database corruption (disk blocks shared
with other files on the filesystem!!!), the hardware has been replaced,
and the system reinstalled and patched to the latest kernel (hence now
running e.34smp).  Feel free to close this bugzilla.  If we run into
future issues we'll open a new ticket.  (The old hardware passes all
hardware diagnostics, both Dell's and memtest86.)

Thanks
Kambiz

Comment 3 Stephen Tweedie 2004-01-08 22:01:04 UTC
OK, thanks for letting us know!