From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030922

Description of problem:
Any ideas?

<Nov/14 05:40 am>Red Hat Linux Advanced Server release 2.1AS/i686 (Pensacola)
<Nov/15 04:30 am>ccmdb1.redhat.com login: Unable to handle kernel NULL pointer dereference at virtual address 00000043
<Nov/15 04:30 am>*pde = 00000000
<Nov/15 04:30 am>Oops: 0002
<Nov/15 04:30 am>Kernel 2.4.9-e.27smp
<Nov/15 04:30 am>
<Nov/15 04:30 am>CPU: 1
<Nov/15 04:30 am>EIP: 0010:[<e1107e63>] Not tainted
<Nov/15 04:30 am>EFLAGS: 00010286
<Nov/15 04:30 am>EIP is at ___strtok_Rsmp_29805c13 [] 0x20cf824f
<Nov/15 04:30 am>eax: 00000043 ebx: cc484320 ecx: c15bdf8f edx: f02ca3c0
<Nov/15 04:30 am>
<Nov/15 04:30 am>esi: 00000001 edi: c15bdf90 ebp: 00000000 esp: f02e7e08
<Nov/15 04:30 am>ds: 0018 es: 0018 ss: 0018
<Nov/15 04:30 am>Process oracle (pid: 3651, stackpage=f02e7000)
<Nov/15 04:30 am>Stack: 00000043 f8870fe2 f887c68f 00000352 f0daef00 00000001 f0daef00 d59e0000
<Nov/15 04:30 am> f02e7e38 00001000 0000001e f0daef00 00000070 00000014 f8867f30 f8864cd7
<Nov/15 04:30 am> 00000014 00000070 c15bdf90 f02ca3c0 cc486820 c15bdf90 c0149832 f02ca3c0
<Nov/15 04:30 am>Call Trace: [<f8870fe2>] ext3_getblk [ext3] 0x52
<Nov/15 04:30 am>[<f887c68f>] .LC9 [ext3] 0xaf
<Nov/15 04:30 am>[<f8867f30>] .LC13 [jbd] 0x0
<Nov/15 04:30 am>[<f8864cd7>] __jbd_kmalloc [jbd] 0x27
<Nov/15 04:30 am>[<c0149832>] block_prepare_write [kernel] 0x22
<Nov/15 04:30 am>[<f8870f20>] ext3_get_block [ext3] 0x0
<Nov/15 04:30 am>[<f88714b1>] ext3_prepare_write [ext3] 0xb1
<Nov/15 04:30 am>[<f8870f20>] ext3_get_block [ext3] 0x0
<Nov/15 04:30 am>[<c01309bb>] add_to_page_cache_unique [kernel] 0xcb
<Nov/15 04:30 am>[<c0133cc4>] generic_file_write [kernel] 0x434
<Nov/15 04:30 am>[<c0131b77>] generic_file_new_read [kernel] 0x67
<Nov/15 04:30 am>[<c01319f0>] file_read_actor [kernel] 0x0
<Nov/15 04:30 am>[<f886ecd2>] ext3_file_write [ext3] 0x22
<Nov/15 04:30 am>[<c014635a>] sys_pwrite [kernel] 0xba
<Nov/15 04:30 am>[<c010c990>] sys_ipc [kernel] 0x40
<Nov/15 04:30 am>[<c012048b>] sys_gettimeofday [kernel] 0x1b
<Nov/15 04:30 am>[<c01073c3>] system_call [kernel] 0x33
<Nov/15 04:30 am>Code: 00 00 00 00 00 05 09 a4 81 01 00 00 00 fb 00 00 00 fa 00 00
<Nov/15 04:30 am> <0>Kernel panic: not continuing

Version-Release number of selected component (if applicable):
kernel-smp-2.4.9-e.27

How reproducible:
Didn't try

Steps to Reproduce:
1. This is a production system.
2. Run it for a while, and sometimes it crashes.

Additional info:
"sometimes it crashes." That implies multiple crashes. Could we see some of the other oops messages? This oops looks like we've got a return address on the stack that points at a "ud2a" undefined instruction, i.e. the middle of a BUG() call. That would imply that we should see other diagnostics about the bug just prior to the oops; please include those. But EIP is pointing at totally bogus code: the "00 00 00 00 00" garbage on the Code: line is interpreted as "add %al,(%eax)", hence the oops dereferencing %eax. That implies that the real problem is elsewhere, possibly hardware. More info will be needed to take this further.
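For anyone wanting to check that reading of the Code: line, here is a minimal sketch in Python. It is not a real disassembler: it only decodes the simplest register-indirect form of x86 opcode 0x00 (ADD r/m8, r8), and the function names decode_add_rm8_r8 and decode_page_fault_error are made up for this illustration. The byte values, the %eax value, the fault address and the "Oops: 0002" code are copied from the oops in the report.

```python
# 8-bit and 32-bit register names, indexed by the 3-bit ModRM reg/rm fields.
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_add_rm8_r8(code):
    """Decode the 2-byte form of opcode 0x00 (ADD r/m8, r8) in AT&T syntax.

    Hypothetical helper: handles only mod == 0 with a plain register base
    (no SIB byte, no disp32), which is all the zero bytes here need.
    """
    opcode, modrm = code[0], code[1]
    if opcode != 0x00:
        raise ValueError("not an ADD r/m8, r8 instruction")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0 or rm in (4, 5):
        raise ValueError("addressing form not handled by this sketch")
    return f"add %{REG8[reg]},(%{REG32[rm]})"


def decode_page_fault_error(err):
    """Split the low three bits of an i386 page-fault error code."""
    return {
        "page_present": bool(err & 1),  # 0 means the page was not present
        "write": bool(err & 2),         # 1 means the access was a write
        "user_mode": bool(err & 4),     # 0 means the fault came from kernel mode
    }


# First bytes of the "Code:" line from the oops.
code_bytes = bytes.fromhex("00 00 00 00 00 05 09 a4".replace(" ", ""))
insn = decode_add_rm8_r8(code_bytes)

eax = 0x00000043            # "eax: 00000043"
fault_address = 0x00000043  # "virtual address 00000043"
fault = decode_page_fault_error(0x0002)  # "Oops: 0002"

print(insn)
print("fault address equals eax:", eax == fault_address)
print(fault)
```

The decoded instruction writes through %eax, the faulting address matches the %eax value in the register dump, and the error code describes a kernel-mode write to a non-present page, so the three pieces of the oops are consistent with each other. None of this says where the zero bytes came from.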
After what seems like endless problems with this server, including filesystem corruption and rpm database corruption (disk blocks shared with other files on the filesystem!!!), the hardware has been replaced, and the system has been reinstalled and patched with the latest (hence running e.34smp). Feel free to close this bugzilla. If we run into future issues we'll open a new ticket. (The old hardware passes all hw diags, both Dell's and memtest86.) Thanks, Kambiz
OK, thanks for letting us know!