Bug 169363

Summary:	Unable to handle kernel NULL pointer dereference at virtual address 0000000c
Product:	Red Hat Enterprise Linux 4	Reporter:	Need Real Name <ltsai>
Component:	kernel	Assignee:	Stephen Tweedie <sct>
Status:	CLOSED CANTFIX	QA Contact:	Brian Brock <bbrock>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	4.0	CC:	davej, jbaron
Hardware:	i686
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2005-09-28 11:06:33 UTC	Type:	---

Description Need Real Name 2005-09-27 14:55:51 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050910 Red Hat/1.7.10-1.1.3.2.centos3

Description of problem:
After installing RHEL 4 on a Dell PowerEdge 2650 with MegaRAID RAID level 5 (no LVM), the system ran as a MySQL database server for 2 weeks and then suddenly hit a kernel panic, logged in /var/log/messages as follows:

Sep 27 05:51:29 dbhost kernel: Unable to handle kernel NULL pointer dereference at virtual address 0000000c
Sep 27 05:51:29 dbhost kernel:  printing eip:
Sep 27 05:51:29 dbhost kernel: c01b6959
Sep 27 05:51:29 dbhost kernel: *pde = 2f141001
Sep 27 05:51:29 dbhost kernel: Oops: 0000 [#1]
Sep 27 05:51:29 dbhost kernel: SMP
Sep 27 05:51:29 dbhost kernel: Modules linked in: ipt_REJECT iptable_filter ip_tables nfsd exportfs lockd sunrpc md5 ipv6 dm_mod button battery ac ohci_hcd tg3 floppy aic7xxx sg ext3 jbd megaraid_mbox megaraid_mm sd_mod scsi_mod
Sep 27 05:51:29 dbhost kernel: CPU:    2
Sep 27 05:51:29 dbhost kernel: EIP:    0060:[<c01b6959>]    Not tainted VLI
Sep 27 05:51:29 dbhost kernel: EFLAGS: 00010246   (2.6.9-11.ELsmp)
Sep 27 05:51:29 dbhost kernel: EIP is at rb_insert_color+0x19/0xc1
Sep 27 05:51:29 dbhost kernel: eax: c2266208   ebx: c2266f48   ecx: 45965f46   edx: f74d1480
Sep 27 05:51:29 dbhost kernel: esi: 00000000   edi: c2266208   ebp: f74d1480   esp: cbd4be28
Sep 27 05:51:29 dbhost kernel: ds: 007b   es: 007b   ss: 0068
Sep 27 05:51:29 dbhost kernel: Process bash (pid: 1718, threadinfo=cbd4b000 task=f7734930)
Sep 27 05:51:29 dbhost kernel: Stack: c2266200 c2266f40 c226622a c2266f54 f889da7d f74d1480 c2266f48 f067e55e
Sep 27 05:51:29 dbhost kernel:        45965f46 cf8f90c0 d364df84 cf8f9ff8 cbd4bec0 f88a2b3c cf8f90c0 0000000a
Sep 27 05:51:29 dbhost kernel:        f4518b80 00000000 00000000 f09520f8 f74d1480 f4518b80 f88a2bd5 cbd4bec0
Sep 27 05:51:29 dbhost kernel: Call Trace:
Sep 27 05:51:29 dbhost kernel:  [<f889da7d>] ext3_htree_store_dirent+0x147/0x151 [ext3]
Sep 27 05:51:29 dbhost kernel:  [<f88a2b3c>] htree_dirblock_to_tree+0x78/0xb6 [ext3]
Sep 27 05:51:29 dbhost kernel:  [<f88a2bd5>] ext3_htree_fill_tree+0x5b/0x176 [ext3]
Sep 27 05:51:29 dbhost kernel:  [<f889dc4e>] ext3_dx_readdir+0x112/0x198 [ext3]
Sep 27 05:51:29 dbhost kernel:  [<c0165ee0>] filldir64+0x0/0x11a
Sep 27 05:51:29 dbhost kernel:  [<f889d551>] ext3_readdir+0x8c/0x3a0 [ext3]
Sep 27 05:51:29 dbhost kernel:  [<c0165ee0>] filldir64+0x0/0x11a
Sep 27 05:51:29 dbhost kernel:  [<c0165ee0>] filldir64+0x0/0x11a
Sep 27 05:51:29 dbhost kernel:  [<c0165c1d>] vfs_readdir+0x7d/0xa5
Sep 27 05:51:29 dbhost kernel:  [<c016605f>] sys_getdents64+0x65/0x9f
Sep 27 05:51:29 dbhost kernel:  [<c02c7377>] syscall_call+0x7/0xb
Sep 27 05:51:29 dbhost kernel: Code: 75 05 89 50 08 eb 07 89 50 0c eb 02 89 13 89 11 5b c3 55 89 d5 57 89 c7 56 53 e9 9b 00 00 00 83 7b 04 00 0f 85 9b 00 00 00 8b 33 <8b> 46 0c 39 c3 75 3a 8b 46 08 85 c0 74 06 83 78 04 00 74 37 39
Sep 27 05:51:29 dbhost kernel:  <0>Fatal exception: panic in 5 seconds



Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-11.EL

How reproducible:
Didn't try

Steps to Reproduce:
1. Reboot the system.
2. Wait for it to crash again.
3. View the error in the log and on the console.
  

Additional info:

Comment 1 Stephen Tweedie 2005-09-28 11:06:33 UTC

The oops is coming from the rbtree code, in rb_insert_color():

void rb_insert_color(struct rb_node *node, struct rb_root *root)
{
	struct rb_node *parent, *gparent;

	while ((parent = node->rb_parent) && parent->rb_color == RB_RED)
	{
		gparent = parent->rb_parent;

		if (parent == gparent->rb_left)

and "gparent" is NULL.  This is a corruption in the core rbtree data structure;
it's not obviously ext3's fault, as the rbtree code is entirely independent of ext3.
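
The fault address itself is consistent with a NULL "gparent". The following is a minimal sketch of the arithmetic, assuming the 2.6.9-era struct rb_node field order (rb_parent, rb_color, rb_right, rb_left) and 4-byte pointers on i686. The names rb_node_i686 and rb_left_load_address are hypothetical, and uint32_t stands in for each pointer so the result is the same on any build host:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the 2.6.9-era struct rb_node (assumed layout),
 * with uint32_t in place of each 32-bit i686 pointer. */
struct rb_node_i686 {
	uint32_t rb_parent;  /* offset 0x0 */
	int32_t  rb_color;   /* offset 0x4 */
	uint32_t rb_right;   /* offset 0x8 */
	uint32_t rb_left;    /* offset 0xc */
};

/* Compile-time check that rb_left sits 0xc bytes into the node. */
static_assert(offsetof(struct rb_node_i686, rb_left) == 0xc,
	      "rb_left expected at offset 0xc");

/* Address touched when reading ->rb_left through a node at node_addr. */
static uint32_t rb_left_load_address(uint32_t node_addr)
{
	return node_addr + (uint32_t)offsetof(struct rb_node_i686, rb_left);
}
```

Under those assumptions, rb_left_load_address(0) is 0x0000000c, the address reported in the oops. That also agrees with the register dump (esi: 00000000) and with the faulting instruction bytes <8b> 46 0c, which encode a 32-bit load from 0xc(%esi) into %eax.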

This could be bad hardware; it could be some other kernel code stomping on the
memory; or a genuine bug; but I've not seen such a bug reported before, nor has
there been any change recently in upstream kernels in this area.
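
Whatever the cause, the state the oops points to is one a well-formed red-black tree cannot be in: the root is always black, so a red parent is never the root and must have a parent of its own. A minimal sketch of that condition, using a hypothetical simplified node type (demo_node, not the kernel's struct rb_node) and a hypothetical helper name; it illustrates the invariant and is not an existing kernel debugging facility:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_color { DEMO_RED = 0, DEMO_BLACK = 1 };

/* Hypothetical, simplified stand-in for a red-black tree node. */
struct demo_node {
	struct demo_node *parent;
	enum demo_color color;
	struct demo_node *left;
	struct demo_node *right;
};

/*
 * Returns false when the first iteration of an rb_insert_color()-style
 * loop would dereference a NULL grandparent: the parent is red but has
 * no parent itself. Returns true when the loop body would not be
 * entered, or when the grandparent exists.
 */
static bool insert_fixup_is_safe(const struct demo_node *node)
{
	const struct demo_node *parent;

	assert(node != NULL);
	parent = node->parent;

	if (parent == NULL || parent->color != DEMO_RED)
		return true;            /* loop condition is false */
	return parent->parent != NULL;  /* red parent needs a grandparent */
}
```

A tree whose root has been flipped to red, for example by a stray write to its rb_color field, would fail this check for the root's children, which matches the failure mode seen here.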

Diagnosing this further will need help from support; there isn't enough
information here to start any sort of engineering fix.

For official Red Hat Enterprise Linux support, please log into the Red Hat
support website at http://www.redhat.com/support and file a support ticket,
or alternatively contact Red Hat Global Support Services at 1-888-RED-HAT1
to speak directly with a support associate and escalate an issue.