Bug 245159

Summary:	[RHEL5 RT] Oops - Null pointer, wakeup_next_waiter
Product:	Red Hat Enterprise MRG	Reporter:	Jeff Burke <jburke>
Component:	realtime-kernel	Assignee:	Steven Rostedt <srostedt>
Status:	CLOSED WONTFIX	QA Contact:
Severity:	low	Docs Contact:
Priority:	low
Version:	1.0	CC:	bhu, duck, epollard, lgoncalv, williams
Target Milestone:	---
Target Release:	---
Hardware:	All
OS:	Linux
URL:	http://rhts.lab.boston.redhat.com/cgi-bin/rhts/test_log.cgi?id=194400
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2009-07-07 13:29:37 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Jeff Burke 2007-06-21 12:39:28 UTC

Description of problem:
 During system bootup a panic Oops ocurred: Unable to handle kernel NULL pointer
dereference at 0000000000000040 RIP: [<ffffffff802ab0e9>]
wakeup_next_waiter+0x35/0x19c

Version-Release number of selected component (if applicable):
 2.6.21-32.el5rt

How reproducible:
 Unknown

Steps to Reproduce:
1. It happened while booting. Just finished an install of RHEL5 GA. Then install
ed the rt kernel on that distro base, rebooted and boom!
  
Actual results:
Unable to handle kernel NULL pointer dereference at 0000000000000040 RIP: 
 [<ffffffff802ab0e9>] wakeup_next_waiter+0x35/0x19c
PGD 0 
Oops: 0000 [1] PREEMPT SMP 
CPU 6 
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.21-32.el5rt #1
RIP: 0010:[<ffffffff802ab0e9>]  [<ffffffff802ab0e9>] wakeup_next_waiter+0x35/0x19c
RSP: 0000:ffff8101430c1b80  EFLAGS: 00010097
RAX: ffff8101430c1fd8 RBX: ffff810143045e00 RCX: 0000000000000000
RDX: ffff8101430c1fd8 RSI: 0000000000000001 RDI: ffff81013fc6d710
RBP: ffff8101430c1bb0 R08: 0000000000000000 R09: ffff81013fb9fbc8
R10: 0000000000000000 R11: ffff810143251740 R12: ffff810143045e00
R13: ffffffffffffffe8 R14: 00000000000041ed R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff810143251740(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000040 CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 1, threadinfo ffff8101430c0000, task ffff81013fc6d040)
Stack:  00000000000000d0 ffff810143045e00 0000000000000207 00000000000000d0
 00000000000041ed ffffffff80310dd4 ffff8101430c1bd0 ffffffff802654c9
 ffff81013fb9f1c0 ffff81014307c180 ffff8101430c1be0 ffffffff80265f1d
Call Trace:
 [<ffffffff802654c9>] rt_spin_lock_slowunlock+0x3e/0x5c
 [<ffffffff80265f1d>] rt_spin_unlock+0x28/0x2a
 [<ffffffff8020a823>] kmem_cache_alloc+0xd1/0xe2
 [<ffffffff80226fde>] alloc_inode+0x2b/0x1a6
 [<ffffffff80241c9d>] new_inode+0x21/0xb9
 [<ffffffff8030f381>] sysfs_new_inode+0x1f/0x10d
 [<ffffffff8030f4b8>] sysfs_create+0x49/0xdd
 [<ffffffff80310a96>] create_dir+0xb3/0x1e2
 [<ffffffff80310c1e>] sysfs_create_dir+0x59/0x77
 [<ffffffff8034fef6>] kobject_shadow_add+0xf4/0x1af
 [<ffffffff8034ffbc>] kobject_add+0xb/0xd
 [<ffffffff80350108>] kobject_register+0x22/0x3e
 [<ffffffff8035046c>] kobject_add_dir+0x52/0x7b
 [<ffffffff8030d13f>] register_disk+0x13d/0x22c
 [<ffffffff80347ec7>] add_disk+0x3c/0x49
 [<ffffffff809a418b>] rd_init+0x140/0x19f
 [<ffffffff809879e5>] init+0x1c3/0x394
 [<ffffffff8025ff68>] child_rip+0xa/0x12


Code: 4d 39 65 58 74 04 0f 0b eb fe 49 8d 74 24 08 4c 89 ef e8 87 
RIP  [<ffffffff802ab0e9>] wakeup_next_waiter+0x35/0x19c
 RSP <ffff8101430c1b80>
CR2: 0000000000000040

Expected results:
 System _should_ boot normally

Additional info:
 Last thing on the serial console before the Oops:

<snip>
Real Time Clock Driver v1.12ac
Non-volatile memory driver v1.2
Linux agpgart interface v0.102 (c) Dave Jones
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
erial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
00:02: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:03: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
stopped custom tracer.
</snip>

This was also after the Oops:
Kernel panic - not syncing: Attempted to kill init!

Call Trace:
 [<ffffffff8026d4e9>] dump_trace+0xaa/0x32a
 [<ffffffff8026d7aa>] show_trace+0x41/0x5c
 [<ffffffff8026d7da>] dump_stack+0x15/0x17
 [<ffffffff802920f9>] panic+0xaf/0x169
 [<ffffffff8021558e>] do_exit+0xb4/0x8b5
 [<ffffffff80268c12>] do_page_fault+0x786/0x814
 [<ffffffff80266acd>] error_exit+0x0/0x84
 [<ffffffff802ab0e9>] wakeup_next_waiter+0x35/0x19c
 [<ffffffff802654c9>] rt_spin_lock_slowunlock+0x3e/0x5c
 [<ffffffff80265f1d>] rt_spin_unlock+0x28/0x2a
 [<ffffffff8020a823>] kmem_cache_alloc+0xd1/0xe2
 [<ffffffff80226fde>] alloc_inode+0x2b/0x1a6
 [<ffffffff80241c9d>] new_inode+0x21/0xb9
 [<ffffffff8030f381>] sysfs_new_inode+0x1f/0x10d
 [<ffffffff8030f4b8>] sysfs_create+0x49/0xdd
 [<ffffffff80310a96>] create_dir+0xb3/0x1e2
 [<ffffffff80310c1e>] sysfs_create_dir+0x59/0x77
 [<ffffffff8034fef6>] kobject_shadow_add+0xf4/0x1af
 [<ffffffff8034ffbc>] kobject_add+0xb/0xd
 [<ffffffff80350108>] kobject_register+0x22/0x3e
 [<ffffffff8035046c>] kobject_add_dir+0x52/0x7b
 [<ffffffff8030d13f>] register_disk+0x13d/0x22c
 [<ffffffff80347ec7>] add_disk+0x3c/0x49
 [<ffffffff809a418b>] rd_init+0x140/0x19f
 [<ffffffff809879e5>] init+0x1c3/0x394
 [<ffffffff8025ff68>] child_rip+0xa/0x12

 On the same RHTS job this same system ibm-lewis-01.lab.boston.redhat.com ran
the 2.6.21-32.el5rtvanilla kernel without issue.

Comment 1 Steven Rostedt 2007-07-06 02:43:36 UTC

I spent a ton of time analyzing this bug. And unfortunately, I'm thinking that
this may not be solvable without finding a reproducable method.

Here's where I'm at...

We took the oops at the BUG_ON in:

rt_mutex_top_waiter(struct rt_mutex *lock)
{
	struct rt_mutex_waiter *w;

	w = plist_first_entry(&lock->wait_list, struct rt_mutex_waiter,
			       list_entry);
	BUG_ON(w->lock != lock);

	return w;
}


But the BUG_ON test wasn't what took the oops. It was a bad pointer reference
due to the w->lock in the bug on. Which means we got a bogus pointer for w from
plist_first_entry.  The bad address was at 0x40 (or 64 dec), but using Arnaldo's
pahole program, I see that the lock is at offset 88 dec. Which means that the
pointer returned to us for w is -24 (88 - 64 = 24). Which looks correct from the
output:

0xffffffff802ab0e9 <wakeup_next_waiter+53>:     cmp    %r12,0x58(%r13)

0x58 == 88 dec

R13: ffffffffffffffe8 == -24.

Writing a simple program to see how this can happen:

int main (int argc, char **argv)
{
        struct rt_mutex *lock = NULL;
        struct rt_mutex_waiter *w;

        lock = malloc(sizeof(*lock));
        memset(lock, 0, sizeof(*lock));

        w = plist_first_entry(&lock->wait_list, struct rt_mutex_waiter,
                               list_entry);

        printf("w=%p\n", w);
        return 0;
}

Returns:

    w=0xffffffffffffffe8

So somehow the plist in the lock became NULL.  One thought was a bad
initialization, or race with per_cpu structures, since this is a per_cpu_lock:

from __cache_alloc:
	slab_irq_restore(save_flags, this_cpu);

Where:
    # define slab_irq_restore(flags, cpu) \
	do { slab_irq_enable(cpu); (void) (flags); } while (0)

and:
   # define slab_irq_enable(cpu)	put_cpu_var_locked(slab_irq_locks, cpu)

Were we lock a percpu lock defined in the put_cpu_var_locked.

But this didn't make sense either. But this was on bootup and then I thought
that this could be a case where we entered the rt_spin_lock_slowunlock path
before initializing the plist. The plist is initialized the first time there is
contention, and if we get into the slowunlock path without initializing the
lock, we will crash.  So I thought there might be a bug with the initializing of
the lists, but I can't find a way to get into the slowpath without having the
lists first initialized.

Finally, I guess that something might have corrupted the lock. It could simply
touch the "owner" and on unlock we fail and go into the slowlock without ever
having the lists initialized.  Could be simply a case of some bad memory, that
hit us once. But this machine is used for several tests, so that doesn't seem to
be the case (but I'm still not ruling it out).  Or it simply could be something
unrelated that wrote over this memory causing the bug. In which case, we can not
debug this further with the information at hand.

So I am going to move this to NEED_INFO, since we are stuck at the moment until
we can reproduce this again.

Comment 2 Jeff Burke 2007-07-06 13:14:38 UTC

Opened a ticket with Engineering operations to have the memory on that system
verified.
https://engineering.redhat.com/rt3/Ticket/Display.html?id=13589

Comment 3 Konrad Rzeszutek 2007-07-09 17:23:26 UTC

Jeff,

Running memtest on it box halts it. With 2GB of RAM in the box (from the
original 6GB) the memtests works.

Comment 4 Jeff Burke 2007-07-09 18:09:28 UTC

Konrad,
   OK it sounds like there is in issue with the hardware. Are you planning on
replacing that memmory or are you going to leave that system at 2 Gig? 

Jeff

Comment 5 Konrad Rzeszutek 2007-07-30 14:27:14 UTC

The plan is to get a new planar and memory.

Comment 6 Tim Burke 2007-10-12 14:20:12 UTC

put into needinfo state till hw fixed

Comment 7 Jeff Burke 2007-10-15 20:23:53 UTC

Konrad,
   Any update on this issue? have the parts for
ibm-lewis-01.lab.boston.redhat.com come in?

Thanks Jeff

Comment 8 Konrad Rzeszutek 2007-10-15 21:22:14 UTC

I received new processors. No sign of the memory but then the tests were
inconclusive. I am going to aim to run the diagnostic BIOS test this week by
reserving it, but if I don't get to it, and you have the time to do it, please
go ahead and hit 'F2' during bootup - that will drop you in the BIOS diagnostic
code.

Comment 9 Clark Williams 2008-02-04 22:52:39 UTC

Where are we on this bug?

Comment 10 Jeff Burke 2008-02-06 15:38:48 UTC

Clark,
   Konrad is no longer with Red Hat(as an IBM on-site). I am going to add the
new X server IBM on-site contact.

Comment 11 Ed Pollard 2008-02-06 15:52:21 UTC

This blade is unreliable and we are attempting to get it repaired/replaced. My
apologies for this taking so long. I didn't know there was a bug waiting on it.

Comment 12 Clark Williams 2008-02-06 16:44:55 UTC

Good deal. We'll keep this bug open until we get a h/w change then see if the
problem is resolved. 

Thanks,
Clark

Comment 13 Clark Williams 2008-04-23 21:14:19 UTC

has the hardware been updated? Should we close this bug?

Clark

Comment 14 Steven Rostedt 2008-05-07 14:23:32 UTC

Is this bug been seen on a working box? If this failure is only on faulty
hardware then we should close this bug. Please test on another box and see if we
can still see this bug.

Comment 15 Jeff Burke 2008-05-07 20:07:27 UTC

I think we should wait until Ed gets that system fixed before we close this
issue. Although it has only been seen when booting on this system. It still may
be a "quirk" type issue in the RT kernel. But seems unlikely at this point.

Comment 16 Jeff Burke 2008-05-07 20:09:19 UTC

Ed Pollard
   Do you happen to have a status on this system?

Thanks,
Jeff

Comment 17 Ed Pollard 2008-05-19 17:34:20 UTC

That blade is unreliable unfortunately. I'm still trying to get a replacement
for it. Now that the IBM systems are mostly back up and running in the new lab I
will put more attention into this.

Comment 18 Beth Uptagrafft 2008-06-11 21:43:23 UTC

an update from Ed...
The system isn't available due to hardware issues. He has an open call with IBM
to get it repaired.

Comment 19 Clark Williams 2008-11-01 14:00:09 UTC

any update on h/w for this system?