Bug 318261 - [RHEL5 RT] kernel BUG at kernel/rtmutex.c:659!
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: 1.0
Hardware: x86_64 Linux
Priority: high, Severity: low
Assigned To: Steven Rostedt
URL: http://rhts.lab.boston.redhat.com/cgi...
Reported: 2007-10-04 09:30 EDT by Jeff Burke
Modified: 2008-02-27 14:58 EST
Fixed In Version: 2.6.21-53
Doc Type: Bug Fix
Last Closed: 2007-11-30 11:06:16 EST

Attachments
fix alternate_node_alloc this_cpu (1.88 KB, patch)
2007-11-26 15:20 EST, Steven Rostedt

Description Jeff Burke 2007-10-04 09:30:47 EDT
Description of problem:
 While testing the rttracer kernel the system
ibm-wildhorse-01.rhts.boston.redhat.com had a kernel panic.

Version-Release number of selected component (if applicable):
 2.6.21-39.el5rttrace

How reproducible:
 Often

Steps to Reproduce:
1. Install RHEL5.1 tree RHEL5.1-Server-20070920.1 x86_64. Then install the
current rttrace kernel.
2. Reboot several times.
  
Actual results:
Unable to handle kernel NULL pointer dereference at 0000000000000040 RIP: 
 [<ffffffff802abdd1>] wakeup_next_waiter+0x35/0x19c
PGD 0 
Oops: 0000 [1] PREEMPT SMP 
CPU 2 
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.21-39.el5rttrace #1
RIP: 0010:[<ffffffff802abdd1>]  [<ffffffff802abdd1>] wakeup_next_waiter+0x35/0x19c
RSP: 0000:ffff8100067c7d20  EFLAGS: 00010097
RAX: 0000000000000002 RBX: ffff810003d7d000 RCX: ffffffff8026667a
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff802abdc8
RBP: ffff8100067c7d50 R08: 0000000000000000 R09: ffffffff80a5f380
R10: ffff81013fc92040 R11: ffff81013fcf9940 R12: ffff810003d7d000
R13: ffffffffffffffe8 R14: ffff81007ff57e00 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff81013fcf9940(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000040 CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 1, threadinfo ffff8100067c6000, task ffff81013fc92040)
Stack:  ffff81007ff57e00 ffff810003d7d000 0000000000000207 00000000000000d0
 ffff81007ff57e00 0000000000000001 ffff8100067c7d70 ffffffff8026580f
 ffff81013eda90c0 ffff81000679b080 ffff8100067c7d80 ffffffff802662dd
Call Trace:
 [<ffffffff8026580f>] rt_spin_lock_slowunlock+0x3e/0x5c
 [<ffffffff802662dd>] rt_spin_unlock+0x28/0x2a
 [<ffffffff8020a81d>] kmem_cache_alloc+0xd1/0xe2
 [<ffffffff803a877d>] con_insert_unipair+0x40/0xda
 [<ffffffff803a8b27>] con_set_default_unimap+0xbc/0x131
 [<ffffffff809abab0>] console_map_init+0x31/0x43
 [<ffffffff809abc58>] vty_init+0xf3/0xf7
 [<ffffffff809ab6bb>] tty_init+0x1c1/0x1c5
 [<ffffffff80990a03>] init+0x1c3/0x425
 [<ffffffff802601d8>] child_rip+0xa/0x12


Code: 4d 39 65 58 74 04 0f 0b eb fe 49 8d 74 24 08 4c 89 ef e8 a2 
RIP  [<ffffffff802abdd1>] wakeup_next_waiter+0x35/0x19c
 RSP <ffff8100067c7d20>
CR2: 0000000000000040
Kernel panic - not syncing: Attempted to kill init!

Call Trace:
 [<ffffffff8026dad8>] dump_trace+0xaa/0x32a
 [<ffffffff8026dd99>] show_trace+0x41/0x64
 [<ffffffff8026ddd1>] dump_stack+0x15/0x17
 [<ffffffff80292b69>] panic+0xaf/0x16e

Expected results:
 The system should not panic on a normal boot.

Additional info:
 The URL links to the test kernel on which this was originally seen, but it
was also reproduced with the standard rttrace kernel.
Comment 1 Steven Rostedt 2007-11-26 15:12:24 EST
This is a nasty bug, and it happens to be fixed upstream.

The cause was that alternate_node_alloc used its own this_cpu variable.
kmem_cache_alloc would grab the per_cpu slab lock using its own this_cpu and
then call alternate_node_alloc. That function would pass its own this_cpu to
cache_grow, which unlocks and relocks the per_cpu slab lock. If we happened to
change CPUs while this was going on, we would be locking and unlocking the
wrong locks.
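
The mismatch described above can be modeled in user space. The following is a minimal sketch, not the kernel code: the lock array, `current_cpu`, and every function name in it (`slab_lock`, `slab_unlock`, `grow`, `helper_private_cpu`, `helper_shared_cpu`, `alloc_path`) are hypothetical stand-ins, and the "migration" is simulated by assigning a variable. It assumes the grow step updates the CPU number it was handed after retaking the lock, so a helper that hands in a private copy leaves the caller with a stale CPU number.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical model: one lock flag per CPU, plus the CPU we "run" on. */
static bool slab_lock_held[NR_CPUS];
static int current_cpu;

static void slab_lock(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    slab_lock_held[cpu] = true;
}

/* Returns 0 on success, -1 when asked to release a lock that is not held
 * (the user-space analogue of tripping a BUG in the lock code). */
static int slab_unlock(int cpu)
{
    if (!slab_lock_held[cpu])
        return -1;
    slab_lock_held[cpu] = false;
    return 0;
}

/* Models the grow step: drop the per-CPU lock, possibly migrate,
 * then retake the lock of whichever CPU we are on now. */
static int grow(int *cpu, int migrate_to)
{
    if (slab_unlock(*cpu) != 0)
        return -1;
    current_cpu = migrate_to;   /* simulated migration while unlocked */
    *cpu = current_cpu;
    slab_lock(*cpu);
    return 0;
}

/* Buggy shape: the helper keeps a private copy of the CPU number,
 * so the caller never learns that the held lock changed. */
static int helper_private_cpu(int migrate_to)
{
    int this_cpu = current_cpu;
    return grow(&this_cpu, migrate_to);
}

/* Fixed shape: the helper works on the caller's CPU variable. */
static int helper_shared_cpu(int *this_cpu, int migrate_to)
{
    return grow(this_cpu, migrate_to);
}

/* Models the caller: lock, call the helper, unlock.
 * Returns 0 if the final unlock matched a held lock, -1 otherwise. */
static int alloc_path(bool use_fix, int migrate_to)
{
    int this_cpu, rc, i;

    for (i = 0; i < NR_CPUS; i++)
        slab_lock_held[i] = false;
    current_cpu = 0;

    this_cpu = current_cpu;
    slab_lock(this_cpu);
    rc = use_fix ? helper_shared_cpu(&this_cpu, migrate_to)
                 : helper_private_cpu(migrate_to);
    if (rc != 0)
        return rc;
    return slab_unlock(this_cpu);
}
```

In this model, `alloc_path(false, 1)` fails because the caller tries to release the lock of CPU 0 while the lock actually held belongs to CPU 1, whereas `alloc_path(true, 1)` succeeds because the caller's CPU number was updated through the pointer. Without a migration both variants succeed, which fits the report that the panic showed up often but not on every boot.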

I'll attach a patch to fix this.
Comment 2 Steven Rostedt 2007-11-26 15:20:49 EST
Created attachment 269261
fix alternate_node_alloc this_cpu

Patch to replace the local this_cpu in alternate_node_alloc with a cpu pointer
that is passed in. This allows the proper slab locks to be locked and unlocked.
Comment 3 Clark Williams 2007-11-30 11:06:16 EST
Fixed in 2.6.21-53