Problem Description:
========================
Release testing with debug kernel 2.6.24.7-108ibmrt2.1.01debug on ZPro resulted in a soft lockup, as shown below.

=============================================
[ INFO: possible recursive locking detected ]
[ 2.6.24.7-108ibmrt2.1.01debug #1
---------------------------------------------
sirq-timer/1/19 is trying to acquire lock:
 ((raw_spinlock_t *)(&rq->lock)/1){....}, at: [<ffffffff810325f6>] double_lock_balance+0x61/0x6a

but task is already holding lock:
 ((raw_spinlock_t *)(&rq->lock)/1){....}, at: [<ffffffff810325e2>] double_lock_balance+0x4d/0x6a

other info that might help us debug this:
1 lock held by sirq-timer/1/19:
 #0: ((raw_spinlock_t *)(&rq->lock)/1){....}, at: [<ffffffff810325e2>] double_lock_balance+0x4d/0x6a

stack backtrace:
Pid: 19, comm: sirq-timer/1 Not tainted 2.6.24.7-108ibmrt2.1.01debug #1

Call Trace:
 [<ffffffff8106079d>] __lock_acquire+0x1ee/0xcdc
 [<ffffffff810325f6>] ? double_lock_balance+0x61/0x6a
 [<ffffffff81061319>] lock_acquire+0x8e/0xb2
 [<ffffffff810325f6>] ? double_lock_balance+0x61/0x6a
 [<ffffffff812a655f>] __spin_lock_nested+0x38/0x69
 [<ffffffff810325f6>] double_lock_balance+0x61/0x6a
 [<ffffffff812a2f20>] __schedule+0x2f9/0x7ee
 [<ffffffff8105ff96>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff812a376d>] schedule+0xe4/0x109
 [<ffffffff810448b6>] ksoftirqd+0xb7/0x26c
 [<ffffffff8105ff5e>] ? trace_hardirqs_on_caller+0x11c/0x147
 [<ffffffff810447ff>] ? ksoftirqd+0x0/0x26c
 [<ffffffff810537b6>] kthread+0x49/0x77
 [<ffffffff8100d358>] child_rip+0xa/0x12
 [<ffffffff8100ca43>] ? restore_args+0x0/0x30
 [<ffffffff8105376d>] ? kthread+0x0/0x77
 [<ffffffff8100d34e>] ? child_rip+0x0/0x12

INFO: lockdep is turned off.
---------------------------
| preempt count: 00000003 ]
| 3-level deep critical section nesting:
----------------------------------------
.. [<ffffffff812a2c59>] .... __schedule+0x32/0x7ee
.....[<ffffffff812a376d>] .. ( <= schedule+0xe4/0x109)
.. [<ffffffff812a65a4>] .... __spin_lock+0x14/0x5e
.....[<ffffffff810325d5>] .. ( <= double_lock_balance+0x40/0x6a)
.. [<ffffffff812a6543>] .... __spin_lock_nested+0x1c/0x69
.....[<ffffffff810325f6>] .. ( <= double_lock_balance+0x61/0x6a)

warning: process `sysctl01' used the deprecated sysctl system call with 1.1.
warning: process `sysctl01' used the deprecated sysctl system call with 1.2.
warning: process `sysctl04' used the deprecated sysctl system call with
warning: process `sysctl04' used the deprecated sysctl system call with
warning: process `sysctl05' used the deprecated sysctl system call with 1.2.
pthcli[20070]: segfault at 1 rip 3be7060757 rsp 7ffffec20e90 error 4

I believe the warnings below the error came from LTP tests run as part of release testing and are not relevant to this bug.

Hardware Used: elm3b160 (ZPro)
Kernel Used: 2.6.24.7-108ibmrt2.1.01debug on RHEL5.2

=Comment: #1=================================================
Gowrishankar Muthukrishnan <gowrishankar.m.com> -
Looks like error came from one of the tests in ltp, by looking at timestamps inside /var/log/messages. Attaching below the complete report.

Attachment: soft lockup found while running release testing with 2.6.24.7-108ibmrt2.1.01debug in ZPro

=Comment: #3=================================================
Gowrishankar Muthukrishnan <gowrishankar.m.com> -
This error also shows up on e326 and x3455, both tested recently.

Attachment: recursive lock in e326
Attachment: recursive lock in x3455

=Comment: #6=================================================
Gowrishankar Muthukrishnan <gowrishankar.m.com> -
It looks like the issue is common across platforms. I can see it on LS21 as well, so I am removing ZPro from the summary.

=Comment: #7=================================================
Venkateswarara Jujjuri <jvrao.com> -
> Looks like error came from one of the tests in ltp, by looking at
> timestamps inside /var/log/messages.

Hi Gowri,

Do we know which test this is? Is this a new test in LTP? Do we have any data on this test's behavior on previous kernels (R2 etc.)? If this is an old test, have there been any major changes to it in the recent past?
Created attachment 342436 [details] recursive lock in x3455
Created attachment 342437 [details] soft lockup found while running release testing with 2.6.24.7-108ibmrt2.1.01debug in ZPro
Created attachment 342438 [details] recursive lock in e326
------- Comment From gowrishankar.m.com 2009-05-18 08:36 EDT-------
I got a similar warning on e326m as well, but its stack trace originates from a slightly different place. I am using the same bug on the assumption that the fix for all platforms will work on e326m as well; otherwise I will file a new one.
Created attachment 344430 [details] warning at rt_read_slowunlock() in e326m

------- Comment on attachment From gowrishankar.m.com 2009-05-18 08:37 EDT-------
warning at rt_read_slowunlock() in e326m
------- Comment From gowrishankar.m.com 2009-07-22 05:36 EDT-------
I have verified with 2.6.29.5-26.el5rtdebug (mrg1.2 beta) in LS21 and the specified warning is not reproduced.
------- Comment From sripathik.com 2009-07-31 09:28 EDT-------
(In reply to comment #23)
> I have verified with 2.6.29.5-26.el5rtdebug (mrg1.2 beta) in LS21 and the
> specified warning
> is not reproduced.
>

Hmm... I guess we can close this as FIXED, then. Closing.