Bug 448574 - [MRG] Hit BUG: MAX_STACK_TRACE_ENTRIES too low! when booting kernel-rt-debug-2.6.24.4-32ibmrt2.2
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: beta
Hardware: x86_64 All
Priority: low, Severity: medium
Target Milestone: 1.0.3
Target Release: ---
Assigned To: Red Hat Real Time Maintenance
Depends On:
Blocks:
Reported: 2008-05-27 13:40 EDT by IBM Bug Proxy
Modified: 2008-10-07 15:21 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2008-10-07 15:21:36 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
The "dmesg" output showing the BUG() and resulting stack trace (55.61 KB, text/plain)
2008-05-27 13:41 EDT, IBM Bug Proxy
Patch to increase MAX_STACK_TRACE_ENTRIES (722 bytes, patch)
2008-08-19 12:34 EDT, Clark Williams


External Trackers
Tracker ID Priority Status Summary Last Updated
IBM Linux Technology Center 44149 None None None Never

Description IBM Bug Proxy 2008-05-27 13:40:57 EDT
=Comment: #0=================================================
TIMOTHY R. CHAVEZ <chavezt@us.ibm.com> - 2008-04-16 16:12 EDT
Problem description:
During a test-boot of a diskless LS21 using the 2.6.24.4-32ibmrt2.2debug kernel
with a modified LSI MPP/RDAC driver, I got a "BUG: MAX_STACK_TRACE_ENTRIES too
low!" followed by a stack trace (attached).  With the exception of the RDAC
driver, the 2.6.24.4-32ibmrt2.2debug kernel is effectively the same kernel as
the standard MRG 2.6.24.4-32debug kernel (no custom patches).  However, it
should be noted that a standard MRG 2.6.24.5-32debug kernel has not been
test-booted, yet.  The machine does not hang and appears to be operational /
responsive.

If this is not an installation problem,
       Describe any custom patches installed.

No custom patches applied to kernel.  However, a custom LSI/MPP RDAC driver was
built and installed for this kernel.

       Provide output from "uname -a", if possible:

Linux elm3c31 2.6.24.4-32ibmrt2.2debug #1 SMP PREEMPT RT Wed Apr 16 00:47:35 EDT
2008 x86_64 x86_64 x86_64 GNU/Linux


Hardware Environment
    Machine type (p650, x235, SF2, etc.): LS21
    Cpu type (Power4, Power5, IA-64, etc.): Dual-Core AMD Opteron(tm) Processor
    Describe any special hardware you think might be relevant to this problem:
Possibly the dual QLogic 4GB HBA cards attached to the machine(?), but I've not
test-booted the debug kernel on any other configuration, so...


Please provide contact information if the submitter is not the primary contact.
tim.chavez@linux.vnet.ibm.com
512-838-1317


Is this reproducible? Yes
    If so, how long does it (did it) take to reproduce it?
    Describe the steps:
Boot the system with the 2.6.24.4-32ibmrt2.2debug kernel.

Is the system (not just the application) hung? No
    If so, describe how you determined this:



Additional information:

This environment is an LS21 attached to a DS4700 via a couple of QLogic 4GB HBA
cards (thus the need for RDAC) and has no local storage.
=Comment: #1=================================================
TIMOTHY R. CHAVEZ <chavezt@us.ibm.com> - 2008-04-16 16:22 EDT

The "dmesg" output showing the BUG() and resulting stack trace

=Comment: #2=================================================
TIMOTHY R. CHAVEZ <chavezt@us.ibm.com> - 2008-04-16 17:29 EDT
I booted the vanilla, trace, and rt kernels on this same system / hardware
configuration without hitting this bug.  I'll attempt to boot the debug kernel
on a system with a local storage configuration tomorrow morning and report my
findings.
=Comment: #3=================================================
TIMOTHY R. CHAVEZ <chavezt@us.ibm.com> - 2008-04-22 10:44 EDT
Just a note,

Red Hat is also seeing this in testing.

From Clark Williams @ Red Hat:

This isn't a CONFIG_ option. It's a value defined in lockdep_internals.h and
currently is defined as:

#define MAX_STACK_TRACE_ENTRIES 262144UL

That's pretty big...
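
For readers unfamiliar with where the message comes from, here is a minimal user-space sketch of the kind of check that produces it. It is a simplified model, not the kernel's code: the function name save_trace_model and the tiny limit of 8 are assumptions chosen for illustration (the kernel in this report uses 262144UL), and the real lockdep logic differs in detail.

```c
#include <assert.h>
#include <stdio.h>

/* Deliberately tiny limit so exhaustion is easy to trigger in a demo.
 * The kernel discussed in this report defines it as 262144UL. */
#define MAX_STACK_TRACE_ENTRIES 8UL

/* One shared, fixed-size pool for every saved stack trace. */
static unsigned long stack_trace[MAX_STACK_TRACE_ENTRIES];
static unsigned long nr_stack_trace_entries;
static int debug_locks = 1;   /* 1 while the validator is active */

/* Hypothetical stand-in for lockdep's trace-saving step.
 * Copies n entries into the pool. Returns 1 on success, 0 if the
 * validator is already off or the pool cannot hold the trace. */
static int save_trace_model(const unsigned long *entries, unsigned long n)
{
    unsigned long i;

    if (!debug_locks)
        return 0;

    if (n > MAX_STACK_TRACE_ENTRIES - nr_stack_trace_entries) {
        /* Pool exhausted: report once and disable further checking. */
        debug_locks = 0;
        printf("BUG: MAX_STACK_TRACE_ENTRIES too low!\n");
        printf("turning off the locking correctness validator.\n");
        return 0;
    }

    for (i = 0; i < n; i++)
        stack_trace[nr_stack_trace_entries++] = entries[i];

    assert(nr_stack_trace_entries <= MAX_STACK_TRACE_ENTRIES);
    return 1;
}
```

The point of the sketch is that the pool only ever grows: entries are never released, so a workload that registers enough distinct lock dependencies will eventually exhaust it regardless of how large the constant is.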
Comment 1 IBM Bug Proxy 2008-05-27 13:41:02 EDT
Created attachment 306805 [details]
The "dmesg" output showing the BUG() and resulting stack trace
Comment 2 IBM Bug Proxy 2008-06-18 19:08:41 EDT
------- Comment From jstultz@us.ibm.com 2008-06-18 19:02 EDT-------
Has this issue been seen recently?
Comment 3 IBM Bug Proxy 2008-06-30 12:08:45 EDT
------- Comment From chavezt@us.ibm.com 2008-06-30 12:00 EDT-------
I haven't seen it, but then again, I haven't been booting from SAN recently.
Maybe Keith has seen it?  I'm adding him to CC list.
Comment 4 IBM Bug Proxy 2008-08-04 07:00:33 EDT
I have seen this problem on a non-SAN machine while trying to recreate bug
46204. The system took a really long time (45 minutes) to come up. BUG message
seen was:

BUG: MAX_STACK_TRACE_ENTRIES too low!
turning off the locking correctness validator.
Pid: 2112, comm: ip Not tainted 2.6.24.7-74ibmrt2.5debug #1

Call Trace:
[<ffffffff810146b5>] ? save_stack_trace+0x2a/0x49
[<ffffffff8105d851>] save_trace+0x93/0x9b
[<ffffffff8105d8d7>] add_lock_to_list+0x7e/0xac
[<ffffffff81060eb9>] __lock_acquire+0xb43/0xcdc
[<ffffffff81067443>] ? rt_mutex_slowtrylock+0x18/0x85
[<ffffffff810610e0>] lock_acquire+0x8e/0xb2
[<ffffffff81067443>] ? rt_mutex_slowtrylock+0x18/0x85
[<ffffffff812a7bd2>] __spin_lock_irqsave+0x40/0x73
[<ffffffff81067443>] rt_mutex_slowtrylock+0x18/0x85
[<ffffffff812a5694>] rt_mutex_trylock+0x9/0xb
[<ffffffff812a7105>] rt_spin_lock+0x31/0x56
[<ffffffff8127e194>] ip_mc_inc_group+0x176/0x232
[<ffffffff8127e296>] ip_mc_up+0x46/0x64
[<ffffffff81279947>] inetdev_event+0x263/0x470
[<ffffffff810882d0>] ? __rcu_read_unlock+0x8c/0x95
[<ffffffff812aa943>] notifier_call_chain+0x33/0x5b
[<ffffffff81058001>] __raw_notifier_call_chain+0x9/0xb
[<ffffffff81058012>] raw_notifier_call_chain+0xf/0x11
[<ffffffff812305c2>] call_netdevice_notifiers+0x16/0x18
[<ffffffff81231f2f>] dev_open+0x80/0x88
[<ffffffff8123072f>] dev_change_flags+0xaf/0x16b
[<ffffffff81279ee3>] devinet_ioctl+0x267/0x5f2
[<ffffffff8127a686>] inet_ioctl+0x82/0xa0
[<ffffffff81223dc1>] sock_ioctl+0x1e7/0x20c
[<ffffffff810ce955>] do_ioctl+0x2d/0x83
[<ffffffff810cec20>] vfs_ioctl+0x275/0x292
[<ffffffff812a6a5b>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff810cec94>] sys_ioctl+0x57/0x7b
[<ffffffff8100c248>] ? system_call+0xb8/0xef
[<ffffffff8100c27f>] system_call_ret+0x0/0x6d

INFO: lockdep is turned off.
---------------------------
| preempt count: 00000001 ]
| 1-level deep critical section nesting:
----------------------------------------
.. [<ffffffff812a7bb5>] .... __spin_lock_irqsave+0x23/0x73
.....[<ffffffff81067443>] ..   ( <= rt_mutex_slowtrylock+0x18/0x85)
Comment 5 IBM Bug Proxy 2008-08-19 11:21:34 EDT
While working on bug #46204 (RH459478), Peter Zijlstra suggested trying a few
patches recently committed to Linus' tree to see if they would help solve this
problem. They did not. He then asked me to try higher values of
MAX_STACK_TRACE_ENTRIES. I changed MAX_STACK_TRACE_ENTRIES to 1.25 times its
current value (327680) and still saw the problem. When I made it 1.5 times the
current value (393216), I did not see the problem. I have reported these results
to Peter in an e-mail as well. He needs to decide whether it is okay to increase
this value.
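
The two values tried above follow directly from the original 262144UL. The sketch below reproduces the arithmetic and estimates the memory cost of each setting. The helper names scale_entries and pool_bytes are hypothetical, and the estimate assumes the pool is a static array of unsigned long with 8-byte words, as on x86_64.

```c
#include <assert.h>

/* Value quoted earlier in this report from lockdep_internals.h. */
#define ORIG_MAX_STACK_TRACE_ENTRIES 262144UL

/* Scale an entry count by the fraction num/den using integer math.
 * Multiplies first so that exact fractions such as 5/4 and 3/2 are
 * not truncated. */
static unsigned long scale_entries(unsigned long base,
                                   unsigned long num,
                                   unsigned long den)
{
    assert(den != 0);
    return base * num / den;
}

/* Size in bytes of a pool holding `entries` words of `word_size` bytes.
 * word_size = 8 is an assumption that matches unsigned long on x86_64. */
static unsigned long pool_bytes(unsigned long entries,
                                unsigned long word_size)
{
    return entries * word_size;
}
```

Under those assumptions, scale_entries(ORIG_MAX_STACK_TRACE_ENTRIES, 5, 4) gives 327680 and scale_entries(ORIG_MAX_STACK_TRACE_ENTRIES, 3, 2) gives 393216, and the pool grows from 2 MiB (2097152 bytes) to 3 MiB (3145728 bytes) at the 1.5x setting.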
Comment 6 Clark Williams 2008-08-19 12:34:02 EDT
Created attachment 314559 [details]
Patch to increase MAX_STACK_TRACE_ENTRIES

Added patch to increase MAX_STACK_TRACE_ENTRIES by a factor of 1.5 (to 393216) as per Sripathi's tests. This should go into our -78 kernel build.
Comment 8 David Sommerseth 2008-09-26 05:13:42 EDT
Verified that patch (https://bugzilla.redhat.com/attachment.cgi?id=314559) is implemented as mrg-rt.git commit 842bb285febde3ae296de13c8c50da52e56878f7.  Available in mrg-rt-2.6.24.7-81.

Bug reproduced using 2.6.24.7-74 and went away with 2.6.24.7-81.
Comment 10 errata-xmlrpc 2008-10-07 15:21:36 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0857.html
