=Comment: #0================================================= TIMOTHY R. CHAVEZ <chavezt.com> - 2008-04-16 16:12 EDT

Problem description:
During a test-boot of a diskless LS21 using the 2.6.24.4-32ibmrt2.2debug kernel with a modified LSI MPP/RDAC driver, I got a "BUG: MAX_STACK_TRACE_ENTRIES too low!" followed by a stack trace (attached). With the exception of the RDAC driver, the 2.6.24.4-32ibmrt2.2debug kernel is effectively the same kernel as the standard MRG 2.6.24.4-32debug kernel (no custom patches). However, it should be noted that a standard MRG 2.6.24.5-32debug kernel has not been test-booted yet. The machine does not hang and appears to be operational / responsive.

If this is not an installation problem, describe any custom patches installed:
No custom patches applied to the kernel. However, a custom LSI/MPP RDAC driver was built and installed for this kernel.

Provide output from "uname -a", if possible:
Linux elm3c31 2.6.24.4-32ibmrt2.2debug #1 SMP PREEMPT RT Wed Apr 16 00:47:35 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

Hardware Environment
Machine type (p650, x235, SF2, etc.): LS21
CPU type (Power4, Power5, IA-64, etc.): Dual-Core AMD Opteron(tm) Processor
Describe any special hardware you think might be relevant to this problem: Possibly the dual QLogic 4GB HBA cards attached to the machine(?), but I've not test-booted the debug kernel on any other configuration, so...

Please provide contact information if the submitter is not the primary contact:
tim.chavez.ibm.com 512-838-1317

Is this reproducible? Yes
If so, how long does it (did it) take to reproduce it? Describe the steps:
Boot the system with the 2.6.24.4-32ibmrt2.2debug kernel.

Is the system (not just the application) hung? No
If so, describe how you determined this:

Additional information:
This environment is an LS21 attached to a DS4700 via a couple of QLogic 4GB HBA cards (thus the need for RDAC) and has no local storage.

=Comment: #1================================================= TIMOTHY R. CHAVEZ <chavezt.com> - 2008-04-16 16:22 EDT

The "dmesg" output showing the BUG() and resulting stack trace.

=Comment: #2================================================= TIMOTHY R. CHAVEZ <chavezt.com> - 2008-04-16 17:29 EDT

I booted the vanilla, trace, and rt kernels on this same system / hardware configuration without hitting this bug. I'll attempt to boot the debug kernel on a system with a local storage configuration tomorrow morning and report my findings.

=Comment: #3================================================= TIMOTHY R. CHAVEZ <chavezt.com> - 2008-04-22 10:44 EDT

Just a note: Red Hat is also seeing this in testing. From Clark Williams @ Red Hat: This isn't a CONFIG_ option. It's a value defined in lockdep_internals.h and is currently defined as:

#define MAX_STACK_TRACE_ENTRIES 262144UL

That's pretty big...
Created attachment 306805 [details] The "dmesg" output showing the BUG() and resulting stack trace
------- Comment From jstultz.com 2008-06-18 19:02 EDT------- Has this issue been seen recently?
------- Comment From chavezt.com 2008-06-30 12:00 EDT------- I haven't seen it, but then again, I haven't been booting from SAN recently. Maybe Keith has seen it? I'm adding him to the CC list.
I have seen this problem on a non-SAN machine while trying to recreate bug 46204. The system took a really long time (45 minutes) to come up. The BUG message seen was:

BUG: MAX_STACK_TRACE_ENTRIES too low!
turning off the locking correctness validator.
Pid: 2112, comm: ip Not tainted 2.6.24.7-74ibmrt2.5debug #1

Call Trace:
 [<ffffffff810146b5>] ? save_stack_trace+0x2a/0x49
 [<ffffffff8105d851>] save_trace+0x93/0x9b
 [<ffffffff8105d8d7>] add_lock_to_list+0x7e/0xac
 [<ffffffff81060eb9>] __lock_acquire+0xb43/0xcdc
 [<ffffffff81067443>] ? rt_mutex_slowtrylock+0x18/0x85
 [<ffffffff810610e0>] lock_acquire+0x8e/0xb2
 [<ffffffff81067443>] ? rt_mutex_slowtrylock+0x18/0x85
 [<ffffffff812a7bd2>] __spin_lock_irqsave+0x40/0x73
 [<ffffffff81067443>] rt_mutex_slowtrylock+0x18/0x85
 [<ffffffff812a5694>] rt_mutex_trylock+0x9/0xb
 [<ffffffff812a7105>] rt_spin_lock+0x31/0x56
 [<ffffffff8127e194>] ip_mc_inc_group+0x176/0x232
 [<ffffffff8127e296>] ip_mc_up+0x46/0x64
 [<ffffffff81279947>] inetdev_event+0x263/0x470
 [<ffffffff810882d0>] ? __rcu_read_unlock+0x8c/0x95
 [<ffffffff812aa943>] notifier_call_chain+0x33/0x5b
 [<ffffffff81058001>] __raw_notifier_call_chain+0x9/0xb
 [<ffffffff81058012>] raw_notifier_call_chain+0xf/0x11
 [<ffffffff812305c2>] call_netdevice_notifiers+0x16/0x18
 [<ffffffff81231f2f>] dev_open+0x80/0x88
 [<ffffffff8123072f>] dev_change_flags+0xaf/0x16b
 [<ffffffff81279ee3>] devinet_ioctl+0x267/0x5f2
 [<ffffffff8127a686>] inet_ioctl+0x82/0xa0
 [<ffffffff81223dc1>] sock_ioctl+0x1e7/0x20c
 [<ffffffff810ce955>] do_ioctl+0x2d/0x83
 [<ffffffff810cec20>] vfs_ioctl+0x275/0x292
 [<ffffffff812a6a5b>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff810cec94>] sys_ioctl+0x57/0x7b
 [<ffffffff8100c248>] ? system_call+0xb8/0xef
 [<ffffffff8100c27f>] system_call_ret+0x0/0x6d
INFO: lockdep is turned off.
---------------------------
| preempt count: 00000001 ]
| 1-level deep critical section nesting:
----------------------------------------
.. [<ffffffff812a7bb5>] .... __spin_lock_irqsave+0x23/0x73
.....[<ffffffff81067443>] ..   ( <= rt_mutex_slowtrylock+0x18/0x85)
While working on bug #46204 (RH459478), Peter Zijlstra suggested trying a few patches recently committed to Linus' tree to see if they would solve this problem. They did not. He then asked me to try higher values of MAX_STACK_TRACE_ENTRIES. I changed MAX_STACK_TRACE_ENTRIES to 1.25 times its current value (327680) and still saw the problem. When I made it 1.5 times (393216), I did not see the problem. I have reported these results to Peter in an e-mail as well. He needs to decide whether it is okay to increase this value.
Created attachment 314559 [details] Patch to increase MAX_STACK_TRACE_ENTRIES Added a patch to increase MAX_STACK_TRACE_ENTRIES by 1.5x (to 393216), as per Sripathi's tests. This should go into our -78 kernel build.
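The attached patch is not quoted in this thread; based on the values reported above, it presumably amounts to a one-line change along these lines (file path as in 2.6.24-era kernels, exact whitespace assumed):

```diff
--- a/kernel/lockdep_internals.h
+++ b/kernel/lockdep_internals.h
@@
-#define MAX_STACK_TRACE_ENTRIES	262144UL
+#define MAX_STACK_TRACE_ENTRIES	393216UL
```

The trade-off is memory: the stack_trace array is an array of unsigned long, so on x86_64 the increase costs an extra 131072 entries * 8 bytes = 1 MiB of static kernel memory, which is only relevant on lockdep-enabled debug kernels anyway.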
Verified that patch (https://bugzilla.redhat.com/attachment.cgi?id=314559) is implemented as mrg-rt.git commit 842bb285febde3ae296de13c8c50da52e56878f7. Available in mrg-rt-2.6.24.7-81. Bug reproduced using 2.6.24.7-74 and went away with 2.6.24.7-81.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2008-0857.html