Bug 151226 - stack overflow message is alarmist and confusing
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: Ingo Molnar
QA Contact: Brian Brock
Duplicates: 151295
Reported: 2005-03-15 23:35 EST by craig harmer
Modified: 2012-06-20 12:03 EDT
CC List: 2 users

Doc Type: Bug Fix
Last Closed: 2012-06-20 12:03:34 EDT

Attachments: None
Description craig harmer 2005-03-15 23:35:19 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b)

Description of problem:
oh boy!  i know how this bug report is going to be received ...

the message produced by the linux kernel when we may be in danger of a
stack overflow looks like:

        do_IRQ: stack overflow: 4968
         [<c0106e0f>] dump_stack+0x16/0x18
         [<c0108934>] do_IRQ+0x4f/0x1b5
         [<c02dbe9c>] common_interrupt+0x18/0x20
         [<fa90cef9>] xted_unlockmap_check+0x18a/0x192 [vxfs]
         [<fa77a5f2>] vx_unlockmap+0x23/0x38b [vxfs]
         [<fa77a4b7>] vx_holdmap+0x177/0x17f [vxfs]
         [<fa755853>] vx_extmaptran+0x8b/0x96 [vxfs]
         [<fa755797>] vx_extmapchange+0x24e/0x27f [vxfs]
         [<fa75182a>] vx_extfind+0x3d5/0x3e1 [vxfs]

there are two problems here.  the first is that we haven't actually
overflowed, we're only at risk of overflowing.  the second is that the
stack traceback omits useful information for investigating the problem.
i'd like the message to look something like:

        do_IRQ: stack overflow risk: 4968 bytes left
         [<c0106e0f>] [<0xd9277950>] dump_stack+0x16/0x18
         [<c0108934>] [<0xd9277a04>] do_IRQ+0x4f/0x1b5

where the initial message makes clear that we're at risk of a stack
overflow with 4968 bytes left, but have not actually had a stack
overflow.
the stack "trace" also includes the address on the stack where each
return address was found, as an aid to estimating the stack
consumption of each function.  it also helps when deciphering traces
that contain stale symbols left over in the stack: if you know that a
particular function appears in the trace and roughly how large its
stack frame is, it's easier to skip over stale symbols that lie within
that frame.

(suggestions for alternative formats are welcome).

the reason the message was produced with 4968 bytes left is that we've
"cranked up" both the stack size and the warning level in the kernels
we use internally at veritas.

the code changes needed would be as follows (shown as pseudo-diffs,
since our code base is further modified):

arch/i386/kernel/irq.c in do_IRQ():
<                       printk("do_IRQ: stack overflow: %ld\n",
>                       printk("do_IRQ: stack overflow risk: %ld bytes left\n",
                                esp - sizeof(struct thread_info));

arch/i386/kernel/traps.c in print_context_stack():

        while (valid_stack_ptr(tinfo, (void *)ebp)) {
                addr = *(unsigned long *)(ebp + 4);
<               printk(" [<%08lx>] ", addr);
>               printk(" [<%08lx>] [<%08lx>] ", addr, ebp + 4);
                print_symbol("%s", addr);
                ebp = *(unsigned long *)ebp;

        while (valid_stack_ptr(tinfo, stack)) {
                addr = *stack++;
                if (__kernel_text_address(addr)) {
<                       printk(" [<%08lx>]", addr);
>                       printk(" [<%08lx>] [<%08lx>]", addr,
>                              (unsigned long)(stack - 1));
                        print_symbol(" %s", addr);

this change only affects the x86 kernel.  do we need to produce a
similar patch for the other kernel architectures that Red Hat supports?

i have not investigated the effect of this output change on ksymoops.
we would need to take that into account and choose a suitable format.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. load a driver that uses a large but not excessive amount of stack space
2. run the driver and wait for an interrupt to occur while the stack
is deep
3. wait for the customer to call customer support, then try to explain
to them that they haven't had an actual stack overflow, only come close.

Actual Results:  i see the output that i included at the beginning of
the description.

Expected Results:  i would have liked to see the output i included in
the middle of the description.

Additional info:

we'd like to see this changed, since it will help our debugging and
anyone else who stares at deep stack messages.  but it's not a hot
issue for us.
Comment 1 Dave Jones 2005-03-16 14:22:20 EST
*** Bug 151295 has been marked as a duplicate of this bug. ***
Comment 2 Eric Sandeen 2008-09-24 21:36:08 EDT
The other problem with this is that at least on 4k stacks, the stack overflow warning message pretty much ensures that you will *actually* overflow (it consumes the remaining stack, and more).  But I think there's another bug that is addressing this at least.

I agree that the stack warning message could be improved; it should probably get fixed upstream first.
Comment 3 Jiri Pallich 2012-06-20 12:03:34 EDT
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release you requested us to review is now End of Life.
Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.
