Bug 151295 - stack overflow message is alarmist and confusing
Status: CLOSED DUPLICATE of bug 151226
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i686  OS: Linux
Priority: medium  Severity: low
Assigned To: Dave Jones
QA Contact: Brian Brock
Reported: 2005-03-16 14:20 EST by craig harmer
Modified: 2015-01-04 17:17 EST
CC List: 3 users

Doc Type: Bug Fix
Last Closed: 2005-03-16 14:22:06 EST


Attachments: None
Description craig harmer 2005-03-16 14:20:13 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b) Gecko/20050217

Description of problem:
oh boy!  i know how this bug report is going to be received ...

the message produced by the linux kernel when we may be in danger of a stack
overflow looks like:

        do_IRQ: stack overflow: 4968
         [<c0106e0f>] dump_stack+0x16/0x18
         [<c0108934>] do_IRQ+0x4f/0x1b5
         [<c02dbe9c>] common_interrupt+0x18/0x20
         [<fa90cef9>] xted_unlockmap_check+0x18a/0x192 [vxfs]
         [<fa77a5f2>] vx_unlockmap+0x23/0x38b [vxfs]
         [<fa77a4b7>] vx_holdmap+0x177/0x17f [vxfs]
         [<fa755853>] vx_extmaptran+0x8b/0x96 [vxfs]
         [<fa755797>] vx_extmapchange+0x24e/0x27f [vxfs]
         [<fa75182a>] vx_extfind+0x3d5/0x3e1 [vxfs]
         ...
                                                                                
there are two problems here. the first is that we haven't actually overflowed;
we're only at risk of overflowing. the second is that the stack traceback omits
information that would be useful for investigating the problem.

i'd like the message to look something like:
                                                                                
        do_IRQ: stack overflow risk: 4968 bytes left
         [<c0106e0f>] [<d9277950>] dump_stack+0x16/0x18
         [<c0108934>] [<d9277a04>] do_IRQ+0x4f/0x1b5
         ...
                                                                                
where the initial message makes clear that we're at risk of a stack overflow,
with 4968 bytes left, but have not actually had one.

the stack "trace" also includes the stack address at which each return address
was found, as an aid to estimating the stack consumption of each function. it
also helps when deciphering stack traces that contain stale symbols left over in
the stack: if you know that a particular function appears in the trace and the
approximate size of its stack frame, it's easier to skip over stale symbols that
lie within that frame.

(suggestions for alternative formats are welcome).
                                                                                
the reason the message was produced with 4968 bytes still left is that we've
"cranked up" both the stack size and the warning threshold in the kernels we use
internally at veritas.

the code change needed would be roughly as follows (shown as pseudo-diffs, since
our code base has further local modifications):
                                                                                
arch/i386/kernel/irq.c in do_IRQ():
<                       printk("do_IRQ: stack overflow: %ld\n",
>                       printk("do_IRQ: stack overflow risk: %ld bytes left\n",
                                esp - sizeof(struct thread_info));
                                                                                
arch/i386/kernel/traps.c in print_context_stack():

#ifdef  CONFIG_FRAME_POINTER
        while (valid_stack_ptr(tinfo, (void *)ebp)) {
                addr = *(unsigned long *)(ebp + 4);
<               printk(" [<%08lx>] ", addr);
>               printk(" [<%08lx>] [<%08lx>] ", addr, ebp + 4);
                print_symbol("%s", addr);
                printk("\n");
                ebp = *(unsigned long *)ebp;
        }
#else
        while (valid_stack_ptr(tinfo, stack)) {
                addr = *stack++;
                if (__kernel_text_address(addr)) {
<                       printk(" [<%08lx>]", addr);
>                       /* stack was post-incremented: the slot is stack - 1 */
>                       printk(" [<%08lx>] [<%08lx>]", addr,
>                              (unsigned long)(stack - 1));
                        print_symbol(" %s", addr);
                        printk("\n");
                }
        }
#endif

this change only affects the x86 kernel. do we need to produce a similar patch
for the other architectures that Red Hat supports?

i have not investigated the effect of this output change on ksymoops.  we would
need to take that into account and choose a suitable format.
                                                                                
                                                                    


Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-5.EL

How reproducible:
Always

Steps to Reproduce:
1. load a driver that uses a large but not excessive amount of stack space
2. run the driver and wait for an interrupt to occur while the stack is deep
3. wait for the customer to call customer support, and try to explain to them
that they haven't actually had a stack overflow, only come close to one.
    

Actual Results:  i see the output that i included at the beginning of the
description.

Expected Results:  i would have liked to see the output i included in the middle
of the description.

Additional info:

we'd like to see this changed, since it would help our debugging and that of
anyone else who stares at deep-stack messages. but it's not a hot issue for us.
Comment 1 Dave Jones 2005-03-16 14:22:06 EST

*** This bug has been marked as a duplicate of 151226 ***
