From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b) Gecko/20050217

Description of problem:
oh boy! i know how this bug report is going to be received ...

the message the linux kernel produces when we may be in danger of a stack overflow looks like this:

  do_IRQ: stack overflow: 4968
   [<c0106e0f>] dump_stack+0x16/0x18
   [<c0108934>] do_IRQ+0x4f/0x1b5
   [<c02dbe9c>] common_interrupt+0x18/0x20
   [<fa90cef9>] xted_unlockmap_check+0x18a/0x192 [vxfs]
   [<fa77a5f2>] vx_unlockmap+0x23/0x38b [vxfs]
   [<fa77a4b7>] vx_holdmap+0x177/0x17f [vxfs]
   [<fa755853>] vx_extmaptran+0x8b/0x96 [vxfs]
   [<fa755797>] vx_extmapchange+0x24e/0x27f [vxfs]
   [<fa75182a>] vx_extfind+0x3d5/0x3e1 [vxfs]
   ...

there are two problems here. the first is that we haven't actually overflowed; we're only at risk of overflowing. the second is that the stack traceback omits information that is useful for investigating the problem. i'd like the message to look something like:

  do_IRQ: stack overflow risk: 4968 bytes left
   [<c0106e0f>] [<0xd9277950>] dump_stack+0x16/0x18
   [<c0108934>] [<0xd9277a04>] do_IRQ+0x4f/0x1b5
   ...

here the first line makes clear that we're at risk of a stack overflow with 4968 bytes left, but have not actually had one. the stack "trace" also includes the address on the stack where each return address was found, as an aid to estimating the stack consumption of each function. it's also useful when deciphering stack traces that contain stale symbols: if you know that a particular function appears in the trace, and roughly how big its stack frame is, it's easier to skip over stale symbols that lie within that frame. (suggestions for alternative formats are welcome.)

the reason the message was produced with 4968 bytes left is that we've "cranked up" both the stack size and the warning level in the kernels we use internally at veritas.
the code change needed to effect this would be (shown as pseudo-diffs, since our code base is further modified):

arch/i386/kernel/irq.c, in do_IRQ():

  <   printk("do_IRQ: stack overflow: %ld\n",
  >   printk("do_IRQ: stack overflow risk: %ld bytes left\n",
          esp - sizeof(struct thread_info));

arch/i386/kernel/traps.c, in print_context_stack():

  #ifdef CONFIG_FRAME_POINTER
      while (valid_stack_ptr(tinfo, (void *)ebp)) {
          addr = *(unsigned long *)(ebp + 4);
  <       printk(" [<%08lx>] ", addr);
  >       printk(" [<%08lx>] [<%08lx>] ", addr, ebp + 4);
          print_symbol("%s", addr);
          printk("\n");
          ebp = *(unsigned long *)ebp;
      }
  #else
      while (valid_stack_ptr(tinfo, stack)) {
          addr = *stack++;
          if (__kernel_text_address(addr)) {
  <           printk(" [<%08lx>]", addr);
  >           /* stack has already been advanced one slot past addr */
  >           printk(" [<%08lx>] [<%08lx>]", addr, (unsigned long)(stack - 1));
              print_symbol(" %s", addr);
              printk("\n");
          }
      }
  #endif

this change only affects the x86 kernel. do we need to produce a similar patch for the other kernel architectures that Red Hat supports?

i have not investigated the effect of this output change on ksymoops. we would need to take that into account and choose a suitable format.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-5.EL

How reproducible:
Always

Steps to Reproduce:
1. load a driver that uses a large but not excessive amount of stack space.
2. run the driver and wait for an interrupt to occur while the stack is deep.
3. wait for the customer to call customer support, then try to explain to them that they haven't had an actual stack overflow, they just came close.

Actual Results:
i see the output included at the beginning of the description.

Expected Results:
i would have liked to see the output included in the middle of the description.

Additional info:
we'd like to see this changed, since it will help our debugging and that of anyone else who stares at deep-stack messages. but it's not a hot issue for us.
*** This bug has been marked as a duplicate of 151226 ***