From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030611

Description of problem:
If a kernel crash occurs while running on any x86_64 exception stack other than the NMI exception stack, the "bt" command fails to find the proper starting point of the back trace.

Version-Release number of selected component (if applicable):
crash 4.0-2.15

How reproducible:
Always

Steps to Reproduce:
1. Looking at an x86_64 crash that has occurred on an exception stack other than the NMI exception stack, run the "bt" command on that panic task.

Actual Results:
The "bt" output will either show an invalid (stale) backtrace, or will indicate: "bt: cannot determine starting stack pointer"

Expected Results:
The "bt" output should show the backtrace activity on the relevant exception stack, including the linkage back to the process stack.

Additional info:
This bug has been fixed in the upstream version of the crash utility.
Here is an example of a crash that occurred on the STACKFAULT exception stack, using crash version 4.0-2.15, and the output of "bt" on the panicking task:

# crash vmlinux-2.6.9-22.0.1.EL.TEST.81052.1smp carmen_vmcore_5

crash 4.0-2.15
Copyright (C) 2002, 2003, 2004, 2005 Red Hat, Inc.
Copyright (C) 2004, 2005 IBM Corporation
Copyright (C) 1999-2005 Hewlett-Packard Co
Copyright (C) 2005 Fujitsu Limited
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

      KERNEL: vmlinux-2.6.9-22.0.1.EL.TEST.81052.1smp
    DUMPFILE: carmen_vmcore_5
        CPUS: 4
        DATE: Fri Nov 18 01:34:07 2005
      UPTIME: 4 days, 12:36:02
LOAD AVERAGE: 23.73, 23.70, 18.17
       TASKS: 216
    NODENAME: livingston
     RELEASE: 2.6.9-22.0.1.EL.TEST.81052.1smp
     VERSION: #1 SMP Fri Nov 11 12:32:07 EST 2005
     MACHINE: x86_64 (3591 Mhz)
      MEMORY: 7 GB
       PANIC: ""
         PID: 17108
     COMMAND: "pdflush"
        TASK: 101b72f17f0 [THREAD_INFO: 100be7a0000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 17108 TASK: 101b72f17f0 CPU: 0 COMMAND: "pdflush"
 #0 [100be7a1ac8] schedule at ffffffff80304536
 #1 [100be7a1fd8] kernel_thread at ffffffff80110c9b
crash>

This has been fixed in the "upstream" version of crash, version 4.0-2.16.
This description is from the crash changelog file:

4.0-2.16
  Fix for the x86_64 backtrace code to search all of the exception
  stacks for the origin of the active tasks' backtrace when the
  information is not available in the dumpfile header. Up until now,
  the search was made in the process stack, the per-cpu IRQ stack, and
  the per-cpu NMI exception stack; this patch looks at all 3 exception
  stacks in 2.4 kernels (NMI, STACKFAULT and DOUBLEFAULT), and all 5
  exception stacks in 2.6 kernels (NMI, STACKFAULT, DOUBLEFAULT, DEBUG
  and MCE).

And when running crash on the same vmcore as above, the proper trace is shown:

# crash vmlinux-2.6.9-22.0.1.EL.TEST.81052.1smp carmen_vmcore_5

crash 4.0-2.16
Copyright (C) 2002, 2003, 2004, 2005 Red Hat, Inc.
Copyright (C) 2004, 2005 IBM Corporation
Copyright (C) 1999-2005 Hewlett-Packard Co
Copyright (C) 2005 Fujitsu Limited
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

      KERNEL: vmlinux-2.6.9-22.0.1.EL.TEST.81052.1smp
    DUMPFILE: carmen_vmcore_5
        CPUS: 4
        DATE: Fri Nov 18 01:34:07 2005
      UPTIME: 4 days, 12:36:02
LOAD AVERAGE: 23.73, 23.70, 18.17
       TASKS: 216
    NODENAME: livingston
     RELEASE: 2.6.9-22.0.1.EL.TEST.81052.1smp
     VERSION: #1 SMP Fri Nov 11 12:32:07 EST 2005
     MACHINE: x86_64 (3591 Mhz)
      MEMORY: 7 GB
       PANIC: ""
         PID: 17108
     COMMAND: "pdflush"
        TASK: 101b72f17f0 [THREAD_INFO: 100be7a0000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 17108 TASK: 101b72f17f0 CPU: 0 COMMAND: "pdflush"
 #0 [ffffffff8044d3b0] start_disk_dump at ffffffffa013c28f
 #1 [ffffffff8044d3e0] try_crashdump at ffffffff8014a8be
 #2 [ffffffff8044d3f0] die at ffffffff8011195c
 #3 [ffffffff8044d410] do_stack_segment at ffffffff8011208e
 #4 [ffffffff8044d450] stack_segment at ffffffff80111101
    [exception RIP: origin_map+375]
    RIP: ffffffffa008f05a  RSP: 00000100be7a18d8  RFLAGS: 00010202
    RAX: 6b6b6b6b6b6b6b2b  RBX: 00000101b69cba28  RCX: 0000000300000000
    RDX: ffffffffa008e9aa  RSI: 0000000000000246  RDI: 0000000000000001
    RBP: 6b6b6b6b6b6b6b2b   R8: 0000010018480fc8   R9: 0000010153bbfc20
    R10: 0000000000000246  R11: 0000000000000246  R12: 00000100607df1f0
    R13: 0000010153bbfc20  R14: 0000010197dd3f18  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <exception stack> ---
 #5 [100be7a18d8] origin_map at ffffffffa008f05a
 #6 [100be7a1960] __map_bio at ffffffffa003c1f6
 #7 [100be7a1990] __split_bio at ffffffffa003c4b7
 #8 [100be7a19e0] __down_read at ffffffff80304d42
 #9 [100be7a1a20] dm_request at ffffffffa003c76e
#10 [100be7a1a40] generic_make_request at ffffffff8024abd2
#11 [100be7a1a60] recalc_task_prio at ffffffff801313c1
#12 [100be7a1a90] submit_bio at ffffffff8024acde
#13 [100be7a1ac0] bio_alloc at ffffffff8017c708
#14 [100be7a1af0] submit_bh at ffffffff8017a63a
#15 [100be7a1b20] __block_write_full_page at ffffffff8017b511
#16 [100be7a1b70] ext3_ordered_writepage at ffffffffa0063b46
#17 [100be7a1ba0] mpage_writepages at ffffffff80197851
#18 [100be7a1c80] thread_return at ffffffff80304560
#19 [100be7a1d50] dm_table_any_congested at ffffffffa003e358
#20 [100be7a1db0] __writeback_single_inode at ffffffff80196681
#21 [100be7a1df0] sync_sb_inodes at ffffffff80196d0e
#22 [100be7a1e30] writeback_inodes at ffffffff80196fa5
#23 [100be7a1e50] background_writeout at ffffffff8015d828
#24 [100be7a1ed0] pdflush at ffffffff8015e358
#25 [100be7a1f20] kthread at ffffffff8014a133
#26 [100be7a1f50] kernel_thread at ffffffff80110ca3
crash>

Note that the backtrace starts on the STACKFAULT exception stack, and transitions back to the process stack.
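
The changelog entry describes widening the search for the backtrace origin from three stacks (process, IRQ, NMI) to every exception stack. As a rough illustration of that idea, and not the crash utility's actual code, the Python sketch below checks which stack region contains a given address. The THREAD_INFO address, the frame #0 address and the exception RSP are taken from the session output; the 8 KB process stack size is an assumption, and every IRQ and exception stack base and size is a made-up value, chosen only so that the STACKFAULT region contains frame #0. The names `Stack`, `find_stack`, `OLD_SEARCH` and `NEW_SEARCH` are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Stack(NamedTuple):
    name: str
    base: int  # lowest address of the stack region
    size: int  # size of the region in bytes

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


def find_stack(addr: int, stacks: Sequence[Stack]) -> Optional[str]:
    """Return the name of the first stack region containing addr, else None."""
    for stack in stacks:
        if stack.contains(addr):
            return stack.name
    return None


# THREAD_INFO is from the session output; the 8 KB size is an assumption.
PROCESS = Stack("process", 0x100BE7A0000, 0x2000)

# Hypothetical layout: none of these bases or sizes come from the vmcore.
IRQ = Stack("IRQ", 0xFFFFFFFF80460000, 0x4000)
NMI = Stack("NMI", 0xFFFFFFFF8044C000, 0x1000)
STACKFAULT = Stack("STACKFAULT", 0xFFFFFFFF8044D000, 0x1000)
DOUBLEFAULT = Stack("DOUBLEFAULT", 0xFFFFFFFF8044E000, 0x1000)
DEBUG = Stack("DEBUG", 0xFFFFFFFF8044F000, 0x1000)
MCE = Stack("MCE", 0xFFFFFFFF80450000, 0x1000)

# Search set before the fix, and the wider set for 2.6 kernels after it.
OLD_SEARCH = [PROCESS, IRQ, NMI]
NEW_SEARCH = OLD_SEARCH + [STACKFAULT, DOUBLEFAULT, DEBUG, MCE]

FRAME0 = 0xFFFFFFFF8044D3B0         # frame #0 address in the 4.0-2.16 output
EXCEPTION_RSP = 0x00000100BE7A18D8  # RSP saved in the exception frame

print(find_stack(FRAME0, OLD_SEARCH))
print(find_stack(FRAME0, NEW_SEARCH))
print(find_stack(EXCEPTION_RSP, NEW_SEARCH))
```

Under these assumptions, the narrower search finds no stack containing frame #0, which is consistent with the "cannot determine starting stack pointer" symptom, while the wider search places it on the STACKFAULT stack. The saved RSP falls inside the process stack, matching the transition shown at the "--- <exception stack> ---" marker.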
This issue is on Red Hat Engineering's list of planned work items for the upcoming Red Hat Enterprise Linux 4.4 release. Engineering resources have been assigned and, barring unforeseen circumstances, Red Hat intends to include this item in the 4.4 release.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2006-0478.html