Bug 178701

Summary: "bt" fails to find backtrace on x86_64 exception stacks
Product: Red Hat Enterprise Linux 3 Reporter: Dave Anderson <anderson>
Component: crash Assignee: Dave Anderson <anderson>
Status: CLOSED ERRATA QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: 3.0   
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: RHBA-2006-0457 Doc Type: Bug Fix
Last Closed: 2006-07-20 14:54:28 UTC Type: ---
Bug Depends On: 178694    
Bug Blocks: 181405    

Description Dave Anderson 2006-01-23 16:45:27 UTC
+++ This bug was initially created as a clone of Bug #178694 +++

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030611

Description of problem:

If a kernel crash occurs while running on any x86_64 exception stack
other than the NMI exception stack, the "bt" command fails to find
the proper starting point of the back trace.

Version-Release number of selected component (if applicable):
crash 4.0-2.15

How reproducible:
Always

Steps to Reproduce:
1. Load an x86_64 vmcore from a crash that occurred on an exception stack
other than the NMI exception stack, and run the "bt" command on the panic task.

  

Actual Results:  
The "bt" output will either show an invalid (stale) backtrace, or
will indicate: "bt: cannot determine starting stack pointer"

Expected Results:  
The "bt" output should show the backtrace activity on the relevant
exception stack, including the linkage back to the process stack.

Additional info:


This bug has been fixed in the upstream version of the crash utility.

Comment 1 Dave Anderson 2006-01-23 17:03:53 UTC
This is the complete description, taken from RHEL4 bugzilla #178694,
of which this bug is a clone.

The example is from a RHEL4 vmcore, but the crash utility code paths
used are identical.  The only difference is that RHEL3 uses 
NMI, STACKFAULT and DOUBLEFAULT exception stacks, while RHEL4
uses NMI, STACKFAULT, DOUBLEFAULT plus the DEBUG and MCE exception stacks.
The example below is from a crash occurring on a STACKFAULT exception
stack, common to both RHEL3 and RHEL4:

------------------------------------------------------------------------

Here is an example of a crash that occurred on the STACKFAULT
exception stack, using crash version 4.0-2.15, and the output
of "bt" on the panicking task:

# crash vmlinux-2.6.9-22.0.1.EL.TEST.81052.1smp carmen_vmcore_5

crash 4.0-2.15
Copyright (C) 2002, 2003, 2004, 2005  Red Hat, Inc.
Copyright (C) 2004, 2005  IBM Corporation
Copyright (C) 1999-2005  Hewlett-Packard Co
Copyright (C) 2005  Fujitsu Limited
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

      KERNEL: vmlinux-2.6.9-22.0.1.EL.TEST.81052.1smp
    DUMPFILE: carmen_vmcore_5
        CPUS: 4
        DATE: Fri Nov 18 01:34:07 2005
      UPTIME: 4 days, 12:36:02
LOAD AVERAGE: 23.73, 23.70, 18.17
       TASKS: 216
    NODENAME: livingston
     RELEASE: 2.6.9-22.0.1.EL.TEST.81052.1smp
     VERSION: #1 SMP Fri Nov 11 12:32:07 EST 2005
     MACHINE: x86_64  (3591 Mhz)
      MEMORY: 7 GB
       PANIC: ""
         PID: 17108
     COMMAND: "pdflush"
        TASK: 101b72f17f0  [THREAD_INFO: 100be7a0000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 17108  TASK: 101b72f17f0       CPU: 0   COMMAND: "pdflush"
 #0 [100be7a1ac8] schedule at ffffffff80304536
 #1 [100be7a1fd8] kernel_thread at ffffffff80110c9b
crash> 

This has been fixed in the "upstream" version of crash, version 4.0-2.16.
This description is from the crash changelog file:

4.0-2.16   Fix for the x86_64 backtrace code to search all of the exception
           stacks for the origin of the active tasks' backtrace when the
           information is not available in the dumpfile header.  Up until now,
           the search was made in the process stack, the per-cpu IRQ stack,
           and the per-cpu NMI exception stack; this patch looks at all 3 
           exception stacks in 2.4 kernels (NMI, STACKFAULT and DOUBLEFAULT), 
           and all 5 exception stacks in 2.6 kernels (NMI, STACKFAULT, 
           DOUBLEFAULT, DEBUG and MCE).
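
As a rough illustration of the changelog entry above, the difference in
search coverage can be sketched in Python. This is not the crash utility's
code: the stack names come from the changelog, while the function names,
the dictionary layout and the search order are assumptions made only for
this example.

```python
# Stack names are taken from the 4.0-2.16 changelog entry. Everything else
# (function names, data layout, search order) is hypothetical.
EXCEPTION_STACKS = {
    "2.4": ["NMI", "STACKFAULT", "DOUBLEFAULT"],
    "2.6": ["NMI", "STACKFAULT", "DOUBLEFAULT", "DEBUG", "MCE"],
}


def stacks_searched(kernel_series, fixed):
    """List the stacks examined when looking for the backtrace origin."""
    stacks = ["process", "IRQ"]
    if fixed:
        # 4.0-2.16 and later: every exception stack for this kernel series.
        stacks += EXCEPTION_STACKS[kernel_series]
    else:
        # Before the fix: only the NMI exception stack.
        stacks.append("NMI")
    return stacks


def can_find_origin(crash_stack, kernel_series, fixed):
    """True if the stack the crash occurred on is among those searched."""
    return crash_stack in stacks_searched(kernel_series, fixed)


# The STACKFAULT case from this report, on a 2.6 kernel.
print(can_find_origin("STACKFAULT", "2.6", fixed=False))
print(can_find_origin("STACKFAULT", "2.6", fixed=True))
```

With the pre-fix search list the STACKFAULT stack is never examined, which
is consistent with the stale or missing backtrace described above.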

And when running crash on the same vmcore as above, the proper trace is shown:

# crash vmlinux-2.6.9-22.0.1.EL.TEST.81052.1smp carmen_vmcore_5

crash 4.0-2.16
Copyright (C) 2002, 2003, 2004, 2005  Red Hat, Inc.
Copyright (C) 2004, 2005  IBM Corporation
Copyright (C) 1999-2005  Hewlett-Packard Co
Copyright (C) 2005  Fujitsu Limited
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

      KERNEL: vmlinux-2.6.9-22.0.1.EL.TEST.81052.1smp
    DUMPFILE: carmen_vmcore_5
        CPUS: 4
        DATE: Fri Nov 18 01:34:07 2005
      UPTIME: 4 days, 12:36:02
LOAD AVERAGE: 23.73, 23.70, 18.17
       TASKS: 216
    NODENAME: livingston
     RELEASE: 2.6.9-22.0.1.EL.TEST.81052.1smp
     VERSION: #1 SMP Fri Nov 11 12:32:07 EST 2005
     MACHINE: x86_64  (3591 Mhz)
      MEMORY: 7 GB
       PANIC: ""
         PID: 17108
     COMMAND: "pdflush"
        TASK: 101b72f17f0  [THREAD_INFO: 100be7a0000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 17108  TASK: 101b72f17f0       CPU: 0   COMMAND: "pdflush"
 #0 [ffffffff8044d3b0] start_disk_dump at ffffffffa013c28f
 #1 [ffffffff8044d3e0] try_crashdump at ffffffff8014a8be
 #2 [ffffffff8044d3f0] die at ffffffff8011195c
 #3 [ffffffff8044d410] do_stack_segment at ffffffff8011208e
 #4 [ffffffff8044d450] stack_segment at ffffffff80111101
    [exception RIP: origin_map+375]
    RIP: ffffffffa008f05a  RSP: 00000100be7a18d8  RFLAGS: 00010202
    RAX: 6b6b6b6b6b6b6b2b  RBX: 00000101b69cba28  RCX: 0000000300000000
    RDX: ffffffffa008e9aa  RSI: 0000000000000246  RDI: 0000000000000001
    RBP: 6b6b6b6b6b6b6b2b   R8: 0000010018480fc8   R9: 0000010153bbfc20
    R10: 0000000000000246  R11: 0000000000000246  R12: 00000100607df1f0
    R13: 0000010153bbfc20  R14: 0000010197dd3f18  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <exception stack> ---
 #5 [100be7a18d8] origin_map at ffffffffa008f05a
 #6 [100be7a1960] __map_bio at ffffffffa003c1f6
 #7 [100be7a1990] __split_bio at ffffffffa003c4b7
 #8 [100be7a19e0] __down_read at ffffffff80304d42
 #9 [100be7a1a20] dm_request at ffffffffa003c76e
#10 [100be7a1a40] generic_make_request at ffffffff8024abd2
#11 [100be7a1a60] recalc_task_prio at ffffffff801313c1
#12 [100be7a1a90] submit_bio at ffffffff8024acde
#13 [100be7a1ac0] bio_alloc at ffffffff8017c708
#14 [100be7a1af0] submit_bh at ffffffff8017a63a
#15 [100be7a1b20] __block_write_full_page at ffffffff8017b511
#16 [100be7a1b70] ext3_ordered_writepage at ffffffffa0063b46
#17 [100be7a1ba0] mpage_writepages at ffffffff80197851
#18 [100be7a1c80] thread_return at ffffffff80304560
#19 [100be7a1d50] dm_table_any_congested at ffffffffa003e358
#20 [100be7a1db0] __writeback_single_inode at ffffffff80196681
#21 [100be7a1df0] sync_sb_inodes at ffffffff80196d0e
#22 [100be7a1e30] writeback_inodes at ffffffff80196fa5
#23 [100be7a1e50] background_writeout at ffffffff8015d828
#24 [100be7a1ed0] pdflush at ffffffff8015e358
#25 [100be7a1f20] kthread at ffffffff8014a133
#26 [100be7a1f50] kernel_thread at ffffffff80110ca3
crash>

Note that the backtrace starts on the STACKFAULT exception stack,
and transitions back to the process stack.
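
The transition can be checked by hand from the addresses in the output
above. The short Python sketch below tests whether an address lies inside
the task's process stack. It assumes that the process stack starts at the
THREAD_INFO address and is 8 KB long; that size is an assumption about this
kernel and is not stated in the report. The helper name is hypothetical.

```python
# Addresses are copied from the "bt" output above.
THREAD_INFO = 0x100BE7A0000
THREAD_SIZE = 8192  # assumed process stack size for this kernel


def on_process_stack(addr, base=THREAD_INFO, size=THREAD_SIZE):
    """Return True if addr lies within [base, base + size)."""
    return base <= addr < base + size


exception_rsp = 0x00000100BE7A18D8  # RSP saved in the exception frame
first_frame = 0xFFFFFFFF8044D3B0    # frame #0, start_disk_dump
last_frame = 0x100BE7A1F50          # frame #26, kernel_thread

print(on_process_stack(exception_rsp))
print(on_process_stack(first_frame))
print(on_process_stack(last_frame))
```

Under that assumption, the saved RSP and frames #5 through #26 fall inside
the process stack, while frames #0 through #4 do not, matching the
"<exception stack>" boundary shown in the trace.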



Comment 10 Red Hat Bugzilla 2006-07-20 14:54:28 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0457.html