Bug 649051

Summary: KVM dumpfiles: x86 backtrace fixes on live and crashed systems
Product: Red Hat Enterprise Linux 6 Reporter: Dave Anderson <anderson>
Component: crash Assignee: Dave Anderson <anderson>
Status: CLOSED ERRATA QA Contact: Kernel Dump QE <kernel-dump-qe>
Severity: high Docs Contact:
Priority: low    
Version: 6.0 CC: phan, qcai
Target Milestone: rc   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: crash-5.1.1-1.el6 Doc Type: Bug Fix
Doc Text:
When analyzing a KVM dump file from an x86 guest system, the crash utility was unable to determine the starting EIP and ESP hooks, and produced an invalid backtrace. With this update, the crash utility has been updated to use the 64-bit CPU device format in x86 KVM dump files by default, and only use the 32-bit format when it is determined that the host machine was running a 32-bit kernel. As a result, running the "bt" command when analyzing such a dump file now produces a correct backtrace.
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-05-19 13:04:10 UTC Type: ---
Bug Depends On: 649070    
Bug Blocks:    

Description Dave Anderson 2010-11-02 20:29:35 UTC
Description of problem:

The crash utility requires three significant x86 backtrace-related
fixes that have been applied upstream in crash version 5.0.9:

 - Fix to utilize the correct "cpu" device format in x86 KVM dumpfiles.
   Without the patch, the x86 registers were read in a 32-bit format,
   which is only correct if the host machine was running a 32-bit kernel.
   With the patch, the format defaults to the 64-bit format, and is
   switched to the 32-bit format if it can be determined that the host
   machine was running a 32-bit kernel.
   (hutao.com, anderson)

 - Save the per-cpu register contents stored in the "cpu" devices of
   x86 KVM dumpfiles, and use their contents for x86 backtrace ESP and
   EIP hooks in the case of KVM "live dumps", i.e., where the guest
   system was not in a crashed state when the "virsh dump" operation
   was done on the KVM host.  If an active task was running in user
   space when a live dump was taken, that will be indicated by the
   "bt" output, along with the user-space register contents.  The saved
   x86 register set for each cpu may also be displayed with the
   "help -[D|n]" command.
   (hutao.com, anderson)

 - Fix for the x86 "bt" command to correctly find the starting backtrace
   EIP and ESP hooks for the active tasks in KVM dumpfiles where the
   kernel had crashed.
   (anderson)
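
The effect of reading the saved registers with the wrong width can be
illustrated with a small Python sketch. It assumes, purely for illustration,
that the registers are stored as consecutive little-endian integers; the
actual layout of the QEMU "cpu" device is not reproduced here, and the
function names are hypothetical rather than taken from the crash sources.
The sample values are the EAX, EBX, ECX and EDX contents of cpu 0 from the
live dump shown under "Expected results".

```python
import struct

# Sample values: EAX, EBX, ECX, EDX of cpu 0 in the live dump.
SAVED = (0x08dea56c, 0x04f57024, 0x08dea564, 0x08dea564)


def pack_regs(values, width_bits):
    """Pack register values as consecutive little-endian integers."""
    code = "Q" if width_bits == 64 else "I"
    return struct.pack("<%d%s" % (len(values), code), *values)


def unpack_regs(blob, count, width_bits):
    """Read `count` registers from the start of `blob` using the given width."""
    code = "Q" if width_bits == 64 else "I"
    return struct.unpack_from("<%d%s" % (count, code), blob)


def choose_width(host_is_32bit):
    """Default to 64-bit; use 32-bit only when the host is known to be 32-bit."""
    return 32 if host_is_32bit is True else 64


# Assumption: a 64-bit host saved the guest registers as 64-bit values.
blob = pack_regs(SAVED, 64)
wrong = unpack_regs(blob, 4, 32)                   # always reading 32-bit
right = unpack_regs(blob, 4, choose_width(None))   # 64-bit by default

print([hex(v) for v in wrong])
print([hex(v) for v in right])
```

Under these assumptions the 32-bit read returns EAX correctly but then picks
up the zeroed upper half of EAX as "EBX" and the real EBX as "ECX", so any
ESP and EIP taken from such a read would be unusable as backtrace hooks.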

Version-Release number of selected component (if applicable):

crash-5.0.0-23.el6

How reproducible:

Always.

Steps to Reproduce:

1. Forcibly crash an x86 guest system, then do a "virsh dump" on the guest 
2. Perform a "live dump" on a running x86 guest system by doing a "virsh dump"
   on the guest (without crashing the guest.)
3. Run crash on the dumpfile, and execute the "bt -a" command
  
Actual results:

On a sample live-system dump taken with "virsh dump", the two active
user-mode tasks on cpus 0 and 2 and the two active idle tasks on 
cpus 1 and 3 all incorrectly show that they have blocked in schedule(),
because crash 5.0.0-23.el6 cannot determine the starting ESP/EIP
hooks for the backtraces:
  
  crash> bt -a
  PID: 20931  TASK: f27baa90  CPU: 0   COMMAND: "gs"
   #0 [d2411f30] schedule at c080b372
   #1 [d2411fb0] ia32_sysenter_target at c0409a6e
      EAX: 00001000  EBX: 00000006  ECX: b73bd000  EDX: 00001000 
      DS:  007b      ESI: 08f04288  ES:  007b      EDI: 00000800
      SS:  007b      ESP: bff6c4b8  EBP: bff6c4f4  GS:  0033
      CS:  0073      EIP: 0086f424  ERR: 00000003  EFLAGS: 00000246 
  
  PID: 0      TASK: f7069030  CPU: 1   COMMAND: "swapper"
   #0 [f709fefc] schedule at c080b372
   #1 [f709ff7c] cpu_idle at c040887e
  
  PID: 20971  TASK: e72e0030  CPU: 2   COMMAND: "lpc"
   #0 [d255fe80] schedule at c080b372
   #1 [d255ff08] handle_mm_fault at c04fa81d
   #2 [d255ff38] do_page_fault at c080f8eb
   #3 [d255ffb0] error_code (via page_fault) at c080d809
      EAX: 00000000  EBX: 00cb0ff4  ECX: 01d55008  EDX: 00000021 
      DS:  007b      ESI: 01d54fe8  ES:  007b      EDI: 00010ff9
      SS:  007b      ESP: bf9b28e0  EBP: bf9b2968  GS:  0033
      CS:  0073      EIP: 00b9b527  ERR: ffffffff  EFLAGS: 00010246 
  
  PID: 0      TASK: f70b1560  CPU: 3   COMMAND: "swapper"
   #0 [f70d3efc] schedule at c080b372
   #1 [f70d3f7c] cpu_idle at c040887e
  crash>
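
The symptom above can be checked mechanically. The following Python sketch
parses captured "bt -a" text and reports whether every active task shows
schedule() as frame #0. It is a heuristic written for this report, not
part of crash; the regular expressions assume the output layout shown above,
and the sample text is abridged from it.

```python
import re

# Abridged from the "bt -a" output above.
BT_OUTPUT = """\
PID: 20931  TASK: f27baa90  CPU: 0   COMMAND: "gs"
 #0 [d2411f30] schedule at c080b372
 #1 [d2411fb0] ia32_sysenter_target at c0409a6e
PID: 0      TASK: f7069030  CPU: 1   COMMAND: "swapper"
 #0 [f709fefc] schedule at c080b372
PID: 20971  TASK: e72e0030  CPU: 2   COMMAND: "lpc"
 #0 [d255fe80] schedule at c080b372
PID: 0      TASK: f70b1560  CPU: 3   COMMAND: "swapper"
 #0 [f70d3efc] schedule at c080b372
"""

TASK_RE = re.compile(r"^\s*PID:\s*\d+\s+TASK:\s*[0-9a-f]+\s+CPU:\s*(\d+)")
FRAME0_RE = re.compile(r"^\s*#0\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+")


def first_frames(text):
    """Map each CPU number to the symbol shown in frame #0 of its task."""
    frames = {}
    cpu = None
    for line in text.splitlines():
        m = TASK_RE.match(line)
        if m:
            cpu = int(m.group(1))
            continue
        m = FRAME0_RE.match(line)
        if m and cpu is not None:
            frames[cpu] = m.group(1)
    return frames


def looks_like_missing_hooks(text):
    """True when every active task appears to have blocked in schedule()."""
    frames = first_frames(text)
    return bool(frames) and all(sym == "schedule" for sym in frames.values())


print(first_frames(BT_OUTPUT))
print(looks_like_missing_hooks(BT_OUTPUT))
```

A task that is active on a cpu cannot also be blocked in schedule(), so a
True result for a live dump suggests that the starting hooks were not found.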

Expected results:

crash 5.0.9 uses the saved registers to display the precise location
where the 4 tasks were running when the "virsh dump" was performed:

  crash> bt -a
  PID: 20931  TASK: f27baa90  CPU: 0   COMMAND: "gs"
      EAX: 08dea56c  EBX: 04f57024  ECX: 08dea564  EDX: 08dea564
      DS:  007b      ESI: 08dea56c  ES:  007b      EDI: 08df5d34
      SS:  007b      ESP: bff6c6bc  EBP: bff6c6d8  GS:  0033
      CS:  0073      EIP: 04ade1f0  EFLAGS: 00000206
   #0 [user space]
  
  PID: 0      TASK: f7069030  CPU: 1   COMMAND: "swapper"
      EAX: f709e000  EBX: 00000000  ECX: 00000000  EDX: 00000000
      DS:  007b      ESI: 00000001  ES:  007b      EDI: 00000000
      SS:  0068      ESP: f709ff74  EBP: f709ffb0  GS:  00e0
      CS:  0060      EIP: c042eb62  EFLAGS: 00000246
   #0 [f709ff74] native_safe_halt at c042eb62
   #1 [f709ff7c] cpu_idle at c040887e
  
  PID: 20971  TASK: e72e0030  CPU: 2   COMMAND: "lpc"
      EAX: 00001108  EBX: 00e0c388  ECX: 00000003  EDX: bf9b2a01
      DS:  007b      ESI: 01d4a788  ES:  007b      EDI: 01d48ac8
      SS:  007b      ESP: bf9b29e0  EBP: bf9b2bb8  GS:  0033
      CS:  0073      EIP: 00e065cb  EFLAGS: 00000202
   #0 [user space]
  
  PID: 0      TASK: f70b1560  CPU: 3   COMMAND: "swapper"
      EAX: f70d2000  EBX: 00000000  ECX: 00000000  EDX: 00000000
      DS:  007b      ESI: 00000003  ES:  007b      EDI: 00000000
      SS:  0068      ESP: f70d3f74  EBP: f70d3fb0  GS:  00e0
      CS:  0060      EIP: c042eb62  EFLAGS: 00000246
   #0 [f70d3f74] native_safe_halt at c042eb62
   #1 [f70d3f7c] cpu_idle at c040887e
  crash>
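
The user-space versus kernel distinction in the output above is consistent
with the saved CS values: on x86 the low two bits of the CS selector hold
the current privilege level, with ring 3 used for user mode and ring 0 for
the kernel. The Python sketch below applies that rule to the four CS values
shown. It illustrates the architectural rule only and is not a claim about
how crash itself makes the decision; the function names are hypothetical.

```python
def privilege_level(cs):
    """Return the privilege level held in the low two bits of a CS selector."""
    return cs & 0x3


def is_user_mode(cs):
    """Ring 3 is user mode on Linux; ring 0 is the kernel."""
    return privilege_level(cs) == 3


# CS values taken from the four register sets shown above (cpus 0-3).
SAVED_CS = {0: 0x0073, 1: 0x0060, 2: 0x0073, 3: 0x0060}

for cpu, cs in sorted(SAVED_CS.items()):
    where = "user space" if is_user_mode(cs) else "kernel"
    print("cpu %d: CS %04x -> %s" % (cpu, cs, where))
```

With these values, cpus 0 and 2 are classified as user space and cpus 1 and
3 as kernel, matching the "#0 [user space]" and native_safe_halt frames.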

Actual results:

When an actual crash occurred prior to taking the "virsh dump",
crash 5.0.0-23.el6 displays incorrect backtraces for the panic
task on cpu 3 and the other active tasks on cpus 0, 1 and 2:

  # crash vmlinux guest32-crash2
  
  crash 5.0.0-23.el6
  
  ... [ cut ] ...
  
  crash> bt -a
  PID: 8860   TASK: f12a8a90  CPU: 0   COMMAND: "spell"
   #0 [f12c5ee0] schedule at c080b372
   #1 [f12c5f68] remove_vma at c04fd0f7
   #2 [f12c5f80] copy_to_user at c05eb51e
   #3 [f12c5f98] audit_syscall_exit at c04a9651
   #4 [f12c5fb0] error_code at c080d809
      EAX: 003f7580  EBX: ffffffff  ECX: 00000000  EDX: 00000001 
      DS:  007b      ESI: 003f7580  ES:  007b      EDI: 00000000
      SS:  007b      ESP: bfaf3eb0  EBP: bfaf3ed8  GS:  0033
      CS:  0073      EIP: 0806f8da  ERR: ffffffff  EFLAGS: 00010286 
  
  PID: 8847   TASK: f23f8030  CPU: 1   COMMAND: "gunzip"
   #0 [f12d3f30] schedule at c080b372
   #1 [f12d3fb0] error_code at c080d809
      EAX: 0026851c  EBX: 00267fc4  ECX: 00000000  EDX: 00000000 
      DS:  007b      ESI: 00000028  ES:  007b      EDI: 00000000
      SS:  007b      ESP: bfdff800  EBP: bfdff848  GS:  0000
      CS:  0073      EIP: 00252490  ERR: ffffffff  EFLAGS: 00010246 
  
  PID: 8857   TASK: f1176560  CPU: 2   COMMAND: "gsdj500"
   #0 [f11a3fb4] ret_from_fork at c0409920
   #1 [f11a3fb0] error_code at c080d809
      EAX: 003f885c  EBX: 003f6ff4  ECX: 003f83a0  EDX: 00000000 
      DS:  007b      ESI: 00000000  ES:  007b      EDI: 00268000
      SS:  007b      ESP: bfc5a730  EBP: bfc5a768  GS:  0033
      CS:  0073      EIP: 0030cb4b  ERR: ffffffff  EFLAGS: 00010246 
  
  PID: 5021   TASK: f20f6a90  CPU: 3   COMMAND: "bash"
   #0 [f114be00] schedule at c080b372
   #1 [f114beac] mntput_no_expire at c0537a00
   #2 [f114bec0] do_filp_open at c052c118
   #3 [f114bf74] vfs_write at c051de4e
   #4 [f114bf94] sys_write at c051e8cc
   #5 [f114bfb0] ia32_sysenter_target at c04099f4
      EAX: 00000004  EBX: 00000001  ECX: b788e000  EDX: 00000002 
      DS:  007b      ESI: 00000002  ES:  007b      EDI: b788e000
      SS:  007b      ESP: bfcd5f2c  EBP: bfcd5f64  GS:  0033
      CS:  0073      EIP: 00ef7424  ERR: 00000004  EFLAGS: 00000246 
  crash> 

Expected results:

crash 5.0.9 displays them all correctly:
  
  # crash vmlinux guest32-crash2
  
  crash 5.0.9
  
  ... [ cut ] ...
  
  crash> bt -a
  PID: 8860   TASK: f12a8a90  CPU: 0   COMMAND: "spell"
   #0 [f12c5e34] stop_this_cpu at c0410bb9
   #1 [f12c5e3c] reboot_interrupt at c040a200
      EAX: f12c4000  EBX: c23987c0  ECX: 00000163  EDX: 0000000f  EBP: c2191bc0 
      DS:  007b      ESI: 00000000  ES:  007b      EDI: 00000163  GS:  00e0
      CS:  0060      EIP: c0436b28  ERR: ffffff07  EFLAGS: 00000286 
   #2 [f12c5e70] kmap_atomic_prot at c0436b28
   #3 [f12c5e98] kmap_atomic at c0436c77
   #4 [f12c5ea8] handle_pte_fault at c04fa4d1
   #5 [f12c5f08] handle_mm_fault at c04fa81d
   #6 [f12c5f38] do_page_fault at c080f85f
   #7 [f12c5fb0] error_code (via page_fault) at c080d809
      EAX: 003f7580  EBX: ffffffff  ECX: 00000000  EDX: 00000001 
      DS:  007b      ESI: 003f7580  ES:  007b      EDI: 00000000
      SS:  007b      ESP: bfaf3eb0  EBP: bfaf3ed8  GS:  0033
      CS:  0073      EIP: 0806f8da  ERR: ffffffff  EFLAGS: 00010286 
  
  PID: 8847   TASK: f23f8030  CPU: 1   COMMAND: "gunzip"
   #0 [f12d3dfc] stop_this_cpu at c0410bb9
   #1 [f12d3e04] reboot_interrupt at c040a200
      EAX: fff83000  EBX: fff83000  ECX: c000ac18  EDX: fff83000  EBP: c23c9b80 
      DS:  007b      ESI: 0007c000  ES:  007b      EDI: 00000000  GS:  00e0
      CS:  0060      EIP: c042f1e3  ERR: ffffff07  EFLAGS: 00000287 
   #2 [f12d3e38] native_flush_tlb_single at c042f1e3
   #3 [f12d3e44] kunmap_atomic at c0436ac2
   #4 [f12d3e50] __do_fault at c04f8ade
   #5 [f12d3ea8] handle_pte_fault at c04f9a18
   #6 [f12d3f08] handle_mm_fault at c04fa81d
   #7 [f12d3f38] do_page_fault at c080f85f
   #8 [f12d3fb0] error_code (via page_fault) at c080d809
      EAX: 0026851c  EBX: 00267fc4  ECX: 00000000  EDX: 00000000 
      DS:  007b      ESI: 00000028  ES:  007b      EDI: 00000000
      SS:  007b      ESP: bfdff800  EBP: bfdff848  GS:  0000
      CS:  0073      EIP: 00252490  ERR: ffffffff  EFLAGS: 00010246 
  
  PID: 8857   TASK: f1176560  CPU: 2   COMMAND: "gsdj500"
   #0 [f11a3e54] stop_this_cpu at c0410bb9
   #1 [f11a3e5c] reboot_interrupt at c040a200
      EAX: fff5c000  EBX: fff5c000  ECX: c000aae0  EDX: fff5c000  EBP: 00000000 
      DS:  007b      ESI: 000a3000  ES:  007b      EDI: 00000000  GS:  00e0
      CS:  0060      EIP: c042f1e3  ERR: ffffff07  EFLAGS: 00000293 
   #2 [f11a3e90] native_flush_tlb_single at c042f1e3
   #3 [f11a3e9c] kunmap_atomic at c0436ac2
   #4 [f11a3ea8] handle_pte_fault at c04fa138
   #5 [f11a3f08] handle_mm_fault at c04fa81d
   #6 [f11a3f38] do_page_fault at c080f85f
   #7 [f11a3fb0] error_code (via page_fault) at c080d809
      EAX: 003f885c  EBX: 003f6ff4  ECX: 003f83a0  EDX: 00000000 
      DS:  007b      ESI: 00000000  ES:  007b      EDI: 00268000
      SS:  007b      ESP: bfc5a730  EBP: bfc5a768  GS:  0033
      CS:  0073      EIP: 0030cb4b  ERR: ffffffff  EFLAGS: 00010246 
  
  PID: 5021   TASK: f20f6a90  CPU: 3   COMMAND: "bash"
   #0 [f114be10] panic at c080adbe
   #1 [f114be20] oops_end at c080e427
   #2 [f114be34] no_context at c043089d
   #3 [f114be58] bad_area at c0430b26
   #4 [f114be6c] do_page_fault at c080fb9b
   #5 [f114bee4] error_code (via page_fault) at c080d809
      EAX: 00000063  EBX: 00000063  ECX: c09e1c8c  EDX: 00000000  EBP: 00000000 
      DS:  007b      ESI: c0a09ca0  ES:  007b      EDI: 00000286  GS:  00e0
      CS:  0060      EIP: c068124f  ERR: ffffffff  EFLAGS: 00010096 
   #6 [f114bf18] sysrq_handle_crash at c068124f
   #7 [f114bf24] __handle_sysrq at c0681469
   #8 [f114bf48] write_sysrq_trigger at c068150a
   #9 [f114bf54] proc_reg_write at c0569eb2
  #10 [f114bf74] vfs_write at c051de4e
  #11 [f114bf94] sys_write at c051e8cc
  #12 [f114bfb0] ia32_sysenter_target at c04099f4
      EAX: 00000004  EBX: 00000001  ECX: b788e000  EDX: 00000002 
      DS:  007b      ESI: 00000002  ES:  007b      EDI: b788e000
      SS:  007b      ESP: bfcd5f2c  EBP: bfcd5f64  GS:  0033
      CS:  0073      EIP: 00ef7424  ERR: 00000004  EFLAGS: 00000246 
  crash> 
 
Additional info:

Comment 5 Jaromir Hradilek 2011-04-27 19:20:02 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
When analyzing a KVM dump file from an x86 guest system, the crash utility was unable to determine the starting EIP and ESP hooks, and produced an invalid backtrace. With this update, the crash utility has been updated to use the 64-bit CPU device format in x86 KVM dump files by default, and only use the 32-bit format when it is determined that the host machine was running a 32-bit kernel. As a result, running the "bt" command when analyzing such a dump file now produces a correct backtrace.

Comment 6 errata-xmlrpc 2011-05-19 13:04:10 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0561.html