Bug 517952

Summary: crash utility fails to open vmcore in 2.6.31-rc5.2.el5rt
Product: Red Hat Enterprise MRG
Reporter: IBM Bug Proxy <bugproxy>
Component: realtime-utilities
Assignee: John Kacur <jkacur>
Status: CLOSED DUPLICATE
QA Contact: David Sommerseth <davids>
Severity: medium
Priority: low
Version: 1.2
CC: bhu, lgoncalv, ovasik, williams
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: All   
Doc Type: Bug Fix
Last Closed: 2011-02-01 11:45:19 UTC

Description IBM Bug Proxy 2009-08-18 06:30:39 UTC
=Comment: #0=================================================
GOWRISHANKAR MUTHUKRISHNAN <gowrishankar.m.com> - 
Problem:

crash (version 4.0-7.2.3) fails to open a vmcore generated by kdump
on kernel 2.6.31-rc5.2.el5rt.

[root@elm9m93 ~]# crash /usr/lib/debug/lib/modules//vmlinux /var/crash/2009-08-12-09\:08/vmcore

crash: invalid structure size: x8664_pda
       FILE: x86_64.c  LINE: 561  FUNCTION: x86_64_cpu_pda_init()

[/usr/bin/crash] error trace: 449bf6 => 4ca449 => 4cbbe2 => 5030bc

  5030bc: SIZE_verify+168
  4cbbe2: (undetermined)
  4ca449: x86_64_init+3205
  449bf6: main_loop+147


--------------------------------------------------------------------------
Kernel info:

[root@elm9m93 ~]# uname -a
Linux elm9m93 2.6.31-rc5.2.el5rt #1 SMP PREEMPT RT Thu Aug 6 06:42:47 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

[root@elm9m93 ~]# cat /proc/cmdline 
root=/dev/sda3 console=ttyS1,19200 crashkernel=128M@32M

[root@elm9m93 ~]# chkconfig kdump --list 
kdump          	0:off	1:off	2:off	3:on	4:off	5:on	6:off

[root@elm9m93 ~]# cat /proc/iomem 
00000000-0009cfff : System RAM
0009d000-0009ffff : reserved
000e0000-000fffff : reserved
00100000-cffbcdbf : System RAM
  01000000-013e29ed : Kernel code
  013e29ee-015e988f : Kernel data
  01689000-0174023f : Kernel bss
  02000000-09ffffff : Crash kernel
cffbcdc0-cffcffff : ACPI Tables
cffd0000-cfffffff : reserved
d0000000-d01fffff : PCI Bus 0000:02
  d0000000-d01fffff : 0000:02:00.0
d0200000-d02fffff : PCI Bus 0000:0c
  d0200000-d027ffff : 0000:0c:00.0
  d0280000-d02fffff : 0000:0c:00.1
d1000000-d6ffffff : PCI Bus 0000:0c
  d2000000-d3ffffff : 0000:0c:00.1
    d2000000-d3ffffff : bnx2x
  d4000000-d5ffffff : 0000:0c:00.0
    d4000000-d5ffffff : bnx2x
  d6000000-d67fffff : 0000:0c:00.1
    d6000000-d67fffff : bnx2x
  d6800000-d6ffffff : 0000:0c:00.0
    d6800000-d6ffffff : bnx2x
d7000000-d9ffffff : PCI Bus 0000:05
  d7000000-d9ffffff : PCI Bus 0000:06
    d8000000-d9ffffff : 0000:06:00.0
      d8000000-d9ffffff : bnx2
da000000-dcffffff : PCI Bus 0000:03
  da000000-dcffffff : PCI Bus 0000:04
    da000000-dbffffff : 0000:04:00.0
      da000000-dbffffff : bnx2
dd000000-deffffff : PCI Bus 0000:02
  defe0000-defeffff : 0000:02:00.0
    defe0000-defeffff : mpt
  deffc000-deffffff : 0000:02:00.0
    deffc000-deffffff : mpt
e0000000-efffffff : reserved
  e0000000-efffffff : pnp 00:0b
    e0000000-e0ffffff : PCI MMCONFIG 0 [00-0f]
f0000000-f7ffffff : PCI Bus 0000:01
  f0000000-f7ffffff : 0000:01:01.0
f8000000-f8ffffff : PCI Bus 0000:01
  f8000000-f800ffff : 0000:01:01.0
  f8020000-f803ffff : 0000:01:01.0
f9000000-f90003ff : 0000:00:1d.7
  f9000000-f90003ff : ehci_hcd
fe000000-fe01ffff : pnp 00:0b
  fe000000-fe01ffff : i5k_amb
fe020000-fe6fffff : pnp 00:0b
fe700000-fe7003ff : 0000:00:08.0
fe700400-febfffff : pnp 00:0b
fec00000-ffffffff : reserved
  fec00000-fec00fff : IOAPIC 0
  fec80000-fec80fff : IOAPIC 1
  fed1c000-fed1ffff : pnp 00:0b
  fee00000-fee00fff : Local APIC
  fff00000-ffffffff : pnp 00:0b
100000000-42fffffff : System RAM
[root@elm9m93 ~]# 
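
The "02000000-09ffffff : Crash kernel" entry in the /proc/iomem listing above is consistent with the crashkernel=128M@32M boot parameter: 128 MB reserved starting at a 32 MB offset. A minimal sketch of that arithmetic follows. It assumes the size@offset form with both values given in megabytes, and the function name crashkernel_range is hypothetical, used only to illustrate the calculation.

```shell
#!/bin/sh
# crashkernel_range SIZE_MB OFFSET_MB
# Prints the physical address range (hex, inclusive) that a
# crashkernel=<SIZE_MB>M@<OFFSET_MB>M reservation would cover.
# Hypothetical helper; not part of kdump or kexec-tools.
crashkernel_range() {
    size_mb=$1
    offset_mb=$2
    start=$((offset_mb * 1024 * 1024))
    end=$((start + size_mb * 1024 * 1024 - 1))
    printf '%08x-%08x\n' "$start" "$end"
}

# Values taken from /proc/cmdline in this report.
crashkernel_range 128 32
```

For 128 and 32 this prints 02000000-09ffffff, which matches the reserved region shown in /proc/iomem.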


[root@elm9m93 ~]# cat /proc/meminfo 
MemTotal:       16339700 kB
MemFree:        14245988 kB
Buffers:           38692 kB
Cached:          1467384 kB
SwapCached:            0 kB
Active:           409692 kB
Inactive:        1120988 kB
Active(anon):      28620 kB
Inactive(anon):        0 kB
Active(file):     381072 kB
Inactive(file):  1120988 kB
Unevictable:        4996 kB
Mlocked:            4996 kB
SwapTotal:       2096472 kB
SwapFree:        2096472 kB
Dirty:                 4 kB
Writeback:             0 kB
AnonPages:         29604 kB
Mapped:            10688 kB
Slab:             427472 kB
SReclaimable:     124000 kB
SUnreclaim:       303472 kB
PageTables:         3304 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    10266320 kB
Committed_AS:      75224 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       79492 kB
VmallocChunk:   34359658459 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        7920 kB
DirectMap2M:    16769024 kB

--------------------------------------------------------------------------
Hardware used:
HS21, LS21

--------------------------------------------------------------------------
Steps to reproduce:
o pass crashkernel param to kernel
o start kdump service
   /etc/init.d/kdump start
o trigger panic
   echo 1 > /proc/sys/kernel/panic
   echo 1 > /proc/sys/kernel/panic_on_oops
   echo c > /proc/sysrq-trigger
o install debug info kernel of current version
o run "crash <debug kernel> <vmcore>"
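
Before triggering the panic in the steps above, it may help to confirm that the crash kernel memory was actually reserved. The following is a rough sketch of such a check, not part of the original report: it only looks for the two indicators already shown in this description (crashkernel= in /proc/cmdline and a "Crash kernel" entry in /proc/iomem). The function name kdump_preflight is hypothetical, and the file arguments exist only so the check can be tried against saved copies of those files.

```shell
#!/bin/sh
# kdump_preflight [CMDLINE_FILE] [IOMEM_FILE]
# Hypothetical helper: reports whether the two reservation indicators
# quoted in this bug report are present. It does not start kdump and
# does not trigger a panic.
kdump_preflight() {
    cmdline_file=${1:-/proc/cmdline}
    iomem_file=${2:-/proc/iomem}

    # The boot parameter must have been passed to the running kernel.
    if ! grep -q 'crashkernel=' "$cmdline_file"; then
        echo "missing crashkernel= boot parameter"
        return 1
    fi

    # The kernel must have reserved a region for the capture kernel.
    if ! grep -q 'Crash kernel' "$iomem_file"; then
        echo "no Crash kernel region reserved"
        return 1
    fi

    echo "ok"
}
```

Running kdump_preflight with no arguments checks the live system; passing two file paths checks saved copies instead. A passing result only means memory was reserved; it does not show that the kdump service loaded a capture kernel.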
=Comment: #1=================================================

SRIPATHI KODI <sripathik.com> - 
Could you please try:
1) crash from RHEL5.3, in case you have tried this on RHEL5.2
2) crash from http://people.redhat.com/anderson/crash-4.0-8.12.tar.gz
=Comment: #2=================================================
GOWRISHANKAR MUTHUKRISHNAN <gowrishankar.m.com> - 
(In reply to comment #1)
> Could you please try:
> 1) crash from RHEL5.3, in case you have tried this on RHEL5.2

The problem was reported on RHEL5.3 with the RT kernel.

> 2) crash from http://people.redhat.com/anderson/crash-4.0-8.12.tar.gz
> 

Checking now with this version of crash!

=Comment: #3=================================================
GOWRISHANKAR MUTHUKRISHNAN <gowrishankar.m.com> - 

crash-4.0-8.12 solved the problem! I am able to get the right call stack for the panic
and run other crash commands.

[root@elm9m93 ~]# crash /usr/lib/debug/lib/modules//vmlinux /var/crash/2009-08-17-12\:50/vmcore

crash 4.0-8.12
Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 
GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

      KERNEL: /usr/lib/debug/lib/modules/2.6.31-rc5.3.el5rt/vmlinux
    DUMPFILE: /var/crash/2009-08-17-12:50/vmcore
        CPUS: 8
        DATE: Mon Aug 17 12:49:38 2009
      UPTIME: 00:26:06
LOAD AVERAGE: 0.85, 0.75, 0.37
       TASKS: 293
    NODENAME: elm9m93
     RELEASE: 2.6.31-rc5.3.el5rt
     VERSION: #1 SMP PREEMPT RT Tue Aug 11 08:52:22 EDT 2009
     MACHINE: x86_64  (2833 Mhz)
      MEMORY: 16 GB
       PANIC: "Oops: 0002 [#1] PREEMPT SMP " (check log for details)
         PID: 6675
     COMMAND: "bash"
        TASK: ffff88040298e3c0  [THREAD_INFO: ffff8803fc404000]
         CPU: 7
       STATE: TASK_RUNNING (PANIC)

crash> kmem -i
              PAGES        TOTAL      PERCENTAGE
 TOTAL MEM  4084925      15.6 GB         ----
      FREE  3804207      14.5 GB   93% of TOTAL MEM
      USED   280718       1.1 GB    6% of TOTAL MEM
    SHARED        0            0    0% of TOTAL MEM
   BUFFERS     7575      29.6 MB    0% of TOTAL MEM
    CACHED   143008     558.6 MB    3% of TOTAL MEM
      SLAB    89772     350.7 MB    2% of TOTAL MEM

TOTAL SWAP   524118         2 GB         ----
 SWAP USED        1         4 KB    0% of TOTAL SWAP
 SWAP FREE   524117         2 GB   99% of TOTAL SWAP
crash> bt
PID: 6675   TASK: ffff88040298e3c0  CPU: 7   COMMAND: "bash"
 #0 [ffff8803fc405dc8] sysrq_handle_crash at ffffffff81285031
 #1 [ffff8803fc405df0] __sysrq_get_key_op at ffffffff812850b3
 #2 [ffff8803fc405e10] __handle_sysrq at ffffffff812852d1
 #3 [ffff8803fc405e60] write_sysrq_trigger at ffffffff812853b5
 #4 [ffff8803fc405e90] proc_reg_write at ffffffff8116c207
 #5 [ffff8803fc405ef0] vfs_write at ffffffff8111630f
 #6 [ffff8803fc405f30] sys_write at ffffffff8111646d
 #7 [ffff8803fc405f80] system_call_fastpath at ffffffff8100c0ab
    RIP: 0000003f178c56a0  RSP: 00007fff777ddef0  RFLAGS: 00010246
    RAX: 0000000000000001  RBX: ffffffff8100c0ab  RCX: 0000000000000001
    RDX: 0000000000000002  RSI: 00007fd990cc9000  RDI: 0000000000000001
    RBP: 00007fd990cc9000   R8: 00000000ffffffff   R9: 00007fd990cba6e0
    R10: 0000003f17b51a30  R11: 0000000000000246  R12: 0000000000000000
    R13: 0000000000000002  R14: 0000003f17b50780  R15: 0000000000000002
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
crash> 

=Comment: #7=================================================
SRIPATHI KODI <sripathik.com> - 

Note to RH: With MRG moving to 2.6.31-based kernels, we will need an even newer version of crash to
open its system dumps.
http://people.redhat.com/anderson/crash-4.0-8.12.tar.gz works.

Comment 1 Clark Williams 2010-03-29 20:59:07 UTC
Need to verify this failure with the 2.6.33-rt kernel

Comment 2 IBM Bug Proxy 2010-03-29 22:50:45 UTC
------- Comment From vernux.com 2010-03-29 18:45 EDT-------
[root@elm9m94 ~]# uname -r
2.6.33.1-rt11.9.el5rt
[root@elm9m94 ~]# crash /usr/lib/debug/lib/modules/2.6.33.1-rt11.9.el5rt/vmlinux /var/crash/2010-03-29-22\:25/vmcore

crash 4.0-8.9.1.el5
Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

crash: invalid structure size: x8664_pda
FILE: x86_64.c  LINE: 584  FUNCTION: x86_64_cpu_pda_init()

[/usr/bin/crash] error trace: 44a0ff => 4ced4d => 4d0607 => 5098a5

5098a5: SIZE_verify+168
4d0607: (undetermined)
4ced4d: x86_64_init+3205
44a0ff: main_loop+152

I built my own package of crash-5.0.1 that works just fine.  Can we upgrade to that?
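
Across this report, crash 4.0-7.2.3 and 4.0-8.9.1.el5 fail with "invalid structure size: x8664_pda", while 4.0-8.12 and 5.0.1 open the vmcore. A small sketch of a version check built on those four data points is below. It assumes a sort that supports the -V (version sort) option, as GNU coreutils does, and it treats 4.0-8.12 only as the oldest version reported working here, not as a confirmed minimum. The function name version_at_least is hypothetical.

```shell
#!/bin/sh
# version_at_least HAVE WANT
# Succeeds if version string HAVE sorts at or above WANT.
# Hypothetical helper; relies on "sort -V" (GNU coreutils version sort).
version_at_least() {
    have=$1
    want=$2
    lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}

# The four crash versions mentioned in this bug, compared against the
# oldest one reported to work.
for v in 4.0-7.2.3 4.0-8.9.1.el5 4.0-8.12 5.0.1; do
    if version_at_least "$v" 4.0-8.12; then
        echo "$v: at or above 4.0-8.12"
    else
        echo "$v: older than 4.0-8.12"
    fi
done
```

The two versions that failed in this report are classified as older, and the two that worked as at or above the threshold.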

Comment 3 IBM Bug Proxy 2010-03-30 17:01:23 UTC
------- Comment From amitarora.com 2010-03-30 12:57 EDT-------
*** Bug 57442 has been marked as a duplicate of this bug. ***

Comment 4 Dave Anderson 2010-03-30 17:07:07 UTC
*** Bug 532024 has been marked as a duplicate of this bug. ***

Comment 5 Luis Claudio R. Goncalves 2011-02-01 11:45:19 UTC

*** This bug has been marked as a duplicate of bug 530325 ***