Bug 456852 - [5.2][Kdump] Capture Kernel Panic with CCISS
Summary: [5.2][Kdump] Capture Kernel Panic with CCISS
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: ia64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Assignee: Doug Chapman
QA Contact: Martin Jenner
 
Reported: 2008-07-28 07:53 UTC by Qian Cai
Modified: 2008-08-29 16:52 UTC
CC List: 2 users

Doc Type: Bug Fix
Last Closed: 2008-08-29 16:52:09 UTC



Description Qian Cai 2008-07-28 07:53:47 UTC
Description of problem:
Kdump does not work on an IA64 box (hp-bl860c-01.rhts.bos.redhat.com): the
capture kernel fails right after loading the cciss module. This is not a
regression, though; the failure was previously masked by another CCISS driver
bug (a hang in MSI mode), which appears to have just been fixed in RHEL 5.2.

HP CISS Driver (v 3.6.20-RH1)
Loading cciss.ko module
cciss: using PCI PM to reset controller
Entered OS INIT handler. PSP=fff301a0 cpu=0 monarch=1
Delaying for 5 seconds...
Entered OS MCA handler. PSP=20010000fff21120 cpu=0 monarch=1
All OS MCA slaves have reached rendezvous
mlogbuf_finish: printing switched to urgent mode, MCA/INIT might be dodgy or fail.
Delaying for 5 seconds...
INIT 440[0]: bugcheck! 0 [1]
Modules linked in: nfs lockd fscache nfs_acl autofs4 hidp rfcomm l2cap bluetooth
sunrpc ipv6 xfrm_nalgo crypto_api vfat fat dm_multipath button parport_pc lp
parport joydev tg3 shpchp sg dm_snapshot dm_zero dm_mirror dm_mod qla2xxx
scsi_transport_fc cciss mptsas mptscsih mptbase scsi_transport_sas sd_mod
scsi_mod raid0 ext3 jbd uhci_hcd ohci_hcd ehci_hcd

Pid: 0, CPU 0, comm:             INIT 440
psr : 0000101808022010 ifs : 8000000000000004 ip  : [<e000000008db8000>]    Not tainted
ip is at 0xe000000008db8000
unat: 0000000000000000 pfs : 000000000000038b rsc : 0000000000000000
rnat: 0000000000000000 bsps: 0000000000000000 pr  : 0000000000553b1d
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a000000100059bc0 b6  : e000000008db8000 b7  : a0000001004e2be0
f6  : 000000000000000000000 f7  : 1003e8208208208208209
f8  : 1003e000000000000001b f9  : 1003efffffffffffffffe
f10 : 1003e0000000000000003 f11 : 1003e8208208208208209
r1  : 0000000000000000 r2  : e000000028000b28 r3  : e000000100008348
r8  : e00000010000f950 r9  : 0000000000010000 r10 : 0000000000010000
r11 : 0000000000000000 r12 : e00000010000f760 r13 : e000000100008000
r14 : 000000000000000f r15 : 0000000000000000 r16 : e00000010000f7e8
r17 : e00000010000f778 r18 : e000000008db8000 r19 : a0000001009f82b8
r20 : a0000001009f82b8 r21 : e0000100eb630ca0 r22 : ffffffffff000000
r23 : a000000100735ca8 r24 : a0000001004e2be0 r25 : a0000001009fb580
r26 : a0000001009fb430 r27 : a0000001009fb430 r28 : e0000100fae75598
r29 : a0000001002bb780 r30 : 0000000000000000 r31 : 000000000000000c

Call Trace:
 [<a000000100013ae0>] show_stack+0x40/0xa0
                                sp=e00000010000f2f0 bsp=e0000001000093a8
 [<a0000001000143e0>] show_regs+0x840/0x880
                                sp=e00000010000f4c0 bsp=e000000100009350
 [<a000000100037bc0>] die+0x1c0/0x2c0
                                sp=e00000010000f4c0 bsp=e000000100009308
 [<a000000100037d10>] die_if_kernel+0x50/0x80
                                sp=e00000010000f4e0 bsp=e0000001000092d8
 [<a000000100633610>] ia64_bad_break+0x270/0x4a0
                                sp=e00000010000f4e0 bsp=e0000001000092b0
 [<a00000010000c020>] __ia64_leave_kernel+0x0/0x280
                                sp=e00000010000f590 bsp=e0000001000092b0
 [<e000000008db8000>] 0xe000000008db8000
                                sp=e00000010000f760 bsp=e000000100009290
 [<a000000100059bc0>] ia64_machine_kexec+0x360/0x3a0
                                sp=e00000010000f760 bsp=e000000100009258
 [<a00000010000c6d0>] unw_init_running+0x70/0xa0
                                sp=e00000010000f770 bsp=e000000100009230
 [<a000000100059760>] machine_kexec+0x80/0xa0
                                sp=e00000010000fb50 bsp=e000000100009210
 [<a00000010005e7a0>] machine_kdump_on_init+0x60/0x80
                                sp=e00000010000fb50 bsp=e0000001000091f0
 [<a00000010005eaa0>] kdump_init_notifier+0x2e0/0x340
                                sp=e00000010000fb50 bsp=e0000001000091b8
 [<a0000001006365f0>] notifier_call_chain+0x50/0xc0
                                sp=e00000010000fb50 bsp=e000000100009180
 [<a00000010009b910>] atomic_notifier_call_chain+0x30/0x60
                                sp=e00000010000fb50 bsp=e000000100009150
 [<a000000100048a20>] ia64_init_handler+0x940/0xa40
                                sp=e00000010000fb50 bsp=e0000001000090d0
 [<a000000100049500>] ia64_os_init_virtual_begin+0x40/0x140
                                sp=e00000010000fb80 bsp=e0000001000090d0
 <0>Kernel panic - not syncing: Fatal exception


Version-Release number of selected component (if applicable):
kernel-2.6.18-92.el5
kexec-tools-1.102pre-21.el5

How reproducible:
always

Steps to Reproduce:
1. Configure kdump with crashkernel=512M@256M.
2. Trigger a crash with SysRq-C.
  
Additional info:
- Full serial log:
  http://rhts.redhat.com/testlogs/26065/95903/810686/serial.log

- System information:
  http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=3766900

- BIOS information:
  * ROM Version: 03.01
  * ROM Date:    09/26/2007
  * BMC Version: 05.20

Comment 1 Doug Chapman 2008-08-28 17:26:33 UTC
I will take a look at this. I have tested kdump on several other HP ia64 systems with cciss and have not seen any issues, so if it is broken here, it might be due to the specific model of cciss card or possibly just out-of-date cciss firmware, which I can update.

We also have an updated version of the cciss driver posted but not yet pulled into RHEL5.3.  I will try that also.

Comment 2 Doug Chapman 2008-08-28 21:59:20 UTC
OK, it wasn't as simple as I thought. I tried all of the above with no luck. Reassigning to myself; I will investigate.

Comment 3 Luming Yu 2008-08-29 06:32:56 UTC
The crash message seems to indicate that "kexec -e" doesn't work.
Could you verify whether "kexec -e" works on this box with the cciss module loaded?

Thanks,
Luming

Comment 4 Doug Chapman 2008-08-29 16:52:09 UTC
This is broken hardware. I tested an identical system back at HP with identical firmware revisions, and it works properly. I have also tested kdump on every other HP ia64 system I can get my hands on, and this is the only one where it fails. I will work on getting the hardware fixed next week.

