Description of problem:
Kdump is not working on an IA64 box (hp-bl860c-01.rhts.bos.redhat.com). The capture kernel failed right after loading the cciss module. It is not a regression, though: it was previously masked by another CCISS driver bug (a hang in MSI mode), which seems to have just been fixed in RHEL5.2.

HP CISS Driver (v 3.6.20-RH1)
Loading cciss.ko module
cciss: using PCI PM to reset controller
Entered OS INIT handler. PSP=fff301a0 cpu=0 monarch=1
Delaying for 5 seconds...
Entered OS MCA handler. PSP=20010000fff21120 cpu=0 monarch=1
All OS MCA slaves have reached rendezvous
mlogbuf_finish: printing switched to urgent mode, MCA/INIT might be dodgy or fail.
Delaying for 5 seconds...
INIT 440[0]: bugcheck! 0 [1]
Modules linked in: nfs lockd fscache nfs_acl autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo crypto_api vfat fat dm_multipath button parport_pc lp parport joydev tg3 shpchp sg dm_snapshot dm_zero dm_mirror dm_mod qla2xxx scsi_transport_fc cciss mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod raid0 ext3 jbd uhci_hcd ohci_hcd ehci_hcd

Pid: 0, CPU 0, comm: INIT 440
psr : 0000101808022010 ifs : 8000000000000004 ip : [<e000000008db8000>] Not tainted
ip is at 0xe000000008db8000
unat: 0000000000000000 pfs : 000000000000038b rsc : 0000000000000000
rnat: 0000000000000000 bsps: 0000000000000000 pr : 0000000000553b1d
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a000000100059bc0 b6 : e000000008db8000 b7 : a0000001004e2be0
f6 : 000000000000000000000 f7 : 1003e8208208208208209
f8 : 1003e000000000000001b f9 : 1003efffffffffffffffe
f10 : 1003e0000000000000003 f11 : 1003e8208208208208209
r1 : 0000000000000000 r2 : e000000028000b28 r3 : e000000100008348
r8 : e00000010000f950 r9 : 0000000000010000 r10 : 0000000000010000
r11 : 0000000000000000 r12 : e00000010000f760 r13 : e000000100008000
r14 : 000000000000000f r15 : 0000000000000000 r16 : e00000010000f7e8
r17 : e00000010000f778 r18 : e000000008db8000 r19 : a0000001009f82b8
r20 : a0000001009f82b8 r21 : e0000100eb630ca0 r22 : ffffffffff000000
r23 : a000000100735ca8 r24 : a0000001004e2be0 r25 : a0000001009fb580
r26 : a0000001009fb430 r27 : a0000001009fb430 r28 : e0000100fae75598
r29 : a0000001002bb780 r30 : 0000000000000000 r31 : 000000000000000c

Call Trace:
[<a000000100013ae0>] show_stack+0x40/0xa0 sp=e00000010000f2f0 bsp=e0000001000093a8
[<a0000001000143e0>] show_regs+0x840/0x880 sp=e00000010000f4c0 bsp=e000000100009350
[<a000000100037bc0>] die+0x1c0/0x2c0 sp=e00000010000f4c0 bsp=e000000100009308
[<a000000100037d10>] die_if_kernel+0x50/0x80 sp=e00000010000f4e0 bsp=e0000001000092d8
[<a000000100633610>] ia64_bad_break+0x270/0x4a0 sp=e00000010000f4e0 bsp=e0000001000092b0
[<a00000010000c020>] __ia64_leave_kernel+0x0/0x280 sp=e00000010000f590 bsp=e0000001000092b0
[<e000000008db8000>] 0xe000000008db8000 sp=e00000010000f760 bsp=e000000100009290
[<a000000100059bc0>] ia64_machine_kexec+0x360/0x3a0 sp=e00000010000f760 bsp=e000000100009258
[<a00000010000c6d0>] unw_init_running+0x70/0xa0 sp=e00000010000f770 bsp=e000000100009230
[<a000000100059760>] machine_kexec+0x80/0xa0 sp=e00000010000fb50 bsp=e000000100009210
[<a00000010005e7a0>] machine_kdump_on_init+0x60/0x80 sp=e00000010000fb50 bsp=e0000001000091f0
[<a00000010005eaa0>] kdump_init_notifier+0x2e0/0x340 sp=e00000010000fb50 bsp=e0000001000091b8
[<a0000001006365f0>] notifier_call_chain+0x50/0xc0 sp=e00000010000fb50 bsp=e000000100009180
[<a00000010009b910>] atomic_notifier_call_chain+0x30/0x60 sp=e00000010000fb50 bsp=e000000100009150
[<a000000100048a20>] ia64_init_handler+0x940/0xa40 sp=e00000010000fb50 bsp=e0000001000090d0
[<a000000100049500>] ia64_os_init_virtual_begin+0x40/0x140 sp=e00000010000fb80 bsp=e0000001000090d0
<0>Kernel panic - not syncing: Fatal exception

Version-Release number of selected component (if applicable):
kernel-2.6.18-92.el5
kexec-tools-1.102pre-21.el5

How reproducible:
always

Steps to Reproduce:
1. configure Kdump with 512M@256M
2. SysRq-C

Additional info:
- full serial log, http://rhts.redhat.com/testlogs/26065/95903/810686/serial.log
- system information, http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=3766900
- BIOS information,
  * ROM Version : 03.01
  * ROM Date : 09/26/2007
  * BMC Version : 05.20
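For anyone retrying this, the two reproduction steps could be scripted roughly as follows. This is only a sketch and is not taken from the test harness: it assumes "512M@256M" is the value of the crashkernel= boot parameter (size@offset), and the function names and the REALLY_CRASH guard are hypothetical. Without REALLY_CRASH=1 the script only prints what it would do.

```shell
#!/bin/sh
# Sketch of the reproduction steps; all helper names are hypothetical.

# Print the crashkernel= value found in a kernel command line string.
crashkernel_arg() {
    for word in $1; do
        case "$word" in
            crashkernel=*) echo "${word#crashkernel=}" ;;
        esac
    done
}

# Step 1: confirm the running kernel was booted with the expected reservation
# (assumption: "512M@256M" from the report is the crashkernel= value).
check_reservation() {
    want="512M@256M"
    got=$(crashkernel_arg "$1")
    if [ "$got" = "$want" ]; then
        echo "ok: crashkernel=$got"
    else
        echo "mismatch: wanted $want, got '$got'"
        return 1
    fi
}

# Step 2: SysRq-C. Crashes the machine only when REALLY_CRASH=1 is set.
trigger_crash() {
    if [ "${REALLY_CRASH:-0}" = "1" ]; then
        echo 1 > /proc/sys/kernel/sysrq
        echo c > /proc/sysrq-trigger
    else
        echo "dry run: would write 'c' to /proc/sysrq-trigger"
    fi
}
```

Typical use would be `check_reservation "$(cat /proc/cmdline)" && trigger_crash`, run as root with the serial console attached so the capture kernel output is recorded.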
I will take a look at this. I have tested kdump on several other HP ia64 systems with cciss and have not seen any issues, so if it is broken here it might be due to the specific model of cciss card, or possibly just out-of-date cciss firmware, which I can update. We also have an updated version of the cciss driver posted but not yet pulled into RHEL5.3. I will try that as well.
OK, it wasn't as simple as I thought. I tried all of the above with no luck. Reassigning to myself; I will investigate.
The crash message seems to indicate that "kexec -e" doesn't work... Could you verify whether "kexec -e" works on this box with the cciss module loaded? Thanks, Luming
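One way to run the check requested above, as a rough sketch: load a kernel with "kexec -l" while cciss is loaded, then jump to it with "kexec -e". The kernel and initrd paths and the command line are placeholders to be filled in for the box, and the kexec_test and run_or_print helpers are hypothetical. By default the commands are only printed; set RUN=1 to execute them.

```shell
#!/bin/sh
# Sketch of a plain (non-kdump) kexec reboot test; helper names are hypothetical.

# Run the given command when RUN=1, otherwise just print it.
run_or_print() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# $1 = kernel image, $2 = initrd, $3 = kernel command line (all placeholders).
kexec_test() {
    run_or_print modprobe cciss
    run_or_print kexec -l "$1" --initrd="$2" --append="$3"
    run_or_print kexec -e
}
```

If the box survives `RUN=1 kexec_test <kernel> <initrd> "<cmdline>"` and comes up in the new kernel, the plain kexec path works with cciss loaded and the problem is more likely specific to the kdump path.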
This is broken hardware. I tested an identical system back at HP with identical firmware revisions and it works properly. I have also tested kdump on every other HP ia64 system I can get my hands on, and this is the only place it fails. I will work on getting the hardware fixed next week.