Bug 441662 - [5.2][kdump] capture kernel failed to start at crash point BUG in INT_HARDWARE_ENTRY on RX2600
Status: CLOSED DUPLICATE of bug 441657
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kexec-tools
Version: 5.2
Hardware: ia64 Linux
Priority: low Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Neil Horman
Reported: 2008-04-09 08:07 EDT by CAI Qian
Modified: 2008-04-15 07:08 EDT

Doc Type: Bug Fix
Last Closed: 2008-04-15 07:08:51 EDT

Attachments
sosreport (2.15 MB, application/octet-stream)
2008-04-09 08:19 EDT, CAI Qian
capture kernel hung (4.48 KB, text/plain)
2008-04-12 07:16 EDT, Cai Qian
success to capture vmcore (18.34 KB, text/plain)
2008-04-12 07:17 EDT, Cai Qian

Description CAI Qian 2008-04-09 08:07:43 EDT
Description of problem:
The capture kernel failed to start when the kernel panicked at the following crash point:

hp-lp1.rhts.boston.redhat.com login: kernel BUG at /tmp/kdump/lib/lkdtm/lkdtm.c:260!
insmod[2503]: bugcheck! 0 [1]
lkdtm : Crash point INT_HARDWARE_ENTRY of type BUG hit
Modules linked in: lkdtm(U) ipv6 xfrm_nalgo crypto_api autofs4 hidp rfcomm l2cap
bluetooth sunrpc vfat fat dm_multipath button parport_pc lp parport e100 sg mii
ide_cd cdrom tg3 dm_snapshot dm_zero dm_mirror dm_mod mptspi mptscsih mptbase
scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd

Pid: 2503, CPU 0, comm:               insmod
psr : 0000101008022018 ifs : 8000000000000207 ip  : [<a000000200bc0310>]   
Tainted: G     
ip is at lkdtm_handler+0x190/0x2a0 [lkdtm]
unat: 0000000000000000 pfs : 0000000000000207 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr  : 96a629a665a56565
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a000000200bc0310 b6  : a000000100037880 b7  : a00000010024a940
f6  : 1003e0000003e4e6888b7 f7  : 1003e0000000000000384
f8  : 1003e0000003e4e688533 f9  : 1003e0000000000000001
f10 : 0fffcccccccccc8c00000 f11 : 1003e0000000000000000
r1  : a000000100be0270 r2  : a0000001009f80e8 r3  : a0000001009e1530
r8  : 0000000000000033 r9  : a0000001009f8118 r10 : a0000001009f8118
r11 : 0000000000000000 r12 : e00000407389fc40 r13 : e000004073898000
r14 : a0000001009f80e8 r15 : 0000000000000000 r16 : 0000000000000001
r17 : c0000000f8050001 r18 : 000000000000000d r19 : c0000000f8050000
r20 : a000000100835280 r21 : a0000001009e08a8 r22 : a0000001009f80f0
r23 : a0000001009f80f0 r24 : a000000100928fe0 r25 : a000000100928fe0
r26 : a0000001009e0a10 r27 : 0000000000000000 r28 : 0000000000000034
r29 : 0000000000000034 r30 : 0000000000000000 r31 : a0000001009f8474

Call Trace:
 [<a000000100013ae0>] show_stack+0x40/0xa0
                                sp=e00000407389f7d0 bsp=e0000040738993b8
 [<a0000001000143e0>] show_regs+0x840/0x880
                                sp=e00000407389f9a0 bsp=e000004073899360
 [<a000000100037bc0>] die+0x1c0/0x2c0
                                sp=e00000407389f9a0 bsp=e000004073899318
 [<a000000100037d10>] die_if_kernel+0x50/0x80
                                sp=e00000407389f9c0 bsp=e0000040738992e8
 [<a0000001006333d0>] ia64_bad_break+0x270/0x4a0
                                sp=e00000407389f9c0 bsp=e0000040738992c0
 [<a00000010000c020>] __ia64_leave_kernel+0x0/0x280
                                sp=e00000407389fa70 bsp=e0000040738992c0
 [<a000000200bc0310>] lkdtm_handler+0x190/0x2a0 [lkdtm]
                                sp=e00000407389fc40 bsp=e000004073899288
 [<a000000200bc0480>] jp_do_irq+0x20/0x40 [lkdtm]
                                sp=e00000407389fc40 bsp=e000004073899270
 [<a0000001006358a0>] jprobe_inst_return+0x0/0x20
                                sp=e00000407389fc40 bsp=e000004073899240
 [<a000000200bc0310>] lkdtm_handler+0x190/0x2a0 [lkdtm]
                                sp=e00000407389fc40 bsp=e000004073899210
 [<e0000040666c9000>] 0xe0000040666c9000
                                sp=e00000407389fc40 bsp=e000004073899128
 <0>Kernel panic - not syncing: Fatal exception
 BUG: warning at kernel/panic.c:137/panic() (Tainted: G     )

Call Trace:
 [<a000000100013ae0>] show_stack+0x40/0xa0
                                sp=e00000407389f780 bsp=e0000040738993e0
 [<a000000100013b70>] dump_stack+0x30/0x60
                                sp=e00000407389f950 bsp=e0000040738993c8
 [<a000000100079980>] panic+0x420/0x440
                                sp=e00000407389f950 bsp=e000004073899360
 [<a000000100037c90>] die+0x290/0x2c0
                                sp=e00000407389f9a0 bsp=e000004073899318
 [<a000000100037d10>] die_if_kernel+0x50/0x80
                                sp=e00000407389f9c0 bsp=e0000040738992e8
 [<a0000001006333d0>] ia64_bad_break+0x270/0x4a0
                                sp=e00000407389f9c0 bsp=e0000040738992c0
 [<a00000010000c020>] __ia64_leave_kernel+0x0/0x280
                                sp=e00000407389fa70 bsp=e0000040738992c0
 [<a000000200bc0310>] lkdtm_handler+0x190/0x2a0 [lkdtm]
                                sp=e00000407389fc40 bsp=e000004073899288
 [<a000000200bc0480>] jp_do_irq+0x20/0x40 [lkdtm]
                                sp=e00000407389fc40 bsp=e000004073899270
 [<a0000001006358a0>] jprobe_inst_return+0x0/0x20
                                sp=e00000407389fc40 bsp=e000004073899240
 [<a000000200bc0310>] lkdtm_handler+0x190/0x2a0 [lkdtm]
                                sp=e00000407389fc40 bsp=e000004073899210
 [<e0000040666c9000>] 0xe0000040666c9000
                                sp=e00000407389fc40 bsp=e000004073899128


Other IA64 boxes I tested are not affected.

Version-Release number of selected component (if applicable):
RHEL5.2-Server-20080402.0
kernel-2.6.18-88.el5
kexec-tools-1.102pre-20.el5

How reproducible:
Always on hp-lp1.rhts.boston.redhat.com

Steps to Reproduce:
1. Configured kdump and booted the kernel with "crashkernel=512M@256M".
2. wget http://porkchop.devel.redhat.com/qa/rhts/lookaside/ltp-kdump-20080228.tar.gz;
   tar zxvf ltp-kdump-20080228.tar.gz; cd kdump; export USE_SYMBOL_NAME=1; make
3. insmod lkdtm.ko cpoint_name=INT_HARDWARE_ENTRY cpoint_type=BUG cpoint_count=05
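As a reading aid for the "crashkernel=512M@256M" boot parameter used in the steps, the sketch below converts the size@offset form into the physical range reserved for the capture kernel. The helper names (parse_size, parse_crashkernel) are hypothetical and are not part of kexec-tools or the kernel. The sketch assumes only the plain size@offset form with optional K/M/G suffixes.

```python
import re

# Multipliers for the size suffixes accepted by this sketch.
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert a size such as '512M' into a number of bytes."""
    m = re.fullmatch(r"(\d+)([KMG]?)", text.strip().upper())
    if m is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]


def parse_crashkernel(arg):
    """Split 'size@offset' into (start, end) byte addresses of the reservation."""
    size_text, sep, offset_text = arg.partition("@")
    if not sep:
        # Assumption: other forms of the parameter are out of scope here.
        raise ValueError("this sketch only handles the size@offset form")
    size = parse_size(size_text)
    start = parse_size(offset_text)
    return start, start + size


start, end = parse_crashkernel("512M@256M")
print(f"reserved: {start >> 20} MB to {end >> 20} MB")  # reserved: 256 MB to 768 MB
```

With the value from this report, the reservation would span 256 MB to 768 MB of physical memory.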
Comment 1 CAI Qian 2008-04-09 08:19:38 EDT
Created attachment 301793
sosreport
Comment 2 Neil Horman 2008-04-09 09:28:21 EDT
You know, I just realized that the lkdtm tests use jprobes.  I've not done any
testing in conjunction with k/jprobes, and I'm not sure what effect they have on
kdump (obviously it's nominally not a problem, since other systems are unaffected).
Still, do the lkdtm tests have a variant in which k/jprobes are unused, so we
have something to compare against?
Comment 3 CAI Qian 2008-04-09 21:49:14 EDT
All LKDTM test cases use jprobes. I have seen one of the test cases work on this
machine:

http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=2511871

Unable to handle kernel NULL pointer dereference (address 0000000000000000)
swapper[0]: Oops 8804682956800 [1]
lkdtm : Crash point INT_TASKLET_ENTRY of type EXCEPTION hit
Modules linked in: lkdtm(U) ipv6 xfrm_nalgo crypto_api autofs4 hidp rfcomm l2cap
bluetooth sunrpc vfat fat dm_multipath button parport_pc lp parport sg tg3 e100
mii ide_cd cdrom dm_snapshot dm_zero dm_mirror dm_mod mptspi mptscsih mptbase
scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd

Pid: 0, CPU 1, comm:              swapper
psr : 0000101008026018 ifs : 8000000000000207 ip  : [<a000000200c14850>]   
Tainted: G     
ip is at lkdtm_handler+0x1d0/0x2a0 [lkdtm]
unat: 0000000000000000 pfs : 0000000000000207 rsc : 0000000000000003
rnat: 80000000ff555665 bsps: a000000100165490 pr  : 80000000ff556565
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a000000200c14840 b6  : a000000200c14820 b7  : a000000100011820
f6  : 1003e00000000000000a0 f7  : 1003e20c49ba5e353f7cf
f8  : 1003e00000000000004e2 f9  : 1003e000000000fa00000
f10 : 1003e000000003b9aca00 f11 : 1003e431bde82d7b634db
r1  : a000000200c253b8 r2  : a0000001009f80d8 r3  : a0000001009e1530
r8  : 0000000000000015 r9  : a0000001009f8108 r10 : a0000001009f8108
r11 : 0000000000000000 r12 : e000004065747bc0 r13 : e000004065740000
r14 : a0000001009f80d8 r15 : 0000000000000000 r16 : 0000000000000012
r17 : a000000100ca98e8 r18 : 000000000000000a r19 : a0000001009f69a0
r20 : a000000100835280 r21 : a0000001009e08a8 r22 : a0000001009f80e0
r23 : a0000001009f80e0 r24 : a000000100928fe0 r25 : 0000000000000000
r26 : a0000001009e0a10 r27 : 0000000000000000 r28 : 0000000000000036
r29 : 0000000000000036 r30 : 0000000000000000 r31 : a0000001009f8464

Call Trace:
 [<a000000100013ae0>] show_stack+0x40/0xa0
                                sp=e000004065747750 bsp=e000004065741408
 [<a0000001000143e0>] show_regs+0x840/0x880
                                sp=e000004065747920 bsp=e0000040657413a8
 [<a000000100037bc0>] die+0x1c0/0x2c0
                                sp=e000004065747920 bsp=e000004065741360
 [<a0000001006360c0>] ia64_do_page_fault+0x8e0/0xa20
                                sp=e000004065747940 bsp=e000004065741310
 [<a00000010000c020>] __ia64_leave_kernel+0x0/0x280
                                sp=e0000040657479f0 bsp=e000004065741310
 [<a000000200c14850>] lkdtm_handler+0x1d0/0x2a0 [lkdtm]
                                sp=e000004065747bc0 bsp=e0000040657412d8
 [<a000000200c14a00>] jp_tasklet_action+0x20/0x40 [lkdtm]
                                sp=e000004065747bc0 bsp=e0000040657412c0
 [<a000000100635740>] jprobe_inst_return+0x0/0x20
                                sp=e000004065747bc0 bsp=e000004065741248
 [<a000000200c14840>] lkdtm_handler+0x1c0/0x2a0 [lkdtm]
                                sp=e000004065747bc0 bsp=e0000040657411c8
 <0>Kernel panic - not syncing: Fatal exception
 Linux version 2.6.18-87.el5 (brewbuilder@ia64-1.build.redhat.com) (gcc version
4.1.2 20071124 (Red Hat 4.1.2-41)) #1 SMP Tue Mar 25 17:30:15 EDT 2008
Ignoring memory below 128MB
Ignoring memory above 640MB
EFI v1.10 by HP: SALsystab=0x3fb38000 ACPI 2.0=0x3fb2e000 SMBIOS=0x3fb3a000
HCDP=0x3fb2c000
booting generic kernel on platform dig
PCDP: v0 at 0x3fb2c000
Early serial console at MMIO 0xf8050000 (options '9600n8')
rsvd_region[0]: [0xe000000008000000, 0xe000000008db2170)
rsvd_region[1]: [0xe000000008dc0000, 0xe000000008dc0048)
rsvd_region[2]: [0xe000000027b4c000, 0xe000000027fbd2d6)
rsvd_region[3]: [0xe000000027fc4000, 0xe000000027fc40af)
rsvd_region[4]: [0xe000000027fcc000, 0xe000000027fccbd0)
rsvd_region[5]: [0xe000000027fd4000, 0xe000000027fd4050)
rsvd_region[6]: [0xffffffffffffffff, 0xffffffffffffffff)

...

In addition, other kdump test cases that use the crasher module or SysRq-C work
fine on this machine, just as they do on most ia64 machines.
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=2514409
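To compare the failing BUG trace with the working EXCEPTION trace, a small script can pull the function names out of each call trace and list the frames unique to each. This is only a sketch. The regular expression assumes the "[<address>] symbol+0xoff/0xsize" layout seen in the logs in this report, and the frames() helper is a hypothetical name. The sample input is an excerpt of the two traces above.

```python
import re

# Matches frames such as "[<a000000100037bc0>] die+0x1c0/0x2c0".
# Raw-address frames with no symbol+offset are skipped on purpose.
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def frames(log_text):
    """Return the function names in a call trace, in the order printed."""
    return [m.group(1) for m in FRAME.finditer(log_text)]


bug_trace = """
 [<a000000100037bc0>] die+0x1c0/0x2c0
 [<a000000100037d10>] die_if_kernel+0x50/0x80
 [<a0000001006333d0>] ia64_bad_break+0x270/0x4a0
 [<a00000010000c020>] __ia64_leave_kernel+0x0/0x280
 [<a000000200bc0310>] lkdtm_handler+0x190/0x2a0 [lkdtm]
 [<a000000200bc0480>] jp_do_irq+0x20/0x40 [lkdtm]
 [<e0000040666c9000>] 0xe0000040666c9000
"""

exception_trace = """
 [<a000000100037bc0>] die+0x1c0/0x2c0
 [<a0000001006360c0>] ia64_do_page_fault+0x8e0/0xa20
 [<a00000010000c020>] __ia64_leave_kernel+0x0/0x280
 [<a000000200c14850>] lkdtm_handler+0x1d0/0x2a0 [lkdtm]
 [<a000000200c14a00>] jp_tasklet_action+0x20/0x40 [lkdtm]
"""

bug_frames = frames(bug_trace)
exc_frames = frames(exception_trace)

only_bug = [f for f in bug_frames if f not in exc_frames]
only_exc = [f for f in exc_frames if f not in bug_frames]

print(only_bug)  # ['die_if_kernel', 'ia64_bad_break', 'jp_do_irq']
print(only_exc)  # ['ia64_do_page_fault', 'jp_tasklet_action']
```

On these excerpts the difference comes down to the crash type (ia64_bad_break versus ia64_do_page_fault) and the probed entry point (jp_do_irq versus jp_tasklet_action).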
Comment 4 Neil Horman 2008-04-10 08:39:37 EDT
Cai, could you please try the patch from bz 441657, comment #3, here?  I'd like
to see whether we're seeing a variation of the same thing.  Thanks!
Comment 5 Cai Qian 2008-04-12 07:15:29 EDT
After applying the patch, it worked most of the time, but 2 of 8 attempts still
hung in the capture kernel.
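Because the hang is intermittent (2 of 8 attempts), a single clean run says little about whether a later fix works. A rough sketch of how many consecutive clean runs a retest might need follows. It assumes attempts are independent and that the true hang rate equals the observed 25%, which is a weak estimate from only 8 runs. The function name clean_runs_needed and the 5% threshold are illustrative choices, not taken from this report.

```python
import math


def clean_runs_needed(hang_rate, alpha=0.05):
    """Smallest n such that n clean runs in a row would occur with
    probability below alpha if the hang rate were still hang_rate."""
    if not 0.0 < hang_rate < 1.0:
        raise ValueError("hang_rate must be strictly between 0 and 1")
    # Solve (1 - hang_rate) ** n < alpha for n.
    return math.ceil(math.log(alpha) / math.log(1.0 - hang_rate))


observed_rate = 2 / 8  # 2 hangs in 8 attempts, as reported in this comment
print(clean_runs_needed(observed_rate))  # 11
```

Under those assumptions, about 11 clean captures in a row would be needed before the absence of hangs is unlikely to be luck.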
Comment 6 Cai Qian 2008-04-12 07:16:20 EDT
Created attachment 302205
capture kernel hung
Comment 7 Cai Qian 2008-04-12 07:17:40 EDT
Created attachment 302206
success to capture vmcore
Comment 8 Neil Horman 2008-04-14 13:28:17 EDT
OK, I'm going to say, then, that the results you have here indicate we're
chasing the same bug as in bz 441657.  Do you concur?  Shall we just track
this over on that bug?
Comment 9 CAI Qian 2008-04-14 23:54:54 EDT
Although this one seems to have made it into the second kernel while the other
does not, they do look similar. I agree that we could just track BZ 441657; I
will retest and confirm here once there is a fix there.
Comment 10 Neil Horman 2008-04-15 07:08:51 EDT
copy that.  thx.

*** This bug has been marked as a duplicate of 441657 ***
