Bug 441662
Summary: | [5.2][kdump] capture kernel failed to start at crash point BUG in INT_HARDWARE_ENTRY on RX2600 | |
---|---|---|---
Product: | Red Hat Enterprise Linux 5 | Reporter: | Qian Cai <qcai>
Component: | kexec-tools | Assignee: | Neil Horman <nhorman>
Status: | CLOSED DUPLICATE | QA Contact: |
Severity: | low | Docs Contact: |
Priority: | low | |
Version: | 5.2 | |
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | ia64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2008-04-15 11:08:51 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Description
Qian Cai
2008-04-09 12:07:43 UTC
Created attachment 301793 [details]
sosreport
You know, I just realized that the lkdtm tests use jprobes. I've not done any testing in conjunction with k/jprobes. I'm not sure what effect they have on kdump (obviously it's nominally not a problem if other systems are unaffected). Still, do the lkdtm tests have a variant in which k/jprobes are unused, so we have something to compare against?

All LKDTM test cases use jprobes. I have seen one of the test cases work on this machine:
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=2511871

Unable to handle kernel NULL pointer dereference (address 0000000000000000)
swapper[0]: Oops 8804682956800 [1]
lkdtm : Crash point INT_TASKLET_ENTRY of type EXCEPTION hit
Modules linked in: lkdtm(U) ipv6 xfrm_nalgo crypto_api autofs4 hidp rfcomm l2cap bluetooth sunrpc vfat fat dm_multipath button parport_pc lp parport sg tg3 e100 mii ide_cd cdrom dm_snapshot dm_zero dm_mirror dm_mod mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd

Pid: 0, CPU 1, comm: swapper
psr : 0000101008026018 ifs : 8000000000000207 ip : [<a000000200c14850>] Tainted: G
ip is at lkdtm_handler+0x1d0/0x2a0 [lkdtm]
unat: 0000000000000000 pfs : 0000000000000207 rsc : 0000000000000003
rnat: 80000000ff555665 bsps: a000000100165490 pr : 80000000ff556565
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a000000200c14840 b6 : a000000200c14820 b7 : a000000100011820
f6 : 1003e00000000000000a0 f7 : 1003e20c49ba5e353f7cf
f8 : 1003e00000000000004e2 f9 : 1003e000000000fa00000
f10 : 1003e000000003b9aca00 f11 : 1003e431bde82d7b634db
r1 : a000000200c253b8 r2 : a0000001009f80d8 r3 : a0000001009e1530
r8 : 0000000000000015 r9 : a0000001009f8108 r10 : a0000001009f8108
r11 : 0000000000000000 r12 : e000004065747bc0 r13 : e000004065740000
r14 : a0000001009f80d8 r15 : 0000000000000000 r16 : 0000000000000012
r17 : a000000100ca98e8 r18 : 000000000000000a r19 : a0000001009f69a0
r20 : a000000100835280 r21 : a0000001009e08a8 r22 : a0000001009f80e0
r23 : a0000001009f80e0 r24 : a000000100928fe0 r25 : 0000000000000000
r26 : a0000001009e0a10 r27 : 0000000000000000 r28 : 0000000000000036
r29 : 0000000000000036 r30 : 0000000000000000 r31 : a0000001009f8464

Call Trace:
[<a000000100013ae0>] show_stack+0x40/0xa0 sp=e000004065747750 bsp=e000004065741408
[<a0000001000143e0>] show_regs+0x840/0x880 sp=e000004065747920 bsp=e0000040657413a8
[<a000000100037bc0>] die+0x1c0/0x2c0 sp=e000004065747920 bsp=e000004065741360
[<a0000001006360c0>] ia64_do_page_fault+0x8e0/0xa20 sp=e000004065747940 bsp=e000004065741310
[<a00000010000c020>] __ia64_leave_kernel+0x0/0x280 sp=e0000040657479f0 bsp=e000004065741310
[<a000000200c14850>] lkdtm_handler+0x1d0/0x2a0 [lkdtm] sp=e000004065747bc0 bsp=e0000040657412d8
[<a000000200c14a00>] jp_tasklet_action+0x20/0x40 [lkdtm] sp=e000004065747bc0 bsp=e0000040657412c0
[<a000000100635740>] jprobe_inst_return+0x0/0x20 sp=e000004065747bc0 bsp=e000004065741248
[<a000000200c14840>] lkdtm_handler+0x1c0/0x2a0 [lkdtm] sp=e000004065747bc0 bsp=e0000040657411c8
<0>Kernel panic - not syncing: Fatal exception

Linux version 2.6.18-87.el5 (brewbuilder.redhat.com) (gcc version 4.1.2 20071124 (Red Hat 4.1.2-41)) #1 SMP Tue Mar 25 17:30:15 EDT 2008
Ignoring memory below 128MB
Ignoring memory above 640MB
EFI v1.10 by HP: SALsystab=0x3fb38000 ACPI 2.0=0x3fb2e000 SMBIOS=0x3fb3a000 HCDP=0x3fb2c000
booting generic kernel on platform dig
PCDP: v0 at 0x3fb2c000
Early serial console at MMIO 0xf8050000 (options '9600n8')
rsvd_region[0]: [0xe000000008000000, 0xe000000008db2170)
rsvd_region[1]: [0xe000000008dc0000, 0xe000000008dc0048)
rsvd_region[2]: [0xe000000027b4c000, 0xe000000027fbd2d6)
rsvd_region[3]: [0xe000000027fc4000, 0xe000000027fc40af)
rsvd_region[4]: [0xe000000027fcc000, 0xe000000027fccbd0)
rsvd_region[5]: [0xe000000027fd4000, 0xe000000027fd4050)
rsvd_region[6]: [0xffffffffffffffff, 0xffffffffffffffff)
...
In addition, other kdump test cases using the crasher module or SysRq-C work fine, just as they do on most ia64 machines:
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=2514409

cai, could you please try the patch from bz 441657, comment #3, here? I'd like to see if we're seeing a variation of the same thing. Thanks!

After applying the patch, it worked most of the time, but 2 of 8 attempts still hung in the capture kernel.

Created attachment 302205 [details]
capture kernel hung
Created attachment 302206 [details]
success to capture vmcore
OK, I'm going to say, then, that the results you have here indicate we're chasing the same bug as in bz 441657. Do you concur? Shall we just track this over on that bug?

Although this one seems to have made it into the second kernel, while the other seems not to have, they do look similar. I agree that we could just track BZ 441657, and then I would retest and confirm it here once we have a fix there.

Copy that. Thx.

*** This bug has been marked as a duplicate of 441657 ***