Bug 448572 - kdump not working on x3455
Status: CLOSED NOTABUG
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: beta
Hardware: x86_64 All
Priority: low
Severity: urgent
Assigned To: Red Hat Real Time Maintenance
Reported: 2008-05-27 13:40 EDT by IBM Bug Proxy
Modified: 2009-09-23 12:26 EDT

Doc Type: Bug Fix
Last Closed: 2009-09-23 12:26:13 EDT

Attachments
Boot log of kdump kernel on x3455 machine with vanilla 2.6.24.7 kernel (21.61 KB, text/plain)
2008-07-04 05:00 EDT, IBM Bug Proxy

External Trackers
IBM Linux Technology Center, ID 45105

Description IBM Bug Proxy 2008-05-27 13:40:38 EDT
=Comment: #0=================================================
Chirag H. Jog1 <chirag.jog@in.ibm.com> - 2008-05-27 06:45 EDT
Problem description:
Triggering kdump fails to produce a vmcore (dump).
A manual trigger (echo c > /proc/sysrq-trigger) makes the kdump kernel boot,
and the system then reboots into the original kernel, but no dump is generated.

[root@rt-ash ~]# uname -a
Linux rt-ash.austin.ibm.com 2.6.24.7-57ibmrt2.3 #1 SMP PREEMPT RT Wed May 21
19:51:04 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

The machine is an x3455. This is reproducible on both rt-alder and rt-ash.
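Before triggering a crash, a quick pre-flight check can rule out basic setup problems. The sketch below is only an illustration: the function name kdump_preflight is hypothetical, and it assumes a kernel that exposes /sys/kernel/kexec_crash_loaded. It takes the command line and the loaded flag as arguments so it can be tried without touching a live system.

```shell
#!/bin/sh
# Hypothetical helper: report whether the basic kdump prerequisites look sane.
# $1: kernel command line (e.g. the contents of /proc/cmdline)
# $2: crash-kernel loaded flag (e.g. the contents of /sys/kernel/kexec_crash_loaded)
kdump_preflight() {
    cmdline=$1
    loaded=$2
    # Memory for the crash kernel must be reserved at boot time.
    case " $cmdline " in
        *" crashkernel="*) ;;
        *) echo "no crashkernel= reservation on the command line"; return 1 ;;
    esac
    # The crash kernel must have been loaded by kexec.
    if [ "$loaded" != "1" ]; then
        echo "no crash kernel loaded (is the kdump service running?)"
        return 1
    fi
    echo "ok"
}

# On a live system (assumes both files exist):
#   kdump_preflight "$(cat /proc/cmdline)" "$(cat /sys/kernel/kexec_crash_loaded)"
# Only once this reports "ok" is it worth triggering the crash:
#   echo c > /proc/sysrq-trigger
```

A passing check only shows that a crash kernel is reserved and loaded; it says nothing about whether that kernel can boot and write the dump, which is the failure reported here.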
=Comment: #1=================================================
Ankita Garg <ankigarg@in.ibm.com> - 2008-05-27 07:42 EDT
Took a look at rt-alder. The system-level configuration bits look alright. I
changed the setting in /etc/kdump.conf so that the kdump kernel drops into a
shell by default instead of rebooting. With this, I could at least save the dump
manually. From the kernel perspective, I still think the issue points to a
user-space setting. Need to look deeper.

To drop into a shell, in /etc/kdump.conf, uncomment 'default shell'.
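The edit can be scripted. The following is a minimal sketch, assuming the directive appears in the file as a commented-out "default shell" line; the function name enable_default_shell is hypothetical. It works as a filter, so the real file is not modified until the result has been reviewed.

```shell
#!/bin/sh
# Hypothetical filter: read a kdump.conf on stdin and write it to stdout with
# the commented-out "default shell" directive enabled. All other lines,
# including other commented-out "default ..." choices, pass through unchanged.
enable_default_shell() {
    sed 's/^#[[:space:]]*default[[:space:]][[:space:]]*shell[[:space:]]*$/default shell/'
}

# Example use (review the result before replacing the real file):
#   enable_default_shell < /etc/kdump.conf > /tmp/kdump.conf.new
```

After changing /etc/kdump.conf, the kdump service typically has to be restarted so that the change reaches the kdump initrd.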
=Comment: #2=================================================
Ankita Garg <ankigarg@in.ibm.com> - 2008-05-27 07:47 EDT
Maybe we should try once with the latest kexec-tools package?
=Comment: #4=================================================
Chirag H. Jog1 <chirag.jog@in.ibm.com> - 2008-05-27 11:44 EDT
A newer version of kexec-tools from the RHEL5.2 repo doesn't solve the problem.
Comment 1 IBM Bug Proxy 2008-05-28 01:24:31 EDT
------- Comment From chirag.jog@in.ibm.com 2008-05-28 01:22 EDT-------
The x3455 is a 4-way AMD Opteron x86_64 rack-mounted machine.
Comment 2 IBM Bug Proxy 2008-07-04 03:56:32 EDT
------- Comment From sripathi@in.ibm.com 2008-07-04 03:55 EDT-------
Did the following:
Tested on rt-alder and confirmed that the problem continues to exist
Tested with acpi=noirq and found that it does not solve the problem
Currently suspecting a problem with ata/sata drivers
Confirmed that the problem doesn't exist with RHEL5.2 kernels
Going to try vanilla 2.6.24 based kernels.
Comment 3 IBM Bug Proxy 2008-07-04 05:00:34 EDT
------- Comment From sripathi@in.ibm.com 2008-07-04 04:58 EDT-------
I tried the vanilla 2.6.24.7 kernel and that too showed the same problem!

I booted with acpi=noirq. It shows some oops messages as soon as it begins
booting up. Later on, the ata driver shows some error messages and the system
never boots up. I will attach the entire log to this bug.
Comment 4 IBM Bug Proxy 2008-07-04 05:00:37 EDT
Created attachment 311020 [details]
Boot log of kdump kernel on x3455 machine with vanilla 2.6.24.7 kernel
Comment 5 IBM Bug Proxy 2008-07-04 05:08:31 EDT
------- Comment From sripathi@in.ibm.com 2008-07-04 05:06 EDT-------
lspci output
============
00:01.0 PCI bridge: Broadcom BCM5785 [HT1000] PCI/PCI-X Bridge
00:02.0 Host bridge: Broadcom BCM5785 [HT1000] Legacy South Bridge
00:02.1 IDE interface: Broadcom BCM5785 [HT1000] IDE
00:02.2 ISA bridge: Broadcom BCM5785 [HT1000] LPC
00:03.0 USB Controller: Broadcom BCM5785 [HT1000] USB (rev 01)
00:03.1 USB Controller: Broadcom BCM5785 [HT1000] USB (rev 01)
00:03.2 USB Controller: Broadcom BCM5785 [HT1000] USB (rev 01)
00:05.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
00:06.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:07.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:08.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:09.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:0a.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:0d.0 PCI bridge: Broadcom BCM5785 [HT1000] PCI/PCI-X Bridge (rev c0)
01:0e.0 RAID bus controller: Broadcom BCM5785 [HT1000] SATA (Native SATA Mode)
02:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 10)
02:01.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 10)

/proc/interrupts
================
CPU0       CPU1       CPU2       CPU3
0:        138          0          0          1   IO-APIC-edge      timer
1:          0          0          0          2   IO-APIC-edge      i8042
4:          0          0          4        297   IO-APIC-edge      serial
8:          0          0          0          0   IO-APIC-edge      rtc0
9:          0          0          0          0   IO-APIC-fasteoi   acpi
10:          0          0         14         66   IO-APIC-fasteoi   ehci_hcd:usb1, ohci_hcd:usb2, ohci_hcd:usb3
11:          0          0          1       5791   IO-APIC-fasteoi   sata_svw
12:          0          0          1          3   IO-APIC-edge      i8042
14:          0          0          0        297   IO-APIC-edge      libata
15:          0          0          0          0   IO-APIC-edge      libata
18:          0          0          0        661   IO-APIC-fasteoi   eth0
NMI:          0          0          0          0   Non-maskable interrupts
LOC:      11945      13234       8801      15151   Local timer interrupts
RES:       2362       1124        987        917   Rescheduling interrupts
CAL:        234        252        227        113   function call interrupts
TLB:        225        176        150        160   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
SPU:          0          0          0          0   Spurious interrupts
ERR:          0
Comment 6 IBM Bug Proxy 2008-07-04 05:40:30 EDT
------- Comment From sripathi@in.ibm.com 2008-07-04 05:38 EDT-------
I tried latest stable vanilla kernel: 2.6.25.10. It too has the same problem
with ata drivers, kdump kernel doesn't boot up. The oops messages seen during
early bootup are gone, though.

So this is a problem that exists in vanilla kernel even today. I think kdump had
worked during 2.6.21 days for us, so this problem was introduced some time after
that.
Comment 7 IBM Bug Proxy 2008-07-04 06:48:42 EDT
------- Comment From ankigarg@in.ibm.com 2008-07-04 06:43 EDT-------
(In reply to comment #15)
> I tried latest stable vanilla kernel: 2.6.25.10. It too has the same problem
> with ata drivers, kdump kernel doesn't boot up. The oops messages seen during
> early bootup are gone, though.
>
> So this is a problem that exists in vanilla kernel even today. I think kdump had
> worked during 2.6.21 days for us, so this problem was introduced some time after
> that.

This issue seems to be different from the one originally reported with R2/MRG
kernels. The kdump kernel used to boot fine but for some reason vmcore was not
saved and the system rebooted. So, maybe we could try with 2.6.24-rt1 kernel as
the kdump kernel and check if the behavior is the same or not ?
Comment 8 IBM Bug Proxy 2008-07-04 07:16:30 EDT
------- Comment From sripathi@in.ibm.com 2008-07-04 07:09 EDT-------
(In reply to comment #16)
> This issue seems to be different from the one originally reported with R2/MRG
> kernels. The kdump kernel used to boot fine but for some reason vmcore was not
> saved and the system rebooted. So, maybe we could try with 2.6.24-rt1 kernel as
> the kdump kernel and check if the behavior is the same or not ?

That happens until you pass "acpi=noirq". When you pass this option, you see
that the kdump kernel doesn't boot properly. Having to pass "acpi=noirq" is a
workaround, not a fix, but it is not as serious as being unable to boot kdump
kernel.
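For anyone repeating the experiment, the option has to reach the kdump kernel's command line rather than the boot kernel's. The sketch below assumes the RHEL-style /etc/sysconfig/kdump file with a single-line, double-quoted KDUMP_COMMANDLINE_APPEND variable; that layout and the function name append_kdump_option are assumptions, not something confirmed in this bug.

```shell
#!/bin/sh
# Hypothetical filter: add one option to KDUMP_COMMANDLINE_APPEND="..." in a
# sysconfig-style file read from stdin. Assumes the variable sits on a single
# line and is double-quoted. If the option is already present, the value is
# left as it is.
append_kdump_option() {
    opt=$1
    awk -v opt="$opt" '
        /^KDUMP_COMMANDLINE_APPEND="/ {
            val = $0
            sub(/^KDUMP_COMMANDLINE_APPEND="/, "", val)   # strip name and opening quote
            sub(/"[ \t]*$/, "", val)                      # strip closing quote
            if (index(" " val " ", " " opt " ") == 0)     # whole-word membership test
                val = (val == "" ? opt : val " " opt)
            print "KDUMP_COMMANDLINE_APPEND=\"" val "\""
            next
        }
        { print }
    '
}

# Example use (review the result before replacing the real file):
#   append_kdump_option acpi=noirq < /etc/sysconfig/kdump > /tmp/kdump.new
```

As with other kdump configuration changes, the crash kernel has to be reloaded (for example by restarting the kdump service) before the new command line takes effect.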
Comment 9 IBM Bug Proxy 2008-07-08 05:48:42 EDT
------- Comment From chirag.jog@in.ibm.com 2008-07-08 05:47 EDT-------
Using RHEL stock kernel as the kdump kernel works fine.
Comment 10 IBM Bug Proxy 2008-07-08 07:40:36 EDT
------- Comment From chirag.jog@in.ibm.com 2008-07-08 07:36 EDT-------
(In reply to comment #15)
> I tried latest stable vanilla kernel: 2.6.25.10. It too has the same problem
> with ata drivers, kdump kernel doesn't boot up. The oops messages seen during
> early bootup are gone, though.
>
2.6.25.10 works fine on x3455 (rt-ash). Probably a bad config.
Comment 11 IBM Bug Proxy 2008-07-15 17:40:36 EDT
------- Comment From dvhltc@us.ibm.com 2008-07-15 17:32 EDT-------
Have we identified a solution here?  I have been seeing the same thing on an
LS21 using the MRG -65 kernel as the boot kernel and the RHEL5.2 stock kernel as
the kdump kernel.  Namely, when I issue the echo c > /proc/sysrq-trigger command
the ssh session stops, and I see nothing on the serial console until the machine
reboots into the MRG kernel again.
Comment 12 IBM Bug Proxy 2008-07-15 18:00:31 EDT
------- Comment From dvhltc@us.ibm.com 2008-07-15 17:53 EDT-------
(In reply to comment #20)
> Have we identified a solution here?  I have been seeing the same thing on an
> LS21 using the MRG -65 kernel as the boot kernel and the RHEL5.2 stock kernel as
> the kdump kernel.  Namely, when I issue the echo c > /proc/sysrq-trigger command
> the ssh session stops, and I see nothing on the serial console until the machine
> reboots into the MRG kernel again.

Hrm... never mind, I think I must have mucked something up along the way.  I am
now seeing the cores in /var/crash... strange.
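A small check like the one below makes it easy to confirm after each test whether a core was actually written. It is a sketch only: the function name list_vmcores is hypothetical, and the /var/crash default simply follows the path mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: list vmcore files under a crash directory
# (default /var/crash), one path per line, sorted. Prints nothing if the
# directory does not exist or holds no cores.
list_vmcores() {
    dir=${1:-/var/crash}
    [ -d "$dir" ] || return 0
    find "$dir" -type f -name 'vmcore*' | sort
}

# Example use:
#   list_vmcores              # look in /var/crash
#   list_vmcores /mnt/dumps   # look in another dump target
```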
Comment 13 IBM Bug Proxy 2008-07-31 00:01:19 EDT
------- Comment From ankigarg@in.ibm.com 2008-07-30 23:58 EDT-------
Sent mail to the kexec list regarding this issue.

http://lists.infradead.org/pipermail/kexec/2008-July/002264.html
Comment 14 IBM Bug Proxy 2008-08-22 05:01:44 EDT
We have decided to go with the workaround of using RHEL kernel as kdump kernel
for the current release. Hence I am rejecting this bug as ALT_SOLUTION_AVAIL. We
are going to work on the problem of using real-time kernel as kdump kernel, but
we will raise a new bug for that.
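For reference, the workaround amounts to pointing kdump at the stock kernel. A minimal sketch follows, assuming the RHEL-style /etc/sysconfig/kdump file and its KDUMP_KERNELVER variable; the function name set_kdump_kernelver is hypothetical, and the version string in the example is illustrative only.

```shell
#!/bin/sh
# Hypothetical filter: set KDUMP_KERNELVER in a sysconfig-style file read from
# stdin. Replaces an existing assignment, or appends one if none is present.
set_kdump_kernelver() {
    ver=$1
    awk -v ver="$ver" '
        /^KDUMP_KERNELVER=/ { print "KDUMP_KERNELVER=\"" ver "\""; seen = 1; next }
        { print }
        END { if (!seen) print "KDUMP_KERNELVER=\"" ver "\"" }
    '
}

# Example use (illustrative version string; review before replacing the file):
#   set_kdump_kernelver 2.6.18-92.el5 < /etc/sysconfig/kdump > /tmp/kdump.new
```

The named kernel has to be installed on the machine, and the kdump service restarted, for the change to take effect.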
Comment 15 Clark Williams 2009-09-23 12:26:13 EDT
closing
