Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 448572

Summary: kdump not working on x3455
Product: Red Hat Enterprise MRG
Component: realtime-kernel
Version: beta
Hardware: x86_64
OS: All
Status: CLOSED NOTABUG
Severity: urgent
Priority: low
Reporter: IBM Bug Proxy <bugproxy>
Assignee: Red Hat Real Time Maintenance <rt-maint>
CC: bhu, williams
Doc Type: Bug Fix
Last Closed: 2009-09-23 16:26:13 UTC

Attachments:
Boot log of kdump kernel on x3455 machine with vanilla 2.6.24.7 kernel (flags: none)

Description IBM Bug Proxy 2008-05-27 17:40:38 UTC
=Comment: #0=================================================
Chirag H. Jog1 <chirag.jog.com> - 2008-05-27 06:45 EDT
Problem description:
Triggering kdump fails to produce a vmcore (dump).
A manual trigger (echo c > /proc/sysrq-trigger) makes the kdump kernel boot,
after which the machine reboots into the original kernel, but no dump is generated.

[root@rt-ash ~]# uname -a
Linux rt-ash.austin.ibm.com 2.6.24.7-57ibmrt2.3 #1 SMP PREEMPT RT Wed May 21
19:51:04 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

The machine is an x3455. This is reproducible on both rt-alder and rt-ash.
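
Before triggering the crash, it can help to confirm that memory was reserved for the capture kernel and that one is actually loaded. The following is a minimal sketch, not something taken from the affected machines: the function names are hypothetical, and it assumes the usual `crashkernel=` boot parameter and the `/sys/kernel/kexec_crash_loaded` sysfs file, which may not exist on every kernel build.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given kernel command line
# contains a crashkernel= reservation.
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: succeeds if the given sysfs-style file is readable
# and contains "1", which is how a loaded capture kernel is reported.
crash_kernel_loaded() {
    [ -r "$1" ] && [ "$(cat "$1")" = "1" ]
}

# Usage on a live system (paths are the usual locations, an assumption here):
#   has_crashkernel "$(cat /proc/cmdline)" && echo "crashkernel= present"
#   crash_kernel_loaded /sys/kernel/kexec_crash_loaded && echo "capture kernel loaded"
```

If either check fails, the crash trigger would not be expected to reach the kdump kernel at all, which is a different failure from the one described here.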
=Comment: #1=================================================
Ankita Garg <ankigarg.com> - 2008-05-27 07:42 EDT
Took a look at rt-alder. The system-level configuration bits look alright. I
changed the setting in /etc/kdump.conf to drop into a shell in the kdump kernel
by default instead of rebooting. By doing this, at least I could manually save
the dump. From the kernel perspective, I still think the issue points to a
user-space setting. Need to look deeper.

To drop into a shell, in /etc/kdump.conf, uncomment 'default shell'.
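
The edit described above can be sketched as a small `sed` helper. This assumes the stock file carries the directive commented out as `#default shell`; the function name is hypothetical, and the service restart in the usage note is the usual way to pick up the change, not a step confirmed in this report.

```shell
#!/bin/sh
# Hypothetical helper: uncomment the 'default shell' directive in a
# kdump.conf-style file. Assumes the line exists as '#default shell'
# (optionally with whitespace after the '#'). Other lines are untouched.
enable_default_shell() {
    sed 's/^#[[:space:]]*default[[:space:]][[:space:]]*shell[[:space:]]*$/default shell/' \
        "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Usage (as root):
#   enable_default_shell /etc/kdump.conf
#   service kdump restart
```

Once the kdump kernel drops into the shell, the memory image of the crashed kernel is exposed as /proc/vmcore and can be copied to any filesystem mounted from that shell.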
=Comment: #2=================================================
Ankita Garg <ankigarg.com> - 2008-05-27 07:47 EDT
Maybe we should try once with the latest kexec-tools package?
=Comment: #4=================================================
Chirag H. Jog1 <chirag.jog.com> - 2008-05-27 11:44 EDT
A newer version of kexec-tools from the RHEL5.2 repo doesn't solve the problem.

Comment 1 IBM Bug Proxy 2008-05-28 05:24:31 UTC
------- Comment From chirag.jog.com 2008-05-28 01:22 EDT-------
The x3455 is a 4-way AMD Opteron x86_64 rack-mounted machine.

Comment 2 IBM Bug Proxy 2008-07-04 07:56:32 UTC
------- Comment From sripathi.com 2008-07-04 03:55 EDT-------
Did the following:
Tested on rt-alder and confirmed that the problem continues to exist
Tested with acpi=noirq and found that it does not solve the problem
Currently suspecting a problem with ata/sata drivers
Confirmed that the problem doesn't exist with RHEL5.2 kernels
Going to try vanilla 2.6.24 based kernels.

Comment 3 IBM Bug Proxy 2008-07-04 09:00:34 UTC
------- Comment From sripathi.com 2008-07-04 04:58 EDT-------
I tried the vanilla 2.6.24.7 kernel and that too showed the same problem!

I booted with acpi=noirq. It shows some oops messages as soon as it begins
booting up. Later on, the ata driver shows some error messages and the system
never boots up. I will attach the entire log to this bug.

Comment 4 IBM Bug Proxy 2008-07-04 09:00:37 UTC
Created attachment 311020 [details]
Boot log of kdump kernel on x3455 machine with vanilla 2.6.24.7 kernel

Comment 5 IBM Bug Proxy 2008-07-04 09:08:31 UTC
------- Comment From sripathi.com 2008-07-04 05:06 EDT-------
lspci output
============
00:01.0 PCI bridge: Broadcom BCM5785 [HT1000] PCI/PCI-X Bridge
00:02.0 Host bridge: Broadcom BCM5785 [HT1000] Legacy South Bridge
00:02.1 IDE interface: Broadcom BCM5785 [HT1000] IDE
00:02.2 ISA bridge: Broadcom BCM5785 [HT1000] LPC
00:03.0 USB Controller: Broadcom BCM5785 [HT1000] USB (rev 01)
00:03.1 USB Controller: Broadcom BCM5785 [HT1000] USB (rev 01)
00:03.2 USB Controller: Broadcom BCM5785 [HT1000] USB (rev 01)
00:05.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
00:06.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:07.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:08.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:09.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:0a.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:0d.0 PCI bridge: Broadcom BCM5785 [HT1000] PCI/PCI-X Bridge (rev c0)
01:0e.0 RAID bus controller: Broadcom BCM5785 [HT1000] SATA (Native SATA Mode)
02:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 10)
02:01.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 10)

/proc/interrupts
================
           CPU0       CPU1       CPU2       CPU3
  0:        138          0          0          1   IO-APIC-edge      timer
  1:          0          0          0          2   IO-APIC-edge      i8042
  4:          0          0          4        297   IO-APIC-edge      serial
  8:          0          0          0          0   IO-APIC-edge      rtc0
  9:          0          0          0          0   IO-APIC-fasteoi   acpi
 10:          0          0         14         66   IO-APIC-fasteoi   ehci_hcd:usb1, ohci_hcd:usb2, ohci_hcd:usb3
 11:          0          0          1       5791   IO-APIC-fasteoi   sata_svw
 12:          0          0          1          3   IO-APIC-edge      i8042
 14:          0          0          0        297   IO-APIC-edge      libata
 15:          0          0          0          0   IO-APIC-edge      libata
 18:          0          0          0        661   IO-APIC-fasteoi   eth0
NMI:          0          0          0          0   Non-maskable interrupts
LOC:      11945      13234       8801      15151   Local timer interrupts
RES:       2362       1124        987        917   Rescheduling interrupts
CAL:        234        252        227        113   function call interrupts
TLB:        225        176        150        160   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
SPU:          0          0          0          0   Spurious interrupts
ERR:          0
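
Since the ATA/SATA path is under suspicion, one way to compare interrupt delivery between a working and a failing boot is to total the counts for a given handler. A minimal sketch, assuming the /proc/interrupts layout shown here; the function name is hypothetical, and it matches only on the last column, so a handler that shares a line with others would need a different match.

```shell
#!/bin/sh
# Hypothetical helper: sum the per-CPU interrupt counts of every line whose
# last column equals the given handler name.
#   $1 = handler name (e.g. sata_svw), $2 = file in /proc/interrupts format
irq_total() {
    awk -v dev="$1" '
        $NF == dev {
            # Fields 2..NF-1 hold the per-CPU counts followed by the chip
            # type; only purely numeric fields are added.
            for (i = 2; i < NF; i++)
                if ($i ~ /^[0-9]+$/) sum += $i
        }
        END { print sum + 0 }
    ' "$2"
}

# Usage:
#   irq_total sata_svw /proc/interrupts
#   irq_total libata /proc/interrupts
```

A total that stays at zero in the kdump kernel while the driver reports timeouts would suggest that interrupts are not being delivered, rather than a fault in the driver itself.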

Comment 6 IBM Bug Proxy 2008-07-04 09:40:30 UTC
------- Comment From sripathi.com 2008-07-04 05:38 EDT-------
I tried latest stable vanilla kernel: 2.6.25.10. It too has the same problem
with ata drivers, kdump kernel doesn't boot up. The oops messages seen during
early bootup are gone, though.

So this is a problem that exists in vanilla kernel even today. I think kdump had
worked during 2.6.21 days for us, so this problem was introduced some time after
that.

Comment 7 IBM Bug Proxy 2008-07-04 10:48:42 UTC
------- Comment From ankigarg.com 2008-07-04 06:43 EDT-------
(In reply to comment #15)
> I tried latest stable vanilla kernel: 2.6.25.10. It too has the same problem
> with ata drivers, kdump kernel doesn't boot up. The oops messages seen during
> early bootup are gone, though.
>
> So this is a problem that exists in vanilla kernel even today. I think kdump had
> worked during 2.6.21 days for us, so this problem was introduced some time after
> that.

This issue seems to be different from the one originally reported with R2/MRG
kernels. The kdump kernel used to boot fine but for some reason vmcore was not
saved and the system rebooted. So, maybe we could try with 2.6.24-rt1 kernel as
the kdump kernel and check if the behavior is the same or not ?

Comment 8 IBM Bug Proxy 2008-07-04 11:16:30 UTC
------- Comment From sripathi.com 2008-07-04 07:09 EDT-------
(In reply to comment #16)
> This issue seems to be different from the one originally reported with R2/MRG
> kernels. The kdump kernel used to boot fine but for some reason vmcore was not
> saved and the system rebooted. So, maybe we could try with 2.6.24-rt1 kernel as
> the kdump kernel and check if the behavior is the same or not ?

That happens until you pass "acpi=noirq". When you pass this option, you see
that the kdump kernel doesn't boot properly. Having to pass "acpi=noirq" is a
workaround, not a fix, but it is not as serious as being unable to boot kdump
kernel.

Comment 9 IBM Bug Proxy 2008-07-08 09:48:42 UTC
------- Comment From chirag.jog.com 2008-07-08 05:47 EDT-------
Using RHEL stock kernel as the kdump kernel works fine.

Comment 10 IBM Bug Proxy 2008-07-08 11:40:36 UTC
------- Comment From chirag.jog.com 2008-07-08 07:36 EDT-------
(In reply to comment #15)
> I tried latest stable vanilla kernel: 2.6.25.10. It too has the same problem
> with ata drivers, kdump kernel doesn't boot up. The oops messages seen during
> early bootup are gone, though.
>
2.6.25.10 works fine on x3455 (rt-ash). Probably a bad config.

Comment 11 IBM Bug Proxy 2008-07-15 21:40:36 UTC
------- Comment From dvhltc.com 2008-07-15 17:32 EDT-------
Have we identified a solution here?  I have been seeing the same thing on an
LS21 using the MRG -65 kernel as the boot kernel and the RHEL5.2 stock kernel as
the kdump kernel.  Namely, when I issue the echo c > /proc/sysrq-trigger command
the ssh session stops, and I see nothing on the serial console until the machine
reboots into the MRG kernel again.

Comment 12 IBM Bug Proxy 2008-07-15 22:00:31 UTC
------- Comment From dvhltc.com 2008-07-15 17:53 EDT-------
(In reply to comment #20)
> Have we identified a solution here?  I have been seeing the same thing on an
> LS21 using the MRG -65 kernel as the boot kernel and the RHEL5.2 stock kernel as
> the kdump kernel.  Namely, when I issue the echo c > /proc/sysrq-trigger command
> the ssh session stops, and I see nothing on the serial console until the machine
> reboots into the MRG kernel again.

Hrm... never mind, I think I must have mucked something up along the way.  I am
now seeing the cores in /var/crash... strange.

Comment 13 IBM Bug Proxy 2008-07-31 04:01:19 UTC
------- Comment From ankigarg.com 2008-07-30 23:58 EDT-------
Sent mail to the kexec list regarding this issue.

http://lists.infradead.org/pipermail/kexec/2008-July/002264.html

Comment 14 IBM Bug Proxy 2008-08-22 09:01:44 UTC
We have decided to go with the workaround of using the RHEL kernel as the kdump
kernel for the current release. Hence I am rejecting this bug as
ALT_SOLUTION_AVAIL. We are going to work on the problem of using the real-time
kernel as the kdump kernel, but we will raise a new bug for that.
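
On RHEL 5 the capture kernel is normally selected through the KDUMP_KERNELVER variable in /etc/sysconfig/kdump, so the workaround can be sketched as below. This is an illustration under that assumption: the helper name is hypothetical, and the version string in the usage note is only an example of a stock RHEL 5.2 kernel, not the exact kernel used on these machines.

```shell
#!/bin/sh
# Hypothetical helper: set KEY="value" in a sysconfig-style file, replacing
# an existing assignment or appending one if the key is absent.
# Note: the value must not contain '|' or '&', which sed would interpret.
set_sysconfig_var() {
    file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$file"; then
        sed "s|^${key}=.*|${key}=\"${value}\"|" "$file" > "$file.tmp" \
            && mv "$file.tmp" "$file"
    else
        printf '%s="%s"\n' "$key" "$value" >> "$file"
    fi
}

# Usage (as root; version string is an example only):
#   set_sysconfig_var /etc/sysconfig/kdump KDUMP_KERNELVER "2.6.18-92.el5"
#   service kdump restart
```

With the variable left empty, the kdump init script looks for a capture kernel that matches the running kernel version, which is the real-time kernel in the failing setup.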

Comment 15 Clark Williams 2009-09-23 16:26:13 UTC
closing