
Bug 440399

Summary: [5.2][kdump] capture kernel reset for IBM eServer x3455
Product: Red Hat Enterprise Linux 5
Reporter: Qian Cai <qcai>
Component: kernel
Assignee: Ed Pollard <epollard>
Status: CLOSED NOTABUG
QA Contact: Martin Jenner <mjenner>
Severity: low
Priority: low
Version: 5.2
CC: dzickus, peterm
Target Milestone: rc
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2008-10-22 10:34:02 UTC

Attachments:
sosreport (flags: none)
Full serial console log (flags: none)

Description Qian Cai 2008-04-03 11:56:07 UTC
Description of problem:
The capture kernel failed to capture a vmcore on an IBM eServer x3455. The
machine reset to the BIOS after the following messages:

...
powernow-k8: Pre-initialization of ACPI failed
powernow-k8: Found 1 Dual-Core AMD Opteron(tm) Processor 2220 SE processors (1 cpu cores) (version 2.20.00)
powernow-k8: BIOS error - no PSB or ACPI _PSS objects
ACPI: (supports S0 S4 S5)
Freeing unused kernel memory: 196k freed
Write protecting the kernel read-only data: 475k
Mounting proc filesystem
Mounting sysfs filesystem
Creating /dev
Creating initial device nodes
Loading scsi_mod.ko module
SCSI subsystem initialized
Loading sd_mod.ko module
Loading libata.ko module
Loading sata_svw.ko module
ACPI: PCI Interrupt Link [LNKS] enabled at IRQ 10
ACPI: PCI Interrupt 0000:01:0e.0[A] -> Link [LNKS] -> GSI 10 (level, low) -> IRQ 10
scsi0 : sata_svw
scsi1 : sata_svw
scsi2 : sata_svw
scsi3 : sata_svw
ata1: SATA max UDMA/133 mmio m8192@0xd8100000 port 0xd8100000 irq 10
ata2: SATA max UDMA/133 mmio m8192@0xd8100000 port 0xd8100100 irq 10
ata3: SATA max UDMA/133 mmio m8192@0xd8100000 port 0xd8100200 irq 10
ata4: SATA max UDMA/133 mmio m8192@0xd8100000 port 0xd8100300 irq 10
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)

I have tried adding "noacpi" to the kdump kernel command line, and I have tried
the RHEL5U1 version of kexec-tools, both without luck. Note that the problem is
only triggered by certain crash scenarios, for example the BUG that LKDTM (Linux
Kernel Dump Test Module) injects in do_irq(). A simple
"echo c >/proc/sysrq-trigger" works perfectly fine. I have not tried this with
i386 or the RHEL5U1 kernel yet.

Version-Release number of selected component (if applicable):
RHEL5.2-Server-20080326.0
kernel-2.6.18-87.el5
kexec-tools-1.102pre-16.el5

How reproducible:
Always on ibm-pizzaro.rhts.boston.redhat.com

Steps to Reproduce:
1. Configure kdump and boot the kernel with crashkernel=128M@16M.
2. wget http://porkchop.devel.redhat.com/qa/rhts/lookaside/ltp-kdump-20080228.tar.gz; cd kdump/lib/lkdtm; export USE_SYMBOL_NAME=1; make
3. insmod lkdtm.ko cpoint_name=INT_HARDWARE_ENTRY cpoint_type=BUG cpoint_count=05
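
Before loading the module in step 3, it can be worth confirming that the running kernel really was booted with the crashkernel= reservation from step 1. The following is a minimal sketch for a POSIX shell; `crashkernel_arg` is a hypothetical helper name introduced here for illustration and is not part of kexec-tools or the LTP kdump suite.

```shell
# Hypothetical helper: print the value of crashkernel= found in a
# kernel command-line string, or return non-zero if it is absent.
crashkernel_arg() {
    # $1 is intentionally unquoted so the command line splits into words.
    for word in $1; do
        case "$word" in
            crashkernel=*)
                printf '%s\n' "${word#crashkernel=}"
                return 0
                ;;
        esac
    done
    return 1
}

# Example use on a live system (not run here):
#   crashkernel_arg "$(cat /proc/cmdline)" || echo "no crashkernel= reservation"
```

For the configuration in this report the helper would be expected to print 128M@16M; an empty result would suggest the reservation was never made, so the capture kernel could not have been loaded.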

Comment 1 Qian Cai 2008-04-03 11:56:08 UTC
Created attachment 300229
sosreport

Comment 2 Qian Cai 2008-04-03 11:59:01 UTC
Created attachment 300234
Full serial console log

Comment 3 Qian Cai 2008-04-03 12:00:34 UTC
Neil suggested the following patch might be helpful here.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4bfaaef01a1badb9e8ffb0c0a37cd2379008d21f

Comment 4 Qian Cai 2008-04-03 15:15:26 UTC
The same thing happens on i386 as well.

Comment 5 Qian Cai 2008-07-16 10:09:34 UTC
I have tried adding each of the following sets of options to the kdump kernel options, and neither helps.

"hda=noprobe hdb=noprobe hdc=noprobe hdd=noprobe"
"ide0=noprobe ide1=noprobe ide2=noprobe ide3=noprobe"

So I suppose I'll wait until this machine gets a BIOS update first.
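
Options like the ones listed in this comment are passed to the capture kernel as one appended command-line string. A minimal sketch of building that string without duplicates, assuming a POSIX shell; `append_kdump_opts` is a hypothetical helper, and the base value `irqpoll maxcpus=1` is only an illustrative starting point, not taken from this machine's configuration.

```shell
# Hypothetical helper: append options to a kdump command-line string,
# skipping any option that is already present as a whole word.
append_kdump_opts() {
    current=$1
    shift
    for opt in "$@"; do
        case " $current " in
            *" $opt "*) ;;    # already present, leave the string unchanged
            *) current="${current:+$current }$opt" ;;
        esac
    done
    printf '%s\n' "$current"
}

# Example (illustrative values):
#   append_kdump_opts "irqpoll maxcpus=1" hda=noprobe hdb=noprobe hdc=noprobe hdd=noprobe
```

On RHEL 5 the resulting string would typically be assigned to KDUMP_COMMANDLINE_APPEND in /etc/sysconfig/kdump, followed by a restart of the kdump service; that location is an assumption here, since this report does not say how the options were applied.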

Comment 6 Qian Cai 2008-10-22 10:34:02 UTC
I'll close this out, as using jprobe() to trigger artificial crashes is probably not a good way to test kdump. I'll create a new kernel module to test those scenarios and open new BZs for any issues found.