Bug 440399 - [5.2][kdump] capture kernel reset for IBM eServer x3455
Summary: [5.2][kdump] capture kernel reset for IBM eServer x3455
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Ed Pollard
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-04-03 11:56 UTC by Qian Cai
Modified: 2013-08-06 00:04 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-10-22 10:34:02 UTC
Target Upstream Version:
Embargoed:


Attachments
sosreport (2.21 MB, application/octet-stream)
2008-04-03 11:56 UTC, Qian Cai
Full serial console log (41.01 KB, text/plain)
2008-04-03 11:59 UTC, Qian Cai

Description Qian Cai 2008-04-03 11:56:07 UTC
Description of problem:
The capture kernel failed to capture a vmcore on an IBM eServer x3455. The
machine reset to the BIOS after the following messages:

...
powernow-k8: Pre-initialization of ACPI failed
powernow-k8: Found 1 Dual-Core AMD Opteron(tm) Processor 2220 SE processors (1
cpu cores) (version 2.20.00)
powernow-k8: BIOS error - no PSB or ACPI _PSS objects
ACPI: (supports S0 S4 S5)
Freeing unused kernel memory: 196k freed
Write protecting the kernel read-only data: 475k
Mounting proc filesystem
Mounting sysfs filesystem
Creating /dev
Creating initial device nodes
Loading scsi_mod.ko module
SCSI subsystem initialized
Loading sd_mod.ko module
Loading libata.ko module
Loading sata_svw.ko module
ACPI: PCI Interrupt Link [LNKS] enabled at IRQ 10
ACPI: PCI Interrupt 0000:01:0e.0[A] -> Link [LNKS] -> GSI 10 (level, low) -> IRQ 10
scsi0 : sata_svw
scsi1 : sata_svw
scsi2 : sata_svw
scsi3 : sata_svw
ata1: SATA max UDMA/133 mmio m8192@0xd8100000 port 0xd8100000 irq 10
ata2: SATA max UDMA/133 mmio m8192@0xd8100000 port 0xd8100100 irq 10
ata3: SATA max UDMA/133 mmio m8192@0xd8100000 port 0xd8100200 irq 10
ata4: SATA max UDMA/133 mmio m8192@0xd8100000 port 0xd8100300 irq 10
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)

I have tried adding "noacpi" to the kdump kernel command line, and I have tried
the RHEL5U1 version of kexec-tools, both without luck. Note that the problem is
only triggered by certain crash scenarios, for example a BUG injected in
do_irq() by LKDTM (Linux Kernel Dump Test Module). A simple
"echo c >/proc/sysrq-trigger" works perfectly fine. I have not tried this with
i386 or the RHEL5U1 kernel yet.

Version-Release number of selected component (if applicable):
RHEL5.2-Server-20080326.0
kernel-2.6.18-87.el5
kexec-tools-1.102pre-16.el5

How reproducible:
Always on ibm-pizzaro.rhts.boston.redhat.com

Steps to Reproduce:
1. Configure kdump and boot the kernel with crashkernel=128M@16M.
2. wget http://porkchop.devel.redhat.com/qa/rhts/lookaside/ltp-kdump-20080228.tar.gz; cd kdump/lib/lkdtm; export USE_SYMBOL_NAME=1; make
3. insmod lkdtm.ko cpoint_name=INT_HARDWARE_ENTRY cpoint_type=BUG cpoint_count=05
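
The steps above can be collected into a small shell sketch. The function names
reproduce_lkdtm_crash and run, and the DRY_RUN switch, are hypothetical
additions for illustration. The URL, the build variable and the module
parameters are copied from the steps. The tar step is an assumption: the
steps go straight from wget to cd, so the unpack command and the resulting
directory layout are guesses. Commands are only printed by default, since the
final insmod is expected to crash the machine.

```shell
#!/bin/sh
# Sketch of steps 2 and 3. Assumes kdump is already configured and the
# kernel was booted with crashkernel=128M@16M (step 1).

LTP_URL="http://porkchop.devel.redhat.com/qa/rhts/lookaside/ltp-kdump-20080228.tar.gz"

# run: print a command, and execute it only when DRY_RUN=0 (hypothetical helper).
run() {
    echo "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

reproduce_lkdtm_crash() {
    run wget "$LTP_URL" || return 1
    # Assumption: the tarball has to be unpacked before kdump/lib/lkdtm
    # exists. The original steps do not show this command.
    run tar xzf ltp-kdump-20080228.tar.gz || return 1
    run cd kdump/lib/lkdtm || return 1
    run env USE_SYMBOL_NAME=1 make || return 1
    # Inject a BUG at the hardware interrupt entry crash point.
    run insmod lkdtm.ko cpoint_name=INT_HARDWARE_ENTRY cpoint_type=BUG cpoint_count=05
}
```

Calling reproduce_lkdtm_crash prints the five commands. Running it as root
with DRY_RUN=0 executes them, and the last one should trigger the crash.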

Comment 1 Qian Cai 2008-04-03 11:56:08 UTC
Created attachment 300229
sosreport

Comment 2 Qian Cai 2008-04-03 11:59:01 UTC
Created attachment 300234
Full serial console log

Comment 3 Qian Cai 2008-04-03 12:00:34 UTC
Neil suggested the following patch might be helpful here.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4bfaaef01a1badb9e8ffb0c0a37cd2379008d21f

Comment 4 Qian Cai 2008-04-03 15:15:26 UTC
The same thing happened on i386 as well.

Comment 5 Qian Cai 2008-07-16 10:09:34 UTC
I have tried adding the following options to the kdump kernel command line, and neither set helps.

"hda=noprobe hdb=noprobe hdc=noprobe hdd=noprobe"
"ide0=noprobe ide1=noprobe ide2=noprobe ide3=noprobe"

So I suppose I'll wait until this machine gets a BIOS update first.
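
For reference, a minimal sketch of how such option sets could be added to the
capture kernel's command line. It assumes the options are set through
KDUMP_COMMANDLINE_APPEND in /etc/sysconfig/kdump, which this comment does not
state. The helper name kdump_append_line and the base value
"irqpoll maxcpus=1" are illustrative only.

```shell
#!/bin/sh
# Print a KDUMP_COMMANDLINE_APPEND line with extra options added to a base set.
# Assumption: capture kernel options are taken from KDUMP_COMMANDLINE_APPEND
# in /etc/sysconfig/kdump. The comment does not say how they were passed.
kdump_append_line() {
    base=$1
    shift
    printf 'KDUMP_COMMANDLINE_APPEND="%s"\n' "$base $*"
}

# The two option sets tried above, each on top of an illustrative base value.
kdump_append_line "irqpoll maxcpus=1" hda=noprobe hdb=noprobe hdc=noprobe hdd=noprobe
kdump_append_line "irqpoll maxcpus=1" ide0=noprobe ide1=noprobe ide2=noprobe ide3=noprobe
```

The printed line would replace the existing one in the config file, and the
kdump service would then need a restart so the capture kernel is reloaded with
the new options.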

Comment 6 Qian Cai 2008-10-22 10:34:02 UTC
I'll close this out, as using jprobe() to trigger artificial crashes is probably not a good way to test kdump. I'll create a new kernel module to test those scenarios and open new BZs for any issues found.

