Bug 454998

Summary: [kdump] not working on HP-XW9400
Product: Red Hat Enterprise Linux 5
Reporter: Qian Cai <qcai>
Component: kernel
Assignee: Vivek Goyal <vgoyal>
Status: CLOSED DUPLICATE
QA Contact: Martin Jenner <mjenner>
Severity: low
Priority: low
Version: 5.2
CC: dwa, rbinkhor, tcamuso, vgoyal
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2008-10-15 13:55:51 UTC

Description Qian Cai 2008-07-11 11:20:17 UTC
Description of problem:
On an HP xw9400 64-bit box, kdump does not work, even after adding the "noapic"
argument to the kdump kernel command line. The capture kernel hangs:

hp-xw9400-01.rhts.bos.redhat.com login: SysRq : Trigger a crashdump
Linux version 2.6.18-92.el5 (brewbuilder.redhat.com) (gcc
version 4.1.2 20071124 (Red Hat 4.1.2-41)) #1 SMP Tue Apr 29 13:16:15 EDT 2008
Command line: ro root=/dev/VolGroup00/LogVol00 console=ttyS0,115200  irqpoll
maxcpus=1 reset_devices memmap=exactmap memmap=640K@0K memmap=5116K@16384K
memmap=125300K@22140K elfcorehdr=147440K
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000100 - 000000000009d000 (usable)
 BIOS-e820: 000000000009d000 - 00000000000a0000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007ffc7100 (usable)
 BIOS-e820: 000000007ffc7100 - 0000000080000000 (reserved)
 BIOS-e820: 00000000f0000000 - 00000000f8000000 (reserved)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
user-defined physical RAM map:
 user: 0000000000000000 - 00000000000a0000 (usable)
 user: 0000000001000000 - 00000000014ff000 (usable)
 user: 000000000159f000 - 0000000008ffc000 (usable)
DMI 2.5 present.
ACPI: Unable to map XSDT header
Scanning NUMA topology in Northbridge 24
Number of nodes 2
Node 0 MemBase 0000000000000000 Limit 0000000008ffc000
Node 1 bogus settings 40000000-8ffc000.
Using node hash shift of 63
Bootmem setup node 0 0000000000000000-0000000008ffc000
Memory for crash kernel (0x0 to 0x0) notwithin permissible range
disabling kdump
Nvidia board detected. Ignoring ACPI timer override.
Intel MultiProcessor Specification v1.4
    Virtual Wire compatibility mode.
OEM ID: HP       Product ID: Workstation  APIC at: 0xFEE00000
Processor #0 15:1 APIC version 16
Processor #1 15:1 APIC version 16
Processor #2 15:1 APIC version 16
Processor #3 15:1 APIC version 16
I/O APIC #8 Version 17 at 0xFEC00000.
I/O APIC #9 Version 17 at 0xEE400000.
Setting APIC routing to physical flat
Processors: 4
Nosave address range: 00000000000a0000 - 0000000001000000
Nosave address range: 00000000014ff000 - 000000000159f000
Allocating PCI resources starting at 10000000 (gap: 8ffc000:f7004000)
SMP: Allowing 8 CPUs, 4 hotplug CPUs
Built 1 zonelists.  Total pages: 32251
Kernel command line: ro root=/dev/VolGroup00/LogVol00 console=ttyS0,115200 
irqpoll maxcpus=1 reset_devices memmap=exactmap memmap=640K@0K
memmap=5116K@16384K memmap=125300K@22140K elfcorehdr=147440K
Misrouted IRQ fixup and polling support enabled
This may significantly impact system performance
Initializing CPU#0
PID hash table entries: 512 (order: 9, 4096 bytes)
irq 58, desc: ffffffff803b9d80, depth: 1, count: 0, unhandled: 0
->handle_irq():  ffffffff800b723d, handle_bad_irq+0x0/0x1f6
->chip(): ffffffff802f1b80, 0xffffffff802f1b80
->action(): 0000000000000000
  IRQ_DISABLED set
unexpected IRQ trap at vector 3a
Console: colour VGA+ 80x25
Dentry cache hash table entries: 16384 (order: 5, 131072 bytes)
Inode-cache hash table entries: 8192 (order: 4, 65536 bytes)
Checking aperture...
CPU 0: aperture @ 6fe8000000 size 32 MB
Aperture too small (32 MB)
No AGP bridge found
Memory: 118620k/147440k available (2457k kernel code, 12436k reserved, 1246k
data, 196k init)
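
The memmap= arguments on the capture kernel command line can be checked against the "user-defined physical RAM map" lines in the log above. The following is only a sketch for that cross-check: the helper names parse_size and memmap_ranges are made up for illustration, and the size parsing assumes plain decimal numbers with an optional K, M, or G suffix, as seen in this log.

```python
import re

# Multipliers for the suffixes that appear in the log (assumed binary units).
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert a size such as '640K' or '16M' to bytes."""
    m = re.fullmatch(r"(\d+)([KMG]?)", text.upper())
    if m is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]


def memmap_ranges(cmdline):
    """Return (start, end) byte ranges for every memmap=size@start argument.

    memmap=exactmap has no '@' part and is skipped by the pattern.
    """
    ranges = []
    for size, start in re.findall(r"\bmemmap=(\w+)@(\w+)", cmdline):
        begin = parse_size(start)
        ranges.append((begin, begin + parse_size(size)))
    return ranges


# Command line copied from the capture kernel log in this report.
CMDLINE = ("ro root=/dev/VolGroup00/LogVol00 console=ttyS0,115200 irqpoll "
           "maxcpus=1 reset_devices memmap=exactmap memmap=640K@0K "
           "memmap=5116K@16384K memmap=125300K@22140K elfcorehdr=147440K")

for begin, end in memmap_ranges(CMDLINE):
    print(f" user: {begin:016x} - {end:016x} (usable)")
```

The three ranges printed this way match the three "user:" lines in the log, and the last range ends at 147440K, the same value passed as elfcorehdr=. So the memory map handed to the capture kernel looks internally consistent, which suggests the hang lies elsewhere.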


Version-Release number of selected component (if applicable):
kernel-2.6.18-92.el5
kexec-tools-1.102pre-21.el5

How reproducible:
Always

Steps to Reproduce:
1. Configure kdump with crashkernel=128M@16M.
2. Trigger a crash with SysRq-C.
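
For reference, the 128M@16M reservation in step 1 can be turned into an address range. This is a minimal sketch that assumes the value uses the size@offset form with K/M/G suffixes; parse_crashkernel is a hypothetical helper, not part of kexec-tools.

```python
import re

# Multipliers for the size suffixes (assumed binary units).
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_crashkernel(value):
    """Split a 'size@offset' string into a (start, end) byte range."""
    m = re.fullmatch(r"(\d+)([KMG]?)@(\d+)([KMG]?)", value.upper())
    if m is None:
        raise ValueError(f"expected size@offset, got {value!r}")
    size = int(m.group(1)) * _UNITS[m.group(2)]
    offset = int(m.group(3)) * _UNITS[m.group(4)]
    return offset, offset + size


start, end = parse_crashkernel("128M@16M")
print(f"reserved region: {start:#x} - {end:#x}")
# prints: reserved region: 0x1000000 - 0x9000000
```

The usable memory reported by the capture kernel (0x1000000 up to 0x8ffc000) falls inside this region.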

Additional info:
# dmidecode 2.7
SMBIOS 2.5 present.
105 structures occupying 2954 bytes.
Table at 0x000EC750.

Handle 0x0001, DMI type 0, 24 bytes.
BIOS Information
	Vendor: Hewlett-Packard
	Version: 786D6 v02.02
	Release Date: 11/03/2006
	Address: 0xE0000
	Runtime Size: 128 kB
	ROM Size: 1024 kB
	Characteristics:
		PCI is supported
		PNP is supported
		BIOS is upgradeable
		BIOS shadowing is allowed
		Boot from CD is supported
		Selectable boot is supported
		EDD is supported
		Japanese floppy for Toshiba 1.2 MB is supported (int 13h)
		5.25"/360 KB floppy services are supported (int 13h)
		5.25"/1.2 MB floppy services are supported (int 13h)
		3.5"/720 KB floppy services are supported (int 13h)
		Print screen service is supported (int 5h)
		8042 keyboard services are supported (int 9h)
		Serial services are supported (int 14h)
		Printer services are supported (int 17h)
		ACPI is supported
		USB legacy is supported
		LS-120 boot is supported
		ATAPI Zip drive boot is supported
		BIOS boot specification is supported
		Function key-initiated network boot is supported
		Targeted content distribution is supported
	BIOS Revision: 2.2

Comment 1 Prarit Bhargava 2008-07-28 14:39:15 UTC
I've got an xw9400 in my cube at the moment and I'm going to test.  I do know
that this was working at one point, because I've crashed my system before ;)

P.

Comment 2 Prarit Bhargava 2008-07-29 13:40:55 UTC
tcamuso,

I had jburke update the BIOS on this system to the latest-and-greatest version
(3.02 IIRC).

This problem is still reproducible after the BIOS upgrade.  Of interest is that
when I crash the system (via 'echo c > /proc/sysrq-trigger') the network in my
cube dies.  AFAICT, the xw9400 is generating a large amount of network traffic,
or more likely, a large number of network interrupts during the system reboot.
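
One way to check the interrupt theory, on a system that stays up long enough to capture them, would be to compare two snapshots of /proc/interrupts and see which IRQ counters grew the most. The sketch below is not something that was run for this bug; irq_counts and busiest are hypothetical helper names, and the parsing assumes the usual layout of an IRQ label, a colon, one counter per CPU, then a description.

```python
def irq_counts(snapshot):
    """Map each IRQ label to its count summed across CPUs.

    `snapshot` is the text of /proc/interrupts. The header line with the
    CPU names has no colon and is skipped.
    """
    counts = {}
    for line in snapshot.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        total = 0
        for field in rest.split():
            if not field.isdigit():
                break  # reached the controller/device description
            total += int(field)
        counts[label.strip()] = total
    return counts


def busiest(before, after, top=3):
    """Return the `top` IRQs whose counts grew most between two snapshots."""
    old, new = irq_counts(before), irq_counts(after)
    deltas = {irq: new[irq] - old.get(irq, 0) for irq in new}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

Calling busiest() with the contents of two reads of /proc/interrupts taken a few seconds apart lists the fastest-growing interrupt sources first.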

jburke mentioned something about a "device reset" option.  I'm going to try that
and report back in this BZ.

P.

Comment 3 Prarit Bhargava 2008-10-15 13:55:51 UTC
The behavior of this failure changed in 2.6.18-98.el5.

This is now a dup of 456638.

P.

*** This bug has been marked as a duplicate of bug 456638 ***