Bug 441641 - [5.2][kdump] capture kernel can hang on IBM eServer x3105
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Ed Pollard
QA Contact: Martin Jenner
Reported: 2008-04-09 06:07 EDT by CAI Qian
Modified: 2013-08-05 20:04 EDT
Doc Type: Bug Fix
Last Closed: 2008-10-22 06:36:34 EDT
Attachments
sosreport (1.43 MB, application/octet-stream)
2008-04-09 06:07 EDT, CAI Qian

Description CAI Qian 2008-04-09 06:07:29 EDT
Description of problem:
kdump intermittently fails on the IBM eServer x3105: the second (capture)
kernel hangs during boot. RHEL5U1 has the same problem.

SysRq : Trigger a crashdump
Linux version 2.6.18-88.el5 (brewbuilder@hs20-bc2-2.build.redhat.com) (gcc
version 4.1.2 20071124 (Red Hat 4.1.2-41)) #1 SMP Tue Apr 1 19:01:18 EDT 2008
Command line: ro root=LABEL=/ console=ttyS0,115200  irqpoll maxcpus=1
reset_devices memmap=exactmap memmap=640K@0K memmap=5116K@16384K
memmap=125300K@22140K elfcorehdr=147440K memmap=24K#523328K memmap=424K#523352K
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000100 - 000000000009dc00 (usable)
 BIOS-e820: 000000000009dc00 - 00000000000a0000 (reserved)
 BIOS-e820: 0000000000100000 - 000000001ff10000 (usable)
 BIOS-e820: 000000001ff10000 - 000000001ff16000 (ACPI data)
 BIOS-e820: 000000001ff16000 - 000000001ff80000 (ACPI NVS)
 BIOS-e820: 000000001ff80000 - 0000000020000000 (reserved)
 BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
user-defined physical RAM map:
 user: 0000000000000000 - 00000000000a0000 (usable)
 user: 0000000001000000 - 00000000014ff000 (usable)
 user: 000000000159f000 - 0000000008ffc000 (usable)
 user: 000000001ff10000 - 000000001ff16000 (ACPI data)
 user: 000000001ff16000 - 000000001ff80000 (ACPI data)
DMI present.
SRAT: PXM 0 -> APIC 0 -> Node 0
SRAT: PXM 0 -> APIC 1 -> Node 0
SRAT: Node 0 PXM 0 0-a0000
SRAT: Node 0 PXM 0 0-20000000
Bootmem setup node 0 0000000000000000-0000000008ffc000
Memory for crash kernel (0x0 to 0x0) notwithin permissible range
disabling kdump
Nvidia board detected. Ignoring ACPI timer override.
ACPI: PM-Timer IO Port: 0x8008
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:3 APIC version 16
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1 15:3 APIC version 16
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
Setting APIC routing to physical flat
Using ACPI (MADT) for SMP configuration information
Nosave address range: 00000000000a0000 - 0000000001000000
Nosave address range: 00000000014ff000 - 000000000159f000
Allocating PCI resources starting at 20000000 (gap: 1ff80000:e0080000)
SMP: Allowing 2 CPUs, 0 hotplug CPUs
Built 1 zonelists.  Total pages: 32252
Kernel command line: ro root=LABEL=/ console=ttyS0,115200  irqpoll maxcpus=1
reset_devices memmap=exactmap memmap=640K@0K memmap=5116K@16384K
memmap=125300K@22140K elfcorehdr=147440K memmap=24K#523328K memmap=424K#523352K
Misrouted IRQ fixup and polling support enabled
This may significantly impact system performance
Initializing CPU#0
PID hash table entries: 512 (order: 9, 4096 bytes)
irq 26, desc: ffffffff803b7d80, depth: 1, count: 0, unhandled: 0
->handle_irq():  ffffffff800b71df, handle_bad_irq+0x0/0x1f6
->chip(): ffffffff802f1b80, 0xffffffff802f1b80
->action(): 0000000000000000
  IRQ_DISABLED set
unexpected IRQ trap at vector 1a
Console: colour VGA+ 80x25
irq 26, desc: ffffffff803b7d80, depth: 1, count: 0, unhandled: 0
->handle_irq():  ffffffff800b71df, handle_bad_irq+0x0/0x1f6
->chip(): ffffffff802f1b80, 0xffffffff802f1b80
->action(): 0000000000000000
  IRQ_DISABLED set
   IRQ_PENDING set
unexpected IRQ trap at vector 1a
Dentry cache hash table entries: 16384 (order: 5, 131072 bytes)
Inode-cache hash table entries: 8192 (order: 4, 65536 bytes)
Checking aperture...
CPU 0: aperture @ 585e000000 size 32 MB
Aperture too small (32 MB)
No AGP bridge found
Memory: 119612k/147440k available (2456k kernel code, 11444k reserved, 1246k
data, 196k init)
Calibrating delay using timer specific routine.. 1999.26 BogoMIPS (lpj=999634)
Security Framework v1.0.0 initialized
SELinux:  Initializing.
selinux_register_security:  Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 256
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 0/1 -> Node 0
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
SMP alternatives: switching to UP code
ACPI: Core revision 20060707
Using local APIC timer interrupts.
result 12472918
Detected 12.472 MHz APIC timer.
Brought up 1 CPUs
testing NMI watchdog ... OK.
Disabling vsyscall due to use of PM timer
time.c: Using 3.579545 MHz WALL PM GTOD PM timer.
time.c: Detected 997.832 MHz processor.
checking if image is initramfs... it is
Freeing initrd memory: 2298k freed
irq 26, desc: ffffffff803b7d80, depth: 1, count: 0, unhandled: 0
->handle_irq():  ffffffff800b71df, handle_bad_irq+0x0/0x1f6
->chip(): ffffffff802f1b80, 0xffffffff802f1b80
->action(): 0000000000000000
  IRQ_DISABLED set
   IRQ_PENDING set
unexpected IRQ trap at vector 1a
NET: Registered protocol family 16
No dock devices found.
ACPI: bus type pci registered
PCI: BIOS Bug: MCFG area at e0000000 is not E820-reserved
PCI: Not using MMCONFIG.
PCI: Using configuration type 1
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Transparent bridge - 0000:00:09.0
ACPI: PCI Interrupt Link [LNK1] (IRQs 16 17 18 19) *0, disabled.
ACPI: PCI Interrupt Link [LNK2] (IRQs 16 17 18 *19)
ACPI: PCI Interrupt Link [LNK3] (IRQs 16 17 18 19) *0, disabled.
ACPI: PCI Interrupt Link [LNK4] (IRQs 16 17 18 19) *0, disabled.
ACPI: PCI Interrupt Link [LNK5] (IRQs 16 17 18 19) *0, disabled.
ACPI: PCI Interrupt Link [LSMB] (IRQs 20 21 22 23) *0, disabled.
ACPI: PCI Interrupt Link [LUS0] (IRQs 20 21 *22 23)
ACPI: PCI Interrupt Link [LUS2] (IRQs 20 21 22 *23)
ACPI: PCI Interrupt Link [LMAC] (IRQs 20 21 22 23) *0, disabled.
ACPI: PCI Interrupt Link [LACI] (IRQs 20 21 22 23) *0, disabled.
ACPI: PCI Interrupt Link [LMCI] (IRQs 20 21 22 23) *0, disabled.
ACPI: PCI Interrupt Link [LPID] (IRQs 20 21 22 23) *0, disabled.
ACPI: PCI Interrupt Link [LTID] (IRQs 20 *21 22 23)
ACPI: PCI Interrupt Link [LSI1] (IRQs *20 21 22 23), disabled.
ACPI: PCI Interrupt Link [APCP] (IRQs 20 21 22 23) *0, disabled.
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
irq 26, desc: ffffffff803b7d80, depth: 1, count: 0, unhandled: 0
->handle_irq():  ffffffff800b71df, handle_bad_irq+0x0/0x1f6
->chip(): ffffffff802f1b80, 0xffffffff802f1b80
->action(): 0000000000000000
  IRQ_DISABLED set
   IRQ_PENDING set
unexpected IRQ trap at vector 1a
pnp: PnP ACPI: found 12 devices
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
NetLabel: Initializing
NetLabel:  domain hash size = 128
NetLabel:  protocols = UNLABELED CIPSOv4
NetLabel:  unlabeled traffic allowed by default
PCI-DMA: Disabling IOMMU.
pnp: 00:03: ioport range 0x8000-0x807f could not be reserved
pnp: 00:03: ioport range 0x8080-0x80ff has been reserved
pnp: 00:03: ioport range 0x8400-0x847f has been reserved
pnp: 00:03: ioport range 0x8480-0x84ff has been reserved
pnp: 00:03: ioport range 0x8800-0x887f has been reserved
pnp: 00:03: ioport range 0x8880-0x88ff has been reserved
pnp: 00:03: ioport range 0x1440-0x147f has been reserved
pnp: 00:03: ioport range 0x1400-0x143f has been reserved
PCI: Bridge: 0000:00:09.0
  IO window: 2000-2fff
  MEM window: d8000000-d80fffff
  PREFETCH window: d0000000-d7ffffff
PCI: Bridge: 0000:00:0b.0
  IO window: disabled.
  MEM window: d8100000-d81fffff
  PREFETCH window: 20000000-200fffff
PCI: Bridge: 0000:00:0d.0
  IO window: disabled.
  MEM window: disabled.
  PREFETCH window: disabled.
PCI: Bridge: 0000:00:0e.0
  IO window: disabled.
  MEM window: disabled.
  PREFETCH window: disabled.
NET: Registered protocol family 2

After this line, the console shows no further output.
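
The memmap= options on the capture kernel's command line can be cross-checked against the "user-defined physical RAM map" printed in the log above. The following is a minimal sketch, not part of the original report: the helper names to_bytes and parse_memmap are made up for illustration, and the parser is deliberately simplified (decimal sizes with an optional K/M/G suffix only, whereas the kernel's own parser accepts more forms).

```python
import re

# Capture kernel command line, copied from the console log in this report.
CMDLINE = (
    "ro root=LABEL=/ console=ttyS0,115200  irqpoll maxcpus=1 "
    "reset_devices memmap=exactmap memmap=640K@0K memmap=5116K@16384K "
    "memmap=125300K@22140K elfcorehdr=147440K memmap=24K#523328K "
    "memmap=424K#523352K"
)

# Size suffixes handled by this simplified sketch.
UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

# Separator in memmap=size<sep>start selects the region type.
KINDS = {"@": "usable", "#": "ACPI data", "$": "reserved"}


def to_bytes(text):
    """Convert a decimal size such as '640K' to bytes (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text)
    if match is None:
        raise ValueError("unsupported size: %r" % text)
    return int(match.group(1)) * UNITS[match.group(2)]


def parse_memmap(cmdline):
    """Return (start, end, kind) for each memmap=size<sep>start option."""
    ranges = []
    for token in cmdline.split():
        match = re.fullmatch(r"memmap=(\w+)([@#$])(\w+)", token)
        if match is None:
            continue  # skips memmap=exactmap and all unrelated options
        size, sep, start = match.groups()
        begin = to_bytes(start)
        ranges.append((begin, begin + to_bytes(size), KINDS[sep]))
    return ranges


if __name__ == "__main__":
    for begin, end, kind in parse_memmap(CMDLINE):
        print(" user: %016x - %016x (%s)" % (begin, end, kind))
    print("elfcorehdr at 0x%x" % to_bytes("147440K"))
```

The five ranges this produces agree with the five "user:" lines in the log, and elfcorehdr=147440K works out to 0x8ffc000, the end of the last usable range. The memory layout passed to the capture kernel is therefore self-consistent, and the "(0x0 to 0x0)" crash kernel message is expected in a second kernel booted without a crashkernel= option.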

Version-Release number of selected component (if applicable):
RHEL5.2-Server-20080402.0 (x86_64)
kernel-2.6.18-88.el5
kexec-tools-1.102pre-20.el5

How reproducible:
The failure rate is fairly high on ibm-alishan.rhts.boston.redhat.com.
Comment 1 CAI Qian 2008-04-09 06:07:29 EDT
Created attachment 301769
sosreport
Comment 2 Ed Pollard 2008-04-24 10:36:47 EDT
ibm-alishan has a pre-production CPU in it and likely old firmware. I am in the
process of getting an updated CPU for it and, at the same time, will make sure
the firmware is updated to the most recent level. Has this been seen on any
other IBM AMD systems that you are aware of?
Comment 3 CAI Qian 2008-04-24 19:06:34 EDT
There is another bug which may be related:

Bug 440399: [5.2][kdump] capture kernel reset for IBM eServer x3455
Comment 4 CAI Qian 2008-10-22 06:36:34 EDT
I'll close this out per comment #2. I have not seen the same problem on any other machine.
