Bug 719237 - Kdump failed with intel_iommu=on
Summary: Kdump failed with intel_iommu=on
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.7
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Dave Young
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: 852604 1196263
 
Reported: 2011-07-06 07:31 UTC by Chao Ye
Modified: 2019-07-11 07:33 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Important: Disable IOMMU on Intel Chipsets. A limitation in the current implementation of the Intel IOMMU driver can occasionally prevent the kdump service from capturing the core dump image. To use kdump on Intel architectures reliably, it is advised that the IOMMU support is disabled.
Clone Of:
Environment:
Last Closed: 2012-08-29 04:38:01 UTC
Target Upstream Version:
Embargoed:
cbuissar: needinfo+



Description Chao Ye 2011-07-06 07:31:57 UTC
Description of problem:
The second (kdump) kernel hangs when booted with intel_iommu=on
========================================================
[root@hp-dl380g7-01 ~]# dmesg | grep IOMMU
Intel-IOMMU: enabled
IOMMU e7ffe000: ver 1:0 cap c90780106f0462 ecap f0207e
IOMMU 0xe7ffe000: using Queued invalidation
IOMMU: Setting RMRR:
IOMMU: Setting identity map for device 0000:05:00.0 [0xdf63e000 - 0xdf640000]
IOMMU: Setting identity map for device 0000:02:00.0 [0xdf63e000 - 0xdf640000]
IOMMU: Setting identity map for device 0000:02:00.2 [0xdf63e000 - 0xdf640000]
IOMMU: Setting identity map for device 0000:00:1d.0 [0xdf7f5000 - 0xdf7fb000]
IOMMU: Setting identity map for device 0000:00:1d.1 [0xdf7f5000 - 0xdf7fb000]
IOMMU: Setting identity map for device 0000:00:1d.2 [0xdf7f5000 - 0xdf7fb000]
IOMMU: Setting identity map for device 0000:00:1d.3 [0xdf7f5000 - 0xdf7fb000]
IOMMU: Setting identity map for device 0000:02:00.0 [0xdf7f5000 - 0xdf7fb000]
IOMMU: Setting identity map for device 0000:02:00.2 [0xdf7f5000 - 0xdf7fb000]
IOMMU: Setting identity map for device 0000:02:00.4 [0xdf7f5000 - 0xdf7fb000]
IOMMU: Setting identity map for device 0000:00:1d.7 [0xdf7fc000 - 0xdf7fe000]
IOMMU: Prepare 0-16MiB unity mapping for LPC
IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0x1000000]
[root@hp-dl380g7-01 ~]# uname -a
Linux hp-dl380g7-01.lab.bos.redhat.com 2.6.18-273.el5 #1 SMP Mon Jul 4 14:12:24 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@hp-dl380g7-01 ~]# service kdump restart
Stopping kdump:[OK]
No kdump initial ramdisk found.[WARNING]
Rebuilding /boot/initrd-2.6.18-273.el5kdump.img
Starting kdump:[OK]
[root@hp-dl380g7-01 ~]# echo c > /proc/sysrq-trigger 
----------------------------------------------------------------------------------------------
SysRq : Trigger a crashdump
Linux version 2.6.18-273.el5 (mockbuild.bos.redhat.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-51)) #1 SMP Mon Jul 4 14:12:24 EDT 2011
Command line: ro root=/dev/VolGroup00/LogVol00 console=ttyS1,115200n81 intel_iommu=on irqpoll maxcpus=1 reset_devices  memmap=exactmap memmap=573K@64K memmap=6096K@32768K memmap=124387K@39437K elfcorehdr=163824K memmap=3K$637K memmap=52K#3659964K memmap=75532K$3660020K memmap=2112K$4173824K memmap=8192K$4186112K
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000010000 - 000000000009f400 (usable)
 BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000df62f000 (usable)
 BIOS-e820: 00000000df62f000 - 00000000df63c000 (ACPI data)
 BIOS-e820: 00000000df63c000 - 00000000df63d000 (usable)
 BIOS-e820: 00000000df63d000 - 00000000e4000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fee10000 (reserved)
 BIOS-e820: 00000000ff800000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 000000031ffff000 (usable)
user-defined physical RAM map:
 user: 0000000000010000 - 000000000009f400 (usable)
 user: 000000000009f400 - 00000000000a0000 (reserved)
 user: 0000000002000000 - 00000000025f4000 (usable)
 user: 0000000002683400 - 0000000009ffc000 (usable)
 user: 00000000df62f000 - 00000000df63c000 (ACPI data)
 user: 00000000df63d000 - 00000000e4000000 (reserved)
 user: 00000000fec00000 - 00000000fee10000 (reserved)
 user: 00000000ff800000 - 0000000100000000 (reserved)
DMI 2.6 present.
SRAT: PXM 0 -> APIC 0 -> Node 0
SRAT: PXM 0 -> APIC 1 -> Node 0
SRAT: PXM 0 -> APIC 2 -> Node 0
SRAT: PXM 0 -> APIC 3 -> Node 0
SRAT: PXM 0 -> APIC 18 -> Node 0
SRAT: PXM 0 -> APIC 19 -> Node 0
SRAT: PXM 0 -> APIC 20 -> Node 0
SRAT: PXM 0 -> APIC 21 -> Node 0
SRAT: PXM 1 -> APIC 32 -> Node 1
SRAT: PXM 1 -> APIC 33 -> Node 1
SRAT: PXM 1 -> APIC 34 -> Node 1
SRAT: PXM 1 -> APIC 35 -> Node 1
SRAT: PXM 1 -> APIC 50 -> Node 1
SRAT: PXM 1 -> APIC 51 -> Node 1
SRAT: PXM 1 -> APIC 52 -> Node 1
SRAT: PXM 1 -> APIC 53 -> Node 1
SRAT: Node 0 PXM 0 0-e0000000
SRAT: Node 0 PXM 0 0-1a0000000
SRAT: Node 1 PXM 1 1a0000000-320000000
Bootmem setup node 0 0000000000000000-0000000009ffc000
Memory for crash kernel (0x0 to 0x0) notwithin permissible range
disabling kdump
ACPI: PM-Timer IO Port: 0x908
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 6:12 APIC version 21
ACPI: LAPIC (acpi_id[0x10] lapic_id[0x20] enabled)
Processor #32 6:12 APIC version 21
ACPI: LAPIC (acpi_id[0x08] lapic_id[0x10] disabled)
ACPI: LAPIC (acpi_id[0x18] lapic_id[0x30] disabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] disabled)
ACPI: LAPIC (acpi_id[0x14] lapic_id[0x24] disabled)
ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x14] enabled)
Processor #20 6:12 APIC version 21
ACPI: LAPIC (acpi_id[0x1c] lapic_id[0x34] enabled)
Processor #52 6:12 APIC version 21
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
Processor #2 6:12 APIC version 21
ACPI: LAPIC (acpi_id[0x12] lapic_id[0x22] enabled)
Processor #34 6:12 APIC version 21
ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x12] enabled)
Processor #18 6:12 APIC version 21
ACPI: LAPIC (acpi_id[0x1a] lapic_id[0x32] enabled)
Processor #50 6:12 APIC version 21
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] disabled)
ACPI: LAPIC (acpi_id[0x16] lapic_id[0x26] disabled)
ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x16] disabled)
ACPI: LAPIC (acpi_id[0x1e] lapic_id[0x36] disabled)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1 6:12 APIC version 21
ACPI: LAPIC (acpi_id[0x11] lapic_id[0x21] enabled)
Processor #33 6:12 APIC version 21
ACPI: LAPIC (acpi_id[0x09] lapic_id[0x11] disabled)
ACPI: LAPIC (acpi_id[0x19] lapic_id[0x31] disabled)
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] disabled)
ACPI: LAPIC (acpi_id[0x15] lapic_id[0x25] disabled)
ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x15] enabled)
Processor #21 6:12 APIC version 21
ACPI: LAPIC (acpi_id[0x1d] lapic_id[0x35] enabled)
Processor #53 6:12 APIC version 21
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
Processor #3 6:12 APIC version 21
ACPI: LAPIC (acpi_id[0x13] lapic_id[0x23] enabled)
Processor #35 6:12 APIC version 21
ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x13] enabled)
Processor #19 6:12 APIC version 21
ACPI: LAPIC (acpi_id[0x1b] lapic_id[0x33] enabled)
Processor #51 6:12 APIC version 21
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] disabled)
ACPI: LAPIC (acpi_id[0x17] lapic_id[0x27] disabled)
ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x17] disabled)
ACPI: LAPIC (acpi_id[0x1f] lapic_id[0x37] disabled)
ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
ACPI: IOAPIC (id[0x00] address[0xfec80000] gsi_base[24])
IOAPIC[1]: apic_id 0, version 32, address 0xfec80000, GSI 24-47
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Setting APIC routing to physical flat
ACPI: HPET id: 0x8086a201 base: 0xfed00000
Using ACPI (MADT) for SMP configuration information
Nosave address range: 000000000009f000 - 00000000000a0000
Nosave address range: 00000000000a0000 - 0000000002000000
Nosave address range: 00000000025f4000 - 0000000002684000
Allocating PCI resources starting at 10000000 (gap: 9ffc000:d5633000)
SMP: Allowing 32 CPUs, 16 hotplug CPUs
Built 1 zonelists.  Total pages: 32195
Kernel command line: ro root=/dev/VolGroup00/LogVol00 console=ttyS1,115200n81 intel_iommu=on irqpoll maxcpus=1 reset_devices  memmap=exactmap memmap=573K@64K memmap=6096K@32768K memmap=124387K@39437K elfcorehdr=163824K memmap=3K$637K memmap=52K#3659964K memmap=75532K$3660020K memmap=2112K$4173824K memmap=8192K$4186112K
Intel-IOMMU: enabled
Misrouted IRQ fixup and polling support enabled
This may significantly impact system performance
Initializing CPU#0
PID hash table entries: 512 (order: 9, 4096 bytes)
Console: colour VGA+ 80x25
Dentry cache hash table entries: 16384 (order: 5, 131072 bytes)
Inode-cache hash table entries: 8192 (order: 4, 65536 bytes)
Checking aperture...
Memory: 115224k/163824k available (2603k kernel code, 15828k reserved, 1660k data, 224k init)
Calibrating delay loop (skipped), value calculated using timer frequency.. 4798.78 BogoMIPS (lpj=2399392)
Security Framework v1.0.0 initialized
SELinux:  Initializing.
selinux_register_security:  Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 256
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 256K
CPU: L3 cache: 12288K
CPU 0/34 -> Node 0
using mwait in idle threads.
CPU: Physical Processor ID: 1
CPU: Processor Core ID: 10
MCE: Machine Check Exception Reporting is disabled.
SMP alternatives: switching to UP code
ACPI: Core revision 20060707
Using local APIC timer interrupts.
Detected 8.331 MHz APIC timer.
Brought up 1 CPUs
NMI watchdog testing PASSED.
time.c: Using 14.318180 MHz WALL HPET GTOD HPET/TSC timer.
time.c: Detected 2399.392 MHz processor.
checking if image is initramfs... it is
Freeing initrd memory: 4715k freed
NET: Registered protocol family 16
ACPI: bus type pci registered
Warning: pci_mmcfg_init marking 256MB space uncacheable.
MCFG table requires 64MB uncacheable only. Try booting with acpi_mcfg_max_pci_bus_num=on
PCI: Using MMCONFIG at e0000000
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: No dock devices found.
ACPI: PCI Root Bridge [PCI0] (0000)
PCI: Transparent bridge - 0000:00:1e.0
ACPI: PCI Interrupt Link [LNKA] (IRQs 5 7 *10 11), disabled.
ACPI: PCI Interrupt Link [LNKB] (IRQs 5 7 10 *11), disabled.
ACPI: PCI Interrupt Link [LNKC] (IRQs 5 *7 10 11), disabled.
ACPI: PCI Interrupt Link [LNKD] (IRQs *5 7 10 11), disabled.
ACPI: PCI Interrupt Link [LNKE] (IRQs *5 7 10 11), disabled.
ACPI: PCI Interrupt Link [LNKF] (IRQs 5 7 10 11) *0, disabled.
ACPI: PCI Interrupt Link [LNKG] (IRQs 5 7 *10 11), disabled.
ACPI: PCI Interrupt Link [LNKH] (IRQs 5 *7 10 11), disabled.
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: PnP ACPI: found 11 devices
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
NetLabel: Initializing
NetLabel:  domain hash size = 128
NetLabel:  protocols = UNLABELED CIPSOv4
NetLabel:  unlabeled traffic allowed by default
hpet0: at MMIO 0xfed00000 (virtual 0xffffffffff5fe000), IRQs 2, 8, 0, 0
hpet0: 4 64-bit timers, 14318180 Hz
DMAR:Host address width 39
DMAR:DRHD base: 0x000000e7ffe000 flags: 0x1
IOMMU e7ffe000: ver 1:0 cap c90780106f0462 ecap f0207e
DMAR:RMRR base: 0x000000df7fc000 end: 0x000000df7fdfff
DMAR:RMRR base: 0x000000df63e000 end: 0x000000df63ffffff
DMAR:ATSR flags: 0x0
IOMMU
<====================================================Hang

Version-Release number of selected component (if applicable):
kernel-2.6.18-273.el5

How reproducible:
100% on hp-dl380g7-01.lab.bos.redhat.com

Steps to Reproduce:
1. Enable IOMMU (boot with intel_iommu=on)
2. Start the kdump service (service kdump restart)
3. Trigger a crash (echo c > /proc/sysrq-trigger)
  
Actual results:
The second kernel hangs.

Expected results:
The vmcore is captured.

Additional info:
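For reference, the memmap= entries on the capture kernel's command line can be decoded and compared with the "user-defined physical RAM map" printed in the log above. The following is a minimal Python sketch, not part of kexec-tools or the kernel. The name decode_memmap is made up for illustration, and the sketch assumes every size and offset uses the K suffix, as they do in this log.

```python
import re

# Separator -> region type, following the x86 memmap= kernel parameter:
# '@' marks usable RAM, '#' marks ACPI data, '$' marks reserved memory.
KINDS = {"@": "usable", "#": "ACPI data", "$": "reserved"}


def decode_memmap(cmdline):
    """Return (start, end, kind) for each memmap=<size>K<sep><start>K entry.

    Hypothetical helper: it only understands the K suffix used in this log,
    and it skips memmap=exactmap because that token carries no region.
    """
    regions = []
    for size_k, sep, start_k in re.findall(r"memmap=(\d+)K([@#$])(\d+)K", cmdline):
        start = int(start_k) * 1024
        end = start + int(size_k) * 1024
        regions.append((start, end, KINDS[sep]))
    return regions


if __name__ == "__main__":
    # The three usable regions taken from the command line in this report.
    cmdline = ("memmap=exactmap memmap=573K@64K memmap=6096K@32768K "
               "memmap=124387K@39437K")
    for start, end, kind in decode_memmap(cmdline):
        print("%016x - %016x (%s)" % (start, end, kind))
```

For the three usable entries this prints the ranges 0000000000010000 - 000000000009f400, 0000000002000000 - 00000000025f4000 and 0000000002683400 - 0000000009ffc000, which agree with the "user:" lines in the log.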

Comment 2 Dave Young 2011-10-20 06:30:21 UTC
Please try: "intel_iommu=on iommu=pt"

It works for me on ibm-x3550m3-01.rhts.eng.nay.redhat.com, although that machine does not hang at the same point: it hangs after some DMAR faults.
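To confirm that a machine was actually booted with the suggested options before re-testing, the kernel command line can be checked. The following is a minimal Python sketch under stated assumptions: the helper names kernel_params and uses_iommu_passthrough are hypothetical, and the input would normally be the contents of /proc/cmdline.

```python
def kernel_params(cmdline):
    """Split a kernel command line into a dict of key -> value.

    Bare flags such as 'ro' map to None. If a key repeats (for example
    memmap=), the last occurrence wins, which is fine for this check.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def uses_iommu_passthrough(cmdline):
    """Return True if both intel_iommu=on and iommu=pt are present."""
    params = kernel_params(cmdline)
    return params.get("intel_iommu") == "on" and params.get("iommu") == "pt"


if __name__ == "__main__":
    # Sample string for illustration; on a live system read /proc/cmdline.
    sample = "ro root=/dev/VolGroup00/LogVol00 intel_iommu=on iommu=pt"
    print(uses_iommu_passthrough(sample))
```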

Comment 6 Dave Young 2012-08-29 04:48:25 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Important: Disable IOMMU on Intel Chipsets
A limitation in the current implementation of the Intel IOMMU driver can occasionally prevent the kdump service from capturing the core dump image. To use kdump on Intel architectures reliably, it is advised that the IOMMU support is disabled.
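
As an illustration of the advice above, the option that enabled the IOMMU in this report can be stripped from a kernel command line before it is written back to the boot loader configuration. This is a minimal Python sketch, not an official procedure. The function name without_intel_iommu is hypothetical, and the sketch assumes the IOMMU is only enabled through an explicit intel_iommu= option, as in this report.

```python
def without_intel_iommu(cmdline):
    """Return cmdline with every intel_iommu=... token removed.

    Assumption: the IOMMU is enabled only by an explicit intel_iommu=
    option. Runs of whitespace are collapsed to single spaces.
    """
    kept = [tok for tok in cmdline.split() if not tok.startswith("intel_iommu=")]
    return " ".join(kept)


if __name__ == "__main__":
    before = "ro root=/dev/VolGroup00/LogVol00 console=ttyS1,115200n81 intel_iommu=on"
    print(without_intel_iommu(before))
```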

