Bug 611407
Summary: kvm guest unable to kdump without noapic
Product: Red Hat Enterprise Linux 5 | Reporter: Qian Cai <qcai>
Component: kernel | Assignee: Prarit Bhargava <prarit>
Status: CLOSED ERRATA | QA Contact: Han Pingtian <phan>
Severity: high | Docs Contact:
Priority: high
Version: 5.5 | CC: allen.payne, amwang, bcao, bgollahe, bsarathy, cjt, clalance, cww, dhoward, esammons, gleb, kkii, lersek, ltroan, mdeng, mfuruta, mjenner, mstowe, msvoboda, nhorman, redhat-bz, sassmann, sghosh, tgraf, tumeya
Target Milestone: rc | Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: | Doc Type: Bug Fix
Doc Text:
When a device triggered an interrupt request (IRQ) during kdump kernel startup, the kernel tried to process the IRQ even though the respective interrupt handler was not loaded. As a consequence, the kernel could not finish its startup and the system became unresponsive. This update allows the kernel to disable the IRQ line if the kernel receives a large number of IRQs and there is no interrupt handler loaded in the kernel. The kernel now starts as expected and kdump can successfully create a core dump.
Story Points: ---
Clone Of: | Environment:
Last Closed: 2011-07-21 10:13:19 UTC | Type: ---
Regression: --- | Mount Type: ---
Documentation: --- | CRM:
Verified Versions: | Category: ---
oVirt Team: --- | RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- | Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 670712, 673717, 691805, 697486, 702001, 761539
Attachments:
Created attachment 429476 [details]
guest xml
i386 guest is fine.

Triage assignment. If you feel this bug doesn't belong to you, or that it cannot be handled in a timely fashion, please contact me for re-assignment.

Amerigo, I think you need to talk to Chris Lalancette or one of the other kvm guys. If this needs to be done unilaterally on KVM, I think we have a problem in the apic emulation code in kvm and we likely need to fix that, rather than just working around it.

Yeah, I'm not sure that we want to unilaterally pass noapic to the kdump kernel without a better understanding of what is going on. It does seem like this is a KVM emulation bug, though I'm not sure of that yet. Given where it is hanging, it sort of seems like particular interrupts aren't being delivered (I know for sure that *some* interrupts are being delivered, since we've successfully made it past timer initialization/calibration).
I would suggest getting a core dump of the guest when it is hung up like this. Boot up the guest, crash it using "echo c > /proc/sysrq-trigger" inside the guest, and then once it hangs, go to the host and do "virsh dump <guest> <file>". That should give you a corefile, and I *believe* that crash can be persuaded to debug kdump kernels.
Chris Lalancette

I've just received another support ticket which appears to be this same issue. Interesting to note however is that this customer determined the problem only occurs when using the virtio nic. When they use e1000 they don't need to add noapic.

Hi,
> I would suggest getting a core dump of the guest when it is hung up like this.
> Boot up the guest, crash it using "echo c > /proc/sysrq-trigger" inside the
> guest, and then once it hangs, go to the host and do "virsh dump <guest>
> <file>". That should give you a corefile, and I *believe* that crash can be
> persuaded to debug kdump kernels.
here's the backtrace from the second kernel. The stack trace isn't precise
though, as I think "virsh dump" doesn't flush the latest process information
to memory.
crash> bt -a
PID: 1 TASK: ffff8100019e87a0 CPU: 0 COMMAND: "swapper"
#0 [ffff8100019edbb0] schedule at ffffffff80063f96
#1 [ffff8100019edbb8] thread_return at ffffffff80064054
#2 [ffff8100019edbc8] thread_return at ffffffff80064054
#3 [ffff8100019edbe0] zone_statistics at ffffffff800cd378
#4 [ffff8100019edc68] __alloc_pages at ffffffff8000f2ff
#5 [ffff8100019edcd8] cache_grow at ffffffff80017b70
#6 [ffff8100019edd28] cache_alloc_refill at ffffffff8005c6c9
#7 [ffff8100019edd98] probe_irq_on at ffffffff800bd608
#8 [ffff8100019edde8] serial8250_config_port at ffffffff801c3ccd
#9 [ffff8100019ede38] uart_add_one_port at ffffffff801c1706
#10 [ffff8100019edea8] serial8250_init at ffffffff8042600f
#11 [ffff8100019edec8] init at ffffffff80407a5c
#12 [ffff8100019edf48] kernel_thread at ffffffff8005efb1
We're also seeing this same issue when testing kdump on KVM based guests. Is any progress being made on finding a cause/solution?

Has there been any progress on this issue, or are all the updates just hidden from view?

(In reply to comment #32)
> Has there been any progress on this issue, or are the all the updates just
> hidden from view?
I can't reproduce it.

What more information do you need in order to help reproduce this issue?

Well, you can describe what you do to reproduce it. But I already have the description from someone else, and I tried those steps many, many times and it just works for me.

We have a RHEL-6.0 KVM host using NFS based storage for the guests. The guests are running RHEL-5.5 using virtio drivers for both network & storage. The guests' kdump is configured thus:
ext3 LABEL=/var
path ./crash
core_collector makedumpfile -c -d31
default reboot
Without the noapic boot option, kdump, initiated by 'echo c > /proc/sysrq-trigger', hangs at the line:
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
If the noapic option is added to KDUMP_COMMANDLINE_APPEND in /etc/sysconfig/kdump and the kdump image is regenerated, then the guest successfully dumps a core. The kernel command line of the guest is:
ro root=/dev/volgrp01/root panic=20 log_buf_len=131072 crashkernel=128M@16M
Any other information that would be useful?

My kdump config is different, but I doubt this matters much. I will try to reproduce with your config too. Can you reproduce with ide & e1000?

I get the hang, at the same place, when using an ide disk & e1000 nic.

OK, I was able to reproduce at least once now. The important thing seems to be that network traffic should happen at the time of the crash (actually I think any device activity that generates a lot of interrupts will have the same effect). Do you see the hang if you run without networking (-net none)?

So it looks like the problem is in the rhel5 kernel, not KVM.
I think this problem should be reproducible on real HW too. The problem happens when some device (let's assume it is a NIC here) signals a level-triggered interrupt while kexec is in progress. When the new kernel starts to run, all interrupts are masked in the IOAPIC, so the kernel is able to run until the point where it initializes the serial driver. During serial initialization the kernel probes interrupts to see which IRQ is used by the serial device. At this point it unmasks all interrupts in the IOAPIC and immediately receives the interrupt raised by the NIC. Since the NIC driver is not yet loaded, the interrupt reason can't be cleared in the NIC, so after the interrupt is acknowledged to the APIC it is delivered again immediately, and this loop continues. The kexeced kernel is stuck at this point. Newer kernels mask the interrupt in the IOAPIC before acking it in the APIC if there is no interrupt handler registered. This way the interrupt loop described above does not happen.

Moving the bug to the kernel component.

*** Bug 701931 has been marked as a duplicate of this bug. ***

Created attachment 500670 [details]
kdump.patch
Here's a patch that should mend the hang. It works by disabling the IRQ line after too many interrupts if there's no handler installed. Thus the IRQ line will be brought down, but that keeps the kernel going. You'll notice a message like the following if that happens:
irq 10: nobody cared (try booting with the "irqpoll" option)
Call Trace:
[...]
handlers:
Disabling IRQ #10
Testing feedback welcome
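
To make the behaviour described above easier to follow, here is a minimal toy model in Python. It is not the kernel patch and not kernel code: the IrqLine class, the deliver() function, and the threshold value of 100 are all hypothetical names and numbers chosen for illustration. It only shows the idea from the thread: a level-triggered line with no handler keeps re-delivering, and counting unhandled interrupts lets the line be disabled so boot can continue.

```python
from dataclasses import dataclass, field


@dataclass
class IrqLine:
    """Toy model of one level-triggered IRQ line (illustrative, not kernel code)."""
    asserted: bool = True       # the device keeps the line raised
    has_handler: bool = False   # no driver has registered a handler yet
    disabled: bool = False      # set once the line is shut off
    unhandled: int = 0          # count of deliveries nobody serviced
    log: list = field(default_factory=list)


def deliver(line, irq, limit, max_deliveries):
    """Deliver interrupts until the line goes quiet, is disabled, or we give up.

    limit=None models the unpatched behaviour (never disable the line).
    max_deliveries only exists so the simulation terminates.
    Returns the number of deliveries performed.
    """
    deliveries = 0
    while line.asserted and not line.disabled and deliveries < max_deliveries:
        deliveries += 1
        if line.has_handler:
            # A real handler would clear the interrupt cause in the device.
            line.asserted = False
            continue
        # No handler: the cause is never cleared, so the line stays asserted.
        line.unhandled += 1
        if limit is not None and line.unhandled >= limit:
            line.log.append(
                f'irq {irq}: nobody cared (try booting with the "irqpoll" option)'
            )
            line.log.append(f"Disabling IRQ #{irq}")
            line.disabled = True
    return deliveries


if __name__ == "__main__":
    stuck = IrqLine()
    print("no limit:", deliver(stuck, 10, None, 1000), "deliveries, still asserted")
    fixed = IrqLine()
    print("limit 100:", deliver(fixed, 10, 100, 1000), "deliveries")
    print("\n".join(fixed.log))
```

Without a limit the loop only stops because of the artificial max_deliveries cap, which corresponds to the hang. With a limit the line is disabled after the chosen number of unhandled deliveries and the two log messages are emitted.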
(In reply to comment #61)
> Created attachment 500670 [details]
> kdump.patch
>
> Here's a patch that should mend the hang. It works by disabling the IRQ line
> after too many interrupts if there's no handler installed. Thus the IRQ line
> will be brought down, but that keeps the kernel going. You'll notice a message
> like the following if that happens:
> irq 10: nobody cared (try booting with the "irqpoll" option)
>
> Call Trace:
> [...]
> handlers:
> Disabling IRQ #10
>
> Testing feedback welcome
I assume the irq line will be re-enabled after an irq handler is registered for it, correct?

(In reply to comment #63)
> I assume irq line will be re-enabled after irq handler will be registerd for
> it, correct?
Unfortunately no, that's the drawback we have to cope with. However we only apply this in the kdump case so after the dump is saved the system is going to reboot anyway and everything is back to normal.

(In reply to comment #64)
> (In reply to comment #63)
> > I assume irq line will be re-enabled after irq handler will be registerd for
> > it, correct?
>
> Unfortunately no, that's the drawback we have to cope with. However we only
> apply this in the kdump case so after the dump is saved the system is going to
> reboot anyway and everything is back to normal.
Comment #44 says that the bug can be reproduced by doing dd into a virtio disk while kexecing. In that case kdump will not be able to access the virtio disk to save the dump. The same may happen if the NIC and disk share an IRQ line (which is possible with virtio disk).

So far I wasn't able to reproduce the problem with virtio disk; if somebody can give me exact steps to reproduce it, I'll take a look.

(In reply to comment #67)
> So far I wasn't able to reproduce the problem with virtio disk, if somebody can
> give me some exact steps on how to do reproduce it I'll take a look.
I wasn't able to reproduce it either.
But if I create two virtio nics and one virtio block device, then the block device and one of the nics share an interrupt line. If the line is disabled, virtio block will not work either.

Created attachment 500957 [details]
guest xml for RHEL5U6

Hello,
I am also testing the patch in comment #61. I can reproduce a similar problem with a virtio nic and a virtio block device on a RHEL5U6 (x86_64) guest hosted by RHEL6U1 (x86_64). The attached xml file is the domain information for the guest.

[how to reproduce]
1. Transfer a huge file from Host to Guest
   [host]# scp huge_file root:~/
2. While the file is being transferred, crash the guest using sysrq-trigger
   [guest]# echo c > /proc/sysrq-trigger

[result]
I tested the patch. The problem often (not always) occurs without the patch, but it has not occurred with the patch so far.

o RHEL5U6 without the patch
# echo c > /proc/sysrq-trigger
SysRq : Trigger a crashdump
Memory for crash kernel (0x0 to 0x0) notwithin permissible range
WARNING calibrate_APIC_clock: the APIC timer calibration may be wrong.
... hanging, no response ...

o RHEL5U6 with the patch
# echo c > /proc/sysrq-trigger
SysRq : Trigger a crashdump
Memory for crash kernel (0x0 to 0x0) notwithin permissible range
WARNING calibrate_APIC_clock: the APIC timer calibration may be wrong.
irq 10: nobody cared (try booting with the "irqpoll" option)
handlers:
Disabling IRQ #10
irq 10: nobody cared (try booting with the "irqpoll" option)
handlers:
Disabling IRQ #10
irq 10: nobody cared (try booting with the "irqpoll" option)
handlers:
Disabling IRQ #10
irq 10: nobody cared (try booting with the "irqpoll" option)
handlers:
Disabling IRQ #10
Mounting proc filesystem
Mounting sysfs filesystem
Creating /dev
.... keep dumping ...

[note]
- I can also reproduce the same issue on RHEL6.0 Host and RHEL5.5 Guest
- In my case, the following message reported by this bug doesn't occur when hanging
  "Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled"
  => Any idea?
I am also setting up the test environment on "hp-xw8600-01.rhts.eng.bos.redhat.com". Please access the server if you want. I got the server via Beaker, and I am using virt-manager and virsh for testing on it.

[kernel rpm in Guest]
- kernel-2.6.18-238.el5 : original RHEL5U6 kernel
- kernel-2.6.18-238.el5.kdump : RHEL5U6 kernel customized with the patch

If you have any comments or need any other information, please let me know.

Thanks,
Keiichi

(In reply to comment #70)
> [note]
> - I can also reproduce the same issue on RHEL6.0 Host and RHEL5.5 Guest
> - In my case, the following message reported by this bug doesn't occur
> when hanging
> "Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled"
> => Any idea?
>
Probably this is because you have quiet boot enabled.

> > - In my case, the following message reported by this bug doesn't occur
> > when hanging
> > "Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled"
> > => Any idea?
> >
> Probably this is because you have quiet boot enabled.
Thanks. You're right.
I can get the same message by removing "quiet" from the boot option.
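
When checking a captured kdump console log for the messages discussed above, the counting can be scripted. The following is a small sketch in Python; the function name summarize_irq_messages and the idea of feeding it a saved console capture are assumptions for illustration, and the two regular expressions only match the message formats quoted in this report.

```python
import re
from collections import Counter

# Patterns for the two messages quoted in this bug report.
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
DISABLING = re.compile(r"Disabling IRQ #(\d+)")


def summarize_irq_messages(console_text):
    """Count 'nobody cared' and 'Disabling IRQ' messages per IRQ number.

    console_text is the captured console output as one string; line breaks
    are not required, so flattened logs work too.
    """
    cared = Counter(int(n) for n in NOBODY_CARED.findall(console_text))
    disabled = Counter(int(n) for n in DISABLING.findall(console_text))
    return {"nobody_cared": dict(cared), "disabled": dict(disabled)}


if __name__ == "__main__":
    sample = (
        'irq 10: nobody cared (try booting with the "irqpoll" option) '
        "handlers: Disabling IRQ #10 "
        'irq 10: nobody cared (try booting with the "irqpoll" option) '
        "handlers: Disabling IRQ #10 "
        "Mounting proc filesystem"
    )
    print(summarize_irq_messages(sample))
```

An empty result for a log captured with "quiet" removed from the boot options would suggest the IRQ line was never disabled during that boot.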
Created attachment 503899 [details]
RHEL5 fix for this issue
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

With the patch, kdump works. But some backtraces echoed:
...
type=2000 audit(1307691347.630:1): initialized
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
Initializing Cryptographic API
alg: No test for crc32c (crc32c-generic)
ksign: Installing public key data
Loading keyring
- Added public key BB4195B9B26800F8
- User ID: Red Hat, Inc. (Kernel Module GPG key)
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
Limiting direct PCI/PCI transfers.
Activating ISA DMA hang workarounds.
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
Real Time Clock Driver v1.12ac
hpet_acpi_add: no address or irqs in _CRS
Non-volatile memory driver v1.2
Linux agpgart interface v0.101 (c) Dave Jones
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
irq 10: nobody cared (try booting with the "irqpoll" option)

Call Trace:
<IRQ> [<ffffffff800bdf69>] __report_bad_irq+0x30/0x7d
[<ffffffff800be19c>] note_interrupt+0x1e6/0x227
[<ffffffff800bd678>] __do_IRQ+0xca/0x140
[<ffffffff8006d4c1>] do_IRQ+0xe9/0xf7
[<ffffffff8005d615>] ret_from_intr+0x0/0xa
[<ffffffff801cace9>] klist_children_get+0x0/0x9
[<ffffffff8001252a>] __do_softirq+0x51/0x133
[<ffffffff8005e2fc>] call_softirq+0x1c/0x28
[<ffffffff8006d636>] do_softirq+0x2c/0x7d
[<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
<EOI> [<ffffffff801cace9>] klist_children_get+0x0/0x9
[<ffffffff800becc8>] probe_irq_on+0x6e/0x151
[<ffffffff801c7d51>] serial8250_config_port+0x7c7/0x9c3
[<ffffffff801c577c>] uart_add_one_port+0xf8/0x278
[<ffffffff801cb424>] device_add+0x34e/0x372
[<ffffffff80484e13>] serial8250_init+0xdb/0x125
[<ffffffff80465a5e>] init+0x1f9/0x2f7
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff801867fd>] acpi_ds_init_one_object+0x0/0x80
[<ffffffff80465865>] init+0x0/0x2f7
[<ffffffff8005dfa7>] child_rip+0x0/0x11

handlers:
Disabling IRQ #10
irq 10: nobody cared (try booting with the "irqpoll" option)

Call Trace:
<IRQ> [<ffffffff800bdf69>] __report_bad_irq+0x30/0x7d
[<ffffffff800be19c>] note_interrupt+0x1e6/0x227
[<ffffffff800bd678>] __do_IRQ+0xca/0x140
[<ffffffff8006d4c1>] do_IRQ+0xe9/0xf7
[<ffffffff8005d615>] ret_from_intr+0x0/0xa
[<ffffffff8001252a>] __do_softirq+0x51/0x133
[<ffffffff8005e2fc>] call_softirq+0x1c/0x28
[<ffffffff8006d636>] do_softirq+0x2c/0x7d
[<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
<EOI> [<ffffffff800bed27>] probe_irq_on+0xcd/0x151
[<ffffffff801c7d51>] serial8250_config_port+0x7c7/0x9c3
[<ffffffff801c577c>] uart_add_one_port+0xf8/0x278
[<ffffffff801cb424>] device_add+0x34e/0x372
[<ffffffff80484e13>] serial8250_init+0xdb/0x125
[<ffffffff80465a5e>] init+0x1f9/0x2f7
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff801867fd>] acpi_ds_init_one_object+0x0/0x80
[<ffffffff80465865>] init+0x0/0x2f7
[<ffffffff8005dfa7>] child_rip+0x0/0x11

handlers:
Disabling IRQ #10
irq 10: nobody cared (try booting with the "irqpoll" option)

Call Trace:
<IRQ> [<ffffffff800bdf69>] __report_bad_irq+0x30/0x7d
[<ffffffff800be19c>] note_interrupt+0x1e6/0x227
[<ffffffff800bd678>] __do_IRQ+0xca/0x140
[<ffffffff8006d4c1>] do_IRQ+0xe9/0xf7
[<ffffffff8005d615>] ret_from_intr+0x0/0xa
[<ffffffff8001252a>] __do_softirq+0x51/0x133
[<ffffffff8005e2fc>] call_softirq+0x1c/0x28
[<ffffffff8006d636>] do_softirq+0x2c/0x7d
[<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
<EOI> [<ffffffff800becc8>] probe_irq_on+0x6e/0x151
[<ffffffff801c7d90>] serial8250_config_port+0x806/0x9c3
[<ffffffff801c577c>] uart_add_one_port+0xf8/0x278
[<ffffffff801cb424>] device_add+0x34e/0x372
[<ffffffff80484e13>] serial8250_init+0xdb/0x125
[<ffffffff80465a5e>] init+0x1f9/0x2f7
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff801867fd>] acpi_ds_init_one_object+0x0/0x80
[<ffffffff80465865>] init+0x0/0x2f7
[<ffffffff8005dfa7>] child_rip+0x0/0x11

handlers:
Disabling IRQ #10
irq 10: nobody cared (try booting with the "irqpoll" option)

Call Trace:
<IRQ> [<ffffffff800bdf69>] __report_bad_irq+0x30/0x7d
[<ffffffff800be19c>] note_interrupt+0x1e6/0x227
[<ffffffff800bd678>] __do_IRQ+0xca/0x140
[<ffffffff8006d4c1>] do_IRQ+0xe9/0xf7
[<ffffffff8005d615>] ret_from_intr+0x0/0xa
[<ffffffff8001252a>] __do_softirq+0x51/0x133
[<ffffffff8005e2fc>] call_softirq+0x1c/0x28
[<ffffffff8006d636>] do_softirq+0x2c/0x7d
[<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
<EOI> [<ffffffff800bed27>] probe_irq_on+0xcd/0x151
[<ffffffff801c7d90>] serial8250_config_port+0x806/0x9c3
[<ffffffff801c577c>] uart_add_one_port+0xf8/0x278
[<ffffffff801cb424>] device_add+0x34e/0x372
[<ffffffff80484e13>] serial8250_init+0xdb/0x125
[<ffffffff80465a5e>] init+0x1f9/0x2f7
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff801867fd>] acpi_ds_init_one_object+0x0/0x80
[<ffffffff80465865>] init+0x0/0x2f7
[<ffffffff8005dfa7>] child_rip+0x0/0x11

handlers:
Disabling IRQ #10
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
brd: module loaded
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
PIIX3: IDE controller at PCI slot 0000:00:01.1
PIIX3: chipset revision 0
PIIX3: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xc000-0xc007, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xc008-0xc00f, BIOS settings: hdc:pio, hdd:pio
hda: QEMU HARDDISK, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: max request size: 512KiB
hda: 41943040 sectors (21474 MB) w/256KiB Cache, CHS=16383/255/63, (U)DMA
hda: cache flushes supported
hda: hda1 hda2
ide-floppy driver 0.99.newide
usbcore: registered new driver hiddev
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.6:USB HID core driver
PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: bitmap version 4.39
TCP bic registered
Initializing IPsec netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 17
ACPI: (supports S3 S4 S5)
Initalizing network drop monitor service
Freeing unused kernel memory: 224k freed
Write protecting the kernel read-only data: 527k
Mounting proc filesystem
Mounting sysfs filesystem
Creating /dev
Creating initial device nodes
Loading scsi_mod.ko module
SCSI subsystem initialized
Loading sd_mod.ko module
Loading libata.ko module
Loading ata_piix.ko module
Loading virtio.ko module
Loading virtio_blk.ko module
Loading virtio_ring.ko module
Loading virtio_pci.ko module
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
ACPI: PCI Interrupt 0000:00:02.0[A] -> Link [LNKB] -> GSI 10 (level, high) -> IRQ 10
ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
ACPI: PCI Interrupt 0000:00:03.0[A] -> Link [LNKC] -> GSI 11 (level, high) -> IRQ 11
Loading jbd.ko module
Loading ext3.ko module
Loading dm-mod.ko module
device-mapper: uevent: version 1.0.3
device-mapper: ioctl: 4.11.6-ioctl (2011-02-18) initialised: dm-devel
Loading dm-log.ko module
Loading dm-mirror.ko module
Loading dm-zero.ko module
Loading dm-snapshot.ko module
Waiting for required block device discovery
Waiting for hda...Found
Creating Block Devices
Creating block device hda
hda: hda1 hda2
Creating block device ram0
Creating block device ram1
Creating block device ram10
Creating block device ram11
Creating block device ram12
Creating block device ram13
Creating block device ram14
Creating block device ram15
Creating block device ram2
Creating block device ram3
Creating block device ram4
Creating block device ram5
Creating block device ram6
Creating block device ram7
Creating block device ram8
Creating block device ram9
Making device-mapper control node
Scanning logical volumes
Reading all physical volumes. This may take a while...
Found volume group "VolGroup00" using metadata type lvm2
Activating logical volumes
2 logical volume(s) in volume group "VolGroup00" now active
Saving to the local filesystem /dev/mapper/VolGroup00-LogVol00
e2fsck 1.38 (30-Jun-2005)
/dev/mapper/VolGroup00-LogVol00: recovering journal
input: AT Translated Set 2 keyboard as /class/input/input0
/dev/mapper/VolGroup00-LogVol00: clean, 125095/4695552 files, 1167569/4694016 blocks
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-0, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
[100 %]
The dumpfile is saved to /mnt//var/crash/127.0.0.1-2011-06-10-07:36:01/vmcore-incomplete.
makedumpfile Completed.
Saving core complete
md: stopping all md devices.
input: ImExPS/2 Generic Explorer Mouse as /class/input/input1
Restarting system.
.
machine restart

Han, these backtraces are a side-effect of the fix and are expected.

*** Bug 702290 has been marked as a duplicate of this bug. ***

Patch(es) available in kernel-2.6.18-268.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5
Detailed testing feedback is always welcomed.

I also tested with kernel-2.6.18-268.el5 and confirmed kdump works well.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2011-1065.html

Please see bug 418501 comment 64.

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
When a device triggered an interrupt request (IRQ) during kdump kernel startup, the kernel tried to process the IRQ even though the respective interrupt handler was not loaded. As a consequence, the kernel could not finish its startup and the system became unresponsive. This update allows the kernel to disable the IRQ line if the kernel receives a large number of IRQs and there is no interrupt handler loaded in the kernel. The kernel now starts as expected and kdump can successfully create a core dump.
Created attachment 429473 [details]
full kdump kernel boot log

Description of problem:
I am not sure if we want to document the workaround or fix the real problem. The rhel5.5 guest failed kdump under rhel6 Intel host. It stopped here,
Activating ISA DMA hang workarounds.
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
Real Time Clock Driver v1.12ac
hpet_acpi_add: no address or irqs in _CRS
Non-volatile memory driver v1.2
Linux agpgart interface v0.101 (c) Dave Jones
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled

Version-Release number of selected component (if applicable):
Host:
RHEL6.0-20100701.3
kernel-2.6.32-42.el6.x86_64 (+ the patch to fix smp guest regression)
qemu-kvm-0.12.1.2-2.90.el6.x86_64
Guest:
RHEL5.5 GA x86_64
kernel-2.6.18-194.el6

How reproducible:
always