Red Hat Bugzilla – Bug 475843
kdump boot hangs in msleep on several HP XW systems
Last modified: 2009-09-02 05:12:17 EDT
Description of problem:
During a kdump, the new kernel hangs at:
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
I see this on hp-xw8600-01 and several others. I will try to get a more complete list and include that info here.
It appears to hang at this line in autoconfig_irq():
/* forget possible initially masked and pending IRQ */
Version-Release number of selected component (if applicable):
kernel-2.6.18-125 (probably all RHEL5.X kernels)
Steps to Reproduce:
1. Try kdump with a serial console on hp-xw8600-01.
I tracked this down a little deeper. The hang happens here:
63 /* Wait for longstanding interrupts to trigger. */
Once I comment out this msleep and another msleep later at line 86, kdump works fine.
So now I need to figure out why msleep hangs; my guess is that something in the timer setup is not initialized correctly. Note that I am using a -125 kernel, so it does not have the code that disables HPET on shutdown. Applying that patch does not appear to make any difference for this issue.
*** Bug 473404 has been marked as a duplicate of this bug. ***
After a lot more digging I have found the root of the problem and have a fix.
The problem is that the kdump kernel is unable to map the ACPI tables. This is because the BIOS does not flag the ACPI regions as ACPI; it simply marks them as "reserved".
BIOS-e820: 0000000000000000 - 0000000000097000 (usable)
BIOS-e820: 0000000000097000 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000007ffc2840 (usable)
BIOS-e820: 000000007ffc2840 - 0000000080000000 (reserved)
BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
In the table above, I found that the ACPI tables live in the 000000007ffc2840 - 0000000080000000 region. On most systems this would be marked as ACPI data; here, however, it is simply "reserved".
The kdump kernel is told by kexec to ignore all of the memory information it finds on its own (via memmap=exactmap), and kexec then passes in the memory map explicitly. Currently kexec passes in only what is known to be usable memory and ACPI memory; it does not tell the kdump kernel about reserved memory. This made the kdump kernel think that the ACPI tables lived outside of legal memory, so it was unable to map them. The fix is to pass in the reserved memory as well.
This should be considered a temporary workaround only, since we now pass in many more memmap= arguments and could overflow the kernel command line length. However, it has been tested and shown to fix kdump issues on several (potentially all) HP XW workstations.
Other testing has also shown that this same fix resolves the problem where kdump would hang when trying to enable HPET (bug 473404), which was seen on HP and other vendors' hardware. The cause there is similar: the HPET config register lived in a "reserved" region and could not be mapped by the kdump kernel.
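To make the lookup concrete, here is a small C sketch that encodes the BIOS-e820 map above (end addresses treated as exclusive, matching the log's "start - end" format) and reports which region a physical address falls in. The names `e820_find`, `xw8600_map`, and the enum values are hypothetical, chosen only for this illustration; they are not the kernel's own e820 structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region types mirroring the labels in the BIOS-e820 log. */
enum e820_kind { E820_USABLE, E820_RESERVED, E820_ACPI };

struct e820_entry {
    uint64_t start;      /* first byte of the region */
    uint64_t end;        /* one past the last byte (exclusive) */
    enum e820_kind kind;
};

/* The map reported on hp-xw8600-01, copied from the log above.
 * Note that no entry is typed E820_ACPI. */
static const struct e820_entry xw8600_map[] = {
    { 0x0000000000000000ULL, 0x0000000000097000ULL, E820_USABLE   },
    { 0x0000000000097000ULL, 0x00000000000a0000ULL, E820_RESERVED },
    { 0x00000000000e8000ULL, 0x0000000000100000ULL, E820_RESERVED },
    { 0x0000000000100000ULL, 0x000000007ffc2840ULL, E820_USABLE   },
    { 0x000000007ffc2840ULL, 0x0000000080000000ULL, E820_RESERVED },
    { 0x00000000e0000000ULL, 0x00000000f0000000ULL, E820_RESERVED },
    { 0x00000000fec00000ULL, 0x0000000100000000ULL, E820_RESERVED },
};

#define XW8600_MAP_LEN (sizeof xw8600_map / sizeof xw8600_map[0])

/* Return the entry containing addr, or NULL if addr falls in a hole. */
static const struct e820_entry *e820_find(const struct e820_entry *map,
                                          size_t n, uint64_t addr)
{
    assert(map != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (addr >= map[i].start && addr < map[i].end)
            return &map[i];
    }
    return NULL;
}
```

For example, `e820_find(xw8600_map, XW8600_MAP_LEN, 0x7ffc2840)` returns the reserved entry: nothing in this map tells a consumer that ACPI data lives there.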
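As a rough illustration of the difference, the C sketch below builds a kdump-style option string: `memmap=exactmap` followed by one `memmap=` option per region, using the `size@start` (usable), `size#start` (ACPI data) and `size$start` (reserved) forms from the kernel's command-line parameter documentation. The function name `build_memmap_args` and the `include_reserved` switch are hypothetical. Real kexec-tools code differs: for example, the usable memory it hands a kdump kernel is the crash-kernel reservation rather than all RAM, and it may format sizes in other units. Here the region list is assumed to be already filtered, and sizes are written as hex byte counts for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical region types; the list is assumed to be pre-filtered by kexec. */
enum region_kind { REGION_USABLE, REGION_ACPI, REGION_RESERVED };

struct region {
    uint64_t start;          /* inclusive */
    uint64_t end;            /* exclusive */
    enum region_kind kind;
};

/*
 * Write "memmap=exactmap" plus one memmap= option per region into buf.
 * include_reserved == 0 models the old behaviour (usable + ACPI only);
 * include_reserved == 1 models the workaround (reserved regions too).
 * Returns the string length, or -1 if buf (cap bytes) is too small.
 */
static int build_memmap_args(const struct region *r, size_t n,
                             int include_reserved, char *buf, size_t cap)
{
    static const char exactmap[] = "memmap=exactmap";
    assert(buf != NULL);
    if (cap < sizeof exactmap)
        return -1;
    strcpy(buf, exactmap);
    size_t len = strlen(buf);

    for (size_t i = 0; i < n; i++) {
        char sep;
        if (r[i].kind == REGION_USABLE)
            sep = '@';                     /* usable RAM */
        else if (r[i].kind == REGION_ACPI)
            sep = '#';                     /* ACPI data */
        else if (include_reserved)
            sep = '$';                     /* reserved (the workaround) */
        else
            continue;                      /* old behaviour: silently skipped */

        int w = snprintf(buf + len, cap - len, " memmap=0x%llx%c0x%llx",
                         (unsigned long long)(r[i].end - r[i].start), sep,
                         (unsigned long long)r[i].start);
        if (w < 0 || (size_t)w >= cap - len)
            return -1;                     /* would not fit */
        len += (size_t)w;
    }
    return (int)len;
}
```

With a hypothetical usable region at 0x1000000-0x9000000 plus the reserved 0x7ffc2840-0x80000000 region from the map above, the old behaviour yields only `memmap=exactmap memmap=0x8000000@0x1000000`; the workaround adds ` memmap=0x3d7c0$0x7ffc2840`.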
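The HPET case can be pictured the same way. The sketch below asks whether a physical address falls inside any region the kdump kernel was told about, with and without reserved regions being passed. The address 0xfed00000 (`EXAMPLE_HPET_BASE`, a hypothetical name) is used only because it is a common HPET base on x86 chipsets; the bug report does not give the actual address, and this coverage test is a simplification, not the kernel's real mapping logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum region_kind { REGION_USABLE, REGION_ACPI, REGION_RESERVED };

struct region {
    uint64_t start;   /* inclusive */
    uint64_t end;     /* exclusive */
    enum region_kind kind;
};

/* Hypothetical: a typical HPET MMIO base, assumed for illustration only. */
#define EXAMPLE_HPET_BASE 0xfed00000ULL

/*
 * Return 1 if addr lies in a region the kdump kernel would know about.
 * Reserved regions only count when kexec passes them (the workaround).
 */
static int kdump_view_covers(const struct region *r, size_t n,
                             uint64_t addr, int reserved_passed)
{
    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (r[i].kind == REGION_RESERVED && !reserved_passed)
            continue;                 /* old kexec: never passed in */
        if (addr >= r[i].start && addr < r[i].end)
            return 1;
    }
    return 0;
}
```

With the fec00000-100000000 reserved region from the map above, the example address is invisible to the kdump kernel until reserved regions are passed.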
I will attach the patch here; a prebuilt RPM is also available inside Red Hat at:
Created attachment 326793
patch to pass all reserved regions to kdump kernel
This patch tells kexec to pass all reserved e820 regions to the kdump kernel. It should be considered a workaround only: because it adds more arguments to the kernel command line, which has a finite length, the command line could overflow.
I have found that upstream kernels work OK even without this change to kexec-tools. I am still investigating why that is. Once the kernel change that fixes this upstream is found, it will be backported to the RHEL5.X kernel as the "real" fix.
*** Bug 475987 has been marked as a duplicate of this bug. ***
*** Bug 475498 has been marked as a duplicate of this bug. ***
I did more digging in the upstream kernel code. As I mentioned earlier, upstream works without this change to kdump. I was hoping to find the fix upstream and backport it; however, on closer inspection that does not appear to be possible.
Upstream works in part because __acpi_map_table() can fall back to a "fixed" mapping if it cannot map the table directly. That bit of code was fairly easy to backport, but it did not work because it relies on the early memory reservation code (i.e. reserve_early() and related code in e820.c), which does not exist in RHEL5 and is too large to justify backporting when this userspace fix to kexec-tools does the trick.
We were concerned that we might overflow the command line with all the additional memmap= arguments, but since the RHEL5 command line length is 4k, that is unlikely. In the worst case of an overflow, we would lose some of the memmap= arguments at the end of the command line; in the rare case where a system needed one of those late reserved sections in a kdump boot, we would be in the same situation we are in now.
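The difference between the two kernels can be summarized with a toy model. In the C sketch below, a table is mapped directly only if it ends below the highest directly mapped address (assumed here to follow from the memory map the kernel was given); otherwise an upstream-style kernel falls back to a temporary fixed mapping, while a RHEL5-style kdump kernel gives up. The function and enum names are hypothetical, and the real __acpi_map_table() code has further details that are omitted here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outcomes for this model. */
enum acpi_map_result { ACPI_MAP_DIRECT, ACPI_MAP_FIXED, ACPI_MAP_FAILED };

/*
 * phys, size:          the ACPI table to map
 * direct_map_end:      exclusive upper bound of directly mapped memory,
 *                      assumed to come from the memmap= regions passed in
 * has_fixed_fallback:  1 models upstream, 0 models the RHEL5 kdump kernel
 */
static enum acpi_map_result model_acpi_map(uint64_t phys, uint64_t size,
                                           uint64_t direct_map_end,
                                           int has_fixed_fallback)
{
    if (phys == 0 || size == 0)
        return ACPI_MAP_FAILED;
    assert(size <= UINT64_MAX - phys);   /* range must not wrap */
    if (phys + size <= direct_map_end)
        return ACPI_MAP_DIRECT;          /* table lies in mapped memory */
    if (has_fixed_fallback)
        return ACPI_MAP_FIXED;           /* upstream: temporary fixed mapping */
    return ACPI_MAP_FAILED;              /* table "outside legal memory" */
}
```

In this model, the kexec-tools workaround corresponds to raising `direct_map_end` (by passing the reserved regions), which is why it helps even without the fallback.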
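The worst case described above can be modeled in a few lines of C. The sketch assumes a 4096-byte limit that includes the terminating NUL, and assumes that arguments which do not fit are dropped whole from the end, as the comment describes. The function name `memmap_args_that_fit` and the sample arguments are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed limit: the "4k" RHEL5 command line, counting the trailing NUL. */
#define CMDLINE_LIMIT 4096u

/*
 * Return how many of args[0..n) still fit after base when each is appended
 * with a leading space. Once one argument does not fit, it and every later
 * argument are treated as lost.
 */
static size_t memmap_args_that_fit(const char *base,
                                   const char *const *args, size_t n)
{
    assert(base != NULL);
    size_t used = strlen(base) + 1;          /* +1 for the NUL */
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        size_t need = 1 + strlen(args[i]);   /* " " + argument */
        if (used + need > CMDLINE_LIMIT)
            break;
        used += need;
        kept++;
    }
    return kept;
}
```

For example, after a 14-character `root=/dev/sda1` base, 156 copies of the 25-character option `memmap=0x3d7c0$0x7ffc2840` fit under this model, far more than the seven entries in the e820 map above.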
I'll incorporate this as soon as it has the appropriate PM acks to allow me to check it in.
Ok, so are we doing a hotfix or a z stream release here? I see both in this bug (comment #10 or comment #12)
(In reply to comment #13)
> Ok, so are we doing a hotfix or a z stream release here? I see both in this
> bug (comment #10 or comment #12)
Z-stream... We use the hotfix tracker to request all accelerated fixes (fastrack/hotfix/zstream/etc.), and then support management decides which is appropriate and sets the right flags.
*** Bug 471065 has been marked as a duplicate of this bug. ***
Committed to kexec-tools-1.102-pre57.el5. When the z-stream/hotfix decision is made, I'll update CVS to reflect it appropriately.
I got the system from our QA colleagues and was able to reproduce the problem. I modified the kernel so that I could issue an NMI while the system was hanging in the kdump kernel. This got me the following stack trace:
IRQ0xa9_interrupt + 0x0/0xa
probe_irq_on + 0x6e/0x151
serial8250_config_port + 0x7c7/0x9c3
uart_add_one_port + 0xf7/0x278
platform_device_add + 0x111/0x148
serial8250_init + 0xdd/0x127
init + 0x1f9/0x2f7
It seems the system is stuck just after interrupts are enabled in probe_irq_on. IRQ 0xa9 (169) belongs to the disk controller. Perhaps, because of the high load, an interrupt had just been raised when the crash dump was
Hopefully, this investigation can give your engineers some hints.
This event sent from IssueTracker by streeter
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.