Red Hat Bugzilla – Bug 442368
[RHEL5.2] [Regression] kdump by INIT does not work
Last modified: 2013-08-05 23:49:32 EDT
Description of problem:
kdump by INIT does not work. After INIT is issued, the 2nd kernel starts but
kdump fails. The following is the console output from the 2nd kernel.
Creating block device sdd
Creating block device sde
hwclock: Could not access RTC: No such file or directory
Saving to the local filesystem LABEL=/dump
e2fsck 1.38 (30-Jun-2005)
/dump: clean, 17/3525120 files, 4233706/7048444 blocks
There is no error message on the console, but the vmcore is not copied to the disk.
I found that makedumpfile failed with the following error message.
/proc/vmcore doesn't contain vmcoreinfo.
'-x' or '-i' must be specified.
The cause of this problem is that the kernel does not call crash_save_vmcoreinfo.
When kdump is started by panic or sysrq-trigger, crash_save_vmcoreinfo is called by
crash_kexec. But this function is not called when kdump is started by INIT. The
attached patch fixes this.
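The difference between the two paths can be sketched with a small user-space model. This is only an illustration of the control flow described above, not the kernel source and not the attached patch: the `_model`, `_buggy`, and `_fixed` function names and the `vmcoreinfo_saved` flag are hypothetical stand-ins, and the real functions do far more work.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model only. Every name here is a stand-in for illustration;
 * none of this is the actual kernel code. */

/* Hypothetical flag: has vmcoreinfo been recorded for the 2nd kernel? */
static bool vmcoreinfo_saved = false;

/* Stand-in for crash_save_vmcoreinfo(): records vmcoreinfo. */
static void crash_save_vmcoreinfo_model(void)
{
    vmcoreinfo_saved = true;
}

/* Stand-in for jumping to the 2nd kernel. Returns true if /proc/vmcore
 * would contain vmcoreinfo, i.e. makedumpfile could proceed. */
static bool machine_kexec_model(void)
{
    return vmcoreinfo_saved;
}

/* panic / sysrq-trigger path: crash_kexec saves vmcoreinfo first. */
static bool crash_kexec_model(void)
{
    crash_save_vmcoreinfo_model();
    return machine_kexec_model();
}

/* INIT path as reported: goes to the 2nd kernel without saving vmcoreinfo. */
static bool kdump_on_init_buggy(void)
{
    return machine_kexec_model();
}

/* INIT path with the missing call added (the idea behind the one-line fix). */
static bool kdump_on_init_fixed(void)
{
    crash_save_vmcoreinfo_model();
    return machine_kexec_model();
}

/* Reset the model between scenarios. */
static void reset_model(void)
{
    vmcoreinfo_saved = false;
}
```

Calling `reset_model()` and then each entry point in turn shows that only the unpatched INIT path reaches the 2nd kernel without vmcoreinfo, which matches the makedumpfile error above.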
Version-Release number of selected component (if applicable):
2.6.18-89.el5 (5.2 snapshot5)
Steps to Reproduce:
1. Set up kdump
2. Start kdump by INIT
This problem is very serious. If it is not fixed, there is no way to
start kdump when the system hangs.
Created attachment 302355 [details]
Fix machine_kdump_on_init so that it can call crash_save_vmcoreinfo
I'm hard pressed to believe that this is a regression. Issuing a kdump via SAL
INIT message isn't really a common way to test kdump functionality. Sysrq-C and
panic-issued crashes still work just as they are supposed to. While this would
be nice to fix, I really don't see it as needing to get shoved into 5.2 at the
last minute, especially given that this fix isn't upstream.
Let me know what the upstream status is. If you like I can send it up for
review, or you can, whichever you prefer. Once it's in upstream I propose we
then backport it for 5.3.
>I'm hard pressed to believe that this is a regression. Issuing a kdump via SAL
>INIT message isn't really a common way to test kdump functionality. Sysrq-C and
>panic-issued crashes still work just as they are supposed to.
I think this is a regression because kdump by INIT worked in 5.0 and
5.1. In 5.2, the new feature of supporting vmcoreinfo was added by
linux-2.6-kexec-fix-vmcoreinfo-patch-that-breaks-kdump.patch, and there
is a bug in the patch.
Functionally, INIT on ia64 corresponds to NMI on x86/x86_64, so kdump by
INIT is important. We use INIT when the system hangs. Of course
sysrq-trigger can also be used, but sysrq-trigger does not work if
interrupts are disabled.
>Let me know what the upstream status is. If you like I can send it up for
>review, or you can, whichever you prefer. Once it's in upstream I propose we
>then backport it for 5.3.
I have not posted this patch yet, but I'll post it upstream soon. I understand
this patch needs to be included upstream first, but this regression
is very serious, as I said above, so we need this fix in 5.2. I think the risk of
including this patch is very low because it is a one-line patch against
arch/ia64/kernel/crash.c. Of course I'll post this patch to upstream soon for
It is technically a regression, yes, but it's not that important. It can be used
in much the same way that the NMI is used to correct deadlock on other
arches. However, it is not exactly the same: the nmi_watchdog can be
configured to automatically detect lockup, whereas as far as I know INIT is a manually
sent command, and while it's very helpful, it's not going to have as prevalent a
use as its ia-32 counterpart. I'm happy to take the patch, it's obviously
correct, but it's very late in the release cycle. I would just as soon wait for
5.3. If it's accepted as a blocker, however, I'll post now.
You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.