Bug 442368 - [RHEL5.2] [Regression] kdump by INIT does not work
Summary: [RHEL5.2] [Regression] kdump by INIT does not work
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: ia64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
: ---
Assignee: Neil Horman
QA Contact: Linda Wang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2008-04-14 15:52 UTC by Takao Indoh
Modified: 2013-08-06 03:49 UTC (History)
4 users

Fixed In Version: RHBA-2008-0314
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-05-21 15:14:10 UTC
Target Upstream Version:
Embargoed:


Attachments
Fix machine_kdump_on_init so that it can call crash_save_vmcoreinfo (484 bytes, patch)
2008-04-14 15:52 UTC, Takao Indoh


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2008:0314 0 normal SHIPPED_LIVE Updated kernel packages for Red Hat Enterprise Linux 5.2 2008-05-20 18:43:34 UTC

Description Takao Indoh 2008-04-14 15:52:04 UTC
Description of problem:
kdump by INIT does not work. After INIT is issued, the second kernel starts but
kdump fails. The following is the console output from the second kernel.

(snipped)
Creating block device sdd
Creating block device sde
hwclock: Could not access RTC: No such file or directory
Saving to the local filesystem LABEL=/dump
e2fsck 1.38 (30-Jun-2005)
/dump: clean, 17/3525120 files, 4233706/7048444 blocks
Restarting system.

There is no error message on the console, but the vmcore is not copied to the
disk. I found that makedumpfile failed with the following error message.

/proc/vmcore doesn't contain vmcoreinfo.
'-x' or '-i' must be specified.

makedumpfile Failed.
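The check that fails here can be illustrated with a small sketch. On kernels with
vmcoreinfo support, the first kernel records symbol and structure-layout data in
an ELF note named VMCOREINFO, and makedumpfile looks for that note in
/proc/vmcore; without it, makedumpfile needs '-x' (a vmlinux with debug info) or
'-i' (a saved vmcoreinfo file) instead. The function below is a simplified,
hypothetical scanner (notes_have_vmcoreinfo is not a makedumpfile function) that
walks a PT_NOTE segment already read into memory, using the Elf64_Nhdr layout
from <elf.h>. makedumpfile's real detection logic differs in detail.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* ELF note name and descriptor fields are padded to 4-byte boundaries. */
static size_t note_align4(size_t n)
{
	return (n + 3u) & ~(size_t)3u;
}

/*
 * Hypothetical helper: walk an in-memory PT_NOTE segment and report whether
 * it contains a note named "VMCOREINFO". Returns false for a truncated or
 * malformed buffer.
 */
static bool notes_have_vmcoreinfo(const unsigned char *buf, size_t len)
{
	static const char want[] = "VMCOREINFO";   /* n_namesz counts the NUL */
	size_t off = 0;

	while (off + sizeof(Elf64_Nhdr) <= len) {
		Elf64_Nhdr nhdr;
		memcpy(&nhdr, buf + off, sizeof(nhdr)); /* avoid unaligned reads */
		off += sizeof(nhdr);

		size_t namesz = note_align4(nhdr.n_namesz);
		size_t descsz = note_align4(nhdr.n_descsz);
		if (namesz > len - off || descsz > len - off - namesz)
			return false;                   /* note runs past the buffer */

		if (nhdr.n_namesz == sizeof(want) &&
		    memcmp(buf + off, want, sizeof(want)) == 0)
			return true;

		off += namesz + descsz;                 /* skip to the next note */
	}
	return false;
}
```

A dump containing only the per-CPU "CORE" notes makes this return false, which
corresponds to the "/proc/vmcore doesn't contain vmcoreinfo" case above.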

The cause of this problem is that the kernel does not call
crash_save_vmcoreinfo on the INIT path. When kdump starts by panic or
sysrq-trigger, crash_kexec calls crash_save_vmcoreinfo, but when kdump starts by
INIT, this function is never called. The attached patch fixes this.
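The control-flow difference can be sketched as a userspace toy model. This is not
kernel code: struct crash_state and the model_* functions are hypothetical
stand-ins for crash_save_vmcoreinfo, machine_kexec, crash_kexec and
machine_kdump_on_init, and "dump usable" is simplified to mean that vmcoreinfo
was saved before control passed to the second kernel.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state the second kernel depends on. */
struct crash_state {
	bool vmcoreinfo_saved;   /* set by the save step */
	bool kexec_started;      /* set by the jump into the capture kernel */
};

/* Models crash_save_vmcoreinfo(): fill in the VMCOREINFO note. */
static void model_save_vmcoreinfo(struct crash_state *s)
{
	s->vmcoreinfo_saved = true;
}

/* Models machine_kexec(): hand control to the capture kernel. */
static void model_machine_kexec(struct crash_state *s)
{
	s->kexec_started = true;
}

/* Panic / sysrq-c path: crash_kexec() saves vmcoreinfo before the jump. */
static void model_crash_kexec(struct crash_state *s)
{
	model_save_vmcoreinfo(s);
	model_machine_kexec(s);
}

/* INIT path as reported: jumps to the capture kernel without saving. */
static void model_kdump_on_init_buggy(struct crash_state *s)
{
	model_machine_kexec(s);
}

/* INIT path after the fix: save vmcoreinfo first, like the panic path. */
static void model_kdump_on_init_fixed(struct crash_state *s)
{
	model_save_vmcoreinfo(s);
	model_machine_kexec(s);
}

/* Simplified version of what makedumpfile needs from the dump. */
static bool model_dump_usable(const struct crash_state *s)
{
	return s->kexec_started && s->vmcoreinfo_saved;
}
```

In this model only the buggy INIT variant leaves vmcoreinfo_saved false, which
matches the makedumpfile failure reported above; the fix makes the INIT path
perform the same save step as the panic path before the jump.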


Version-Release number of selected component (if applicable):
2.6.18-89.el5 (5.2 snapshot5)

How reproducible:
Always

Steps to Reproduce:
1. Set up kdump
2. Start kdump by INIT
  
Actual results:
kdump fails.

Expected results:
kdump succeeds.

Additional info:
This problem is very serious. If it is not fixed, there is no way to start kdump
when the system hangs.

Comment 1 Takao Indoh 2008-04-14 15:52:04 UTC
Created attachment 302355 [details]
Fix machine_kdump_on_init so that it can call crash_save_vmcoreinfo

Comment 2 Neil Horman 2008-04-14 16:38:10 UTC
I'm hard pressed to believe that this is a regression.  Issuing a kdump via a SAL
INIT message isn't really a common way to test kdump functionality. Sysrq-C and
panic-issued crashes still work just as they are supposed to.  While this would
be nice to fix, I really don't see it as needing to get shoved into 5.2 at the
last minute, especially given that this fix isn't upstream.

Let me know what the upstream status is.  If you like, I can send it up for
review, or you can, whichever you prefer.  Once it's upstream, I propose we
then backport it for 5.3.

Comment 3 Takao Indoh 2008-04-14 20:44:15 UTC
>I'm hard pressed to believe that this is a regression.  Issuing a kdump via a SAL
>INIT message isn't really a common way to test kdump functionality. Sysrq-C and
>panic-issued crashes still work just as they are supposed to.

I think this is a regression because kdump by INIT worked in 5.0 and
5.1. In 5.2, the new feature of supporting vmcoreinfo was added by
linux-2.6-kexec-fix-vmcoreinfo-patch-that-breaks-kdump.patch, and there
is a bug in that patch.

Functionally, INIT on ia64 corresponds to NMI on x86/x86_64, so kdump by
INIT is important. We use INIT when the system hangs. Of course
sysrq-trigger can also be used, but sysrq-trigger does not work if
interrupts are disabled.

>Let me know what the upstream status is.  If you like, I can send it up for
>review, or you can, whichever you prefer.  Once it's upstream, I propose we
>then backport it for 5.3.

I have not posted this patch yet, but I'll post it upstream soon. I understand
this patch needs to be included upstream first, but this regression is very
serious, as I said above, so we need this fix in 5.2. I think the risk of
including this patch is very low because it is a one-line patch against
arch/ia64/kernel/crash.c. Of course I'll post this patch upstream soon for
review.

Comment 4 Neil Horman 2008-04-15 11:07:33 UTC
It is technically a regression, yes, but it's not that important: it can be used
in much the same way that the NMI is used to deal with deadlocks on other
arches.  However, it is not exactly the same, as the nmi_watchdog can be
configured to detect lockups automatically, while as far as I know INIT is a
manually sent command.  While it's very helpful, it's not going to see as
prevalent a use as its IA-32 counterpart.  I'm happy to take the patch, it's
obviously correct, but it's very late in the release cycle.  I would just as
soon wait for 5.3.  If it's accepted as a blocker, however, I'll post now.

Comment 10 Don Zickus 2008-04-16 20:50:52 UTC
in kernel-2.6.18-90.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 13 errata-xmlrpc 2008-05-21 15:14:10 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html


