Description of problem:
kdump by INIT does not work. After INIT is issued, the 2nd kernel starts and kdump fails. The following is the console message in the 2nd kernel.

    (snipped)
    Creating block device sdd
    Creating block device sde
    hwclock: Could not access RTC: No such file or directory
    Saving to the local filesystem LABEL=/dump
    e2fsck 1.38 (30-Jun-2005)
    /dump: clean, 17/3525120 files, 4233706/7048444 blocks
    Restarting system.

There is no error message in the console, but vmcore is not copied to the disk. I found that makedumpfile failed with the following error message.

    /proc/vmcore doesn't contain vmcoreinfo.
    '-x' or '-i' must be specified.
    makedumpfile Failed.

The cause of this problem is that the kernel does not call crash_save_vmcoreinfo. When kdump is started by panic or sysrq-trigger, crash_save_vmcoreinfo is called by crash_kexec. But this function is not called when kdump is started by INIT. The attached patch fixes this.

Version-Release number of selected component (if applicable):
2.6.18-89.el5 (5.2 snapshot5)

How reproducible:
Always

Steps to Reproduce:
1. Set up kdump
2. Start kdump by INIT

Actual results:
kdump fails.

Expected results:
kdump succeeds.

Additional info:
This problem is very serious. If it is not fixed, there is no way to start kdump when the system hangs.
Created attachment 302355
Fix machine_kdump_on_init so that it can call crash_save_vmcoreinfo
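For readers without access to the attachment, the call paths described in this report can be sketched as a small user-space C model. This is not the kernel code and not the attached patch. The function names are taken from the report, but every body here is a stand-in, and placing the crash_save_vmcoreinfo call before machine_kexec is an assumption for illustration. The helper vmcoreinfo_present_after, the self_check function, and the _before/_after suffixes are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model of the call paths in this report. NOT kernel code. */

/* Models whether the vmcoreinfo data was saved before the 2nd kernel boots. */
static bool vmcoreinfo_saved;

/* Stand-in: the real function records vmcoreinfo for the 2nd kernel. */
static void crash_save_vmcoreinfo(void) { vmcoreinfo_saved = true; }

/* Stand-in: the real function jumps into the 2nd (capture) kernel. */
static void machine_kexec(void) { }

/* panic / sysrq-trigger path: per the report, crash_kexec saves vmcoreinfo. */
static void crash_kexec(void)
{
    crash_save_vmcoreinfo();
    machine_kexec();
}

/* INIT path as described for 2.6.18-89.el5: vmcoreinfo is never saved. */
static void machine_kdump_on_init_before(void)
{
    machine_kexec();
}

/* INIT path with the fix as described: save vmcoreinfo first (assumed order). */
static void machine_kdump_on_init_after(void)
{
    crash_save_vmcoreinfo();
    machine_kexec();
}

/* Hypothetical helper: run one crash path and report whether the 2nd
 * kernel would find vmcoreinfo, which is what makedumpfile needs. */
static bool vmcoreinfo_present_after(void (*crash_path)(void))
{
    vmcoreinfo_saved = false;
    crash_path();
    return vmcoreinfo_saved;
}

/* Checks the three cases discussed in the report. */
void self_check(void)
{
    assert(vmcoreinfo_present_after(crash_kexec));
    assert(!vmcoreinfo_present_after(machine_kdump_on_init_before));
    assert(vmcoreinfo_present_after(machine_kdump_on_init_after));
}
```

Calling self_check() from a main function exercises all three paths. In this model only the unpatched INIT path leaves vmcoreinfo unsaved, which matches the makedumpfile failure quoted in the description.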
I'm hard pressed to believe that this is a regression. Issuing a kdump via a SAL INIT message isn't really a common way to test kdump functionality. Sysrq-C and panic-issued crashes still work just as they are supposed to. While this would be nice to fix, I really don't see it as needing to get shoved into 5.2 at the last minute, especially given that this fix isn't upstream.

Let me know what the upstream status is. If you like, I can send it up for review, or you can, whichever you prefer. Once it's upstream, I propose we then backport it for 5.3.
>I'm hard pressed to believe that this is a regression. issuing a kdump via SAL
>INIT message isn't really a common way to test kdump functionality. Sysrq-C and
>panic issued crashes still work just as they are supposed to.

I think this is a regression because kdump by INIT worked in 5.0 and 5.1. In 5.2, the new feature of supporting vmcoreinfo was added by linux-2.6-kexec-fix-vmcoreinfo-patch-that-breaks-kdump.patch, and there is a bug in that patch.

Functionally, INIT on ia64 corresponds to NMI on x86/x86_64, so kdump by INIT is important. We use INIT when the system hangs. Of course sysrq-trigger can also be used, but sysrq-trigger does not work if interrupts are disabled.

>Let me know what the upstream status is. If you like I can send it up for
>review, or you can, whichever you prefer. Once its in upstream I propose we
>then backport it for 5.3

I have not posted this patch yet, but I'll post it upstream soon for review. I understand this patch needs to be included upstream first, but this regression is very serious, as I said above, so we need this fix in 5.2. I think the risk of including this patch is very low because it is a one-line patch against arch/ia64/kernel/crash.c.
It is technically a regression, yes, but it's not that important. INIT can be used in much the same way that the NMI is used to break out of deadlocks on other arches. However, it is not exactly the same: the nmi_watchdog can be configured to automatically detect lockups, whereas, as far as I know, INIT is a manually sent command. While it's very helpful, it's not going to be used as widely as its ia-32 counterpart.

I'm happy to take the patch, it's obviously correct, but it's very late in the release cycle. I would just as soon wait for 5.3. If it's accepted as a blocker, however, I'll post now.
in kernel-2.6.18-90.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0314.html