Description of problem: When trying to test the fix in bug 313731, I tried to execute kexec/kdump on a QS21 configured with NFS root (the QS21 has no local storage).
- When running mkdumprd, the handle_netdev function does not exist (bug 368941).
- Fixing the above produces a working 'service kdump start', but no dump is taken, even with the 'enforcing=0' kernel parameter:

$ cat /proc/cmdline
crashkernel=256M@16M enforcing=0
$ touch /etc/kdump.conf
$ service kdump restart
$ echo c > /proc/sysrq-trigger

- The kernel starts to boot, then the SOL console drops. The last thing you see is "md: bitmap version 4.39". My assumption is that the booted kdump kernel somehow dies after that.
- Soon after (<1 minute), the tftp server sees a request for a boot image from the QS21, and the QS21 "comes back", but it is now booting the non-kdump kernel.
- There is no crash in /var/crash.
- After reconnecting with 'console', the console eventually comes back, but SOL is messed up and generally unusable. I have to power off the system, detach it from the BladeCenter and power it back on to get SOL back.
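Before triggering the crash, it may be worth confirming that the crashkernel= reservation shown above is really on the running kernel's command line; without it, no memory is set aside for the capture kernel. A minimal sketch, assuming a POSIX shell; `crashkernel_arg` is a hypothetical helper, not part of the kdump service or kexec-tools. The value has the form size@offset, so crashkernel=256M@16M reserves 256 MB starting at physical address 16 MB.

```shell
#!/bin/sh
# Hypothetical helper (not part of the kdump service or kexec-tools):
# print the value of the first crashkernel= parameter in a kernel command line.
crashkernel_arg() {
    for word in $1; do
        case "$word" in
            crashkernel=*)
                printf '%s\n' "${word#crashkernel=}"
                return 0
                ;;
        esac
    done
    return 1
}

# Check the running kernel before arming kdump and triggering the crash.
if reservation=$(crashkernel_arg "$(cat /proc/cmdline)"); then
    echo "crashkernel reservation (size@offset): $reservation"
else
    echo "no crashkernel= parameter; no memory is reserved for the capture kernel" >&2
fi
```

This only checks that the parameter was passed; it does not prove that the reservation succeeded or that the capture kernel boots.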
What are the contents of /etc/kdump.conf? Can you provide a log of what the serial console did manage to capture before it dropped? I expect that, due to bz 313731, mkdumprd got rather confused and produced a bad initramfs for kdump. Given the message above, it likely thinks that you are using a software RAID setup of some sort (md). At that point it fails setup and reboots the system back to the original kernel. My guess is that this is a duplicate of bz 368941; I'll leave it open until we're sure. I have no idea why SOL would drop during reboot. Isn't SOL run independently of the system in question? I thought crashes/reboots weren't supposed to affect the management interfaces.
Created attachment 289959
qs21-kdump-53.el5.log: log from 2.6.18-53.el5 with modified mkdumprd

The above is a console log of a kdump attempt on a QS21 running 2.6.18-53.el5. The suggested patch from bug 368941 (attachment 289946) is applied to mkdumprd.
Created attachment 289961
qs21-kdump-58.el5.rhel5u2.sm12.log: same as above with the sm12 kernel

This is the log running the sm12 kernel (my development kernel) from http://people.redhat.com/smoser/rhel5u2/sm12 . It contains all Cell-related fixes for RHEL5u2 (among other things). There is no real difference in the log other than the kernel used.
Hmm, OK, this may not be a dupe after all. Judging by those logs, we either:
1) are not getting into the initrd at all (i.e. hanging before loading/running /init in the initramfs), or
2) are somehow not getting messages to the console, even though we are otherwise functioning properly.

Scott, did you say I could get access to this machine to test on? It would probably be easiest if I could just have direct access to tinker for a bit, if that's possible. Thanks!
(In reply to comment #5)
> Scott, did you say I could get access to this machine to test on? It would
> probably be easiest if I could just have direct access to tinker for a bit, if
> that's possible. Thanks!

I've forwarded you the info.
That's right, I remember now. Thanks!
FWIW, from my tinkering it looks like we're not getting into the initramfs at all yet on this system. Depending on the iteration, we either jump back to the BIOS halfway through kernel init, or we try to access the initramfs but seem to fail the sys_access call in init(). Blocking this on bug 313731.
As per my conversation with Scott, I'm moving this bug to be dependent on the correct kexec/Cell bug.
Thanks, Neil. Adding to the RHEL5.2 release notes under "Known Issues":
<quote>
Executing kdump on a QS21 configured with NFS root will fail. To avoid this, specify an NFS dump target in /etc/kdump.conf.
</quote>
Please advise if any revisions are required.
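In practice the workaround is a single line in /etc/kdump.conf. A sketch, assuming the same `net host:/path` directive used in the verification steps reported later in this bug; the host name and export path are placeholders and must point at a writable NFS export:

```
# /etc/kdump.conf
# Write the vmcore to a dedicated NFS export rather than relying on the NFS root.
# Host and export path below are placeholders.
net your.host.here:/your/exported/dir
```

After editing the file, run 'service kdump restart' so the new configuration is picked up, as in the reproduction steps above.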
Created attachment 297165
netdump-log for RHEL5.2-Beta1 on QS21

I have tried to verify kdump support for RHEL5.2-Beta1 (2.6.18-84.el5) on a QS21. I found that the secondary kernel has the same problem booting on the diskless QS21. I performed the following steps:
=================================
- Install RHEL5.2-Beta1 (2.6.18-84.el5) on the QS21.
- Install kernel-kdump (http://people.redhat.com/dzickus/el5/84.el5/ppc64/kernel-kdump-2.6.18-84.el5.ppc64.rpm).
- Reboot with crashkernel added to the kernel command line (boot net crashkernel=256M@32M).
- Set up kdump to dump to an NFS mount point:
  echo "net your.host.here:/your/exported/dir" >> /etc/kdump.conf
- service kdump restart
- echo 'c' > /proc/sysrq-trigger

The secondary kernel is loaded and starts booting, then the system reboots. /var/crash is empty. The log is attached.

--Regards
Omar M
I'm closing this as a dupe of bz 368941, as they're both tracking the same issue, and the other bz has an additional patch in it already to clean some other cruft up. *** This bug has been marked as a duplicate of 368941 ***
Minor release note revision as per BZ#438030:
<quote>
Executing kdump on an IBM BladeCenter QS21 or QS22 configured with NFS root will fail. To avoid this, specify an NFS dump target in /etc/kdump.conf.
</quote>
Please advise if any further revisions are required. Thanks!
But it didn't work for me, as I said in comment #19.
What you encountered in comment 19 was a different problem, one which IBM is investigating. The RHEL5 kernel was booting as of kernel release -65.el5, but stopped again sometime between -65.el5 and -84.el5. IIRC, IBM is bisecting to determine the release in which it (re)broke. If you boot with kernel -65.el5 and use the configuration suggested by Don's release note, everything should work.
Hi, the RHEL5.2 release notes will be dropped to translation on April 15, 2008, at which point no further additions or revisions will be entertained. A mockup of the RHEL5.2 release notes can be viewed at the following link: http://intranet.corp.redhat.com/ic/intranet/RHEL5u2relnotesmockup.html

Please use the link above to verify whether your bug is already in the release notes (if it needs to be). Each item in the release notes contains a link to its original bug, so you can search the release notes by bug number.

Cheers,
Don
Tracking this bug for the Red Hat Enterprise Linux 5.3 Release Notes. This Release Note is currently located in the Known Issues section.
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.