Bug 368981 - kexec/kdump doesn't work on nfs root on QS21
Summary: kexec/kdump doesn't work on nfs root on QS21
Keywords:
Status: CLOSED DUPLICATE of bug 368941
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kexec-tools
Version: 5.1
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Neil Horman
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: RHEL5u2_relnotes RHEL5u3_relnotes
 
Reported: 2007-11-06 21:48 UTC by Scott Moser
Modified: 2010-03-14 21:28 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
(all architectures) Executing kdump on an IBM Bladecenter QS21 or QS22 configured with NFS root will fail. To avoid this, specify an NFS dump target in /etc/kdump.conf.
Clone Of:
Environment:
Last Closed: 2008-03-07 19:08:22 UTC
Target Upstream Version:
Embargoed:


Attachments
qs21-kdump-53.el5.log: log from 2.6.18-53.el5 with modified mkdumprd (7.63 KB, text/plain)
2007-12-18 22:59 UTC, Scott Moser
qs21-kdump-58.el5.rhel5u2.sm12.log : same as above with sm12 kernel (7.77 KB, text/plain)
2007-12-18 23:02 UTC, Scott Moser
netdump-log for RHEL5.2-Beta1 on QS21 (23.31 KB, application/octet-stream)
2008-03-07 10:37 UTC, omar

Description Scott Moser 2007-11-06 21:48:58 UTC
Description of problem:

When trying to test the fix in bug 313731, I tried to execute kexec/kdump on
a QS21 configured with NFS root (the QS21 has no local storage).

 - when running mkdumprd, the handle_netdev function does not exist (bug 368941)
 - fixing the above produces a working 'service kdump start', but no dump is
   taken
 - the same happens with the 'enforcing=0' kernel parameter:
  $ cat /proc/cmdline
   crashkernel=256M@16M enforcing=0
  $ touch /etc/kdump.conf
  $ service kdump restart
  $ echo c > /proc/sysrq-trigger
  - the kernel starts to boot, then the SOL console drops
   - the last thing you see is 'md: bitmap version 4.39'
  - my assumption is that the booted kdump kernel somehow dies after that
  - soon after (<1 minute), the tftp server sees a request for a boot
    image from the QS21, and the QS21 "comes back" but is now booting the
    non-kdump kernel
  - there is no crash dump in /var/crash
  - after re-connecting with 'console', the SOL eventually comes back, but it
    is messed up and generally unusable.  I have to power off the system,
    detach from the bladecenter and power it back on to get the SOL back.
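
The crashkernel reservation used in the steps above can be checked before triggering the crash. This is only a minimal sketch for a POSIX shell: the function name crashkernel_arg is hypothetical, and the optional file argument exists only so the check can be run against a saved copy of /proc/cmdline.

```shell
# Hypothetical helper (not part of kexec-tools): print the crashkernel=
# argument found on a kernel command line, or nothing if it is absent.
# $1 is the file to read; it defaults to /proc/cmdline.
crashkernel_arg() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep '^crashkernel=' | head -n 1
}
```

An empty result means no memory was reserved for the kdump kernel, in which case 'service kdump restart' would not be expected to load a capture kernel at all.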

Comment 2 Neil Horman 2007-12-18 19:50:02 UTC
What are the contents of /etc/kdump.conf?

Can you provide a log of what the serial console did manage to capture before it
dropped?

I expect what happened is that, due to bz 313731, mkdumprd got rather confused and
produced a bad initramfs for kdump.  Given the error message above, it likely
thinks that you are using a software raid setup of some sort (the md utility),
at which point it fails setup and reboots the system back to the original
kernel.  My guess is this is a duplicate of bz 368941.  I'll leave it open until
we're sure.

No idea why the SOL would drop during reboot.  Isn't the SOL run independent of
the system in question?  I thought crashes/reboots weren't supposed to affect
the management interfaces.



Comment 3 Scott Moser 2007-12-18 22:59:47 UTC
Created attachment 289959 [details]
qs21-kdump-53.el5.log: log from 2.6.18-53.el5 with modified mkdumprd

The above is a console log of a kdump attempt on qs21 running 2.6.18-53.el5.
The suggested patch from 368941 (attachment 289946 [details]) is applied to mkdumprd.

Comment 4 Scott Moser 2007-12-18 23:02:09 UTC
Created attachment 289961 [details]
qs21-kdump-58.el5.rhel5u2.sm12.log : same as above with sm12 kernel

This is the log from running the sm12 kernel (my development kernel) from
http://people.redhat.com/smoser/rhel5u2/sm12 . It contains all Cell-related
fixes for RHEL5u2 (among other things).  There is no real
difference in the log other than the kernel used.

Comment 5 Neil Horman 2007-12-19 00:32:56 UTC
hmm, ok, this may not be a dupe after all.  Judging by those logs, we either:
1) may not be getting into the initrd at all (i.e. hanging prior to
loading/running /init in the initramfs), or

2) are somehow not getting messages to the console properly, even though we are
functioning properly otherwise.

Scott, did you say I could get access to this machine to test on?  It would
probably be easiest if I could just have direct access to tinker for a bit, if
that's possible.  Thanks!

Comment 6 Scott Moser 2007-12-19 01:02:46 UTC
(In reply to comment #5)
> Scott, did you say I could get access to this machine to test on?  It would
> probably be easiest if I could just have direct access to tinker for a bit, if
> that's possible.  Thanks!

I've forwarded you info.

Comment 7 Neil Horman 2007-12-19 01:06:08 UTC
That's right, I remember now.  Thanks!

Comment 8 Neil Horman 2007-12-19 17:39:03 UTC
FWIW it looks from my tinkering like we're not getting into the initramfs at all
yet on this system.  Depending on the iteration, we either jump back to BIOS
halfway through kernel init, or we try to access the initramfs but seem to fail
the sys_access call in init().  Blocking this on 313731.

Comment 9 Neil Horman 2007-12-19 20:36:36 UTC
As per conversation with Scott, I'm moving this bug to be dependent on the
correct kexec/cell bug.

Comment 18 Don Domingo 2008-02-19 00:02:46 UTC
thanks Neil, adding to RHEL5.2 release notes under "known issues":

<quote>
Executing kdump on a QS21 configured with NFS root will fail. To avoid this,
specify an NFS dump target in /etc/kdump.conf.
</quote>

please advise if any revisions are required. 

Comment 19 omar 2008-03-07 10:37:48 UTC
Created attachment 297165 [details]
netdump-log for RHEL5.2-Beta1 on QS21

I have tried to verify kdump support for RHEL5.2-Beta1 (2.6.18-84.el5) on QS21.
I found that the secondary kernel has the same problem booting on a diskless
QS21 machine.

I performed the following steps:
=================================
- install RHEL5.2-Beta1 (2.6.18-84.el5) on the QS21
- install kernel-kdump
  (http://people.redhat.com/dzickus/el5/84.el5/ppc64/kernel-kdump-2.6.18-84.el5.ppc64.rpm)
- reboot with crashkernel added to the kernel command line
  (boot net crashkernel=256M@32M)
- set up kdump to dump to an NFS mount point:
  echo "net your.host.here:/your/exported/dir" >> /etc/kdump.conf
- service kdump restart
- echo 'c' > /proc/sysrq-trigger

The secondary kernel is loaded and starts booting, then the system reboots.
I found /var/crash is empty.
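
Before restarting the service, the dump target written in the steps above can be read back. This is a sketch only, assuming the "net <host>:<path>" line format shown above. The function name kdump_net_target is hypothetical, and the file argument is there so it can be tried on a copy of the config.

```shell
# Hypothetical helper: print the target of the last "net" line in a
# kdump.conf-style file, or nothing if no such line exists.
# $1 is the config file; it defaults to /etc/kdump.conf.
kdump_net_target() {
    awk '$1 == "net" { target = $2 } END { if (target != "") print target }' \
        "${1:-/etc/kdump.conf}"
}
```

An empty result means the config has no net line, which is what an empty /etc/kdump.conf created with 'touch' would give.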

*Attaching the log.

--Regards
  Omar M

Comment 20 Neil Horman 2008-03-07 19:08:22 UTC
I'm closing this as a dupe of bz 368941, as they're both tracking the same
issue, and the other bz has an additional patch in it already to clean some
other cruft up.

*** This bug has been marked as a duplicate of 368941 ***

Comment 21 Don Domingo 2008-03-19 00:54:24 UTC
minor release note revision as per BZ#438030:

<quote>
Executing kdump on an IBM Bladecenter QS21 or QS22 configured with NFS root will
fail. To avoid this, specify an NFS dump target in /etc/kdump.conf.
</quote>

please advise if any further revisions are required. thanks!

Comment 22 omar 2008-03-19 07:00:28 UTC
But it didn't work for me, as I said in comment 19.

Comment 23 Neil Horman 2008-03-19 11:37:48 UTC
What you encountered in comment 19 was a different problem, one which IBM is
investigating.  The RHEL5 kernel was booting as of kernel release -65.el5, but
stopped again sometime between -65.el5 and -84.el5.  IIRC IBM is bisecting to
determine the release in which it initially (re)-broke.  If you try to boot with
kernel -65.el5 and use the config suggested by Don's release note, then all
should work quite well.
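
To confirm which build a test machine is actually running, the build number can be pulled out of the kernel release string. This is a rough sketch that assumes the 2.6.18-<build>.el5 naming seen in this bug. The function name el5_build is hypothetical.

```shell
# Hypothetical helper: print the build number of a RHEL 5 kernel release
# string, e.g. "2.6.18-65.el5" gives "65". Prints nothing if the string
# does not look like a 2.6.18 el5 kernel.
# $1 is the release string; it defaults to the running kernel (uname -r).
el5_build() {
    echo "${1:-$(uname -r)}" |
        sed -n 's/^2\.6\.18-\([0-9][0-9]*\).*\.el5.*$/\1/p'
}
```

Going by the comment above, a result of 65 corresponds to the build reported as booting, and 84 to the build that showed the failure in comment 19.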

Comment 24 Don Domingo 2008-04-02 02:12:06 UTC
Hi,
the RHEL5.2 release notes will be dropped to translation on April 15, 2008, at
which point no further additions or revisions will be entertained.

a mockup of the RHEL5.2 release notes can be viewed at the following link:
http://intranet.corp.redhat.com/ic/intranet/RHEL5u2relnotesmockup.html

please use the aforementioned link to verify if your bugzilla is already in the
release notes (if it needs to be). each item in the release notes contains a
link to its original bug; as such, you can search through the release notes by
bug number.

Cheers,
Don

Comment 25 Ryan Lerch 2008-08-08 01:00:10 UTC
Tracking this bug for the Red Hat Enterprise Linux 5.3 Release Notes. 

This Release Note is currently located in the Known Issues section.

Comment 26 Ryan Lerch 2008-08-08 01:00:10 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

