Bug 198956 - Kdump does not consistently dump a core file
Summary: Kdump does not consistently dump a core file
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kexec-tools
Version: 6
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Neil Horman
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2006-07-14 22:06 UTC by David Mair
Modified: 2009-09-09 03:44 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-01-18 14:30:58 UTC
Type: ---
Embargoed:


Attachments
sysreport for the system (777.42 KB, application/x-bzip2)
2006-07-14 22:06 UTC, David Mair

Description David Mair 2006-07-14 22:06:02 UTC
Description of problem:
kexec-kdump doesn't seem to work consistently under FC6T1.  I've only been able
to successfully test dumping a core a couple of times to ensure the setup was
correct.  After editing /etc/init.d/kdump to test dumping to an nfs share or
anywhere other than /var/crash it has stopped working altogether.  No core file,
no log file, nothing.

Version-Release number of selected component (if applicable):
kernel-kdump-2.6.17-1.2356.fc6
kexec-tools-1.101-19

How reproducible:
As of right now, every time.

Steps to Reproduce:
1. Install FC6 and the kexec-tools and kernel-kdump packages (latest versions
as of this BZ, I believe).
2. Follow the steps for configuring kexec/kdump as outlined here: 
http://intranet.corp.redhat.com/ic/intranet/FC5kexeckdumprocedure.html
3. Configure sysrq triggers
4. Crash the box.
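
For reference, steps 3 and 4 are commonly done through the SysRq interface in
/proc. The sketch below is not taken from the procedure linked above; the
DRY_RUN variable and the trigger_crash function name are hypothetical, and
DRY_RUN defaults to 1 so that nothing is written unless it is set to 0 by root.

```shell
#!/bin/sh
# Sketch only. DRY_RUN and trigger_crash are illustrative names, not part of
# kexec-tools. With DRY_RUN=1 (the default) the commands are printed, not run.
DRY_RUN="${DRY_RUN:-1}"

trigger_crash()
{
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: echo 1 > /proc/sys/kernel/sysrq"
        echo "would run: echo c > /proc/sysrq-trigger"
        return 0
    fi
    echo 1 > /proc/sys/kernel/sysrq    # enable all SysRq functions
    echo c > /proc/sysrq-trigger       # force an immediate kernel crash
}

trigger_crash
```

Running with DRY_RUN=0 crashes the machine immediately, so it should only be
used on a test system with the kdump kernel already loaded.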
  
Actual results:
Box hangs at starting udev after crashing.  Last output to console is the
following (which I also get when the system successfully boots).
Starting udev: udevd[493]: add_to_rules: unknown key 'MODALIAS'
udevd[493]: add_to_rules: unknown key 'MODALIAS'
udevd[493]: add_to_rules: unknown key 'MODALIAS'

I did receive the following once when I attempted to enter interactive startup
to see if that had any effect.

Disabling IRQ #16

Expected results:
Box should create a core file.

Additional info:

I tried sending this to tech-list internally (thread below).  My testing was to
get kdump to dump the core file over to another box rather than keeping it
locally after booting into the kexec kernel.  Even returning /etc/init.d/kdump
to its original state the kexec kernel hangs when trying to boot after the crash.

> dmair> I've found documentation on setting up kdump on the Intranet.  It
> dmair> was straight-forward enough.  The one thing I'm missing is setting
> dmair> up kdump to do network dumps instead of to local disk.  Maybe I'm
> dmair> missing something in the docs:
> dmair>
>

>
> dmair> Is there additional info somewhere on how to configure for network
> dmair> dumps or is it as simple as pointing to an nfs share somewhere?
>

>
> dmair> Well, the goal would be to have it dump the core file directly to a
> dmair> central system much like netdump currently allows for.  I guess what
> dmair> is not clear to me is exactly how this is done automatically.
> dmair> Rather than kexec/kdump dumping the core to /var/crash on the local
> dmair> system we want it to dump the core directly to a remote system.
>
> As it stands today, you'd have to modify the kdump init script.  Take a
> look at /etc/init.d/kdump, in the function save_core.  You can change that
> to do whatever you like.
>
> -Jeff

Okay, so I've done this but I think I may not be doing it correctly.  Here's 
what I have in /etc/init.d/kdump:

function save_core()
{
        coredir="172.16.59.50:/var/crash/`date +"%Y-%m-%d-%H:%M"`"

        mkdir -p $coredir
        cp /proc/vmcore $coredir/vmcore
}

What I wind up with is /172.16.59.50:/var/crash/foo

I don't think that having the directory mounted beforehand is going to make a 
difference and I don't think it being automounted is going to work either 
since that wouldn't be started before kdump runs.  What am I missing here?
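
One possible explanation for the /172.16.59.50:/var/crash/foo result is that
mkdir and cp treat "host:/path" as an ordinary local path; only mount
interprets it as an NFS export. A minimal sketch of a save_core that mounts
the export first might look like the following. It assumes the network is
already up in the kdump kernel and that an NFS client is available there,
neither of which has been verified. MOUNT_POINT, NFS_EXPORT, VMCORE and
SKIP_MOUNT are hypothetical names introduced for illustration and are not
part of the shipped init script.

```shell
#!/bin/sh
# Sketch only: these variables are assumptions, not kdump configuration keys.
NFS_EXPORT="${NFS_EXPORT:-172.16.59.50:/var/crash}"
MOUNT_POINT="${MOUNT_POINT:-/mnt/crash}"
VMCORE="${VMCORE:-/proc/vmcore}"

save_core()
{
    # "host:/path" is not a local path, so the export is mounted first.
    mkdir -p "$MOUNT_POINT" || return 1
    if [ "${SKIP_MOUNT:-0}" != "1" ]; then
        # nolock avoids needing the lock daemons this early in boot.
        mount -t nfs -o nolock "$NFS_EXPORT" "$MOUNT_POINT" || return 1
    fi

    # Same timestamped directory layout as the original function.
    coredir="$MOUNT_POINT/$(date +%Y-%m-%d-%H:%M)"
    mkdir -p "$coredir" && cp "$VMCORE" "$coredir/vmcore"
    status=$?

    # Unmount again unless the mount step was skipped (SKIP_MOUNT=1).
    [ "${SKIP_MOUNT:-0}" = "1" ] || umount "$MOUNT_POINT"
    return $status
}
```

Setting SKIP_MOUNT=1 and pointing MOUNT_POINT and VMCORE at local test paths
exercises the copy logic without an NFS server.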

My ultimate goal here is to be able to provide a procedure for my customer to 
test this in their environment so that they can understand how to use this in 
their environment where they have upwards of ~4000 systems to configure... 
local vmcores aren't practical for them since these systems are compute nodes 
and often won't have sufficient local space for full core files.  They want 
to make sure that this will work for them when it comes time to move to RHEL 
5.

Comment 1 David Mair 2006-07-14 22:06:02 UTC
Created attachment 132468
sysreport for the system

