Description of problem:
kexec-kdump doesn't seem to work consistently under FC6T1. I've only been able to successfully dump a core a couple of times while verifying that the setup was correct. After I edited /etc/init.d/kdump to test dumping to an NFS share (or anywhere other than /var/crash), it stopped working altogether: no core file, no log file, nothing.

Version-Release number of selected component (if applicable):
kernel-kdump-2.6.17-1.2356.fc6
kexec-tools-1.101-19

How reproducible:
As of right now, every time.

Steps to Reproduce:
1. Install FC6, kexec-tools, kdump-kernel packages (latest versions as of this BZ, I believe).
2. Follow the steps for configuring kexec/kdump as outlined here: http://intranet.corp.redhat.com/ic/intranet/FC5kexeckdumprocedure.html
3. Configure sysrq triggers.
4. Crash the box.

Actual results:
Box hangs at "Starting udev" after crashing. The last output to the console is the following (which I also get when the system boots successfully):

Starting udev: udevd[493]: add_to_rules: unknown key 'MODALIAS'
udevd[493]: add_to_rules: unknown key 'MODALIAS'
udevd[493]: add_to_rules: unknown key 'MODALIAS'

I did receive the following once, when I attempted to enter interactive startup to see if that had any effect:

Disabling IRQ #16

Expected results:
Box should create a core file.

Additional info:
I tried sending this to tech-list internally (thread below). My testing was aimed at getting kdump to write the core file to another box rather than keeping it locally after booting into the kexec kernel. Even with /etc/init.d/kdump returned to its original state, the kexec kernel hangs when trying to boot after the crash.

> dmair> I've found documentation on setting up kdump on the Intranet. It
> dmair> was straight-forward enough. The one thing I'm missing is setting
> dmair> up kdump to do network dumps instead of to local disk. Maybe I'm
> dmair> missing something in the docs:
> dmair>
>
>
> dmair> Is there additional info somewhere on how to configure for network
> dmair> dumps or is it as simple as pointing to an nfs share somewhere?
>
>
> dmair> Well, the goal would be to have it dump the core file directly to a
> dmair> central system much like netdump currently allows for. I guess what
> dmair> is not clear to me is exactly how this is done automatically.
> dmair> Rather than kexec/kdump dumping the core to /var/crash on the local
> dmair> system we want it to dump the core directly to a remote system.
>
> As it stands today, you'd have to modify the kdump init script. Take a
> look at /etc/init.d/kdump, in the function save_core. You can change that
> to do whatever you like.
>
> -Jeff

Okay, so I've done this, but I think I may not be doing it correctly. Here's what I have in /etc/init.d/kdump:

function save_core()
{
    coredir="172.16.59.50:/var/crash/`date +"%Y-%m-%d-%H:%M"`"
    mkdir -p $coredir
    cp /proc/vmcore $coredir/vmcore
}

What I wind up with is a local directory, /172.16.59.50:/var/crash/foo, rather than anything on the remote system.

I don't think that having the directory mounted beforehand is going to make a difference, and I don't think automounting it is going to work either, since the automounter wouldn't be started before kdump runs. What am I missing here?

My ultimate goal is to provide a procedure for my customer to test this so that they can understand how to use it in their environment, where they have upwards of 4000 systems to configure. Local vmcores aren't practical for them, since these systems are compute nodes and often won't have sufficient local space for full core files. They want to make sure that this will work for them when it comes time to move to RHEL 5.
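
For reference, the local /172.16.59.50:/var/crash/foo directory is what I would expect from that save_core: mkdir and cp don't understand the host:/path notation and treat it as an ordinary relative path, so nothing ever goes over the network. A minimal sketch of what I think the function would need to do instead is below. This is not the shipped init script and has not been tested in the kdump kernel. MOUNT_POINT is a hypothetical local directory, the nolock option is an assumption (no lock daemons running that early), and it assumes the network is already up and the NFS client is available when save_core runs.

```shell
# Sketch only: mount the export explicitly, then copy the vmcore onto it.
# NFS_EXPORT matches the server used above; MOUNT_POINT is a made-up path.
NFS_EXPORT="${NFS_EXPORT:-172.16.59.50:/var/crash}"
MOUNT_POINT="${MOUNT_POINT:-/mnt/kdump-nfs}"
VMCORE="${VMCORE:-/proc/vmcore}"

save_core() {
    # "host:/path" only means something to mount, so mount it first.
    mkdir -p "$MOUNT_POINT" || return 1
    mount -t nfs -o nolock "$NFS_EXPORT" "$MOUNT_POINT" || return 1

    # Same timestamped directory layout as the original function.
    coredir="$MOUNT_POINT/$(date +%Y-%m-%d-%H:%M)"
    mkdir -p "$coredir" && cp "$VMCORE" "$coredir/vmcore"
    result=$?

    # Unmount regardless of whether the copy succeeded.
    umount "$MOUNT_POINT"
    return $result
}
```

Whether the network and NFS client are actually usable at the point where kdump runs in the capture kernel is exactly the part I can't confirm, given the hang at udev.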
Created attachment 132468
sysreport for the system