Description of problem:
This is a problem we used to hit a lot in normal bootup, but it was resolved by recent versions of mkinitrd. We need to do something similar in mkdumprd. In RHEL5, when a driver module is loaded, it starts scanning for LUNs in another thread. During bootup it is important to be sure that all the drivers are done scanning before the boot process tries to mount the root filesystem or start LVM. This is seen most often on large systems with lots of storage (e.g. a SAN environment). The hack in mkinitrd basically just looks at /proc/scsi/scsi and waits for it to stop changing. I am working on a patch for mkdumprd now that mimics what mkinitrd does. It won't be the same fix, because they fixed it by adding a command to the nash shell to handle this, but it will be the same idea. I will attach it here once I test it out.

Version-Release number of selected component (if applicable):
kexec-tools-1.102pre-21.el5

How reproducible:
Difficult unless you have a big system with a lot of storage. On big systems it can happen nearly 100% of the time.
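The "wait for /proc/scsi/scsi to stop changing" idea described above can be sketched roughly as follows. This is only an illustration in Python, not the nash command or the mkdumprd patch: the function names, the one-second interval, the three-identical-reads threshold, and the read cap are all assumptions made for the example.

```python
import time
from typing import Callable


def read_proc_scsi(path: str = "/proc/scsi/scsi") -> str:
    """Return the current contents of the SCSI device listing."""
    with open(path) as f:
        return f.read()


def wait_until_stable(read_state: Callable[[], str],
                      interval: float = 1.0,
                      stable_reads: int = 3,
                      max_reads: int = 60,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll read_state() until it returns the same value stable_reads
    times in a row. Returns False if max_reads is exhausted first.

    The thresholds are hypothetical; the thread does not state the
    values used by mkinitrd.
    """
    previous = read_state()
    unchanged = 1
    for _ in range(max_reads - 1):
        if unchanged >= stable_reads:
            return True
        sleep(interval)
        current = read_state()
        if current == previous:
            unchanged += 1
        else:
            # A new device showed up (or vanished): restart the count.
            previous = current
            unchanged = 1
    return unchanged >= stable_reads
```

On a real system this would be called as `wait_until_stable(read_proc_scsi)`. Note that a quiet period only suggests that scanning has finished; a slow HBA that has not reported anything yet looks identical to one that is done, which is the weakness raised later in this thread.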
I understand your pain here, but I have to ask: is waiting for /proc/scsi/scsi to quiesce really that safe a way to make this work? I've solved this problem in what I think is a safer (if far less automatic) way. In RHEL5.2 we added the kdump_pre and kdump_post directives, which allow you to specify scripts to run immediately before and after core collection. The kdump_pre directive in particular is useful here, as it allows you to write a script that waits for a specific drive to become available using whatever detection means are most relevant to the environment at hand. Granted, the implication here is that a custom script has to be tailored for any environment that experiences this problem, but I'm not entirely sure that polling /proc/scsi/scsi for a no-change time threshold is much better. If you would, please take a look at the kdump_pre directive (I'm happy to write a script for you to target a particular system if you like), and see if it doesn't solve your problem. If you're not happy with the results, we can look at the solution you are proposing above. Thanks!
As for whether the /proc/scsi/scsi method is safe: no, it is not the "best" way, but this is how RHEL5.X _always_ boots. More recent upstream kernels have a better fix, which I suggested we backport to RHEL, but I forget the reasons that was rejected. We have not seen problems with this method yet, to the best of my knowledge. We had LOTS of problems before that hack was put into the initrd. I suppose the kdump_pre solution would work, but are you saying we would rely on the customer to do that? That is not an acceptable solution. If the system boots OK in a normal situation without the user needing to hack it, it should boot OK under kdump.
Created attachment 305363 [details] patch to delay boot while scsi drivers initalize
Yeah, my solution requires customization for environments with this problem, but I had decided that was OK, since all our docs tell users they should test their kdump configuration before deploying (lest they reserve too little memory and OOM-kill themselves on kdump boot). If mkinitrd does this for normal boot, I guess it would be all right to include it in mkdumprd (hack though it is). The attached patch should mimic the stabilize functionality in nash. Can you test it out and confirm that it fixes the problem for you? Thanks!
Created attachment 305402 [details] updated patch with backticks and $ properly escaped
New patch with proper escape sequences.
Neil, just as I remember from when we had to fix this in mkinitrd, I am finding this is ugly. It is still not working with the latest patch. I will continue to investigate and hopefully have a reliable solution soon. In the meantime I am going to assign this to myself, since I have the hardware to reproduce it. - Doug
Created attachment 305525 [details] fix race condition on kdump boot
This solution seems to work. I looked into mkinitrd some more and found the biggest difference is that mkinitrd probes for the driver it needs for the root device _before_ it looks at /etc/modules.conf. This patch moves the code that probes what is needed for the root filesystem earlier, so that the critical modules get loaded first. This is what allows mkinitrd to avoid the race we are seeing. I also noticed that it looks like mkdumprd started life as just a modified version of mkinitrd. Has there been any thought given to splitting out the common bits of both into a new file so that we can maintain them more easily? I found there were a lot of other little changes made to mkinitrd that probably should be included in mkdumprd as well. I have tested this on my rx4640, set up to boot from a LUN off of a qlogic FC card. Without this patch it tries to mount "/" before the device is probed; with the patch it boots cleanly.
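The reordering described in the comment above amounts to putting the modules probed for the root device at the front of the load list, ahead of anything picked up from the module configuration file. A minimal Python sketch of that ordering follows. The function name and the module names in the usage note are hypothetical, and this is not the code from the attached patch.

```python
from typing import Iterable, List


def module_load_order(root_device_modules: Iterable[str],
                      configured_modules: Iterable[str]) -> List[str]:
    """Return a load order with root-device modules first.

    Modules that appear in both lists keep their earlier (root-device)
    position, so each module is loaded only once.
    """
    ordered: List[str] = []
    seen = set()
    for name in list(root_device_modules) + list(configured_modules):
        if name not in seen:
            seen.add(name)
            ordered.append(name)
    return ordered
```

For example, `module_load_order(["scsi_mod", "sd_mod", "qla2xxx"], ["mptspi", "qla2xxx"])` places `qla2xxx` before `mptspi`, so the HBA serving the root LUN starts scanning as early as possible.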
So, I'm looking at this patch, and it seems to me that, as you say, it just loads SCSI HBA modules earlier in the boot process. The only thing I see that doing is providing more time before we start checking /proc/scsi/scsi for changes. That just seems so hackish, whether or not mkinitrd does it too. If that's the case, I'd just as soon increase the polling interval on the last chunk (which, unless I'm mistaken, will accomplish the same goal). Although honestly, I feel like we need to have a better solution. In fact, I think I have an idea for one. When we start the kdump service, we can record the devices in /sys/block, filtered down to the devices that we need in order to talk to the configured dump targets. Then, on kdump boot, we can just poll until all those devices are present in /sys/block. I'll put a patch together and post it shortly.
Created attachment 305717 [details] patch to provide initramfs with list of critical block devices
OK, so here's a patch with the idea I had. When mkdumprd runs and generates a list of the modules needed to access the drives that we need to successfully complete the dump capture process, it will (with this patch) also record the names of the corresponding block devices, as they appear in /sys/block. This file is then stored in the initramfs and queried on bootup, after all the modules are loaded. Booting pauses and loops until all the named block devices appear in sysfs. While this doesn't wait for all devices to appear, it will now wait at least until the critical devices are present before attempting a dump capture. I tried it on a local test system, and it worked well. If you could please test it on your system, which we know to suffer from this problem in kdump, and confirm the fix, I'll get it checked in ASAP. Thanks!
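The boot-time half of the approach described above (loop until every recorded device name shows up under /sys/block) can be sketched like this. It is a Python illustration only, not the shell code in the attached patch. The one-name-per-line list format, the function names, and the optional `max_polls` cap are assumptions; the comment above describes a loop with no stated timeout.

```python
import os
import time
from typing import Callable, Iterable, List, Optional


def parse_device_list(text: str) -> List[str]:
    """Parse a recorded device list, assumed to hold one name per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def missing_devices(required: Iterable[str],
                    sys_block: str = "/sys/block") -> List[str]:
    """Return the required device names not yet present in sys_block."""
    present = set(os.listdir(sys_block)) if os.path.isdir(sys_block) else set()
    return [name for name in required if name not in present]


def wait_for_devices(required: Iterable[str],
                     sys_block: str = "/sys/block",
                     interval: float = 1.0,
                     max_polls: Optional[int] = None,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Block until every required device appears under sys_block.

    With max_polls=None this loops indefinitely. The cap is a
    hypothetical addition for this sketch.
    """
    required = list(required)
    polls = 0
    while True:
        if not missing_devices(required, sys_block):
            return True
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return False
        sleep(interval)
```

Unlike watching /proc/scsi/scsi for a quiet period, this waits on a positive condition: the dump proceeds as soon as the devices it actually needs exist, regardless of how long unrelated LUNs take to scan.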
Works on my rx4640 booting using the qlogic card (the case that failed before). I will test in some other cases but I think this is the only "nasty" one likely to break. thanks! - Doug
This RFE has been reviewed during the RHEL RFE review with Red Hat product management. This request has been *tentatively* approved for inclusion in the next update. This decision is not final and still pends further technical review and scoping by Red Hat development engineering.
fixed in -22.el5 thanks!
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2009-0105.html