Red Hat Bugzilla – Bug 719105
mkdumprd hang in depsolve_modlist when running kernel is not target kernel
Last modified: 2013-09-29 22:26:41 EDT
Description of problem:
This is related to Bug 626606. Mkdumprd can still hang in depsolve_modlist.

Version-Release number of selected component (if applicable):
kexec-tools-2.0.0-188.el6.x86_64

How reproducible:
Seen many times here. Believe I understand what is happening now.

Steps to Reproduce:
1. Install the stock RH 6.1 kernel.
2. Install our expedient kernel which has a necessary md fix for Bug 707268.
3. Without rebooting, install our occluding versions of drivers which pick up bug fixes our customers need. One set of these is the fusion drivers, mptbase.ko, mptsas.ko, and mptscsih.ko. Our mptsas introduces a dependency on scsi_transport_sas which the original driver did not have.
4. Still without rebooting, add some required kdump pre and post scripts and restart the kdump service. This runs mkdumprd against the expedient kernel. This is done by our installation scripts.

Actual results:
mkdumprd hangs looping in depsolve_modlist, unable to deal with the dependencies of the three fusion drivers listed above. The input module list never empties out.

Expected results:
This should work.

Additional info:
The problem appears to be due to the use of modprobe --show-depends in mkdumprd. In some places it uses modprobe --set-version $kernel to run modprobe against the kernel it is building for, and in other places it omits that parameter and thus is using the running kernel instead. In particular:
moduledep does not use --set-version
depsolve_modlist does not use --set-version (in two places)
findmodule does use --set-version
findstoragedriverinsys does use --set-version
findnetdriver does use --set-version
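To illustrate the difference the report describes, here is a minimal sketch of a dependency lookup that always names the target kernel. It is not taken from mkdumprd: the function name deps_for_kernel and the MODPROBE override variable are hypothetical, and the sketch assumes modprobe prints its usual "insmod <path>" lines for --show-depends.

```shell
# Hypothetical helper, not part of mkdumprd.
# MODPROBE may be overridden (for example in tests); it defaults to modprobe.
MODPROBE=${MODPROBE:-modprobe}

# Print the module files modprobe would load for module $2, resolved
# against kernel version $1 rather than the running kernel.
deps_for_kernel() {
    local kver="$1" mod="$2"
    # --set-version points modprobe at /lib/modules/$kver instead of
    # the tree for the running kernel (uname -r).
    "$MODPROBE" --set-version "$kver" --show-depends "$mod" 2>/dev/null |
        awk '$1 == "insmod" { print $2 }'
}
```

Called as deps_for_kernel "$kernel" mptsas, this would list the .ko files for the kernel the initrd is being built for. Omitting --set-version would make the same call answer for the running kernel, which is the mismatch described above.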
Created attachment 511448
Proposed patch

Does adding '--set-version $kernel' to all those places fix the problem? See the attached patch.
I'm still trying to verify it; I'll get back to you.
The patch is on the right track, but it does not completely fix the problem of using module data from the wrong kernel when more than one kernel is installed. If you are running mkdumprd for a kernel other than the one that is booted, the lsmod in /sbin/mkdumprd still gets data for the booted kernel, not the one you are building the kdump initrd for.

What I have installed right now on my system is a stock 6.1 kernel and our expedient kernel with the required fix in it, and that expedient kernel is what is booted. The expedient kernel also has the driver modules that we have to patch occluded (that is, there are .ko files in /lib/modules/2.6.32-131.../extra/... of the same names) because I installed them there, but the stock kernel does not. It happens that modprobe --show-depends mptsas shows four things for the stock kernel but five for the expedient kernel, since our fix to it added a reference to some symbol in scsi_hbas.ko. Because the two kernels disagree on the dependency list, the loop in depsolve_modlist never empties out the incoming list and so loops forever.
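A quick way to check for the kind of disagreement described in this comment is to count the dependencies modprobe reports for the same module under each installed kernel. The following is only a sketch: count_deps, compare_dep_counts and the MODPROBE override are hypothetical names, and the kernel version strings are supplied by the caller.

```shell
# Hypothetical diagnostic, not part of mkdumprd.
MODPROBE=${MODPROBE:-modprobe}

# Count the "insmod" lines modprobe reports for module $2 under kernel $1.
count_deps() {
    "$MODPROBE" --set-version "$1" --show-depends "$2" 2>/dev/null |
        awk '$1 == "insmod" { n++ } END { print n + 0 }'
}

# Report whether kernels $1 and $2 pull in a different number of
# modules for module $3.
compare_dep_counts() {
    local a b
    a=$(count_deps "$1" "$3")
    b=$(count_deps "$2" "$3")
    if [ "$a" -eq "$b" ]; then
        echo "same: $a"
    else
        echo "differ: $1=$a $2=$b"
    fi
}
```

Running compare_dep_counts with the stock and expedient kernel versions and mptsas as the module would show whether the two module trees resolve to different dependency lists.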
Never mind about the lsmod; it's OK as is. I finally reproduced the exact scenario, and your patch (adding the --set-version $kernel in the three places where it was missing originally) does fix it.
Thanks, Charlotte. It would be helpful if you could provide the steps to reproduce this bug.
Based on comment#8 and comment#10, and on the Kdump Tier1 test run against RHEL6.2-20110907.1+kernel-2.6.32-197.el6+kexec-tools-2.0.0-199.el6, no regression was found. https://tcms.engineering.redhat.com/run/26802/ Changing status to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2011-1532.html