Bug 719105

Summary: mkdumprd hang in depsolve_modlist when running kernel is not target kernel
Product: Red Hat Enterprise Linux 6 Reporter: Charlotte Richardson <charlotte.richardson>
Component: kexec-tools Assignee: Cong Wang <amwang>
Status: CLOSED ERRATA QA Contact: Chao Ye <cye>
Severity: high Docs Contact:
Priority: high    
Version: 6.1 CC: charlotte.richardson, cye, jparadis, kevin.paetzold, qcai, rkhan, robert.evans, wgomerin
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: kexec-tools-2_0_0-194_el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-12-06 18:19:27 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Proposed patch none

Description Charlotte Richardson 2011-07-05 19:23:25 UTC
Description of problem:
This is related to Bug 626606. mkdumprd can still hang in depsolve_modlist.


Version-Release number of selected component (if applicable):
kexec-tools-2.0.0-188.el6.x86_64


How reproducible:
Seen many times here. Believe I understand what is happening now.


Steps to Reproduce:
1. Install the stock RH 6.1 kernel.
2. Install our expedient kernel which has a necessary md fix for Bug 707268.
3. Without rebooting, install our occluding versions of drivers which pick up bug fixes our customers need. One set of these is the fusion drivers, mptbase.ko, mptsas.ko, and mptscsih.ko. Our mptsas introduces a dependency on scsi_transport_sas which the original driver did not have.
4. Still without rebooting, add some required kdump pre and post scripts and restart the kdump service. This runs mkdumprd against the expedient kernel. This is done by our installation scripts.
  
Actual results:
mkdumprd hangs looping in depsolve_modlist, unable to deal with the dependencies of the three fusion drivers listed above. The input module list never empties out.


Expected results:
mkdumprd should run to completion against the expedient kernel instead of hanging.


Additional info:
The problem appears to be due to the use of modprobe --show-depends in mkdumprd. In some places it uses modprobe --set-version $kernel to run modprobe against the kernel it is building for, and in other places it omits that parameter and thus is using the running kernel instead. In particular:
moduledep does not use --set-version
depsolve_modlist does not use --set-version (in two places)
findmodule does use --set-version
findstoragedriverinsys does use --set-version
findnetdriver does use --set-version
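
To illustrate the difference described above, here is a minimal sketch of a wrapper that always passes the target kernel to modprobe. The function name show_depends_for_kernel and the MODPROBE override variable are hypothetical and are not taken from mkdumprd; only the --set-version and --show-depends options come from this report.

```shell
# Hypothetical helper (not part of mkdumprd): always query dependencies
# for the kernel being built for, never the running kernel.
# MODPROBE is an assumed override so the command can be swapped out or
# inspected; it defaults to the real modprobe.
show_depends_for_kernel() {
    local kernel="$1"
    local module="$2"
    "${MODPROBE:-modprobe}" --set-version "$kernel" --show-depends "$module"
}
```

With this, a call such as show_depends_for_kernel "$kernel" mptsas would consult the module tree of $kernel. Leaving out --set-version makes modprobe use the running kernel, which is the inconsistency listed above.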

Comment 2 Cong Wang 2011-07-06 07:36:45 UTC
Created attachment 511448 [details]
Proposed patch

Does adding '--set-version $kernel' in all of those places, as in this patch, fix the problem?

Comment 3 Charlotte Richardson 2011-07-08 12:47:58 UTC
I'm still trying to verify it; I'll get back to you.

Comment 4 Charlotte Richardson 2011-07-08 19:33:59 UTC
The patch is on the right track, but it does not completely fix the problem of using data for the modules of the wrong kernel when more than one kernel is installed. If you are doing mkdumprd for a kernel other than the one that is booted, the lsmod in /sbin/mkdumprd still gets data for the booted kernel, not the one you are building the kdump initrd for.

What I have installed right now on my system is a stock 6.1 kernel and our expedient kernel with the required fix in it, and that expedient kernel is what is booted. The expedient kernel also has the driver modules that we have to patch occluded (that is, there are .ko files in /lib/modules/2.6.32-131.../extra/... of the same names) because I installed them there, but the stock kernel does not. It happens that modprobe --show-depends mptsas shows four things for the stock kernel but five for the expedient kernel since our fix to it added a reference to some symbol in scsi_hbas.ko. So the loop in depsolve_modlist never empties out the incoming list and so loops forever.
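
The general failure mode, a work list that never drains because an expected dependency never becomes resolvable, can be sketched as follows. This is a simplified illustration and not the actual depsolve_modlist code: deps_of is a hypothetical stand-in for parsing modprobe --show-depends output, its dependency table is made up for the example, and the no-progress check is an assumption about how such a loop could be guarded.

```shell
# Illustrative only: NOT the mkdumprd implementation.
# deps_of is a hypothetical stand-in for `modprobe --show-depends` parsing;
# the dependency table below is invented for this example.
deps_of() {
    case "$1" in
        mptsas)   echo "mptscsih scsi_transport_sas" ;;
        mptscsih) echo "mptbase" ;;
        *)        echo "" ;;
    esac
}

# Move modules from the pending list to the resolved list once all of their
# dependencies are resolved. If a full pass resolves nothing, return 1
# instead of spinning forever on a list that cannot empty.
drain_modlist() {
    local pending resolved next progressed ready m d
    pending="$*"
    resolved=""
    while [ -n "$pending" ]; do
        next=""
        progressed=0
        for m in $pending; do
            ready=1
            for d in $(deps_of "$m"); do
                case " $resolved " in
                    *" $d "*) ;;          # dependency already resolved
                    *) ready=0 ;;         # dependency still missing
                esac
            done
            if [ "$ready" -eq 1 ]; then
                resolved="$resolved $m"
                progressed=1
            else
                next="$next $m"
            fi
        done
        if [ "$progressed" -eq 0 ]; then
            echo "no progress, still pending:$next" >&2
            return 1
        fi
        pending="$next"
    done
    # Unquoted on purpose so the leading space is dropped.
    echo $resolved
}
```

In this sketch, drain_modlist mptbase mptscsih mptsas scsi_transport_sas completes, while drain_modlist mptbase mptscsih mptsas stops with an error because mptsas waits on a dependency that is never in the list. Without the no-progress check, the second call would loop indefinitely.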

Comment 5 Charlotte Richardson 2011-07-11 19:27:46 UTC
Never mind about the lsmod; it's OK as is. I finally reproduced the exact scenario, and your patch (adding the --set-version $kernel in the three places where it was missing originally) does fix it.

Comment 6 Cong Wang 2011-07-15 12:40:24 UTC
Thanks, Charlotte.
If you can provide the steps to reproduce this bug, that would be helpful.

Comment 12 Chao Ye 2011-09-19 06:21:04 UTC
Based on comment#8, comment#10, and the Kdump Tier1 test against RHEL6.2-20110907.1+kernel-2.6.32-197.el6+kexec-tools-2.0.0-199.el6, no regression was found.
https://tcms.engineering.redhat.com/run/26802/

Change status to VERIFIED.

Comment 13 errata-xmlrpc 2011-12-06 18:19:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1532.html