Bug 719105 - mkdumprd hang in depsolve_modlist when running kernel is not target kernel
Summary: mkdumprd hang in depsolve_modlist when running kernel is not target kernel
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kexec-tools
Version: 6.1
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Cong Wang
QA Contact: Chao Ye
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-07-05 19:23 UTC by Charlotte Richardson
Modified: 2013-09-30 02:26 UTC (History)

Fixed In Version: kexec-tools-2_0_0-194_el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-12-06 18:19:27 UTC


Attachments
Proposed patch (1.71 KB, patch)
2011-07-06 07:36 UTC, Cong Wang


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:1532 normal SHIPPED_LIVE Moderate: kexec-tools security, bug fix, and enhancement update 2011-12-06 01:01:52 UTC
Red Hat Bugzilla 959802 None CLOSED Under kernel-rt, mkdumprd hangs in depsolve_modlist function due to different dependencies 2019-08-29 12:34:11 UTC

Internal Links: 959802

Description Charlotte Richardson 2011-07-05 19:23:25 UTC
Description of problem:
This is related to Bug 626606. mkdumprd can still hang in depsolve_modlist.


Version-Release number of selected component (if applicable):
kexec-tools-2.0.0-188.el6.x86_64


How reproducible:
Seen many times here. Believe I understand what is happening now.


Steps to Reproduce:
1. Install the stock RH 6.1 kernel.
2. Install our expedient kernel which has a necessary md fix for Bug 707268.
3. Without rebooting, install our occluding versions of drivers which pick up bug fixes our customers need. One set of these is the fusion drivers, mptbase.ko, mptsas.ko, and mptscsih.ko. Our mptsas introduces a dependency on scsi_transport_sas which the original driver did not have.
4. Still without rebooting, add some required kdump pre and post scripts and restart the kdump service. This runs mkdumprd against the expedient kernel. This is done by our installation scripts.
  
Actual results:
mkdumprd hangs looping in depsolve_modlist, unable to deal with the dependencies of the three fusion drivers listed above. The input module list never empties out.


Expected results:
mkdumprd should complete and build the kdump initrd for the expedient kernel.


Additional info:
The problem appears to be due to the use of modprobe --show-depends in mkdumprd. In some places it uses modprobe --set-version $kernel to run modprobe against the kernel it is building for, and in other places it omits that parameter and thus is using the running kernel instead. In particular:
moduledep does not use --set-version
depsolve_modlist does not use --set-version (in two places)
findmodule does use --set-version
findstoragedriverinsys does use --set-version
findnetdriver does use --set-version
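
The audit above can be repeated mechanically by listing every line of a script that mentions --show-depends but not --set-version. This is a minimal sketch: the function name find_unversioned_modprobe is hypothetical and not part of kexec-tools, and a line-based grep will miss a call that is split across several lines.

```shell
# Hypothetical helper, not part of kexec-tools.
# Print (with line numbers) each line of the script given as $1 that
# mentions --show-depends but does not mention --set-version on the
# same line.
find_unversioned_modprobe() {
    grep -n -e '--show-depends' "$1" | grep -v -e '--set-version'
}
```

Calling it as find_unversioned_modprobe /sbin/mkdumprd would print the candidate lines. No output means every --show-depends call on a single line already carries --set-version.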

Comment 2 Cong Wang 2011-07-06 07:36:45 UTC
Created attachment 511448
Proposed patch

Does adding '--set-version $kernel' in all of those places fix the problem, as this patch does?

Comment 3 Charlotte Richardson 2011-07-08 12:47:58 UTC
I'm still trying to verify it; I'll get back to you.

Comment 4 Charlotte Richardson 2011-07-08 19:33:59 UTC
The patch is on the right track, but it does not completely fix the problem of using data for the modules of the wrong kernel when more than one kernel is installed. If you are doing mkdumprd for a kernel other than the one that is booted, the lsmod in /sbin/mkdumprd still gets data for the booted kernel, not the one you are building the kdump initrd for.

What I have installed right now on my system is a stock 6.1 kernel and our expedient kernel with the required fix in it, and the expedient kernel is the one that is booted. The expedient kernel also has occluding copies of the driver modules that we have to patch (that is, there are .ko files of the same names in /lib/modules/2.6.32-131.../extra/...) because I installed them there, but the stock kernel does not. It happens that modprobe --show-depends mptsas shows four things for the stock kernel but five for the expedient kernel, since our fix added a reference to a symbol in scsi_hbas.ko. As a result, the loop in depsolve_modlist never empties the incoming list and runs forever.
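
To illustrate why a dependency that is missing from the module list keeps such a loop from finishing, here is a simplified sketch of a dependency-ordering loop. It is not the actual depsolve_modlist code: deps_for is a hypothetical stand-in for a per-kernel modprobe --show-depends lookup, the dependency data in it is made up for illustration, and the no-progress guard is an addition that the sketch uses to fail instead of spinning.

```shell
# Hypothetical stand-in for a per-kernel dependency lookup (illustrative data).
# Prints the direct dependencies of module $2 under kernel $1, one per line.
deps_for() {
    case "$1:$2" in
        stock:mptsas)     printf '%s\n' mptbase mptscsih ;;
        expedient:mptsas) printf '%s\n' mptbase mptscsih scsi_transport_sas ;;
        *:mptscsih)       printf '%s\n' mptbase ;;
    esac
}

# Move modules from the pending list to the ordered list once all of
# their dependencies are already ordered. Returns 1 when a full pass
# makes no progress, which is the situation that would otherwise loop
# forever.
resolve_order() {
    kver=$1; shift
    pending="$*"
    ordered=""
    while [ -n "$pending" ]; do
        progressed=0
        remaining=""
        for mod in $pending; do
            ready=1
            for dep in $(deps_for "$kver" "$mod"); do
                case " $ordered " in
                    *" $dep "*) ;;          # dependency already ordered
                    *) ready=0 ;;           # dependency still missing
                esac
            done
            if [ "$ready" -eq 1 ]; then
                ordered="$ordered $mod"
                progressed=1
            else
                remaining="$remaining $mod"
            fi
        done
        pending=$remaining
        if [ "$progressed" -eq 0 ]; then
            echo "unresolved:$pending" >&2
            return 1
        fi
    done
    echo $ordered
}
```

With this stub data, resolve_order stock mptsas mptscsih mptbase finishes, while resolve_order expedient mptsas mptscsih mptbase stops with an error because scsi_transport_sas is never in the list. Without the guard, the second call would loop indefinitely.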

Comment 5 Charlotte Richardson 2011-07-11 19:27:46 UTC
Never mind about the lsmod; it's OK as is. I finally reproduced the exact scenario, and your patch (adding the --set-version $kernel in the three places where it was missing originally) does fix it.
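
One way to keep every dependency query pinned to the target kernel is to route all of them through a single helper. The following is only a sketch of that idea and is not the code from the attached patch: the name show_depends_for and the MODPROBE override variable are hypothetical, while --set-version and --show-depends are the modprobe options discussed in this bug.

```shell
# Hypothetical wrapper, not the code from the attached patch.
# MODPROBE can be overridden for testing; it defaults to the real modprobe.
MODPROBE=${MODPROBE:-modprobe}

# Print the module file paths that `modprobe --show-depends` reports for
# module $2 under kernel version $1, which need not be the running kernel.
show_depends_for() {
    "$MODPROBE" --set-version "$1" --show-depends "$2" 2>/dev/null |
        awk '/^insmod/ { print $2 }'
}
```

A caller would pass the kernel version the initrd is being built for, for example show_depends_for "$kernel" mptsas, so that no call site can silently fall back to the running kernel.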

Comment 6 Cong Wang 2011-07-15 12:40:24 UTC
Thanks, Charlotte.
If you can provide the steps to reproduce this bug, that would be helpful.

Comment 12 Chao Ye 2011-09-19 06:21:04 UTC
Based on comment#8, comment#10 and the Kdump Tier1 test run against RHEL6.2-20110907.1+kernel-2.6.32-197.el6+kexec-tools-2.0.0-199.el6, no regression was found.
https://tcms.engineering.redhat.com/run/26802/

Change status to VERIFIED.

Comment 13 errata-xmlrpc 2011-12-06 18:19:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1532.html

