Bug 626606

Summary: mkdumprd does not complete and hangs forever
Product: Red Hat Enterprise Linux 6 Reporter: Babu Moger <babu.moger>
Component: kexec-tools Assignee: Cong Wang <amwang>
Status: CLOSED ERRATA QA Contact: Petr Beňas <pbenas>
Severity: medium Docs Contact:
Priority: high    
Version: 6.0 CC: amwang, charlotte.richardson, dl-iop-bugzilla, nhorman, pbenas, phan, pstehlik, rkhan, tgraf
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: kexec-tools-2_0_0-146_el6 Doc Type: Bug Fix
Last Closed: 2011-05-19 14:15:24 UTC Type: ---
Attachments:
Description             Flags
Logs running mkdumprd   none
Proposed patch          none

Description Babu Moger 2010-08-23 22:56:38 UTC
Created attachment 440517 [details]
Logs running mkdumprd

Description of problem:
I am trying to create a kdump image by running mkdumprd. My goal is to build a custom kdump image after adding some new drivers, but I found that even the basic mkdumprd command does not work.

I ran the following command:

#mkdumprd -v -f /boot/initrd-2.6.32-44.el6.x86_64kdump1.img `uname -r`

The command never completes.
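
To tell a genuine hang from a slow run, the command can be wrapped in a time limit. This is only a sketch: `run_with_limit` is a hypothetical helper name, and it assumes timeout(1) from GNU coreutils is installed, which exits with status 124 when the limit is reached.

```shell
#!/bin/sh
# Sketch: run a command under a time limit and report whether it hung.
# run_with_limit is a hypothetical helper, not part of kexec-tools.
# It relies on timeout(1) from GNU coreutils, which exits with
# status 124 when the time limit is reached.
run_with_limit() {
    limit="$1"; shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$status"
}
```

For example, `run_with_limit 600 mkdumprd -v -f /boot/initrd-2.6.32-44.el6.x86_64kdump1.img "$(uname -r)"` would give up after ten minutes; the 600-second limit is an arbitrary choice, not a value taken from this report.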

Version-Release number of selected component (if applicable):

#uname -a
Linux myMachine 2.6.32-59.el6.x86_64 #1 SMP Wed Aug 4 12:47:47 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:

Consistently, every time.


Steps to Reproduce:
1. Install RHEL 6.
2. Run the mkdumprd command shown above.
  
Actual results:

The command never completes.

Expected results:

The command completes and generates the kdump image.

Additional info:

I have attached the logs after running this command.

Comment 2 Cong Wang 2010-08-26 04:38:14 UTC
Created attachment 441108 [details]
Proposed patch

This patch should fix your problem.

Comment 3 Babu Moger 2010-08-26 15:11:22 UTC
Yes, the patch fixes the problem. Thanks for the quick response.

Comment 4 Neil Horman 2010-09-02 19:48:23 UTC
Note: If this bug is encountered in the field before it is properly fixed in 6.1, the easy workaround is to create an empty /etc/modprobe.conf file or an empty /etc/modprobe.d/modprobe.conf file, for example:
touch /etc/modprobe.d/modprobe.conf
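
A slightly more defensive form of the same workaround could look like the sketch below. It creates the empty file only when it is missing, so an existing configuration is never modified. The function name `ensure_modprobe_conf` and its optional root-prefix argument are illustrative assumptions, not part of kexec-tools.

```shell
#!/bin/sh
# Sketch of the workaround: make sure /etc/modprobe.d/modprobe.conf exists.
# ensure_modprobe_conf and its optional ROOT prefix are hypothetical;
# ROOT defaults to the empty string, so the real /etc is used.
ensure_modprobe_conf() {
    root="${1:-}"
    conf="$root/etc/modprobe.d/modprobe.conf"
    mkdir -p "$root/etc/modprobe.d" || return 1
    # Create the file only if it is absent; leave an existing one alone.
    [ -e "$conf" ] || touch "$conf"
}
```

Run `ensure_modprobe_conf` as root with no arguments, then run mkdumprd again.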

Comment 7 Petr Beňas 2011-02-17 11:11:08 UTC
Reproduced in kexec-tools-2.0.0-145.el6.x86_64 and verified in kexec-tools-2.0.0-161.el6.x86_64.

Comment 8 errata-xmlrpc 2011-05-19 14:15:24 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0736.html

Comment 9 Charlotte Richardson 2011-06-30 14:18:36 UTC
This still happens on RHEL 6.1 GA, which has kexec-tools-2.0.0-188.el6. We cannot reproduce it on demand, at least not yet, which probably means it is a race condition exposed by having 24 CPUs running. I verified that mkdumprd does include the suggested patch.

Comment 10 Cong Wang 2011-07-04 03:11:42 UTC
Charlotte, if you find some way to reproduce it, please file a new BZ.
Thanks.

Comment 11 Charlotte Richardson 2011-07-05 13:10:34 UTC
Will do. I've never managed to reproduce it myself, but three different engineers have seen it, one of them twice, when IPLing 6.1 systems. It appears that the loop at depsolve_modlist never exits, and it seems to have something to do with having installed some patched drivers that occlude the standard ones by being in /lib/modules/<kernel>/updates. In one case we know that mkdumprd got hung up there trying to resolve dependencies for the fusion drivers we have to patch, which are mptbase, mptsas, and mptscsih. It is unknown what happened in the other cases. The problem never reproduces, even when the same person reinstalls the same system in exactly the same way, which makes it feel like a timing problem to me (these are 24-core systems, so there are plenty of chances for that sort of bug). That's all I know at present, sorry!
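
To check whether a system has drivers under updates that occlude standard ones, as described above, something like the following sketch could be used. It only lists module names present both under updates and elsewhere in the same module tree; it does not diagnose the hang. `list_shadowed_modules` is a hypothetical helper name, and the script assumes module files are named `<name>.ko`, optionally with a compression suffix.

```shell
#!/bin/sh
# Sketch: list module names that exist both under <tree>/updates and
# somewhere else in the same module tree.
# list_shadowed_modules is a hypothetical helper, not part of kexec-tools.
list_shadowed_modules() {
    tree="$1"
    [ -d "$tree/updates" ] || return 0
    find "$tree/updates" -type f -name '*.ko*' | while read -r upd; do
        name=$(basename "$upd")
        name=${name%%.ko*}
        # Look for a module with the same name outside updates/.
        if find "$tree" -path "$tree/updates" -prune -o \
               -type f \( -name "$name.ko" -o -name "$name.ko.*" \) -print |
               grep -q .; then
            echo "$name"
        fi
    done
}
```

For the running kernel this would be invoked as `list_shadowed_modules "/lib/modules/$(uname -r)"`.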

Comment 12 Charlotte Richardson 2011-07-05 19:24:36 UTC
See Bug 719105.