Bug 629003 - mkdumprd: hangs in depsolve_modlist if VirtualBox installed and initrd needs regeneration
Summary: mkdumprd: hangs in depsolve_modlist if VirtualBox installed and initrd needs ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kexec-tools
Version: 13
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Neil Horman
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-08-31 16:13 UTC by Charles Butterfield
Modified: 2011-01-03 17:34 UTC
3 users

Fixed In Version: netdump-server-0.7.16-23.el5
Clone Of:
Environment:
Last Closed: 2011-01-03 17:34:04 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
mkdumprd from RHEL6 (75.82 KB, patch)
2010-09-01 10:51 UTC, Neil Horman
Output of mkdumprd with tracing on in key function (5.97 MB, text/plain)
2010-09-03 00:57 UTC, Charles Butterfield
modified version of patch from comment 12 (1.39 KB, patch)
2010-12-15 17:56 UTC, Neil Horman
updated patch (2.20 KB, patch)
2010-12-16 11:51 UTC, Neil Horman

Description Charles Butterfield 2010-08-31 16:13:36 UTC
Description of problem: mkdumprd has multiple problems on my F13 system and does not generate any initrd.  After I recently installed VirtualBox, the boot hung.  I traced it back to a hang in depsolve_modlist.


Version-Release number of selected component (if applicable):
  Fedora-13
  kexec-tools.x86_64 2.0.0-36.fc13
  VirtualBox-OSE.x86_64 3.2.6-2.fc13
  kmod-VirtualBox-OSE-2.6.33.8-149.fc13.x86_64.x86_64 3.2.6-1.fc13.4
  
How reproducible: Always, under the stated conditions (otherwise the defect normally stays latent)

Steps to Reproduce:
1. yum install VirtualBox-OSE.x86_64
2. rm -f /boot/initrd*kdump.img
3. chkconfig kdump on
4. reboot
  
Actual results: Hang while trying to recreate the kdump initrd.
In particular, depsolve_modlist loops forever, with TMPINMODS stuck on the following entries, which are never resolved:

    /lib/modules/2.6.33.8-149.fc13.x86_64/extra/VirtualBox-OSE/vboxnetadp.ko
    /lib/modules/2.6.33.8-149.fc13.x86_64/extra/VirtualBox-OSE/vboxnetflt.ko

Both of these modules exist on my system.

Expected results: No hang.
Additional info:

If your kdump initrd already exists, this problem can stay latent for a long time, until the initrd needs to be rebuilt.  On my system there are other issues (improper parsing of mdadm.conf, and calls to the now-gone "nash") that also prevent building an initrd.

Comment 1 Charles Butterfield 2010-08-31 16:23:35 UTC
Obviously, it is more productive to test as follows (rather than actually failing at boot).  That is what I did once I encountered the problem:

1. rm -f /boot/initrd*kdump.img
2. service kdump start

Comment 2 Neil Horman 2010-09-01 10:51:24 UTC
Created attachment 442374
mkdumprd from RHEL6

I've been pretty focused on RHEL6 work lately, and I think I've already fixed this there. Can you please try this mkdumprd script? (just copy it over the F-13 mkdumprd).  I expect that will fix it.

Comment 3 Charles Butterfield 2010-09-02 02:10:33 UTC
I tried the version you posted, but unfortunately it exhibits the same failure mode.  The script gets stuck in an infinite loop, with TMPINMODS stuck on the following values:

   /lib/modules/2.6.33.8-149.fc13.x86_64/extra/VirtualBox-OSE/vboxnetadp.ko
   /lib/modules/2.6.33.8-149.fc13.x86_64/extra/VirtualBox-OSE/vboxnetflt.ko

It seems to me that the outer loop is a hang just waiting to happen.  There should be some check for "no progress" that breaks the loop.  Perhaps checking the new value of TMPINMODS with the previous value and breaking the loop if there is no change.
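A minimal sketch of the "no progress" check suggested above, written in shell like mkdumprd itself. It is not mkdumprd's depsolve_modlist: resolve_modlist and deps_of are hypothetical, and deps_of is a stub table built from the `modprobe --show-depends` output reported later in this bug (comment 8). A module is emitted once all of its dependencies have been emitted, and the loop gives up with an error, instead of spinning, when a full pass moves nothing.

```shell
#!/bin/bash
# Hypothetical sketch, NOT mkdumprd's actual depsolve_modlist.
# deps_of is a stub dependency table taken from the
# `modprobe --show-depends` output in comment 8; a real version
# would query modprobe instead.
deps_of() {
    case "$1" in
        vboxnetadp) echo "vboxguest" ;;
        vboxnetflt) echo "vboxdrv vboxguest" ;;
        vboxsf)     echo "vboxguest" ;;
    esac
}

# resolve_modlist MOD...: print MODs in load order, dependencies first.
# Returns 1 instead of looping forever when a pass makes no progress.
resolve_modlist() {
    local pending="$*" out="" next mod dep ready moved
    while [ -n "$pending" ]; do
        next="" moved=0
        for mod in $pending; do
            ready=1
            for dep in $(deps_of "$mod"); do
                case " $out " in
                    *" $dep "*) ;;          # dependency already emitted
                    *) ready=0 ;;           # still waiting on this one
                esac
            done
            if [ "$ready" -eq 1 ]; then
                out="$out $mod"
                moved=$((moved + 1))
            else
                next="$next $mod"
            fi
        done
        if [ "$moved" -eq 0 ]; then
            # No progress: the remaining modules depend on something
            # that is not (and never will be) in the input list.
            echo "cannot resolve:$next" >&2
            return 1
        fi
        pending="$next"
    done
    # Unquoted on purpose: print one module per line.
    printf '%s\n' $out
}
```

With this table, `resolve_modlist vboxnetadp vboxnetflt vboxdrv` fails with "cannot resolve: vboxnetadp vboxnetflt", the same two entries reported stuck in TMPINMODS, because vboxguest is not in the input list.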

Comment 4 Neil Horman 2010-09-02 13:34:18 UTC
So, I just tried to recreate this on my F-13 system here, and was unable to.  I installed VirtualBox 3.2.8 from here:
http://download.virtualbox.org/virtualbox/3.2.8/VirtualBox-3.2-3.2.8_64453_fedora13-1.i686.rpm

I also installed the latest F-13 kexec-tools (kexec-tools-2.0.0-36).  I modprobed the vbox modules and started kdump.  The initramfs built without error or hang, and I verified that the generated initramfs loaded the vbox drivers as well.  So I'm not quite sure what's going on here.  I did notice that during the install my vbox drivers went into /lib/modules/`uname -r`/misc/ instead of /lib/modules/`uname -r`/extra/VirtualBox-OSE/; I'm not sure why that would make a difference, though, as long as modprobe knows how to find the modules.

Can you please edit /sbin/mkdumprd to add a set -x at the top of the depsolve_modlist function and a set +x at the bottom of the function?  Then restart the service, let it run for a few minutes, Ctrl-C out of it, and send in the output log.  Thanks!
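A sketch of the requested tracing, assuming bash: `set -x` at the top of a function and `set +x` at the bottom limit the trace to that one function. traced_example is a hypothetical stand-in for depsolve_modlist, and the PS4 setting, which prefixes each trace line with the function name and line number, is an optional addition rather than something mkdumprd does.

```shell
#!/bin/bash
# Optional: prefix each trace line with "function:line".
PS4='+ ${FUNCNAME[0]:-main}:${LINENO}: '

# Hypothetical stand-in for depsolve_modlist: only this function is traced.
traced_example() {
    set -x                       # start tracing here
    local mod count=0
    for mod in "$@"; do
        count=$((count + 1))
    done
    echo "$count"
    set +x                       # stop tracing before returning
}
```

The trace is written to standard error, so redirect stderr when capturing the log. If the function can return early, each return path needs its own `set +x`, or tracing stays on for the rest of the script.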

Comment 5 Charles Butterfield 2010-09-03 00:57:13 UTC
Created attachment 442763
Output of mkdumprd with tracing on in key function

Comment 6 Charles Butterfield 2010-09-03 00:59:28 UTC
Neil: Several notes:

1) I just realized I am getting VirtualBox-OSE from rpmfusion-free-updates.  I had not paid attention to that, just thinking, "cool it is in Fedora". My VirtualBox-OSE version is 3.2.6-2.fc13.

2) My vboxdrv files are going into /lib/modules/<uname>/extra/... as shown

/lib/modules/2.6.34.6-47.fc13.x86_64/extra/VirtualBox-OSE/vboxdrv.ko
/lib/modules/2.6.34.6-47.fc13.x86_64/extra/VirtualBox-OSE/vboxguest.ko
/lib/modules/2.6.34.6-47.fc13.x86_64/extra/VirtualBox-OSE/vboxnetadp.ko
/lib/modules/2.6.34.6-47.fc13.x86_64/extra/VirtualBox-OSE/vboxnetflt.ko
/lib/modules/2.6.34.6-47.fc13.x86_64/extra/VirtualBox-OSE/vboxsf.ko
/lib/modules/2.6.34.6-47.fc13.x86_64/extra/VirtualBox-OSE/vboxvideo.ko

3) I enabled tracing in depsolve_modlist, and also added "echo TMPINMODS=$TMPINMODS" at the top of the loop to help determine when we reached the non-productive point.  I deleted the remainder of the log file after a few non-productive iterations had occurred (see attachment).

Comment 7 Neil Horman 2010-09-15 17:58:01 UTC
Ok, I think I see what's happening here.  It would appear that we never satisfy the requirements for putting any of the vbox* modules on the output module list, which means we think we've never solved their dependencies.  Can you please attach the output of this command:

for i in /lib/modules/2.6.34.6-47.fc13.x86_64/extra/VirtualBox-OSE/*.ko
do
     echo "OUTPUT FOR $i"
     mname=$(basename "$i" .ko)
     modprobe --show-depends "$mname" 2>/dev/null | awk '/insmod/ {print $2}'
     echo " "
done


That should let me figure out how I'm parsing the deptree for these modules differently.  thanks!

Comment 8 Charles Butterfield 2010-09-16 01:09:10 UTC
Here is the output you requested:

OUTPUT FOR /lib/modules/2.6.34.6-47.fc13.x86_64/extra/VirtualBox-OSE/vboxdrv.ko
/lib/modules/2.6.34.6-54.fc13.x86_64/extra/VirtualBox-OSE/vboxdrv.ko

OUTPUT FOR /lib/modules/2.6.34.6-47.fc13.x86_64/extra/VirtualBox-OSE/vboxguest.ko
/lib/modules/2.6.34.6-54.fc13.x86_64/extra/VirtualBox-OSE/vboxguest.ko

OUTPUT FOR /lib/modules/2.6.34.6-47.fc13.x86_64/extra/VirtualBox-OSE/vboxnetadp.ko
/lib/modules/2.6.34.6-54.fc13.x86_64/extra/VirtualBox-OSE/vboxguest.ko
/lib/modules/2.6.34.6-54.fc13.x86_64/extra/VirtualBox-OSE/vboxnetadp.ko

OUTPUT FOR /lib/modules/2.6.34.6-47.fc13.x86_64/extra/VirtualBox-OSE/vboxnetflt.ko
/lib/modules/2.6.34.6-54.fc13.x86_64/extra/VirtualBox-OSE/vboxdrv.ko
/lib/modules/2.6.34.6-54.fc13.x86_64/extra/VirtualBox-OSE/vboxguest.ko
/lib/modules/2.6.34.6-54.fc13.x86_64/extra/VirtualBox-OSE/vboxnetflt.ko

OUTPUT FOR /lib/modules/2.6.34.6-47.fc13.x86_64/extra/VirtualBox-OSE/vboxsf.ko
/lib/modules/2.6.34.6-54.fc13.x86_64/extra/VirtualBox-OSE/vboxguest.ko
/lib/modules/2.6.34.6-54.fc13.x86_64/extra/VirtualBox-OSE/vboxsf.ko

OUTPUT FOR /lib/modules/2.6.34.6-47.fc13.x86_64/extra/VirtualBox-OSE/vboxvideo.ko
/lib/modules/2.6.34.6-54.fc13.x86_64/kernel/drivers/i2c/i2c-core.ko
/lib/modules/2.6.34.6-54.fc13.x86_64/kernel/drivers/gpu/drm/drm.ko
/lib/modules/2.6.34.6-54.fc13.x86_64/extra/VirtualBox-OSE/vboxvideo.ko

Comment 9 Charles Butterfield 2010-09-16 01:10:34 UTC
Line wrapping often broke the "OUTPUT FOR XXX" into two lines above.

Comment 10 Neil Horman 2010-09-30 19:21:10 UTC
So I think I found at least part of the problem.  When you start the kdump service, can you tell me whether the vboxguest.ko module is loaded?  Looking at the output dump you provided, vboxguest never shows up on the input list of modules to add, although according to the list above it should if any of the vbox modules that depend on it are loaded.  What's happening is that we're waiting to solve a dependency that can never be solved, because the module we depend on isn't on the input list.  So we need to figure out why that module isn't on the input list.  Can you please:

1) check that vboxguest.ko is loaded when the kdump service starts
2) make sure that it's not changing its name during registration.
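One way to check item 1, sketched under the assumption that /proc/modules is available: look for the module name in the first column of /proc/modules, which holds the same information lsmod prints. module_loaded is a hypothetical helper; its optional second argument lets it read a saved copy of the table instead.

```shell
#!/bin/bash
# module_loaded NAME [TABLE]: succeed if NAME appears in the first
# column of TABLE (default /proc/modules).
module_loaded() {
    local name=${1//-/_}                 # the kernel lists names with '_' instead of '-'
    local table=${2:-/proc/modules}
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$table"
}
```

For example, `module_loaded vboxguest && echo loaded` run just before `service kdump start` answers item 1. /proc/modules shows the name the module registered under, so a module that registers under a name different from its .ko file (item 2) will not match its file name here.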

Thanks!

Comment 11 Charles Butterfield 2010-12-08 21:00:11 UTC
Neil -- I am getting back to this issue, but in the interim I have upgraded from Fedora-13 to Fedora-14.  This has introduced some more snags, as follows:

1) The F14 mkdumprd still calls "nash", so I used your attachment of 2010-09-02.
2) However that script fails with:
     "grep: character class syntax is [[:space:]], not [:space:]"
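For reference, the grep error above is about bracket-expression syntax: a POSIX character class such as [:space:] is only valid inside a bracket expression, so matching one whitespace character takes [[:space:]]. A bare [:space:] would instead be a bracket expression matching one of ':', 's', 'p', 'a', 'c' or 'e', and the F14 grep rejects it with the error quoted above. The sketch below uses the correct form; is_keyword_line is a hypothetical helper, not the code at line 1352 of mkdumprd.

```shell
#!/bin/bash
# is_keyword_line KEYWORD LINE: true if LINE starts with KEYWORD
# followed by at least one space or tab. Note the doubled brackets.
is_keyword_line() {
    printf '%s\n' "$2" | grep -q "^$1[[:space:]]"
}
```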

I fixed the typo at line 1352, and then got the following:

No module ARRAY found for kernel 2.6.35.9-64.fc14.x86_64, aborting.

This sounded very familiar, and I tracked down https://bugzilla.redhat.com/show_bug.cgi?id=479211 again.  I am indeed running with MD RAID; here is my /etc/mdadm.conf:

# mdadm.conf written out by anaconda
MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md0 UUID=7a4eb642:0041f6a6:25eac650:1e2f64f0
ARRAY /dev/md127 UUID=761c32bf:2f4e01eb:c7b3da57:a14787fa
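A sketch of parsing the ARRAY lines in a file like the one above, assuming that layout (the keyword ARRAY followed by the device). The "No module ARRAY found" message suggests that the keyword ARRAY itself ended up being treated as a module name; that is an inference from the message, not from mkdumprd's source. list_md_arrays is a hypothetical helper that prints only the device named on each ARRAY line and ignores comments, MAILADDR and AUTO.

```shell
#!/bin/bash
# list_md_arrays [CONF]: print the device on each ARRAY line of CONF
# (default /etc/mdadm.conf); every other kind of line is skipped.
list_md_arrays() {
    awk '$1 == "ARRAY" { print $2 }' "${1:-/etc/mdadm.conf}"
}
```

mdadm.conf also allows continuation lines (lines starting with whitespace); this sketch simply skips them, which is harmless for the file shown above.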

Any suggestions as to how to proceed past this little speed bump?  I have NOT yet re-installed VirtualBox -- one issue at a time :-)

Comment 12 Neil Horman 2010-12-09 17:53:53 UTC
You should just be able to apply the patch here:
https://bugzilla.redhat.com/attachment.cgi?id=328595
to /sbin/mkdumprd and that problem will be fixed.  I'll commit it shortly.

Comment 13 Neil Horman 2010-12-09 17:59:29 UTC
Actually, scratch that.  Are you sure your upgrade replaced the contents of /sbin/mkdumprd properly?  I ask because the changes that caused the problem above have been removed from F-14, so that patch shouldn't be needed.  Can you try removing kexec-tools, verifying that /sbin/mkdumprd* is gone, and then reinstalling?

Comment 14 Charles Butterfield 2010-12-10 21:45:16 UTC
Neil -- My "upgrade" was really a fresh install (including a reformat of the root partition).  So my comment #11 was based on starting with the latest F14 copy of kexec-tools, namely kexec-tools-2.0.0-39.fc14.1.x86_64.

I tried to apply the patch in comment #12 to the F14 mkdumprd, but they were not compatible.  By hand I was able to apply the first two changes, but the third was not consistent with the existing F14 mkdumprd.

Comment 15 Neil Horman 2010-12-15 17:56:11 UTC
Created attachment 468920
modified version of patch from comment 12

Here you go.  I took a look at the patch and noted some discrepancies that needed to be updated.  I still don't think it should have caused a hang, but without the fmPath modifications the resulting module searches might have taken a hideously long time.  Anyway, let me know how this works for you.

Comment 16 Charles Butterfield 2010-12-16 04:22:09 UTC
Neil -- I applied the patch in Comment 15 to my Fedora-14 mkdumprd (kexec-tools 39.fc14.1).  The patch applies fine, but "mkdumprd" dies with the previously discussed "nash: command not found".

Here is what I did and the output:

patch < patch-file
rm /boot/initrd-2.6.35.9-64.fc14.x86_64kdump.img
service kdump start
No kdump initial ramdisk found.                            [WARNING]
Rebuilding /boot/initrd-2.6.35.9-64.fc14.x86_64kdump.img
IN HANDLERAID
/sbin/mkdumprd: line 952: nash: command not found
Starting kdump:                                            [FAILED]

I'm mystified as to how this works for you.
Regards, Charlie

Comment 17 Neil Horman 2010-12-16 11:51:58 UTC
Created attachment 469123
updated patch

It works for me because I don't normally use software RAID.  I'm not testing this at the moment because I don't have time, and I'm trying to help you as best I can with the cycles I have available.  Here's an updated patch that fixes the nash problem as well.

Comment 18 Charles Butterfield 2010-12-16 20:02:16 UTC
Neil -- I applied the patch in Comment 17.  That works better, although there was a dangling reference to "emitdms" at line 1819 of the patched mkdumprd.  I commented that out and it now seems to build a dump.img successfully.  I'm about to reinstall VirtualBox, but wanted to report the success so far.

Comment 19 Neil Horman 2010-12-16 20:25:10 UTC
Ok, I'll fix the dangling reference and update the netdump-server package with this bz.  If you have subsequent problems, please open a new bug.  Thanks

Comment 20 Charles Butterfield 2010-12-16 20:34:29 UTC
Neil -- I just installed VirtualBox-OSE.x86_64 (3.2.10-1.fc14 from
rpmfusion-free-updates).  The infinite loop that we had encountered in the past
is still present.

The 2 modules that cannot be found are:
/lib/modules/2.6.35.9-64.fc14.x86_64/extra/VirtualBox-OSE/vboxnetadp.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/extra/VirtualBox-OSE/vboxnetflt.ko

Interestingly, a 3rd VirtualBox module IS found:
/lib/modules/2.6.35.9-64.fc14.x86_64/extra/VirtualBox-OSE/vboxdrv.ko

I could see no mention of vboxguest.ko in the module names added to
"TMPINMODS".

Here is the entire initial list of modules:
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/fs/fuse/fuse.ko
/sbin/modprobe
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/fs/nfs_common/nfs_acl.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/net/sunrpc/auth_gss/auth_rpcgss.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/fs/exportfs/exportfs.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/fs/lockd/lockd.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/drivers/cpufreq/cpufreq_ondemand.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/drivers/cpufreq/freq_table.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/arch/x86/kernel/cpu/cpufreq/mperf.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/net/netfilter/nf_conntrack_netbios_ns.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/net/ipv6/netfilter/ip6t_REJECT.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/net/ipv6/netfilter/nf_conntrack_ipv6.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/net/ipv6/netfilter/ip6table_filter.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/net/ipv6/netfilter/ip6_tables.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/net/ipv6/ipv6.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/extra/VirtualBox-OSE/vboxnetadp.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/extra/VirtualBox-OSE/vboxnetflt.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/extra/VirtualBox-OSE/vboxdrv.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/arch/x86/kvm/kvm-intel.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/arch/x86/kvm/kvm.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/drivers/input/misc/uinput.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/sound/pci/hda/snd-hda-codec-analog.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/sound/pci/hda/snd-hda-intel.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/sound/pci/hda/snd-hda-codec.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/sound/core/snd-hwdep.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/sound/core/seq/snd-seq.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/sound/core/seq/snd-seq-device.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/drivers/char/ppdev.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/drivers/watchdog/iTCO_wdt.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/sound/core/snd-timer.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/drivers/i2c/busses/i2c-i801.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/drivers/net/tg3.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/drivers/parport/parport_pc.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/sound/core/snd.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/drivers/watchdog/iTCO_vendor_support.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/drivers/edac/i7core_edac.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/drivers/parport/parport.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/drivers/platform/x86/dell-wmi.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/drivers/edac/edac_core.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/drivers/input/serio/serio_raw.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/drivers/firmware/dcdbas.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/sound/soundcore.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/sound/core/snd-page-alloc.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/drivers/input/joydev.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/drivers/platform/x86/wmi.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/arch/x86/kernel/microcode.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/drivers/md/raid1.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/drivers/gpu/drm/nouveau/nouveau.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/drivers/gpu/drm/ttm/ttm.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/drivers/gpu/drm/drm_kms_helper.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/drivers/gpu/drm/drm.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/drivers/i2c/algos/i2c-algo-bit.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/drivers/acpi/video.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/drivers/video/output.ko
/lib/modules/2.6.35.9-64.fc14.x86_64/kernel/drivers/i2c/i2c-core.ko
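Given an initial list like the one above, the diagnosis from comment 10 can be checked directly: for each .ko in the list, ask modprobe for its dependencies and report any that are not themselves in the list. This is a sketch, not part of mkdumprd; missing_deps and deps_of_path are hypothetical helpers, and deps_of_path reuses the pipeline from comment 7. Names are compared without their directory, so a kernel-version mismatch in the path (as seen in comment 8) does not produce false reports.

```shell
#!/bin/bash
# deps_of_path PATH: print the module paths modprobe would load for
# PATH (the module itself included), as in comment 7.
deps_of_path() {
    local name
    name=$(basename "$1" .ko)
    modprobe --show-depends "$name" 2>/dev/null | awk '/insmod/ {print $2}'
}

# missing_deps LISTFILE: LISTFILE holds one path per line. Print
# "X needs Y" for every dependency Y of a listed module X that is
# not itself in the list.
missing_deps() {
    local list=$1 names mod dep
    # Space-separated set of file names in the list, directories stripped.
    names=" $(sed 's#.*/##' "$list" | tr '\n' ' ') "
    while read -r mod; do
        case "$mod" in *.ko) ;; *) continue ;; esac   # skip entries such as /sbin/modprobe
        for dep in $(deps_of_path "$mod"); do
            case "$names" in
                *" ${dep##*/} "*) ;;                  # dependency is in the list
                *) echo "${mod##*/} needs ${dep##*/}" ;;
            esac
        done
    done < "$list"
}
```

If the dependencies on this kernel match those reported in comment 8, running `missing_deps` on the saved list should print lines such as "vboxnetadp.ko needs vboxguest.ko".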

Comment 21 Charles Butterfield 2010-12-16 20:38:38 UTC
Whoops, comments 19 and 20 collided.  I bailed out and resubmitted 20, then read 19.  Should I open a new bz for the VirtualBox issue, or did you mean something other than the VirtualBox issue?

Regards
-- Charlie

Comment 22 Fedora Update System 2010-12-16 20:40:42 UTC
netdump-server-0.7.16-23.el5 has been submitted as an update for Fedora EPEL 5.
https://admin.fedoraproject.org/updates/netdump-server-0.7.16-23.el5

Comment 23 Fedora Update System 2010-12-17 18:05:53 UTC
netdump-server-0.7.16-23.el5 has been pushed to the Fedora EPEL 5 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update netdump-server'.  You can provide feedback for this update here: https://admin.fedoraproject.org/updates/netdump-server-0.7.16-23.el5

Comment 24 Fedora Update System 2011-01-03 17:33:57 UTC
netdump-server-0.7.16-23.el5 has been pushed to the Fedora EPEL 5 stable repository.  If problems still persist, please make note of it in this bug report.

