Red Hat Bugzilla – Bug 76237
losetup -d hangs when installing 2.4.18-17.8.0
Last modified: 2008-08-01 12:22:52 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0rc3) Gecko/20020523
Description of problem:
Attempting to install the RH8 errata kernel on a Dell PowerEdge 1650. Currently
running kernel-2.4.18-14.i686. The mkinitrd step hangs on losetup -d. See the
ps -Hwefl output below. /proc/mounts shows /dev/loop0 unmounted.
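For anyone wanting to repeat the /proc/mounts check mentioned above in a script, here is a minimal Python sketch. The helper names (`mounted_devices`, `is_mounted`) and the sample text are made up for illustration and are not taken from the affected machine; the sketch only assumes the usual whitespace-separated /proc/mounts layout with the device in the first field.

```python
def mounted_devices(mounts_text):
    """Return the set of device names listed in /proc/mounts-style text."""
    devices = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if fields:
            # The first whitespace-separated field is the device name.
            devices.add(fields[0])
    return devices


def is_mounted(device, mounts_path="/proc/mounts"):
    """Check a live system; reads the given mounts file."""
    with open(mounts_path) as handle:
        return device in mounted_devices(handle.read())


# Hypothetical sample, not the reporter's actual /proc/mounts.
sample = (
    "/dev/sda2 / ext3 rw 0 0\n"
    "/dev/md0 /boot ext3 rw 0 0\n"
)
print("/dev/loop0" in mounted_devices(sample))  # prints False
```

A device missing from this list only means the filesystem is no longer mounted; the loop device itself can still be bound to its backing file.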
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. rpm -ivh kernel-2.4.18-17.8.0.i686.rpm
Actual Results:
F   S UID    PID  PPID  C PRI NI ADDR   SZ WCHAN STIME TTY   TIME     CMD
100 S root 20556 18138  0  80  5 -    2243 pause 10:43 pts/2 00:00:02 rpm -ivh kernel-2.4.18-17.8.0.i686.rpm
000 S root 20561 20556  0  80  5 -     957 wait4 10:44 pts/2 00:00:00 /bin/sh /var/tmp/rpm-tmp.34824 2
000 S root 20578 20561  0  80  5 -     968 wait4 10:44 pts/2 00:00:00 /bin/bash /sbin/new-kernel-pkg --mkinitrd --depmod --install
000 S root 20581 20578  0  80  5 -     978 wait4 10:44 pts/2 00:00:00 /bin/bash /sbin/mkinitrd -f /boot/initrd-2.4.18-17.8.0.img
100 R root 20907 20581 99  90  5 -     784 -     10:44 pts/2 00:50:42
000 D root 20912 20581  0  81  5 -     776 down  10:46 pts/2 00:00:00 losetup -d /dev/loop0
Expected Results: losetup -d succeeds.
Side note: We have a local "tripwire-like" script that does md5sums. Last night
it produced a bogus md5sum for /boot/grub/stage2; today it shows the correct
value, same as /usr/share/grub/i386-pc/stage2. /boot partition has clean fsck.
This machine ran memtest86 for four hours before RH8 was loaded on it. I will
run it again, but I wanted to submit this before rebooting the machine.
loop0 S 00000002 0 20832 1 23497 (L-TLB)
Call Trace: [<c0107d31>] __down_interruptible [kernel] 0x71 (0xde2dbf84))
[<c0107ddf>] __down_failed_interruptible [kernel] 0x7 (0xde2dbfac))
[<e0926100>] .text.lock.loop [loop] 0x55 (0xde2dbfb8))
[<e0924bf0>] loop_thread [loop] 0x0 (0xde2dbfc8))
[<c010744e>] kernel_thread [kernel] 0x2e (0xde2dbff0))
[<e0924bf0>] loop_thread [loop] 0x0 (0xde2dbff8))
umount R current 0 20907 20581 20912 (NOTLB)
Call Trace: [<c014182a>] invalidate_bdev [kernel] 0x5a (0xdd71bf04))
[<c01460ab>] kill_bdev [kernel] 0x1b (0xdd71bf38))
[<c0146f2d>] blkdev_put [kernel] 0xad (0xdd71bf4c))
[<c014501a>] remove_super [kernel] 0x6a (0xdd71bf68))
[<c0157e7f>] sys_umount [kernel] 0x3f (0xdd71bf80))
[<c0157ef7>] sys_oldumount [kernel] 0x17 (0xdd71bfb4))
[<c010910f>] system_call [kernel] 0x33 (0xdd71bfc0))
losetup D DFF1EF68 2640 20912 20581 20907 (NOTLB)
Call Trace: [<c0107c7a>] __down [kernel] 0x6a (0xd7991ee4))
[<c0107dd4>] __down_failed [kernel] 0x8 (0xd7991f08))
[<c0146e30>] blkdev_open [kernel] 0x0 (0xd7991f10))
[<c01470af>] .text.lock.block_dev [kernel] 0x5 (0xd7991f18))
[<c0146e68>] blkdev_open [kernel] 0x38 (0xd7991f38))
[<c013ee5b>] dentry_open [kernel] 0x14b (0xd7991f50))
[<c013ed08>] filp_open [kernel] 0x68 (0xd7991f70))
[<c013f133>] sys_open [kernel] 0x53 (0xd7991fa8))
[<c010910f>] system_call [kernel] 0x33 (0xd7991fc0))
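To compare traces like the ones above across kernel builds, it can help to reduce each one to its chain of function names. The following Python sketch is one way to do that; the regular expression is an assumption based on the line format shown in this report, and `frames` is a made-up helper name.

```python
import re

# Matches lines such as:
#   [<c0107c7a>] __down [kernel] 0x6a (0xd7991ee4))
# Groups: address, symbol, module, offset. The format is inferred from
# the traces in this report, not from any kernel documentation.
FRAME = re.compile(r"\[<([0-9a-f]+)>\]\s+(\S+)\s+\[(\w+)\]\s+0x([0-9a-f]+)")


def frames(trace_text):
    """Return (symbol, module) pairs in the order they appear."""
    return [(m.group(2), m.group(3)) for m in FRAME.finditer(trace_text)]


# First three frames of the losetup trace quoted above.
losetup_trace = """\
Call Trace: [<c0107c7a>] __down [kernel] 0x6a (0xd7991ee4))
[<c0107dd4>] __down_failed [kernel] 0x8 (0xd7991f08))
[<c0146e30>] blkdev_open [kernel] 0x0 (0xd7991f10))
"""

for symbol, module in frames(losetup_trace):
    print(symbol, module)
```

Run against the full traces, this shows losetup sleeping in __down under blkdev_open while umount is still running in invalidate_bdev under blkdev_put, which is at least consistent with losetup waiting on something umount holds.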
The md5sum difference is due to using grub with a RAID1 /boot. If one sets the
boot partition to sda1 and then modifies grub settings from the boot prompt,
sda1 and sdb1 will differ, of course ... so the side-note is a red herring.
mkinitrd also hangs in losetup -d with 2.4.18-7.8.0. Please fix.
Still happening with 2.4.18-18.8.0:
loop0 S 00000002 0 31093 1 24572 (L-TLB)
Call Trace: [<c0107d91>] __down_interruptible [kernel] 0x71 (0xc957bf84))
[<c0107e3f>] __down_failed_interruptible [kernel] 0x7 (0xc957bfac))
[<e094d050>] .text.lock.loop [loop] 0x55 (0xc957bfb8))
[<e094bb40>] loop_thread [loop] 0x0 (0xc957bfc8))
[<c010746e>] kernel_thread [kernel] 0x2e (0xc957bff0))
[<e094bb40>] loop_thread [loop] 0x0 (0xc957bff8))
umount R current 16 31168 30839 (NOTLB)
Call Trace: [<c0143e72>] invalidate_bdev [kernel] 0x52 (0xcaefbf04))
[<c014858b>] kill_bdev [kernel] 0x1b (0xcaefbf38))
[<c014940d>] blkdev_put [kernel] 0xad (0xcaefbf4c))
[<c01474fa>] remove_super [kernel] 0x6a (0xcaefbf68))
[<c015a19f>] sys_umount [kernel] 0x3f (0xcaefbf80))
[<c015a217>] sys_oldumount [kernel] 0x17 (0xcaefbfb4))
[<c0109177>] system_call [kernel] 0x33 (0xcaefbfc0))
The same thing happens with RHL 7.3 kernel errata, at least with 2.4.18-18.7.x
and 2.4.18-19.7.x. However I don't think it's a problem in the kernel binary
pkg, because the same thing happens if you try to run mkinitrd by hand.
So even though I can normally use the loop devices just fine, it seems
mkinitrd hangs while trying to umount the initrd img file, just as Mr.
Rugolsky described earlier.
What's even more confusing, though, is that in my experience this doesn't
always happen: I have two systems running nearly identical server-class
installations of RHL 7.3. They both have the same base packages installed, and
both are always updated with the latest errata. However, this problem with
mkinitrd only occurs on one, while mkinitrd works just fine on the other one!
I don't know what to say, perhaps someone from Red Hat could give us some
insight as to what could be causing this.
I've also noticed this problem on my RHL 7.3 machines with various
2.4.18-17, -18 and -19 kernels. Again, mkinitrd hangs on 'umount'. A reboot
clears the problem, and re-running mkinitrd after the reboot succeeds. So I
have no idea what, in particular, causes the problem, but a reboot certainly
helps.
Perhaps there is an uninitialized variable in the loop driver (or some other
kernel module) that happens to get a reasonable value when the system is fresh,
and an unreasonable value at 'random' times?
I can reliably reproduce this under any of the more recent 7.3 errata kernels.
If there's any way I can help with debugging issues, let me know; it's not clear
if anyone is really looking at this right now, though, and I don't have
bandwidth to drive an investigation anytime soon. (My steps for reproducing
this: install AFS. Transfer a moderately-sized file (say 20MB). Try to unmount
a loopback filesystem. <wedge>.)
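The "try to unmount a loopback filesystem" step can be exercised without a full mkinitrd run. Below is a rough Python sketch of such a cycle. It is not what mkinitrd itself runs: the image path, mount point, and size are hypothetical, the function names are made up, and it assumes a modern Python 3 plus root privileges for a real run. By default it only prints the commands.

```python
import subprocess


def loop_cycle_commands(image, loopdev="/dev/loop0",
                        mountpoint="/mnt/looptest", size_kb=4096):
    """Build the commands for one create/mount/umount/detach cycle.

    All paths are hypothetical; the mount point must already exist.
    """
    return [
        ["dd", "if=/dev/zero", "of=" + image, "bs=1k", "count=" + str(size_kb)],
        ["losetup", loopdev, image],
        ["mke2fs", "-q", loopdev],
        ["mount", "-t", "ext2", loopdev, mountpoint],
        ["umount", mountpoint],
        ["losetup", "-d", loopdev],
    ]


def run_cycle(commands, timeout=60, dry_run=True):
    """Run each command; return the first one that does not finish in time.

    Popen.wait(timeout=...) is used rather than subprocess.run() because
    run() kills and then reaps the child on timeout, and reaping a process
    stuck in uninterruptible sleep would block this script as well.
    """
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
            continue
        proc = subprocess.Popen(argv)
        try:
            status = proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            return argv  # likely wedged; leave it for inspection with ps
        if status != 0:
            raise subprocess.CalledProcessError(status, argv)
    return None


# Dry run: prints the six commands and returns None.
stuck = run_cycle(loop_cycle_commands("/tmp/looptest.img"))
```

On an affected machine, a real run (dry_run=False, as root) would be expected to return the umount or losetup -d command rather than None.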
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.
The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases,
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/