From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0rc3) Gecko/20020523

Description of problem:
Attempting to install the RH8 errata kernel on Dell Poweredge 1650. Currently running kernel-2.4.18-14.i686. The mkinitrd step hangs on losetup -d. see ps -Hwefl output below. /proc/mounts shows /dev/loop0 unmounted.

Version-Release number of selected component (if applicable):

How reproducible:
Didn't try

Steps to Reproduce:
1. rpm -ivh kernel-2.4.18-17.8.0.i686.rpm

Actual Results:
F   S UID    PID  PPID  C PRI NI ADDR   SZ WCHAN STIME TTY   TIME     CMD
100 S root 20556 18138  0  80  5 -    2243 pause 10:43 pts/2 00:00:02 rpm -ivh kernel-2.4.18-17.8.0.i686.rpm
000 S root 20561 20556  0  80  5 -     957 wait4 10:44 pts/2 00:00:00 /bin/sh /var/tmp/rpm-tmp.34824 2
000 S root 20578 20561  0  80  5 -     968 wait4 10:44 pts/2 00:00:00 /bin/bash /sbin/new-kernel-pkg --mkinitrd --depmod --install 2.4.18-17.8.0
000 S root 20581 20578  0  80  5 -     978 wait4 10:44 pts/2 00:00:00 /bin/bash /sbin/mkinitrd -f /boot/initrd-2.4.18-17.8.0.img 2.4.18-17.8.0
100 R root 20907 20581 99  90  5 -     784 -     10:44 pts/2 00:50:42 umount /var/tmp/initrd.mnt.kGn4xx
000 D root 20912 20581  0  81  5 -     776 down  10:46 pts/2 00:00:00 losetup -d /dev/loop0

Expected Results:
losetup -d succeeds.

Additional info:
Side note: We have a local "tripwire-like" script that does md5sums. Last night it produced a bogus md5sum for /boot/grub/stage2; today it shows the correct value, same as /usr/share/grub/i386-pc/stage2. /boot partition has clean fsck. This machine ran memtest86 for four hours before RH8 was loaded on it. I will run it again, but I wanted to submit this before rebooting the machine.
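For anyone else trying to confirm they are seeing the same hang, the telling detail in the ps listing above is the losetup process in state D (uninterruptible sleep) while umount spins in state R. A minimal sketch of a filter for that, assuming the process state is the second column as in `ps -Hwefl` output; the function name `blocked_procs` is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: read `ps -l`-style output on stdin and print the
# header line plus every process whose state column (field 2) is "D".
# Assumes the column layout shown in the report (F S UID PID ...).
blocked_procs() {
    awk 'NR == 1 || $2 == "D"'
}

# Typical use on a live system (not run here):
#   ps -Hwefl | blocked_procs
```

Fed the listing from this report, it would keep only the header and the `losetup -d /dev/loop0` line.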
SysRq-t info:

loop0 S 00000002 0 20832 1 23497 (L-TLB)
Call Trace:
[<c0107d31>] __down_interruptible [kernel] 0x71 (0xde2dbf84))
[<c0107ddf>] __down_failed_interruptible [kernel] 0x7 (0xde2dbfac))
[<e0926100>] .text.lock.loop [loop] 0x55 (0xde2dbfb8))
[<e0924bf0>] loop_thread [loop] 0x0 (0xde2dbfc8))
[<c010744e>] kernel_thread [kernel] 0x2e (0xde2dbff0))
[<e0924bf0>] loop_thread [loop] 0x0 (0xde2dbff8))

umount R current 0 20907 20581 20912 (NOTLB)
Call Trace:
[<c014182a>] invalidate_bdev [kernel] 0x5a (0xdd71bf04))
[<c01460ab>] kill_bdev [kernel] 0x1b (0xdd71bf38))
[<c0146f2d>] blkdev_put [kernel] 0xad (0xdd71bf4c))
[<c014501a>] remove_super [kernel] 0x6a (0xdd71bf68))
[<c0157e7f>] sys_umount [kernel] 0x3f (0xdd71bf80))
[<c0157ef7>] sys_oldumount [kernel] 0x17 (0xdd71bfb4))
[<c010910f>] system_call [kernel] 0x33 (0xdd71bfc0))

losetup D DFF1EF68 2640 20912 20581 20907 (NOTLB)
Call Trace:
[<c0107c7a>] __down [kernel] 0x6a (0xd7991ee4))
[<c0107dd4>] __down_failed [kernel] 0x8 (0xd7991f08))
[<c0146e30>] blkdev_open [kernel] 0x0 (0xd7991f10))
[<c01470af>] .text.lock.block_dev [kernel] 0x5 (0xd7991f18))
[<c0146e68>] blkdev_open [kernel] 0x38 (0xd7991f38))
[<c013ee5b>] dentry_open [kernel] 0x14b (0xd7991f50))
[<c013ed08>] filp_open [kernel] 0x68 (0xd7991f70))
[<c013f133>] sys_open [kernel] 0x53 (0xd7991fa8))
[<c010910f>] system_call [kernel] 0x33 (0xd7991fc0))
The md5sum difference is due to using grub with a RAID1 /boot. If one sets the boot partition to sda1 and then modifies grub settings from the boot prompt, sda1 and sdb1 will differ, of course ... so the side-note is a red herring.
mkinitrd also hangs in losetup -d with 2.4.18-7.8.0. Please fix.
Still happening with 2.4.18-18.8.0:

loop0 S 00000002 0 31093 1 24572 (L-TLB)
Call Trace:
[<c0107d91>] __down_interruptible [kernel] 0x71 (0xc957bf84))
[<c0107e3f>] __down_failed_interruptible [kernel] 0x7 (0xc957bfac))
[<e094d050>] .text.lock.loop [loop] 0x55 (0xc957bfb8))
[<e094bb40>] loop_thread [loop] 0x0 (0xc957bfc8))
[<c010746e>] kernel_thread [kernel] 0x2e (0xc957bff0))
[<e094bb40>] loop_thread [loop] 0x0 (0xc957bff8))

umount R current 16 31168 30839 (NOTLB)
Call Trace:
[<c0143e72>] invalidate_bdev [kernel] 0x52 (0xcaefbf04))
[<c014858b>] kill_bdev [kernel] 0x1b (0xcaefbf38))
[<c014940d>] blkdev_put [kernel] 0xad (0xcaefbf4c))
[<c01474fa>] remove_super [kernel] 0x6a (0xcaefbf68))
[<c015a19f>] sys_umount [kernel] 0x3f (0xcaefbf80))
[<c015a217>] sys_oldumount [kernel] 0x17 (0xcaefbfb4))
[<c0109177>] system_call [kernel] 0x33 (0xcaefbfc0))
The same thing happens with RHL 7.3 kernel errata, at least with 2.4.18-18.7.x and 2.4.18-19.7.x. However, I don't think it's a problem in the kernel binary package, because the same thing happens if you run mkinitrd by hand. So even though I can normally use the loop devices just fine, mkinitrd hangs while trying to umount the initrd image file, just as Mr. Rugolsky described earlier.

What's even more confusing is that, in my experience, this doesn't always happen. I have two systems running nearly identical server-class installations of RHL 7.3. They both have the same base packages installed, and both are always updated with the latest errata. However, this problem with mkinitrd only occurs on one, while mkinitrd works just fine on the other! Perhaps someone from Red Hat could give us some insight into what could be causing this.
I've also noticed this problem on my RHL 7.3 machines with various 2.4.18-17, -18 and -19 kernels. Again, mkinitrd hangs on 'umount'. A reboot clears the problem, and re-running mkinitrd after the reboot succeeds. So I have no idea what, in particular, causes the problem, but a reboot certainly corrects it. Perhaps there is an uninitialized variable in the loop driver (or some other kernel module) that happens to get a reasonable value when the system is fresh, and an unreasonable value at 'random' times?
I can reliably reproduce this under any of the more recent 7.3 errata kernels. If there's any way I can help with debugging issues, let me know; it's not clear if anyone is really looking at this right now, though, and I don't have bandwidth to drive an investigation anytime soon. (My steps for reproducing this: install AFS. Transfer a moderately-sized file (say 20MB). Try to unmount a loopback filesystem. <wedge>.)
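The "unmount a loopback filesystem" step can be exercised without a full kernel install. Below is a rough sketch of the loop-device sequence that mkinitrd appears to go through, judging from the process tree in the first comment (umount of a temporary mount point, then losetup -d). The image path, mount point and the `run`/`loop_cycle` helpers are hypothetical, and the AFS transfer step is not covered. By default it only prints the commands; on an affected machine, running it for real (DRY_RUN=0, as root) may wedge umount just as described, so treat it with care.

```shell
#!/bin/sh
# Sketch only: approximates the loopback steps, not the actual mkinitrd script.
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"
IMAGE="${IMAGE:-/var/tmp/looptest.img}"   # hypothetical scratch image
MNT="${MNT:-/var/tmp/looptest.mnt}"       # hypothetical mount point
LOOPDEV="${LOOPDEV:-/dev/loop0}"          # loop device named in the report

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

loop_cycle() {
    run dd if=/dev/zero of="$IMAGE" bs=1k count=4096   # create a 4 MB image
    run losetup "$LOOPDEV" "$IMAGE"                    # attach it to the loop device
    run mke2fs -q "$LOOPDEV"                           # make an ext2 filesystem
    run mkdir -p "$MNT"
    run mount -t ext2 "$LOOPDEV" "$MNT"
    run umount "$MNT"                                  # spins in state R in the reports
    run losetup -d "$LOOPDEV"                          # blocks in state D in the reports
}

loop_cycle
```

If the last two commands return promptly, the machine is probably not in the wedged state at that moment; several comments here note that a fresh reboot makes the problem disappear for a while.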
Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/