Bug 76237

Summary: losetup -d hangs when installing 2.4.18-17.8.0
Product: [Retired] Red Hat Linux
Component: kernel
Version: 8.0
Hardware: i686
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Reporter: Bill Rugolsky, Jr. <bill>
Assignee: Arjan van de Ven <arjanv>
QA Contact: Brian Brock <bbrock>
CC: amb, chris, zmousm
Doc Type: Bug Fix
Last Closed: 2004-09-30 15:40:05 UTC

Description Bill Rugolsky, Jr. 2002-10-18 15:43:27 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0rc3) Gecko/20020523

Description of problem:
Attempting to install the RH8 errata kernel on a Dell PowerEdge 1650 that is
currently running kernel-2.4.18-14.i686. The mkinitrd step hangs on losetup -d;
see the ps -Hwefl output below. /proc/mounts shows that /dev/loop0 is no longer
mounted.

Version-Release number of selected component (if applicable):


How reproducible:
Didn't try

Steps to Reproduce:
1. rpm -ivh kernel-2.4.18-17.8.0.i686.rpm
	

Actual Results:

  F S UID        PID  PPID  C PRI  NI ADDR    SZ WCHAN  STIME TTY          TIME CMD
100 S root     20556 18138  0  80   5    -  2243 pause  10:43 pts/2    00:00:02 rpm -ivh kernel-2.4.18-17.8.0.i686.rpm
000 S root     20561 20556  0  80   5    -   957 wait4  10:44 pts/2    00:00:00   /bin/sh /var/tmp/rpm-tmp.34824 2
000 S root     20578 20561  0  80   5    -   968 wait4  10:44 pts/2    00:00:00     /bin/bash /sbin/new-kernel-pkg --mkinitrd --depmod --install 2.4.18-17.8.0
000 S root     20581 20578  0  80   5    -   978 wait4  10:44 pts/2    00:00:00       /bin/bash /sbin/mkinitrd -f /boot/initrd-2.4.18-17.8.0.img 2.4.18-17.8.0
100 R root     20907 20581 99  90   5    -   784 -      10:44 pts/2    00:50:42         umount /var/tmp/initrd.mnt.kGn4xx
000 D root     20912 20581  0  81   5    -   776 down   10:46 pts/2    00:00:00         losetup -d /dev/loop0


Expected Results:  losetup -d succeeds.
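
For anyone triaging a similar hang, the ps listing above can be filtered
mechanically. The following is only a rough sketch in Python, not part of the
original report: it assumes the 15-column layout printed by ps -Hwefl here, and
the names parse_ps_long, suspicious and cpu_threshold are made up for
illustration. It picks out tasks in uninterruptible sleep (state D) and
runnable tasks with a very high C (CPU) value, which in this report are
losetup and umount.

```python
# Sample rows copied from the ps -Hwefl listing in this report (abridged).
PS_SAMPLE = """\
  F S UID        PID  PPID  C PRI  NI ADDR    SZ WCHAN  STIME TTY          TIME CMD
100 S root     20556 18138  0  80   5    -  2243 pause  10:43 pts/2    00:00:02 rpm -ivh kernel-2.4.18-17.8.0.i686.rpm
100 R root     20907 20581 99  90   5    -   784 -      10:44 pts/2    00:50:42 umount /var/tmp/initrd.mnt.kGn4xx
000 D root     20912 20581  0  81   5    -   776 down   10:46 pts/2    00:00:00 losetup -d /dev/loop0
"""


def parse_ps_long(text):
    """Parse ps long-format output into a list of dicts keyed by column name.

    Assumes the first non-blank line is the header and that the last column
    (CMD) is the only one that may contain spaces.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        # Split into at most len(header) fields so CMD keeps its arguments.
        fields = ln.strip().split(None, len(header) - 1)
        if len(fields) != len(header):
            continue  # skip lines that do not fit the assumed layout
        rows.append(dict(zip(header, fields)))
    return rows


def suspicious(rows, cpu_threshold=90):
    """Return (pid, state, wchan, cmd) for D-state or CPU-bound R-state tasks.

    cpu_threshold is an arbitrary illustrative cutoff for the C column.
    """
    found = []
    for r in rows:
        spinning = r["S"] == "R" and int(r["C"]) >= cpu_threshold
        if r["S"] == "D" or spinning:
            found.append((int(r["PID"]), r["S"], r["WCHAN"], r["CMD"]))
    return found


if __name__ == "__main__":
    for pid, state, wchan, cmd in suspicious(parse_ps_long(PS_SAMPLE)):
        print(pid, state, wchan, cmd)
```

On a live system the input would come from running ps itself rather than from
a pasted sample; the column layout may differ between procps versions.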

Additional info:

Side note: We have a local "tripwire-like" script that does md5sums.  Last night
it produced a bogus md5sum for /boot/grub/stage2; today it shows the correct
value, same as /usr/share/grub/i386-pc/stage2.  /boot partition has clean fsck.
 This machine ran memtest86 for four hours before RH8 was loaded on it.  I will
run it again, but I wanted to submit this before rebooting the machine.

Comment 1 Bill Rugolsky, Jr. 2002-10-18 17:44:53 UTC
SysRq-t info:

loop0         S 00000002     0 20832      1               23497 (L-TLB)
Call Trace: [<c0107d31>] __down_interruptible [kernel] 0x71 (0xde2dbf84))
[<c0107ddf>] __down_failed_interruptible [kernel] 0x7 (0xde2dbfac))
[<e0926100>] .text.lock.loop [loop] 0x55 (0xde2dbfb8))
[<e0924bf0>] loop_thread [loop] 0x0 (0xde2dbfc8))
[<c010744e>] kernel_thread [kernel] 0x2e (0xde2dbff0))
[<e0924bf0>] loop_thread [loop] 0x0 (0xde2dbff8))

umount        R current      0 20907  20581               20912 (NOTLB)
Call Trace: [<c014182a>] invalidate_bdev [kernel] 0x5a (0xdd71bf04))
[<c01460ab>] kill_bdev [kernel] 0x1b (0xdd71bf38))
[<c0146f2d>] blkdev_put [kernel] 0xad (0xdd71bf4c))
[<c014501a>] remove_super [kernel] 0x6a (0xdd71bf68))
[<c0157e7f>] sys_umount [kernel] 0x3f (0xdd71bf80))
[<c0157ef7>] sys_oldumount [kernel] 0x17 (0xdd71bfb4))
[<c010910f>] system_call [kernel] 0x33 (0xdd71bfc0))

losetup       D DFF1EF68  2640 20912  20581         20907       (NOTLB)
Call Trace: [<c0107c7a>] __down [kernel] 0x6a (0xd7991ee4))
[<c0107dd4>] __down_failed [kernel] 0x8 (0xd7991f08))
[<c0146e30>] blkdev_open [kernel] 0x0 (0xd7991f10))
[<c01470af>] .text.lock.block_dev [kernel] 0x5 (0xd7991f18))
[<c0146e68>] blkdev_open [kernel] 0x38 (0xd7991f38))
[<c013ee5b>] dentry_open [kernel] 0x14b (0xd7991f50))
[<c013ed08>] filp_open [kernel] 0x68 (0xd7991f70))
[<c013f133>] sys_open [kernel] 0x53 (0xd7991fa8))
[<c010910f>] system_call [kernel] 0x33 (0xd7991fc0))
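
The traces above appear to show losetup sleeping in __down underneath
blkdev_open while umount is still running inside invalidate_bdev, reached
through blkdev_put. One possible reading, which this report does not confirm,
is that losetup is waiting on a semaphore that umount has not released. A rough
Python sketch for pulling the task name, state, PID and symbol names out of a
dump in this format follows. The regular expressions are guesses fitted to the
lines shown here, and parse_sysrq_tasks is a made-up name.

```python
import re

# Abridged copy of two tasks from the SysRq-t dump in this report.
SYSRQ_SAMPLE = """\
umount        R current      0 20907  20581               20912 (NOTLB)
Call Trace: [<c014182a>] invalidate_bdev [kernel] 0x5a (0xdd71bf04))
[<c01460ab>] kill_bdev [kernel] 0x1b (0xdd71bf38))
[<c0146f2d>] blkdev_put [kernel] 0xad (0xdd71bf4c))

losetup       D DFF1EF68  2640 20912  20581         20907       (NOTLB)
Call Trace: [<c0107c7a>] __down [kernel] 0x6a (0xd7991ee4))
[<c0107dd4>] __down_failed [kernel] 0x8 (0xd7991f08))
[<c0146e30>] blkdev_open [kernel] 0x0 (0xd7991f10))
"""

# Task header: name, one-letter state, a PC/"current" field, free stack, PID,
# and a trailing (L-TLB) or (NOTLB) marker. Fitted to the sample only.
TASK_RE = re.compile(
    r"^(\S+)\s+([A-Z])\s+\S+\s+\d+\s+(\d+)\s.*\((?:L-TLB|NOTLB)\)\s*$"
)
# Stack frame: [<address>] symbol [module]
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\S+)\s+\[(\w+)\]")


def parse_sysrq_tasks(text):
    """Return a list of dicts: name, state, pid, frames (symbols, top first)."""
    tasks = []
    current = None
    for line in text.splitlines():
        m = TASK_RE.match(line)
        if m:
            current = {
                "name": m.group(1),
                "state": m.group(2),
                "pid": int(m.group(3)),
                "frames": [],
            }
            tasks.append(current)
        elif current is not None:
            # Collect every symbol on a "Call Trace:" or continuation line.
            current["frames"].extend(sym for sym, _mod in FRAME_RE.findall(line))
    return tasks


if __name__ == "__main__":
    for task in parse_sysrq_tasks(SYSRQ_SAMPLE):
        print(task["name"], task["state"], task["pid"], " <- ".join(task["frames"]))
```

Listing the frames side by side makes it easier to compare dumps from
different kernel builds, such as the 2.4.18-18.8.0 trace in a later comment.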



Comment 2 Bill Rugolsky, Jr. 2002-10-21 14:27:40 UTC
The md5sum difference is due to using grub with a RAID1 /boot. If one sets the
boot partition to sda1 and then modifies grub settings from the boot prompt,
sda1 and sdb1 will differ, of course ... so the side-note is a red herring.

Comment 3 Bill Rugolsky, Jr. 2002-10-24 17:02:23 UTC
mkinitrd also hangs in losetup -d with 2.4.18-7.8.0. Please fix.

Comment 4 Bill Rugolsky, Jr. 2002-11-27 16:09:17 UTC
Still happening with 2.4.18-18.8.0:

loop0         S 00000002     0 31093      1               24572 (L-TLB)
Call Trace: [<c0107d91>] __down_interruptible [kernel] 0x71 (0xc957bf84))
[<c0107e3f>] __down_failed_interruptible [kernel] 0x7 (0xc957bfac))
[<e094d050>] .text.lock.loop [loop] 0x55 (0xc957bfb8))
[<e094bb40>] loop_thread [loop] 0x0 (0xc957bfc8))
[<c010746e>] kernel_thread [kernel] 0x2e (0xc957bff0))
[<e094bb40>] loop_thread [loop] 0x0 (0xc957bff8))

umount        R current     16 31168  30839                     (NOTLB)
Call Trace: [<c0143e72>] invalidate_bdev [kernel] 0x52 (0xcaefbf04))
[<c014858b>] kill_bdev [kernel] 0x1b (0xcaefbf38))
[<c014940d>] blkdev_put [kernel] 0xad (0xcaefbf4c))
[<c01474fa>] remove_super [kernel] 0x6a (0xcaefbf68))
[<c015a19f>] sys_umount [kernel] 0x3f (0xcaefbf80))
[<c015a217>] sys_oldumount [kernel] 0x17 (0xcaefbfb4))
[<c0109177>] system_call [kernel] 0x33 (0xcaefbfc0))


Comment 5 Zenon Mousmoulas 2003-01-09 13:26:54 UTC
The same thing happens with the RHL 7.3 errata kernels, at least with
2.4.18-18.7.x and 2.4.18-19.7.x. However, I don't think it's a problem with the
kernel binary package itself, because the same thing happens if you run
mkinitrd by hand.

So even though I can normally use the loop devices just fine, it seems
mkinitrd hangs while trying to umount the initrd image file, just as Mr.
Rugolsky described earlier.

What's even more confusing is that, in my experience, this doesn't always
happen: I have two systems with nearly identical server-class installations of
RHL 7.3. Both have the same base packages installed, and both are always
updated with the latest errata. However, this problem with mkinitrd occurs on
only one of them, while mkinitrd works just fine on the other!

I don't know what to say; perhaps someone from Red Hat could give us some
insight into what could be causing this.

Comment 6 Derek Atkins 2003-01-24 22:53:53 UTC
I've also noticed this problem on my RHL 7.3 machines with various
2.4.18-17, -18 and -19 kernels. Again, mkinitrd hangs on 'umount'. A reboot
clears the problem, and re-running mkinitrd after the reboot succeeds. So I
have no idea what, in particular, causes the problem, but a reboot certainly
corrects it.

Perhaps there is an uninitialized variable in the loop driver (or some other
kernel module) that happens to get a reasonable value when the system is fresh,
and an unreasonable value at 'random' times?

Comment 7 andrew m. boardman 2003-01-25 03:52:43 UTC
I can reliably reproduce this under any of the more recent 7.3 errata kernels. 
If there's any way I can help with debugging issues, let me know; it's not clear
if anyone is really looking at this right now, though, and I don't have
bandwidth to drive an investigation anytime soon.  (My steps for reproducing
this: install AFS.  Transfer a moderately-sized file (say 20MB).  Try to unmount
a loopback filesystem.  <wedge>.)
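
Since the last step of that recipe never returns, a test harness for it needs
some way to notice the hang without getting stuck itself. A minimal Python
sketch of such a watchdog is below. It is not from this report: the function
name run_or_report_hang and the timeout values are made up, and the demo runs
a harmless sleeping child instead of a real umount. It deliberately does not
wait on a child that has timed out, because a task stuck in D state cannot be
killed and waiting on it would block the harness too.

```python
import subprocess
import sys
import time


def run_or_report_hang(cmd, timeout_s=30.0, poll_s=0.1):
    """Start cmd and poll it until it exits or timeout_s elapses.

    Returns (status, proc), where status is "exited" or "hung". The child is
    left running in the "hung" case; the caller decides what to do with it.
    """
    proc = subprocess.Popen(cmd)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            return "exited", proc
        time.sleep(poll_s)
    # One last check in case the child exited during the final sleep.
    if proc.poll() is not None:
        return "exited", proc
    return "hung", proc


if __name__ == "__main__":
    # Stand-in for the real command: a child that sleeps longer than the timeout.
    sleeper = [sys.executable, "-c", "import time; time.sleep(2)"]
    status, proc = run_or_report_hang(sleeper, timeout_s=0.5)
    print(status, proc.pid)
    # The stand-in child is an ordinary process, so it can be cleaned up.
    proc.kill()
    proc.wait()
```

In a real reproduction the command would be the umount of the loopback
filesystem, and a "hung" result would be the cue to capture SysRq-t output
before rebooting.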


Comment 8 Bugzilla owner 2004-09-30 15:40:05 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/