758159 – Loop devices not detachable/detached after umount

Summary: Loop devices not detachable/detached after umount

Keywords:
Status:	CLOSED DUPLICATE of bug 808795
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	16
Hardware:	Unspecified
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	494832

Reported:	2011-11-29 12:51 UTC by Michal Růžička
Modified:	2016-05-12 17:59 UTC
CC List:	15 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2012-04-15 17:44:50 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
process-list.txt (5.73 KB, text/plain) 2011-11-30 01:53 UTC, Michal Růžička
unit-list.txt (1.64 KB, text/plain) 2011-11-30 01:55 UTC, Michal Růžička

Description Michal Růžička 2011-11-29 12:51:00 UTC

Description of problem:
Loop devices (/dev/loop0 ... /dev/loopN) can't be detached after the filesystem on them has been unmounted; losetup reports them as busy.

Version-Release number of selected component (if applicable):
kernel-3.1.2-1.fc16.x86_64
(kernel-PAE-2.6.40.6-0.fc15.i686 in F15 has the problem too)

How reproducible:
always

Steps to Reproduce:
dd if=/dev/zero bs=8192 count=128 of=disk.img
mkfs -t ext2 -F disk.img
losetup /dev/loop0 disk.img
mount /dev/loop0 /mnt
losetup -d /dev/loop0
  
Actual results:
loop: can't delete device /dev/loop0: Device or resource busy

Expected results:
the "losetup -d" command completes successfully, nothing is reported, the loop device is detached

Additional info:
- the problem seems to be triggered by the mount call: before the loop device has been mounted, it can be detached without issues
- the problem does not seem to be specific to any particular filesystem on the loop device; I originally noticed it when mounting NTFS via fuse/ntfs-3g, and was able to reproduce it with vfat as well
- using "mount -oloop disk.img /mnt" has the same problem, and there it is worse: the failure to detach the loop device upon umount is not reported at all (not even by a non-zero exit code from the umount command)
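
Since umount gives no hint when the automatic detach fails, a script that relies on 'mount -o loop' has to look for leftover loop devices itself. A minimal sketch (the function name loop_still_attached is hypothetical; it assumes util-linux 'losetup -j FILE', which lists the loop devices associated with FILE):

```shell
# Hypothetical helper: exit status 0 if some loop device is still bound
# to the image file (and print it), 1 if none is, 2 if losetup failed.
loop_still_attached() {
    local img="$1" leftover
    # 'losetup -j FILE' prints one line per loop device backed by FILE
    leftover=$(losetup -j "$img") || return 2
    [ -n "$leftover" ] || return 1
    printf '%s\n' "$leftover"
}
```

Run right after 'umount /mnt', an exit status of 0 means a loop device is still bound to disk.img.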

Comment 1 Michal Růžička 2011-11-29 12:58:44 UTC

Oops, I forgot to include "umount" in the instructions (which is the thing that makes it all so bad) - the correct steps to reproduce are:

dd if=/dev/zero bs=8192 count=128 of=disk.img
mkfs -t ext2 -F disk.img
losetup /dev/loop0 disk.img
mount /dev/loop0 /mnt
umount /mnt
losetup -d /dev/loop0
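
The corrected steps can also be wrapped into a self-checking script, which makes it easy to tell a working system from an affected one. This is only a sketch: the function name repro_loop_busy is hypothetical, it must run as root, and unlike the steps above it lets 'losetup -f --show' pick a free loop device instead of hard-coding /dev/loop0.

```shell
# Hypothetical self-checking reproducer; run as root in a scratch directory.
# Exit status: 0 = detached fine, 1 = still busy after umount, 2 = setup failed.
repro_loop_busy() {
    local img="${1:-disk.img}" mnt="${2:-/mnt}" dev
    dd if=/dev/zero bs=8192 count=128 of="$img" 2>/dev/null || return 2
    mkfs -t ext2 -F "$img" >/dev/null 2>&1 || return 2
    # -f picks the first free loop device, --show prints its name
    dev=$(losetup -f --show "$img") || return 2
    mount "$dev" "$mnt" || { losetup -d "$dev"; return 2; }
    umount "$mnt" || return 2
    if losetup -d "$dev"; then
        echo "OK: $dev detached"
        return 0
    fi
    echo "BUG: $dev still busy after umount" >&2
    return 1
}
```

For example, 'repro_loop_busy disk.img /mnt' prints which loop device it used and exits with 1 if that device was still busy after umount.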

Comment 2 Mads Kiilerich 2011-11-29 13:23:55 UTC

It works for me:

# dd if=/dev/zero bs=8192 count=128 of=disk.img
...
# mkfs -t ext2 -F disk.img
...

# losetup /dev/loop0 disk.img
# mount /dev/loop0 /mnt
# umount /mnt
# losetup -a
/dev/loop0: [fd02]:5581 (/root/disk.img)
# losetup -d /dev/loop0
# losetup -a
#

and

# mount -o loop disk.img /mnt
# losetup -a
/dev/loop0: [fd02]:5581 (/root/disk.img)
# umount /mnt
# losetup -a
# 

Your issue could perhaps be caused by some kind of indexing or GUI automount that keeps a reference to the device so it can't be deleted.

Comment 3 Michal Růžička 2011-11-30 01:51:03 UTC

Thanks for the quick reply, but unfortunately it doesn't seem to be that simple.
Here are steps to reproduce the problem in an almost shut-down system with only a handful of processes and services running
(see the attached files for the lists of running processes and active systemd units at the time of testing):

Boot the system in the standard way, then stop (via 'systemctl stop ...') every systemd unit that has no matching entry in the attached list of active units, and kill every user-space process that has no matching entry in the attached process list.
Then execute:

dd if=/dev/zero bs=8192 count=128 of=disk.img
mkfs -t ext2 -F disk.img
mkdir a b
mount --bind a b
losetup /dev/loop0 disk.img
mount /dev/loop0 /mnt
umount /mnt
losetup -d /dev/loop0

ADDITIONAL INFO
- The problem seems to be the sequence of a "--bind" mount followed by a loop device mount. (Note that the root filesystem on my machine where all this happens is ext4, I don't know if this is relevant to the problem though.)
- It is not possible to find out which process is holding the reference to the affected loop device by using lsof and fuser (the commands simply can't find any references to the affected loop device).
- The problem on my machine is apparently caused by the name server (bind-9.8.1-4.P1.fc16.x86_64), which uses "--bind" mounts to set up its chroot environment.
- I was not able to reproduce the problem in recovery mode, even though more processes were actually running there than in the environment in which it fails for me.
- Please also note the processes (kernel threads) towards the end of the attached process list file:

root      2204     2  0 20:45 ?        00:00:00 [loop0]
root      2206     2  0 20:45 ?        00:00:00 [ext4-dio-unwrit]
root      2211     2  0 20:45 ?        00:00:00 [flush-7:0]
root      2233     2  0 20:45 ?        00:00:00 [loop1]
root      2235     2  0 20:45 ?        00:00:00 [ext4-dio-unwrit]
root      2240     2  0 20:45 ?        00:00:00 [flush-7:1]
root      2336     2  0 20:48 ?        00:00:00 [loop2]
root      2338     2  0 20:48 ?        00:00:00 [ext4-dio-unwrit]
root      2343     2  0 20:48 ?        00:00:00 [flush-7:2]
root      2351     2  0 20:49 ?        00:00:00 [loop3]
root      2353     2  0 20:49 ?        00:00:00 [ext4-dio-unwrit]
root      2358     2  0 20:49 ?        00:00:00 [flush-7:3]

as they appear to be related to the problem I'm seeing.
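
lsof and fuser look at the open files of processes, so a reference held some other way can escape them. One such place is a mount table: if a copy of the loop mount still existed in another mount namespace (an assumption, not something established in this comment), the device would stay busy even though it is unmounted in the namespace you are looking at. A rough sketch that scans every process's /proc/PID/mountinfo for the device (find_mount_holders is a hypothetical name; the optional second argument only exists to make it testable):

```shell
# Hypothetical diagnostic: print the PIDs whose mount table still contains
# a mount whose source is the given device (e.g. /dev/loop0).
# $2 overrides the proc root (default /proc).
find_mount_holders() {
    local dev="$1" procroot="${2:-/proc}" f pid
    for f in "$procroot"/[0-9]*/mountinfo; do
        [ -r "$f" ] || continue
        # mountinfo lines look like:
        #   36 35 7:0 / /mnt rw,relatime shared:1 - ext2 /dev/loop0 rw
        # the mount source is the 2nd field after the " - " separator.
        if awk -v dev="$dev" '
            { split($0, halves, " - "); split(halves[2], rest, " ")
              if (rest[2] == dev) found = 1 }
            END { exit !found }' "$f"; then
            pid=${f#"$procroot"/}
            printf '%s\n' "${pid%/mountinfo}"
        fi
    done
}
```

'find_mount_holders /dev/loop0' may print many PIDs, since all processes in a namespace share its mount table; 'ps -p PID' then shows which program or service they belong to.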

Comment 4 Michal Růžička 2011-11-30 01:53:57 UTC

Created attachment 538358
process-list.txt

list of processes running on my machine at the time of testing

Comment 5 Michal Růžička 2011-11-30 01:55:46 UTC

Created attachment 538359
unit-list.txt

list of active systemd units on my machine at the time of testing

Comment 6 Mads Kiilerich 2011-11-30 10:41:19 UTC

Confirmed. Weird!

Comment 7 Patrick C. F. Ernzer 2011-12-21 20:54:32 UTC

I also see this when the system is running[1] but have not yet found a reliable way to reproduce it. Often a simple 'umount /tmp/a'[2] works; at other times I end up with many duplicate entries in the output of 'mount' and eventually run out of free loop devices with the default setting. Since I raised that to max_loop=64 I no longer run out of loop devices, but right now 'mount | wc -l' gave me 196638 and the machine is extremely sluggish on mount or umount (about 10 minutes per umount, with the load temporarily going up to about 40). I'm not sure whether the sluggishness is the same bug or a different one.

I'll be happy to pull logs for you, but it will probably be after Jan 1st.


[1] one thing I do several times on one day in nearly every week is:
1) mount -o loop,ro /some/path/some-old-ISO-image.iso /tmp/a
2) jigdo-lite /some/path/some-newer-ISO-image.jigdo
3) let jigdo pull from the old ISO the files that have not changed since last week and then download the new files and generate me some-newer-ISO-image.iso
4) umount /tmp/a
5) rinse, wash, repeat
because of which ISOs I assemble, I run some iterations of steps 1-4 while another set is still in progress (e.g. I start processing an i386 ISO and, while that is at step 3, kick off the x86_64 ISO)

[2] I understand that in F16 the '-d' flag to umount is no longer needed; please tell me if that impression is wrong.
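
To see which mount points account for a 'mount | wc -l' count like that, the mount table can be grouped by mount point. A minimal sketch, assuming the usual /proc/self/mounts format where the second field is the mount point (dup_mounts is a hypothetical name):

```shell
# Hypothetical helper: list mount points that appear more than once in the
# mount table as "COUNT MOUNTPOINT", most-duplicated first.
# $1 overrides the table to read (default /proc/self/mounts).
dup_mounts() {
    local table="${1:-/proc/self/mounts}"
    # field 2 is the mount point; count identical ones, keep counts > 1
    awk '{ print $2 }' "$table" | sort | uniq -c |
        awk '$1 > 1 { print $1, $2 }' | sort -rn
}
```

Running 'dup_mounts | head' shows the most heavily stacked mount points first.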

Comment 8 Dave Jones 2012-03-22 16:46:52 UTC

[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 11 Jonathan Dieter 2012-03-29 14:13:14 UTC

I've just run into this when trying to create a livecd on 3.3.0-4.fc16.  I'm able to reliably reproduce it with something as simple as:

# dd if=/dev/zero of=test bs=1M count=100
# losetup /dev/loop0 test
# mkfs.ext4 /dev/loop0
# mount /dev/loop0 /mnt
# umount /mnt
# losetup -d /dev/loop0
loop: can't delete device /dev/loop0: Device or resource busy

Comment 12 Jonathan Dieter 2012-03-30 09:53:48 UTC

Ok, I've spent the last few hours looking at this and I'm completely lost.  To give some context, I'm trying to use livecd-creator to create a minimal Fedora install.  I'm doing this on one of our lab computers, which is running Fedora 16 x86_64, kernel 3.3.0-4.fc16.

Every time I mount a loopback filesystem and then unmount it, I cannot detach the loop device.  As mentioned in comment #3, the kernel threads for the filesystem keep running even after the unmount.  The filesystem type makes no difference; I run into the same problem with btrfs.

Switching to a different lab computer doesn't fix the problem.  Switching to Fedora's 3.2.9 kernel doesn't fix the problem.  Using my laptop, which is running Fedora 16, kernel 3.3.0-4.fc16 *does* work.

Disabling services on the lab machine so it matches what's running on my laptop doesn't fix the problem.  Enabling selinux doesn't fix the problem.

Any suggestions on how to troubleshoot this?

Comment 13 Mads Kiilerich 2012-03-30 10:25:37 UTC

As a workaround, try adding some sleeps before the commands that fail.


It mostly works for me on f17, but this:

dd if=/dev/zero bs=8192 count=128 of=disk.img
mkfs -t ext2 -F disk.img
mkdir a b
mount --bind a b # I don't know if that makes a difference
while true
do
        losetup /dev/loop0 disk.img
        mount /dev/loop0 /mnt
        umount /mnt
        losetup -d /dev/loop0
done

will occasionally fail with:

+ losetup /dev/loop0 disk.img
+ mount /dev/loop0 /mnt
mount: you must specify the filesystem type

or:

+ losetup /dev/loop0 disk.img
+ mount /dev/loop0 /mnt
+ umount /mnt
+ losetup -d /dev/loop0
losetup: /dev/loop0: detach failed: Device or resource busy

I guess it is timing related and influenced by other processes watching and using the devices.

It seems like losetup is more async than I would expect. I guess it takes a kernel or util-linux hacker to debug this ... and they should be able to reproduce it.
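
The sleep workaround can be made a bit more systematic with a retry loop. This is a hypothetical helper (retry_cmd is not part of any package), and it can only help when whatever holds the device lets go of it eventually:

```shell
# Hypothetical helper: run COMMAND up to TRIES times, sleeping DELAY seconds
# between attempts. Returns 0 on the first success, 1 if every attempt failed.
# Usage: retry_cmd TRIES DELAY COMMAND [ARGS...]
retry_cmd() {
    local tries="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$tries" ]; do
        "$@" && return 0
        if [ "$i" -lt "$tries" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

For example, 'retry_cmd 5 1 losetup -d /dev/loop0' tries the detach up to five times, one second apart.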

Comment 14 Jonathan Dieter 2012-03-30 17:28:57 UTC

In my case, no amount of waiting or killing programs seems to have any effect.  On one lab system, I ended up killing almost all of the non-kernel processes and was still unable to detach the loopback filesystem. :(

In my case, I'm not doing any bind mounts that I'm aware of.  For now I've generated the Live CD using my laptop.

Comment 15 Rob Woolley 2012-04-01 18:50:32 UTC

I ran into this problem as well.

I've been using the mount --bind a b reproducer mentioned above.  It seems to aggravate the problem and increase the chances of it appearing.

I've been using these steps:


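# setup
truncate --size=100MB /tmp/fat.img
mkfs.vfat /tmp/fat.img
mkdir /tmp/a /tmp/b /tmp/fat
mount --bind /tmp/a /tmp/b

Then using test.sh
----
#!/bin/bash
# Attach the image to the first free loop device, write a file to it,
# unmount it, and try to detach; the exit status is that of losetup -d.
mountdir=/tmp/fat
loopdev=$(losetup -f --show /tmp/fat.img) || exit 1
mount "$loopdev" "$mountdir" || { losetup -d "$loopdev"; exit 1; }
echo "hello" > "$mountdir/world.txt"
umount "$mountdir"
losetup -d "$loopdev"
exit $?
----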

When I boot into Fedora 16 with init=/bin/bash I cannot get the problem to appear.  I also have difficulty reproducing it if I avoid logging in to the GUI and instead switch to a virtual terminal (CTRL-ALT-F2) and log in as root to run the tests.

I also noticed that the bind mount and the loopback mount need to be on the same filesystem for the problem to emerge.  If I reproduce the problem in /tmp (which is part of the root filesystem) and then switch to a separate mount point, /home/username/, I can successfully loop-mount and unmount there.

I just created a minimal Fedora installation with virt-install and could not reproduce the problem there.

Comment 16 Rob Woolley 2012-04-02 19:13:35 UTC

So far the big difference between my Fedora minimal and Fedora desktop virtual machines seems to be the following:

Output is from ftrace with filtering on lo_open, lo_ioctl, and lo_release:
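
For anyone wanting to repeat this, a filter like that can be set up through the ftrace control files. A sketch, assuming debugfs is mounted at /sys/kernel/debug and the loop driver is loaded so that lo_open, lo_ioctl and lo_release are traceable (the function names are hypothetical):

```shell
# Hypothetical helpers around the ftrace control files. Needs root, and the
# loop driver loaded so its functions appear in available_filter_functions.
# $1 overrides the tracing directory.
trace_loop_calls_on() {
    local t="${1:-/sys/kernel/debug/tracing}"
    # only trace these three loop driver entry points
    echo 'lo_open lo_ioctl lo_release' > "$t/set_ftrace_filter" || return 1
    echo function > "$t/current_tracer" || return 1
    : > "$t/trace"            # opening 'trace' with truncation clears the buffer
    echo 1 > "$t/tracing_on"
}

trace_loop_calls_off() {
    local t="${1:-/sys/kernel/debug/tracing}"
    echo 0 > "$t/tracing_on"  # stop recording, then dump what was captured
    cat "$t/trace"
}
```

Call trace_loop_calls_on, run the reproducer, then 'trace_loop_calls_off > trace.txt'.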

minimal-----

         losetup-1643  [000] .... 1111947.473922: lo_open <-__blkdev_get
         losetup-1643  [000] .... 1111947.473930: lo_ioctl <-blkdev_ioctl
         losetup-1643  [000] .... 1111947.473934: lo_release <-__blkdev_put
         losetup-1643  [000] .... 1111947.473956: lo_open <-__blkdev_get
         losetup-1643  [000] .... 1111947.473982: lo_ioctl <-blkdev_ioctl
           blkid-1644  [000] .... 1111947.475689: lo_open <-__blkdev_get
           blkid-1644  [000] .... 1111947.475784: lo_ioctl <-blkdev_ioctl
         losetup-1643  [000] .... 1111947.475815: lo_ioctl <-blkdev_ioctl
         losetup-1643  [000] .... 1111947.475817: lo_release <-__blkdev_put
           blkid-1644  [000] .... 1111947.477862: lo_release <-__blkdev_put
           mount-1646  [000] .... 1111954.428390: lo_open <-__blkdev_get
           mount-1646  [000] .... 1111954.428487: lo_ioctl <-blkdev_ioctl
           mount-1646  [000] .... 1111954.430211: lo_release <-__blkdev_put
           mount-1646  [000] .... 1111954.430378: lo_open <-__blkdev_get
          umount-1647  [000] .... 1111961.795576: lo_release <-__blkdev_put
         losetup-1649  [000] .... 1111964.790946: lo_open <-__blkdev_get
         losetup-1649  [000] .... 1111964.790952: lo_ioctl <-blkdev_ioctl
           blkid-1650  [000] .... 1111964.792785: lo_open <-__blkdev_get
           blkid-1650  [000] .... 1111964.792893: lo_ioctl <-blkdev_ioctl
           blkid-1650  [000] .... 1111964.792993: lo_release <-__blkdev_put
         losetup-1649  [000] .... 1111964.793740: lo_release <-__blkdev_put

desktop-----

# tracer: function
#
# entries-in-buffer/entries-written: 29/29   #P:1
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
         losetup-1957  [000] .... 1114652.185868: lo_open <-__blkdev_get
         losetup-1957  [000] .... 1114652.185875: lo_ioctl <-blkdev_ioctl
         losetup-1957  [000] .... 1114652.185878: lo_release <-__blkdev_put
         losetup-1957  [000] .... 1114652.185895: lo_open <-__blkdev_get
         losetup-1957  [000] .... 1114652.185912: lo_ioctl <-blkdev_ioctl
           blkid-1959  [000] .... 1114652.187623: lo_open <-__blkdev_get
         losetup-1957  [000] .... 1114652.187675: lo_ioctl <-blkdev_ioctl
         losetup-1957  [000] .... 1114652.187696: lo_release <-__blkdev_put
           blkid-1959  [000] .... 1114652.188604: lo_ioctl <-blkdev_ioctl
       mkfs.vfat-1960  [000] .... 1114652.189691: lo_open <-__blkdev_get
       mkfs.vfat-1960  [000] .... 1114652.189696: lo_release <-__blkdev_put
       mkfs.vfat-1960  [000] .... 1114652.190092: lo_open <-__blkdev_get
       mkfs.vfat-1960  [000] .... 1114652.190430: lo_release <-__blkdev_put
           blkid-1959  [000] .... 1114652.194553: lo_release <-__blkdev_put
  udisks-part-id-1963  [000] .... 1114652.196384: lo_open <-__blkdev_get
  udisks-part-id-1963  [000] .... 1114652.196436: lo_release <-__blkdev_put
           mount-1962  [000] .... 1114652.196599: lo_open <-__blkdev_get
           mount-1962  [000] .... 1114652.196674: lo_ioctl <-blkdev_ioctl
   udisks-daemon-1334  [000] .... 1114652.197167: lo_open <-__blkdev_get
   udisks-daemon-1334  [000] .... 1114652.197168: lo_ioctl <-blkdev_ioctl
   udisks-daemon-1334  [000] .... 1114652.197173: lo_release <-__blkdev_put
           mount-1962  [000] .... 1114652.198115: lo_release <-__blkdev_put
           mount-1962  [000] .... 1114652.213672: lo_open <-__blkdev_get
   udisks-daemon-1334  [000] .... 1114652.215193: lo_open <-__blkdev_get
   udisks-daemon-1334  [000] .... 1114652.215195: lo_ioctl <-blkdev_ioctl
   udisks-daemon-1334  [000] .... 1114652.215199: lo_release <-__blkdev_put
   udisks-daemon-1334  [000] .... 1114652.262227: lo_open <-__blkdev_get
   udisks-daemon-1334  [000] .... 1114652.262230: lo_ioctl <-blkdev_ioctl
   udisks-daemon-1334  [000] .... 1114652.262235: lo_release <-__blkdev_put
-----

udisks is present in the desktop version.

Also there is an unbalanced lo_open call in mount-1962.

I've run the test with a 'sleep 5' and a pgrep for mount before exiting.  The mount process appears to have terminated before tracing stopped, yet I can't capture an lo_release from it.
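
Spotting an unbalanced open by eye is error-prone, so the trace can be tallied mechanically. A minimal awk sketch (lo_balance is a hypothetical name) that counts lo_open as +1 and lo_release as -1 per TASK-PID and overall; it assumes the function-tracer line layout shown above, where the function name is the fifth whitespace-separated field. A per-task imbalance is expected for mount, whose open is normally released later by umount (mount-1646 and umount-1647 in the minimal trace), so the TOTAL line is the figure to watch:

```shell
# Hypothetical helper: tally lo_open (+1) against lo_release (-1) in ftrace
# "function" tracer output read from the given files (or stdin).
# Prints "TASK-PID NET" for every task whose count is non-zero, sorted,
# followed by "TOTAL NET" for the whole trace.
lo_balance() {
    awk '
        /^#/ { next }                                  # skip tracer header
        $5 == "lo_open"    { net[$1]++; total++ }
        $5 == "lo_release" { net[$1]--; total-- }
        END {
            for (t in net)
                if (net[t] != 0) printf "%s %d\n", t, net[t] | "sort"
            close("sort")                              # flush sorted lines first
            printf "TOTAL %d\n", total
        }
    ' "$@"
}
```

'lo_balance trace.txt' prints one line per task whose opens and releases don't cancel out, followed by the overall total; a positive total after umount has returned suggests that a reference was never dropped.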

Comment 17 Rob Woolley 2012-04-03 03:12:16 UTC

(I should have said umount not mount in my last comment.)

Regardless here are updated traces.  This time both are with the Fedora Desktop.  The first is from a virtual terminal before logging in to gdm.  The second is from within the GNOME Desktop with udisks running.

--- before login ---

          umount-1361  [000] .N.. 1136923.228319: release_mounts <-sys_umount
          umount-1361  [000] .... 1136923.228676: mntput <-release_mounts
          umount-1361  [000] .... 1136923.228677: mntput_no_expire <-mntput
          umount-1361  [000] .... 1136923.228677: mntput <-release_mounts
          umount-1361  [000] .... 1136923.228677: mntput_no_expire <-mntput
          umount-1361  [000] .... 1136923.228678: mntput_no_expire <-sys_umount
          umount-1361  [000] .... 1136923.228682: deactivate_super <-mntput_no_expire
          umount-1361  [000] .... 1136923.228683: deactivate_locked_super <-deactivate_super
          umount-1361  [000] .... 1136923.228683: kill_block_super <-deactivate_locked_super
          umount-1361  [000] .... 1136923.228708: blkdev_put <-kill_block_super
          umount-1361  [000] .... 1136923.228709: __blkdev_put <-blkdev_put
          umount-1361  [000] .... 1136923.228727: lo_release <-__blkdev_put

--- after login ---

          umount-1949  [000] .N.. 1143293.861718: release_mounts <-sys_umount
          umount-1949  [000] .N.. 1143293.861719: mntput <-release_mounts
          umount-1949  [000] .N.. 1143293.861719: mntput_no_expire <-mntput
          umount-1949  [000] .N.. 1143293.861719: mntput <-release_mounts
          umount-1949  [000] .N.. 1143293.861720: mntput_no_expire <-mntput
          umount-1949  [000] .N.. 1143293.861724: deactivate_super <-mntput_no_expire

---

I'm unfamiliar with the details of the VFS internals, but it appears as if the struct super_block's s_active count is still elevated.  This seems to prevent deactivate_super from calling deactivate_locked_super.

I suspect that the polling by udisks and gvfs is keeping it active somehow but the exact cause is unclear.  My attempts to use the --inhibit and --inhibit-all-polling parameters of udisks didn't reveal anything.

Comment 18 Rob Woolley 2012-04-03 17:21:25 UTC

I tracked this down further.

I used kill -STOP to freeze udisks, and even killed udisks-daemon entirely.

In my case it turned out to be cups, which was started when I logged into GNOME.

As far as I can see, this bug is a duplicate of 808121 - cupsd interferes with loop devices

https://bugzilla.redhat.com/show_bug.cgi?id=808121

I was able to make the problem disappear by using:
systemctl stop cups.service

This doesn't help existing loop devices that are stuck, but it prevents the problem from occurring.
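
Until a fix is available, a script that creates images (a live CD build, for instance) could refuse to start while cups is running. A hypothetical pre-flight check using 'systemctl is-active --quiet', which exits with 0 only for an active unit:

```shell
# Hypothetical pre-flight check: fail (status 1) while cups.service is
# active, since stopping it avoided the stuck loop devices reported above.
check_cups_stopped() {
    if systemctl is-active --quiet cups.service; then
        echo "cups.service is active; loop devices may stay busy" >&2
        echo "workaround: systemctl stop cups.service" >&2
        return 1
    fi
    return 0
}
```

For example, 'check_cups_stopped || exit 1' at the top of the build script.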

Comment 19 Milan Broz 2012-04-12 18:48:59 UTC

Please also see bug #808795 ...

Comment 20 Milan Broz 2012-04-12 20:03:08 UTC

Please try https://bugzilla.redhat.com/show_bug.cgi?id=808795#c20
and close it as duplicate if it helps, thanks.

Comment 21 Warren Togami 2012-04-13 00:26:30 UTC

Tested: this bug is not solved by your systemd -18.1.

Comment 22 Milan Broz 2012-04-13 20:49:00 UTC

And disabling sandbox service + reboot helps? See
https://bugzilla.redhat.com/show_bug.cgi?id=808795#c31

Comment 23 Sylvain Petreolle 2012-04-15 16:03:53 UTC

(In reply to comment #22)
> And disabling sandbox service + reboot helps? See
> https://bugzilla.redhat.com/show_bug.cgi?id=808795#c31

Disabling sandbox & rebooting fixes the problem here.

Comment 24 Milan Broz 2012-04-15 17:44:50 UTC


*** This bug has been marked as a duplicate of bug 808795 ***

Comment 25 Vaclav Kocian 2012-05-27 18:12:58 UTC

Hello,

I'm facing this problem too. I use mount -o loop to edit ISO images of corporate install CDs. When I repeatedly mount and unmount xxx.iso on folder y, the loop devices are not detached, so this procedure eats up all my /dev/loop[n] devices and I have to reboot to free them again. losetup -d /dev/loop[n] always says the device is busy after umount. In fact, my file manager lists all those "phantom" devices. It is quite an annoying bug.

Comment 26 Vaclav Kocian 2012-05-27 18:57:34 UTC

'chkconfig sandbox off' magically fixed my problem. I don't know exactly what the sandbox is, but it does me no good.

Comment 27 Andrei ILIE 2012-11-27 15:09:23 UTC

Confirming this bug against 64bit Fedora 17 (kernel-v3.6.7 / systemd-v44)...


# kpartx -a disk-image.dd

# mount /dev/mapper/loop0p2 /mnt/P2

    <work, work, work>

# umount /mnt/P2

# losetup --detach-all
losetup: /dev/loop0: detach failed: Device or resource busy
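
With kpartx there is a second reason for EBUSY that is worth ruling out first: the partition mappings (/dev/mapper/loop0p1, loop0p2, ...) are device-mapper devices stacked on /dev/loop0, and they keep it open for as long as they exist. A sketch of a teardown in reverse order, assuming the image and mount point from the steps above (teardown_kpartx_image is a hypothetical name; whether 'kpartx -d' already frees the loop device may depend on the kpartx version, so the function checks with 'losetup -j' afterwards):

```shell
# Hypothetical teardown for an image set up with 'kpartx -a IMAGE'.
teardown_kpartx_image() {
    local img="$1" mnt="$2" dev
    umount "$mnt" || return 1
    # Remove the /dev/mapper/loopNpM mappings; they hold the loop device open.
    kpartx -d "$img" || return 1
    # Detach any loop device still bound to the image (the first
    # colon-separated field of 'losetup -j' output is the device name).
    for dev in $(losetup -j "$img" | cut -d: -f1); do
        losetup -d "$dev" || return 1
    done
}
```

For example, 'teardown_kpartx_image disk-image.dd /mnt/P2'; if losetup -d still reports the device as busy after this, the mappings are not the cause.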
