Description of problem:
Loop devices (/dev/loop[0-X]) can't be detached after they have been unmounted; they are reported to be busy.

Version-Release number of selected component (if applicable):
kernel-3.1.2-1.fc16.x86_64 (kernel-PAE-2.6.40.6-0.fc15.i686 in F15 has the problem too)

How reproducible:
always

Steps to Reproduce:
dd if=/dev/zero bs=8192 count=128 of=disk.img
mkfs -t ext2 -F disk.img
losetup /dev/loop0 disk.img
mount /dev/loop0 /mnt
losetup -d /dev/loop0

Actual results:
loop: can't delete device /dev/loop0: Device or resource busy

Expected results:
The "losetup -d" command completes successfully, nothing is reported, and the loop device is detached.

Additional info:
- The problem really seems to be in the mount call: before the loop device is mounted, it can be detached without issues.
- The problem does not seem to be specific to any particular filesystem present on the loop device; I originally noticed it when mounting NTFS over fuse/ntfs-3g, and was able to reproduce it with a vfat filesystem as well.
- Using "mount -o loop disk.img /mnt" has the same problem, and it is worse in that the failure to detach the loop device upon umount is not reported (not even by a non-zero exit code from the umount command).
Oops, I forgot to include "umount" in the instructions (which is the thing that makes it all so bad) - the correct steps to reproduce are:

dd if=/dev/zero bs=8192 count=128 of=disk.img
mkfs -t ext2 -F disk.img
losetup /dev/loop0 disk.img
mount /dev/loop0 /mnt
umount /mnt
losetup -d /dev/loop0
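For convenience, the corrected steps can be wrapped in a small script. This is only a sketch based on the commands above: /dev/loop0 and /mnt are the device and mountpoint names used in this report, and the function must run as root.

```shell
#!/bin/sh
# Sketch of the reproducer above; call "reproduce" as root.
# /dev/loop0 and /mnt are the names from the report.
reproduce() {
    dd if=/dev/zero bs=8192 count=128 of=disk.img &&
    mkfs -t ext2 -F disk.img &&
    losetup /dev/loop0 disk.img &&
    mount /dev/loop0 /mnt &&
    umount /mnt &&
    losetup -d /dev/loop0   # the step that fails with EBUSY on affected kernels
}
```

On an affected kernel, everything succeeds except the final losetup -d.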
It works for me:

# dd if=/dev/zero bs=8192 count=128 of=disk.img
...
# mkfs -t ext2 -F disk.img
...
# losetup /dev/loop0 disk.img
# mount /dev/loop0 /mnt
# umount /mnt
# losetup -a
/dev/loop0: [fd02]:5581 (/root/disk.img)
# losetup -d /dev/loop0
# losetup -a
#

and

# mount -o loop disk.img /mnt
# losetup -a
/dev/loop0: [fd02]:5581 (/root/disk.img)
# umount /mnt
# losetup -a
#

Your issue could perhaps be caused by some kind of indexing or GUI automount that keeps a reference to the device so it can't be deleted.
Thanks for the quick reply, but unfortunately it doesn't seem to be that simple. Here are steps to reproduce the problem in an almost shut-down system with only a handful of processes and services running (see the attached files for the lists of running processes and active systemd units at the time of testing):

Boot the system in the standard way, then stop all systemd units (via 'systemctl stop ...') which don't have matching entries in the attached list of active units on my system at the time of testing. Additionally, kill any user-space processes which don't have matching entries in the attached list of processes running on my machine at the time of testing. Then execute:

dd if=/dev/zero bs=8192 count=128 of=disk.img
mkfs -t ext2 -F disk.img
mkdir a b
mount --bind a b
losetup /dev/loop0 disk.img
mount /dev/loop0 /mnt
umount /mnt
losetup -d /dev/loop0

ADDITIONAL INFO
- The problem seems to be triggered by the sequence of a "--bind" mount followed by a loop device mount. (Note that the root filesystem on my machine where all this happens is ext4; I don't know if this is relevant to the problem though.)
- It is not possible to find out which process is holding the reference to the affected loop device by using lsof and fuser (the commands simply can't find any references to the affected loop device).
- The problem on my machine is apparently caused by the name server (bind-9.8.1-4.P1.fc16.x86_64), which uses "--bind" mounts to set up its chroot environment.
- I was not able to reproduce the problem in recovery mode, even though there were actually more processes running than in the environment in which it fails for me.
- Please also note the processes (kernel threads) towards the end of the attached process list file:

root      2204     2  0 20:45 ?        00:00:00 [loop0]
root      2206     2  0 20:45 ?        00:00:00 [ext4-dio-unwrit]
root      2211     2  0 20:45 ?        00:00:00 [flush-7:0]
root      2233     2  0 20:45 ?        00:00:00 [loop1]
root      2235     2  0 20:45 ?        00:00:00 [ext4-dio-unwrit]
root      2240     2  0 20:45 ?        00:00:00 [flush-7:1]
root      2336     2  0 20:48 ?        00:00:00 [loop2]
root      2338     2  0 20:48 ?        00:00:00 [ext4-dio-unwrit]
root      2343     2  0 20:48 ?        00:00:00 [flush-7:2]
root      2351     2  0 20:49 ?        00:00:00 [loop3]
root      2353     2  0 20:49 ?        00:00:00 [ext4-dio-unwrit]
root      2358     2  0 20:49 ?        00:00:00 [flush-7:3]

as they appear to be related to the problem I'm seeing.
Created attachment 538358 [details] process-list.txt list of processes running on my machine at the time of testing
Created attachment 538359 [details] unit-list.txt list of active systemd units on my machine at the time of testing
Confirmed. Weird!
I also see this when the system is running [1] but have not yet found a reliable way to reproduce it. Many times a simple 'umount /tmp/a' [2] does work; other times I end up with many duplicate entries in the output of 'mount' and eventually run out of free loop devices with the default setting. Since I upped that to max_loop=64 I no longer run out of loop devices, but right now 'mount | wc -l' gave me 196638 and the machine is extremely sluggish on mount or umount (about 10 minutes per umount, and the load temporarily goes up to about 40). I'm not sure if the sluggishness is the same bug or a different one. I'll be happy to pull logs for you, but it will probably be after Jan 1st.

[1] One thing I do multiple times on one day in nearly every week is:
1) mount -o loop,ro /some/path/some-old-ISO-image.iso /tmp/a
2) jigdo-lite /some/path/some-newer-ISO-image.jigdo
3) let jigdo pull from the old ISO the files that have not changed since last week, then download the new files and generate me some-newer-ISO-image.iso
4) umount /tmp/a
5) rinse, wash, repeat
Because of what ISOs I assemble, I'll be running some runs of steps 1-4 while another set is in progress (e.g. start processing an i386 ISO and, while that is at step 3, kick off the x86_64 ISO).
[2] I understand in F16 the '-d' flag to umount is no longer needed; please tell me if that impression is wrong.
[mass update] kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository. Please retest with this update.
I've just run into this when trying to create a livecd on 3.3.0-4.fc16. I'm able to reliably reproduce it with something as simple as:

# dd if=/dev/zero of=test bs=1M count=100
# losetup /dev/loop0 test
# mkfs.ext4 /dev/loop0
# mount /dev/loop0 /mnt
# umount /mnt
# losetup -d /dev/loop0
loop: can't delete device /dev/loop0: Device or resource busy
Ok, I've spent the last few hours looking at this and I'm completely lost. To give some context, I'm trying to use livecd-creator to create a minimal Fedora install. I'm doing this on one of our lab computers, which is running Fedora 16 x86_64, kernel 3.3.0-4.fc16. Every time I mount a loopback filesystem and then unmount it, I cannot detach the loop device. As mentioned in comment #3, the kernel threads for the filesystem keep running even after the unmount.

The choice of filesystem doesn't have any effect; I run into the same problem with btrfs. Switching to a different lab computer doesn't fix the problem. Switching to Fedora's 3.2.9 kernel doesn't fix the problem. Using my laptop, which is running Fedora 16, kernel 3.3.0-4.fc16, *does* work. Disabling services on the lab machine so it matches what's running on my laptop doesn't fix the problem. Enabling selinux doesn't fix the problem.

Any suggestions on how to troubleshoot this?
As a workaround, try to add some sleeps before the commands that fail. It mostly works for me on f17, but this:

dd if=/dev/zero bs=8192 count=128 of=disk.img
mkfs -t ext2 -F disk.img
mkdir a b
mount --bind a b   # I don't know if that makes a difference
while true
do
    losetup /dev/loop0 disk.img
    mount /dev/loop0 /mnt
    umount /mnt
    losetup -d /dev/loop0
done

will occasionally fail with:

+ losetup /dev/loop0 disk.img
+ mount /dev/loop0 /mnt
mount: you must specify the filesystem type

or:

+ losetup /dev/loop0 disk.img
+ mount /dev/loop0 /mnt
+ umount /mnt
+ losetup -d /dev/loop0
losetup: /dev/loop0: detach failed: Device or resource busy

I guess it is timing related and influenced by other processes watching and using the devices. It seems like losetup is more async than I would expect. I guess it takes a kernel or util-linux hacker to debug this ... and they should be able to reproduce it.
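The sleep workaround suggested above can be generalized into a small retry helper. This is a sketch: the attempt count and delay are arbitrary, the "retry" name is made up here, and detaching still needs root.

```shell
#!/bin/sh
# Sketch: retry a command with a delay between attempts, as a
# workaround for the apparently timing-related detach failures.
# retry <attempts> <delay-seconds> <command> [args...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while ! "$@"; do
        i=$((i + 1))
        [ "$i" -ge "$attempts" ] && return 1
        sleep "$delay"
    done
    return 0
}
# Usage (as root): retry 5 1 losetup -d /dev/loop0
```

This doesn't fix anything, but if the busy reference really is transient it gives whatever is holding the device time to drop it.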
In my case, no amount of waiting or killing programs seems to have any effect. On one lab system, I ended up killing almost all of the non-kernel processes and was still unable to detach the loop device. :( In my case, I'm not doing any bind mounts that I'm aware of. For now I've generated the Live CD using my laptop.
I ran into this problem as well. I've been using the "mount --bind a b" reproducer mentioned above; it seems to aggravate the problem and increase the chances of the problem appearing. I've been using these steps:

# setup
truncate --size=100MB /tmp/fat.img
mkfs.vfat /dev/loop0
mkdir /tmp/a /tmp/b
mount --bind /tmp/a /tmp/b

Then using test.sh:
----
#!/bin/bash
mountdir=/tmp/fat
loopdev=$(losetup -f --show fat.img)
mount $loopdev $mountdir
echo "hello" > $mountdir/world.txt
umount $mountdir
losetup -d $loopdev
exit $?
----

When booting into Fedora 16 with init=/bin/bash I cannot get the problem to appear. I also have difficulty reproducing the problem if I avoid logging in to the GUI and instead go to a virtual terminal (CTRL-ALT-F2) and log in as root to conduct the tests.

I also noticed that the bind mount and the loopback mount need to be on the same filesystem for the problem to emerge. If I reproduce the problem on /tmp (where /tmp is part of the root filesystem at /) and then go to a separate mountpoint /home/username/, I can successfully loopback mount and umount.

I just created a minimal Fedora installation with virt-install and could not reproduce the problem there.
So far the big difference between my Fedora minimal and Fedora desktop virtual machines seems to be the following. Output is from ftrace with filtering on lo_open, lo_ioctl, and lo_release:

minimal-----
losetup-1643        [000] .... 1111947.473922: lo_open <-__blkdev_get
losetup-1643        [000] .... 1111947.473930: lo_ioctl <-blkdev_ioctl
losetup-1643        [000] .... 1111947.473934: lo_release <-__blkdev_put
losetup-1643        [000] .... 1111947.473956: lo_open <-__blkdev_get
losetup-1643        [000] .... 1111947.473982: lo_ioctl <-blkdev_ioctl
blkid-1644          [000] .... 1111947.475689: lo_open <-__blkdev_get
blkid-1644          [000] .... 1111947.475784: lo_ioctl <-blkdev_ioctl
losetup-1643        [000] .... 1111947.475815: lo_ioctl <-blkdev_ioctl
losetup-1643        [000] .... 1111947.475817: lo_release <-__blkdev_put
blkid-1644          [000] .... 1111947.477862: lo_release <-__blkdev_put
mount-1646          [000] .... 1111954.428390: lo_open <-__blkdev_get
mount-1646          [000] .... 1111954.428487: lo_ioctl <-blkdev_ioctl
mount-1646          [000] .... 1111954.430211: lo_release <-__blkdev_put
mount-1646          [000] .... 1111954.430378: lo_open <-__blkdev_get
umount-1647         [000] .... 1111961.795576: lo_release <-__blkdev_put
losetup-1649        [000] .... 1111964.790946: lo_open <-__blkdev_get
losetup-1649        [000] .... 1111964.790952: lo_ioctl <-blkdev_ioctl
blkid-1650          [000] .... 1111964.792785: lo_open <-__blkdev_get
blkid-1650          [000] .... 1111964.792893: lo_ioctl <-blkdev_ioctl
blkid-1650          [000] .... 1111964.792993: lo_release <-__blkdev_put
losetup-1649        [000] .... 1111964.793740: lo_release <-__blkdev_put

desktop-----
# tracer: function
#
# entries-in-buffer/entries-written: 29/29   #P:1
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
losetup-1957        [000] .... 1114652.185868: lo_open <-__blkdev_get
losetup-1957        [000] .... 1114652.185875: lo_ioctl <-blkdev_ioctl
losetup-1957        [000] .... 1114652.185878: lo_release <-__blkdev_put
losetup-1957        [000] .... 1114652.185895: lo_open <-__blkdev_get
losetup-1957        [000] .... 1114652.185912: lo_ioctl <-blkdev_ioctl
blkid-1959          [000] .... 1114652.187623: lo_open <-__blkdev_get
losetup-1957        [000] .... 1114652.187675: lo_ioctl <-blkdev_ioctl
losetup-1957        [000] .... 1114652.187696: lo_release <-__blkdev_put
blkid-1959          [000] .... 1114652.188604: lo_ioctl <-blkdev_ioctl
mkfs.vfat-1960      [000] .... 1114652.189691: lo_open <-__blkdev_get
mkfs.vfat-1960      [000] .... 1114652.189696: lo_release <-__blkdev_put
mkfs.vfat-1960      [000] .... 1114652.190092: lo_open <-__blkdev_get
mkfs.vfat-1960      [000] .... 1114652.190430: lo_release <-__blkdev_put
blkid-1959          [000] .... 1114652.194553: lo_release <-__blkdev_put
udisks-part-id-1963 [000] .... 1114652.196384: lo_open <-__blkdev_get
udisks-part-id-1963 [000] .... 1114652.196436: lo_release <-__blkdev_put
mount-1962          [000] .... 1114652.196599: lo_open <-__blkdev_get
mount-1962          [000] .... 1114652.196674: lo_ioctl <-blkdev_ioctl
udisks-daemon-1334  [000] .... 1114652.197167: lo_open <-__blkdev_get
udisks-daemon-1334  [000] .... 1114652.197168: lo_ioctl <-blkdev_ioctl
udisks-daemon-1334  [000] .... 1114652.197173: lo_release <-__blkdev_put
mount-1962          [000] .... 1114652.198115: lo_release <-__blkdev_put
mount-1962          [000] .... 1114652.213672: lo_open <-__blkdev_get
udisks-daemon-1334  [000] .... 1114652.215193: lo_open <-__blkdev_get
udisks-daemon-1334  [000] .... 1114652.215195: lo_ioctl <-blkdev_ioctl
udisks-daemon-1334  [000] .... 1114652.215199: lo_release <-__blkdev_put
udisks-daemon-1334  [000] .... 1114652.262227: lo_open <-__blkdev_get
udisks-daemon-1334  [000] .... 1114652.262230: lo_ioctl <-blkdev_ioctl
udisks-daemon-1334  [000] .... 1114652.262235: lo_release <-__blkdev_put
-----

udisks is present in the desktop version. Also, there is an unbalanced lo_open call in mount-1962. I've run the test with sleep 5 and pgrep for mount before exiting.
The mount process appears to have terminated before tracing stopped. Yet, I can't seem to capture an lo_release from it.
(I should have said umount, not mount, in my last comment.) Regardless, here are updated traces. This time both are with the Fedora Desktop. The first is from a virtual terminal before logging in to gdm. The second is from within the GNOME Desktop with udisks running.

--- before login ---
umount-1361 [000] .N.. 1136923.228319: release_mounts <-sys_umount
umount-1361 [000] .... 1136923.228676: mntput <-release_mounts
umount-1361 [000] .... 1136923.228677: mntput_no_expire <-mntput
umount-1361 [000] .... 1136923.228677: mntput <-release_mounts
umount-1361 [000] .... 1136923.228677: mntput_no_expire <-mntput
umount-1361 [000] .... 1136923.228678: mntput_no_expire <-sys_umount
umount-1361 [000] .... 1136923.228682: deactivate_super <-mntput_no_expire
umount-1361 [000] .... 1136923.228683: deactivate_locked_super <-deactivate_super
umount-1361 [000] .... 1136923.228683: kill_block_super <-deactivate_locked_super
umount-1361 [000] .... 1136923.228708: blkdev_put <-kill_block_super
umount-1361 [000] .... 1136923.228709: __blkdev_put <-blkdev_put
umount-1361 [000] .... 1136923.228727: lo_release <-__blkdev_put

--- after login ---
umount-1949 [000] .N.. 1143293.861718: release_mounts <-sys_umount
umount-1949 [000] .N.. 1143293.861719: mntput <-release_mounts
umount-1949 [000] .N.. 1143293.861719: mntput_no_expire <-mntput
umount-1949 [000] .N.. 1143293.861719: mntput <-release_mounts
umount-1949 [000] .N.. 1143293.861720: mntput_no_expire <-mntput
umount-1949 [000] .N.. 1143293.861724: deactivate_super <-mntput_no_expire
---

I'm unfamiliar with the details of the VFS internals, but it appears as if the struct super_block has s_active set. This seems to prevent deactivate_super from calling deactivate_locked_super. I suspect that the polling by udisks and gvfs is keeping it active somehow, but the exact cause is unclear. My attempts to use the --inhibit and --inhibit-all-polling parameters of udisks didn't reveal anything.
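For anyone wanting to gather comparable data, traces like the ones above can be captured with the kernel's function tracer. This is a sketch: it assumes debugfs is mounted at /sys/kernel/debug, must run as root, and the lo_* filter shown matches the earlier lo_open/lo_ioctl/lo_release traces (the VFS traces in this comment would need a different filter list such as release_mounts, mntput, and deactivate_super).

```shell
#!/bin/sh
# Sketch: capture ftrace function-tracer output for the loop-device
# entry points, assuming debugfs at /sys/kernel/debug (run as root).
TRACING=/sys/kernel/debug/tracing

trace_loop_fns() {
    echo function > "$TRACING/current_tracer"
    # restrict output to the functions of interest
    echo 'lo_open lo_ioctl lo_release' > "$TRACING/set_ftrace_filter"
    echo 1 > "$TRACING/tracing_on"
    "$@"                        # the commands to trace
    echo 0 > "$TRACING/tracing_on"
    cat "$TRACING/trace"
}
# Usage: trace_loop_fns sh test.sh
```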
I tracked this down further. I used kill -STOP to freeze udisks. Even killed udisks-daemon entirely. In my case it turned out to be cups, which was started when I logged into GNOME. As far as I can see, this bug is a duplicate of bug 808121 - cupsd interferes with loop devices: https://bugzilla.redhat.com/show_bug.cgi?id=808121

I was able to make the problem disappear by using:

systemctl stop cups.service

This doesn't help existing loop devices that are stuck, but it prevents the problem from occurring.
Please also see bug #808795 ...
Please try https://bugzilla.redhat.com/show_bug.cgi?id=808795#c20 and close it as duplicate if it helps, thanks.
Tested: this bug is not solved with your systemd-18.1.
And disabling sandbox service + reboot helps? See https://bugzilla.redhat.com/show_bug.cgi?id=808795#c31
(In reply to comment #22) > And disabling sandbox service + reboot helps? See > https://bugzilla.redhat.com/show_bug.cgi?id=808795#c31 Disabling sandbox & rebooting fixes the problem here.
*** This bug has been marked as a duplicate of bug 808795 ***
Hello, I'm faced with this problem too. I use mount -o loop to edit ISO images of corporate install CDs. When I repeatedly mount/umount xxx.iso to folder y, loop devices are not detached, so this procedure eats all my /dev/loop[n] devices and I have to restart to make them free again. losetup -d /dev/loop[n] always says it's busy after umount. In fact, in my file manager I can see all those "phantoms" listed. It is quite an annoying bug.
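To count the leaked devices without rebooting, the output of "losetup -a" can be parsed. A sketch; "list_attached_loops" is a name made up here, and it reads the listing from stdin so a saved copy of the output works too.

```shell
#!/bin/sh
# Sketch: extract the device names from "losetup -a" output lines of
# the form "/dev/loop0: [fd02]:5581 (/root/disk.img)".
list_attached_loops() {
    sed -n 's/^\(\/dev\/loop[0-9][0-9]*\):.*/\1/p'
}
# Usage (as root): losetup -a | list_attached_loops | wc -l
```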
'chkconfig sandbox off' magically fixed my problem. I do not exactly know what the sandbox is, but it does not do me any good.
Confirming this bug against 64-bit Fedora 17 (kernel-v3.6.7 / systemd-v44)...

# kpartx -a disk-image.dd
# mount /dev/mapper/loop0p2 /mnt/P2
<work, work, work>
# umount /mnt/P2
# losetup --detach-all
losetup: /dev/loop0: detach failed: Device or resource busy
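Note that with kpartx the device-mapper partition mappings have to be removed before the underlying loop device can go away. A sketch of the teardown, run as root; "teardown_image" is a name made up here.

```shell
#!/bin/sh
# Sketch: tear down a kpartx-attached image (run as root).
teardown_image() {
    img=$1
    # remove the /dev/mapper/loopNpM partition mappings first
    kpartx -d "$img"
    # then detach any loop device still bound to the image file
    losetup -j "$img" | cut -d: -f1 | while read -r dev; do
        losetup -d "$dev"
    done
}
# Usage: teardown_image disk-image.dd
```

On a kernel with this bug, the final losetup -d may still report "Device or resource busy" for the reasons discussed above.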