Created attachment 574250 [details]
Test script that exercises the create/mount/unmount/remove process

Description of problem:
I first noticed something wrong with rsnapshot when backups using LVM snapshots stopped working because the LVM snapshots could not be removed. If a user is logged in to Gnome Shell when an LVM snapshot is mounted, something (I suspect Gnome Shell) opens a reference to the snapshot's device, which in turn prevents that snapshot from being removed until the system is rebooted, even if the user logs out. Snapshots that are created and mounted without a user logged in to Gnome Shell can be removed mostly without issue, occasionally hitting bug #753105.

Version-Release number of selected component (if applicable):
lvm2-2.02.86-6.fc16.x86_64
nautilus-3.2.1-2.fc16.x86_64

How reproducible:
Every time, as long as a user is logged in to Gnome Shell.

Steps to Reproduce:
1. Log in to Gnome Shell
2. Create an LVM snapshot
3. Mount the snapshot
4. Unmount the snapshot
5. Attempt to remove the LVM snapshot; it fails until the system is rebooted.

Actual results:
The following error, repeated until reboot:

Can't remove open logical volume "snaptest"

Expected results:
The snapshot should be removable even while a user is logged in to Gnome Shell.

Additional info:
I created a script that iterates through the process of creating a snapshot, mounting the file system, unmounting the file system, and removing the snapshot. This script runs without errors (other than occasionally hitting bug #753105) if no user is logged in to Gnome Shell, but fails immediately if a user is logged in to Gnome Shell. The script is attached (or will be shortly if I'm unable to attach it right now...).

The problem has been reproduced on two separate systems, one x86_64 and one i686.
I have four systems running F16. After F16 updates for all four systems on March 24, 2012, I found a problem with two of the systems: when an LVM snapshot was created, mounted, and then umounted, the snapshot could not be removed without rebooting the system. Before the updates, daily backups using snapshots were done on the systems without problems (with the time, remove and loop workaround for the occasional failure in the backup script).

On or about March 25 I did the following testing: I removed the disks from one of the systems with the problem and used it for a test system. I installed F16 from DVD with no other repositories selected. After the install and before doing updates I could create, mount, umount and remove snapshots normally. After applying the updates I could not remove a snapshot until after a reboot. I added another disk and created more pv's and vg's and found that I could not remove ANY logical volume after it was mounted and then umounted. SEE the following:

[root@redwood ~]# gdisk /dev/sdb
GPT fdisk (gdisk) version 0.8.2

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.

Command (? for help): p
Disk /dev/sdb: 156247887 sectors, 74.5 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): BA084048-2BED-4B6C-966E-14EF98DDC2FE
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 156247853
Partitions will be aligned on 2048-sector boundaries
Total free space is 107093772 sectors (51.1 GiB)

Number  Start (sector)    End (sector)  Size        Code  Name
   1            2048            4095   1024.0 KiB  EF02  BIOS boot partition
   2            4096        16388095   7.8 GiB     8E00  Linux LVM
   3        16388096        32772095   7.8 GiB     8E00  Linux LVM
   4        32772096        49156095   7.8 GiB     8E00  Linux LVM

Command (? for help): quit

[root@redwood ~]# pvcreate /dev/sdb4
  Writing physical volume data to disk "/dev/sdb4"
  Physical volume "/dev/sdb4" successfully created
[root@redwood ~]# vgcreate vg_deadend /dev/sdb4
  Volume group "vg_deadend" successfully created
[root@redwood ~]# pvscan
  PV /dev/sdb4   VG vg_deadend   lvm2 [7.81 GiB / 7.81 GiB free]
  PV /dev/sdb2   VG vg_test      lvm2 [7.81 GiB / 7.81 GiB free]
  PV /dev/mapper/luks-d21d3e5a-7b30-4ef2-86c1-1ecedaaf45dd   VG vg_redwood   lvm2 [74.00 GiB / 19.75 GiB free]
  Total: 3 [89.62 GiB] / in use: 3 [89.62 GiB] / in no VG: 0 [0 ]
[root@redwood ~]# vgscan
  Reading all physical volumes. This may take a while...
  Found volume group "vg_deadend" using metadata type lvm2
  Found volume group "vg_test" using metadata type lvm2
  Found volume group "vg_redwood" using metadata type lvm2
[root@redwood ~]# lvcreate -L1G -n lv_deadend vg_deadend
  Logical volume "lv_deadend" created
[root@redwood ~]# lvscan
  ACTIVE            '/dev/vg_deadend/lv_deadend' [1.00 GiB] inherit
  ACTIVE            '/dev/vg_redwood/lv_home' [14.66 GiB] inherit
  ACTIVE            '/dev/vg_redwood/lv_swap' [5.38 GiB] inherit
  ACTIVE            '/dev/vg_redwood/lv_root' [34.22 GiB] inherit
[root@redwood ~]# ls -l /dev/mapper
total 0
crw-------. 1 root root 10, 236 Mar 28 15:03 control
lrwxrwxrwx. 1 root root       7 Mar 28 15:03 luks-d21d3e5a-7b30-4ef2-86c1-1ecedaaf45dd -> ../dm-0
lrwxrwxrwx. 1 root root       7 Mar 28 15:32 vg_deadend-lv_deadend -> ../dm-4
lrwxrwxrwx. 1 root root       7 Mar 28 15:03 vg_redwood-lv_home -> ../dm-3
lrwxrwxrwx. 1 root root       7 Mar 28 15:03 vg_redwood-lv_root -> ../dm-2
lrwxrwxrwx. 1 root root       7 Mar 28 15:03 vg_redwood-lv_swap -> ../dm-1
[root@redwood ~]# lvdisplay /dev/mapper/vg_deadend-lv_deadend
  --- Logical volume ---
  LV Name                /dev/vg_deadend/lv_deadend
  VG Name                vg_deadend
  LV UUID                icd06q-Vdwr-kUyo-QHMK-2hDx-HJyT-XqIZBA
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                1.00 GiB
  Current LE             256
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:4
[root@redwood ~]# mkfs.ext4 /dev/mapper/vg_deadend-lv_deadend
mke2fs 1.41.14 (22-Dec-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
65536 inodes, 262144 blocks
13107 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=268435456
8 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376

Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 25 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
[root@redwood ~]# mount /dev/mapper/vg_deadend-lv_deadend /ZZZ
[root@redwood ~]# lvdisplay /dev/mapper/vg_deadend-lv_deadend
  --- Logical volume ---
  LV Name                /dev/vg_deadend/lv_deadend
  VG Name                vg_deadend
  LV UUID                icd06q-Vdwr-kUyo-QHMK-2hDx-HJyT-XqIZBA
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                1.00 GiB
  Current LE             256
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:4
[root@redwood ~]# umount /dev/mapper/vg_deadend-lv_deadend
[root@redwood ~]# lvdisplay /dev/mapper/vg_deadend-lv_deadend
  --- Logical volume ---
  LV Name                /dev/vg_deadend/lv_deadend
  VG Name                vg_deadend
  LV UUID                icd06q-Vdwr-kUyo-QHMK-2hDx-HJyT-XqIZBA
  LV Write Access        read/write
  LV Status              available
  # open                 1    <------ NOT RIGHT!
  LV Size                1.00 GiB
  Current LE             256
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:4
[root@redwood ~]# lvremove -f /dev/mapper/vg_deadend-lv_deadend
  Can't remove open logical volume "lv_deadend"

"# open" should be zero after unmounting. The snapshots show the same indication (non-zero "# open") with lvdisplay.

Note the two systems that fail have ATI integrated GPUs while the two that work OK have NVIDIA GPUs, one integrated and one PCIe.

Hardware profiles on or about 2012-03-28:

test system (redwood) - Can't remove lv's without reboot.
Motherboard chipset AMD 780G / SB700
IGP-GPU ATI Radeon 3200
3.3.0-4.fc16.x86_64 #1 SMP Tue Mar 20 18:05:40 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
http://www.smolts.org/client/show/pub_a0908e56-6714-4e4d-a695-772b090c9ae1

my computer (cedar) - Can't remove lv's without reboot.
Motherboard chipset AMD 880G / SB710
IGP-GPU ATI Radeon HD 4250
3.3.0-4.fc16.x86_64 #1 SMP Tue Mar 20 18:05:40 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
http://www.smolts.org/client/show/pub_b37e83ea-1912-42f3-8b67-fe86fb0e3a47

transparent bridge and firewall (kwai) - Works OK.
Motherboard chipset NVIDIA GeForce 6100/nForce 405 IGP
3.3.0-4.fc16.i686 #1 SMP Tue Mar 20 18:45:14 UTC 2012 i686 i686 i386 GNU/Linux
http://www.smolts.org/client/show/pub_322c66df-28e5-4fb3-90f6-14afda3f0c5a

wife's computer (willow) - Works OK.
Motherboard chipset AMD 770 / SB710
PCIe-GPU NVIDIA GeForce 8400 GS
3.3.0-4.fc16.x86_64 #1 SMP Tue Mar 20 18:05:40 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
http://www.smolts.org/client/show/pub_877aee92-b4de-4a3e-b598-0677601fde9d

Testing today, April 10, 2012:

1. The problem affects both snapshot and regular logical volumes.
2. If you boot into single user mode, systems work correctly.
3. If you boot into runlevel 3, failing systems fail.
4. The two systems that "seem" unaffected are older and slower than the two that fail.

Opinion: Since the problem is not affecting all my systems, it appears to be related to bug 712100 and bug 753105.
What process has the snapshot open? (e.g. try 'lsof' or look in /proc to find out) Then transfer this bugzilla to the component that owns that process.
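A minimal sketch of that /proc scan (the VG/LV names are the reporter's; the loop and variable are illustrative only):

DEV=$(readlink -f /dev/vg_rivendell/snaptest)    # resolves to e.g. /dev/dm-4
for p in /proc/[0-9]*; do
    ls -l "$p/fd" 2>/dev/null | grep -qF "$DEV" && echo "holder: $p"
done

If this prints nothing, no process holds the device via an open file descriptor.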
sorry folks - i must say i am FED UP to the eyebrows. More and more we have a bug i have observed and worked around since fc13 (577798), with or without open files (--force should be sufficient) ... when will this be hunted down "upstream" or wherever ???? Overintelligent crapware failing when trying to be too smart everywhere?

greetings
(In reply to comment #2)
> What process has the snapshot open? (e.g. try 'lsof' or look in /proc to find
> out)
> Then transfer this bugzilla to the component that owns that process.

Thanks for the reply, Alasdair. I see no process that has the device open:

[root@rivendell ~]# find /proc -maxdepth 2 -type d -name 'fd' -exec ls -l {} \; | grep "/dev/vg_rivendell/snaptest"
[root@rivendell html]# lsof /dev/vg_rivendell/snaptest
lsof: WARNING: can't stat() fuse.gvfs-fuse-daemon file system /home/flatline/.gvfs
      Output information may be incomplete.

If I'm not looking in the correct place in /proc, can you please let me know where to look so I can find this information and appropriately update this bug?

Again, thanks for your help...

John
Hunt around a bit more.

lsblk
dmsetup ls --tree
ls /sys/block/dm-<minor>/holders
/proc/mounts
/proc/swaps
If you're grepping, match against the major+minor or all possible forms of the name
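Concretely, assuming the snapshot maps to dm-4 (253:4) - check with lvdisplay or dmsetup info first, these values and the VG/LV names below are placeholders from this report - the hunt boils down to:

lsblk
dmsetup ls --tree
ls /sys/block/dm-4/holders
grep -E '253:4|dm-4|vg_rivendell-snaptest|vg_rivendell/snaptest' \
    /proc/mounts /proc/swaps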
The open count shown in lvdisplay seems to be bogus. I cannot find an open device with lsof or any false mounts in /proc that would be related to the open count.

The problem with devices appearing busy after a umount includes REGULAR block devices as well as DM- devices. See bug 810393, which is related. This is a more general bug than just LVM.

The following test was done April 11, 2012, after a reboot:

[root@willow ~]# uname -a
Linux willow.newbarnyard.net 3.3.0-4.fc16.x86_64 #1 SMP Tue Mar 20 18:05:40 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@willow ~]# mkfs.ext2 /dev/sda4
mke2fs 1.41.14 (22-Dec-2010)
.
.
.
This filesystem will be automatically checked every 23 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
[root@willow ~]# mkfs.ext2 /dev/sda4
mke2fs 1.41.14 (22-Dec-2010)
.
.
.
Writing inode tables: done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 37 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

### Can make filesystems repeatedly until mount followed by umount.
### Device is now busy.
### This happens on three systems with 3.3.0-4.fc16.x86_64

[root@willow ~]# mount /dev/sda4 /testmnt
[root@willow ~]# umount /dev/sda4
[root@willow ~]# mkfs.ext2 /dev/sda4
mke2fs 1.41.14 (22-Dec-2010)
/dev/sda4 is apparently in use by the system; will not make a filesystem here!

### I have one 32bit kernel that works fine. Device is NOT busy after umount.

[root@kwai ~]# uname -a
Linux kwai.newbarnyard.net 3.3.0-4.fc16.i686 #1 SMP Tue Mar 20 18:45:14 UTC 2012 i686 i686 i386 GNU/Linux
[root@kwai ~]# mkfs.ext2 /dev/sda5
mke2fs 1.41.14 (22-Dec-2010)
.
.
.
This filesystem will be automatically checked every 27 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
[root@kwai ~]# mount /dev/sda5 /testmnt
[root@kwai ~]# umount /dev/sda5
[root@kwai ~]# mkfs.ext2 /dev/sda5
mke2fs 1.41.14 (22-Dec-2010)
Filesystem label=
.
.
.
This filesystem will be automatically checked every 30 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

### 64bit kernel systems work correctly if the test is done in
### single user mode, and after a reboot you can make filesystems
### until you mount/umount again. Note I made ext2 filesystems because
### the journaling block device gets mixed up and continues to
### run after the filesystem is unmounted in the failure cases.
### This is true for the jbd2/dm- running on a DM- device with
### ext3/4 filesystems.
Hi, I have a similar but not so elaborate bug (810393) (FC16 x86_64 latest update) that may have the same root cause as this bug, where I'm unable to umount a disk. lsof shows jbd2 as the ONLY reference to the device in question. I wonder if this would be true in these circumstances? George...
Other filesystems, or only ext*?
Open count always '1', or can it be >1?
Try with different kernel/systemd versions too.
There are several bugs with the same symptoms. We used systemtap to check and found that the filesystem-specific umount is not called (a path similar to lazy umount). There is no visible reference; it is the fs itself which keeps the device busy. Once we find the trigger for it, fixing this will be much more straightforward.

(lsblk, lsof, and the open count are perhaps unusable here; I see it as a problem somewhere in the VFS umount path, triggered by some asynchronous GUI action.)
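I don't have the exact script to hand, but a sketch of the kind of systemtap probe that shows whether the superblock teardown is reached (probe point names vary across kernel versions, so treat this as an illustration, not the script we used):

stap -e 'probe kernel.function("deactivate_locked_super") {
        printf("%s (pid %d) tearing down a superblock\n", execname(), pid())
}'

If the probe never fires when the filesystem is unmounted, the umount stopped short of the fs-specific teardown, which matches the lazy-umount-like path described above.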
Hi, I have seen this only with ext4 file systems. George...
Hi, One last bit of information. I do not use Gnome Shell... just konsole. George...
I haven't been able to find any outstanding references when I follow the instructions in comment #2 or comment #5. On my system, the problem began occurring on 2012-Mar-24 shortly after noon EDT when the system was booted into the 3.2.7-1.fc16.x86_64 kernel. The system had been up and running for thirteen days at that point and was using LVM snapshots every four hours to backup the data on the various file systems. All file systems that are being archived on my system are ext4. I plan to search through the yum history and provide a list of the packages that have been updated since the problem began occurring for me. Thanks to all for providing additional data and for providing analysis of the data provided - greatly appreciated.
Please can you try killing the "colord" and "colord-sane" processes on the system and retry? (With a clean reboot, no busy devices.)

This process seems to be doing something very strange leading to an increasing atomic use count for the mounted fs. (At least on my system.)
Created attachment 577070 [details]
yum log

For what it may be worth, the attachment contains a list of the updates of March 24 which immediately preceded my first symptoms of busy devices after umounts.

The following tests were done April 12, 2012:

1. make filesystem
2. mount filesystem
3. umount filesystem
4. make filesystem on same device again

Step 4 works in single user mode.
Step 4 finds the device busy in runlevel 3.

Tests done for ext2, ext3, ext4, msdos, vfat, xfs. Results are the same for all filesystem types. For ext3/4 filesystems the journaling block device, jbd2, continues to run since it believes the filesystem is still mounted. jbd2 showing up in lsof or /proc for the device is a side effect, not a cause, of the problem.

I retested with a clean reboot after killing colord and got the SAME results. Killing colord made no difference.
offtopic/ontopic: hi!

I run lvm2-libs-2.02.84-2.fc14.i686 & lvm2-2.02.84-2.fc14.i686 in a pure server environment, no interactive activity (except ssh) and the sleeping x-console... I have been having "the bug" for years and working around it, so I suspect it is something that has been lying around since fc14, not something "new". I see slightly different behaviour when trying to remove snapshots with and without force...

-----------------
e.g. script

[root@bks ~]# /rbin/snapXctrl destroy
INF: lvremove -f /dev/vgk/fc14_bks-snap
  Can't remove open logical volume "fc14_bks-snap"
INF: return=5
INF: lvremove -f /dev/vgk/fc14_bks-snap
  Can't remove open logical volume "fc14_bks-snap"
INF: return=5

e.g. interactively
snapXctrl() : aborting ...
ERR: lv 'fc14_bks-snap' vg 'vgk' after destroy still in status 'swi-a-'
-----------------
e.g. by hand

[root@bks ~]# lvs
  LV            VG   Attr   LSize   Origin   Snap%  Move Log Copy%  Convert
  bks_home      vgk  -wi-ao  10.00g
  fc14_bks      vgk  owi-ao  60.00g
  fc14_bks-snap vgk  swi-a-   5.00g fc14_bks  7.12
  swap          vgk  -wi-a-   8.00g
  vmware        vgk  -wi-ao 250.00g
  swapq         vgq  -wi-ao   8.00g
  swapq2        vgq  -wi-ao  16.00g
  tmpvol        vgq  -wi-ao  10.00g
  fc14_bks-dd   vgz  -wi-a-  60.00g
  save          vgz  -wi-ao   1.46t
  save_grass    vgz  -wi-ao 900.00g
[root@bks ~]# lvdisplay /dev/vgk/fc14_bks-snap
  --- Logical volume ---
  LV Name                /dev/vgk/fc14_bks-snap
  VG Name                vgk
  LV UUID                Tsv08Y-6YJu-Abp3-L0px-3OxO-UpWk-iOKCDx
  LV Write Access        read/write
  LV snapshot status     active destination for /dev/vgk/fc14_bks
  LV Status              available
  # open                 0
  LV Size                60.00 GiB
  Current LE             1920
  COW-table size         5.00 GiB
  COW-table LE           160
  Allocated to snapshot  7.12%
  Snapshot chunk size    4.00 KiB
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:10
[root@bks ~]# lvremove /dev/vgk/fc14_bks-snap
Do you really want to remove active logical volume fc14_bks-snap? [y/n]: y
  Logical volume "fc14_bks-snap" successfully removed
--------------

Could someone please give some useful info on how to trace, or point to debug versions, so I can participate in the hunt from a really old perspective, if that helps?

w.
Please do not mix problems. If you have had this problem for years, it is perhaps the problem with the watch rule mentioned here:
https://bugzilla.redhat.com/show_bug.cgi?id=809188#c14
Which version of systemd do you have installed? Can you try to downgrade to systemd-37-3.fc16 and try to reproduce it again?
(this one: http://koji.fedoraproject.org/koji/buildinfo?buildID=271914)
I am running systemd-37-17.fc16.x86_64. I updated from systemd-37-13.fc16.x86_64 to systemd-37-15.fc16.x86_64 on 2012-Mar-14. I'll try downgrading to systemd-37-3.fc16 as requested, but I may not be able to perform this experiment until the weekend. Thanks, John
Ok, so I finally found the problem: patch 0043-execute-avoid-logging-to-closed-fds.patch in systemd adds this:

+void log_forget_fds(void) {
+        console_fd = kmsg_fd = syslog_fd = -1;
+}
+

Perhaps the author mixed up descriptors, but now systemd leaks them. If I comment this out:

--- systemd-37old/src/execute.c	2012-04-12 23:22:14.000000000 +0200
+++ systemd-37/src/execute.c	2012-04-12 23:22:52.375009337 +0200
@@ -1023,7 +1023,7 @@ int exec_spawn(ExecCommand *command,
         /* Close sockets very early to make sure we don't
          * block init reexecution because it cannot bind its
          * sockets */
-        log_forget_fds();
+        //log_forget_fds();
 
         err = close_all_fds(socket_fd >= 0 ? &socket_fd : fds,
                             socket_fd >= 0 ? 1 : n_fds);
         if (err < 0) {

it works again.

I prepared a build with this patch; please can you test? This one should be safe to update to:
http://mbroz.fedorapeople.org/tmp/systemd/

(The whole problem is that the filesystem in the kernel is still referenced, so the umount cannot be finished.)
*** Bug 809188 has been marked as a duplicate of this bug. ***
*** Bug 810393 has been marked as a duplicate of this bug. ***
wow ! things accelerate like a charm ... three cheers & more grats to you milan & thx
I did the following:

yum downgrade systemd-36-3.fc16.x86_64 systemd-units-36-3.fc16.x86_64 systemd-sysv-36-3.fc16.x86_64 systemd-gtk-37-17.fc16.x86_64
reboot

Could not log into gnome-shell. The background for the login remained and the shell background never appeared. I rebooted into single user mode, then init 3. When I attempted a mount it never returned; it just hung.

I rebooted, upgraded to 37-17, tried the downgrade again, and got the same results. My wife says I must stop now. I will try the build in http://mbroz.fedorapeople.org/tmp/systemd/ in the morning.
Hi,

I have tried to install the rpms listed below but there are a TON of messages like this one. I didn't want to force in case that's the wrong thing to do.

Thanks for the GREAT work!

George...

file /usr/share/systemd/kbd-model-map from install of systemd-37-18.1.fc16.x86_64 conflicts with file from package systemd-37-17.fc16.x86_64
Created attachment 577196 [details]
loopfail.sh

Tried the systemd-37-18.1 builds from http://mbroz.fedorapeople.org/tmp/systemd/

Unfortunately it failed to solve the "loop device cannot be destroyed after unmount" problem in Bug #758159. The script is attached.

[root@newcaprica tmp]# rpm -qa 'systemd*'
systemd-units-37-18.1.fc16.x86_64
systemd-sysv-37-18.1.fc16.x86_64
systemd-37-18.1.fc16.x86_64
[root@newcaprica tmp]# sh loopfail.sh
+ dd if=/dev/zero of=loopfail.img bs=1MB count=50
50+0 records in
50+0 records out
50000000 bytes (50 MB) copied, 0.0431564 s, 1.2 GB/s
+ losetup /dev/loop1 loopfail.img
+ mkfs.ext4 -m 0 -J size=1 /dev/loop1
mke2fs 1.41.14 (22-Dec-2010)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
Stride=0 blocks, Stripe width=0 blocks
12240 inodes, 48828 blocks
0 blocks (0.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=50069504
6 block groups
8192 blocks per group, 8192 fragments per group
2040 inodes per group
Superblock backups stored on blocks:
	8193, 24577, 40961

Writing inode tables: done
Creating journal (1024 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 39 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
+ mkdir loopfail
+ mount /dev/loop1 loopfail
+ touch loopfail/testfile
+ umount loopfail
+ losetup -d /dev/loop1
loop: can't delete device /dev/loop1: Device or resource busy
I downgraded to systemd-*-37-3 as per the suggestion in comment #18 and found behavior similar to that described in comment #24 - login to Gnome Shell hung and the system needed to be power cycled as it would not gracefully reboot.

I upgraded to systemd-*-37-18 as per the suggestion in comment #20 and ran the test script that I included in my initial bug report - no change in behavior: the snapshot still cannot be removed and "lvdisplay" indicates an "open" value of 1.

The following are the packages that were updated immediately before the problem began to occur on March 24 at 12pm EDT:

Package Name               Initial Version          Upgraded Version
audit                      2.1.3-4.fc16.x86_64      2.2-1.fc16.x86_64
audit-libs                 2.1.3-4.fc16.i686        2.2-1.fc16.i686
audit-libs                 2.1.3-4.fc16.x86_64      2.2-1.fc16.x86_64
audit-libs-python          2.1.3-4.fc16.x86_64      2.2-1.fc16.x86_64
cups                       1:1.5.2-1.fc16.x86_64    1:1.5.2-6.fc16.x86_64
cups-libs                  1:1.5.2-1.fc16.i686      1:1.5.2-6.fc16.i686
cups-libs                  1:1.5.2-1.fc16.x86_64    1:1.5.2-6.fc16.x86_64
dash                       0.5.6-6.fc16.x86_64      0.5.7-1.fc16.x86_64
ffmpeg-libs                0.8.9-1.fc16.x86_64      0.8.10-1.fc16.x86_64
iproute                    2.6.39-4.fc16.x86_64     2.6.39-5.fc16.x86_64
jetty                      6.1.26-7.fc16.noarch     6.1.26-8.fc16.noarch
libjpeg-turbo              1.1.1-3.fc16.i686        1.2.0-1.fc16.i686
libjpeg-turbo              1.1.1-3.fc16.x86_64      1.2.0-1.fc16.x86_64
libjpeg-turbo-devel        1.1.1-3.fc16.x86_64      1.2.0-1.fc16.x86_64
libpng                     2:1.2.46-2.fc16.i686     2:1.2.48-1.fc16.i686
libpng                     2:1.2.46-2.fc16.x86_64   2:1.2.48-1.fc16.x86_64
libpng-devel               2:1.2.46-2.fc16.x86_64   2:1.2.48-1.fc16.x86_64
libquvi-scripts            0.4.2-1.fc16.noarch      0.4.3-1.fc16.noarch
lohit-assamese-fonts       2.5.0-1.fc16.noarch      2.5.1-1.fc16.noarch
lohit-bengali-fonts        2.5.0-1.fc16.noarch      2.5.1-1.fc16.noarch
lohit-gujarati-fonts       2.5.0-1.fc16.noarch      2.5.1-1.fc16.noarch
lohit-kannada-fonts        2.5.0-1.fc16.noarch      2.5.1-1.fc16.noarch
lohit-oriya-fonts          2.5.0-1.fc16.noarch      2.5.1-1.fc16.noarch
lohit-punjabi-fonts        2.5.0-2.fc16.noarch      2.5.1-1.fc16.noarch
lohit-telugu-fonts         2.4.5-13.fc16.noarch     2.5.1-1.fc16.noarch
nfs-utils                  1:1.2.5-4.fc16.x86_64    1:1.2.5-5.fc16.x86_64
pigz                       2.2.3-1.fc16.x86_64      2.2.4-1.fc16.x86_64
rpmfusion-free-release     16-1.2.noarch            16-3.noarch
rpmfusion-nonfree-release  16-1.1.noarch            16-3.noarch
selinux-policy             3.10.0-75.fc16.noarch    3.10.0-80.fc16.noarch
selinux-policy-targeted    3.10.0-75.fc16.noarch    3.10.0-80.fc16.noarch

(I severely doubt the font packages had any effect, but I included them to provide a complete picture of the yum transaction that occurred prior to the beginning of the failure.)
ok, apparently there is still some related kernel problem; there are reports that the problem disappears with colord/cups/systemd not running - so perhaps these are just random triggers. Whatever, we are still not finished here :)

(For me, the systemd problem is easily reproduced, so we will try to see what is really going on there. I see exactly the same thing as in comment #27 - once the reference is leaked in the kernel, only a reboot fixes it. No traces in open descriptors, inotify, whatever.)
Do you guys have the "sandbox" service enabled? Do the problems go away if you disable it? My suspicion is vaguely about the propagation of mounts to namespaces. sandbox sets up everything as shared ("mount --make-rshared /"). cups uses PrivateTmp...
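The propagation state is visible in the optional fields of mountinfo, for anyone who wants to check their own box (a "shared:N" tag means the mount propagates, "master:N" marks a slave; this is plain procfs, nothing package-specific):

# field 5 of mountinfo is the mount point; show the root mount's line
awk '$5 == "/"' /proc/self/mountinfo

With the sandbox initscript active you should see a shared:N tag on /; on a private mount the field is simply absent.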
The "sandbox" service is enabled, and the problem still occurs after I stop it using "systemctl stop sandbox.service" as follows: [root@rivendell ~]# systemctl status sandbox.service sandbox.service - SYSV: sandbox, xguest and other apps that want to use pam_namespace require this script be run at boot. This service script does not actually run any service but sets up: / to be shared by any app that starts a separate namespace Loaded: loaded (/etc/rc.d/init.d/sandbox) Active: active (exited) since Thu, 12 Apr 2012 19:56:47 -0400; 12h ago CGroup: name=systemd:/system/sandbox.service [root@rivendell ~]# systemctl stop sandbox.service [root@rivendell ~]# cd bug808795/ [root@rivendell bug808795]# ./snaptest.sh === Iteration: 1 Logical volume "snaptest" created Can't remove open logical volume "snaptest" Can't remove open logical volume "snaptest" Can't remove open logical volume "snaptest" Can't remove open logical volume "snaptest" Can't remove open logical volume "snaptest" Can't remove open logical volume "snaptest" Can't remove open logical volume "snaptest" Can't remove open logical volume "snaptest" Can't remove open logical volume "snaptest" Can't remove open logical volume "snaptest" Failed on snapshot removal: 5 [root@rivendell bug808795]# systemctl status sandbox.service sandbox.service - SYSV: sandbox, xguest and other apps that want to use pam_namespace require this script be run at boot. This service script does not actually run any service but sets up: / to be shared by any app that starts a separate namespace Loaded: loaded (/etc/rc.d/init.d/sandbox) Active: inactive (dead) since Fri, 13 Apr 2012 08:28:14 -0400; 51s ago Process: 12256 ExecStop=/etc/rc.d/init.d/sandbox stop (code=exited, status=0/SUCCESS) CGroup: name=systemd:/system/sandbox.service
(In reply to comment #29)
> Do you guys have the "sandbox" service enabled? Do the problems go away if you
> disable it?
> My suspicion is vaguely about the propagation of mounts to namespaces. sandbox
> sets up everything as shared ("mount --make-rshared /"). cups uses
> PrivateTmp...

Bingo! I disabled sandbox.service and rebooted. Now fsck has no complaints after mounting and unmounting ext4 filesystems, and lvremove of a snapshot works first time. Many thanks. Let me know if you need any further info.
(In reply to comment #30)
> The "sandbox" service is enabled, and the problem still occurs after I stop it
> using "systemctl stop sandbox.service" as follows:

Stopping it is not likely to be enough. Please retest with:

chkconfig sandbox off && reboot
I had to remove policycoreutils (sandbox is part of it); then it helped.

This explains why VFS has some strange references... Anyway, there is perhaps still a bug in the kernel.
(In reply to comment #33)
> I had to remove policycoreutils (sandbox is part of it) then it helped.
>
> This explains why VFS has some strange references... Anyway, there is perhaps
> still bug in kernel.

Interesting, I have not removed policycoreutils. I should have specified package versions:

policycoreutils-2.1.4-13.fc16.x86_64
selinux-policy-3.10.0-80.fc16.noarch
selinux-policy-targeted-3.10.0-80.fc16.noarch
systemd-37-17.fc16.x86_64
systemd-units-37-17.fc16.x86_64
systemd-sysv-37-17.fc16.x86_64
kernel-3.3.1-3.fc16.x86_64
So I have managed to reproduce exactly the same behaviour just with mount:

mount --make-shared /
mount --bind / /
mount /dev/loop0 /mnt/test
mount --make-slave /
umount /mnt/test
losetup -d /dev/loop0
loop: can't delete device /dev/loop0: Device or resource busy

And 'mount' does not show /dev/loop0 mounted, but it actually is still mounted on the shared / mount below the slave / mount. Funny? :) you bet it is. However, I can still see /dev/loop0 mounted in /proc/mounts.

So maybe this sandbox thing is doing something wrong.

-Lukas
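To see the hidden mount that plain 'mount' no longer lists, the per-process mount tables are enough (nothing here beyond standard procfs and the reproducer above):

grep loop0 /proc/self/mountinfo    # the leftover reference still shows here
grep loop0 /proc/1/mountinfo       # and possibly in other namespaces, e.g. init's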
Note that 'service sandbox stop' is a no-op - that might obfuscate test results. Disabling and reboot without uninstalling should be enough.
I set up a test system. It's an F16 default DVD desktop install with only the home LV reduced in size to allow snapshots and testing with LV's. It was updated manually in groups of packages, each followed by a reboot and a couple of test script runs, until I found which packages kicked off the problem. The system is fully up to date, and I can downgrade to eliminate the problem and upgrade to re-create it. Also, "chkconfig sandbox off" plus reboot and "chkconfig sandbox on" plus reboot will eliminate or re-create the problem, as noted before.

I think the only useful information is that the cups packages were changed in such a way that they manifest the problem. The test setup is fairly easy to create, and being able to turn the problem off and on might be useful to someone more qualified than me.

[root@plainjane bogwan]# sh -x test.sh
+ lvremove /dev/vg_plainjane/test
Do you really want to remove active logical volume test? [y/n]: y
  Logical volume "test" successfully removed
+ lvcreate -L1G -n test vg_plainjane
  Logical volume "test" created
+ mkfs.ext2 /dev/vg_plainjane/test
mke2fs 1.41.14 (22-Dec-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
65536 inodes, 262144 blocks
13107 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=268435456
8 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376

Writing inode tables: done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 34 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
+ mount /dev/vg_plainjane/test /test
+ umount /dev/vg_plainjane/test

[root@plainjane bogwan]# sh -x test.sh
+ lvremove /dev/vg_plainjane/test
  Can't remove open logical volume "test"
+ lvcreate -L1G -n test vg_plainjane
  Logical volume "test" already exists in volume group "vg_plainjane"
+ mkfs.ext2 /dev/vg_plainjane/test
mke2fs 1.41.14 (22-Dec-2010)
/dev/vg_plainjane/test is apparently in use by the system; will not make a filesystem here!
+ mount /dev/vg_plainjane/test /test
+ umount /dev/vg_plainjane/test

[root@plainjane bogwan]# yum downgrade cups cups-libs cups-pk-helper
Loaded plugins: langpacks, presto, refresh-packagekit
Resolving Dependencies
--> Running transaction check
---> Package cups.x86_64 1:1.5.0-16.fc16 will be a downgrade
---> Package cups.x86_64 1:1.5.2-6.fc16 will be erased
---> Package cups-libs.x86_64 1:1.5.0-16.fc16 will be a downgrade
---> Package cups-libs.x86_64 1:1.5.2-6.fc16 will be erased
---> Package cups-pk-helper.x86_64 0:0.1.3-2.fc16 will be a downgrade
---> Package cups-pk-helper.x86_64 0:0.1.3-3.fc16 will be erased
--> Finished Dependency Resolution

Dependencies Resolved

=======================================================================
 Package           Arch      Version            Repository        Size
=======================================================================
Downgrading:
 cups              x86_64    1:1.5.0-16.fc16    fedora           2.0 M
 cups-libs         x86_64    1:1.5.0-16.fc16    fedora           350 k
 cups-pk-helper    x86_64    0.1.3-2.fc16       fedora            42 k

Transaction Summary
=======================================================================
Downgrade     3 Packages

Total download size: 2.4 M
Is this ok [y/N]: y
Downloading Packages:
(1/3): cups-1.5.0-16.fc16.x86_64.rpm              | 2.0 MB     00:03
(2/3): cups-libs-1.5.0-16.fc16.x86_64.rpm         | 350 kB     00:00
(3/3): cups-pk-helper-0.1.3-2.fc16.x86_64.rpm     |  42 kB     00:00
-----------------------------------------------------------------------
Total                                    485 kB/s | 2.4 MB     00:05
Running Transaction Check
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing : 1:cups-libs-1.5.0-16.fc16.x86_64        1/6
  Installing : 1:cups-1.5.0-16.fc16.x86_64             2/6
  Installing : cups-pk-helper-0.1.3-2.fc16.x86_64      3/6
  Cleanup    : 1:cups-1.5.2-6.fc16.x86_64              4/6
  Cleanup    : cups-pk-helper-0.1.3-3.fc16.x86_64      5/6
  Cleanup    : 1:cups-libs-1.5.2-6.fc16.x86_64         6/6
  Verifying  : 1:cups-1.5.0-16.fc16.x86_64             1/6
  Verifying  : 1:cups-libs-1.5.0-16.fc16.x86_64        2/6
  Verifying  : cups-pk-helper-0.1.3-2.fc16.x86_64      3/6
  Verifying  : 1:cups-libs-1.5.2-6.fc16.x86_64         4/6
  Verifying  : 1:cups-1.5.2-6.fc16.x86_64              5/6
  Verifying  : cups-pk-helper-0.1.3-3.fc16.x86_64      6/6

Removed:
  cups.x86_64 1:1.5.2-6.fc16          cups-libs.x86_64 1:1.5.2-6.fc16
  cups-pk-helper.x86_64 0:0.1.3-3.fc16

Installed:
  cups.x86_64 1:1.5.0-16.fc16         cups-libs.x86_64 1:1.5.0-16.fc16
  cups-pk-helper.x86_64 0:0.1.3-2.fc16

Complete!
[root@plainjane bogwan]# reboot

[root@plainjane bogwan]# sh -x test.sh
+ lvremove /dev/vg_plainjane/test
Do you really want to remove active logical volume test? [y/n]: y
  Logical volume "test" successfully removed
+ lvcreate -L1G -n test vg_plainjane
  Logical volume "test" created
+ mkfs.ext2 /dev/vg_plainjane/test
mke2fs 1.41.14 (22-Dec-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
65536 inodes, 262144 blocks
13107 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=268435456
8 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376

Writing inode tables: done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 20 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
+ mount /dev/vg_plainjane/test /test
+ umount /dev/vg_plainjane/test

[root@plainjane bogwan]# sh -x test.sh
+ lvremove /dev/vg_plainjane/test
Do you really want to remove active logical volume test? [y/n]: y
  Logical volume "test" successfully removed
+ lvcreate -L1G -n test vg_plainjane
  Logical volume "test" created
+ mkfs.ext2 /dev/vg_plainjane/test
mke2fs 1.41.14 (22-Dec-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
65536 inodes, 262144 blocks
13107 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=268435456
8 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376

Writing inode tables: done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 39 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
+ mount /dev/vg_plainjane/test /test
+ umount /dev/vg_plainjane/test
[root@plainjane bogwan]#

I hope this is useful. I updated every package manually on the clean install looking for something that started the behavior. The only packages I found were the cups packages. Cups had been mentioned before, so I waited until the very last group to update it.
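For reference, the test.sh driving the traces above is essentially the following (reconstructed from the -x output, so an approximation):

#!/bin/sh
lvremove /dev/vg_plainjane/test
lvcreate -L1G -n test vg_plainjane
mkfs.ext2 /dev/vg_plainjane/test
mount /dev/vg_plainjane/test /test
umount /dev/vg_plainjane/test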
Moving to kernel.
cat /proc/*/mounts and /proc/*/mountinfo

Does the problem device that should have been unmounted show in any process or not? (Compare before the mount, after the mount, after the umount.)

If it still shows after the unmount, then look at the mountinfo data to try to work out how it got into that state (namespace clone?). If it doesn't appear anywhere after the umount, then it is most likely a kernel bug.

Again, it would help to know the sequence of operations performed on the mount point/namespaces to see which system call is going wrong.
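To make that before/during/after comparison mechanical, something along these lines should work (the LV path and mount point are stand-ins for whatever device you are testing):

cat /proc/*/mounts 2>/dev/null | sort -u > /tmp/mounts.before
mount /dev/vg_test/lv_test /mnt/test
cat /proc/*/mounts 2>/dev/null | sort -u > /tmp/mounts.during
umount /mnt/test
cat /proc/*/mounts 2>/dev/null | sort -u > /tmp/mounts.after
diff /tmp/mounts.before /tmp/mounts.after   # non-empty output = leaked reference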
(In reply to comment #36)
> Note that 'service sandbox stop' is a no-op - that might obfuscate test
> results. Disabling and reboot without uninstalling should be enough.

Shouldn't it check for sandboxes - if there are any, refuse to stop the service; if there aren't any, mount --make-rprivate etc. (whatever's necessary to undo the 'start')?

And should it be off by default when installed?
Or should the sandbox service be removed and the functionality instead be integrated directly into the packages that need it? (I guess I'm not really understanding how mount --make-rshared is particularly wise except in some very limited circumstances.)
(Fedora Packaging Guidelines should perhaps get a section about shared subtree mounts. Imagine if I added a new package that did 'mount --make-rprivate /' and ran it on the same system as another package which does 'mount --make-rshared /'. Which package wins?)
It has been removed from F17. Not even sure it is needed in F16 any longer.
(In reply to comment #43)
> It has been removed from F17. Not even sure it is needed in F16 any longer.

It is in F16 and enabled by default (not the sandbox binary itself, but the initscript doing "mount --make-rshared /", which is the first step to trigger the leak bug).

I think there is still a kernel problem (I see no reference in mountinfo but the loop device is still referenced), but for F16 the easy workaround is to default this to off. Can we do it please for now?
(In reply to comment #32)
> Stopping it is not likely to be enough. Please retest with
> chkconfig sandbox off && reboot

I tested again after rebooting with the sandbox service disabled via chkconfig, and the problem does not reproduce using the script I've been using to reliably reproduce the issue. Thank you very much!
*** Bug 758159 has been marked as a duplicate of this bug. ***
*** Bug 812549 has been marked as a duplicate of this bug. ***
To make it even more complicated, util-linux in F16 has a bug: MS_REC (the recursive flag) is not set in "mount --make-rshared /" (running in sandbox).
(In reply to comment #48)
> To make it even more complicated, util-linux in F16 has a bug, MS_REC
> (recursive flag) is not set in mount --make-rshared / (runing in sandbox)

The bug #813315 should be fixed in util-linux-2.20.1-2.3.fc16.
Also, systemd wrongly combines mount flags here, reported as bug #813563. (So it is possible we hit some "undefined" behaviour here rather than a leak.)
*** Bug 813794 has been marked as a duplicate of this bug. ***
This should be a reproducer for the underlying problem; note you perhaps need the updated util-linux mount (see bug #813563).

#!/bin/bash -x
mkdir -p /tmp/namespace/root/tmp
mkdir -p /tmp/namespace/private
mount --make-rshared /
mount --rbind /tmp/namespace/root/tmp /tmp/namespace/private/

After that, even simple loop devices are busy after umount:

# /aaa is image of some fs
losetup /dev/loop0 /aaa
mount /dev/loop0 /mnt/tst
umount /mnt/tst
# still everything ok
# losetup -d /dev/loop0
losetup: /dev/loop0: detach failed: Device or resource busy

(Can someone please try to reproduce it? I see that behaviour even on rawhide.)
I reproduced on F16. I used your steps exactly.
I'd say I can reproduce this today in F17 (yum upgraded from F16). I wanted to fsck my /home, so I logged out of GNOME, switched to a text console, umounted /home and tried fsck. The device was still busy. After a reboot (graphical target), I was still unable to do the fsck. I booted another OS, and I could fsck the partition.

To test, I added a mount line in /etc/fstab with a previously unused (ext4) partition. After reboot, the partition was mounted; I umounted it, and fsck complained that the device is busy.

I'll try to systemctl disable the sandbox service and report back if this fixes the problem.
hm, there's no such service:

# systemctl stop sandbox.service
#
# systemctl disable sandbox.service
Failed to issue method call: No such file or directory
#
# chkconfig sandbox off
error reading information on service sandbox: No such file or directory

and the process is not running:

# ps aux | grep sandbox
root      1941  0.0  0.0   4736   780 pts/0   S+   20:57   0:00 grep --color=auto sandbox
#

However, the device is still busy:

# fsck /dev/sda1
fsck from util-linux 2.21.1
e2fsck 1.42 (29-Nov-2011)
fsck.ext4: Device or resource busy while trying to open /dev/sda1
Filesystem mounted or opened exclusively by another program?
#

although there's nothing in its mount point:

# ls /mnt/firstpart/
#

Still, as previously mentioned, jbd is doing something there:

# ps aux | grep jbd | grep sda1
root       514  0.0  0.0      0     0 ?       S    20:07   0:00 [jbd2/sda1-8]

I must confess that I don't know what to do next :)
I think this is a definitely confirmed kernel bug. Here is the proposed fix for this issue. Note that it has not been merged into mainline yet.

http://www.spinics.net/lists/linux-fsdevel/msg55572.html

Thanks!
-Lukas
Lukas, I have reported two bugs (817227 and 823190) and am wondering if they might be related to this one. Can you or someone check them please? Regards, George...
Hi George,

since this bug does not cause any crashes (and I have not seen any while testing the bug and creating the fix), I very much doubt it could be related to Bug 817227. Regarding Bug 823190, that is even less likely to be related, since this bug has nothing to do with the USB stack at all.

I think that neither of those is related to this bug. Is there anything that makes you think they could be related?

Thanks!
-Lukas
Lukas, I've been having a heck of a time with these two bugs that I've reported. The SATA disks are connected via USB to external "docking stations" (for lack of a better term). One of these two bugs had fixes, I think, but the problem is worse now than it has been. I seem to be able to create my crash/poweroff at will now. Both external disks AND keyboard AND mouse have started disappearing, not always together. Sigh.

Do you know of a way that I can get extra information from these crashes? I am willing to do what I can to aid the effort. All I can see for my problem(s) is the monitor going black and then the machine powering off IMMEDIATELY. NO other messages that I can see.

Maybe it's just wishful thinking on my part that these bugs might all be related, since this bug seems to relate to kernel device leaks, which I have experienced (see comment #8).

Thanks,
George...
Hi George, it is definitely not related to this bug. Let's move the discussion to the Bug 817227. Thanks! -Lukas
Please give some info on when, and in which kernel version, this bug is assumed to be fixed. Any other status update?

> Lukáš Czerner
> I think this is definitelly confirmed kernel bug. Here is proposed fix for
> this issue. Note that it has not been merged indo the mainline yet.
>
> http://www.spinics.net/lists/linux-fsdevel/msg55572.html
> Thanks!
> -Lukas
The problem was fixed with upstream commit 63d37a84ab6004c235314ffd7a76c5eb28c2fae0 so it is in v3.5. -Lukas
(In reply to comment #62)
> The problem was fixed with upstream commit
> 63d37a84ab6004c235314ffd7a76c5eb28c2fae0 so it is in v3.5.

That commit went into the 3.4.2 stable kernel. F16 is at 3.4.7 right now, so it should be fixed in the latest F16 kernel as well, correct?
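For anyone who wants to verify that a particular kernel tree contains the fix, git can answer directly (assumes a local clone of the kernel source at a hypothetical path):

git -C ~/src/linux tag --contains 63d37a84ab6004c235314ffd7a76c5eb28c2fae0 | head
# any tag at or above v3.4.2 (or v3.5) in the output carries the commit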
Similar problem with:

Linux mn3v 3.5.2-3.fc17.x86_64 #1 SMP Tue Aug 21 19:06:52 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

The process

18297 ?        D      0:08 [jbd2/sde2-8]

holds a reference to /dev/sde2:

lsof -n | grep sde2
jbd2/sde2  18297  root  cwd  DIR      9,1  4096  2  /
jbd2/sde2  18297  root  rtd  DIR      9,1  4096  2  /
jbd2/sde2  18297  root  txt  unknown              /proc/18297/exe

and the /u1 on /dev/sde2 cannot be unmounted. The usage pattern is that various partitions get mounted/unmounted at /u1 for backups. It works OK for 2-3 days, then it gets stuck because of the jbd2 leak.

dmesg | grep sde
....
[164306.036883] SELinux: initialized (dev sde2, type ext4), uses xattr
[226417.206492] systemd[1]: sys-devices-pci0000:00-0000:00:14.1-ata5-host4-target4:0:0-4:0:0:0-block-sde.device changed dead -> plugged
Then the server cannot be rebooted: after "shutdown -r now" it gets stuck with a message that device /u1 cannot be unmounted. There was no such problem since the March-April fixes; I expect it started to appear around kernel 3.5 when it was pushed to F17. There was no such problem in the past.
There was definitely no such problem with:

Linux comp2012 3.4.6-2.fc17.x86_64 #1 SMP Thu Jul 19 22:54:16 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

(a month ago). I expect some change around 3.5 causes the problem.
It may not be related, but another computer with an identical kernel and backup configuration does not experience this problem after a week. The only difference: the external USB backup disk has an ext3 filesystem, while the computer with the jbd2 problem has an ext4 filesystem on the backup disk.
I discovered two days ago that I can't perform maintenance tasks on seemingly unmounted ext4 filesystems on F17, as described in comment #64. I can reproduce this with vanilla kernels 3.4.10 and 3.5.3 as well.

There are two workarounds so far: "init 1", or not mounting the ext4 filesystems at boot (noauto option for the device in fstab). Manually mounting and subsequently unmounting these filesystems doesn't leave jbd2 references to the device behind. The second workaround is IMVHO quite strange for a kernel bug.
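For anyone wanting to apply the second workaround, a hypothetical fstab entry (device name and mount point are illustrative only):

# 'noauto' keeps the fs out of the boot-time mount pass, so it is only
# ever mounted/unmounted by hand, which avoided the leaked jbd2 reference
/dev/mapper/vg01-backuplv  /backup  ext4  noauto,defaults  0 0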
I just disabled the journal on the filesystems which are mounted/unmounted daily to perform backups:

fsck /dev/sde2
tune2fs -O^has_journal /dev/sde2
fsck /dev/sde2

No problem since then.
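If anyone needs to undo this later, the journal can be put back once the filesystem is unmounted and clean (same device as above):

fsck /dev/sde2
tune2fs -O has_journal /dev/sde2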
I can reproduce this as well with:

3.5.3-1.fc17.x86_64 #1 SMP Wed Aug 29 18:46:34 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

I unmounted an ext4 partition contained in an LVM volume to check it, and I cannot:

# e2fsck -f /dev/dm-3
e2fsck 1.42.3 (14-May-2012)
e2fsck: Device or resource busy while trying to open /dev/dm-3
Filesystem mounted or opened exclusively by another program?

# lsof | grep dm-3
lsof: WARNING: can't stat() fuse.gvfs-fuse-daemon file system /run/user/matthieu/gvfs
      Output information may be incomplete.
jbd2/dm-3   719  root  cwd  DIR  253,1  4096  2  /
jbd2/dm-3   719  root  rtd  DIR  253,1  4096  2  /
jbd2/dm-3   719  root  txt  unknown          /proc/719/exe
Hi, I had a similar problem in Fedora 17 (currently kernel 3.5.2-3.fc17.x86_64). In my case, both cupsd and ntpd had the device mapper file open after the filesystem was unmounted. Killing ntpd and cupsd allowed the filesystem to be "fully unmounted", fsck'd, etc.

lsof had listed nothing open on that fs, and fuser -m listed just about every process under systemd, so I wasn't sure what the case really was. In the end, grepping for the LVM device name in /proc/*/mounts revealed the PIDs of cupsd and ntpd, e.g.:

# grep backup /proc/*/mounts
/proc/4739/mounts:/dev/mapper/vg01-backuplv /backup ext4 rw,relatime,data=ordered 0 0
/proc/568/mounts:/dev/mapper/vg01-backuplv /backup ext4 rw,relatime,data=ordered 0 0

I have no idea why cups or ntp had anything open on that filesystem, as nothing in their configs indicates that they had any business doing so. I do not have gnome or sandbox installed, and colord is not running. This is a console-only file server.
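In case it saves someone the grep gymnastics, a small loop that prints every PID whose mount table still contains the filesystem (the "backuplv" name is from the setup above; substitute your own):

for m in /proc/[0-9]*/mounts; do
    grep -q backuplv "$m" && echo "${m%/mounts}"
done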
Matt, this is a bug (or a feature) in systemd. It is some sort of replacement for sandboxing services like httpd. The problem is that filesystems mounted at boot from fstab cannot be unmounted without first stopping all kinds of important services like httpd, cupsd, ... and less important ones like colord. This is the case even for filesystems which are not even used by those services.

It may be that this problem is solved in the very latest F17 updates. I need to check.
(In reply to comment #72)
> this is a bug (or a feature) in systemd.

Could you please point to an appropriate bug report for systemd?

> It may be that in the very latest f17 updates this problem is solved.

This very disturbing issue is not fixed as of the latest systemd-44-17.fc17.x86_64 and kernel-3.6.1-1.fc17.x86_64 on F17.
Ilja, I am running kernel 3.5.5-2.fc17.x86_64 and systemd-44-17.fc17.x86_64. The problem with the daemons keeping filesystems mounted is also present in that kernel/systemd combination. It prevents things like fsck, lvremove, and creating new filesystems on a previously mounted but now unmounted partition. So such filesystem operations require a reboot, or at least stopping all kinds of useful services, even if they do not use the filesystem in any way.

I consider this a bug, but I do not know of a specific bug report and would not be surprised if some people would not even want to call this behaviour a bug.

Please note that what I am talking about (and what you are experiencing) is probably a different bug than the one originally reported, because it is possible to bring the logical volume use count to 0 by stopping all services which have the filesystem listed in /proc/pid/mounts. This is of course not practical, but it shows that this particular problem is not due to the kernel leaking references. It seems that recently several different bugs cause the same symptoms.

There may (or may not) be some connections with the problems reported here:
https://bugs.freedesktop.org/show_bug.cgi?id=52039
and here:
https://bugzilla.redhat.com/show_bug.cgi?id=712089

I do not pretend to understand the details, but the messages seem to point to (mis)use of namespaces as a potential cause of problems.
Hi,

I opened this bug (https://bugzilla.redhat.com/show_bug.cgi?id=802607) some time ago and am wondering if it's indirectly related to this bug here. I've seen automounted filesystems disappear when the system considers them to be unused. I think this is "normal" automount behavior. The filesystems in my bug are NOT automounted, but I can see that if the system considered the filesystem unused and wanted to unmount it, it would be able to.

George...
The Fedora 17 problem caused by systemd that Wilfried Philips mentions comes from services using PrivateTmp in their service file and systemd's incorrect use of private namespaces. The bug is fixed here, but unfortunately only in F18, not in F17, which is not a lot of help right now:

https://bugzilla.redhat.com/show_bug.cgi?id=851970
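For context, PrivateTmp is a single option in a unit file; a minimal illustrative unit (names entirely hypothetical) looks like this:

# /etc/systemd/system/example.service -- illustrative sketch only
[Service]
ExecStart=/usr/bin/example-daemon
# gives the service its own mount namespace with a private /tmp; combined
# with a shared "/" this is what kept "unmounted" filesystems pinned
PrivateTmp=true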
So you are telling me that right now I'm not able to unmount a filesystem and do a fsck with F17? I guess filesystems are getting pretty irrelevant now that we have the cloud...
Confirming this bug against 64bit Fedora 17 (kernel-v3.6.7 / systemd-v44)...

# kpartx -a disk-image.dd
# mount /dev/mapper/loop0p2 /mnt/P2

<work, work, work>

# umount /mnt/P2
# losetup --detach-all
losetup: /dev/loop0: detach failed: Device or resource busy
(In reply to comment #78)
> Confirming this bug against 64bit Fedora 17 (kernel-v3.6.7 / systemd-v44)...
>
> # kpartx -a disk-image.dd
> # mount /dev/mapper/loop0p2 /mnt/P2
> <work, work, work>
> # umount /mnt/P2
> # losetup --detach-all
> losetup: /dev/loop0: detach failed: Device or resource busy

Use "kpartx -d" to remove the partition mappings before detaching the loop device.