Bug 749178

Summary: Leftover dangling symlinks after a device is removed
Product: [Fedora] Fedora Reporter: Peter Rajnoha <prajnoha>
Component: systemdAssignee: udev-maint
Status: CLOSED EOL QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: 19CC: agk, harald, johannbg, jonathan, jsynacek, kay, lnykryn, msekleta, s, systemd-maint, udev-maint, vpavlin, zbyszek, zkabelac
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-02-17 08:56:26 EST Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Attachments:
Description Flags
udevd log
none
A simple reproducer none

Description Peter Rajnoha 2011-10-26 08:14:22 EDT
Description of problem:
When running our lvm2 testsuite, we found out that there are leftover dangling symlinks in /dev/mapper. When narrowing down what's the cause of the problem, we found that it happened when a CHANGE event is followed by immediate REMOVE event (so a very close timing probably plays a primary role here).

Version-Release number of selected component (if applicable):
Tested and 100% reproducible on udev-174-2.fc17.x86_64.

How reproducible:
When running lvm2 testsuite, at the end of each test, we call "dmsetup remove -f" in a loop to remove all mappings created during testing.

If the device can't be removed immediately because it's still used by another device, the "-f" switch in the dmsetup will cause that the device won't be removed, but only the table replaced by the 'error' target to cut it off the use (this will generate a CHANGE event since the device is changed - table replaced!). 

The device will be removed if all the other devices layerd on top of it are removed. And this happens quickly. Snippet from the testsuite's "teardown_devs" script:

  while dmsetup table | grep -q ^$PREFIX; do
    for s in `dmsetup info -c -o name --noheading | grep ^$PREFIX`; do
      dmsetup remove -f $s >& /dev/null || true
    done
 done

When we tried to narrow down the problem, we found out that if the '-f' option is not used (so the CHANGE event is not generated before the REMOVE event), this problem does not occur under /dev/mapper, /dev/<vgname>, but the symlink in /dev/block still remains dangling!


Actual results:

[0] rawhide// # ls -l /dev/mapper/
total 0
crw------- 1 root root 10, 236 Oct 26 09:32 control
lrwxrwxrwx 1 root root       7 Oct 26 14:03 LVMTEST4403pv1 -> ../dm-0
lrwxrwxrwx 1 root root       7 Oct 26 14:03 LVMTEST4403pv2 -> ../dm-1
lrwxrwxrwx 1 root root       8 Oct 26 14:03 LVMTEST4403vg-mirror123_mimage_0 -> ../dm-14
lrwxrwxrwx 1 root root       8 Oct 26 14:03 LVMTEST4403vg-mirror123_mimage_1 -> ../dm-15


[0] rawhide// # ls -l /dev/disk/by-id/
total 0
lrwxrwxrwx 1 root root  9 Oct 26 09:32 ata-QEMU_DVD-ROM_QM00003 -> ../../sr0
lrwxrwxrwx 1 root root 10 Oct 26 14:03 dm-name-LVMTEST4403pv1 -> ../../dm-0
lrwxrwxrwx 1 root root 10 Oct 26 14:03 dm-name-LVMTEST4403pv2 -> ../../dm-1
lrwxrwxrwx 1 root root 11 Oct 26 14:03 dm-name-LVMTEST4403vg-mirror123_mimage_0 -> ../../dm-14
lrwxrwxrwx 1 root root 11 Oct 26 14:03 dm-name-LVMTEST4403vg-mirror123_mimage_1 -> ../../dm-15
lrwxrwxrwx 1 root root 11 Oct 26 14:03 dm-uuid-LVM-8PD6Ngc07NGFhAhXaMeZriNxrehveZd44aQFz4zdxgxMQpyOenCpIxCvLF7zVa0b -> ../../dm-14
lrwxrwxrwx 1 root root 11 Oct 26 14:03 dm-uuid-LVM-8PD6Ngc07NGFhAhXaMeZriNxrehveZd4kzte3knj1KHLAeszapSNpbnGfXMErGur -> ../../dm-15
lrwxrwxrwx 1 root root 10 Oct 26 14:03 dm-uuid-TEST-LVMTEST4403pv1 -> ../../dm-0
lrwxrwxrwx 1 root root 10 Oct 26 14:03 dm-uuid-TEST-LVMTEST4403pv2 -> ../../dm-1
lrwxrwxrwx 1 root root  9 Oct 26 10:58 scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-0-1 -> ../../sdb
lrwxrwxrwx 1 root root  9 Oct 26 10:58 scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-0-2 -> ../../sdc
lrwxrwxrwx 1 root root  9 Oct 26 10:58 scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-0-3 -> ../../sdd
lrwxrwxrwx 1 root root  9 Oct 26 10:58 scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-0-4 -> ../../sde
lrwxrwxrwx 1 root root  9 Oct 26 10:58 scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-0-5 -> ../../sdf
lrwxrwxrwx 1 root root  9 Oct 26 10:58 scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-0-6 -> ../../sdg
lrwxrwxrwx 1 root root  9 Oct 26 10:58 scsi-0QEMU_QEMU_HARDDISK_drive-scsi1-0-0 -> ../../sdh
lrwxrwxrwx 1 root root  9 Oct 26 10:58 scsi-0QEMU_QEMU_HARDDISK_drive-scsi1-0-1 -> ../../sdi

[0] rawhide// # ls -l /dev/block/
total 0
lrwxrwxrwx 1 root root 6 Oct 26 09:32 11:0 -> ../sr0
lrwxrwxrwx 1 root root 6 Oct 26 09:32 252:0 -> ../vda
lrwxrwxrwx 1 root root 7 Oct 26 09:32 252:1 -> ../vda1
lrwxrwxrwx 1 root root 7 Oct 26 09:32 252:2 -> ../vda2
lrwxrwxrwx 1 root root 7 Oct 26 09:32 252:3 -> ../vda3
lrwxrwxrwx 1 root root 8 Oct 26 14:03 253:17 -> ../dm-17
lrwxrwxrwx 1 root root 8 Oct 26 14:03 7:0 -> ../loop0
lrwxrwxrwx 1 root root 8 Oct 26 09:32 7:1 -> ../loop1
lrwxrwxrwx 1 root root 8 Oct 26 09:32 7:2 -> ../loop2
lrwxrwxrwx 1 root root 8 Oct 26 09:32 7:3 -> ../loop3
lrwxrwxrwx 1 root root 8 Oct 26 09:32 7:4 -> ../loop4
lrwxrwxrwx 1 root root 8 Oct 26 09:32 7:5 -> ../loop5
lrwxrwxrwx 1 root root 8 Oct 26 09:32 7:6 -> ../loop6
lrwxrwxrwx 1 root root 8 Oct 26 10:53 7:7 -> ../loop7
lrwxrwxrwx 1 root root 6 Oct 26 11:05 8:0 -> ../sda
lrwxrwxrwx 1 root root 6 Oct 26 10:58 8:112 -> ../sdh
lrwxrwxrwx 1 root root 6 Oct 26 10:58 8:128 -> ../sdi
lrwxrwxrwx 1 root root 6 Oct 26 10:58 8:16 -> ../sdb
lrwxrwxrwx 1 root root 6 Oct 26 10:58 8:32 -> ../sdc
lrwxrwxrwx 1 root root 6 Oct 26 10:58 8:48 -> ../sdd
lrwxrwxrwx 1 root root 6 Oct 26 10:58 8:64 -> ../sde
lrwxrwxrwx 1 root root 6 Oct 26 10:58 8:80 -> ../sdf
lrwxrwxrwx 1 root root 6 Oct 26 10:58 8:96 -> ../sdg

[0] rawhide// # ls -l /dev/LVMTEST4403vg/
total 0
lrwxrwxrwx 1 root root 8 Oct 26 14:03 mirror123_mimage_0 -> ../dm-14
lrwxrwxrwx 1 root root 8 Oct 26 14:03 mirror123_mimage_1 -> ../dm-15

All dm-X devices *do not exist anymore* and all those symlinks are *dangling*.

Expected results:
Symlinks removed for already removed devices.

Additional information:
If running the test several times, we end up with different number of dangling symlinks, so I guess this must be some form of a race.

If you need more logs, debugging or more info, please, let me know...
Comment 1 Peter Rajnoha 2011-10-26 08:17:53 EDT
Created attachment 530278 [details]
udevd log
Comment 2 Harald Hoyer 2011-11-01 06:06:38 EDT
[16300.087922] [4617] udev_event_execute_rules: no node name set, will use kernel supplied name 'dm-15'
[16300.087949] [4617] udev_node_add: creating device node '/dev/dm-15', devnum=253:15, mode=0660, uid=0, gid=6
[16300.087990] [4617] udev_node_mknod: preserve file '/dev/dm-15', because it has correct dev_t
[16300.088034] [4617] udev_node_mknod: preserve permissions /dev/dm-15, 060660, uid=0, gid=6
[16300.088117] [4617] node_symlink: preserve already existing symlink '/dev/block/253:15' to '../dm-15'
[16300.088195] [4617] link_update: creating link '/dev/mapper/LVMTEST4403vg-mirror123_mimage_0' to '/dev/dm-15'
[16300.088246] [4617] node_symlink: creating symlink '/dev/mapper/LVMTEST4403vg-mirror123_mimage_0' to '../dm-15'
[16300.088658] [4617] udev_device_update_db: created db file '/run/udev/data/b253:15' for '/devices/virtual/block/dm-15'


[16301.679302] [4629] link_update: no reference left, remove '/dev/mapper/LVMTEST4403vg-mirror123_mimage_0'
[16301.679409] [4629] udev_node_remove: device node '/dev/dm-15' not found

[16301.758867] [4605] udev_device_read_db: no db file to read /run/udev/data/b253:15: No such file or directory
[16301.759431] [4605] udev_node_remove: device node '/dev/dm-15' not found

[16309.582195] [4618] udev_rules_apply_to_event: IMPORT 'udisks-part-id /dev/dm-15' /lib/udev/rules.d/80-udisks.rules:88
[16309.590406] [5589] udev_event_spawn: starting 'udisks-part-id /dev/dm-15'

[16309.795930] [4618] spawn_read: 'udisks-part-id /dev/dm-15'(err) 'Error statting /dev/dm-15: No such file or directory'
[16309.796137] [4618] spawn_read: 'udisks-part-id /dev/dm-15'(err) 'using device_file=(null) syspath=(null), offset=0 ao=0 and number=0 for /dev/dm-15'
[16309.796218] [4618] spawn_read: 'udisks-part-id /dev/dm-15'(err) 'Error opening (null): Bad address'

[16309.839351] [4618] spawn_wait: 'udisks-part-id /dev/dm-15' [5589] exit with return code 0
[16309.839643] [4618] udev_rules_apply_to_event: RUN '$env{DM_SBIN_PATH}/dmsetup udevcomplete $env{DM_COOKIE}' /lib/udev/rules.d/95-dm-notify.rules:12
[16309.839687] [4618] udev_event_execute_rules: no node name set, will use kernel supplied name 'dm-15'
[16309.839716] [4618] udev_node_add: creating device node '/dev/dm-15', devnum=253:15, mode=0660, uid=0, gid=6
[16309.839785] [4618] udev_node_mknod: mknod '/dev/dm-15' 253:15 060660
[16309.839854] [4618] udev_node_mknod: set permissions '/dev/dm-15' 060660 uid=0 gid=6
[16309.839937] [4618] node_symlink: preserve already existing symlink '/dev/block/253:15' to '../dm-15'
[16309.840096] [4618] link_update: creating link '/dev/LVMTEST4403vg/mirror123_mimage_1' to '/dev/dm-15'
[16309.840159] [4618] node_symlink: creating symlink '/dev/LVMTEST4403vg/mirror123_mimage_1' to '../dm-15'
[16309.840391] [4618] link_update: creating link '/dev/disk/by-id/dm-name-LVMTEST4403vg-mirror123_mimage_1' to '/dev/dm-15'
[16309.840443] [4618] node_symlink: creating symlink '/dev/disk/by-id/dm-name-LVMTEST4403vg-mirror123_mimage_1' to '../../dm-15'
[16309.840671] [4618] link_update: creating link '/dev/disk/by-id/dm-uuid-LVM-8PD6Ngc07NGFhAhXaMeZriNxrehveZd4kzte3knj1KHLAeszapSNpbnGfXMErGur' to '/dev/dm-15'
[16309.840723] [4618] node_symlink: creating symlink '/dev/disk/by-id/dm-uuid-LVM-8PD6Ngc07NGFhAhXaMeZriNxrehveZd4kzte3knj1KHLAeszapSNpbnGfXMErGur' to '../../dm-15'
[16309.840964] [4618] link_find_prioritized: found 'b253:15' claiming '/run/udev/links/mapper\x2fLVMTEST4403vg-mirror123_mimage_1'
[16309.841000] [4618] link_update: creating link '/dev/mapper/LVMTEST4403vg-mirror123_mimage_1' to '/dev/dm-15'
[16309.841088] [4618] node_symlink: preserve already existing symlink '/dev/mapper/LVMTEST4403vg-mirror123_mimage_1' to '../dm-15'
[16309.841343] [4618] udev_device_update_db: created db file '/run/udev/data/b253:15' for '/devices/virtual/block/dm-15'

[16310.460871] [4618] spawn_read: 'udisks-part-id /dev/dm-15'(err) 'Error getting udev device for /dev/dm-15: No such file or directory'
[16310.461130] [5738] udev_event_spawn: starting 'udisks-part-id /dev/dm-14'
[16310.465065] [4618] spawn_read: 'udisks-part-id /dev/dm-15'(err) 'using device_file=(null) syspath=(null), offset=0 ao=0 and number=0 for /dev/dm-15'
[16310.465150] [4618] spawn_read: 'udisks-part-id /dev/dm-15'(err) 'Error opening (null): Bad address'
[16310.467340] [4618] spawn_wait: 'udisks-part-id /dev/dm-15' [5734] exit with return code 0
[16310.467483] [4618] udev_rules_apply_to_event: RUN '$env{DM_SBIN_PATH}/dmsetup udevcomplete $env{DM_COOKIE}' /lib/udev/rules.d/95-dm-notify.rules:12
[16310.467530] [4618] udev_event_execute_rules: no node name set, will use kernel supplied name 'dm-15'
[16310.467556] [4618] udev_node_add: creating device node '/dev/dm-15', devnum=253:15, mode=0660, uid=0, gid=6
[16310.467593] [4618] udev_node_mknod: preserve file '/dev/dm-15', because it has correct dev_t
[16310.467612] [4618] udev_node_mknod: preserve permissions /dev/dm-15, 060660, uid=0, gid=6
[16310.467676] [4618] node_symlink: preserve already existing symlink '/dev/block/253:15' to '../dm-15'
[16310.467786] [4618] link_find_prioritized: found 'b253:0' claiming '/run/udev/links/disk\x2fby-id\x2fdm-name-'
[16310.467925] [4618] link_update: creating link '/dev/disk/by-id/dm-name-' to '/dev/dm-15'
[16310.467975] [4618] node_symlink: atomically replace '/dev/disk/by-id/dm-name-'
[16310.468409] [4618] udev_device_update_db: created db file '/run/udev/data/b253:15' for '/devices/virtual/block/dm-15'
Comment 3 Peter Rajnoha 2011-11-09 07:25:01 EST
Harald, any insight of what's going on here from udev point of view?
Comment 4 Kay Sievers 2011-11-09 08:31:52 EST
The /dev/block/M:m bug should be fixed in rawhide.
Comment 5 Peter Rajnoha 2011-11-09 09:15:40 EST
Thanks, I've tried that, the /dev/block/M:m symlinks are properly cleaned up now. Now the only remaining problem is the one with the REMOVE event generated close to previous CHANGE event and those other dangling symlinks...
Comment 6 Kay Sievers 2011-11-10 04:43:52 EST
I guess it's a race we don't make any guarantees. The test claims the same
symlink names for different devices, dm14/15. There is no race-free protection
for that kind of concurrency.

[16300.088195] [4617] link_update: creating link
  '/dev/mapper/LVMTEST4403vg-mirror123_mimage_0' to '/dev/dm-15'

[16307.263139] [5311] link_update: creating link
  '/dev/mapper/LVMTEST4403vg-mirror123_mimage_0' to '/dev/dm-14'
Comment 7 Peter Rajnoha 2011-11-10 07:42:09 EST
(In reply to comment #6)
> The test claims the same symlink names for different devices, dm14/15.

Hmm, but the sequence in the log says:

=== snip ===
[16300.088195] [4617] link_update: creating link '/dev/mapper/LVMTEST4403vg-mirror123_mimage_0' to '/dev/dm-15'
...
[16301.679302] [4629] link_update: no reference left, remove '/dev/mapper/LVMTEST4403vg-mirror123_mimage_0'
...
[16307.263139] [5311] link_update: creating link '/dev/mapper/LVMTEST4403vg-mirror123_mimage_0' to '/dev/dm-14'
============

So there is the "no reference left, remove..." which is about 6 seconds before the second "link_update: creating link".

Now, I can't see the other "remove" for the second link to /dev/dm-14. This part of the log is interesting as well:

=== snip ====
491491] [4629] link_update: creating link '/dev/disk/by-id/dm-name-' to '/dev/dm-14'
[16310.491540] [4629] node_symlink: creating symlink '/dev/disk/by-id/dm-name-' to '../../dm-14'
[16310.491860] [4629] udev_device_update_db: created db file '/run/udev/data/b253:14' for '/devices/virtual/block/dm-14'
[16310.512766] [4605] udev_node_remove: removing device node '/dev/dm-14'
[16310.530904] [4605] udev_node_remove: device node '/dev/dm-14' not found
==============

So we're creating the symlink and just after that removing it (that could be the CHANGE event and the REMOVE event during processing the dmsetup remove -f that changes the table first to include the error target, generates the CHANGE, and then removes the device in next cycle which happens quickly as already mentioned in comment #0).

However, I don't quite understand why the node is being removed and no symlinks at all.
Comment 8 Kay Sievers 2011-11-10 09:36:53 EST
Maybe unreleated, but what is going wrong with the name?
  creating link '/dev/disk/by-id/dm-name-'
Comment 9 Peter Rajnoha 2011-11-10 10:11:45 EST
(In reply to comment #8)
> Maybe unreleated, but what is going wrong with the name?
>   creating link '/dev/disk/by-id/dm-name-'

Oh, nice, I didn't spot this one. It's a little bit related to the original problem in a way that there's the CHANGE event + immediate REMOVE event. I guess (and I hope) this can't be the source of the problem with dangling symlinks though...

I can imagine that as the CHANGE event is still being processed, the dm device underneath is already gone. We're reading /sys/block/dm-X/dm/name to get the device-mapper name and use this value to set symlinks in /dev/mapper and /dev/disk/ content. So that might be the cause for 99%.

(Also, we're not synchronizing with udev each device one by one in our testsuite's "teardown devs" script, since this is just a cleanup after the test is completed and those devices are not meant to be used anymore - we're synchronizing only at the end of the whole operation after all devices are removed in this situation.)

The whole problem here is that all those operations happen quickly so that as one event is still being processed in userspace, the kernel is already in another state with all its devices...
Comment 10 Kay Sievers 2011-11-10 12:16:30 EST
Probably also unrelated, what's the (DM_DIR) supposed to do?
  ENV{DM_NAME}=="?*", SYMLINK+="(DM_DIR)/$env{DM_NAME}"
Comment 11 Peter Rajnoha 2011-11-10 17:20:13 EST
(In reply to comment #10)
> Probably also unrelated, what's the (DM_DIR) supposed to do?
>   ENV{DM_NAME}=="?*", SYMLINK+="(DM_DIR)/$env{DM_NAME}"

That's just a template. We replace the "(DM_DIR)" with actual mapper directory that is defined in libdm/misc/dm-ioctl.h. Normally, it's the "/dev/mapper" of course... well take it as it's always the "mapper" dir :)
Comment 12 Peter Rajnoha 2012-01-31 07:57:03 EST
Kay, any idea what could go wrong here? We still have the /dev/mapper full of dangling symlinks after running our testsuite. It would be fine to have this resolved. I can provide more debug info if needed, just tell me what you need...
Comment 13 Kay Sievers 2012-01-31 10:22:58 EST
Have a simple command sequence I can try to see that issue on my box?
Comment 14 Peter Rajnoha 2012-02-01 10:32:26 EST
(In reply to comment #13)
> Have a simple command sequence I can try to see that issue on my box?

Yup, try this simple script - it's basically the same what we do in our testsuite.
Choose some backing store (only 1 sector needed) and use it as an argument for the script, e.g. "dangling_symlinks_reproducer.sh /dev/sda".
Comment 15 Peter Rajnoha 2012-02-01 10:33:33 EST
Created attachment 558852 [details]
A simple reproducer
Comment 16 Peter Rajnoha 2012-02-01 10:36:42 EST
Maybe you need to run it a few times to actually hit the race... With this  reproducer, I can hit it 90% of the time.
Comment 17 Peter Rajnoha 2012-02-02 07:31:41 EST
(please use libdevmapper >= v1.02.65, lvm2 >= v2.02.86 - before we used the fallback code if there's was some inconsistency detected in symlinks, from those version on, we completely rely on udev and we do not do any checks by default)
Comment 18 Fedora End Of Life 2013-04-03 13:56:18 EDT
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19
Comment 19 Fedora End Of Life 2015-01-09 11:50:17 EST
This message is a notice that Fedora 19 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 19. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 19 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.
Comment 20 Fedora End Of Life 2015-02-17 08:56:26 EST
Fedora 19 changed to end-of-life (EOL) status on 2015-01-06. Fedora 19 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.