Bug 1119561 - Dangling symlinks left in /dev after device removal
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: systemd
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Assignee: systemd-maint
QA Contact: qe-baseos-daemons
URL:
Whiteboard:
Duplicates: 1091430 1152331
Depends On:
Blocks:
Reported: 2014-07-15 04:34 UTC by Mark Goodwin
Modified: 2015-02-18 21:59 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-01-14 14:22:35 UTC


Attachments
udevd failing to remove symlinks on remove event (8.84 KB, application/octet-stream)
2014-08-28 07:19 UTC, Peter Rajnoha

Description Mark Goodwin 2014-07-15 04:34:04 UTC
Description of problem: removing a snapshot sometimes doesn't delete the cow volume from /dev/mapper.

Version-Release number of selected component (if applicable):
kernel 3.10.0-121.el7.x86_64
lvm2-2.02.105-14.el7.x86_64
device-mapper-1.02.84-14.el7.x86_64

How reproducible: always

Steps to Reproduce:
1. create a snapshot
2. create a second snapshot of the same basevol
3. lvremove the first snapshot
4. create a third snapshot of the same basevol

Actual results:
The cow volume symlink of the first snapshot remains in /dev/mapper and points to the same dm-[0-9]+ device as the cow volume of the third snapshot. The current /sys/block/dm-X/dm/name correctly matches the third snapshot.

Expected results: cow volume for deleted snapshot should be removed from /dev/mapper

Comment 2 Mark Goodwin 2014-07-15 05:14:06 UTC
Here's an example using snapshots :

[root@ocean ~]# lvs
  LV     VG     Attr       LSize   Pool Origin Data%  Move Log Cpy%Sync Convert
  home   rootvg -wi-ao----  97.66g
  root   rootvg -wi-ao----  48.83g
  swap   rootvg -wi-ao----   5.88g
  backup virtvg -wi-ao---- 850.00g
  test   virtvg -wi-a----- 100.00g
[root@ocean ~]# ls -l /dev/mapper
total 0
crw-------. 1 root root 10, 236 Jul 14 20:15 control
lrwxrwxrwx. 1 root root       7 Jul 14 20:20 rootvg-home -> ../dm-4
lrwxrwxrwx. 1 root root       7 Jul 14 20:20 rootvg-root -> ../dm-1
lrwxrwxrwx. 1 root root       7 Jul 14 20:20 rootvg-swap -> ../dm-0
lrwxrwxrwx. 1 root root       7 Jul 14 20:20 virtvg-backup -> ../dm-2
lrwxrwxrwx. 1 root root       7 Jul 14 20:20 virtvg-test -> ../dm-3
[root@ocean ~]# lvcreate -L1M -n snap1 -s virtvg/backup
  Rounding up size to full physical extent 4.00 MiB
  Logical volume "snap1" created
[root@ocean ~]# ls -l /dev/mapper
total 0
crw-------. 1 root root 10, 236 Jul 14 20:15 control
lrwxrwxrwx. 1 root root       7 Jul 14 20:20 rootvg-home -> ../dm-4
lrwxrwxrwx. 1 root root       7 Jul 14 20:20 rootvg-root -> ../dm-1
lrwxrwxrwx. 1 root root       7 Jul 14 20:20 rootvg-swap -> ../dm-0
lrwxrwxrwx. 1 root root       7 Jul 15 13:07 virtvg-backup -> ../dm-2
lrwxrwxrwx. 1 root root       7 Jul 15 13:07 virtvg-backup-real -> ../dm-6
lrwxrwxrwx. 1 root root       7 Jul 15 13:07 virtvg-snap1 -> ../dm-5
lrwxrwxrwx. 1 root root       7 Jul 15 13:07 virtvg-snap1-cow -> ../dm-7
lrwxrwxrwx. 1 root root       7 Jul 14 20:20 virtvg-test -> ../dm-3
[root@ocean ~]# lvcreate -L1M -n snap2 -s virtvg/backup
  Rounding up size to full physical extent 4.00 MiB
  Logical volume "snap2" created
[root@ocean ~]# ls -l /dev/mapper
total 0
crw-------. 1 root root 10, 236 Jul 14 20:15 control
lrwxrwxrwx. 1 root root       7 Jul 14 20:20 rootvg-home -> ../dm-4
lrwxrwxrwx. 1 root root       7 Jul 14 20:20 rootvg-root -> ../dm-1
lrwxrwxrwx. 1 root root       7 Jul 14 20:20 rootvg-swap -> ../dm-0
lrwxrwxrwx. 1 root root       7 Jul 15 13:07 virtvg-backup -> ../dm-2
lrwxrwxrwx. 1 root root       7 Jul 15 13:07 virtvg-backup-real -> ../dm-6
lrwxrwxrwx. 1 root root       7 Jul 15 13:07 virtvg-snap1 -> ../dm-5
lrwxrwxrwx. 1 root root       7 Jul 15 13:07 virtvg-snap1-cow -> ../dm-7
lrwxrwxrwx. 1 root root       7 Jul 15 13:07 virtvg-snap2 -> ../dm-8
lrwxrwxrwx. 1 root root       7 Jul 15 13:07 virtvg-snap2-cow -> ../dm-9
lrwxrwxrwx. 1 root root       7 Jul 14 20:20 virtvg-test -> ../dm-3
[root@ocean ~]# lvremove /dev/mapper/virtvg-snap1
Do you really want to remove active logical volume snap1? [y/n]: y
  Logical volume "snap1" successfully removed
[root@ocean ~]# ls -l /dev/mapper
total 0
crw-------. 1 root root 10, 236 Jul 14 20:15 control
lrwxrwxrwx. 1 root root       7 Jul 14 20:20 rootvg-home -> ../dm-4
lrwxrwxrwx. 1 root root       7 Jul 14 20:20 rootvg-root -> ../dm-1
lrwxrwxrwx. 1 root root       7 Jul 14 20:20 rootvg-swap -> ../dm-0
lrwxrwxrwx. 1 root root       7 Jul 15 13:08 virtvg-backup -> ../dm-2
lrwxrwxrwx. 1 root root       7 Jul 15 13:08 virtvg-backup-real -> ../dm-6
lrwxrwxrwx. 1 root root       7 Jul 15 13:07 virtvg-snap1-cow -> ../dm-7
lrwxrwxrwx. 1 root root       7 Jul 15 13:08 virtvg-snap2 -> ../dm-8
lrwxrwxrwx. 1 root root       7 Jul 15 13:08 virtvg-snap2-cow -> ../dm-9
lrwxrwxrwx. 1 root root       7 Jul 14 20:20 virtvg-test -> ../dm-3
[root@ocean ~]# lvcreate -L1M -n snap3 -s virtvg/backup
  Rounding up size to full physical extent 4.00 MiB
  Logical volume "snap3" created
[root@ocean ~]# ls -l /dev/mapper
total 0
crw-------. 1 root root 10, 236 Jul 14 20:15 control
lrwxrwxrwx. 1 root root       7 Jul 14 20:20 rootvg-home -> ../dm-4
lrwxrwxrwx. 1 root root       7 Jul 14 20:20 rootvg-root -> ../dm-1
lrwxrwxrwx. 1 root root       7 Jul 14 20:20 rootvg-swap -> ../dm-0
lrwxrwxrwx. 1 root root       7 Jul 15 13:09 virtvg-backup -> ../dm-2
lrwxrwxrwx. 1 root root       7 Jul 15 13:09 virtvg-backup-real -> ../dm-6
lrwxrwxrwx. 1 root root       7 Jul 15 13:07 virtvg-snap1-cow -> ../dm-7
lrwxrwxrwx. 1 root root       7 Jul 15 13:09 virtvg-snap2 -> ../dm-8
lrwxrwxrwx. 1 root root       7 Jul 15 13:09 virtvg-snap2-cow -> ../dm-9
lrwxrwxrwx. 1 root root       7 Jul 15 13:09 virtvg-snap3 -> ../dm-5
lrwxrwxrwx. 1 root root       7 Jul 15 13:09 virtvg-snap3-cow -> ../dm-7
lrwxrwxrwx. 1 root root       7 Jul 14 20:20 virtvg-test -> ../dm-3 

virtvg-snap1-cow (dm-7) shouldn't exist, since the snapshot was deleted
and dm-7 has been reused for virtvg-snap3-cow. The definitive mapping
is the name exported through sysfs, e.g.:
[root@ocean ~]# cat /sys/block/dm-7/dm/name
virtvg-snap3-cow
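
The comparison above (symlink name versus the name the kernel exports in sysfs) could be scripted roughly as follows. This is only an illustrative sketch, not something taken from this bug: the function name find_stale_dm_links and its two directory arguments are hypothetical, and it assumes the symlink name and the dm/name value are directly comparable (no name mangling of unusual characters).

```shell
# find_stale_dm_links [DEV_MAPPER_DIR] [SYS_BLOCK_DIR]   (hypothetical helper)
# Prints every symlink in DEV_MAPPER_DIR whose own name differs from the
# name sysfs reports for the dm-N node the symlink points at.
find_stale_dm_links() {
    local dev_dir=${1:-/dev/mapper} sys_dir=${2:-/sys/block}
    local link node name_file kernel_name
    for link in "$dev_dir"/*; do
        [ -L "$link" ] || continue              # skip "control" and other non-symlinks
        node=$(basename "$(readlink "$link")")  # "../dm-7" becomes "dm-7"
        name_file=$sys_dir/$node/dm/name
        [ -r "$name_file" ] || continue         # target has no sysfs entry: not handled here
        kernel_name=$(cat "$name_file")
        if [ "$kernel_name" != "$(basename "$link")" ]; then
            printf '%s -> %s (kernel name: %s)\n' \
                "$(basename "$link")" "$node" "$kernel_name"
        fi
    done
}
```

Called without arguments it scans /dev/mapper against /sys/block; in the state shown above it should report virtvg-snap1-cow, because dm-7 now carries the name virtvg-snap3-cow.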

Comment 3 Alasdair Kergon 2014-07-16 01:42:48 UTC
Reproduced.
vgmknodes fixes it.
As /dev/mapper nodes are for internal use, it's harmless.

We need to see if it's an old bug masked by the 'verify_udev_operations' code, or a regression in the snapshot processing or udev rules.

Comment 4 Mark Goodwin 2014-07-16 02:24:59 UTC
Thanks for looking at this one. It's not entirely harmless - there is code in PCP that walks /dev/mapper to determine dm-[0-9]+ mappings, so we can make sense of the dm entries in /proc/diskstats. Encountering two entries with the *same* mapping leads to unexpected results (and is why we switched to using /sys/block/$DMNAME/dm/name instead).
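
For illustration, a sysfs-based lookup of the kind described here might look like the sketch below. It is not PCP's actual code; the function name dm_names_from_sysfs and its optional directory argument are hypothetical. It reads only the dm/name attribute, so stale /dev/mapper symlinks cannot affect the result.

```shell
# dm_names_from_sysfs [SYS_BLOCK_DIR]   (hypothetical helper)
# Prints one "dm-N name" pair per line, taken from /sys/block/dm-N/dm/name.
dm_names_from_sysfs() {
    local sys_dir=${1:-/sys/block} dir
    for dir in "$sys_dir"/dm-*; do
        [ -r "$dir/dm/name" ] || continue   # no dm devices, or attribute missing
        printf '%s %s\n' "$(basename "$dir")" "$(cat "$dir/dm/name")"
    done
}
```

Each dm-N appears at most once in the output, so the pairs can be joined against the device column of /proc/diskstats without ambiguity.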

Comment 5 Peter Rajnoha 2014-08-28 07:19:25 UTC
Created attachment 931740
udevd failing to remove symlinks on remove event

We rely completely on udev in RHEL7, and the libdevmapper/LVM code does not touch /dev unless someone sets verify_udev_operations=1 in lvm.conf to change the default behaviour, which is not the case in this bugzilla. Even then, when there is a REMOVE event, udev should simply remove the node, and that is not happening. I'm attaching logs for the failing case; these symlinks were not removed by udev:

ls -la /dev/mapper/
  ...
  lrwxrwxrwx  1 root root       7 Aug 28 02:56 vg-lvol0-real -> ../dm-4
  lrwxrwxrwx  1 root root       7 Aug 28 02:56 vg-lvol1-cow -> ../dm-5

A probable source of the problem is a quick sequence of a CHANGE event followed by a REMOVE event, which udev does not handle correctly for some reason, but the systemd/udev team needs to investigate this further...

Attaching logs - /dev/mapper listing, systemd-udevd debug log and udevadm monitor log. If there's anything else you need for debugging, just tell me.

(changing component to systemd/udev)

Comment 6 Peter Rajnoha 2014-08-28 07:28:26 UTC
Note: the logs attached in attachment 931740 are from:

[0] rhel7-a/~ # rpm -q kernel systemd lvm2
kernel-3.10.0-131.el7.x86_64
systemd-208-11.el7_0.2.x86_64
lvm2-2.02.105-14.el7.x86_64

Comment 8 Peter Rajnoha 2014-08-28 08:23:36 UTC
*** Bug 1091430 has been marked as a duplicate of this bug. ***

Comment 9 Lukáš Nykrýn 2014-08-28 12:05:54 UTC
Resetting devel-ack, since it was moved to different component.

Comment 11 Peter Rajnoha 2014-10-16 08:35:05 UTC
*** Bug 1152331 has been marked as a duplicate of this bug. ***

Comment 13 Kay Sievers 2014-10-17 07:52:11 UTC
This is a known and years old issue. We have really no idea why that happens
and what goes wrong here. It might be some race in the udev event serialization,
but, I am sorry, I have no idea how to debug that.

Comment 14 Nenad Peric 2014-12-05 15:52:13 UTC
Not sure if it is related to this bug but here:

[root@tardis-01 ~]# vgremove cache_6_8869
Do you really want to remove volume group "cache_6_8869" containing 1 logical volumes? [y/n]: y
Do you really want to remove active clustered logical volume cache_6_88690? [y/n]: y
  Logical volume "cache_6_88690_fast" successfully removed
  Logical volume "cache_6_88690" successfully removed
  Volume group "cache_6_8869" successfully removed
[root@tardis-01 ~]# dmsetup ls
rhel_tardis--01-swap	(253:1)
rhel_tardis--01-root	(253:0)
rhel_tardis--01-home	(253:2)
[root@tardis-01 ~]# ls -l /dev/mapper/
total 0
lrwxrwxrwx. 1 root root       7 Dec  5 16:37 cache_6_8869-cache_6_88690_corig -> ../dm-5
crw-------. 1 root root 10, 236 Dec  5 16:11 control
lrwxrwxrwx. 1 root root       7 Dec  5 16:11 rhel_tardis--01-home -> ../dm-2
lrwxrwxrwx. 1 root root       7 Dec  5 16:11 rhel_tardis--01-root -> ../dm-0
lrwxrwxrwx. 1 root root       7 Dec  5 16:11 rhel_tardis--01-swap -> ../dm-1


That first line with cache_lv is a dangling link pointing to nothing (non-existent dm-5)
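
Links of this kind, where the target no longer exists at all, could be listed with a sketch like the following. The function name find_dangling_links and its optional directory argument are hypothetical and are not part of LVM or udev.

```shell
# find_dangling_links [DIR]   (hypothetical helper)
# Prints every symlink in DIR whose target does not exist.
find_dangling_links() {
    local dir=${1:-/dev/mapper} link
    for link in "$dir"/*; do
        # -L: the entry itself is a symlink; ! -e: following it reaches nothing
        if [ -L "$link" ] && [ ! -e "$link" ]; then
            printf '%s -> %s\n' "$(basename "$link")" "$(readlink "$link")"
        fi
    done
}
```

This only catches links whose dm-N node is gone; a stale link whose dm-N number has since been reused still resolves and would need a comparison against the sysfs name instead.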

Comment 15 Michal Sekletar 2015-01-14 14:22:35 UTC
As per comment #13.

Comment 16 Mark Goodwin 2015-01-14 23:25:29 UTC
Not convinced Comment #13 is sufficient reason NOT to fix this, but anyway it's your call. There must be a subtle udev bug here somewhere and I'd bet this BZ isn't the only victim.

-- Mark

Comment 17 Zdenek Kabelac 2015-01-15 10:07:32 UTC
I'm also unconvinced we should resolve bugs we do not yet understand as 'CANTFIX'

Comment 18 Corey Marthaler 2015-02-18 21:59:58 UTC
Still hitting this issue fairly reliably. Adding a note in this bugzilla for when I go searching for the state of bug 1152331.

I'll add hacks in our tests to get around this based on the current status of this bug.