Bug 1119561
| Summary: | Dangling symlinks left in /dev after device removal | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Mark Goodwin <mgoodwin> | ||||
| Component: | systemd | Assignee: | systemd-maint | ||||
| Status: | CLOSED CANTFIX | QA Contact: | qe-baseos-daemons | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 7.0 | CC: | agk, cmarthal, heinzm, jbrassow, jscotka, lnykryn, msekleta, msnitzer, nathans, nperic, prajnoha, prockai, systemd-maint-list, zkabelac | ||||
| Target Milestone: | rc | Keywords: | TestBlocker | ||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2015-01-14 14:22:35 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
Description
Mark Goodwin
2014-07-15 04:34:04 UTC
Here's an example using snapshots:

[root@ocean ~]# lvs
  LV     VG     Attr       LSize   Pool Origin Data%  Move Log Cpy%Sync Convert
  home   rootvg -wi-ao----  97.66g
  root   rootvg -wi-ao----  48.83g
  swap   rootvg -wi-ao----   5.88g
  backup virtvg -wi-ao---- 850.00g
  test   virtvg -wi-a----- 100.00g
[root@ocean ~]# ls -l /dev/mapper
total 0
crw-------. 1 root root 10, 236 Jul 14 20:15 control
lrwxrwxrwx. 1 root root 7 Jul 14 20:20 rootvg-home -> ../dm-4
lrwxrwxrwx. 1 root root 7 Jul 14 20:20 rootvg-root -> ../dm-1
lrwxrwxrwx. 1 root root 7 Jul 14 20:20 rootvg-swap -> ../dm-0
lrwxrwxrwx. 1 root root 7 Jul 14 20:20 virtvg-backup -> ../dm-2
lrwxrwxrwx. 1 root root 7 Jul 14 20:20 virtvg-test -> ../dm-3
[root@ocean ~]# lvcreate -L1M -n snap1 -s virtvg/backup
  Rounding up size to full physical extent 4.00 MiB
  Logical volume "snap1" created
[root@ocean ~]# ls -l /dev/mapper
total 0
crw-------. 1 root root 10, 236 Jul 14 20:15 control
lrwxrwxrwx. 1 root root 7 Jul 14 20:20 rootvg-home -> ../dm-4
lrwxrwxrwx. 1 root root 7 Jul 14 20:20 rootvg-root -> ../dm-1
lrwxrwxrwx. 1 root root 7 Jul 14 20:20 rootvg-swap -> ../dm-0
lrwxrwxrwx. 1 root root 7 Jul 15 13:07 virtvg-backup -> ../dm-2
lrwxrwxrwx. 1 root root 7 Jul 15 13:07 virtvg-backup-real -> ../dm-6
lrwxrwxrwx. 1 root root 7 Jul 15 13:07 virtvg-snap1 -> ../dm-5
lrwxrwxrwx. 1 root root 7 Jul 15 13:07 virtvg-snap1-cow -> ../dm-7
lrwxrwxrwx. 1 root root 7 Jul 14 20:20 virtvg-test -> ../dm-3
[root@ocean ~]# lvcreate -L1M -n snap2 -s virtvg/backup
  Rounding up size to full physical extent 4.00 MiB
  Logical volume "snap2" created
[root@ocean ~]# ls -l /dev/mapper
total 0
crw-------. 1 root root 10, 236 Jul 14 20:15 control
lrwxrwxrwx. 1 root root 7 Jul 14 20:20 rootvg-home -> ../dm-4
lrwxrwxrwx. 1 root root 7 Jul 14 20:20 rootvg-root -> ../dm-1
lrwxrwxrwx. 1 root root 7 Jul 14 20:20 rootvg-swap -> ../dm-0
lrwxrwxrwx. 1 root root 7 Jul 15 13:07 virtvg-backup -> ../dm-2
lrwxrwxrwx. 1 root root 7 Jul 15 13:07 virtvg-backup-real -> ../dm-6
lrwxrwxrwx. 1 root root 7 Jul 15 13:07 virtvg-snap1 -> ../dm-5
lrwxrwxrwx. 1 root root 7 Jul 15 13:07 virtvg-snap1-cow -> ../dm-7
lrwxrwxrwx. 1 root root 7 Jul 15 13:07 virtvg-snap2 -> ../dm-8
lrwxrwxrwx. 1 root root 7 Jul 15 13:07 virtvg-snap2-cow -> ../dm-9
lrwxrwxrwx. 1 root root 7 Jul 14 20:20 virtvg-test -> ../dm-3
[root@ocean ~]# lvremove /dev/mapper/virtvg-snap1
Do you really want to remove active logical volume snap1? [y/n]: y
  Logical volume "snap1" successfully removed
[root@ocean ~]# ls -l /dev/mapper
total 0
crw-------. 1 root root 10, 236 Jul 14 20:15 control
lrwxrwxrwx. 1 root root 7 Jul 14 20:20 rootvg-home -> ../dm-4
lrwxrwxrwx. 1 root root 7 Jul 14 20:20 rootvg-root -> ../dm-1
lrwxrwxrwx. 1 root root 7 Jul 14 20:20 rootvg-swap -> ../dm-0
lrwxrwxrwx. 1 root root 7 Jul 15 13:08 virtvg-backup -> ../dm-2
lrwxrwxrwx. 1 root root 7 Jul 15 13:08 virtvg-backup-real -> ../dm-6
lrwxrwxrwx. 1 root root 7 Jul 15 13:07 virtvg-snap1-cow -> ../dm-7
lrwxrwxrwx. 1 root root 7 Jul 15 13:08 virtvg-snap2 -> ../dm-8
lrwxrwxrwx. 1 root root 7 Jul 15 13:08 virtvg-snap2-cow -> ../dm-9
lrwxrwxrwx. 1 root root 7 Jul 14 20:20 virtvg-test -> ../dm-3
[root@ocean ~]# lvcreate -L1M -n snap3 -s virtvg/backup
  Rounding up size to full physical extent 4.00 MiB
  Logical volume "snap3" created
[root@ocean ~]# ls -l /dev/mapper
total 0
crw-------. 1 root root 10, 236 Jul 14 20:15 control
lrwxrwxrwx. 1 root root 7 Jul 14 20:20 rootvg-home -> ../dm-4
lrwxrwxrwx. 1 root root 7 Jul 14 20:20 rootvg-root -> ../dm-1
lrwxrwxrwx. 1 root root 7 Jul 14 20:20 rootvg-swap -> ../dm-0
lrwxrwxrwx. 1 root root 7 Jul 15 13:09 virtvg-backup -> ../dm-2
lrwxrwxrwx. 1 root root 7 Jul 15 13:09 virtvg-backup-real -> ../dm-6
lrwxrwxrwx. 1 root root 7 Jul 15 13:07 virtvg-snap1-cow -> ../dm-7
lrwxrwxrwx. 1 root root 7 Jul 15 13:09 virtvg-snap2 -> ../dm-8
lrwxrwxrwx. 1 root root 7 Jul 15 13:09 virtvg-snap2-cow -> ../dm-9
lrwxrwxrwx. 1 root root 7 Jul 15 13:09 virtvg-snap3 -> ../dm-5
lrwxrwxrwx. 1 root root 7 Jul 15 13:09 virtvg-snap3-cow -> ../dm-7
lrwxrwxrwx. 1 root root 7 Jul 14 20:20 virtvg-test -> ../dm-3

virtvg-snap1-cow (dm-7) shouldn't exist since the snapshot was deleted and dm-7 has been reused for virtvg-snap3-cow. The definitive mapping is the sysfs exported name, e.g.:

[root@ocean ~]# cat /sys/block/dm-7/dm/name
virtvg-snap3-cow

Reproduced. vgmknodes fixes it. As /dev/mapper nodes are for internal use, it's harmless. We need to see if it's an old bug masked by the 'verify_udev_operations' code, or a regression in the snapshot processing or udev rules.

Thanks for looking at this one. It's not entirely harmless - there is code in PCP that walks /dev/mapper to determine dm-[0-9]+ mappings, so we can make sense of the dm entries in /proc/diskstats. Encountering two entries with the *same* mapping lead to unexpected results (and is why we switched to using /sys/block/$DMNAME/dm/name instead).

Created attachment 931740 [details]
udevd failing to remove symlinks on remove event
We rely completely on udev in RHEL7: libdevmapper/LVM code does not touch /dev unless someone sets verify_udev_operations=1 in lvm.conf to override the default behaviour, and that is not the case in this bugzilla. Even so, when there is a REMOVE event, udev should simply remove the node, and this is not happening. Attaching logs for the failing case; these symlinks were not removed by udev:
ls -la /dev/mapper/
...
8 lrwxrwxrwx 1 root root 7 Aug 28 02:56 vg-lvol0-real -> ../dm-4
9 lrwxrwxrwx 1 root root 7 Aug 28 02:56 vg-lvol1-cow -> ../dm-5
A probable source of the problem is a quick sequence of a CHANGE event followed by a REMOVE event, which udev does not handle correctly for some reason, but the systemd/udev team needs to investigate this further.
Attaching logs - /dev/mapper listing, systemd-udevd debug log and udevadm monitor log. If there's anything else you need for debugging, just tell me.
(changing component to systemd/udev)
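
While the /dev/mapper symlinks cannot be trusted, a consumer can follow the approach described earlier in this bug and read the kernel-exported name from /sys/block/dm-N/dm/name instead. The following is only a minimal sketch of that lookup in Python, not the PCP implementation; the function name dm_names_from_sysfs and its sysfs_block parameter are hypothetical and exist only for illustration.

```python
import os
import re

# Matches device-mapper block nodes such as "dm-0" or "dm-17".
DM_NODE = re.compile(r"^dm-[0-9]+$")


def dm_names_from_sysfs(sysfs_block="/sys/block"):
    """Return a dict like {"dm-7": "virtvg-snap3-cow"} built from sysfs.

    Hypothetical helper: it reads <sysfs_block>/dm-N/dm/name for every
    dm-N entry and never looks at /dev/mapper.
    """
    names = {}
    try:
        entries = sorted(os.listdir(sysfs_block))
    except OSError:
        # No sysfs at this path (for example on a non-Linux host).
        return names
    for entry in entries:
        if not DM_NODE.match(entry):
            continue
        name_file = os.path.join(sysfs_block, entry, "dm", "name")
        try:
            with open(name_file) as fh:
                names[entry] = fh.read().strip()
        except OSError:
            # The device may disappear between listdir() and open().
            continue
    return names
```

Calling dm_names_from_sysfs() with no arguments reads the live /sys/block tree; the resulting dictionary can then be used to label the dm-N rows of /proc/diskstats without depending on udev having cleaned up /dev/mapper.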
Note: the logs attached in attachment 931740 [details] are from:
[0] rhel7-a/~ # rpm -q kernel systemd lvm2
kernel-3.10.0-131.el7.x86_64
systemd-208-11.el7_0.2.x86_64
lvm2-2.02.105-14.el7.x86_64
*** Bug 1091430 has been marked as a duplicate of this bug. ***

Resetting devel-ack, since it was moved to different component.

*** Bug 1152331 has been marked as a duplicate of this bug. ***

This is a known and years old issue. We have really no idea why that happens and what goes wrong here. It might be some race in the udev event serialization, but, I am sorry, I have no idea how to debug that.

Not sure if it is related to this bug but here:

[root@tardis-01 ~]# vgremove cache_6_8869
Do you really want to remove volume group "cache_6_8869" containing 1 logical volumes? [y/n]: y
Do you really want to remove active clustered logical volume cache_6_88690? [y/n]: y
  Logical volume "cache_6_88690_fast" successfully removed
  Logical volume "cache_6_88690" successfully removed
  Volume group "cache_6_8869" successfully removed
[root@tardis-01 ~]# dmsetup ls
rhel_tardis--01-swap (253:1)
rhel_tardis--01-root (253:0)
rhel_tardis--01-home (253:2)
[root@tardis-01 ~]# ls -l /dev/mapper/
total 0
lrwxrwxrwx. 1 root root 7 Dec 5 16:37 cache_6_8869-cache_6_88690_corig -> ../dm-5
crw-------. 1 root root 10, 236 Dec 5 16:11 control
lrwxrwxrwx. 1 root root 7 Dec 5 16:11 rhel_tardis--01-home -> ../dm-2
lrwxrwxrwx. 1 root root 7 Dec 5 16:11 rhel_tardis--01-root -> ../dm-0
lrwxrwxrwx. 1 root root 7 Dec 5 16:11 rhel_tardis--01-swap -> ../dm-1

That first line with cache_lv is a dangling link pointing to nothing (non-existing dm-5)

As per comment #13.

Not convinced Comment #13 is sufficient reason NOT to fix this, but anyway it's your call. There must be a subtle udev bug here somewhere and I'd bet this BZ isn't the only victim. -- Mark

I'm also unconvinced we should resolve bugs we do not yet understand as 'CANTFIX'

Still hitting this issue fairly reliably. Adding a note in this bugzilla for when I go searching for the state bug 1152331. I'll add hacks in our tests to get around this based on the current status of this bug.
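
For test suites that need to detect this condition, one possible check is to compare every symlink in /dev/mapper against the name the kernel exports in sysfs. The sketch below is an illustration only, not code from LVM, udev, or any test suite mentioned here; find_stale_mapper_links and its parameters are hypothetical names. It assumes the link name matches the sysfs name exactly, which may not hold for device names containing characters that get mangled in /dev.

```python
import os


def find_stale_mapper_links(mapper_dir="/dev/mapper", sysfs_block="/sys/block"):
    """Return (link_name, target_node, reason) tuples for suspicious links.

    reason is "dangling" when the dm-N target has no sysfs name entry, and
    "stale" when dm-N exists but now belongs to a differently named device.
    Hypothetical helper; assumes unmangled device names.
    """
    problems = []
    for link_name in sorted(os.listdir(mapper_dir)):
        link_path = os.path.join(mapper_dir, link_name)
        if not os.path.islink(link_path):
            continue  # skip non-symlinks such as the "control" node
        # "../dm-5" -> "dm-5"
        target_node = os.path.basename(os.readlink(link_path))
        name_file = os.path.join(sysfs_block, target_node, "dm", "name")
        try:
            with open(name_file) as fh:
                kernel_name = fh.read().strip()
        except OSError:
            problems.append((link_name, target_node, "dangling"))
            continue
        if kernel_name != link_name:
            problems.append((link_name, target_node, "stale"))
    return problems
```

Applied to the listings in this bug, such a check would be expected to flag virtvg-snap1-cow as stale (dm-7 now names virtvg-snap3-cow) and the cache_6_8869 link as dangling (no dm-5). A test could then run vgmknodes, which was reported earlier in this bug to fix the symlinks, rather than deleting the links itself.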