Bug 2123648

Summary: Stale devlink device units are not cleaned up while dispatching UDEV change events
Product: Red Hat Enterprise Linux 9 Reporter: Neil Wilson <neil>
Component: systemd Assignee: systemd-maint
Status: CLOSED ERRATA QA Contact: Frantisek Sumsal <fsumsal>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 9.0 CC: dtardon, jamacku, systemd-maint-list
Target Milestone: rc Keywords: FeatureBackport, TestOnly, Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: systemd-252-3.el9 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-05-09 08:21:58 UTC Type: Bug
Bug Depends On: 2138081    
Bug Blocks:    

Description Neil Wilson 2022-09-02 09:33:47 UTC
Description of problem:

Dangling device units are not unloaded properly, which increases the memory used by subsequent calls to systemctl.


Version-Release number of selected component (if applicable):

systemd-239-58.el8_6.4
systemd-250-6.el9

How reproducible:


Steps to Reproduce:
1. $ sudo lvcreate -n test-a-1 -L1G test
  Logical volume "test-a-1" created.
2. $ systemctl list-units --all 'dev-disk-by\x2did-dm*'
  UNIT                                                                                                           LOAD   ACTIVE SUB     DESCRIPTION                                                                >
  dev-disk-by\x2did-dm\x2dname\x2dtest\x2dtest\x2d\x2da\x2d\x2d1.device                                          loaded active plugged /dev/disk/by-id/dm-name-test-test--a--1
  dev-disk-by\x2did-dm\x2duuid\x2dLVM\x2dUOSUxz2ZQYd27F1vpz3bAZFukcNK3aQeKgxkfRkc06BhYHctCHcIUhMgQEW1dlIF.device loaded active plugged /dev/disk/by-id/dm-uuid-LVM-UOSUxz2ZQYd27F1vpz3bAZFukcNK3aQeKgxkfRkc06BhYHc>

3. $ sudo lvrename /dev/test/test-a-1 test-a-2
  Renamed "test-a-1" to "test-a-2" in volume group "test"
4. $ systemctl list-units --all 'dev-disk-by\x2did-dm*'
  UNIT                                                                                                           LOAD   ACTIVE SUB     DESCRIPTION                                                                >
  dev-disk-by\x2did-dm\x2dname\x2dtest\x2dtest\x2d\x2da\x2d\x2d1.device                                          loaded active plugged /dev/disk/by-id/dm-name-test-test--a--1
  dev-disk-by\x2did-dm\x2dname\x2dtest\x2dtest\x2d\x2da\x2d\x2d2.device                                          loaded active plugged /dev/disk/by-id/dm-name-test-test--a--2
  dev-disk-by\x2did-dm\x2duuid\x2dLVM\x2dUOSUxz2ZQYd27F1vpz3bAZFukcNK3aQeKgxkfRkc06BhYHctCHcIUhMgQEW1dlIF.device loaded active plugged /dev/disk/by-id/dm-uuid-LVM-UOSUxz2ZQYd27F1vpz3bAZFukcNK3aQeKgxkfRkc06BhYHc>
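
To match the unit names in the listings above to the paths in the DESCRIPTION column, it may help to see how a device path is turned into a unit name. The Python sketch below approximates systemd's path escaping under simplified rules: path separators become "-", and any character other than ASCII letters, digits, ":", "_" and "." becomes a \xNN sequence. The helper name unit_name_from_path is hypothetical and not a systemd API; `systemd-escape --path` remains the authoritative tool.

```python
import string

# Characters kept verbatim by this sketch; everything else becomes \xNN.
_ALLOWED = set(string.ascii_letters + string.digits + ":_.")


def unit_name_from_path(path: str, suffix: str = ".device") -> str:
    """Approximate systemd path escaping (hypothetical helper, not a systemd API)."""
    # Drop empty components so leading, trailing and doubled slashes vanish.
    components = [c for c in path.split("/") if c]
    if not components:
        return "-" + suffix  # the root path escapes to a single dash
    escaped = []
    for index, char in enumerate("/".join(components)):
        if char == "/":
            escaped.append("-")
        elif char in _ALLOWED and not (index == 0 and char == "."):
            escaped.append(char)
        else:
            # Escape each UTF-8 byte as a C-style \xNN sequence.
            escaped.extend("\\x{:02x}".format(b) for b in char.encode("utf-8"))
    return "".join(escaped) + suffix


if __name__ == "__main__":
    print(unit_name_from_path("/dev/disk/by-id/dm-name-test-test--a--2"))
```

Because literal dashes in the path are escaped as \x2d, the doubled dashes that device-mapper uses in "test-test--a--1" show up as "\x2d\x2d" in the unit name.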


Actual results:
Currently, systemd creates new device units for any new paths that appear in devlinks in a UDEV change event, but does not remove previously created device units for paths that are no longer present in devlinks. This leaves device units for non-existent paths loaded until the parent device is removed.

Furthermore, in RHEL-8 systemd, if a daemon-reload is issued before the parent device unit is removed, the stale devlink units lose their association with the parent device (established by matching syspath) during device enumeration, because the paths no longer exist. They then stay loaded indefinitely (until reboot).

Expected results:
systemd should remove stale devlink device units while dispatching UDEV "change" events for the parent device (e.g. when renaming an LVM Logical Volume).
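
The expected behaviour amounts to reconciling the devlink units owned by a device with the devlinks reported in each change event. The Python sketch below models that as a set difference. It is only an illustration under stated assumptions, not the code from the upstream PRs: the dictionary model, the function name reconcile_devlinks, and the syspath /sys/devices/virtual/block/dm-0 are all hypothetical.

```python
from __future__ import annotations


def reconcile_devlinks(units: dict[str, str], syspath: str,
                       devlinks: set[str]) -> tuple[list[str], list[str]]:
    """Update `units` (devlink path -> owning syspath) for one change event.

    Returns the (added, removed) devlink paths, each sorted.
    """
    owned = {path for path, owner in units.items() if owner == syspath}
    added = devlinks - owned      # new symlinks: units are created (already happens)
    removed = owned - devlinks    # vanished symlinks: the cleanup this report asks for
    for path in added:
        units[path] = syspath
    for path in removed:
        del units[path]
    return sorted(added), sorted(removed)


if __name__ == "__main__":
    dm = "/sys/devices/virtual/block/dm-0"  # hypothetical syspath
    units = {"/dev/disk/by-id/dm-name-test-test--a--1": dm}
    # Change event after `lvrename`: only the new name is listed.
    print(reconcile_devlinks(units, dm, {"/dev/disk/by-id/dm-name-test-test--a--2"}))
    print(sorted(units))
```

In this model the behaviour described under "Actual results" corresponds to skipping the `removed` step, so the entry for the old name would stay in `units`.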


Additional info:

This impacts RHEL-8 the most, because systemctl enumerates device units whenever it runs, so memory use keeps growing.

RHEL-9 is affected similarly, but there the stale units can be cleared by daemon-reload.
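
To check whether a host has accumulated stale units, one option is to map each loaded device unit name back to its path and test whether that path still exists. The Python sketch below does this with a simplified reversal of systemd's path escaping; the helper names path_from_device_unit and find_stale_units are hypothetical, and `systemd-escape --unescape --path` is the authoritative way to decode a name.

```python
import os
import re
from typing import Callable, Iterable, List


def path_from_device_unit(unit: str) -> str:
    """Reverse systemd path escaping for a .device unit name (simplified)."""
    name = unit[:-len(".device")] if unit.endswith(".device") else unit
    if name == "-":
        return "/"
    # Dashes are separators; literal dashes were escaped as \x2d, so swap
    # separators first and decode the \xNN sequences afterwards.
    name = name.replace("-", "/")
    raw = re.sub(rb"\\x([0-9a-fA-F]{2})",
                 lambda m: bytes([int(m.group(1), 16)]),
                 name.encode("utf-8"))
    return "/" + raw.decode("utf-8", errors="replace")


def find_stale_units(units: Iterable[str],
                     exists: Callable[[str], bool] = os.path.lexists) -> List[str]:
    """Return the unit names whose backing path is no longer present."""
    return [u for u in units if not exists(path_from_device_unit(u))]
```

The unit names could be taken from the first column of `systemctl list-units --all --plain --no-legend 'dev-disk-by\x2did-dm*'`. The `exists` parameter is there so the check can be exercised without touching the real /dev tree.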

The relevant upstream PRs for the fixes are
https://github.com/systemd/systemd/pull/16968, https://github.com/systemd/systemd/pull/24522
and https://github.com/systemd/systemd-stable/pull/203

Comment 1 David Tardon 2022-09-05 09:13:42 UTC
I'd expect such device renames and removals to be pretty rare in production, hence the impact of this should be low. Code-wise, the fix hasn't even been merged upstream yet (although it seems to be on track). I think we could consider this for RHEL-9 (later, after v252 with the fix has been released and has got some real-life testing), but a backport to RHEL-8 would be non-trivial and IMHO not worth the risk.

Comment 5 errata-xmlrpc 2023-05-09 08:21:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (systemd bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2531