Bug 2155424

Summary: ListUnitFiles impacts systemd a lot, causing high CPU consumption and delay in operations
Product: Red Hat Enterprise Linux 9
Reporter: Renaud Métrich <rmetrich>
Component: systemd
Assignee: systemd maint <systemd-maint>
Status: NEW
QA Contact: Frantisek Sumsal <fsumsal>
Severity: medium
Priority: medium
Version: 9.0
CC: dtardon, fkrska, larutiun, sbalasub, smahanga, systemd-maint, systemd-maint-list
Target Milestone: rc
Keywords: Performance, Reproducer, Triaged
Target Release: ---
Hardware: All
OS: Linux
Type: Bug

Description Renaud Métrich 2022-12-21 08:38:26 UTC
Description of problem:

This is a respin of BZ #1828759 ("systemctl list-unit-files" kills the system usability).

With RHEL 8.7 / systemd-239-68.el8 (older versions not checked yet), the ListUnitFiles D-Bus operation is very costly.
When tracing systemd with strace on a system with 145 units (a "minimal installation" of RHEL 8.7), the readlinkat/openat/getdents64 syscalls are called hundreds or even thousands of times on the same files (see the reproducer below), causing high CPU consumption.


Version-Release number of selected component (if applicable):

systemd-239-68.el8

How reproducible:

Always

Steps to Reproduce:
1. Execute the DBus ListUnitFiles operation while stracing systemd

    # strace -CttTvyy -s 128 -o list-unit-files.strace -p 1 &
    # time dbus-send --system --print-reply --reply-timeout=20000 --type=method_call --dest=org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager.ListUnitFiles
    # kill %1

2. Check the number of syscalls

    # tail -20 list-unit-files.strace
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     30.08    0.277966           2    119048           readlinkat
     21.62    0.199787           2     79554     16869 openat
     21.01    0.194170           2     89605           getdents64
     10.02    0.092602           1     90426           fcntl
    ...

3. Check how many times readlinkat() is called on each path

    # grep " readlinkat(" list-unit-files.strace | cut -f2 -d'"' | sort | uniq -c | sort -k1 -rn > list-unit-files.strace.readlinkat
    # head list-unit-files.strace.readlinkat
    820 /etc/systemd/system/syslog.service
    819 /etc/systemd/system/multi-user.target.wants/remote-fs.target
    818 /etc/systemd/system/multi-user.target.wants/crond.service
    817 /etc/systemd/system/multi-user.target.wants/NetworkManager.service
    815 /etc/systemd/system/multi-user.target.wants/dnf-makecache.timer
    814 /etc/systemd/system/multi-user.target.wants/sssd.service
    ...

    # grep "NetworkManager.service" list-unit-files.strace.readlinkat
    817 /etc/systemd/system/multi-user.target.wants/NetworkManager.service
    793 /etc/systemd/system/dbus-org.freedesktop.NetworkManager.service
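
The per-path count from step 3 can be wrapped in a small shell function so that the same check is easy to repeat for openat. This is only a sketch: the name count_syscall_paths is made up for illustration, and it assumes the strace output format produced by the command in step 1, where the first double-quoted string on a matching line is the path argument. getdents64 takes a directory file descriptor rather than a path, so this function does not apply to it.

```shell
# Hypothetical helper (not part of systemd or strace).
# Usage: count_syscall_paths SYSCALL TRACE_FILE
# Prints "<count> <path>" lines, most frequently used path first.
# Assumes the first double-quoted string on each matching line is the path.
count_syscall_paths() {
    grep " $1(" "$2" | cut -f2 -d'"' | sort | uniq -c | sort -k1 -rn
}
```

For example, to list the paths most often passed to openat() in the trace from step 1:

    # count_syscall_paths openat list-unit-files.strace | head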

Actual results:

ListUnitFiles is slow and systemd consumes a lot of CPU while handling it.

Expected results:

Each file/symlink is resolved only once.
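
One possible way to check this against a trace is sketched below. It reports every path that was passed to a given syscall more than once. The function name paths_resolved_once is hypothetical, and the check assumes the trace covers exactly one ListUnitFiles call and uses the strace format from the reproducer (first double-quoted string on the line is the path).

```shell
# Hypothetical check (not part of systemd or strace).
# Usage: paths_resolved_once SYSCALL TRACE_FILE
# Returns 0 if no path was passed to SYSCALL more than once,
# otherwise prints the repeated paths to stderr and returns 1.
paths_resolved_once() {
    # uniq -d prints one copy of each line that occurs more than once
    dups=$(grep " $1(" "$2" | cut -f2 -d'"' | sort | uniq -d)
    if [ -n "$dups" ]; then
        printf 'resolved more than once:\n%s\n' "$dups" >&2
        return 1
    fi
}
```

With the trace from the reproducer this would be run as:

    # paths_resolved_once readlinkat list-unit-files.strace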