
Bug 2148170

Summary: Failed services consume units, causing systemd limitations on max number of units to be reached
Product: Red Hat Enterprise Linux 8
Component: systemd
Version: 8.7
Hardware: All
OS: Linux
Status: CLOSED MIGRATED
Severity: medium
Priority: medium
Target Milestone: rc
Keywords: MigratedToJIRA, Triaged
Flags: pm-rhel: mirror+
Reporter: Renaud Métrich <rmetrich>
Assignee: systemd maint <systemd-maint>
QA Contact: Frantisek Sumsal <fsumsal>
CC: dtardon, sbroz, systemd-maint-list
Doc Type: If docs needed, set a value
Last Closed: 2023-09-21 12:19:40 UTC
Type: Bug

Description Renaud Métrich 2022-11-24 13:42:34 UTC
Description of problem:

When a transient service fails, it keeps consuming a unit until "reset-failed" is issued.
This becomes a problem once the maximum number of units is reached (hardcoded to 128K in the sources):

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
#define MANAGER_MAX_NAMES 131072 /* 128K */
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Once the limit is reached, many problems appear, including:

1. socket units die when getting triggered by incoming traffic

    -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
    systemd[1]: fakeauth.socket: Failed to listen on sockets: Argument list too long
    systemd[1]: fakeauth.socket: Failed with result 'resources'.
    systemd[1]: Failed to listen on fakeauth.socket.
    -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

    This prevents service handling completely.

2. mounts are not registered in systemd (although they still work)

    -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
    systemd[1]: Failed to set up mount unit: Argument list too long
    -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

3. logins are not moved to the expected user's cgroup

    -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
    systemd-logind[905]: Failed to start session scope session-9.scope: Argument list too long
    -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

4. the admin cannot reboot the system using the "reboot"/"shutdown" commands

    -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
    systemd-initctl[368371]: Failed to change runlevel: Argument list too long
    -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------


This limit can be "easily" reached with a socket-triggered service that regularly dies.
This happens in real life with "authd.socket", whose service fails if the remote end vanishes before "authd@.service" can read the address/port of the remote end.
Indeed, socket-triggered services of Stream type consume 2 units:
- one as "<service@instance>"
- one as "<service@instance-localaddr:localport-remoteaddr:remoteport>"

Hence, 64K failures in the past (accumulated over a full year, for example) are sufficient to "take down" the system.
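
The arithmetic behind that figure can be checked with a quick sketch. It assumes, as described above, that every failed connection leaves exactly two unit names behind and that nothing else occupies the table; the variable and function names are illustrative only.

```shell
#!/bin/sh
# Illustrative arithmetic only; these names are not part of systemd.
MANAGER_MAX_NAMES=131072   # hardcoded limit quoted above
UNITS_PER_FAILURE=2        # two unit names per failed Accept= instance

# Number of failed connections needed to exhaust the unit table.
failures_to_exhaust() {
    echo $(( MANAGER_MAX_NAMES / UNITS_PER_FAILURE ))
}

# Average failures per day that would exhaust the table within $1 days
# (integer division, so the result is rounded down).
failures_per_day() {
    echo $(( $(failures_to_exhaust) / $1 ))
}

failures_to_exhaust      # prints 65536
failures_per_day 365     # prints 179
```

In other words, roughly 180 failed connections per day over a year would be enough, ignoring the units used by the rest of the system.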

Additionally, failed services hurt systemd's performance considerably: finding a slot in the "manager->units" hashmap appears to take more and more time as these failed services consume buckets.
This is easy to observe when spawning failing transient services in a loop: it is fast at first, then slows down, with systemd taking up to 80% of a CPU (or more).

Version-Release number of selected component (if applicable):

systemd-239-68.el8.x86_64

How reproducible:

Always

Steps to Reproduce:

1. Create a dummy socket service listening on TCP stream that always fails

    -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
    # cat /etc/systemd/system/fakeauth.socket 
    [Socket]
    ListenStream=113
    Accept=true
    
    # cat /etc/systemd/system/fakeauth@.service 
    [Service]
    ExecStart=/bin/false
    StandardInput=socket
    -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

2. Start the socket unit

    -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
    # systemctl start fakeauth.socket
    -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

3. Trigger the service in a loop until 64K instances have failed

    -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
    # i=0; while [ $i -le 65535 ]; do ncat --send-only localhost 113 </dev/null; let i++; sleep 0.1; done
    -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

    Note: because systemd slows down over time, the socket may drop incoming requests, so more than 65535 iterations may actually be required.

Actual results:

After ~65535 failed service instances have accumulated on the system, "Argument list too long" is seen and the "fakeauth.socket" unit dies.

Nov 24 13:18:05 vm-fakeauth8 systemd[1]: fakeauth.socket: Failed to listen on sockets: Argument list too long
Nov 24 13:18:05 vm-fakeauth8 systemd[1]: fakeauth.socket: Failed with result 'resources'.
Nov 24 13:18:05 vm-fakeauth8 systemd[1]: Failed to listen on fakeauth.socket.

Expected results:

The socket doesn't die, a reboot can be issued, ssh logins do not end up in the "sshd.service" cgroup, etc.

Additional info:

"Argument list too long" is not a good errno at all. The errno should be handled by the caller to explain more clearly what is going on.

Comment 1 Renaud Métrich 2022-11-24 13:52:32 UTC
Using the simple reproducer below, we can see systemd taking more and more CPU when spawning transient services:

# i=1; while [ $i -lt 100000 ]; do systemd-run /bin/false; let i++; done

Initially, we see 85 services spawned per second and ~56% CPU.
Later this drops to 30 services per second and ~80% CPU.
Then finally drops to 15 services per second.

I then stopped because it was too slow.

Comment 2 Renaud Métrich 2022-11-24 13:55:59 UTC
# journalctl -b | grep "Main process exited" > errors

# awk '{ print $3 }' errors | uniq -c
    108 14:49:26
    108 14:49:27
    102 14:49:28
    100 14:49:29
     91 14:49:30
    102 14:49:31
    102 14:49:32
    101 14:49:33
     97 14:49:34
     96 14:49:35
     90 14:49:36
     90 14:49:37
     96 14:49:38
     87 14:49:39
     89 14:49:40
     91 14:49:41
     92 14:49:42
     94 14:49:43
     92 14:49:44
     92 14:49:45
     :
     64 14:50:59
     61 14:51:00
     64 14:51:01
     63 14:51:02
     64 14:51:03
     64 14:51:04
     62 14:51:05
     63 14:51:06
     63 14:51:07
     57 14:51:08
     60 14:51:09
     :
     43 14:54:07
     43 14:54:08
     44 14:54:09
     43 14:54:10
     42 14:54:11
     44 14:54:12
     :

Comment 3 Renaud Métrich 2022-11-24 16:02:33 UTC
The "Argument list too long" comes from this code:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
int unit_add_name(Unit *u, const char *text) {
 :
        if (hashmap_size(u->manager->units) >= MANAGER_MAX_NAMES)
                return -E2BIG;
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
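
To get a rough idea of how close a running system is to this limit, the number of loaded units can be compared against the constant. This is only a sketch: the helper name is hypothetical, the limit is taken from the source quoted above, and since the check in "unit_add_name" counts unit names (aliases included), the real headroom may be smaller than what is printed.

```shell
#!/bin/sh
# Hypothetical helper: reads one unit per line on stdin and prints how many
# more entries would fit before MANAGER_MAX_NAMES is reached.
unit_headroom() {
    limit=131072                       # MANAGER_MAX_NAMES from the source above
    echo $(( limit - $(wc -l) ))       # wc -l counts the lines read from stdin
}

# Usage on a live system (not run here):
#   systemctl list-units --all --no-legend --plain | unit_headroom
```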

Comment 4 Renaud Métrich 2022-11-24 16:13:35 UTC
Hitting the power button to stop the system (QEMU/KVM) fails as well:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Nov 24 17:11:15 vm-fakeauth8 qemu-ga[884]: info: guest-shutdown called, mode: powerdown
Nov 24 17:11:15 vm-fakeauth8 systemd-logind[903]: Creating /run/nologin, blocking further logins...
Nov 24 17:11:15 vm-fakeauth8 systemd-logind[903]: Failed to get load state of poweroff.target: Unknown object '/org/freedesktop/systemd1/unit/poweroff_2etarget'.
Nov 24 17:11:15 vm-fakeauth8 systemd-logind[903]: Scheduled shutdown to poweroff.target failed: Invalid request descriptor
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Comment 5 Lukáš Nykrýn 2022-11-30 12:31:21 UTC
We talked about this in the upstream meeting, and there is an existing solution:

https://www.freedesktop.org/software/systemd/man/systemd.unit.html#CollectMode=

CollectMode=inactive-or-failed should fix all their problems.

Comment 6 Frantisek Sumsal 2022-11-30 13:38:31 UTC
(In reply to Lukáš Nykrýn from comment #5)
> We talked about this in upstream meeting, and there is an existing solution
> 
> https://www.freedesktop.org/software/systemd/man/systemd.unit.html#CollectMode=
> 
> CollectMode=inactive-or-failed should fix all their problems.

Also, this option should be available all the way back to RHEL 7.9, since it was backported in https://bugzilla.redhat.com/show_bug.cgi?id=1817576.
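
For the reproducer in this report, that could look like the drop-in written by the sketch below. This is not a tested fix: the helper name and the drop-in file name are made up, and the target directory is passed as an argument so the function can be tried outside /etc first.

```shell
#!/bin/sh
# Hypothetical helper: writes a drop-in that sets CollectMode= for a unit.
# $1 is the drop-in directory, e.g. /etc/systemd/system/fakeauth@.service.d
write_collectmode_dropin() {
    dir=$1
    mkdir -p "$dir" || return 1
    cat > "$dir/collectmode.conf" <<'EOF'
[Unit]
CollectMode=inactive-or-failed
EOF
}

# Usage on the affected system (not run here):
#   write_collectmode_dropin /etc/systemd/system/fakeauth@.service.d
#   systemctl daemon-reload
```

With this in place, failed instances should be garbage-collected without an explicit "systemctl reset-failed". For transient services started with systemd-run, passing "--collect" should have the same effect, assuming the installed version supports it.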

Comment 8 Lukáš Nykrýn 2022-11-30 14:19:48 UTC
Adding "insights?".  Maybe we should have a check for a lot of failed template units and suggest adding CollectMode=inactive-or-failed  to the unit file.
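
Such a check could be prototyped along these lines. The function name and the default threshold are hypothetical; the sketch only groups failed instances by template name, reading the list of failed units on stdin.

```shell
#!/bin/sh
# Hypothetical check: reads failed unit names (first column of each line) on
# stdin and prints "<count> <template>" for every template unit whose number
# of failed instances is at least $1 (default 1000, an arbitrary threshold).
failed_template_report() {
    awk -v min="${1:-1000}" '
        $1 ~ /@.+\./ {
            tmpl = $1
            sub(/@.*\./, "@.", tmpl)   # foo@instance.service -> foo@.service
            n[tmpl]++
        }
        END { for (t in n) if (n[t] >= min) print n[t], t }
    ' | sort -rn
}

# Usage on a live system (not run here):
#   systemctl list-units --state=failed --no-legend --plain | failed_template_report
```

Any template reported here would be a candidate for CollectMode=inactive-or-failed.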

Comment 10 RHEL Program Management 2023-09-21 12:15:56 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 11 RHEL Program Management 2023-09-21 12:19:40 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.