Bug 2148170
| Summary: | Failed services consume units, causing systemd limitations on max number of units to be reached | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Renaud Métrich <rmetrich> |
| Component: | systemd | Assignee: | systemd maint <systemd-maint> |
| Status: | NEW --- | QA Contact: | Frantisek Sumsal <fsumsal> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 8.7 | CC: | dtardon, sbroz, systemd-maint-list |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Using the simple reproducer below, we can see systemd taking more and more CPU when spawning transient services:

# i=1; while [ $i -lt 100000 ]; do systemd-run /bin/false; let i++; done

Initially, we see 85 services spawned per second at ~56% CPU. Later this drops to 30 services per second at ~80% CPU, and finally to 15 services per second, at which point I stopped because it was too slow.

# journalctl -b | grep "Main process exited" > errors
# awk '{ print $3 }' errors | uniq -c
108 14:49:26
108 14:49:27
102 14:49:28
100 14:49:29
91 14:49:30
102 14:49:31
102 14:49:32
101 14:49:33
97 14:49:34
96 14:49:35
90 14:49:36
90 14:49:37
96 14:49:38
87 14:49:39
89 14:49:40
91 14:49:41
92 14:49:42
94 14:49:43
92 14:49:44
92 14:49:45
:
64 14:50:59
61 14:51:00
64 14:51:01
63 14:51:02
64 14:51:03
64 14:51:04
62 14:51:05
63 14:51:06
63 14:51:07
57 14:51:08
60 14:51:09
:
43 14:54:07
43 14:54:08
44 14:54:09
43 14:54:10
42 14:54:11
44 14:54:12
:
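The per-second figures above come from the `awk | uniq -c` pipeline shown earlier. As a self-contained sketch (the `rate` helper name and the sample lines are illustrative, not from the report), the same counting can be packaged as:

```shell
# rate (illustrative helper): count journal lines per second. Field 3 of the
# default journalctl short output is the HH:MM:SS timestamp, so grouping on
# it yields "events per second", as in the table above.
rate() {
  awk '{ print $3 }' | uniq -c
}

# Sample input standing in for `journalctl -b | grep "Main process exited"`:
printf '%s\n' \
  'Nov 24 14:49:26 vm systemd[1]: run-r1.service: Main process exited, code=exited, status=1/FAILURE' \
  'Nov 24 14:49:26 vm systemd[1]: run-r2.service: Main process exited, code=exited, status=1/FAILURE' \
  'Nov 24 14:49:27 vm systemd[1]: run-r3.service: Main process exited, code=exited, status=1/FAILURE' \
  | rate
# → "2 14:49:26" and "1 14:49:27" (with uniq's leading count padding)
```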
The "Argument list too long" error comes from this code:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
182 int unit_add_name(Unit *u, const char *text) {
:
235 if (hashmap_size(u->manager->units) >= MANAGER_MAX_NAMES)
236 return -E2BIG;
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Trying to hit the Power Button to stop the system (QEMU/KVM), this fails as well:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Nov 24 17:11:15 vm-fakeauth8 qemu-ga[884]: info: guest-shutdown called, mode: powerdown
Nov 24 17:11:15 vm-fakeauth8 systemd-logind[903]: Creating /run/nologin, blocking further logins...
Nov 24 17:11:15 vm-fakeauth8 systemd-logind[903]: Failed to get load state of poweroff.target: Unknown object '/org/freedesktop/systemd1/unit/poweroff_2etarget'.
Nov 24 17:11:15 vm-fakeauth8 systemd-logind[903]: Scheduled shutdown to poweroff.target failed: Invalid request descriptor
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

We talked about this in an upstream meeting, and there is an existing solution:

https://www.freedesktop.org/software/systemd/man/systemd.unit.html#CollectMode=

CollectMode=inactive-or-failed should fix all their problems.

(In reply to Lukáš Nykrýn from comment #5)
> We talked about this in an upstream meeting, and there is an existing solution:
>
> https://www.freedesktop.org/software/systemd/man/systemd.unit.html#CollectMode=
>
> CollectMode=inactive-or-failed should fix all their problems.

Also, this option should be available all the way back to RHEL 7.9, since it was backported in https://bugzilla.redhat.com/show_bug.cgi?id=1817576.

Adding "insights?". Maybe we should have a check for a lot of failed template units and suggest adding CollectMode=inactive-or-failed to the unit file.
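A sketch of applying the suggested setting to this report's reproducer unit (`fakeauth@.service`, defined in the steps below): `CollectMode=` goes in the `[Unit]` section, and `inactive-or-failed` makes systemd garbage-collect the instance even when it ends up in the "failed" state, releasing its slot in the manager's unit-name hashmap.

```ini
# /etc/systemd/system/fakeauth@.service — reproducer unit from this report,
# extended with the CollectMode= setting suggested above (backported to
# RHEL 7.9+ per bug 1817576).
[Unit]
# Garbage-collect this instance even if it fails, instead of keeping it
# around until an explicit "systemctl reset-failed".
CollectMode=inactive-or-failed

[Service]
ExecStart=/bin/false
StandardInput=socket
```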
Description of problem:

When a transient service fails, it keeps consuming a unit until "reset-failed" is issued. This is problematic when the maximum number of units (hardcoded to 128K in the sources) is reached:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
21 #define MANAGER_MAX_NAMES 131072 /* 128K */
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Indeed, once the limit is reached, many problems appear, including:

1. Socket units die when triggered by incoming traffic:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
systemd[1]: fakeauth.socket: Failed to listen on sockets: Argument list too long
systemd[1]: fakeauth.socket: Failed with result 'resources'.
systemd[1]: Failed to listen on fakeauth.socket.
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

This prevents service handling completely.

2. Mounts are not registered in systemd (but they still work):

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
systemd[1]: Failed to set up mount unit: Argument list too long
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

3. Logins are not moved to the expected user's cgroup:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
systemd-logind[905]: Failed to start session scope session-9.scope: Argument list too long
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

4. The admin cannot reboot the system using the "reboot"/"shutdown" commands:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
systemd-initctl[368371]: Failed to change runlevel: Argument list too long
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Such a limit can be "easily" reached when a socket-triggered service regularly dies.
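Since failed units hold their slots until garbage-collected, an immediate mitigation is to clear them periodically with `systemctl reset-failed`. A sketch (the `clear_failed` helper name is hypothetical, and the guard makes it a no-op on hosts without systemctl):

```shell
# clear_failed (hypothetical helper): report how many units are in the
# "failed" state, then release their unit-name hashmap slots with
# `systemctl reset-failed`. Harmless no-op where systemctl is absent.
clear_failed() {
  if ! command -v systemctl >/dev/null 2>&1; then
    echo "systemctl not available, skipping"
    return 0
  fi
  echo "failed units: $(systemctl list-units --state=failed --no-legend 2>/dev/null | wc -l)"
  systemctl reset-failed || true   # needs privileges; ignore errors here
}

clear_failed
```

Note this only treats the symptom: the units accumulate again unless the failing service is fixed or configured to be garbage-collected automatically.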
This happens in real life with "authd.socket", whose service fails if the remote end vanishes before "authd@.service" can read the addr/port of the remote end. Indeed, socket-triggered services of Stream type consume 2 units:

- one as "<service@instance>"
- one as "<service@instance-localaddr:localport-remoteaddr:remoteport>"

Hence, it's sufficient to have had 64K failures in the past (which can be spread over a full year, for example) to "take down" the system.

Additionally, failed services hurt systemd's performance considerably: finding a slot in the "manager->units" hashmap takes more and more time, because the failed services consume buckets. This can easily be seen when spawning failing transient services in a loop: initially it's fast, then it slows down, and systemd ends up taking 80% of a CPU (or more).

Version-Release number of selected component (if applicable):

systemd-239-68.el8.x86_64

How reproducible:

Always

Steps to Reproduce:

1. Create a dummy socket service listening on a TCP stream that always fails:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# cat /etc/systemd/system/fakeauth.socket
[Socket]
ListenStream=113
Accept=true

# cat /etc/systemd/system/fakeauth@.service
[Service]
ExecStart=/bin/false
StandardInput=socket
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

2. Start the socket unit:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# systemctl start fakeauth.socket
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

3.
Trigger the service in a loop until 64K instances have failed:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# i=0; while [ $i -le 65535 ]; do ncat --send-only localhost 113 </dev/null; let i++; sleep 0.1; done
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Note: due to the increasing slowness, the socket may drop incoming requests, hence more than 65535 iterations are actually required.

Actual results:

After ~65535 units exist on the system, "Argument list too long" is seen and the "fakeauth.socket" unit dies:

Nov 24 13:18:05 vm-fakeauth8 systemd[1]: fakeauth.socket: Failed to listen on sockets: Argument list too long
Nov 24 13:18:05 vm-fakeauth8 systemd[1]: fakeauth.socket: Failed with result 'resources'.
Nov 24 13:18:05 vm-fakeauth8 systemd[1]: Failed to listen on fakeauth.socket.

Expected results:

The socket doesn't die, reboot can be issued, ssh logins are not left in the "sshd.service" cgroup, etc.

Additional info:

"Argument list too long" is not a good errno at all. The errno should be handled by the caller to explain more clearly what's going on.