Description of problem:

One of our customers had a misconfigured instance unit: the ExecStart declaration lacked the "-" prefix. All was fine until the unit started failing. At one point, systemd was keeping track of 14,000 failed instances of the unit. In this situation, systemd was extremely unresponsive. It was constantly pegging the CPU and, the vast majority of the time, all systemctl commands were erroring out with "Connection timed out". Eventually the issue was discovered and cleared up by getting a `systemctl reset-failed` command to succeed.

Obviously instance units should use "-/usr/bin/somecommand" for their Exec.* directives; however, this brings to light a concern with systemd. Is there a heretofore hidden bug in the way systemd handles units ... something which can be optimized or fixed? Or are there simply limits to the number of units systemd can manage? If the latter, can engineering please provide some guidance on this?

(NOTE: I'm not sure if this issue is specific to "failed" units; I suspect it would apply equally to large numbers of non-failed units, though I haven't tested this via a generator or something else. I also haven't looked at the code.)

Version-Release number of selected component (if applicable):

Initially discovered on systemd-208-20.el7_1.5.x86_64
Experienced on latest (systemd-219-19.el7.x86_64)

How reproducible:

I've had a hard time reproducing this reliably. That is to say: I can't nail it down to something specific like "with 14,398 failed units, systemd starts failing" or even "on an otherwise idle 1 CPU system w/ 512 MiB RAM, this issue always shows up around 10,000 failed units".

That said, on an otherwise idle 1 CPU system w/ 512 MiB RAM, running a RHEL 7.2 base (non-gui) server install with the latest bits from the CDN (as of right now), I start to see high CPU usage immediately after adding a few thousand failed units, and systemctl commands start slowing down as well. I usually see my first systemctl timeouts somewhere in the 40k-60k range.

Steps to Reproduce:

Generally speaking:
1. Generate tons of [failed?] units
2. Notice that PID1 pegs CPU
3. Notice that systemctl commands take a long time or time out
4. Use a while/until loop to execute `systemctl reset-failed` and all is well again

Specifically:
1. curl -O http://people.redhat.com/rsawhill/sysd-failtester.sh
2. bash sysd-failtester.sh # Runs initial setup
3. bash sysd-failtester.sh # Loop-creates failed instances
4. Wait (loop breaks when first systemctl command fails)
5. Notice that even after all sockets are closed PID1 still pegs CPU

Take a look at reproducer script in action:
1. https://paste.fedoraproject.org/314725/

Cheers.
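For reference, the "while/until loop" from step 4 of the general steps could look roughly like the sketch below. This is not the loop that was used on the customer system. `retry_until_ok`, the attempt limit, and the delay are hypothetical names and values chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: retry a command until it succeeds or an attempt limit is hit.
# retry_until_ok is a hypothetical helper, not something shipped with systemd.
retry_until_ok() {
    local max_attempts=$1 delay=$2
    shift 2
    local attempt=1
    # "until" re-runs the command for as long as it exits non-zero.
    until "$@"; do
        if (( attempt >= max_attempts )); then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$(( attempt + 1 ))
        sleep "$delay"
    done
    echo "succeeded on attempt $attempt: $*" >&2
}

# Example invocation (needs root on a systemd host), left commented out:
# retry_until_ok 100 5 systemctl reset-failed
```

The attempt limit is there so the loop cannot spin forever if PID1 never answers; the delay gives systemd some breathing room between calls.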
Looks like I have a correction to make.

With:
> Experienced on latest (systemd-219-19.el7.x86_64)

Regarding this comment:
> Notice that even after all sockets are closed PID1 still pegs CPU

I've noticed that this particular point is no longer the case with systemd-219-19.el7 in RHEL 7.2 -- my reproducer script led me to believe this was the case because PID1 spends a lot of CPU handling all the instance units & their sockets (the mechanism I used to generate tons of units). A little while after running my reproducer script the systemd CPU usage settles back down to nothing. Furthermore, while systemctl commands certainly take considerably longer than normal, they do not fail.

Going back and testing RHEL 7.1 now.
> Furthermore, while systemctl commands certainly take considerably longer than normal, they do not fail.

* They do not fail after systemd CPU usage settles back down to a normal level.
I tested this again today on the latest systemd available (219-19.el7_2.7) and I wasn't able to clearly reproduce it. Of course systemctl still starts slowing down when there are thousands and thousands of units, but it's not nearly as dramatic as it was with systemd pre-RHEL7.2. For the record: I ran into other problems eventually (where systemd-logind and tons of other things on the system started complaining "Argument list too long") but that was after such a crazy-high number of failed units that I don't think we need to look into it. That said, it sure would be nice if the systemd project could put forth some official guidance for this kind of stuff. Or perhaps configure systemd to automatically trigger reset-failed when things get past a certain limit.
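Until something like that exists in systemd itself, the "trigger reset-failed past a certain limit" idea could be approximated from outside, for example from cron or a timer unit. The following is a minimal sketch under that assumption, not an existing systemd feature. The function names and the limit are hypothetical, and it assumes `systemctl list-units --state=failed --no-legend` prints one line per failed unit.

```shell
#!/usr/bin/env bash
# Sketch only: clear failed units once their number exceeds a limit.
# Function names and the limit value are hypothetical, for illustration.

count_failed_units() {
    # Assumption: one output line per failed unit; --no-legend drops
    # the header and footer so only unit lines are counted.
    systemctl list-units --state=failed --no-legend | wc -l
}

reset_if_over_limit() {
    local count=$1 limit=$2
    if (( count > limit )); then
        echo "failed units: $count (limit $limit), running reset-failed"
        systemctl reset-failed
    else
        echo "failed units: $count (limit $limit), nothing to do"
    fi
}

# Example invocation (needs root on a systemd host), left commented out:
# reset_if_over_limit "$(count_failed_units)" 1000
```

Note that `reset-failed` without arguments clears the failed state of every unit, so a periodic job like this would also hide failures an admin might want to see. Passing a unit name pattern to `reset-failed` would narrow that down.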
PS: The "Argument list too long" stuff starts happening after 65,500 connections are made to the sysd-failtester.socket in that reproducer script posted earlier (i.e., after 65k failed units were present).
I tried to reproduce this issue on both RHEL 7.2 and RHEL 7.5 with the following results:

Note: CPU usage settles at 0% a few seconds after each reproducer finishes.

Specs: 1 CPU system with 2 GB RAM

## Reproducer 1

Spawner script:
# for i in {1..25000}; do systemd-run --remain --unit "test-$i" /bin/false; done

RHEL 7.2 (systemd-219-19.el7.x86_64)
------------------------------------
Before reproducer:
# systemctl --all | wc -l
185
# time systemctl status > /dev/null
real    0m0.013s
user    0m0.001s
sys     0m0.011s

After reproducer:
# systemctl --all | wc -l
25186
# time systemctl status > /dev/null
real    0m0.648s
user    0m0.166s
sys     0m0.476s

RHEL 7.5 (systemd-219-57.el7.x86_64)
------------------------------------
Before reproducer:
# systemctl --all | wc -l
201
# time systemctl status > /dev/null
real    0m0.008s
user    0m0.001s
sys     0m0.005s

After reproducer:
# systemctl --all | wc -l
25202
# time systemctl status > /dev/null
real    0m0.920s
user    0m0.202s
sys     0m0.710s

## Reproducer 2 (see comment 0)

Script:
curl -o sysd-failtester.sh http://people.redhat.com/rsawhill/sysd-failtester.sh

Note: reproducer was manually stopped at 25000 units

RHEL 7.2 (systemd-219-19.el7.x86_64)
------------------------------------
Before reproducer:
# systemctl --all | wc -l
185
# time systemctl status > /dev/null
real    0m0.014s
user    0m0.003s
sys     0m0.009s

After reproducer:
# systemctl --all | wc -l
25187
# time systemctl show > /dev/null
real    0m0.003s
user    0m0.002s
sys     0m0.000s

RHEL 7.5 (systemd-219-57.el7.x86_64)
------------------------------------
Before reproducer:
# systemctl -all | wc -l
202
# time systemctl show > /dev/null
real    0m0.004s
user    0m0.002s
sys     0m0.001s

After reproducer:
# systemctl -all | wc -l
25337
# time systemctl show > /dev/null
real    0m0.003s
user    0m0.001s
sys     0m0.001s
Closing based on comment 8 and comment 9.