1301810 – systemd performance degradation with thousands of units (systemctl times out; pid1 high CPU usage when should be idle)

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	systemd
Sub Component:
Version:	7.2
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	systemd-maint
QA Contact:	qe-baseos-daemons
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1203710 1298243 1398314 1420851 1451294

Reported:	2016-01-26 02:47 UTC by Ryan Sawhill
Modified:	2021-12-10 14:35 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-01-02 09:52:04 UTC
Target Upstream Version:
Embargoed:

Description Ryan Sawhill 2016-01-26 02:47:29 UTC

Description of problem:

One of our customers had a misconfigured instance unit: the ExecStart declaration lacked the "-" prefix. All was fine until the unit started failing. At one point, systemd was keeping track of 14,000 failed instances of the unit.

In this situation, systemd was extremely unresponsive. It constantly pegged the CPU, and most of the time every systemctl command failed with "Connection timed out".

Eventually the issue was diagnosed and cleared up once a `systemctl reset-failed` command finally succeeded.

Instance units should of course use "-/usr/bin/somecommand" in their Exec*= directives (ExecStart= and friends); however, this brings to light a concern with systemd itself.
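For illustration, a minimal sketch of a template unit whose ExecStart= carries the "-" prefix. The unit name `example@.service`, the function `write_example_unit`, and `/usr/bin/somecommand` are placeholders for this sketch, not the customer's actual unit. With the prefix, systemd records a non-zero exit status but treats it as success, so failing instances are not left behind in the "failed" state.

```shell
#!/bin/sh
# Sketch: write a hypothetical template unit "example@.service" whose
# ExecStart= uses the "-" prefix. /usr/bin/somecommand is a placeholder.
write_example_unit() {
    # Target directory; defaults to the usual location for admin-provided units.
    unit_dir=${1:-/etc/systemd/system}
    cat > "$unit_dir/example@.service" <<'EOF'
[Unit]
Description=Example instance %i

[Service]
# "-" prefix: a non-zero exit status is recorded but treated as success,
# so a failing instance is not left in the "failed" state.
ExecStart=-/usr/bin/somecommand
EOF
}

# Usage (as root), then make systemd pick up the new file:
# write_example_unit && systemctl daemon-reload
```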

Is there a previously unnoticed bug in the way systemd handles units, something that can be optimized or fixed? Or is there simply a limit to the number of units systemd can manage? If the latter, can engineering please provide some guidance on it?

(NOTE: I'm not sure whether this issue is specific to "failed" units; I suspect it would apply equally to large numbers of non-failed units, though I haven't tested this with a generator or by other means. I also haven't looked at the code.)

Version-Release number of selected component (if applicable):

Initially discovered on systemd-208-20.el7_1.5.x86_64
Experienced on latest (systemd-219-19.el7.x86_64)

How reproducible:

I've had a hard time reproducing this reliably. That is, I can't pin it down to a specific threshold such as "with 14,398 failed units, systemd starts failing" or even "on an otherwise idle 1 CPU system w/ 512 MiB RAM, this issue always shows up around 10,000 failed units".

That said, on an otherwise idle 1 CPU system w/ 512 MiB RAM, running a RHEL 7.2 base (non-GUI) server install with the latest bits from the CDN (as of this writing), I see high CPU usage as soon as a few thousand failed units have been added, and systemctl commands start slowing down as well. I usually see the first systemctl timeouts somewhere between 40,000 and 60,000 failed units.

Steps to Reproduce:

Generally speaking:
1. Generate tons of [failed?] units
2. Notice that PID 1 pegs the CPU
3. Notice that systemctl commands take a long time or time out
4. Run `systemctl reset-failed` in a while/until loop until it succeeds, and all is well again
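Step 4 can be sketched as a small retry helper. The function name `retry_until_success` and the `MAX_ATTEMPTS`/`DELAY` variables are hypothetical, chosen only for this sketch; the one real command involved is `systemctl reset-failed`, which resets the failed state of all units when given no unit names.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly until it succeeds or
# MAX_ATTEMPTS is reached, sleeping DELAY seconds between attempts.
retry_until_success() {
    max_attempts=${MAX_ATTEMPTS:-30}
    delay=${DELAY:-2}
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}

# Usage (as root): keep retrying until a call gets through to PID 1.
# retry_until_success systemctl reset-failed
```

Bounding the number of attempts is a design choice for the sketch; a plain `until systemctl reset-failed; do sleep 2; done` loop also works but never gives up.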

Specifically:
1. curl -O http://people.redhat.com/rsawhill/sysd-failtester.sh
2. bash sysd-failtester.sh # Runs initial setup
3. bash sysd-failtester.sh # Loop-creates failed instances
4. Wait (the loop breaks when the first systemctl command fails)
5. Notice that even after all sockets are closed PID1 still pegs CPU

See the reproducer script in action:
1. https://paste.fedoraproject.org/314725/

Cheers.

Comment 1 Ryan Sawhill 2016-01-26 03:09:00 UTC

Looks like I have a correction to make.

With the version I listed as latest:

  > Experienced on latest (systemd-219-19.el7.x86_64)

Regarding this comment:

  > Notice that even after all sockets are closed PID1 still pegs CPU

I've found that this particular point no longer holds with systemd-219-19.el7 in RHEL 7.2. My reproducer script misled me because PID 1 spends a lot of CPU time handling all the instance units and their sockets (the mechanism I used to generate tons of units). A little while after the reproducer script finishes, systemd's CPU usage settles back down to nothing.

Furthermore, while systemctl commands certainly take considerably longer than normal, they do not fail. I'm going back to test RHEL 7.1 now.

Comment 2 Ryan Sawhill 2016-01-26 03:11:13 UTC

> Furthermore, while systemctl commands certainly take considerably longer than normal, they do not fail.

   * That is, they do not fail once systemd's CPU usage settles back down to a normal level.

Comment 3 Ryan Sawhill 2016-04-19 18:55:50 UTC

I tested this again today on the latest systemd available (219-19.el7_2.7) and wasn't able to reproduce it clearly. systemctl of course still slows down when there are many thousands of units, but not nearly as dramatically as it did with systemd before RHEL 7.2.

For the record, I eventually ran into other problems (systemd-logind and many other components on the system started complaining "Argument list too long"), but only after such an extreme number of failed units that I don't think we need to look into it.

That said, it would be nice if the systemd project could publish some official guidance on this. Or perhaps systemd could be configured to trigger reset-failed automatically once the number of failed units passes a certain limit.
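Until something like that exists in systemd, the second suggestion can be approximated from outside. A minimal sketch, assuming it runs periodically as root (for example from cron or a timer); `FAILED_UNIT_LIMIT`, its default of 1000, and both function names are hypothetical, not systemd settings. It counts failed units with `systemctl list-units --state=failed --no-legend --no-pager` and runs `systemctl reset-failed` once the count exceeds the limit.

```shell
#!/bin/sh
# Hypothetical watchdog: reset failed units once too many accumulate.
# FAILED_UNIT_LIMIT is an illustrative variable, not a systemd setting.

count_failed_units() {
    # One line per failed unit; --no-legend drops the header and footer lines.
    systemctl list-units --state=failed --no-legend --no-pager | wc -l
}

reset_failed_if_over_limit() {
    limit=${FAILED_UNIT_LIMIT:-1000}
    count=$(count_failed_units)
    if [ "$count" -gt "$limit" ]; then
        echo "$count failed units exceed limit $limit; running reset-failed" >&2
        systemctl reset-failed
    fi
}

# Usage (as root, e.g. from a periodic job):
# reset_failed_if_over_limit
```

Checking well before PID 1 is overloaded matters here: in the situation described in this bug, the systemctl calls themselves may time out, in which case the count comes back as 0 and nothing is reset.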

Comment 4 Ryan Sawhill 2016-04-19 19:03:18 UTC

PS: The "Argument list too long" errors start appearing after 65,500 connections have been made to sysd-failtester.socket from the reproducer script posted earlier (i.e., once about 65k failed units are present).

Comment 8 Frantisek Sumsal 2018-03-06 13:26:22 UTC

I tried to reproduce this issue on both RHEL 7.2 and RHEL 7.5, with the following results:

Note: CPU usage settles at 0% a few seconds after each reproducer finishes.
Specs: 1 CPU system with 2 GB RAM

## Reproducer 1

Spawner script:
# for i in {1..25000}; do systemd-run --remain-after-exit --unit "test-$i" /bin/false; done

RHEL 7.2 (systemd-219-19.el7.x86_64)
------------------------------------
Before reproducer:
# systemctl --all | wc -l
185
# time systemctl status > /dev/null

real	0m0.013s
user	0m0.001s
sys	0m0.011s

After reproducer:
# systemctl --all | wc -l
25186
# time systemctl status > /dev/null

real	0m0.648s
user	0m0.166s
sys	0m0.476s

RHEL 7.5 (systemd-219-57.el7.x86_64)
------------------------------------
Before reproducer:
# systemctl --all | wc -l
201
# time systemctl status > /dev/null

real	0m0.008s
user	0m0.001s
sys	0m0.005s

After reproducer:
# systemctl --all | wc -l
25202
# time systemctl status > /dev/null

real	0m0.920s
user	0m0.202s
sys	0m0.710s


## Reproducer 2 (see comment 0)

Script:
curl -o sysd-failtester.sh http://people.redhat.com/rsawhill/sysd-failtester.sh

Note:
The reproducer was manually stopped at 25000 units.

RHEL 7.2 (systemd-219-19.el7.x86_64)
------------------------------------
Before reproducer:
# systemctl --all | wc -l
185
# time systemctl status > /dev/null

real	0m0.014s
user	0m0.003s
sys	0m0.009s

After reproducer:
# systemctl --all | wc -l
25187
# time systemctl show > /dev/null

real	0m0.003s
user	0m0.002s
sys	0m0.000s

RHEL 7.5 (systemd-219-57.el7.x86_64)
------------------------------------
Before reproducer:
# systemctl --all | wc -l
202
# time systemctl show > /dev/null

real	0m0.004s
user	0m0.002s
sys	0m0.001s

After reproducer:
# systemctl --all | wc -l
25337
# time systemctl show > /dev/null

real	0m0.003s
user	0m0.001s
sys	0m0.001s

Comment 10 David Tardon 2020-01-02 09:52:04 UTC

Closing based on comment 8 and comment 9.
