Created attachment 1127487 [details]
Output of "sudo journalctl" and "systemctl -t service"
Description of problem:
On a live web server running CentOS 7.2, the systemd (PID 1) process leaks roughly 200 MB of memory per day; it is currently at 3.7 GB of RAM after 18 days of uptime. The server must be rebooted periodically to free the memory.
Version-Release number of selected component (if applicable):
systemd version 219
How reproducible:
Reproducible on this particular server by simply rebooting and watching RAM usage grow over time.
Actual results:
RAM usage of the PID 1 process increases by ~200 MB per day.
Expected results:
RAM usage should not increase.
Additional info:
The only heavy activity in the logs shown by "sudo journalctl" is related to numerous rsync SSH connections made by another production server. I've attached a sample of the journal log with real hostnames and IP addresses redacted. I've also attached the output of "systemctl -t service".
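For reference, the growth can be tracked by sampling PID 1's resident set size; this is a minimal sketch of how I watch it (the output format is my own choice, not part of the report — run it from cron or a loop to chart the trend over days):

```shell
#!/bin/sh
# Print one timestamped sample of PID 1's resident set size (kB),
# read from the kernel's VmRSS field in /proc/1/status.
rss_kb=$(awk '/^VmRSS:/ {print $2}' /proc/1/status)
printf '%s PID1 RSS: %s kB\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$rss_kb"
```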
Sounds like https://github.com/systemd/systemd/issues/1961
(In reply to Lukáš Nykrýn from comment #2)
> Sounds like https://github.com/systemd/systemd/issues/1961
Well, not quite. The CPU is not pegged at 100% here, and running "systemctl list-unit-files" lists only ~60 session-*.scope units. I also see no logind failures in "sudo journalctl -b -u systemd-logind" as in that issue.
There are 86 scope files and associated directories in /run/systemd/system/ on this server, amounting to ~20 MB of disk space. Many of these files are up to 6 days old; is this normal? The server has been up for 19 days, so if this were the source of the leak I would expect to see orphaned files as old as 19 days as well.
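For anyone wanting to repeat the check, this is roughly how the leftover transient session scopes can be counted (the one-day age threshold is an arbitrary choice of mine; the directory may not exist on hosts not booted with systemd, hence the guards):

```shell
#!/bin/sh
# Count session-*.scope files (and their .scope.d directories) under
# /run/systemd/system and report how many are older than one day.
total=$(find /run/systemd/system -maxdepth 1 -name 'session-*.scope*' 2>/dev/null | wc -l)
stale=$(find /run/systemd/system -maxdepth 1 -name 'session-*.scope*' -mtime +1 2>/dev/null | wc -l)
echo "session scope entries: $total (older than 1 day: $stale)"
```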
Here is some output from systemd-cgtop showing resource usage of each active control group. Note that the problem is only showing up in the "root" path.
>Path                                        Tasks  %CPU  Memory  Input/s  Output/s
>/                                             296  30.5   11.3G   657.8K    893.0K
>/system.slice/NetworkManager.service            1     -       -        -         -
>/system.slice/auditd.service                    1     -       -        -         -
>/system.slice/crond.service                     1     -       -        -         -
>/system.slice/dbus.service                      1     -       -        -         -
>/system.slice/irqbalance.service                1     -       -        -         -
>/system.slice/lvm2-lvmetad.service              1     -       -        -         -
>/system.slice/mariadb.service                   2     -       -        -         -
>/system.slice/nginx.service                    10     -       -        -         -
>/system.slice/php-fpm.service                 101     -       -        -         -
>/system.slice/polkit.service                    1     -       -        -         -
>/system.slice/postfix.service                   3     -       -        -         -
>/system.slice/rsyslog.service                   1     -       -        -         -
>/system.slice/smartd.service                    1     -       -        -         -
>/system.slice/sshd.service                      2     -       -        -         -
>firstname.lastname@example.org                  1     -       -        -         -
>/system.slice/systemd-journald.service          1     -       -        -         -
>/system.slice/systemd-logind.service            1     -       -        -         -
>/system.slice/systemd-udevd.service             1     -       -        -         -
>/system.slice/tuned.service                     1     -       -        -         -
>/system.slice/wpa_supplicant.service            1     -       -        -         -
>/user.slice/user-1000.slice/session-7170741.scope  4  -     -        -         -
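A snapshot like the one above can be captured non-interactively; a sketch using systemd-cgtop's batch options (flag spellings per systemd-cgtop(1); the fallback message is mine, in case the tool is absent):

```shell
#!/bin/sh
# One-shot, non-interactive control-group snapshot ordered by memory use:
# -b batch mode, -n 1 single iteration, -m order by memory.
systemd-cgtop -b -n 1 -m 2>/dev/null || echo "systemd-cgtop not available here"
```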
If you run systemctl daemon-reexec does it decrease the amount of allocated memory?
Can you also attach output of systemd-analyze dump?
Created attachment 1127642 [details]
Output of "systemd-analyze dump"
(In reply to Lukáš Nykrýn from comment #5)
> If you run systemctl daemon-reexec does it decrease the amount of allocated
> memory?
> Can you also attach output of systemd-analyze dump?
Running systemctl daemon-reexec does release all of the used RAM; the question is whether the leak will continue, since it has persisted through reboots before. Does the output of this command provide any insight into the cause of the leak?
I've attached the output of "systemd-analyze dump" taken prior to issuing the daemon-reexec command.
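For anyone else measuring the effect, a sketch of the before/after check (requires root; the fallback message is mine — daemon-reexec serializes the manager's state, re-executes the systemd binary, and deserializes, which drops the leaked heap allocations):

```shell
#!/bin/sh
# Compare PID 1's resident set size before and after re-executing the manager.
before=$(awk '/^VmRSS:/ {print $2}' /proc/1/status)
systemctl daemon-reexec 2>/dev/null || echo "daemon-reexec failed (not root, or not booted with systemd)"
after=$(awk '/^VmRSS:/ {print $2}' /proc/1/status)
echo "PID 1 RSS: ${before} kB -> ${after} kB"
```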
Created attachment 1127656 [details]
Output of atop during server high load
Here's another example of abnormal systemd behavior.
I've attached the output of atop from a period when the server was under high load due to production tasks. These tasks involve downloading, reading, and writing lots of data on the /home partition. However, a large fraction of the disk I/O is taking place on the root partition (LVM centos-root on the left-hand side of the output), which should not be the case, and atop shows systemd responsible for the majority of that disk usage, presumably on the root partition.
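To quantify how much of that I/O is attributable to PID 1 itself, the kernel's cumulative per-process counters can be sampled; a rough sketch (the 5-second window is an arbitrary choice; /proc/1/io is readable only by root, hence the fallback):

```shell
#!/bin/sh
# Estimate PID 1's average write rate over a short window from its
# cumulative write_bytes counter in /proc/1/io.
w1=$(awk '/^write_bytes:/ {print $2}' /proc/1/io 2>/dev/null)
sleep 5
w2=$(awk '/^write_bytes:/ {print $2}' /proc/1/io 2>/dev/null)
if [ -n "$w1" ] && [ -n "$w2" ]; then
    echo "PID 1 write rate: $(( (w2 - w1) / 5 )) B/s"
else
    echo "cannot read /proc/1/io (need root)"
fi
```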
Would you be willing to try a test build? We found one memory-leak.
(In reply to Lukáš Nykrýn from comment #10)
> Would you be willing to try a test build? We found one memory-leak.
Well, this is a live web server, so I'm a little wary. Is the leak you found capable of leaking 200 MB/day, as I have observed?
I am sorry, but I don't know. I will try to find an artificial reproducer and test the fix myself.
I am observing this memory leak on my Ubuntu Xenial server. I'm willing to give you whatever information you want and to try whatever fix you have.
This problem should be fixed in systemd-219-30. If anyone is willing to try that, we have a repo with test builds here: https://copr.fedorainfracloud.org/coprs/lnykryn/systemd-rhel-staging/
(In reply to Lukáš Nykrýn from comment #14)
> This problem should be fixed in systemd-219-30. If anyone is willing to try
> that, we have a repo with test builds here:
We are experiencing this problem on a production server. Does 219-30 fix it? I see it's available in the RHEL 7.3 beta.
If I am not mistaken, 7.3 should be out now, so you can try the latest version there.
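For CentOS users following along, the upgrade path would look roughly like this (a sketch, assuming a yum-based CentOS 7 host; the update commands are left commented since they should only run once the 7.3 packages reach the mirrors):

```shell
#!/bin/sh
# Check which systemd build is installed; the fix is said to be in 219-30.
rpm -q systemd 2>/dev/null || echo "rpm not available on this host"
# Once systemd >= 219-30 is published in the base repos:
#   sudo yum clean expire-cache && sudo yum update systemd
#   sudo systemctl daemon-reexec   # swap in the new binary without a reboot
```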
(In reply to Lukáš Nykrýn from comment #16)
> If I am not mistaken 7.3 should be out now. So you can try the latest
> version there.
Indeed. I will give an update here as soon as it shows up in the CentOS repository.
Created attachment 1225281 [details]