Bug 1308780

Summary: systemd Using 4GB RAM after 18 Days of Uptime
Product: Red Hat Enterprise Linux 7
Reporter: meridionaljet
Component: systemd
Assignee: systemd-maint
Status: CLOSED CURRENTRELEASE
QA Contact: qe-baseos-daemons
Severity: urgent
Priority: unspecified
Version: 7.2
CC: abenaiss, bblaskov, emilovanov, hmatsumo, info, jsynacek, lef, lnykryn, msekleta, roger.hughs, sshaurya, systemd-maint-list, vyacheslav.sarzhan
Hardware: x86_64
OS: Linux
Fixed In Version: systemd-219-19.el7_2.20
Last Closed: 2017-01-25 13:38:06 UTC
Type: Bug

Attachments:

Description	Flags
Output of "sudo journalctl" and "systemctl -t service"	none
Output of "systemd-analyze dump"	none
Output of atop during server high load	none
systemd-analyze_dump	none

Description meridionaljet 2016-02-16 03:42:43 UTC

Created attachment 1127487 [details]
Output of "sudo journalctl" and "systemctl -t service"

Description of problem:

On a live web server running CentOS 7.2, the systemd (PID 1) process leaks about 200 MB of memory per day and has currently reached 3.7 GB of RAM usage after 18 days of uptime. The server has to be rebooted periodically to free the memory.


Version-Release number of selected component (if applicable):

systemd version 219

How reproducible:

Reproducible on this particular server by simply rebooting and watching RAM usage grow over time.

Actual results:

RAM usage of the PID 1 process increases by ~200 MB per day.

Expected results:

RAM usage should not increase.

Additional info:

The only heavy activity in the logs shown by "sudo journalctl" is related to numerous rsync SSH connections made by another production server. I've attached a sample of the journal log with real hostnames and IP addresses redacted. I've also attached the output of "systemctl -t service".
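
One way to put a number on the growth reported above is to sample the resident set size of PID 1 from /proc/1/status at two points in time and extrapolate to a daily rate. The following is a minimal sketch and not part of the original report: the function names are hypothetical, and it assumes a Linux /proc filesystem whose status file contains a "VmRSS: <n> kB" line.

```python
import re
from pathlib import Path


def parse_vmrss_kib(status_text: str) -> int:
    """Extract the VmRSS value (reported in kB) from /proc/<pid>/status text."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found in status text")
    return int(match.group(1))


def read_rss_kib(pid: int = 1) -> int:
    """Read the current resident set size of a process (PID 1 by default)."""
    return parse_vmrss_kib(Path(f"/proc/{pid}/status").read_text())


def growth_mib_per_day(rss_start_kib: int, rss_end_kib: int,
                       elapsed_seconds: float) -> float:
    """Extrapolate the growth between two RSS samples to MiB per day."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta_mib = (rss_end_kib - rss_start_kib) / 1024
    return delta_mib * 86400 / elapsed_seconds
```

Calling read_rss_kib() twice some hours apart and passing both samples, together with the elapsed time in seconds, to growth_mib_per_day() should give a figure that can be compared with the ~200 MB per day observed on this server.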

Comment 2 Lukáš Nykrýn 2016-02-16 08:50:02 UTC

Sounds like https://github.com/systemd/systemd/issues/1961

Comment 3 meridionaljet 2016-02-16 13:41:52 UTC

(In reply to Lukáš Nykrýn from comment #2)
> Sounds like https://github.com/systemd/systemd/issues/1961

Well, not quite. The CPU is not spiked to 100% here, and running "systemctl list-unit-files" lists only ~60 session-*.scope* units. I also see no logind failures in "sudo journalctl -b -u systemd-logind" as in that issue.

There are 86 scope files and associated directories in /run/systemd/system/ on this server, which amount to ~20 MB of disk space. Many of these files are up to 6 days old. Is this normal? The server has been up for 19 days, so if this were the source of the leak I would have expected to see orphaned files as old as 19 days as well.
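
The age check described in this comment could be scripted roughly as follows. This is a sketch only and was not part of the original comment: the function name is hypothetical, it uses each entry's modification time as a proxy for its age, and the default directory is the one mentioned above.

```python
import time
from pathlib import Path


def scope_entry_ages_days(directory="/run/systemd/system", now=None):
    """Map each session-*.scope* entry in `directory` to its age in days.

    Age is derived from the entry's modification time; both the scope files
    and their associated directories match the glob pattern.
    """
    now = time.time() if now is None else now
    ages = {}
    for entry in Path(directory).glob("session-*.scope*"):
        ages[entry.name] = (now - entry.lstat().st_mtime) / 86400
    return ages
```

Taking max() over the returned values gives the age of the oldest entry, which can then be compared with the server's uptime.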

Comment 4 meridionaljet 2016-02-16 13:56:45 UTC

Here is some output from systemd-cgtop showing resource usage of each active control group. Note that the problem is only showing up in the "root" path.

>Path                                                                   Tasks   %CPU   Memory  Input/s Output/s
>
>                                                                         296   30.5    11.3G   657.8K   893.0K
>system.slice/NetworkManager.service                                        1      -        -        -        -
>system.slice/auditd.service                                                1      -        -        -        -
>system.slice/crond.service                                                 1      -        -        -        -
>system.slice/dbus.service                                                  1      -        -        -        -
>system.slice/irqbalance.service                                            1      -        -        -        -
>system.slice/lvm2-lvmetad.service                                          1      -        -        -        -
>system.slice/mariadb.service                                               2      -        -        -        -
>system.slice/nginx.service                                                10      -        -        -        -
>system.slice/php-fpm.service                                             101      -        -        -        -
>system.slice/polkit.service                                                1      -        -        -        -
>system.slice/postfix.service                                               3      -        -        -        -
>system.slice/rsyslog.service                                               1      -        -        -        -
>system.slice/smartd.service                                                1      -        -        -        -
>system.slice/sshd.service                                                  2      -        -        -        -
>system.slice/system-getty.slice/getty                                      1      -        -        -        -
>system.slice/systemd-journald.service                                      1      -        -        -        -
>system.slice/systemd-logind.service                                        1      -        -        -        -
>system.slice/systemd-udevd.service                                         1      -        -        -        -
>system.slice/tuned.service                                                 1      -        -        -        -
>system.slice/wpa_supplicant.service                                        1      -        -        -        -
>user.slice/user-1000.slice/session-7170741.scope                           4      -        -        -        -

Comment 5 Lukáš Nykrýn 2016-02-16 16:31:37 UTC

If you run systemctl daemon-reexec does it decrease the amount of allocated memory?
Can you also attach output of systemd-analyze dump?

Comment 6 meridionaljet 2016-02-16 17:29:11 UTC

Created attachment 1127642 [details]
Output of "systemd-analyze dump"

Comment 7 meridionaljet 2016-02-16 17:31:10 UTC

(In reply to Lukáš Nykrýn from comment #5)
> If you run systemctl daemon-reexec does it decrease the amount of allocated
> memory?
> Can you also attach output of systemd-analyze dump?

Running systemctl daemon-reexec does release all of the used RAM. The question is whether the leak will continue. It has persisted through reboots before. Does the result of this command provide any insight into the cause of the leak?

I've attached the output of "systemd-analyze dump" taken prior to issuing the daemon-reexec command.

Comment 8 meridionaljet 2016-02-16 17:47:46 UTC

Created attachment 1127656 [details]
Output of atop during server high load

Here's another example of abnormal behavior of systemd.

I've attached the output of atop during a period when the server was under high load due to production tasks. These tasks involve downloading, reading, and writing lots of data on the /home partition. However, a huge fraction of the disk I/O is taking place in the root partition (LVM centos-root on the left-hand side of the output), which should not be the case. This is coupled with atop showing systemd being responsible for the majority of the disk usage, presumably taking place in that root partition. This is another example of what seems like abnormal behavior.

Comment 10 Lukáš Nykrýn 2016-02-17 11:14:04 UTC

Would you be willing to try a test build? We found one memory-leak.

https://people.redhat.com/lnykryn/systemd/bz1308780/

Comment 11 meridionaljet 2016-02-17 15:10:36 UTC

(In reply to Lukáš Nykrýn from comment #10)
> Would you be willing to try a test build? We found one memory-leak.
> 
> https://people.redhat.com/lnykryn/systemd/bz1308780/

Well, this is a live web server, so I'm a little wary. Is the leak you found capable of leaking 200 MB/day, as I have observed?

Comment 12 Lukáš Nykrýn 2016-02-17 15:26:09 UTC

I am sorry, but I don't know. I will try to find some artificial reproducer and try the fix myself.

Comment 13 info 2016-09-21 22:53:18 UTC

I am observing this memory leak on my Ubuntu Xenial server. I am willing to provide whatever information you need and to try whatever fix you have.

Comment 14 Lukáš Nykrýn 2016-09-22 07:10:34 UTC

This problem should be fixed in systemd-219-30. If anyone is willing to try that, we have a repo with test builds here: https://copr.fedorainfracloud.org/coprs/lnykryn/systemd-rhel-staging/
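
To check whether an installed build is at least as new as a fixed one, the version-release strings can be compared segment by segment. The sketch below is a simplified approximation of RPM's comparison rules, written for illustration only: it ignores epochs and the special "~" and "^" characters, the function names are hypothetical, and the ".el7" suffix in the usage note is an assumed dist tag. For anything authoritative, RPM's own tooling should be used instead.

```python
import re


def _segments(label):
    """Split a label into maximal runs of digits or letters; separators are dropped."""
    return re.findall(r"\d+|[A-Za-z]+", label)


def compare_labels(a, b):
    """Return -1, 0 or 1 for a < b, a == b, a > b (simplified rpmvercmp-style)."""
    seg_a, seg_b = _segments(a), _segments(b)
    for x, y in zip(seg_a, seg_b):
        if x.isdigit() and y.isdigit():
            xi, yi = int(x), int(y)
            if xi != yi:
                return 1 if xi > yi else -1
        elif x.isdigit() != y.isdigit():
            # A numeric segment is treated as newer than an alphabetic one.
            return 1 if x.isdigit() else -1
        elif x != y:
            return 1 if x > y else -1
    if len(seg_a) == len(seg_b):
        return 0
    # All shared segments are equal: the label with extra segments is newer.
    return 1 if len(seg_a) > len(seg_b) else -1


def is_at_least(installed, fixed):
    """Compare 'version-release' strings (package name already stripped)."""
    inst_ver, inst_rel = installed.split("-", 1)
    fix_ver, fix_rel = fixed.split("-", 1)
    result = compare_labels(inst_ver, fix_ver)
    if result == 0:
        result = compare_labels(inst_rel, fix_rel)
    return result >= 0
```

Under these simplified rules, is_at_least("219-30.el7", "219-19.el7_2.20") evaluates to True, while is_at_least("219-19.el7_2.4", "219-19.el7_2.20") evaluates to False.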

Comment 15 Benjamin Lefoul 2016-11-06 18:36:17 UTC

(In reply to Lukáš Nykrýn from comment #14)
> This problem should be fixed in systemd-219-30. If anyone is willing to try
> that, we have a repo with test builds here:
> https://copr.fedorainfracloud.org/coprs/lnykryn/systemd-rhel-staging/

Hi Lukáš,

We are experiencing this problem on a production server. Does 219-30 fix it? I see it's available in RHEL 7.3 beta.

Comment 16 Lukáš Nykrýn 2016-11-07 07:58:29 UTC

If I am not mistaken 7.3 should be out now. So you can try the latest version there.

Comment 17 Benjamin Lefoul 2016-11-07 09:46:04 UTC

(In reply to Lukáš Nykrýn from comment #16)
> If I am not mistaken 7.3 should be out now. So you can try the latest
> version there.

Indeed. I will give an update here as soon as it shows up in the CentOS repository.

Comment 18 SHAURYA 2016-11-28 13:09:35 UTC

Created attachment 1225281 [details]
systemd-analyze_dump