Bug 740058

Summary: systemd retains socket-activated service failure records forever leading to memory exhaustion
Product: Red Hat Enterprise Linux 7 Reporter: Denise Dumas <ddumas>
Component: systemdAssignee: systemd-maint
Status: CLOSED NOTABUG QA Contact:
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.0CC: azelinka, lpoetter, mschmidt, msekleta, ppisar
Target Milestone: rcKeywords: Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 739538 Environment:
Last Closed: 2013-05-06 18:17:14 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 739538    
Bug Blocks: 802465, 816135, 952670    

Description Denise Dumas 2011-09-20 19:32:39 UTC
+++ This bug was initially created as a clone of Bug #739538 +++

New init `systemd' in Fedora distribution provides inetd-like functionality. I.e. systemd (PID=1) listens on a socket:

# systemctl enable cvs.socket
# systemctl --all --full |grep cvs
cvs             loaded inactive dead          CVS Server
cvs.socket                loaded active   listening     CVS Server Activation Socket

and when a client connects, it will spawn a network service (server) connected to the socket through standard input and output.

# systemctl --all --full |grep cvs
cvs@::1:2401-::1:60544.service loaded active   running       CVS Server
cvs.socket                loaded active   listening     CVS Server Activation Socket

If server process exits with non-zero code (e.g. client violated server protocol), systemd keeps details about this failure (available through `systemctl --all --full' command):

# systemctl --all --full |grep cvs
cvs@::1:2401-::1:60544.service loaded failed   failed        CVS Server
cvs.socket                loaded active   listening     CVS Server Activation Socket

The problem is the failure records are stored indefinitely:

# systemctl --all --full |grep cvs
cvs@::1:2401-::1:60543.service loaded failed   failed        CVS Server
cvs@::1:2401-::1:60544.service loaded failed   failed        CVS Server
cvs@::1:2401-::1:60545.service loaded failed   failed        CVS Server
cvs@::1:2401-::1:60546.service loaded failed   failed        CVS Server
cvs@::1:2401-::1:60547.service loaded failed   failed        CVS Server
cvs@::1:2401-::1:60548.service loaded failed   failed        CVS Server
cvs@::1:2401-::1:60549.service loaded failed   failed        CVS Server
cvs@::1:2401-::1:60550.service loaded failed   failed        CVS Server
cvs@::1:2401-::1:60551.service loaded failed   failed        CVS Server
cvs@::1:2401-::1:60552.service loaded failed   failed        CVS Server
cvs.socket                loaded active   listening     CVS Server Activation Socket

# pidof cvs; echo $?
1

and each record costs memory:

# ps u -p1
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.9  2.4  57408 24352 ?        Ss   14:26   0:08 /sbin/init

# for I in $(seq 1 $((2**10))); do echo "foo" >/dev/tcp/localhost/2401; done
# systemctl --all --full |grep -c cvs
1036
# ps u -p1
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  4.2  3.4  68000 34480 ?        Ss   14:26   0:41 /sbin/init

This increases memory usage about 10 KB/record.

Provided a network service is designed to be remotely accessible, this exhibits remote DOS vulnerability.

Tested on Fedora 17 with systemd-35-1.fc16.x86_64 and cvs-1.11.23-22.fc17.x86_64. As a lot of services are being migrated to systemd in Fedora 16 which stable release is close, I consider this issue becomes general available soon.

--- Additional comment from lpoetter on 2011-09-19 13:19:59 EDT ---

To make systemd forget about the failure state of a service, use ExecStart=-/foo/bar, i.e. add the "-" in there.

--- Additional comment from ppisar on 2011-09-20 03:52:50 EDT ---

This is work-around specific for the service configuration. However this is generic problem.

How can I can I forget job statuses for already exited serviced?

Howe can I limit size of job status log (systemctl --all), if I have socket services without the "-".

This is vulnerability in systemd as such. If you want close this bug, fix it on systemd level (e.g. by making "-" in socket-services implicit) before.

Comment 4 Lennart Poettering 2013-05-06 18:17:14 UTC
There isn't anything to fix here... People should use ExecStart=-/bin/false rather than ExecStart=/bin/false if they want the units to be cleaned up automatically. And if they don't want to have them cleaned automatically they should do so manually via "systemctl reset-failed".

This has been this way since about always, hence closing, since there's no bug to fix.