Bug 2115396

Summary: systemd leaks pipes when executing Type=exec services
Product: Red Hat Enterprise Linux 8 Reporter: Renaud Métrich <rmetrich>
Component: systemd Assignee: David Tardon <dtardon>
Status: CLOSED ERRATA QA Contact: Frantisek Sumsal <fsumsal>
Severity: high Docs Contact:
Priority: high    
Version: 8.6 CC: dtardon, jamacku, jbreitwe, pdwyer, peter.vreman, systemd-maint-list
Target Milestone: rc Keywords: Bugfix, Reproducer, Triaged, ZStream
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: systemd-239-65.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2116892 Environment:
Last Closed: 2022-11-08 10:49:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2116892    

Description Renaud Métrich 2022-08-04 15:08:56 UTC
Description of problem:

This is a regression compared to latest RHEL8.4 systemd.
When systemd starts services configured with Type=exec (e.g. PCP services), it leaks pipes.
After the system has been up for some weeks, AVC denials for sys_admin/sys_resource appear (not for systemd itself, but for all other services running as root) because the pipe buffers available to the user have been exhausted on the system, e.g.:
~~~
... mandb system_u:system_r:mandb_t:s0 22 capability sys_resource system_u:system_r:mandb_t:s0 denied 938702
... mandb system_u:system_r:mandb_t:s0 22 capability sys_admin system_u:system_r:mandb_t:s0 denied 938702
~~~

The sosreport shows the leak in the lsof output:
~~~
$ grep -w ^systemd lsof | grep -c pipe
5397
~~~
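
To see at a glance which process holds the pipes in a saved lsof listing, the count above can be broken down per command. This is only a sketch: the function name `pipes_per_command` is hypothetical, and it assumes the command name is the first column of each lsof line.

```shell
#!/bin/sh
# Hypothetical helper: rank commands in a saved lsof listing by the number of
# lines that mention "pipe". Assumes the command name is the first column.
pipes_per_command() {
    awk '/pipe/ { n[$1]++ } END { for (c in n) print n[c], c }' "$1" | sort -rn
}
```

Running `pipes_per_command lsof` against the sosreport file prints one "count command" line per process name, largest first.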

Version-Release number of selected component (if applicable):

systemd-239-58.el8_6.3.x86_64

How reproducible:

Always

Steps to Reproduce:
1. Craft a dummy Type=exec service (/etc/systemd/system/exec.service)

~~~
[Service]
Type=exec
ExecStart=/bin/true
~~~

2. Start the service multiple times

Actual results:

systemd never closes the pipe it uses to detect child execution when starting the service, even after the service exits

~~~
# ls -l /proc/1/fd | grep pipe
lr-x------. 1 root root 64 Aug  4 16:36 59 -> pipe:[21373]

# systemctl start exec
# ls -l /proc/1/fd | grep pipe
lr-x------. 1 root root 64 Aug  4 16:33 27 -> pipe:[68873]
lr-x------. 1 root root 64 Aug  4 16:36 59 -> pipe:[21373]

# systemctl start exec
# ls -l /proc/1/fd | grep pipe
lr-x------. 1 root root 64 Aug  4 16:33 27 -> pipe:[68873]
lr-x------. 1 root root 64 Aug  4 16:33 28 -> pipe:[69107]
lr-x------. 1 root root 64 Aug  4 16:36 59 -> pipe:[21373]

# systemctl start exec
# ls -l /proc/1/fd | grep pipe
lr-x------. 1 root root 64 Aug  4 16:33 27 -> pipe:[68873]
lr-x------. 1 root root 64 Aug  4 16:33 28 -> pipe:[69107]
lr-x------. 1 root root 64 Aug  4 16:33 29 -> pipe:[69345]
lr-x------. 1 root root 64 Aug  4 16:36 59 -> pipe:[21373]

...
~~~

Expected results:

No leak of pipes

Additional info:

Using strace, we can see the pipe FD is removed from monitoring but never closed:
~~~
17:01:31.940143 epoll_ctl(4<anon_inode:[eventpoll]>, EPOLL_CTL_DEL, 27<pipe:[68873]>, NULL) = 0 <0.000005>
...
17:01:36.164493 epoll_ctl(4<anon_inode:[eventpoll]>, EPOLL_CTL_DEL, 28<pipe:[69107]>, NULL) = 0 <0.000006>
...
17:01:40.233473 epoll_ctl(4<anon_inode:[eventpoll]>, EPOLL_CTL_DEL, 29<pipe:[69345]>, NULL) = 0 <0.000006>
...
~~~
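
The same check can be done mechanically on a saved strace log by listing the descriptors that were removed from epoll and have no later close(). A minimal sketch, assuming the log uses the same fd decoration as the excerpt above (`N<pipe:[inode]>`); the function name `leaked_epoll_fds` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print pipe fds that were dropped from epoll
# (EPOLL_CTL_DEL) but never passed to close() later in the same strace log.
# Assumes fds are decorated as N<pipe:[inode]>, as in the excerpt above.
leaked_epoll_fds() {
    awk '
        /epoll_ctl\(.*EPOLL_CTL_DEL, [0-9]+<pipe:/ {
            # Isolate the third argument, e.g. "27<pipe:[68873]>"
            s = $0
            sub(/.*EPOLL_CTL_DEL, /, "", s)
            sub(/>.*/, ">", s)
            fd = s
            sub(/<.*/, "", fd)
            pending[fd] = s
        }
        /close\([0-9]+/ {
            # A later close() on the same fd number clears it
            s = $0
            sub(/.*close\(/, "", s)
            sub(/[^0-9].*/, "", s)
            delete pending[s]
        }
        END { for (fd in pending) print pending[fd] }
    ' "$1" | sort -n
}
```

On a log taken while reproducing this bug, every pipe fd from the epoll_ctl lines would be expected to show up in the output.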

Trying to backport https://github.com/systemd/systemd/commit/13bb1ffb912cacea4041910e38674e0984ac5772 and https://github.com/systemd/systemd/commit/bc989831e634123c2ff43bcbbeae19097ccc9ff9 doesn't help.

Because the latest PCP uses Type=exec and its services restart regularly, it's critical to fix this ASAP.

Comment 1 Renaud Métrich 2022-08-04 15:12:50 UTC
Forgot to mention that Fedora 36 works fine (systemd-250.8-1.fc36.x86_64) if that can help spot the missing commits.

Comment 2 Peter Vreman 2022-08-04 15:29:33 UTC
It is not a real regression compared to RHEL8.4. It is an old bug that is now triggered by the RHEL8.6 PCP. The RHEL8.6 PCP started using 'Type=exec', whereas RHEL8.4/8.5 used 'Type=oneshot'.
- It is also reproducible with RHEL8.4-systemd+RHEL8.6-pcp
- The combination RHEL8.6-systemd+RHEL8.5-pcp works fine without leaking pipe fds.

It has gone unnoticed for so long because there are few PCP users and it is not directly noticeable: the number of fds/pipes grows slowly, and only after 10-20 days do you suddenly see SELinux AVCs for other services that use pipe(), such as rhsmcertd or winbindd.

Comment 3 Renaud Métrich 2022-08-05 06:25:28 UTC
Right, the issue seems to have always been there, but since PCP was not using Type=exec in the past, it went unnoticed.
The workaround for now is to regularly execute "systemctl daemon-reexec". This can be triggered when systemd holds more than a certain number of open pipes (counted with "ls -l /proc/1/fd | grep -c pipe", for example).
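
The workaround could be scripted roughly as follows and run periodically as root, e.g. from cron. This is only a sketch: the threshold of 1000, the `THRESHOLD` and `FD_DIR` variables, and the function names are assumptions for illustration, not values from this report.

```shell
#!/bin/sh
# Sketch of the workaround: re-exec systemd once PID 1 holds too many pipes.
# THRESHOLD and FD_DIR are illustrative assumptions, not values from this report.
THRESHOLD="${THRESHOLD:-1000}"
FD_DIR="${FD_DIR:-/proc/1/fd}"

# Count descriptors in a /proc/<pid>/fd directory that point at a pipe.
count_pipes() {
    ls -l "$1" 2>/dev/null | grep -c 'pipe:'
}

# Succeeds (exit 0) when the pipe count in directory $1 has reached limit $2.
needs_reexec() {
    [ "$(count_pipes "$1")" -ge "$2" ]
}

# Re-exec the manager only when the limit is reached. Not called on load.
reexec_if_leaking() {
    if needs_reexec "$FD_DIR" "$THRESHOLD"; then
        systemctl daemon-reexec
    fi
}
```

Calling `reexec_if_leaking` at the end of the script makes it act; the functions are kept separate so the threshold check can be tried against any directory first.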

Comment 4 Peter Vreman 2022-08-05 08:50:57 UTC
Below is the simplest and quickest reproducer I could think of: create a timer that fires every second and starts a service using Type=exec

/etc/systemd/system/testexec.timer
~~~
[Unit]
Description=Test for type=exec timer

[Timer]
OnBootSec=0sec
OnUnitActiveSec=1sec
~~~


/etc/systemd/system/testexec.service
~~~
[Unit]
Description=Reproducer for type=exec

[Service]
Type=exec
ExecStart=/usr/bin/true
~~~

Now the number of pipes grows every second, as shown by the command 'lsof -p1 | grep -c pipe'

Comment 5 Peter Vreman 2022-08-05 09:03:59 UTC
The timer is not even needed; you can just start testexec.service in a loop


~~~
[cb/Azure] root@li-lc-2623:~# systemctl cat testexec.service
# /etc/systemd/system/testexec.service
[Unit]
Description=Reproducer for type=exec

[Service]
Type=exec
ExecStart=/usr/bin/true

[cb/Azure] root@li-lc-2623:~# systemctl daemon-reexec

[cb/Azure] root@li-lc-2623:~# lsof -p1 | grep -c pipe
1

[cb/Azure] root@li-lc-2623:~# time for f in `seq 1 1000`; do systemctl start testexec; done

real    0m27.741s
user    0m4.630s
sys     0m5.338s

[cb/Azure] root@li-lc-2623:~# lsof -p1 | grep -c pipe
1001
~~~

Comment 6 Peter Vreman 2022-08-05 09:12:22 UTC
The problem is already very old; it is even reproducible on RHEL8.2

~~~
myuser@rhel82:~$ sudo vi /etc/systemd/system/testexec.service

myuser@rhel82:~$ sudo systemctl daemon-reload
myuser@rhel82:~$ sudo systemctl cat testexec
# /etc/systemd/system/testexec.service
[Unit]
Description=Reproducer for type=exec

[Service]
Type=exec
ExecStart=/usr/bin/true
myuser@rhel82:~$ sudo lsof -p1 | grep -c pipe
1
myuser@rhel82:~$ sudo systemctl start testexec
myuser@rhel82:~$ sudo lsof -p1 | grep -c pipe
2
myuser@rhel82:~$ sudo systemctl start testexec
myuser@rhel82:~$ sudo lsof -p1 | grep -c pipe
3
myuser@rhel82:~$ rpm -q systemd
systemd-239-31.el8_2.8.x86_64
myuser@rhel82:~$
~~~

Comment 9 Plumber Bot 2022-08-09 14:50:54 UTC
fix merged to github master branch -> https://github.com/redhat-plumbers/systemd-rhel8/pull/304

Comment 13 errata-xmlrpc 2022-11-08 10:49:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (systemd bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7727