Bug 1439989

Summary: why so many sessions scope file didn't been deleted?
Product: Red Hat Enterprise Linux 7 Reporter: muahao <muahao>
Component: dbusAssignee: David King <dking>
Status: CLOSED CURRENTRELEASE QA Contact: Desktop QE <desktop-qa-list>
Severity: medium Docs Contact:
Priority: high    
Version: 7.2CC: amigo.elite, apmukher, bs168, dtardon, hmatsumo, kfujii, kwalker, mboisver, mclasen, msekleta, muahao, rmetrich, s.chernomorets, systemd-maint-list
Target Milestone: rc   
Target Release: 7.6   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: dbus-1.10.14 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-08-29 09:27:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1643104, 1716963, 1719445    

Description muahao 2017-04-07 03:23:16 UTC
Description of problem:

Hello , I have so many docker container , and appear this problem, why there 
are so many session-xxxx.scope.d files and directorys in /run/systemd/system/ , I find my container ssh delays too slowly about 30s , I am puzzled that why so many "session-xxxx.scope.d" directorys didn't be deleted? I don't know if this is relate to https://github.com/systemd/systemd/issues/2925



Version-Release number of selected component (if applicable):


```
#rpm -q dbus
dbus-1.6.12-13.1.el7.x86_64
```

```
#rpm -q systemd
systemd-219-19.el7_2.13.x86_64.rpm 
```

```
[root /home/ahao.mah]
#ls -ld /run/systemd/system/session-*.scope* |wc -l
25981



```
[root /home/ahao.mah]
#ls -ld /run/systemd/system/session-*.scope* |wc -l
25981
```

```
[root /home/ahao.mah]
#ls -ld /run/systemd/system/session-*.scope* |head -n 100
-rw-r--r-- 1 root root  17 Mar  7 17:53 /run/systemd/system/session-1000019.scope
drwxr-xr-x 2 root root 140 Mar  7 17:53 /run/systemd/system/session-1000019.scope.d
-rw-r--r-- 1 root root  17 Mar  7 17:53 /run/systemd/system/session-1000020.scope
drwxr-xr-x 2 root root 140 Mar  7 17:53 /run/systemd/system/session-1000020.scope.d
-rw-r--r-- 1 root root  17 Mar  7 17:56 /run/systemd/system/session-1000404.scope
drwxr-xr-x 2 root root 140 Mar  7 17:56 /run/systemd/system/session-1000404.scope.d
-rw-r--r-- 1 root root  17 Mar  1 19:17 /run/systemd/system/session-100056.scope
drwxr-xr-x 2 root root 140 Mar  1 19:17 /run/systemd/system/session-100056.scope.d
-rw-r--r-- 1 root root  17 Mar  7 18:00 /run/systemd/system/session-1000871.scope
drwxr-xr-x 2 root root 140 Mar  7 18:00 /run/systemd/system/session-1000871.scope.d
-rw-r--r-- 1 root root  17 Mar  7 18:01 /run/systemd/system/session-1001130.scope
drwxr-xr-x 2 root root 140 Mar  7 18:01 /run/systemd/system/session-1001130.scope.d
-rw-r--r-- 1 root root  17 Mar  1 19:20 /run/systemd/system/session-100129.scope
drwxr-xr-x 2 root root 140 Mar  1 19:20 /run/systemd/system/session-100129.scope.d
-rw-r--r-- 1 root root  17 Mar  7 18:04 /run/systemd/system/session-1001450.scope
drwxr-xr-x 2 root root 140 Mar  7 18:04 /run/systemd/system/session-1001450.scope.d
-rw-r--r-- 1 root root  17 Mar  7 18:05 /run/systemd/system/session-1001572.scope
drwxr-xr-x 2 root root 140 Mar  7 18:05 /run/systemd/system/session-1001572.scope.d
-rw-r--r-- 1 root root  17 Mar  7 18:05 /run/systemd/system/session-1001573.scope
drwxr-xr-x 2 root root 140 Mar  7 18:05 /run/systemd/system/session-1001573.scope.d
-rw-r--r-- 1 root root  17 Mar  7 18:08 /run/systemd/system/session-1001866.scope
drwxr-xr-x 2 root root 140 Mar  7 18:08 /run/systemd/system/session-1001866.scope.d
-rw-r--r-- 1 root root  17 Mar  7 18:08 /run/systemd/system/session-1001867.scope
drwxr-xr-x 2 root root 140 Mar  7 18:08 /run/systemd/system/session-1001867.scope.d
-rw-r--r-- 1 root root  17 Mar  7 18:10 /run/systemd/system/session-1002216.scope
drwxr-xr-x 2 root root 140 Mar  7 18:10 /run/systemd/system/session-1002216.scope.d
-rw-r--r-- 1 root root  17 Mar  7 18:10 /run/systemd/system/session-1002217.scope
drwxr-xr-x 2 root root 140 Mar  7 18:10 /run/systemd/system/session-1002217.scope.d
-rw-r--r-- 1 root root  17 Mar  7 18:16 /run/systemd/system/session-1002918.scope
drwxr-xr-x 2 root root 140 Mar  7 18:16 /run/systemd/system/session-1002918.scope.d
-rw-r--r-- 1 root root  17 Mar  7 18:20 /run/systemd/system/session-1003641.scope
drwxr-xr-x 2 root root 140 Mar  7 18:20 /run/systemd/system/session-1003641.scope.d
-rw-r--r-- 1 root root  17 Mar  7 18:25 /run/systemd/system/session-1004120.scope
drwxr-xr-x 2 root root 140 Mar  7 18:25 /run/systemd/system/session-1004120.scope.d
-rw-r--r-- 1 root root  17 Mar  1 19:25 /run/systemd/system/session-100418.scope
```

```
[root /home/ahao.mah]
#cat /run/systemd/system/session-*.scope.d/50-Description.conf
[Unit]
Description=Session 1000019 of user mysql
[Unit]
Description=Session 1000020 of user root
[Unit]
Description=Session 1000404 of user mysql
[Unit]
Description=Session 100056 of user root
[Unit]
Description=Session 1000871 of user mysql
[Unit]
Description=Session 1001130 of user mysql
[Unit]
Description=Session 100129 of user mysql
[Unit]
Description=Session 1001450 of user root
[Unit]
Description=Session 1001572 of user mysql
[Unit]
Description=Session 1001573 of user root
[Unit]
Description=Session 1001866 of user root
[Unit]
Description=Session 1001867 of user mysql
[Unit]
Description=Session 1002216 of user mysql
[Unit]
Description=Session 1002217 of user mysql
[Unit]
Description=Session 1002918 of user mysql
```

looks like there is not so many connections 

```
[root /home/ahao.mah]
#netstat -antlupe | wc -l
35

[root /home/ahao.mah]
#lsof -i :22 |wc -l
6
```

Comment 2 Lukáš Nykrýn 2017-04-13 08:17:57 UTC
*** Bug 1439990 has been marked as a duplicate of this bug. ***

Comment 4 Michal Sekletar 2017-05-11 07:17:26 UTC
systemd versions prior to systemd-219-19.el7_2.13 were susceptible to this leak. Version of systemd that you are currently running contains the fix. Here is excerpt from changelog,

* Wed Jul 27 2016 Lukas Nykryn <lnykryn> - 219-19.13
- core: use an AF_UNIX/SOCK_DGRAM socket for cgroup agent notification (#1305608)

Is it possible that all those scopes originate from before you upgraded systemd? Do you see this abnormal accumulation of scope units after you reboot the box? 

Btw, relevant github issue is here https://github.com/systemd/systemd/issues/1961

Comment 5 muahao 2017-05-15 02:30:18 UTC
(In reply to Michal Sekletar from comment #4)
> systemd versions prior to systemd-219-19.el7_2.13 were susceptible to this
> leak. Version of systemd that you are currently running contains the fix.
> Here is excerpt from changelog,
> 
> * Wed Jul 27 2016 Lukas Nykryn <lnykryn> - 219-19.13
> - core: use an AF_UNIX/SOCK_DGRAM socket for cgroup agent notification
> (#1305608)
> 
> Is it possible that all those scopes originate from before you upgraded
> systemd? Do you see this abnormal accumulation of scope units after you
> reboot the box? 
> 
> Btw, relevant github issue is here
> https://github.com/systemd/systemd/issues/1961


I am sure those scopes originate after I upgraded systemd ,and occurred at a group of docker container who is running mysql server

I think this problem caused by dbus of this bug:https://bugs.freedesktop.org/show_bug.cgi?id=95263  

so I backport this patch : https://bugs.freedesktop.org/attachment.cgi?id=124842&action=edit ,but theses scopes still appear and log this log message: 

Apr 22 17:01:01 docker011160164087.xxx dbus-daemon[30585]: Failed to close file descriptor: Could not close fd 12
Apr 22 20:35:02 docker011160164087.xxx dbus-daemon[30585]: Failed to close file descriptor: Could not close fd 99
Apr 22 20:36:01 docker011160164087.xxx dbus-daemon[30585]: Failed to close file descriptor: Could not close fd 99
Apr 22 20:38:01 docker011160164087.xxx dbus-daemon[30585]: Failed to close file descriptor: Could not close fd 99
Apr 22 20:46:02 docker011160164087.xxx dbus-daemon[30585]: Failed to close file descriptor: Could not close fd 99
Apr 22 20:48:01 docker011160164087.xxx dbus-daemon[30585]: Failed to close file descriptor: Could not close fd 99

Comment 6 Michal Sekletar 2017-05-15 08:16:56 UTC
 (In reply to muahao from comment #5)
> I am sure those scopes originate after I upgraded systemd ,and occurred at a
> group of docker container who is running mysql server
> 
> I think this problem caused by dbus of this
> bug:https://bugs.freedesktop.org/show_bug.cgi?id=95263  

Well, this bug could potentially be triggered only when new sessions are being created without pause and dbus has always some messages with attached FDs in its receive queue for the connection (logind's connection in this case). 

Are you sure this matches conditions on your system? 

> 
> so I backport this patch :
> https://bugs.freedesktop.org/attachment.cgi?id=124842&action=edit ,but
> theses scopes still appear and log this log message: 

You applied workaround (note that it is now reverted upstream and better fixes were merged instead) and dbus then shouldn't kick logind out of bus, but since logind runs as root it should only log about the problem. Do you see this log message in journal? Because if not, then you are not running into this problem in the first place.

Also I think this issue is orthogonal to the original problem (leaking scope unis).

> 
> Apr 22 17:01:01 docker011160164087.xxx dbus-daemon[30585]: Failed to close
> file descriptor: Could not close fd 12
> Apr 22 20:35:02 docker011160164087.xxx dbus-daemon[30585]: Failed to close
> file descriptor: Could not close fd 99
> Apr 22 20:36:01 docker011160164087.xxx dbus-daemon[30585]: Failed to close
> file descriptor: Could not close fd 99
> Apr 22 20:38:01 docker011160164087.xxx dbus-daemon[30585]: Failed to close
> file descriptor: Could not close fd 99
> Apr 22 20:46:02 docker011160164087.xxx dbus-daemon[30585]: Failed to close
> file descriptor: Could not close fd 99
> Apr 22 20:48:01 docker011160164087.xxx dbus-daemon[30585]: Failed to close
> file descriptor: Could not close fd 99

Can you provide a sos-report from the affected system?

Comment 7 muahao 2017-05-17 03:20:32 UTC
I have almost 300 physical host,and each physical host running about 60 docker containers . I think maybe it is too much ,and almost docker containers are running mysql test ,I don't kown if it is resulting in too much connections ,but I don't think it is this reason;

I think it relates to this issue:
https://github.com/systemd/systemd/issues/1961
and
https://bugs.freedesktop.org/show_bug.cgi?id=95263

As we can see, it has bellow phenomenon:
1. when I login these problem docker container delay 25 s
2. login in container run ```journal -u dbus``` you will see

```
Apr 02 07:51:01 docker011160.xxx dbus-daemon[80]: Failed to close file descriptor: Could not close fd 10
Apr 02 07:57:01 docker011160.xxx dbus-daemon[80]: Failed to close file descriptor: Could not close fd 270
Apr 02 08:34:01 docker011160.xxx dbus-daemon[80]: Failed to close file descriptor: Could not close fd 10
Apr 02 08:35:01 docker011160.xxx dbus-daemon[80]: Failed to close file descriptor: Could not close fd 10
Apr 02 10:33:01 docker011160.xxx dbus-daemon[80]: Failed to close file descriptor: Could not close fd 10
Apr 02 11:44:01 docker011160.xxx dbus-daemon[80]: Failed to close file descriptor: Could not close fd 273
Apr 02 12:04:01 docker011160.xxx dbus-daemon[80]: Failed to close file descriptor: Could not close fd 273
Apr 02 12:09:02 docker011160.xxx dbus-daemon[80]: Failed to close file descriptor: Could not close fd 273
Apr 02 16:16:01 docker011160.xxx dbus-daemon[80]: Failed to close file descriptor: Could not close fd 273
Apr 02 22:08:01 docker011160.xxx dbus-daemon[80]: Failed to close file descriptor: Could not close fd 273
Apr 03 00:51:01 docker011160.xxx dbus-daemon[80]: Failed to close file descriptor: Could not close fd 10
....
...

Apr 19 01:45:57 docker011160.xxx dbus-daemon[129907]: dbus[129907]: [system] Failed to activate service 'org.freedesktop.login1': timed out
Apr 19 01:46:01 docker011160.xxx dbus[129907]: [system] Activating via systemd: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service'
Apr 19 01:46:01 docker011160.xxx dbus-daemon[129907]: dbus[129907]: [system] Activating via systemd: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service'
Apr 19 01:46:26 docker011160.xxx dbus[129907]: [system] Failed to activate service 'org.freedesktop.login1': timed out
Apr 19 01:46:26 docker011160.xxx dbus-daemon[129907]: dbus[129907]: [system] Failed to activate service 'org.freedesktop.login1': timed out
Apr 19 01:46:32 docker011160.xxx dbus[129907]: [system] Activating via systemd: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service'
Apr 19 01:46:32 docker011160.xxx dbus-daemon[129907]: dbus[129907]: [system] Activating via systemd: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service'
Apr 19 01:46:57 docker011160.xxx dbus[129907]: [system] Failed to activate service 'org.freedesktop.login1': timed out
Apr 19 01:46:57 docker011160.xxx dbus-daemon[129907]: dbus[129907]: [system] Failed to activate service 'org.freedesktop.login1': timed out
Apr 19 01:47:02 docker011160.xxx dbus[129907]: [system] Activating via systemd: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service'
Apr 19 01:47:02 docker011160.xxx dbus-daemon[129907]: dbus[129907]: [system] Activating via systemd: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service'
Apr 19 01:47:27 docker011160.xxx dbus[129907]: [system] Failed to activate service 'org.freedesktop.login1': timed out
Apr 19 01:47:27 docker011160.xxx dbus-daemon[129907]: dbus[129907]: [system] Failed to activate service 'org.freedesktop.login1': timed out
Apr 19 01:47:29 docker011160.xxx dbus[129907]: [system] Activating via systemd: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service'
Apr 19 01:47:29 docker011160.xxx dbus-daemon[129907]: dbus[129907]: [system] Activating via systemd: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service'
Apr 19 01:47:54 docker011160.xxx dbus[129907]: [system] Failed to activate service 'org.freedesktop.login1': timed out
Apr 19 01:47:54 docker011160.xxx dbus-daemon[129907]: dbus[129907]: [system] Failed to activate service 'org.freedesktop.login1': timed out
Apr 19 01:48:00 docker011160.xxx dbus[129907]: [system] Activating via systemd: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service'
```

top will see systemd-logind dbus-daemon too high:

```
   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 91255 root      20   0   50344  18000   1404 R  76.0  0.0   0:25.84 systemd-logind
    79 dbus      20   0   34348   9024   1216 S  12.7  0.0   2761:36 dbus-daemon
  6941 mysql     20   0 1989064 280536   7288 S   1.5  0.1 865:02.70 mysqld

```

systemd-logind.service restart almost every minutes

```
Apr 29 06:29:57 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart.
Apr 29 06:30:57 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart.
Apr 29 06:31:58 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart.
Apr 29 06:32:59 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart.
Apr 29 06:34:00 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart.
Apr 29 06:35:02 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart.
Apr 29 06:36:04 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart.
Apr 29 06:37:04 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart.
Apr 29 06:38:05 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart.
Apr 29 06:39:07 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart.
Apr 29 06:40:08 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart.
Apr 29 06:41:09 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart.
Apr 29 06:42:10 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart.
Apr 29 06:43:10 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart.
Apr 29 06:44:12 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart.
Apr 29 06:45:14 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart.
Apr 29 06:46:14 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart.
Apr 29 06:47:15 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart.
```

There are too many abandon scopes

```
#systemctl | grep abandon | wc -l
25207

#systemctl | grep abandon  | grep mysql | wc -l
8799

#systemctl | grep abandon  | grep agent | wc -l
9224
```

man systemd.scope  we all know, Scope units are not configured via unit configuration files, but are only created programmatically using the bus interfaces of systemd

busctl can't see immornal problem

```
#busctl
NAME                              PID PROCESS         USER             CONNECTION    UNIT                      SESSION    DESCRIPTION
:1.0                                1 systemd         root             :1.0          -                         -          -
:1.2011552                      99310 polkitd         polkitd          :1.2011552    polkit.service            -          -
:1.2028946                      86791 systemd-logind  root             :1.2028946    systemd-logind.service    -          -
:1.2028954                      86976 su              root             :1.2028954    session-2950.scope        2950       -
:1.2028955                      87001 su              root             :1.2028955    session-2950.scope        2950       -
:1.2028956                      87011 su              root             :1.2028956    session-2950.scope        2950       -
:1.2028957                      87254 busctl          root             :1.2028957    sshd.service              -          -
net.reactivated.Fprint              - -               -                (activatable) -                         -
org.freedesktop.DBus                - -               -                -             -                         -          -
org.freedesktop.PolicyKit1      99310 polkitd         polkitd          :1.2011552    polkit.service            -          -
org.freedesktop.hostname1           - -               -                (activatable) -                         -
org.freedesktop.locale1             - -               -                (activatable) -                         -
org.freedesktop.login1          86791 systemd-logind  root             :1.2028946    systemd-logind.service    -          -
org.freedesktop.machine1            - -               -                (activatable) -                         -
org.freedesktop.systemd1            1 systemd         root             :1.0          -                         -          -
org.freedesktop.timedate1           - -               -                (activatable) -                         -

```


Look at logs ,it looks like first ```dbus-daemon[80]: Failed to close file descriptor: Could not close fd xxx
``` and then ```dbus alway active service 'org.freedesktop.login1' : timed out```


My try:
1. Just only restart dbus or systemd-login ,this problem will have.
2. If excute command ```systemctl | grep "abandoned" | grep -e "-[[:digit:]]" | sed "s/\.scope.*/.scope/" | xargs systemctl stop```  this problem will be solved short-lived

This can not fundamentally solve this problem

SO, I try to backport this patch to dbus:
1. Only backport this patch: https://bugs.freedesktop.org/page.cgi?id=splinter.html&bug=95263&attachment=124841  ----->   problem still exist,abandon scopes still increasing
2. Only backport this patch: https://bugs.freedesktop.org/page.cgi?id=splinter.html&bug=95263&attachment=124842
-----> problem still exsit,abandon scopes still increasing

we can see dbus changelog(https://cgit.freedesktop.org/dbus/dbus/tree/NEWS?h=master):

D-Bus 1.11.8 (2016-11-28)
Fixes:

• Work around an undesired effect of the fix for CVE-2014-3637
  (fd.o#80559), in which processes that frequently send fds, such as
  logind during a flood of new PAM sessions, can get disconnected for
  continuously having at least one fd "in flight" for too long;
  dbus-daemon interprets that as a potential denial of service attack.
  The workaround is to disable that check for uid 0 process such as
  logind, with a message in the system log. The bug remains open while
  we look for a more general solution.
  (fd.o#95263, LP#1591411; Simon McVittie)


So, I am so curious about is really my containers have immornal connnections too much and looks like ddos attack?

Comment 9 Michal Sekletar 2018-07-16 12:06:29 UTC
*** Bug 1438155 has been marked as a duplicate of this bug. ***

Comment 10 Lukáš Nykrýn 2019-05-13 12:27:56 UTC
The backport of upstream fix is non-trivial and potentially risky. Let's move that to 7.8.

Comment 14 David King 2019-08-15 06:33:53 UTC
dbus 1.10.24, which is in RHEL 7, already includes the fix mentioned in comment #7.

Comment 15 David King 2019-08-29 09:27:20 UTC
Closing as per comment #14.