Bug 1439989
Summary: | why so many sessions scope file didn't been deleted? | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | muahao <muahao> |
Component: | dbus | Assignee: | David King <dking> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Desktop QE <desktop-qa-list> |
Severity: | medium | Docs Contact: | |
Priority: | high | ||
Version: | 7.2 | CC: | amigo.elite, apmukher, bs168, dtardon, hmatsumo, kfujii, kwalker, mboisver, mclasen, msekleta, muahao, rmetrich, s.chernomorets, systemd-maint-list |
Target Milestone: | rc | ||
Target Release: | 7.6 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | dbus-1.10.14 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-08-29 09:27:20 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1643104, 1716963, 1719445 |
Description
muahao
2017-04-07 03:23:16 UTC
*** Bug 1439990 has been marked as a duplicate of this bug. *** systemd versions prior to systemd-219-19.el7_2.13 were susceptible to this leak. Version of systemd that you are currently running contains the fix. Here is excerpt from changelog, * Wed Jul 27 2016 Lukas Nykryn <lnykryn> - 219-19.13 - core: use an AF_UNIX/SOCK_DGRAM socket for cgroup agent notification (#1305608) Is it possible that all those scopes originate from before you upgraded systemd? Do you see this abnormal accumulation of scope units after you reboot the box? Btw, relevant github issue is here https://github.com/systemd/systemd/issues/1961 (In reply to Michal Sekletar from comment #4) > systemd versions prior to systemd-219-19.el7_2.13 were susceptible to this > leak. Version of systemd that you are currently running contains the fix. > Here is excerpt from changelog, > > * Wed Jul 27 2016 Lukas Nykryn <lnykryn> - 219-19.13 > - core: use an AF_UNIX/SOCK_DGRAM socket for cgroup agent notification > (#1305608) > > Is it possible that all those scopes originate from before you upgraded > systemd? Do you see this abnormal accumulation of scope units after you > reboot the box? > > Btw, relevant github issue is here > https://github.com/systemd/systemd/issues/1961 I am sure those scopes originate after I upgraded systemd ,and occurred at a group of docker container who is running mysql server I think this problem caused by dbus of this bug:https://bugs.freedesktop.org/show_bug.cgi?id=95263 so I backport this patch : https://bugs.freedesktop.org/attachment.cgi?id=124842&action=edit ,but theses scopes still appear and log this log message: Apr 22 17:01:01 docker011160164087.xxx dbus-daemon[30585]: Failed to close file descriptor: Could not close fd 12 Apr 22 20:35:02 docker011160164087.xxx dbus-daemon[30585]: Failed to close file descriptor: Could not close fd 99 Apr 22 20:36:01 docker011160164087.xxx dbus-daemon[30585]: Failed to close file descriptor: Could not close fd 99 Apr 22 20:38:01 docker011160164087.xxx dbus-daemon[30585]: Failed to close file descriptor: Could not close fd 99 Apr 22 20:46:02 docker011160164087.xxx dbus-daemon[30585]: Failed to close file descriptor: Could not close fd 99 Apr 22 20:48:01 docker011160164087.xxx dbus-daemon[30585]: Failed to close file descriptor: Could not close fd 99 (In reply to muahao from comment #5) > I am sure those scopes originate after I upgraded systemd ,and occurred at a > group of docker container who is running mysql server > > I think this problem caused by dbus of this > bug:https://bugs.freedesktop.org/show_bug.cgi?id=95263 Well, this bug could potentially be triggered only when new sessions are being created without pause and dbus has always some messages with attached FDs in its receive queue for the connection (logind's connection in this case). Are you sure this matches conditions on your system? > > so I backport this patch : > https://bugs.freedesktop.org/attachment.cgi?id=124842&action=edit ,but > theses scopes still appear and log this log message: You applied workaround (note that it is now reverted upstream and better fixes were merged instead) and dbus then shouldn't kick logind out of bus, but since logind runs as root it should only log about the problem. Do you see this log message in journal? Because if not, then you are not running into this problem in the first place. Also I think this issue is orthogonal to the original problem (leaking scope unis). > > Apr 22 17:01:01 docker011160164087.xxx dbus-daemon[30585]: Failed to close > file descriptor: Could not close fd 12 > Apr 22 20:35:02 docker011160164087.xxx dbus-daemon[30585]: Failed to close > file descriptor: Could not close fd 99 > Apr 22 20:36:01 docker011160164087.xxx dbus-daemon[30585]: Failed to close > file descriptor: Could not close fd 99 > Apr 22 20:38:01 docker011160164087.xxx dbus-daemon[30585]: Failed to close > file descriptor: Could not close fd 99 > Apr 22 20:46:02 docker011160164087.xxx dbus-daemon[30585]: Failed to close > file descriptor: Could not close fd 99 > Apr 22 20:48:01 docker011160164087.xxx dbus-daemon[30585]: Failed to close > file descriptor: Could not close fd 99 Can you provide a sos-report from the affected system? I have almost 300 physical host,and each physical host running about 60 docker containers . I think maybe it is too much ,and almost docker containers are running mysql test ,I don't kown if it is resulting in too much connections ,but I don't think it is this reason; I think it relates to this issue: https://github.com/systemd/systemd/issues/1961 and https://bugs.freedesktop.org/show_bug.cgi?id=95263 As we can see, it has bellow phenomenon: 1. when I login these problem docker container delay 25 s 2. login in container run ```journal -u dbus``` you will see ``` Apr 02 07:51:01 docker011160.xxx dbus-daemon[80]: Failed to close file descriptor: Could not close fd 10 Apr 02 07:57:01 docker011160.xxx dbus-daemon[80]: Failed to close file descriptor: Could not close fd 270 Apr 02 08:34:01 docker011160.xxx dbus-daemon[80]: Failed to close file descriptor: Could not close fd 10 Apr 02 08:35:01 docker011160.xxx dbus-daemon[80]: Failed to close file descriptor: Could not close fd 10 Apr 02 10:33:01 docker011160.xxx dbus-daemon[80]: Failed to close file descriptor: Could not close fd 10 Apr 02 11:44:01 docker011160.xxx dbus-daemon[80]: Failed to close file descriptor: Could not close fd 273 Apr 02 12:04:01 docker011160.xxx dbus-daemon[80]: Failed to close file descriptor: Could not close fd 273 Apr 02 12:09:02 docker011160.xxx dbus-daemon[80]: Failed to close file descriptor: Could not close fd 273 Apr 02 16:16:01 docker011160.xxx dbus-daemon[80]: Failed to close file descriptor: Could not close fd 273 Apr 02 22:08:01 docker011160.xxx dbus-daemon[80]: Failed to close file descriptor: Could not close fd 273 Apr 03 00:51:01 docker011160.xxx dbus-daemon[80]: Failed to close file descriptor: Could not close fd 10 .... ... Apr 19 01:45:57 docker011160.xxx dbus-daemon[129907]: dbus[129907]: [system] Failed to activate service 'org.freedesktop.login1': timed out Apr 19 01:46:01 docker011160.xxx dbus[129907]: [system] Activating via systemd: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service' Apr 19 01:46:01 docker011160.xxx dbus-daemon[129907]: dbus[129907]: [system] Activating via systemd: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service' Apr 19 01:46:26 docker011160.xxx dbus[129907]: [system] Failed to activate service 'org.freedesktop.login1': timed out Apr 19 01:46:26 docker011160.xxx dbus-daemon[129907]: dbus[129907]: [system] Failed to activate service 'org.freedesktop.login1': timed out Apr 19 01:46:32 docker011160.xxx dbus[129907]: [system] Activating via systemd: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service' Apr 19 01:46:32 docker011160.xxx dbus-daemon[129907]: dbus[129907]: [system] Activating via systemd: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service' Apr 19 01:46:57 docker011160.xxx dbus[129907]: [system] Failed to activate service 'org.freedesktop.login1': timed out Apr 19 01:46:57 docker011160.xxx dbus-daemon[129907]: dbus[129907]: [system] Failed to activate service 'org.freedesktop.login1': timed out Apr 19 01:47:02 docker011160.xxx dbus[129907]: [system] Activating via systemd: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service' Apr 19 01:47:02 docker011160.xxx dbus-daemon[129907]: dbus[129907]: [system] Activating via systemd: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service' Apr 19 01:47:27 docker011160.xxx dbus[129907]: [system] Failed to activate service 'org.freedesktop.login1': timed out Apr 19 01:47:27 docker011160.xxx dbus-daemon[129907]: dbus[129907]: [system] Failed to activate service 'org.freedesktop.login1': timed out Apr 19 01:47:29 docker011160.xxx dbus[129907]: [system] Activating via systemd: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service' Apr 19 01:47:29 docker011160.xxx dbus-daemon[129907]: dbus[129907]: [system] Activating via systemd: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service' Apr 19 01:47:54 docker011160.xxx dbus[129907]: [system] Failed to activate service 'org.freedesktop.login1': timed out Apr 19 01:47:54 docker011160.xxx dbus-daemon[129907]: dbus[129907]: [system] Failed to activate service 'org.freedesktop.login1': timed out Apr 19 01:48:00 docker011160.xxx dbus[129907]: [system] Activating via systemd: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service' ``` top will see systemd-logind dbus-daemon too high: ``` PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 91255 root 20 0 50344 18000 1404 R 76.0 0.0 0:25.84 systemd-logind 79 dbus 20 0 34348 9024 1216 S 12.7 0.0 2761:36 dbus-daemon 6941 mysql 20 0 1989064 280536 7288 S 1.5 0.1 865:02.70 mysqld ``` systemd-logind.service restart almost every minutes ``` Apr 29 06:29:57 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart. Apr 29 06:30:57 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart. Apr 29 06:31:58 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart. Apr 29 06:32:59 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart. Apr 29 06:34:00 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart. Apr 29 06:35:02 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart. Apr 29 06:36:04 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart. Apr 29 06:37:04 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart. Apr 29 06:38:05 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart. Apr 29 06:39:07 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart. Apr 29 06:40:08 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart. Apr 29 06:41:09 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart. Apr 29 06:42:10 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart. Apr 29 06:43:10 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart. Apr 29 06:44:12 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart. Apr 29 06:45:14 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart. Apr 29 06:46:14 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart. Apr 29 06:47:15 docker01116.xxx systemd[1]: systemd-logind.service has no holdoff time, scheduling restart. ``` There are too many abandon scopes ``` #systemctl | grep abandon | wc -l 25207 #systemctl | grep abandon | grep mysql | wc -l 8799 #systemctl | grep abandon | grep agent | wc -l 9224 ``` man systemd.scope we all know, Scope units are not configured via unit configuration files, but are only created programmatically using the bus interfaces of systemd busctl can't see immornal problem ``` #busctl NAME PID PROCESS USER CONNECTION UNIT SESSION DESCRIPTION :1.0 1 systemd root :1.0 - - - :1.2011552 99310 polkitd polkitd :1.2011552 polkit.service - - :1.2028946 86791 systemd-logind root :1.2028946 systemd-logind.service - - :1.2028954 86976 su root :1.2028954 session-2950.scope 2950 - :1.2028955 87001 su root :1.2028955 session-2950.scope 2950 - :1.2028956 87011 su root :1.2028956 session-2950.scope 2950 - :1.2028957 87254 busctl root :1.2028957 sshd.service - - net.reactivated.Fprint - - - (activatable) - - org.freedesktop.DBus - - - - - - - org.freedesktop.PolicyKit1 99310 polkitd polkitd :1.2011552 polkit.service - - org.freedesktop.hostname1 - - - (activatable) - - org.freedesktop.locale1 - - - (activatable) - - org.freedesktop.login1 86791 systemd-logind root :1.2028946 systemd-logind.service - - org.freedesktop.machine1 - - - (activatable) - - org.freedesktop.systemd1 1 systemd root :1.0 - - - org.freedesktop.timedate1 - - - (activatable) - - ``` Look at logs ,it looks like first ```dbus-daemon[80]: Failed to close file descriptor: Could not close fd xxx ``` and then ```dbus alway active service 'org.freedesktop.login1' : timed out``` My try: 1. Just only restart dbus or systemd-login ,this problem will have. 2. If excute command ```systemctl | grep "abandoned" | grep -e "-[[:digit:]]" | sed "s/\.scope.*/.scope/" | xargs systemctl stop``` this problem will be solved short-lived This can not fundamentally solve this problem SO, I try to backport this patch to dbus: 1. Only backport this patch: https://bugs.freedesktop.org/page.cgi?id=splinter.html&bug=95263&attachment=124841 -----> problem still exist,abandon scopes still increasing 2. Only backport this patch: https://bugs.freedesktop.org/page.cgi?id=splinter.html&bug=95263&attachment=124842 -----> problem still exsit,abandon scopes still increasing we can see dbus changelog(https://cgit.freedesktop.org/dbus/dbus/tree/NEWS?h=master): D-Bus 1.11.8 (2016-11-28) Fixes: • Work around an undesired effect of the fix for CVE-2014-3637 (fd.o#80559), in which processes that frequently send fds, such as logind during a flood of new PAM sessions, can get disconnected for continuously having at least one fd "in flight" for too long; dbus-daemon interprets that as a potential denial of service attack. The workaround is to disable that check for uid 0 process such as logind, with a message in the system log. The bug remains open while we look for a more general solution. (fd.o#95263, LP#1591411; Simon McVittie) So, I am so curious about is really my containers have immornal connnections too much and looks like ddos attack? *** Bug 1438155 has been marked as a duplicate of this bug. *** The backport of upstream fix is non-trivial and potentially risky. Let's move that to 7.8. dbus 1.10.24, which is in RHEL 7, already includes the fix mentioned in comment #7. Closing as per comment #14. |