Bug 1532105
Summary: | Slow login over SSH because of dbus systemd-logind issues | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | SHAURYA <sshaurya> |
Component: | systemd | Assignee: | systemd-maint |
Status: | CLOSED NEXTRELEASE | QA Contact: | qe-baseos-daemons |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 7.4 | CC: | avettath, ayadav, dtardon, johannespfau, jonathan, kwalker, mike.ely, msekleta, rmetrich, rupatel, sbroz, sshaurya, systemd-maint-list |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-02-16 21:34:52 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Comment 8
David Tardon
2019-01-31 18:13:56 UTC
Since detail on this bug is completely absent (and the github repo is private?!), I'm pasting in my comment from 1623651 so people searching on this issue will be able to find the bug actively being worked rather than the dupe: Description of problem: After restarting dbus with "systemctl restart dbus.service" new SSH sessions take 25 seconds to complete, and the following is logged: pam_systemd(sshd:session): Failed to create session : Connection timed out dbus[30997]: [system] Activating via systemd: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service' dbus[30997]: [system] Failed to activate service 'org.freedesktop.login1': timed out Running 'systemctl status -l org.freedesktop.login' returns: Failed to abandon session scope: Transport endpoint is not connected Restarting systemd-logind resolves the issue. How reproducible: Usually Steps to Reproduce: 1. systemctl restart dbus.service 2. attempt to login via ssh 3. observe logs similar to above Actual results: The systemd-logind service enters a broken state. New connections via SSH (or presumably other service calling logind) take 25 seconds to complete. Expected results: The systemd-logind service does not enter a broken state after restarting dbus. @Mike Ely, I apologize for the delay in response. The process you have outlined would absolutely incur a problem with ssh logins, but that is actually by design. When you restart dbus, you disconnect all of the endpoints that were connected to the bus. I believe you had mentioned that in another bug that had been marked as duplicate of this instance. From a representative system: # busctl NAME PID PROCESS USER CONNECTION UNIT SESSION DESCRIPTION :1.0 1 systemd root :1.0 - - - :1.1 689 systemd-logind root :1.1 systemd-logind.service - - :1.19 1182 tuned root :1.19 tuned.service - - :1.2 673 polkitd polkitd :1.2 polkit.service - - :1.3 751 firewalld root :1.3 firewalld.service - - :1.464 10773 abrt-dbus root :1.464 dbus.service - - :1.468 10821 busctl root :1.468 session-176.scope 176 - :1.48 7197 NetworkManager root :1.48 NetworkManager.service - - :1.52 7197 NetworkManager root :1.52 NetworkManager.service - - com.redhat.RHSM1 - - - (activatable) - - com.redhat.RHSM1.Facts - - - (activatable) - - com.redhat.SubscriptionManager - - - (activatable) - - com.redhat.ifcfgrh1 7197 NetworkManager root :1.52 NetworkManager.service - - com.redhat.problems.configuration - - - (activatable) - - com.redhat.tuned 1182 tuned root :1.19 tuned.service - - fi.epitest.hostap.WPASupplicant - - - (activatable) - - fi.w1.wpa_supplicant1 - - - (activatable) - - net.reactivated.Fprint - - - (activatable) - - org.fedoraproject.FirewallD1 751 firewalld root :1.3 firewalld.service - - org.freedesktop.DBus 690 dbus-daemon dbus org.freedesktop.DBus dbus.service - - org.freedesktop.NetworkManager 7197 NetworkManager root :1.48 NetworkManager.service - - org.freedesktop.PolicyKit1 673 polkitd polkitd :1.2 polkit.service - - org.freedesktop.hostname1 - - - (activatable) - - org.freedesktop.import1 - - - (activatable) - - org.freedesktop.locale1 - - - (activatable) - - org.freedesktop.login1 689 systemd-logind root :1.1 systemd-logind.service - - org.freedesktop.machine1 - - - (activatable) - - org.freedesktop.nm_dispatcher - - - (activatable) - - org.freedesktop.problems 10773 abrt-dbus root :1.464 dbus.service - - org.freedesktop.systemd1 1 systemd root :1.0 - - - org.freedesktop.timedate1 - - - (activatable) - - I restart dbus, and I get the following: # systemctl restart dbus # busctl NAME PID PROCESS USER CONNECTION UNIT SESSION DESCRIPTION :1.0 1 systemd root :1.0 - - - :1.1 10851 busctl root :1.1 session-176.scope 176 - com.redhat.RHSM1 - - - (activatable) - - com.redhat.RHSM1.Facts - - - (activatable) - - com.redhat.SubscriptionManager - - - (activatable) - - com.redhat.problems.configuration - - - (activatable) - - fi.epitest.hostap.WPASupplicant - - - (activatable) - - fi.w1.wpa_supplicant1 - - - (activatable) - - net.reactivated.Fprint - - - (activatable) - - org.freedesktop.DBus 10839 dbus-daemon dbus org.freedesktop.DBus dbus.service - - org.freedesktop.NetworkManager - - - (activatable) - - org.freedesktop.PolicyKit1 - - - (activatable) - - org.freedesktop.hostname1 - - - (activatable) - - org.freedesktop.import1 - - - (activatable) - - org.freedesktop.locale1 - - - (activatable) - - org.freedesktop.login1 - - - (activatable) - - org.freedesktop.machine1 - - - (activatable) - - org.freedesktop.nm_dispatcher - - - (activatable) - - org.freedesktop.problems - - - (activatable) - - org.freedesktop.systemd1 1 systemd root :1.0 - - - org.freedesktop.timedate1 - - - (activatable) - - Notice that systemd-logind is not on the bus after the reboot. The delays in login will be seen until systemd-logind is restarted. # busctl | grep org.freedesktop.login1 org.freedesktop.login1 - - - (activatable) - - # ps aux | grep systemd-logind -m 1 root 11091 0.0 0.1 26376 1748 ? Ss 15:34 0:00 /usr/lib/systemd/systemd-logind <Logins take 25-30 seconds to timeout in communicating to systemd-logind> # systemctl restart systemd-logind # busctl | grep org.freedesktop.login1 org.freedesktop.login1 11365 systemd-logind root :1.9 systemd-logind.service - - <Logins are no longer slow> Does the above information and illustration meet the need? The technical limitations around restarting dbus directly have been discussed in a few venues, but each result in guidance to avoid restarting as it distributes the need to reconnect to the bus to all the components on the bus. Please let me know if you have any questions or concerns regarding the above, or if it would be acceptable to close this bug report. Hi Kyle, Thanks for getting back to me. As I understand your response you are saying "yes, this is expected behavior." The problem is that there are several packages that will auto-update (using for example the vendor-supplied "yum-cron" package) in a way that will trigger this unwanted condition. In other words, there's a bug here. Is there no way to create a dependency chaining mechanism by which a restart of dbus triggers a restart of dependent services? Either that or we need to bring this to the yum-cron maintainers and say "exclude packages X Y and Z from auto-update." My strong preference is the former as the latter feels an awful lot like pushing design deficiencies from here downstream to another, in this case guiltless, package maintainer. (In reply to Mike Ely from comment #11) > Thanks for getting back to me. As I understand your response you are saying > "yes, this is expected behavior." The problem is that there are several > packages that will auto-update (using for example the vendor-supplied > "yum-cron" package) in a way that will trigger this unwanted condition. In > other words, there's a bug here. That is extremely unfortunate, and I can absolutely commiserate on that front. In the packaging ecosystem, there are quite a few instances where normal operations can subject end users to behaviours like these. One small positive note that that I waited to raise, at least in the initial failure condition noted, the RHEL 8 release seems to no longer be impacted to the same degree. When testing on that release, instead of D-Bus having to notify the endpoints that a reconnection is necessary, the systemd side takes on this role directly. We can see the following during a "systemctl restart dbus" on that release we see the loss of name on the bus resulting in an automatic termination of the applicable service: Feb 16 16:14:08 ... systemd[1]: Stopping D-Bus System Message Bus... <snip> Feb 16 16:14:08 ... systemd[1]: systemd-logind.service: D-Bus name org.freedesktop.login1 no longer registered by :1.1 Feb 16 16:14:08 ... systemd[1]: systemd-logind.service: Changed running -> stop-sigterm Feb 16 16:14:08 ... systemd[1]: polkit.service: D-Bus name org.freedesktop.PolicyKit1 no longer registered by :1.4 Feb 16 16:14:08 ... systemd[1]: systemd-journald.service: Got notification message from PID 608 (FDSTORE=1) Feb 16 16:14:08 ... systemd[1]: systemd-journald.service: Added fd 19 (n/a) to fd store. Feb 16 16:14:08 ... systemd[1]: Got cgroup empty notification for: /system.slice/polkit.service Feb 16 16:14:08 ... systemd[1]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/agent interface=org.freedesktop.systemd1.Agent member=Released cookie=11 reply_cookie=0 signature=s erro> Feb 16 16:14:08 ... systemd[1]: systemd-logind.service: Got notification message from PID 1644 (STOPPING=1, STATUS=Shutting down...) <snip> Since the systemd-logind service is set to Restart=always, this results in tying the restart of dbus and systemd-logind together. From there, the startup of systemd-logind results in the service regaining the endpoint on the bus, and the same delays are no longer seen. To that end, I would like to close this particular bug. When Red Hat shipped 7.7 on Aug 6, 2019 Red Hat Enterprise Linux 7 entered Maintenance Support 1 Phase. https://access.redhat.com/support/policy/updates/errata#Maintenance_Support_1_Phase That means only "Critical and Important Security errata advisories (RHSAs) and Urgent Priority Bug Fix errata advisories (RHBAs) may be released". This BZ does not appear to meet Maintenance Support 1 Phase criteria so is being closed NEXTRELEASE. If this is critical for your environment please open a case in the Red Hat Customer Portal, https://access.redhat.com, provide a thorough business justification and ask that the BZ be re-opened for consideration in the next minor release. Hi Mike, I would like to inform you that a RFE has been filed to increase robustness around DBus: BZ#1654779 There, I'm proposing a hardening of systemd-logind which consists in restarting systemd-logind if for some reason dbus restarts: For systemd-logind, the following workaround or hardening can be used: -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- # mkdir -p /etc/systemd/system/systemd-logind.service.d # cat > /etc/systemd/system/systemd-logind.service.d/bz1654779.conf << EOF [Unit] After=dbus.service BindsTo=dbus.service EOF # systemctl daemon-reload -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- The After directive is needed otherwise it may happen that upon restarting dbus, systemd-logind restarts before dbus, causing it to register against the "old" dbus instance. Renaud. |