Bug 1531486
Summary: | Make connection to dbus asynchronous | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Michal Sekletar <msekleta> |
Component: | systemd | Assignee: | systemd-maint |
Status: | CLOSED WONTFIX | QA Contact: | qe-baseos-daemons |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 7.4 | CC: | akaiser, amarirom, andreas.luik, apmukher, arjan.oosting, ayadav, bschubert, carsten.grohmann, cbesson, cwarfiel, jblaine, jmagrini, jreuter, kwalker, mbliss, mmezynsk, msekleta, mssmurthy.tech, neo, nermolov1, pdwyer, qguo, rmetrich, robert.weaver, sbroz, steved, swhiteho, systemd-maint-list, systemd-maint, tcrider, yoyang |
Target Milestone: | rc | Keywords: | Reopened |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-11-11 21:54:37 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1530721 | ||
Bug Blocks: | 1551061, 1643104, 1716963, 1719445 |
Description
Michal Sekletar
2018-01-05 09:54:42 UTC
My take on this is that this is going to make things super complex to reason about and even harder to debug than they are now. I would suggest first to try thinking about why the dependency cycle is there and maybe redesigning things a bit. Just a thought.

(In reply to Jan Synacek from comment #3)
> My take on this is that this is going to make things super complex to reason
> about and even harder to debug than they are now. I would suggest first to
> try thinking about why the dependency cycle is there and maybe redesigning
> things a bit. Just a thought.

Are you suggesting that we should push dbus down to the kernel level ;) ?

On a more serious note: we've already discussed this with Lennart on multiple occasions, and making that code async seems to be the only viable solution. The problem here is that while systemd deliberately avoids NSS stack calls, dbus-daemon doesn't (for valid reasons, btw). And since we control neither the random NSS modules users might use nor the /etc/nsswitch.conf configuration, we can't do much about the deadlock. IOW, we can't guarantee that those modules don't wait on systemd while called from dbus-daemon, on which systemd is itself blocked. Unless we bite the bullet and move towards async code for connection establishment.

This one still needs a solution, but we can't deliver it in the 7.6 timeframe.

This is not fully fixed upstream yet, so it will not be fixed in the 7.7 timeframe.

11 months and still unfixed. It's a shame. This is exactly why systemd should not have been pushed as default until it is working as it should. Together with the various bugs in the installer alone (limited swap space, LUKS fail), one could say that RH really have had better times in terms of reliability. I really hope release 8 will be stable again.

even 15 months.

Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.

And you can't just make the workaround given here the default?
https://access.redhat.com/discussions/3536621

I mean here: https://access.redhat.com/solutions/3900301

FWIW, this issue becomes an order of magnitude worse when running multiple systems as VMs and rebooting them en masse. I am converting a complex, unmanaged, and unmaintained VM mess to be installed and managed with Ansible. Currently I have deployed about 60 test VMs. If I do updates with Ansible and it needs to reboot a bunch of these systems at the same time (remember, they are VMs, so they are really running on shared hardware, which can impact scheduling), several of them will come up in the broken polkit/systemd state (about 5 of 30 in a few trials). Similarly, if I use Ansible to clone multiple new VMs from a template and turn them on, which it seems to do five at a time, some will come up in the broken polkit/systemd state. This isn't a configuration issue; it seems like a timing issue. Going through and manually rebooting the broken systems one by one brings them back. This is a giant pain in the rear when managing a large farm of VMs. Running up-to-date CentOS 7.7 (minimal installs plus my own list of packages, no desktop env) with systemd 67.el7_7.3.

We're seeing this on RHEL 7.8. The workaround presented in https://access.redhat.com/solutions/3900301 does not apply anymore, as there ARE NO BindIPv6Only, ListenStream, ListenDatagram, ListenStream, ListenDatagram lines even in /usr/lib/systemd/system/rpcbind.socket in 7.8. There is no NIS or NIS+ mentioned in /etc/nsswitch.conf, as the KB article alludes to.

It is still happening on RHEL 7.8 as well. Any fix for this?
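For anyone wanting to repeat the "no NIS or NIS+ in /etc/nsswitch.conf" check from the comments above across many hosts, here is a minimal Python sketch. It is not from the KB article. The function name `nis_databases` is hypothetical, and the sketch assumes the standard `database: source [action] source` line format with `#` comments and the source names `nis` and `nisplus`.

```python
from typing import Dict, List

# Source names treated as NIS / NIS+ (assumption: the standard glibc names).
NIS_SOURCES = {"nis", "nisplus"}


def nis_databases(nsswitch_text: str) -> Dict[str, List[str]]:
    """Map each database (passwd, hosts, ...) to the NIS sources it lists."""
    found: Dict[str, List[str]] = {}
    for raw in nsswitch_text.splitlines():
        # Strip trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        database, _, sources = line.partition(":")
        # Action items such as [NOTFOUND=return] never equal a source name,
        # so a plain token comparison is enough here.
        hits = [tok for tok in sources.split() if tok.lower() in NIS_SOURCES]
        if hits:
            found[database.strip()] = hits
    return found


if __name__ == "__main__":
    # Illustrative content only, not taken from any reporter's machine.
    sample = "passwd: files sss\nhosts: files nis dns\n# netgroup: nisplus\n"
    print(nis_databases(sample))
```

On a real system the text would come from reading `/etc/nsswitch.conf`. An empty result only confirms that the KB article's NIS condition is absent; as the reports above show, it does not rule out the hang.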
Ours happened (reported above) EXACTLY right after this:

Jun 24 05:36:45 Updated: systemd-libs-219-73.el7_8.8.x86_64
Jun 24 05:37:52 Updated: systemd-219-73.el7_8.8.x86_64
Jun 24 05:37:54 Updated: systemd-sysv-219-73.el7_8.8.x86_64
Jun 24 05:38:37 Updated: systemd-python-219-73.el7_8.8.x86_64
Jun 24 05:38:37 Updated: systemd-devel-219-73.el7_8.8.x86_64
Jun 24 05:39:21 Updated: systemd-libs-219-73.el7_8.8.i686

[m26560@neon ~]$ sudo grep freedesktop /var/log/messages
Jun 24 04:00:46 neon dbus[1195]: [system] Successfully activated service 'org.freedesktop.hostname1'
Jun 24 04:02:12 neon dbus[1195]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Jun 24 04:02:12 neon dbus[1195]: [system] Successfully activated service 'org.freedesktop.problems'
Jun 24 05:02:18 neon dbus[1195]: [system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service'
Jun 24 05:02:18 neon dbus[1195]: [system] Successfully activated service 'org.freedesktop.hostname1'
Jun 24 05:04:45 neon dbus[1195]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Jun 24 05:04:45 neon dbus[1195]: [system] Successfully activated service 'org.freedesktop.problems'
Jun 24 05:52:23 neon dbus[1195]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out
Jun 24 05:52:48 neon dbus[1195]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out
Jun 24 05:53:13 neon dbus[1195]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out
Jun 24 05:53:38 neon dbus[1195]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out
Jun 24 05:54:03 neon dbus[1195]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out
...

Hi,

As stated by [Jeff Blaine 2020-06-24 19:02:01 UTC], the Red Hat fix is not applicable. I have just upgraded a CentOS 7.8 server, and it failed to come back after reboot.
It bounces back into emergency mode. The error message shown is: "Authorization not available. Check if polkit service is running or see debug message for more information." polkitd reports that it "lost the name org.freedesktop.PolicyKit1".

I would try to give more info, but as this is emergency mode, and no services seem to have permission to do anything, there are no logs.

Things tried: removed all entries in nsswitch.conf regarding NIS. Tried the RHEL fix [3900301], but it is not relevant on CentOS 7.8. Tried booting from a different kernel, but the same error persists.

The same upgrade was completed on 70 other similar machines with no issues. Checking /usr/share/polkit-1/ on both machines shows identical files; the same goes for /etc/polkit. RPMs installed on the machines: polkit-pkla-compat-0.1-4.el7.x86_64, polkit-0.112-26.el7.x86_64.

Any help to understand why this is happening would be great; I can rebuild this machine. And since this patch run was done on UAT, I am not going forward with Live until I know how I can fix this issue. Help?

Thanks,
Rob.

Red Hat Enterprise Linux 7 shipped its final minor release on September 29th, 2020. 7.9 was the last minor release scheduled for RHEL 7. From initial triage, it does not appear that the remaining Bugzillas meet the inclusion criteria for Maintenance Phase 2, and they will now be closed.

From the RHEL life cycle page: https://access.redhat.com/support/policy/updates/errata#Maintenance_Support_2_Phase
"During Maintenance Support 2 Phase for Red Hat Enterprise Linux version 7, Red Hat defined Critical and Important impact Security Advisories (RHSAs) and selected (at Red Hat discretion) Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available."
If this BZ was closed in error and meets the above criteria, please re-open it, flag it for 7.9.z, provide suitable business and technical justifications, and follow the process for Accelerated Fixes: https://source.redhat.com/groups/public/pnt-cxno/pnt_customer_experience_and_operations_wiki/support_delivery_accelerated_fix_release_handbook

Feature Requests can be re-opened and moved to RHEL 8 if the desired functionality is not already present in the product.

Please reach out to the applicable Product Experience Engineer[0] if you have any questions or concerns.

[0] https://bugzilla.redhat.com/page.cgi?id=agile_component_mapping.html&product=Red+Hat+Enterprise+Linux+7

Dropping the stale needinfo. If our input is still needed, please set the needinfo again.
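
Several reporters in this thread identified affected hosts by the repeated "Failed to activate service 'org.freedesktop.systemd1': timed out" messages from dbus in /var/log/messages. A minimal Python sketch of scanning log lines for that symptom follows. The function name `activation_timeouts` is hypothetical, and the regular expression assumes the exact syslog line shape quoted in the thread (`Mon DD HH:MM:SS host dbus[pid]: [system] ...`); other dbus versions or logging setups may format lines differently.

```python
import re
from typing import Iterable, List, Tuple

# Matches lines shaped like the excerpt quoted in this thread (assumption:
# classic syslog timestamp, then hostname, then "dbus[pid]:").
TIMEOUT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"dbus\[\d+\]:\s+\[system\] Failed to activate service "
    r"'(?P<service>[^']+)': timed out"
)


def activation_timeouts(
    lines: Iterable[str], service: str = "org.freedesktop.systemd1"
) -> List[Tuple[str, str]]:
    """Return (timestamp, host) for each activation timeout of `service`."""
    hits: List[Tuple[str, str]] = []
    for line in lines:
        match = TIMEOUT_RE.match(line.strip())
        if match and match.group("service") == service:
            hits.append((match.group("stamp"), match.group("host")))
    return hits


if __name__ == "__main__":
    # Two lines copied from the log excerpt earlier in this thread.
    excerpt = [
        "Jun 24 05:04:45 neon dbus[1195]: [system] Successfully activated "
        "service 'org.freedesktop.problems'",
        "Jun 24 05:52:23 neon dbus[1195]: [system] Failed to activate "
        "service 'org.freedesktop.systemd1': timed out",
    ]
    for stamp, host in activation_timeouts(excerpt):
        print(f"{host}: systemd1 activation timed out at {stamp}")
```

A non-empty result is only a hint that a host is in the hung state described in this bug; it says nothing about the cause.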