Red Hat Bugzilla – Bug 1568856
DBUS Breaks on OS Point-Release Upgrade
Last modified: 2018-10-26 22:28:49 EDT
Description of problem:
When upgrading from 7.x to 7.5, DBUS irrecoverably breaks.

Version-Release number of selected component (if applicable):

How reproducible:
100% reproducible

Steps to Reproduce:
1. Launch a Red Hat-published 7.4 (or earlier 7.x release) AMI
2. Execute a `yum update`
3. Update hangs - usually around "vim-common" - due to the updating RPM getting stuck performing a `dbus-send` operation
4. Terminate the hung `dbus-send` process (either manually or by allowing the update to time out) or issue a `systemctl restart dbus`
5. All DBUS-related services begin to fail (reboot hangs on various services' DBUS-signaling)
6. Log in to the rebooted system; tools depending on DBUS (e.g., timedatectl) receive connection refused errors from DBUS

Actual results:
All DBUS-enabled services receive connection refused errors when attempting to communicate with DBUS

Expected results:
All DBUS-enabled services function normally

Additional info:
First observed when running an `oscap --remediate` (see previous BugZilla https://bugzilla.redhat.com/show_bug.cgi?id=1566089). As part of a larger remediation profile, a `yum update` is executed. When oscap attempts its remediation activities, the first DBUS-enabled service it attempts to remediate tends to be autofs. This step hangs. After much debugging, we ultimately traced it back to the deranged state of DBUS post `yum update`.

We'd also previously encountered - but had not isolated the cause of - the issue last summer when we were upgrading from 7.3 to 7.4. At the time, our users' RHEL 7 adoption-rates were exceedingly low, so we only ever noticed the occasional "quark 39" errors while deriving customized AMIs from the Red Hat published ones. The errors had not been fatal to that process and did not show up in instances launched from those AMIs, so it was a low-order priority to investigate (basically an "academic issue" or "things that make you go 'hmm...'").
In retrospect the underlying issue appears to be the same as what caused us to open this ticket: we can provoke the same error by attempting to create a 7.5 AMI when deriving from an earlier 7.x release.

In looking through BugZilla, it seems a number of people have encountered DBUS issues when doing OS point-release upgrades (7.x to 7.x'). Most are either in some pending state or in a WONTFIX state, but I saw none in a fix-pending state.

Thus far, our only recourse has been to tell our users:
- Re-deploy your application onto a 7.5 AMI, or
- Install the yum versionlock plugin and version-lock your current DBUS installation and any packages that depend on it (thus far, that requires excluding the dbus, dbus-libs, teamd, libteam and wpa_supplicant packages in yum.conf and locking openscap-1.2.14 and openscap-scanner-1.2.14 in /etc/yum/pluginconf.d/versionlock.list)

The former is sub-ideal for our users who haven't fully-automated their application deployments. The latter is sub-ideal from both a "complexity for non RHEL-guru users" perspective and a "leaving critical system components unpatched" perspective (our IA teams take a very dim view of this, obviously!).
I would say this is basically a clone of bz1550582. Updating, and therefore restarting, dbus from e.g. a graphical session is not expected to work properly (and it is not clear from this bz whether you did the restart from the multi-user or the graphical target). But in this case it seems dbus is not running or not working after the update, correct? What does `yum reinstall dbus\*` do (in the multi-user target)? Can you see anything in `systemctl status dbus` or in the journal? Thanks -Tom
We don't do graphical in AWS - AWS doesn't offer (virtual) console access. Due to lack of out of band access (console — graphical, serial or otherwise), all update actions happen at run-level 3 (or whatever the systemd equivalent is). In general, updating this way works. It's really only when there's a (mercifully-infrequent) DBUS update that we tend to have these issues. Typically what we see after a generic `yum -y update` is that DBUS becomes deranged and there's no communicating with it. The deranged state persists across reboots. While we haven't explicitly tried doing a `yum reinstall dbus\*` once things have reached this state, my suspicion is that it will fail. As to where we're observing logged symptoms/outputs: we're seeing them in the journals and the legacy log-files. Doing a `timedatectl status` is just a shorthand way of checking "is DBUS pissed off."
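The `timedatectl status` shorthand mentioned above could be wrapped roughly as follows. This is only a sketch: the function name `dbus_responds` and the 10-second limit are assumptions made for illustration, not something taken from this report, and a failure only shows that the probe command failed or hung, not why.

```shell
# dbus_responds [COMMAND...]
# Hypothetical helper: runs a DBUS-dependent probe command under a time
# limit and reports success or failure through its exit status.
# With no arguments it falls back to `timedatectl status`.
dbus_responds() {
    if [ "$#" -eq 0 ]; then
        set -- timedatectl status
    fi
    # The 10-second limit is an arbitrary choice for this sketch; timeout(1)
    # exits with 124 if the probe has to be killed.
    timeout 10 "$@" >/dev/null 2>&1
}
```

Calling `dbus_responds || echo "DBUS is unhappy"` after an update would then give a quick yes/no answer without having to read the journal first.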
This is caused by dbus-daemon being rebased to a new version (dbus-1.10.24-7.el7) in RHEL 7.5, and the locations of several dbus tools being migrated. dbus-send moved from /bin to /usr/bin, and dbus-daemon-launch-helper moved from under libdir to under libexecdir.

The location of dbus-daemon-launch-helper is described in a configuration file (/usr/share/dbus-1/system.conf), but until dbus-daemon is updated, any running dbus-daemon instance (for the system bus, specifically) will be unaware of the new location, and will fail when trying to launch a system service. The running (old) dbus-daemon will fail to read the system.conf configuration file (because the canonical location changed, from under /etc to under /usr), and restarting dbus-daemon will disconnect all currently-connected services, which will not reconnect unless they are restarted after dbus.

If a scriptlet in a package called dbus-send and triggered a service activation, the activation would likely fail (because the helper binary would not be found). I could not find any uses of dbus-send in the vim package (nor the vim-common subpackage), but it may be another package that is calling dbus-send, or it may be called as a side effect of the scriptlets in vim-common.

A workaround for this problem would be to create a symlink between the old and new locations of dbus-daemon-launch-helper, so that the running dbus-daemon for the system bus can still call out to it. An alternative would be to update only the dbus packages, and then to restart the system immediately, before updating any other packages, although this may not be feasible if the shutdown process triggers any service activations.
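To see which of the two helper locations a given system actually has, something like the following could be used. It is a minimal sketch: the function name `list_launch_helpers` is made up for illustration, and the two concrete paths are the ones quoted elsewhere in this report (/lib64/dbus-1 and /usr/libexec/dbus-1), which may differ on other architectures.

```shell
# list_launch_helpers [ROOT]
# Hypothetical helper: prints each known location at which
# dbus-daemon-launch-helper exists. ROOT is an optional alternate root
# directory (for example a mounted image); it defaults to the live system.
# Returns non-zero if the helper is found at neither location.
list_launch_helpers() {
    local root="${1:-}"
    local rel found=1
    for rel in \
        lib64/dbus-1/dbus-daemon-launch-helper \
        usr/libexec/dbus-1/dbus-daemon-launch-helper
    do
        if [ -e "${root}/${rel}" ]; then
            echo "/${rel}"
            found=0
        fi
    done
    return "$found"
}
```

Running it before and after the dbus update shows whether the running daemon's expected path still exists.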
Cool. Thanks for the detailed info. I'll try setting up a symlink as part of the upgrade process to see if that helps us. I'm reasonably certain that the vim landmark is simply "last thing yum actually completed" rather than being the stuck process. I'm at a different work-site for the next day or so: depending on after-hours time-constraints, I may not have a yay/neigh update till Thursday.
About to try some of the suggestions you made. Tested our problem across several AWS regions and found that simply doing a `yum update dbus` was sufficient to break a system (and it's reproducible 100% of the time with that). So, it's definitely in that subsystem that we're experiencing problems.
Alright, I'm on an instance launched from a 7.4 AMI. In looking at the current "/bin/dbus-send", `readlink` is telling me that, even though the RPM's `-ql` output says "/bin/dbus-send", the true location is already "/usr/bin/dbus-send".

I'm currently a little unclear how pre-creating /usr/libexec/dbus-1/dbus-daemon-launch-helper as a symlink is going to help me. Won't updating the RPM blow away that symlink, leaving me in the same place I was before? Or are you saying I should do something more like `mv /lib64/dbus-1/dbus-daemon-launch-helper /usr/libexec/dbus-1/dbus-daemon-launch-helper && ln -s /usr/libexec/dbus-1/dbus-daemon-launch-helper /lib64/dbus-1/dbus-daemon-launch-helper`?

I'd have a similar question for the system.conf file, but the new RPM appears to have both an "/etc/dbus-1/system.conf" file (like the 7.4 packaging) as well as a "/usr/share/dbus-1/system.conf". That said, the file in /etc is only 833 bytes while the one in /usr/share is 4362 bytes.

Since this is a test-rig, I can blow it up, so I'll probably try out any permutations I can think of. However, it might prove helpful if you were able to provide further, detailed instructions. Thanks in advance.
Launched instance from AMI. Placed the following into the instance's UserData:
```
#!/bin/bash
if [[ -d /usr/libexec/dbus-1 ]]
then
   echo "Directory already exists"
else
   printf "Creating new directory"
   install -d -m 000755 /usr/libexec/dbus-1 -o root -g root && echo Success || echo FAILED
   printf "Updating SEL labels... "
   chcon --reference /lib64/dbus-1 /usr/libexec/dbus-1 && echo Success || echo FAILED
fi
printf "Moving dbus-daemon-launch-helper... "
mv /lib64/dbus-1/dbus-daemon-launch-helper /usr/libexec/dbus-1/dbus-daemon-launch-helper && echo Success || echo FAILED
printf "Creating symlink... "
ln -s /usr/libexec/dbus-1/dbus-daemon-launch-helper mv /lib64/dbus-1/dbus-daemon-launch-helper && echo Success || echo FAILED
sleep 10
yum update -y dbus && init 6
```
After rebooting, system was in the same broken state as it gets to without attempting to fix paths.
Any other tests to run or fixes to try?
Please share the AMI ID that you are using to test. I've tried this in a local VM and in AWS but haven't been able to reproduce.
Any AMI returned by: https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#Images:visibility=public-images;search=spel-minimal-rhel-7.4-hvm;sort=desc:creationDate Exhibits the issue. The most recent of the above would be ami-05aa42022a79b86e7 We had no projects that were using 7.3. No DBUS issues were reported when going from 7.3 to 7.4. That said, as part of testing for this BugZilla submission, we verified that the issue is triggered when upgrading from 7.3 directly to 7.5. At this point, all but one of the 7.3 AMIs (ami-28b5b23e) have aged off.
The most recent AMI from that query is technically RHEL 7.5 and does not reproduce the issue in question. Use any AMI from March or earlier, such as `ami-0338b428e333e97eb`.
(In reply to Loren Gordon from comment #12)
> The most recent AMI from that query is technically RHEL 7.5 and does not
> reproduce the issue in question. Use any AMI from March or earlier, such as
> `ami-0338b428e333e97eb`.

These are public AMIs, so we shouldn't need to share them with you. Lemme know if you run into access issues with the AMI ID Loren noted.
(In reply to Thomas Jones from comment #14)
> (In reply to Loren Gordon from comment #12)
> > The most recent AMI from that query is technically RHEL 7.5 and does not
> > reproduce the issue in question. Use any AMI from March or earlier, such as
> > `ami-0338b428e333e97eb`.
>
> These are public AMIs so shouldn't need to share them to you. Lemme know if
> you run into access issues with the AMI ID Loren noted.

Any news? Thanks -Tom
(In reply to Tomas Pelka from comment #15)
> Any news?
>

From us? No. We're currently waiting on you guys to see if the AMI listed by @loren was locatable and usable, and if Red Hat has had a chance to use it to reproduce the problem and start diagnostics. I was actually checking the case to see if I needed to Bueller it, as I'd not received any case update notifications.

If your diagnosticians prefer to do work in different regions, we can provide AMI-IDs equivalent to `ami-0338b428e333e97eb` in us-east-2, us-west-1, us-west-2 and even us-gov-west-1. They're all created by the same processes.
@Thomas: There might be a syntax error in your UserData.

    printf "Creating symlink... "
    ln -s /usr/libexec/dbus-1/dbus-daemon-launch-helper mv /lib64/dbus-1/dbus-daemon-launch-helper && echo Success || echo FAILED

Three arguments to `ln`?
Thank you for that catch. Copy-paystah error. Fixed to:
```
mv /lib64/dbus-1/dbus-daemon-launch-helper \
   /usr/libexec/dbus-1/dbus-daemon-launch-helper && \
ln -s /usr/libexec/dbus-1/dbus-daemon-launch-helper \
   /lib64/dbus-1/dbus-daemon-launch-helper
```
Either way: does not seem to change the defective behavior encountered when the dbus RPM is updated.
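For anyone repeating this experiment, the corrected move-and-symlink step could be made repeatable along these lines. This is a sketch only: `relocate_launch_helper` is a hypothetical name, it uses a plain `mkdir -p` rather than the `install`/`chcon` calls from the UserData (so ownership, mode and SELinux labels are not handled), and per this comment the relocation did not change the outcome of the dbus update.

```shell
# relocate_launch_helper [ROOT]
# Hypothetical helper: moves dbus-daemon-launch-helper from the old
# /lib64/dbus-1 location to /usr/libexec/dbus-1 and leaves a symlink behind.
# ROOT is an optional alternate root directory; it defaults to the live system.
# Safe to run twice: if the old path is already a symlink, nothing is changed.
relocate_launch_helper() {
    local root="${1:-}"
    local old="${root}/lib64/dbus-1/dbus-daemon-launch-helper"
    local new_dir="${root}/usr/libexec/dbus-1"
    local new="${new_dir}/dbus-daemon-launch-helper"

    if [ -L "$old" ]; then
        echo "already a symlink: $old"
        return 0
    fi
    if [ ! -f "$old" ]; then
        echo "helper not found at $old" >&2
        return 1
    fi
    # Create the destination directory if it is missing, then move and link.
    mkdir -p "$new_dir" || return 1
    mv "$old" "$new" || return 1
    ln -s "$new" "$old" || return 1
    echo "moved to $new and symlinked from $old"
}
```

Note that `ln -s` here takes exactly two arguments, target first and link name second.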
Probably worth noting that all of the UserData actions return success. However, after the `yum update` (and associated `init 6`) runs, the /lib64/dbus-1/ directory wholly disappears. When the system (eventually) returns from the `init 6`, DBUS is in its usual, "unhappy" state.
I've run into the same problem and addressed it. I noticed that dbus.socket had changed from /var/run/dbus/system_bus_socket to /run/dbus/system_bus_socket. In CentOS 7.X, /var/run is originally a symlink pointing to the /run directory, so this change wouldn't be a problem in most cases. But in my case, unintentionally, /var/run wasn't a symbolic link, so many processes failed to handle the socket. After fixing /var/run to be a symlink, the upgrade succeeded.
I made a mistake in writing... s/CentOS/RHEL/
(In reply to Satoshi Tajima from comment #25)
> But in my case, unintentionally, /var/run wasn't a symbolic link,
> so many processes failed to handle the socket.
>
> After fixing /var/run to be a symlink, the upgrade succeeded.

If this is the case, it is not easy to fix inside the dbus package, and that is arguably the wrong place, as the filesystem package owns the /var/run symlink. The easiest thing at this point is to mention in the release notes that dbus upgrades require that /var/run is a symlink to /run (which is the default case in all versions of RHEL7, as far as I am aware).
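A pre-upgrade check for that requirement might look like the sketch below. The function name `var_run_ok` is hypothetical, and the check assumes `readlink -f` (GNU coreutils) is available; it only verifies that /var/run is a symbolic link resolving to the same place as /run, nothing else about the dbus setup.

```shell
# var_run_ok [ROOT]
# Hypothetical helper: succeeds only if ROOT/var/run is a symbolic link
# that resolves to ROOT/run. ROOT is an optional alternate root directory;
# it defaults to the live system.
var_run_ok() {
    local root="${1:-}"
    if [ ! -L "${root}/var/run" ]; then
        echo "${root}/var/run is not a symbolic link" >&2
        return 1
    fi
    # Compare canonical paths so that both relative (../run) and absolute
    # (/run) link targets are accepted.
    if [ "$(readlink -f "${root}/var/run")" != "$(readlink -f "${root}/run")" ]; then
        echo "${root}/var/run does not resolve to ${root}/run" >&2
        return 1
    fi
    return 0
}
```

Something like `var_run_ok || exit 1` ahead of the `yum update` would stop the run before dbus is touched on a system in this state.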