Bug 1568856

Summary: DBUS Breaks on OS Point-Release Upgrade
Product: Red Hat Enterprise Linux 7
Reporter: Thomas Jones <redhat>
Component: dbus
Assignee: David King <dking>
Status: CLOSED CURRENTRELEASE
QA Contact: Desktop QE <desktop-qa-list>
Severity: urgent
Docs Contact: Marie Hornickova <mdolezel>
Priority: urgent
Version: 7.5
CC: amike, cobrown, cww, dking, dkochuka, dwilloug, fweimer, jkoten, joboyer, jomurphy, kwalker, lmiksik, loren, masanari.iida, mclasen, mjahoda, mmckinst, ncrawford, pdwyer, ptalbert, redhat, rmullett, salmy, sdodson, shane.seymour, snavale, tajima1989, takirby, tpelka, vbenes
Target Milestone: rc
Keywords: Documentation, PrioBumpGSS, Regression, ZStream
Target Release: ---
Hardware: x86_64
OS: Unspecified
Whiteboard:
Fixed In Version: dbus-1.10.24-13.el7
Doc Type: Bug Fix
Doc Text:
.Running *dbus-daemon* no longer fails to activate a system service
With the rebase of the D-Bus message bus daemon (*dbus-daemon*) to version 1.10.24, the locations of several *dbus* tools were migrated. The `dbus-send` executable was moved from the `/bin` directory to the `/usr/bin` directory, and the `dbus-daemon-launch-helper` executable was moved from the `libdir` directory to the `libexecdir` directory. Consequently, if a scriptlet in a package called the `dbus-send` command to send a message to D-Bus and triggered a service activation, the activation could fail. With this update, the bug has been fixed by creating compatibility symlinks between the old and new locations of `dbus-daemon-launch-helper`. As a result, an already-running instance of *dbus-daemon* for the system bus can now find the helper and activate system services.
Story Points: ---
Clone Of:
Clones: 1660160 1660162
Environment:
Last Closed: 2019-08-28 08:43:09 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1539427, 1565828, 1566309, 1566312, 1566313, 1630904, 1660160, 1660162

Description Thomas Jones 2018-04-18 10:38:45 UTC
Description of problem:

When upgrading from 7.x to 7.5, DBUS irrecoverably breaks

Version-Release number of selected component (if applicable):




How reproducible:

100% reproducible

Steps to Reproduce:
1. Launch a Red Hat-published 7.4 (or earlier 7.x release) AMI
2. Execute a `yum update`
3. The update hangs (usually around "vim-common") because the updating RPM gets stuck performing a `dbus-send` operation
4. Terminate the hung `dbus-send` process (either manually or by allowing the update to time out) or issue a `systemctl restart dbus`
5. All DBUS-related services begin to fail (reboot hangs on various services' DBUS signaling)
6. Log in to the rebooted system; tools depending on DBUS (e.g., `timedatectl`) receive connection-refused errors from DBUS

Actual results:

All DBUS-enabled services receive connection refused errors when attempting to communicate with DBUS

Expected results:

All DBUS-enabled services function normally

Additional info:

First observed when running an `oscap --remediate` (see previous BugZilla https://bugzilla.redhat.com/show_bug.cgi?id=1566089). As part of a larger remediation profile, a `yum update` is executed. When oscap attempts its remediation activities, the first DBUS-enabled service it attempts to remediate tends to be autofs. This step hangs. After much debugging, we ultimately traced it back to the deranged state of DBUS after the `yum update`.

We had also encountered the issue previously, but had not isolated its cause, last summer when we were upgrading from 7.3 to 7.4. At the time, our users' RHEL 7 adoption rates were exceedingly low, so we only ever noticed the occasional "quark 39" errors while deriving customized AMIs from the Red Hat-published ones. The errors had not been fatal to that process and did not show up in instances launched from those AMIs, so investigating them was a low-order priority (basically an "academic issue" or a "things that make you go 'hmm...'" issue). In retrospect, the underlying issue appears to be the same one that caused us to open this ticket: we can provoke the same error by attempting to create a 7.5 AMI derived from an earlier 7.x release.

In looking through BugZilla, it seems a number of people have encountered DBUS issues when doing OS point-release upgrades (7.x to 7.x'). Most are in either some pending state or a WONTFIX state; we saw none in a fix-pending state.

Thus far, our only recourse has been to tell our users:
- Re-deploy your application onto a 7.5 AMI
or
- Install the yum versionlock plugin and version-lock your current DBUS installation and any packages that depend on it (thus far, that requires excluding the dbus, dbus-libs, teamd, libteam and wpa_supplicant packages in yum.conf and locking openscap-1.2.14 and openscap-scanner-1.2.14 in /etc/yum/pluginconf.d/versionlock.list)

The former is less than ideal for our users who haven't fully automated their application deployments.

The latter is less than ideal both from a "complexity for non-RHEL-guru users" perspective and from a "leaving critical system components unpatched" perspective (our IA teams take a very dim view of the latter, obviously!).
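The exclude half of the second workaround can be scripted. The sketch below is a hypothetical helper (`add_dbus_excludes` is not part of yum) that adds the five packages named above to the `[main]` section of a yum.conf, extending an existing `exclude=` line if there is one. It assumes GNU sed and is not idempotent, so run it once; the openscap locks would still be added separately through the versionlock plugin (for example with `yum versionlock add`).

```shell
#!/bin/bash
# Hypothetical helper: exclude the packages listed above from yum updates.
# Usage: add_dbus_excludes [path-to-yum.conf]   (defaults to /etc/yum.conf)
# Assumes GNU sed (for `sed -i` and the one-line `a` command).
add_dbus_excludes() {
  local conf="${1:-/etc/yum.conf}"
  local pkgs="dbus dbus-libs teamd libteam wpa_supplicant"

  if grep -q '^exclude=' "$conf"; then
    # Extend the existing exclude line rather than adding a second one.
    sed -i "s/^exclude=\(.*\)\$/exclude=\1 ${pkgs}/" "$conf"
  else
    # Add a new exclude line directly after the [main] section header.
    sed -i "/^\[main\]/a exclude=${pkgs}" "$conf"
  fi
}
```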

Comment 2 Tomas Pelka 2018-04-18 12:35:39 UTC
I would say this is basically a clone of bz1550582.

Updating, and therefore restarting, dbus from, e.g., a graphical session is not expected to work properly (and it is not clear from this bz whether you did the restart from the multi-user or the graphical target).

But in this case it seems dbus is not running or not working after the update, correct? What does

yum reinstall dbus\*

do (in the multi-user target)? Can you see something in

systemctl status dbus

or in the journal?

Thanks
-Tom

Comment 3 Thomas Jones 2018-04-18 13:51:06 UTC
We don't do graphical in AWS - AWS doesn't offer (virtual) console access.

Due to the lack of out-of-band access (console: graphical, serial or otherwise), all update actions happen at run-level 3 (or whatever the systemd equivalent is). In general, updating this way works. It's really only when there's a (mercifully infrequent) DBUS update that we tend to have these issues.

Typically what we see after a generic `yum -y update` is that DBUS becomes deranged and there's no communicating with it. The deranged state persists across reboots. While we haven't explicitly tried doing a `yum reinstall dbus\*` once things have reached this state, my suspicion is that it will fail.

As to where we're observing logged symptoms/outputs: we're seeing them in the journals and the legacy log-files. Doing a `timedatectl status` is just a shorthand way of checking "is DBUS pissed off."
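`timedatectl status` exercises more than the bus itself: it has to reach `org.freedesktop.timedate1`, which is normally started on demand. A narrower probe asks the bus daemon directly for its list of connected names, which involves no service activation, so it can help separate "the system bus is not answering" from "activation is failing". A minimal sketch, assuming `dbus-send` and coreutils `timeout` are installed; `system_bus_alive` is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical probe: does the system bus daemon answer a direct call?
# org.freedesktop.DBus.ListNames is handled by dbus-daemon itself, so no
# service activation (and no launch helper) is involved.
# Usage: system_bus_alive [timeout-seconds]   (default 10)
system_bus_alive() {
  local secs="${1:-10}"
  # `timeout` guards against the hangs described in this report.
  timeout "$secs" dbus-send --system --print-reply \
    --dest=org.freedesktop.DBus /org/freedesktop/DBus \
    org.freedesktop.DBus.ListNames >/dev/null 2>&1
}
```

For example, `system_bus_alive || journalctl -u dbus -b --no-pager | tail -n 50` shows the end of the dbus unit's journal for the current boot when the bus does not answer.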

Comment 4 David King 2018-04-24 08:47:07 UTC
This is caused by dbus-daemon being rebased to a new version (dbus-1.10.24-7.el7) in RHEL 7.5, and the locations of several dbus tools being migrated. dbus-send moved from /bin to /usr/bin, and dbus-daemon-launch-helper moved from under libdir to under libexecdir.

The location of dbus-daemon-launch-helper is described in a configuration file (/usr/share/dbus-1/system.conf), but until dbus-daemon is updated, any running dbus-daemon instance (for the system bus, specifically) will be unaware of the new location and will fail when trying to launch a system service. The running (old) dbus-daemon will fail to read the system.conf configuration file (because its canonical location changed from under /etc to under /usr), and restarting dbus-daemon will disconnect all currently connected services, which will not reconnect unless they are restarted after dbus.
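The helper path the daemon uses comes from the `<servicehelper>` element of its configuration. A minimal sketch for comparing what the configuration files on disk point at with what actually exists, assuming GNU sed and that the element sits on a single line; the function names are hypothetical. Keep in mind that a running daemon keeps the path it read when it started, so after the update the files on disk may no longer match what the running daemon is using.

```shell
#!/bin/bash
# Hypothetical helpers around the <servicehelper> element of D-Bus config.
# Assumes each <servicehelper>...</servicehelper> sits on one line.

# Print the servicehelper path(s) named in the given config file(s);
# defaults to the two system.conf locations discussed in this bug.
servicehelper_path() {
  local files=("$@")
  if [ "${#files[@]}" -eq 0 ]; then
    files=(/etc/dbus-1/system.conf /usr/share/dbus-1/system.conf)
  fi
  # Errors for missing files are suppressed; the remaining files are read.
  sed -n 's:.*<servicehelper>\(.*\)</servicehelper>.*:\1:p' "${files[@]}" 2>/dev/null
}

# Report whether each configured helper exists and is executable.
check_servicehelper() {
  local helper
  # Word splitting is fine here: helper paths contain no spaces.
  for helper in $(servicehelper_path "$@"); do
    if [ -x "$helper" ]; then
      echo "OK      $helper"
    else
      echo "MISSING $helper"
    fi
  done
}
```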

If a scriptlet in a package called dbus-send and triggered a service activation, the activation would likely fail (because the helper binary would not be found). I could not find any uses of dbus-send in the vim package (nor vim-common subpackage), but it may be another package that is calling dbus-send, or it may be called as a side effect of the scriptlets in vim-common.

A workaround for this problem would be to create a symlink between the old and new locations of dbus-daemon-launch-helper, so that the running dbus-daemon for the system bus can still call out to it. An alternative would be to update only the dbus packages and then to restart the system immediately, before updating any other packages, although this may not be feasible if the shutdown process triggers any service activations.
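The symlink workaround can be sketched as below. It assumes the two paths named in this bug (`/lib64/dbus-1/...` before the rebase, `/usr/libexec/dbus-1/...` after it) and creates the link only once the new helper is installed, since comment 19 reports that /lib64/dbus-1 disappears during the dbus update. `link_old_helper_path` and its `ROOT` prefix argument are hypothetical; the prefix exists only so the function can be exercised in a scratch directory. Run it as root; this sketch has not been verified against the hang described above.

```shell
#!/bin/bash
# Hypothetical sketch of the comment 4 workaround: make the pre-rebase
# helper path resolve to the post-rebase helper, so an old dbus-daemon
# that is still running can launch system services.
OLD_HELPER=/lib64/dbus-1/dbus-daemon-launch-helper
NEW_HELPER=/usr/libexec/dbus-1/dbus-daemon-launch-helper

# Usage: link_old_helper_path [ROOT]   (ROOT is empty on a real system)
link_old_helper_path() {
  local root="${1:-}"
  if [ ! -e "${root}${NEW_HELPER}" ]; then
    echo "new helper not installed yet: ${root}${NEW_HELPER}" >&2
    return 1
  fi
  # Leave any existing file or link at the old path untouched.
  if [ -e "${root}${OLD_HELPER}" ] || [ -L "${root}${OLD_HELPER}" ]; then
    return 0
  fi
  install -d -m 0755 "${root}${OLD_HELPER%/*}" &&
    ln -s "$NEW_HELPER" "${root}${OLD_HELPER}"
}
```

On a real system the order would be roughly `yum -y update 'dbus*'`, then `link_old_helper_path`, then the remaining updates; comment 4's other option is to reboot right after the dbus update instead.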

Comment 5 Thomas Jones 2018-04-24 12:04:56 UTC
Cool. Thanks for the detailed info. I'll try setting up a symlink as part of the upgrade process to see if that helps us.

I'm reasonably certain that the vim landmark is simply "last thing yum actually completed" rather than being the stuck process.

I'm at a different work site for the next day or so; depending on after-hours time constraints, I may not have a yea/nay update till Thursday.

Comment 6 Thomas Jones 2018-05-01 11:43:48 UTC
About to try some of the suggestions you made. We tested our problem across several AWS regions and found that simply doing a `yum update dbus` was sufficient to break a system (and it's reproducible 100% of the time with that). So, it's definitely in that subsystem that we're experiencing problems.

Comment 7 Thomas Jones 2018-05-01 12:06:03 UTC
Alright, I'm on an instance launched from a 7.4 AMI.

In looking at the current "/bin/dbus-send", `readlink` is telling me that, even though the RPM's `-ql` output says "/bin/dbus-send", the true location is already "/usr/bin/dbus-send".

I'm currently a little unclear on how pre-creating /usr/libexec/dbus-1/dbus-daemon-launch-helper as a symlink is going to help me. Won't updating the RPM blow away that symlink, leaving me in the same place I was before? Or are you saying I should do something more like `mv /lib64/dbus-1/dbus-daemon-launch-helper /usr/libexec/dbus-1/dbus-daemon-launch-helper && ln -s /usr/libexec/dbus-1/dbus-daemon-launch-helper /lib64/dbus-1/dbus-daemon-launch-helper`?

I'd have a similar question for the system.conf file, but the new RPM appears to have both an "/etc/dbus-1/system.conf" file (like the 7.4 packaging) and a "/usr/share/dbus-1/system.conf". That said, the file in /etc is only 833 bytes while the one in /usr/share is 4362 bytes.

Since this is a test rig, I can blow it up, so I'll probably try out any permutations I can think of. However, it might prove helpful if you were able to provide further, detailed instructions.

Thanks in advance.
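The `readlink` observation above follows from RHEL 7's merged /usr layout, in which /bin is itself a symlink to usr/bin: the old and new dbus-send paths are the same file, whereas the helper and system.conf moves are real relocations. Bash's `-ef` test (same device and inode, after following symlinks) shows this directly. A minimal sketch; `same_file` is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: do two paths resolve to the very same file?
# [[ A -ef B ]] compares device and inode after following symlinks.
same_file() {
  if [[ "$1" -ef "$2" ]]; then
    echo "same:      $1 -> $(readlink -f "$1")"
  else
    echo "different: $1 vs $2 (or one is missing)"
    return 1
  fi
}
```

On a RHEL 7 host, `same_file /bin/dbus-send /usr/bin/dbus-send` should report the same file, while `same_file /etc/dbus-1/system.conf /usr/share/dbus-1/system.conf` should report two different files, consistent with the differing sizes noted above.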

Comment 8 Thomas Jones 2018-05-01 12:34:38 UTC
Launched instance from AMI. Placed the following into the instance's UserData:

```
#!/bin/bash

if [[ -d /usr/libexec/dbus-1 ]]
then
  echo "Directory already exists"
else
  printf "Creating new directory... "
  install -d -m 000755 /usr/libexec/dbus-1 -o root -g root && echo Success || echo FAILED

  printf "Updating SEL labels... "
  chcon --reference /lib64/dbus-1 /usr/libexec/dbus-1 && echo Success || echo FAILED
fi

printf "Moving dbus-daemon-launch-helper... "
mv /lib64/dbus-1/dbus-daemon-launch-helper /usr/libexec/dbus-1/dbus-daemon-launch-helper && echo Success || echo FAILED

printf "Creating symlink... "
ln -s /usr/libexec/dbus-1/dbus-daemon-launch-helper mv /lib64/dbus-1/dbus-daemon-launch-helper && echo Success || echo FAILED

sleep 10

yum update -y dbus && init 6
```

After rebooting, system was in the same broken state as it gets to without attempting to fix paths.

Comment 9 Thomas Jones 2018-05-03 16:38:39 UTC
Any other tests to run or fixes to try?

Comment 10 dwilloug 2018-05-11 02:22:42 UTC
Please share the AMI ID that you are using to test.  I've tried this in a local VM and in AWS but haven't been able to reproduce.

Comment 11 Thomas Jones 2018-05-11 13:06:56 UTC
Any AMI returned by:

https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#Images:visibility=public-images;search=spel-minimal-rhel-7.4-hvm;sort=desc:creationDate

exhibits the issue. The most recent of the above would be ami-05aa42022a79b86e7.

We had no projects that were using 7.3. No DBUS issues were reported when going from 7.3 to 7.4. That said, as part of testing for this BugZilla submission, we verified that the issue is triggered when upgrading from 7.3 directly to 7.5. At this point, all but one of the 7.3 AMIs (ami-28b5b23e) have aged off.

Comment 12 Loren Gordon 2018-05-11 13:09:37 UTC
The most recent AMI from that query is technically RHEL 7.5 and does not reproduce the issue in question. Use any AMI from March or earlier, such as `ami-0338b428e333e97eb`.

Comment 13 Thomas Jones 2018-05-11 18:22:38 UTC
(In reply to Loren Gordon from comment #12)
> The most recent AMI from that query is technically RHEL 7.5 and does not
> reproduce the issue in question. Use any AMI from March or earlier, such as
> `ami-0338b428e333e97eb`.

These are public AMIs, so we shouldn't need to share them with you. Lemme know if you run into access issues with the AMI ID Loren noted.

Comment 15 Tomas Pelka 2018-05-30 08:39:26 UTC
(In reply to Thomas Jones from comment #14)
> (In reply to Loren Gordon from comment #12)
> > The most recent AMI from that query is technically RHEL 7.5 and does not
> > reproduce the issue in question. Use any AMI from March or earlier, such as
> > `ami-0338b428e333e97eb`.
> 
> These are public AMIs so shouldn't need to share them to you. Lemme know if
> you run into access issues with the AMI ID Loren noted.

Any news?

Thanks
-Tom

Comment 16 Thomas Jones 2018-06-01 12:12:35 UTC
(In reply to Tomas Pelka from comment #15)
> Any news?
> 

From us? No. We're currently waiting on you guys to see whether the AMI listed by @loren was locatable and usable, and whether Red Hat had had a chance to use it to reproduce the problem and start diagnostics.

Was actually checking the case to see if I needed to Bueller it as I'd not received any case update notifications.

If your diagnosticians prefer to do work in different regions, we can provide AMI-IDs equivalent to `ami-0338b428e333e97eb` in us-east-2, us-west-1, us-west-2 and even us-gov-west-1. They're all created by the same processes.

Comment 17 Reid Wahl 2018-06-03 04:53:41 UTC
@Thomas: There might be a syntax error in your UserData.

```
printf "Creating symlink... "
ln -s /usr/libexec/dbus-1/dbus-daemon-launch-helper mv /lib64/dbus-1/dbus-daemon-launch-helper && echo Success || echo FAILED
```

Three arguments to `ln`?

Comment 18 Thomas Jones 2018-06-04 14:05:57 UTC
Thank you for that catch. Copy-paystah error. Fixed to:

```
mv /lib64/dbus-1/dbus-daemon-launch-helper \
   /usr/libexec/dbus-1/dbus-daemon-launch-helper && \
ln -s /usr/libexec/dbus-1/dbus-daemon-launch-helper \
   /lib64/dbus-1/dbus-daemon-launch-helper
```

Either way: does not seem to change the defective behavior encountered when the dbus RPM is updated.

Comment 19 Thomas Jones 2018-06-04 14:28:28 UTC
Probably worth noting that all of the UserData actions return success. However, after the `yum update` (and associated `init 6`) runs, the /lib64/dbus-1/ directory wholly disappears. When the system (eventually) returns from the `init 6`, DBUS is in its usual, "unhappy" state.

Comment 25 Satoshi Tajima 2018-08-16 09:06:06 UTC
I've run into the same problem and addressed it.

I noticed that dbus.socket had changed from /var/run/dbus/system_bus_socket to /run/dbus/system_bus_socket.
In CentOS 7.x, /var/run is originally a symlink pointing to the /run directory,
so this change wouldn't be a problem in most cases.

But in my case, unintentionally, /var/run wasn't a symbolic link,
so many processes failed to handle the socket.

After fixing /var/run to be a symlink, the upgrade succeeded.

Comment 26 Satoshi Tajima 2018-08-16 09:10:20 UTC
I made a mistake in writing...
s/CentOS/RHEL/

Comment 28 David King 2018-10-12 11:30:46 UTC
(In reply to Satoshi Tajima from comment #25)
> But in my case, unintendedly, /var/run wasn't a symbolic link.
> So many processes had failed to handle the socket.
> 
> After fix /var/run to a symlink, the upgrade had been succeeded.

If this is the case, it is not easy to fix inside the dbus package, and arguably the wrong place, as the filesystem package owns the /var/run symlink. The easiest thing at this point is to mention in the release notes that dbus upgrades require that /var/run is a symlink to /run (which is the default case in all versions of RHEL7, as far as I am aware).
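The precondition from comments 25 and 28 is easy to check before upgrading. A minimal sketch, with `var_run_ok` and its `ROOT` prefix argument as hypothetical names; it only reports, since replacing a live /var/run directory with a symlink is better done deliberately (per comment 35, the filesystem package normally installs it as a symlink to ../run).

```shell
#!/bin/bash
# Hypothetical check: is /var/run a symlink that resolves to /run?
# Usage: var_run_ok [ROOT]   (ROOT is empty on a real system)
var_run_ok() {
  local root="${1:-}"
  if [ -L "${root}/var/run" ] && [[ "${root}/var/run" -ef "${root}/run" ]]; then
    return 0
  fi
  echo "${root}/var/run is not a symlink to ${root}/run" >&2
  return 1
}
```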

Comment 34 Matthias Clasen 2018-11-05 16:59:25 UTC
Moving this bug to the filesystem package, in the light of comment 28

Comment 35 Ondrej Vasik 2018-11-06 13:50:04 UTC
While I see the reason behind the "switch to filesystem", I don't see any way to fix it from the filesystem package, and it was not originally an issue in the filesystem package: it was a change in behaviour caused by the dbus rebase (and is fixable through symlinks in the old locations, as mentioned in comment #4).

The filesystem package installs /var/run as a symlink to ../run. I can imagine this being an issue on systems that were upgraded from RHEL 6 to RHEL 7, as /var/run was not a symlink there.

However, the original issue is not in filesystem.

Still, it doesn't really matter, unless we want to ship symlinks in the old locations with the new dbus package. I think this is something that has to be documented as a known issue anyway.

Comment 36 Ondrej Vasik 2018-11-06 13:56:06 UTC
Marking Documentation, as I don't plan to make any changes in the filesystem package and I can't fix it there. The only fix would be to ship symlinks to the helper binaries in the old locations, but that wouldn't help with the existing issues; it will only help for new updates.

Comment 37 Ondrej Vasik 2018-11-07 07:33:53 UTC
Anyway, switching back to dbus: a filesystem package update can't fix this issue, and the proper fix would be to ship symlinks in the old locations of the binaries to keep backward compatibility...

Comment 39 masanari iida 2018-11-21 04:12:12 UTC
I logged yet another case, "dbus update break abrt-dbus", as BZ#1650062.
I think you may not have noticed it because I selected
"abrt" as its component.
Thanks

Comment 42 Vladimir Benes 2018-12-17 16:02:26 UTC
Qa_ack+ for providing symlinks to original locations.

Comment 45 David King 2018-12-17 16:35:53 UTC
*** Bug 1550582 has been marked as a duplicate of this bug. ***