Bug 1568856 - DBUS Breaks on OS Point-Release Upgrade
Status: NEW
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: dbus
Version: 7.5
Hardware: x86_64 Unspecified
Priority: high, Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: David King
QA Contact: Desktop QE
Keywords: PrioBumpGSS, Regression
Depends On:
Blocks:
Reported: 2018-04-18 06:38 EDT by Thomas Jones
Modified: 2018-10-26 22:28 EDT

See Also:
Fixed In Version:
Doc Type: Release Note
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers:
Red Hat Bugzilla 1566089 (Priority: None, Status: CLOSED) - oscap --remediate hangs at autofs (Last Updated: 2018-11-02 11:03 EDT)

Description Thomas Jones 2018-04-18 06:38:45 EDT
Description of problem:

When upgrading from 7.x to 7.5, DBUS irrecoverably breaks

Version-Release number of selected component (if applicable):




How reproducible:

100% reproducible

Steps to Reproduce:
1. Launch a Red Hat-published 7.4 (or earlier 7.x release) AMI
2. Execute a `yum update`
3. Update hangs - usually around "vim-common" - due to the updating RPM getting stuck performing a `dbus-send` operation
4. Terminate the hung `dbus-send` process (either manually or by allowing the update to time out) or issue a `systemctl restart dbus`
5. All DBUS-related services begin to fail (reboot hangs on various services' DBUS-signaling)
6. Log in to the rebooted system; tools depending on DBUS (e.g., timedatectl) receive connection refused errors from DBUS

Actual results:

All DBUS-enabled services receive connection refused errors when attempting to communicate with DBUS

Expected results:

All DBUS-enabled services function normally

Additional info:

First observed when running an `oscap --remediate` (see previous BugZilla https://bugzilla.redhat.com/show_bug.cgi?id=1566089). As part of a larger remediation profile, a `yum update` is executed. When oscap attempts its remediation activities, the first DBUS-enabled service it attempts to remediate tends to be autofs. This step hangs. After much debugging, ultimately traced it back to the deranged state of DBUS post `yum update`.

We'd also previously encountered - but had not isolated the cause of - the issue last summer when we were upgrading from 7.3 to 7.4. At the time, our users' RHEL 7 adoption-rates were exceedingly low, so we only ever noticed the occasional "quark 39" errors while deriving customized AMIs from the Red Hat published ones. The errors had not been fatal to that process and did not show up in instances launched from those AMIs, so it was a low-order priority to investigate (basically an "academic issue" or "things that make you go 'hmm...'"). In retrospect, the underlying issue appears to be the same as what caused us to open this ticket: we can provoke the same error by attempting to create a 7.5 AMI when deriving from an earlier 7.x release.

In looking through BugZilla, it seems a number of people have encountered DBUS issues when doing OS point-release upgrades (7.x to 7.x'). Most are in either some pending state or a WONTFIX state, but I saw none in a fix-pending state.

Thus far, our only recourse has been to tell our users:
- Re-deploy your application onto a 7.5 AMI
or
- Install the yum versionlock plugin and version-lock your current DBUS installation and any packages that depend on it (thus far, that requires excluding the dbus, dbus-libs, teamd, libteam and wpa_supplicant packages in yum.conf and locking openscap-1.2.14 and openscap-scanner-1.2.14 in /etc/yum/pluginconf.d/versionlock.list)

The former is sub-ideal for our users who haven't fully-automated their application deployments.

The latter is sub-ideal from both a "complexity for non RHEL-guru users" and a "leaving critical system components unpatched" perspective (our IA teams take a very dim view of this, obviously!).
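
A minimal sketch of the exclusion half of the second workaround, assuming the package list above is complete for a given system. The function name `build_exclude_line` is hypothetical; it only prints the `exclude=` line that would go into the `[main]` section of /etc/yum.conf and does not modify any file.

```shell
#!/bin/bash
# Hypothetical helper: print a yum.conf "exclude=" line for the given packages.
# It writes nothing to disk; review the output before adding it to /etc/yum.conf.
build_exclude_line() {
  # "$*" joins all arguments with single spaces (default IFS).
  printf 'exclude=%s\n' "$*"
}
```

Calling `build_exclude_line dbus dbus-libs teamd libteam wpa_supplicant` prints the line for the packages named above. For the locking half, `yum versionlock add openscap openscap-scanner` (from the versionlock plugin) should pin the currently installed versions, though that has not been verified against this exact setup.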
Comment 2 Tomas Pelka 2018-04-18 08:35:39 EDT
I would say this is basically a clone of bz1550582.

Updating, and therefore restarting, dbus from e.g. a graphical session is not expected to work properly (and it is not clear from this bz whether you did the restart from the multi-user or the graphical target).

But in this case it seems dbus is not running or not working after the update, correct? What does

yum reinstall dbus\* 

do (in the multi-user target)? Can you see something from

systemctl status dbus 

or in the journal?

Thanks
-Tom
Comment 3 Thomas Jones 2018-04-18 09:51:06 EDT
We don't do graphical in AWS - AWS doesn't offer (virtual) console access.

Due to the lack of out-of-band access (console - graphical, serial or otherwise), all update actions happen at run-level 3 (or whatever the systemd equivalent is). In general, updating this way works. It's really only when there's a (mercifully-infrequent) DBUS update that we tend to have these issues.

Typically what we see after a generic `yum -y update` is that DBUS becomes deranged and there's no communicating with it. The deranged state persists across reboots. While we haven't explicitly tried doing a `yum reinstall dbus\*` once things have reached this state, my suspicion is that it will fail.

As to where we're observing logged symptoms/outputs: we're seeing them in the journals and the legacy log-files. Doing a `timedatectl status` is just a shorthand way of checking "is DBUS pissed off."
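
The "is DBUS reachable" shorthand can be sketched as a small function. This is only an illustration: the function name `dbus_responds` and the `DBUS_PROBE_CMD` override are hypothetical, and the default probe is simply the `timedatectl status` call mentioned above.

```shell
#!/bin/bash
# Probe whether the system bus answers, using a command that talks to DBUS.
# DBUS_PROBE_CMD is a hypothetical override so the check can be exercised
# without a real bus; by default it runs `timedatectl status`.
dbus_responds() {
  local probe="${DBUS_PROBE_CMD:-timedatectl status}"
  # Intentionally unquoted so the command and its arguments are word-split.
  if $probe >/dev/null 2>&1; then
    echo "DBUS responding"
  else
    echo "DBUS not responding"
    return 1
  fi
}
```

Running `dbus_responds` before and after the update gives a quick yes/no signal; the journal remains the place to look for the actual error text.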
Comment 4 David King 2018-04-24 04:47:07 EDT
This is caused by dbus-daemon being rebased to a new version (dbus-1.10.24-7.el7) in RHEL 7.5, and the locations of several dbus tools being migrated. dbus-send moved from /bin to /usr/bin, and dbus-daemon-launch-helper moved from under libdir to under libexecdir.

The location of dbus-daemon-launch-helper is described in a configuration file (/usr/share/dbus-1/system.conf), but until dbus-daemon is updated, any running dbus-daemon instance (for the system bus, specifically) will be unaware of the new location, and will fail when trying to launch a system service. The running (old) dbus-daemon will fail to read the system.conf configuration file (because the canonical location changed, from under /etc to under /usr), and restarting dbus-daemon will disconnect all currently-connected services, which will not reconnect unless they are restarted after dbus.

If a scriptlet in a package called dbus-send and triggered a service activation, the activation would likely fail (because the helper binary would not be found). I could not find any uses of dbus-send in the vim package (nor vim-common subpackage), but it may be another package that is calling dbus-send, or it may be called as a side effect of the scriptlets in vim-common.

A workaround for this problem would be to create a symlink between the old and new locations of dbus-daemon-launch-helper, so that the running dbus-daemon for the system bus can still call out to it. An alternative would be to update only the dbus packages, and then to restart the system immediately, before updating any other packages, although this may not be feasible if the shutdown process triggers any service activations.
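
To see which of the two helper locations is present on a given system, a check along the following lines may help. It is a sketch, not part of the dbus packaging: the function name `find_launch_helper` and its optional root argument are hypothetical, and the concrete paths are the x86_64 ones named later in this report (/lib64/dbus-1 and /usr/libexec/dbus-1).

```shell
#!/bin/bash
# Report where dbus-daemon-launch-helper lives under a given root directory.
# The root argument is an assumption added for illustration (default "/"),
# so the check can be pointed at a chroot or a test tree.
find_launch_helper() {
  local root="${1:-/}"
  local old="lib64/dbus-1/dbus-daemon-launch-helper"        # pre-7.5 location
  local new="usr/libexec/dbus-1/dbus-daemon-launch-helper"  # 7.5 location
  local found=1
  # -e follows symlinks, so a dangling link counts as missing.
  if [ -e "${root%/}/$old" ]; then echo "old: /$old"; found=0; fi
  if [ -e "${root%/}/$new" ]; then echo "new: /$new"; found=0; fi
  [ "$found" -eq 0 ] || echo "helper not found"
  return "$found"
}
```

If the symlink workaround is in place, both lines should be printed; if only the `new:` line appears while an old dbus-daemon is still running, service activation would be expected to fail as described above.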
Comment 5 Thomas Jones 2018-04-24 08:04:56 EDT
Cool. Thanks for the detailed info. I'll try setting up a symlink as part of the upgrade process to see if that helps us.

I'm reasonably certain that the vim landmark is simply "last thing yum actually completed" rather than being the stuck process.

I'm at a different work-site for the next day or so: depending on after-hours time-constraints, I may not have a yay/neigh update till Thursday.
Comment 6 Thomas Jones 2018-05-01 07:43:48 EDT
About to try some of the suggestions you made. Tested our problem across several AWS regions and found that simply doing a `yum update dbus` was sufficient to break a system (and it's reproducible 100% of the time with that). So, it's definitely in that subsystem that we're experiencing problems.
Comment 7 Thomas Jones 2018-05-01 08:06:03 EDT
Alright, I'm on an instance launched from a 7.4 AMI.

In looking at the current "/bin/dbus-send", `readlink` is telling me that, even though the RPM's `-ql` output says "/bin/dbus-send", the true location is already "/usr/bin/dbus-send".

I'm currently a little unclear how pre-creating /usr/libexec/dbus-1/dbus-daemon-launch-helper as a symlink is going to help me? Won't updating the RPM blow away that symlink, leaving me in the same place I was before? Or, are you saying I should do something more like `mv /lib64/dbus-1/dbus-daemon-launch-helper /usr/libexec/dbus-1/dbus-daemon-launch-helper && ln -s /usr/libexec/dbus-1/dbus-daemon-launch-helper /lib64/dbus-1/dbus-daemon-launch-helper`

I'd have a similar question for the system.conf file, but the new RPM appears to have both an "/etc/dbus-1/system.conf" file (like the 7.4 packaging) as well as a "/usr/share/dbus-1/system.conf". That said, the file in /etc is only 833bytes while the one in /usr/share is 4362bytes.

Since this is a test-rig, I can blow it up, so I'll probably try out any permutations I can think of. However, it might prove helpful if you were able to provide further, detailed instructions.

Thanks in advance.
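
For reference, the path check described in this comment can be sketched as follows. The function name `real_location` is hypothetical; it relies on GNU `readlink -f`, which follows symlinked parent directories as well (on RHEL 7, /bin is itself a symlink to usr/bin, which would explain the result above).

```shell
#!/bin/bash
# Print the fully resolved location of a path, following every symlink,
# including symlinked parent directories such as /bin -> usr/bin on RHEL 7.
real_location() {
  readlink -f -- "$1"
}
```

Note that plain `readlink /bin/dbus-send` prints nothing when the file itself is not a symlink, so the `-f` form is the one that reveals the /usr/bin location; `real_location /bin/dbus-send` is the intended call.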
Comment 8 Thomas Jones 2018-05-01 08:34:38 EDT
Launched instance from AMI. Placed the following into the instance's UserData:

```
#!/bin/bash

if [[ -d /usr/libexec/dbus-1 ]]
then
   echo "Directory already exists"
else
  printf "Creating new directory"
  install -d -m 000755 /usr/libexec/dbus-1 -o root -g root && echo Success || echo FAILED

   printf "Updating SEL labels... "
   chcon --reference /lib64/dbus-1 /usr/libexec/dbus-1 && echo Success || echo FAILED
fi

printf "Moving dbus-daemon-launch-helper... "
mv /lib64/dbus-1/dbus-daemon-launch-helper /usr/libexec/dbus-1/dbus-daemon-launch-helper && echo Success || echo FAILED

printf "Creating symlink... "
ln -s /usr/libexec/dbus-1/dbus-daemon-launch-helper mv /lib64/dbus-1/dbus-daemon-launch-helper && echo Success || echo FAILED

sleep 10

yum update -y dbus && init 6
```

After rebooting, system was in the same broken state as it gets to without attempting to fix paths.
Comment 9 Thomas Jones 2018-05-03 12:38:39 EDT
Any other tests to run or fixes to try?
Comment 10 dwilloug 2018-05-10 22:22:42 EDT
Please share the AMI ID that you are using to test.  I've tried this in a local VM and in AWS but haven't been able to reproduce.
Comment 11 Thomas Jones 2018-05-11 09:06:56 EDT
Any AMI returned by:

https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#Images:visibility=public-images;search=spel-minimal-rhel-7.4-hvm;sort=desc:creationDate

Exhibits the issue. The most recent of the above would be ami-05aa42022a79b86e7

We had no projects that were using 7.3. No DBUS issues were reported when going from 7.3 to 7.4. That said, as part of testing for this BugZilla submission, we verified that the issue is triggered when upgrading from 7.3 directly to 7.5. At this point, all but one of the 7.3 AMIs (ami-28b5b23e) have aged off.
Comment 12 Loren Gordon 2018-05-11 09:09:37 EDT
The most recent AMI from that query is technically RHEL 7.5 and does not reproduce the issue in question. Use any AMI from March or earlier, such as `ami-0338b428e333e97eb`.
Comment 13 Thomas Jones 2018-05-11 14:22:38 EDT
(In reply to Loren Gordon from comment #12)
> The most recent AMI from that query is technically RHEL 7.5 and does not
> reproduce the issue in question. Use any AMI from March or earlier, such as
> `ami-0338b428e333e97eb`.

These are public AMIs so shouldn't need to share them to you. Lemme know if you run into access issues with the AMI ID Loren noted.
Comment 14 Thomas Jones 2018-05-11 14:22:49 EDT
(In reply to Loren Gordon from comment #12)
> The most recent AMI from that query is technically RHEL 7.5 and does not
> reproduce the issue in question. Use any AMI from March or earlier, such as
> `ami-0338b428e333e97eb`.

These are public AMIs so shouldn't need to share them to you. Lemme know if you run into access issues with the AMI ID Loren noted.
Comment 15 Tomas Pelka 2018-05-30 04:39:26 EDT
(In reply to Thomas Jones from comment #14)
> (In reply to Loren Gordon from comment #12)
> > The most recent AMI from that query is technically RHEL 7.5 and does not
> > reproduce the issue in question. Use any AMI from March or earlier, such as
> > `ami-0338b428e333e97eb`.
> 
> These are public AMIs so shouldn't need to share them to you. Lemme know if
> you run into access issues with the AMI ID Loren noted.

Any news?

Thanks
-Tom
Comment 16 Thomas Jones 2018-06-01 08:12:35 EDT
(In reply to Tomas Pelka from comment #15)
> Any news?
> 

From us? No. We're currently waiting on you guys to see if the AMI listed by @loren was locatable and usable, and if Red Hat has had a chance to use it to reproduce the problem and start diagnostics.

Was actually checking the case to see if I needed to Bueller it as I'd not received any case update notifications.

If your diagnosticians prefer to do work in different regions, we can provide AMI-IDs equivalent to `ami-0338b428e333e97eb` in us-east-2, us-west-1, us-west-2 and even us-gov-west-1. They're all created by the same processes.
Comment 17 Reid Wahl 2018-06-03 00:53:41 EDT
@Thomas: There might be a syntax error in your UserData.

printf "Creating symlink... "
ln -s /usr/libexec/dbus-1/dbus-daemon-launch-helper mv /lib64/dbus-1/dbus-daemon-launch-helper && echo Success || echo FAILED


Three arguments to `ln`?
Comment 18 Thomas Jones 2018-06-04 10:05:57 EDT
Thank you for that catch. Copy-paystah error. Fixed to:

```
mv /lib64/dbus-1/dbus-daemon-launch-helper \
   /usr/libexec/dbus-1/dbus-daemon-launch-helper && \
ln -s /usr/libexec/dbus-1/dbus-daemon-launch-helper \
   /lib64/dbus-1/dbus-daemon-launch-helper
```

Either way: does not seem to change the defective behavior encountered when the dbus RPM is updated.
Comment 19 Thomas Jones 2018-06-04 10:28:28 EDT
Probably worth noting that all of the UserData actions return success. However, after the `yum update` (and associated `init 6`) runs, the /lib64/dbus-1/ directory wholly disappears. When the system (eventually) returns from the `init 6`, DBUS is in its usual, "unhappy" state.
Comment 25 Satoshi Tajima 2018-08-16 05:06:06 EDT
I've run into the same problem and addressed it.

I noticed that dbus.socket had changed from /var/run/dbus/system_bus_socket to /run/dbus/system_bus_socket.
In CentOS 7.X, /var/run is originally a symlink pointing to the /run directory.
So this change wouldn't be a problem in most cases.

But in my case, unintendedly, /var/run wasn't a symbolic link.
So many processes had failed to handle the socket.

After fix /var/run to a symlink, the upgrade had been succeeded.
Comment 26 Satoshi Tajima 2018-08-16 05:10:20 EDT
I made a mistake in writing...
s/CentOS/RHEL/
Comment 28 David King 2018-10-12 07:30:46 EDT
(In reply to Satoshi Tajima from comment #25)
> But in my case, unintendedly, /var/run wasn't a symbolic link.
> So many processes had failed to handle the socket.
> 
> After fix /var/run to a symlink, the upgrade had been succeeded.

If this is the case, it is not easy to fix inside the dbus package, and arguably the wrong place, as the filesystem package owns the /var/run symlink. The easiest thing at this point is to mention in the release notes that dbus upgrades require that /var/run is a symlink to /run (which is the default case in all versions of RHEL7, as far as I am aware).
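
The /var/run precondition can be checked before attempting the upgrade. The following is a sketch only: the function name `var_run_is_symlink_to_run` and the optional root argument are hypothetical and exist so the check can also be run against a chroot or a test tree.

```shell
#!/bin/bash
# Check that <root>/var/run is a symlink resolving to <root>/run.
# The optional root argument is an illustrative assumption (default "/").
var_run_is_symlink_to_run() {
  local root="${1:-/}"
  local link="${root%/}/var/run"
  local target="${root%/}/run"
  # Both sides are canonicalized so a relative link (../run) compares equal.
  if [ -L "$link" ] && [ "$(readlink -f -- "$link")" = "$(readlink -f -- "$target")" ]; then
    echo "ok: /var/run -> /run"
  else
    echo "problem: /var/run is not a symlink to /run"
    return 1
  fi
}
```

A guarded update could then look like `var_run_is_symlink_to_run && yum update -y dbus`, so that the dbus update is skipped when /var/run has been replaced by a real directory.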
