Bug 1569122 - Undercloud installation fails with "Execution of '/bin/getcert list' returned 1: Error org.freedesktop.DBus.Error.TimedOut"
Summary: Undercloud installation fails with "Execution of '/bin/getcert list' returned...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: dbus
Version: 7.5
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: pre-dev-freeze
: ---
Assignee: David King
QA Contact: Desktop QE
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-04-18 16:01 UTC by Raoul Scarazzini
Modified: 2019-02-01 18:27 UTC (History)
8 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-02-01 18:27:37 UTC


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Launchpad 1772807 None None None 2018-07-30 12:11:20 UTC

Description Raoul Scarazzini 2018-04-18 16:01:19 UTC
Description of problem:

Latest RHOS-11 Puddle fails during the overcloud installation with this message:

2018-04-17 12:59:23 | 2018-04-17 12:59:23,995 INFO: Error: Failed to apply catalog: Execution of '/bin/getcert list' returned 1: Error org.freedesktop.DBus.Error.TimedOut
2018-04-17 12:59:25 | 2018-04-17 12:59:25,752 INFO: + rc=1
2018-04-17 12:59:25 | 2018-04-17 12:59:25,753 INFO: + set -e
2018-04-17 12:59:25 | 2018-04-17 12:59:25,753 INFO: + echo 'puppet apply exited with exit code 1'
2018-04-17 12:59:25 | 2018-04-17 12:59:25,753 INFO: puppet apply exited with exit code 1
2018-04-17 12:59:25 | 2018-04-17 12:59:25,753 INFO: + '[' 1 '!=' 2 -a 1 '!=' 0 ']'
2018-04-17 12:59:25 | 2018-04-17 12:59:25,753 INFO: + exit 1
2018-04-17 12:59:25 | 2018-04-17 12:59:25,753 INFO: [2018-04-17 12:59:25,753] (os-refresh-config) [ERROR] during configure phase. [Command '['dib-run-parts', '/usr/libexec/os-refresh-config/configure.d']' returned non-zero exit status 1]
2018-04-17 12:59:25 | 2018-04-17 12:59:25,753 INFO: 
2018-04-17 12:59:25 | 2018-04-17 12:59:25,753 INFO: [2018-04-17 12:59:25,753] (os-refresh-config) [ERROR] Aborting...
2018-04-17 12:59:25 | 2018-04-17 12:59:25,760 ERROR:

Version-Release number of selected component (if applicable):

Puddle 2018-04-02.3

How reproducible:

Always.

Steps to Reproduce:

Just run openstack undercloud install

Comment 2 Alex Schultz 2018-04-18 19:55:04 UTC
Cert monger is failing to start.

2018-04-18 05:43:32,435 INFO: Error: Systemd start for certmonger failed!
2018-04-18 05:43:32,435 INFO: journalctl log for certmonger:
2018-04-18 05:43:32,435 INFO: -- Logs begin at Wed 2018-04-18 01:23:05 EDT, end at Wed 2018-04-18 05:43:32 EDT. --
2018-04-18 05:43:32,435 INFO: Apr 18 05:43:32 hac-01.ha.lab.eng.bos.redhat.com systemd[1]: Starting Certificate monitoring and PKI enrollment...
2018-04-18 05:43:32,436 INFO: Apr 18 05:43:32 hac-01.ha.lab.eng.bos.redhat.com certmonger[67448]: 2018-04-18 05:43:32 [67448] Unable to set well-known bus name "org.fedorahosted.certmonger": Connection ":1.435" is
 not allowed to own the service "org.fedorahosted.certmonger" due to security policies in the configuration file(-1).
2018-04-18 05:43:32,436 INFO: Apr 18 05:43:32 hac-01.ha.lab.eng.bos.redhat.com certmonger[67448]: Error connecting to D-Bus.
2018-04-18 05:43:32,436 INFO: Apr 18 05:43:32 hac-01.ha.lab.eng.bos.redhat.com systemd[1]: certmonger.service: main process exited, code=exited, status=1/FAILURE
2018-04-18 05:43:32,436 INFO: Apr 18 05:43:32 hac-01.ha.lab.eng.bos.redhat.com systemd[1]: Failed to start Certificate monitoring and PKI enrollment.
2018-04-18 05:43:32,436 INFO: Apr 18 05:43:32 hac-01.ha.lab.eng.bos.redhat.com systemd[1]: Unit certmonger.service entered failed state.
2018-04-18 05:43:32,436 INFO: Apr 18 05:43:32 hac-01.ha.lab.eng.bos.redhat.com systemd[1]: certmonger.service failed.
2018-04-18 05:43:32,436 INFO: 
2018-04-18 05:43:32,436 INFO: Error: /Stage[main]/Certmonger/Service[certmonger]/ensure: change from stopped to running failed: Systemd start for certmonger failed!

Comment 3 Michele Baldessari 2018-05-08 11:55:10 UTC
Seems to me that this is because we update dbus without doing a reboot:
Apr 18 05:31:55 Updated: 1:dbus-1.10.24-7.el7.x86_64

See https://bugzilla.redhat.com/show_bug.cgi?id=1531506 and https://bugzilla.redhat.com/show_bug.cgi?id=1519206 and https://bugzilla.redhat.com/show_bug.cgi?id=1531506

Comment 4 Harry Rybacki 2018-05-11 19:05:50 UTC
Raoul, does this affect any other versions of OSP? We will not be able to get a fix in before OSP11 retires (May-18-2018).

Comment 5 Alex Schultz 2018-05-16 15:15:16 UTC
It affects all versions. It's a 7.4 -> 7.5 upgrade problem

Comment 6 Raoul Scarazzini 2018-05-16 17:32:32 UTC
Just for completeness, one thing that we need to consider is that the error I'm getting in OSP11 happens if I use RHEL 7.4 on the undercloud and try to do the undercloud installation, so no upgrade involved in this case, just old packages.

Comment 7 Michele Baldessari 2018-05-16 17:48:49 UTC
(In reply to Raoul Scarazzini from comment #6)
> Just for completeness, one thing that we need to consider is that the error
> I'm getting in OSP11 happens if I use RHEL 7.4 on the undercloud and try to
> do the undercloud installation, so no upgrade involved in this case, just
> old packages.
You are updating dbus packages (from your sosreport):
$ grep dbus var/log/yum.log
Apr 18 05:31:39 Updated: 1:dbus-libs-1.10.24-7.el7.x86_64
Apr 18 05:31:55 Updated: 1:dbus-1.10.24-7.el7.x86_64
Apr 18 05:38:16 Updated: python-slip-dbus-0.4.0-4.el7.noarch

So likely you start off with a dbus version that is older in image/iso/whatever you use to create your undercloud and then puppet will pull in a newer dbus because the repos you have, offer a new one.

Comment 8 Raoul Scarazzini 2018-05-17 07:27:38 UTC
Oh, I see, you're 100% right. I was confirming that my procedure started with a 7.4 undercloud. As you're saying, this means in conclusion that since in my job I don't update any package, when undercloud installs (after using rhos-release to add OSP 11 repository) it tries to update also the dbus, causing the problem.
Harry, we might want to involve dbus people in here, what do you think?

Comment 9 Jakub Hrozek 2018-06-08 08:01:02 UTC
I think this error message might be important:
2018-04-18 05:43:32,436 INFO: Apr 18 05:43:32 hac-01.ha.lab.eng.bos.redhat.com certmonger[67448]: 2018-04-18 05:43:32 [67448] Unable to set well-known bus name "org.fedorahosted.certmonger": Connection ":1.435" is
 not allowed to own the service "org.fedorahosted.certmonger" due to security policies in the configuration file(-1).

And looking at /etc/dbus-1/system.d/certmonger.conf the policy says:

  <!-- Only root can own and communicate with the service -->
  <policy user="root">
    <allow own="org.fedorahosted.certmonger"/>
...etc

so the error reads to me that either certmonger is not being started as root or perhaps SELinux is kicking in.

Comment 11 Juan Antonio Osorio 2018-06-14 14:01:32 UTC
We reproduced the issue and we can say that it's not SELinux as it's set to Permissive. Also, puppet is ran as root, so it should be starting certmonger as root as well.

Comment 12 Juan Antonio Osorio 2018-06-14 14:07:57 UTC
It seems to me that the best workaround is to restart dbus and then run the undercloud install again. That worked for us.

Comment 15 Harry Rybacki 2019-02-01 18:27:37 UTC
After team re-review and discussion, this is no longer an issue. Closing as NOTABUG. Please feel free to re-open this if you find the issue still persists.


Note You need to log in before you can comment on or make changes to this bug.