Old Puppet manifests were reapplied during the update process when they should not have been, which had the potential to take down cluster services in the overcloud. The agent on the overcloud nodes reapplied the old Puppet manifests because its deployment state was saved in a tmpfs-mounted directory under /var/run/, which is lost on reboot. This update moves the directory from /var/run/heat-config/deployed to /var/lib/heat-config/deployed, which allows the deployed state to persist across reboots.
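A quick way to see why state kept under /var/run/ does not survive a reboot is to check the mount type; a minimal sketch, assuming a standard RHEL 7 layout (the heat-config paths are the ones named in the fix above):

df -hT /run                              # /run (and the /var/run symlink) is a tmpfs, wiped on reboot
ls -l /var/run/heat-config/deployed      # old, volatile location of the deployed-state markers
ls -l /var/lib/heat-config/deployed      # new location on persistent storage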
Update OSP-D from 7.0 to 7.1 failed: systemd stops functioning on the controller node (Failed to get D-Bus connection)
Environment:
--------------
Controller:
-------------
dbus-1.6.12-11.el7.x86_64
dbus-glib-0.100-7.el7.x86_64
dbus-python-1.1.1-9.el7.x86_64
dbus-libs-1.6.12-11.el7.x86_64
python-slip-dbus-0.4.0-2.el7.noarch
Undercloud:
------------
instack-undercloud-2.1.2-29.el7ost.noarch
instack-0.0.7-1.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-45.el7ost.noarch
openstack-heat-api-2015.1.0-4.el7ost.noarch
openstack-heat-api-cfn-2015.1.1-6.el7ost.noarch
heat-cfntools-1.2.8-2.el7.noarch
openstack-heat-common-2015.1.0-4.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.1-6.el7ost.noarch
openstack-heat-api-cfn-2015.1.0-4.el7ost.noarch
python-heatclient-0.6.0-1.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.0-4.el7ost.noarch
openstack-heat-common-2015.1.1-6.el7ost.noarch
openstack-heat-api-2015.1.1-6.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-71.el7ost.noarch
openstack-heat-engine-2015.1.1-6.el7ost.noarch
openstack-heat-engine-2015.1.0-4.el7ost.noarch
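For reference, a package inventory like the ones above can be collected with rpm; a minimal sketch (the grep filter is an assumption, not part of the original report):

rpm -qa | grep -E 'dbus|heat|tripleo|instack' | sort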
Description:
-------------
It happened after applying this patch, https://review.openstack.org/#/c/239368/, to work around https://bugzilla.redhat.com/show_bug.cgi?id=1274859,
and then attempting to update the OSP-D UC+OC from 7.0 to 7.1.
Steps:
-------
(1) Install undercloud and overcloud 7.0 (with 7.0 images)
(2) Update the undercloud to 7.1 (using rhos-release)
(3) Make sure you have the 7.1 repos on the overcloud nodes (a quick repo check is sketched below, after the update command)
(4) Attempt to run the overcloud update command:
(More details: http://etherpad.corp.redhat.com/update-ospd-7-0-to-7-1)
openstack overcloud update stack overcloud -i --templates -e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml -e /home/stack/update.yaml
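A minimal way to verify step (3) on each overcloud node before launching the update (the exact repo names depend on how the 7.1 repos were set up, so treat this as a sketch):

[root@overcloud-controller-0 ~]# yum repolist enabled             # enabled repos should be the 7.1 ones
[root@overcloud-controller-0 ~]# yum check-update 'openstack-*'   # 7.1 builds should show as available updates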
Results:
---------
(1) It looks like during the yum update running on the controller, one package failed to update:
59/363
Failed to get D-Bus connection: Failed to connect to socket /run/systemd/private: No such file or directory
warning: %post(glusterfs-3.7.1-16.el7.x86_64) scriptlet failed, exit status 1
(2) Then, during the update, systemctl stopped functioning on the controller machine:
[root@overcloud-controller-0 ~]# systemctl
Failed to get D-Bus connection: Failed to connect to socket /run/systemd/private: No such file or directory
(3) Overcloud 'Update failed'
----------------------------------------------------------
[root@overcloud-controller-0 ~]# ps auxf|grep systemd
root 1 0.3 0.0 51260 2340 ? Ss Oct22 28:11 /usr/lib/systemd/systemd --system --deserialize 27
root 346 0.2 0.5 80496 20016 ? Ss Oct22 20:04 /usr/lib/systemd/systemd-journald
root 437 0.0 0.0 0 0 ? Zs Oct22 3:17 [systemd-logind] <defunct>
dbus 438 0.1 0.0 100492 2024 ? Ssl Oct22 7:06 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
root 3444 0.0 0.0 112640 928 pts/0 S+ 15:43 0:00 \_ grep --color=auto systemd
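The ps output above shows the failure signature: PID 1 is still systemd and dbus-daemon is alive, but systemd-logind is a zombie and systemd's private control socket is gone. A few commands to confirm that state on the controller (a diagnostic sketch, not from the original report):

[root@overcloud-controller-0 ~]# ls -l /run/systemd/private       # missing -> systemctl cannot reach PID 1
[root@overcloud-controller-0 ~]# stat -f -c %T /run               # should report tmpfs
[root@overcloud-controller-0 ~]# grep -c 'Failed to get D-Bus' /var/log/messages
[root@overcloud-controller-0 ~]# rpm -q glusterfs                 # the package whose %post scriptlet failed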
/var/log/messages (from controller)
------------------------------------
  tzdata.noarch 0:2015g-1.el7
  util-linux.x86_64 0:2.23.2-22.el7_1.1

Complete!
yum return code: 0
Starting cluster node
Starting Cluster...
Redirecting to /bin/systemctl start corosync.service
Failed to get D-Bus connection: Failed to connect to socket /run/systemd/private: No such file or directory

ERROR overcloud-controller-0 failed to join cluster in 360 seconds

deploy_stderr:
Non-fatal POSTUN scriptlet failure in rpm package glusterfs-3.7.1-16.el7.x86_64
Non-fatal POSTUN scriptlet failure in rpm package glusterfs-3.6.0.29-2.el7.x86_64
Error: unable to start corosync
Error: cluster is not currently running on this node
  (the line "Error: cluster is not currently running on this node" repeats many more times)
I think systemd got into a broken state on the controller node:
[root@overcloud-controller-0 ~]# systemctl
Failed to get D-Bus connection: Failed to connect to socket /run/systemd/private: No such file or directory
Because systemctl doesn't work, no services can be started either.
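For what it's worth, systemd(1) documents that when running as PID 1 it reacts to SIGTERM by serializing its state and re-executing itself, which recreates /run/systemd/private; a possible recovery sketch, untested against this exact failure, so treat it as an assumption rather than a verified fix:

[root@overcloud-controller-0 ~]# kill -TERM 1                 # ask PID 1 (systemd) to serialize and re-exec itself
[root@overcloud-controller-0 ~]# ls -l /run/systemd/private   # the private control socket should be back
[root@overcloud-controller-0 ~]# systemctl                    # should list units again instead of failing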
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2015:2651