Bug 1275814 - Update of OSP-D from 7.0 to 7.1 failed: systemd stops functioning on the controller node (Failed to get D-Bus connection)
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 7.0 (Kilo)
Hardware: x86_64 Linux
Priority: high
Severity: high
Target Milestone: y2
Target Release: 7.0 (Kilo)
Assigned To: James Slagle
QA Contact: Alexander Chuzhoy
Keywords: TestOnly, Triaged
Depends On:
Blocks: 1272254
Reported: 2015-10-27 16:06 EDT by Omri Hochman
Modified: 2015-12-21 11:57 EST (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Old Puppet manifests were reapplied during the update process when they should not have been, which could take the cluster services down in the Overcloud. The agent on the Overcloud nodes reapplied the old manifests because their deployed state was saved in a tmpfs-mounted directory under /var/run/, and the contents of that directory are lost on reboot. This update moves the directory from /var/run/heat-config/deployed to /var/lib/heat-config/deployed, which allows the deployed state to persist across reboots.
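
The mechanism described in the Doc Text can be illustrated with a minimal sketch. It assumes, hypothetically, that the agent records each applied deployment as a marker file named after the deployment ID and reapplies anything without a marker. The function names, the `<id>.json` marker naming, and the `puppet-old` ID are illustrative only and are not taken from this report. Clearing the directory stands in for a reboot wiping a tmpfs mount.

```python
import json
import os
import tempfile


def pending_deployments(deployments, deployed_dir):
    """Return the deployments that have no marker file in deployed_dir."""
    pending = []
    for dep in deployments:
        # Hypothetical marker naming: one JSON file per deployment ID.
        marker = os.path.join(deployed_dir, "%s.json" % dep["id"])
        if not os.path.exists(marker):
            pending.append(dep)
    return pending


def mark_deployed(dep, deployed_dir):
    """Record that a deployment was applied by writing its marker file."""
    os.makedirs(deployed_dir, exist_ok=True)
    with open(os.path.join(deployed_dir, "%s.json" % dep["id"]), "w") as f:
        json.dump(dep, f)


def simulate(reboot_clears_state):
    """Apply one deployment, optionally lose the state, report what is pending."""
    deployments = [{"id": "puppet-old"}]  # illustrative ID, not from the report
    with tempfile.TemporaryDirectory() as root:
        deployed_dir = os.path.join(root, "heat-config", "deployed")
        for dep in pending_deployments(deployments, deployed_dir):
            mark_deployed(dep, deployed_dir)
        if reboot_clears_state:
            # Stand-in for a reboot: a tmpfs-backed directory comes back empty.
            for name in os.listdir(deployed_dir):
                os.remove(os.path.join(deployed_dir, name))
        return [d["id"] for d in pending_deployments(deployments, deployed_dir)]
```

With `simulate(True)` the already-applied deployment shows up as pending again, which corresponds to the unwanted reapplication; with `simulate(False)` nothing is pending, which corresponds to state kept under /var/lib/.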
Last Closed: 2015-12-21 11:57:27 EST
Type: Bug


Attachments:
messages (5.08 MB, application/x-bzip), 2015-10-27 16:25 EDT, Omri Hochman

Description Omri Hochman 2015-10-27 16:06:07 EDT
Update of OSP-D from 7.0 to 7.1 failed: systemd stops functioning on the controller node (Failed to get D-Bus connection)

Environment :
--------------

Controller: 
-------------
dbus-1.6.12-11.el7.x86_64
dbus-glib-0.100-7.el7.x86_64
dbus-python-1.1.1-9.el7.x86_64
dbus-libs-1.6.12-11.el7.x86_64
python-slip-dbus-0.4.0-2.el7.noarch


Undercloud: 
------------
instack-undercloud-2.1.2-29.el7ost.noarch
instack-0.0.7-1.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-45.el7ost.noarch
openstack-heat-api-2015.1.0-4.el7ost.noarch
openstack-heat-api-cfn-2015.1.1-6.el7ost.noarch
heat-cfntools-1.2.8-2.el7.noarch
openstack-heat-common-2015.1.0-4.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.1-6.el7ost.noarch
openstack-heat-api-cfn-2015.1.0-4.el7ost.noarch
python-heatclient-0.6.0-1.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.0-4.el7ost.noarch
openstack-heat-common-2015.1.1-6.el7ost.noarch
openstack-heat-api-2015.1.1-6.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-71.el7ost.noarch
openstack-heat-engine-2015.1.1-6.el7ost.noarch
openstack-heat-engine-2015.1.0-4.el7ost.noarch


Description : 
-------------
This happened after applying the patch https://review.openstack.org/#/c/239368/ (to work around https://bugzilla.redhat.com/show_bug.cgi?id=1274859)
and then attempting to update OSP-D (undercloud and overcloud) from 7.0 to 7.1.

Steps:
-------
(1) Install Undercloud and Overcloud 7.0 (with 7.0 images).
(2) Update the undercloud to 7.1 (using rhos-release).
(3) Make sure the 7.1 repos are available on the overcloud nodes.
(4) Attempt to run the overcloud update command:

(More details: http://etherpad.corp.redhat.com/update-ospd-7-0-to-7-1)

openstack overcloud update stack overcloud -i --templates  -e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml -e /home/stack/update.yaml 

Results: 
---------

(1) During the yum update running on the controller, one package's scriptlet appears to have failed:

   59/363 \nFailed to get D-Bus connect
 /run/systemd/private: No such file or directory\nwarning: %post(glusterfs-3.7.1-16.el7.x86_64) scriptlet failed, exit status 1\n 


(2) Then, during the update, systemctl stopped functioning on the controller machine:

[root@overcloud-controller-0 ~]# systemctl 
Failed to get D-Bus connection: Failed to connect to socket /run/systemd/private: No such file or directory

(3) Overcloud 'Update failed' 

----------------------------------------------------------
[root@overcloud-controller-0 ~]# ps auxf|grep systemd
root         1  0.3  0.0  51260  2340 ?        Ss   Oct22  28:11 /usr/lib/systemd/systemd --system --deserialize 27
root       346  0.2  0.5  80496 20016 ?        Ss   Oct22  20:04 /usr/lib/systemd/systemd-journald
root       437  0.0  0.0      0     0 ?        Zs   Oct22   3:17 [systemd-logind] <defunct>
dbus       438  0.1  0.0 100492  2024 ?        Ssl  Oct22   7:06 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
root      3444  0.0  0.0 112640   928 pts/0    S+   15:43   0:00                      \_ grep --color=auto systemd
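
The failure above comes down to the socket /run/systemd/private being absent. As a rough diagnostic sketch (not part of the original report, and with hypothetical function names), the following checks whether a given path exists and is a Unix domain socket, which is what systemctl needs in order to talk to PID 1 here.

```python
import os
import stat


def is_unix_socket(path):
    """Return True if path exists and is a Unix domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        # Missing path, permission problem, etc. are all treated as "not usable".
        return False


def systemd_private_socket_ok(path="/run/systemd/private"):
    """Check the socket named in the 'Failed to get D-Bus connection' error."""
    return is_unix_socket(path)


if __name__ == "__main__":
    print("systemd private socket present:", systemd_private_socket_ok())
```

On a node in the state shown above, this check would be expected to report the socket as missing, matching the "No such file or directory" error from systemctl.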


/var/log/messages (from controller) 
------------------------------------
                                \n  tzdata.noarch 0:2015g-1.el7                                                   \n  util-linux.x86_64 0:2.23.2-22.el7_1.1   
      \n\nComplete!\nyum return code: 0\nStarting cluster node\nStarting Cluster...\nRedirecting to /bin/systemctl start  corosync.service\nFailed to get D-Bu
to socket /run/systemd/private: No such file or directory\n\nERROR overcloud-controller-0 failed to join cluster in 360 seconds\n", "deploy_stderr": "Non-fata
m package glusterfs-3.7.1-16.el7.x86_64\nNon-fatal POSTUN scriptlet failure in rpm package glusterfs-3.6.0.29-2.el7.x86_64\nError: unable to start corosync\nE
unning on this node\nError: cluster is not currently running on this node\nError: cluster is not currently running on this node\nError: cluster is not current
cluster is not currently running on this node\nError: cluster is not currently running on this node\nError: cluster is not currently running on this node\nErr
ning on this node\nError: cluster is not currently running on this node\nError: cluster is not currently running on this node\nError: cluster is not currently
uster is not currently running on this node\nError: cluster is not currently running on this node\nError: cluster is not currently running on this node\nError
ng on this node\nError: cluster is not currently running on this node\nError: cluster is not currently running on this node\nError: cluster is not currently r
ter is not currently running on this node\nError: cluster is not currently running on this node\nError: cluster is not currently running on this node\nError: 
 on this no
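
The excerpt above is hard to read because the deployment output is stored with literal `\n` escapes and the same error repeats many times. A small sketch, assuming only that lines are separated by a literal backslash followed by `n` as seen in the excerpt, splits such text and counts each distinct line. The function name is hypothetical.

```python
from collections import Counter


def summarize_escaped_output(text):
    """Split on literal backslash-n sequences and count each distinct line."""
    lines = [ln.strip() for ln in text.split("\\n")]
    return Counter(ln for ln in lines if ln)


if __name__ == "__main__":
    # Short sample built from messages that appear in the excerpt above.
    sample = (
        "Error: unable to start corosync\\n"
        "Error: cluster is not currently running on this node\\n"
        "Error: cluster is not currently running on this node\\n"
    )
    for line, count in summarize_escaped_output(sample).most_common():
        print(count, line)
```

Applied to the full attached messages file, a summary like this would make it easier to separate the single corosync start failure from the repeated "cluster is not currently running" retries.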
Comment 1 Jan Provaznik 2015-10-27 16:13:52 EDT
I think systemd got into a broken state on the controller node:
[root@overcloud-controller-0 ~]# systemctl 
Failed to get D-Bus connection: Failed to connect to socket /run/systemd/private: No such file or directory

Because systemctl doesn't work, no services can be started either.
Comment 2 Omri Hochman 2015-10-27 16:25 EDT
Created attachment 1087011 [details]
messages

Adding messages file from controller
Comment 3 James Slagle 2015-11-06 13:08:19 EST
I also saw some cluster-related errors during an update attempt:
https://bugzilla.redhat.com/show_bug.cgi?id=1278004

The Puppet reapply is happening due to:
https://bugzilla.redhat.com/show_bug.cgi?id=1278181

It's still unclear, though, why reapplying the Puppet manifests causes these errors.
Comment 5 Udi 2015-12-15 06:01:17 EST
Update from 7.0 to 7.2 is working. Verified.
Comment 7 errata-xmlrpc 2015-12-21 11:57:27 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:2651
