1231970 – Overcloud updates do not apply puppet hieradata changes

Summary: Overcloud updates do not apply puppet hieradata changes

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	rhosp-director
Sub Component:
Version:	unspecified
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	ga
Target Release:	Director
Assignee:	Jay Dobies
QA Contact:	Marius Cornea
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2015-06-15 17:45 UTC by Steven Hardy
Modified:	2023-02-22 23:02 UTC
CC List:	8 users
Fixed In Version:	openstack-tripleo-heat-templates-0.8.6-18.el7ost openstack-tripleo-image-elements-0.9.6-4.el7ost
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2015-08-05 13:53:42 UTC
Target Upstream Version:
Embargoed:
Flags:	yeylon: needinfo+

Attachments
heat deployment-show output (12.20 KB, text/plain) 2015-07-22 22:44 UTC, Marius Cornea

Links
System	ID	Priority	Status	Summary	Last Updated
Launchpad	1463092	None	None	None	Never
OpenStack gerrit	183085	None	MERGED	Remove NO_SIGNAL from puppet role templates	2020-11-16 17:21:08 UTC
OpenStack gerrit	183086	None	MERGED	Remove NO_SIGNAL from ControllerBootstrapNodeDeployment	2020-11-16 17:20:47 UTC
OpenStack gerrit	183087	None	MERGED	Remove NO_SIGNAL from Controller\|ObjectSwiftDeployment	2020-11-16 17:20:47 UTC
OpenStack gerrit	183089	None	MERGED	Remove DefaultSignalTransport from top-level template	2020-11-16 17:20:47 UTC
OpenStack gerrit	188022	None	MERGED	Make CephStorageDeployment depend on NetworkDeployment	2020-11-16 17:20:47 UTC
OpenStack gerrit	190282	None	MERGED	Return derived config id when signalling deployments	2020-11-16 17:20:46 UTC
OpenStack gerrit	191146	None	MERGED	Make puppet-applying *Post resources depend on hieradata	2020-11-16 17:21:09 UTC
Red Hat Product Errata	RHEA-2015:1549	normal	SHIPPED_LIVE	Red Hat Enterprise Linux OpenStack Platform director Release	2015-08-05 17:49:10 UTC

Description Steven Hardy 2015-06-15 17:45:43 UTC

Description of problem:
Currently, updating many of the input parameters that are applied to the deployed nodes via puppet does not trigger re-application of the relevant puppet manifests, so the update appears to work but no changes are made to the deployed nodes.

Version-Release number of selected component (if applicable):


How reproducible:
Always.

Steps to Reproduce:
1. Deploy an overcloud
2. Make a change to an input parameter which is applied to the nodes via puppet hieradata, e.g. change "Debug" to true
3. Observe that /etc/puppet/hieradata on the node reflects the change, but the related system configuration (for example /etc/nova/nova.conf) has not been updated and the services have not been restarted (because the manifest hasn't been reapplied).

Actual results:
No changes are observed.

Expected results:
Overcloud updates should apply config changes related to input parameter updates.


Additional info:


Upstream bug, patches posted: https://bugs.launchpad.net/tripleo/+bug/1463092

Comment 3 Steven Hardy 2015-06-17 06:50:39 UTC

I've posted some patches upstream which aim to address this:

https://review.openstack.org/#/c/190282/
https://review.openstack.org/#/c/191146/

The idea is that the 99-refresh-completed signalling returns the derived config ID (which changes whenever any config input or definition changes) in deploy_stdout. That value is then accessible inside the template, so we can wire in an explicit dependency between the hieradata deployments and the subsequent configs that apply the puppet manifests.
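
As a rough illustration of why passing the derived config ID downstream forces a re-run, here is a minimal Python sketch. It is not Heat's or the 99-refresh-completed hook's actual code: the hashing scheme, the function names and the "hieradata_config_id" input name are all assumptions made for the example. The only behaviour taken from the comment above is that the ID changes whenever a config input or definition changes.

```python
import hashlib
import json


def derived_config_id(config_definition, input_values):
    """Return a digest that changes when the definition or any input changes.

    Hypothetical stand-in for the derived config ID; the real derivation
    is not described in this bug report.
    """
    payload = json.dumps(
        {"config": config_definition, "inputs": input_values},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def puppet_apply_needed(previous_inputs, current_inputs):
    """Model: a deployment re-runs only when its own inputs have changed."""
    return previous_inputs != current_inputs


# Illustrative hieradata config definition (assumed content).
hieradata_config = "hiera: {debug: {get_input: debug}}"
old_id = derived_config_id(hieradata_config, {"debug": True})
new_id = derived_config_id(hieradata_config, {"debug": False})

# Without the wiring, the puppet-applying deployment sees no input change.
unwired = puppet_apply_needed({}, {})

# With the wiring, the hieradata deployment's ID is one of its inputs,
# so a hieradata change also changes the puppet deployment's inputs.
wired = puppet_apply_needed(
    {"hieradata_config_id": old_id},
    {"hieradata_config_id": new_id},
)
```

In this model "unwired" is False and "wired" is True, which mirrors the reported symptom (hieradata updated, manifests not reapplied) and the intended fix.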

I've not had time to heavily test the approach, but initial local tests indicate that it should resolve this issue and allow us to properly reapply the manifests whenever the hieradata changes.

Comment 4 Jan Provaznik 2015-06-25 12:42:36 UTC

Besides the patches above, we also need to backport the NO_SIGNAL patches:
https://review.openstack.org/#/c/183085/2

Otherwise signalling doesn't work as expected. Switching back to ON_DEV until https://review.openstack.org/#/c/183085/2 is backported.

Comment 5 Steven Hardy 2015-06-25 12:56:12 UTC

OK, having chatted with Jan on IRC, we realized there's a series related to NO_SIGNAL and dependencies which is a prerequisite for the fixes referenced above. This whole series should be backported:

https://review.openstack.org/#/c/183085/2
https://review.openstack.org/#/c/188022/
https://review.openstack.org/#/c/183086/
https://review.openstack.org/#/c/183087/
https://review.openstack.org/#/c/183088/
https://review.openstack.org/#/c/183089/

Then, with the two patches in comment #3, the puppet-applying deployments should re-run when deployments modify hieradata.

Comment 7 Steven Hardy 2015-07-22 11:39:37 UTC

Notes on how this might be verified:

1. Create an overcloud with the defaults for all parameters, e.g.

  openstack overcloud deploy --templates

2. Note the value of a parameter, such as "Debug", after the stack is CREATE_COMPLETE

  heat stack-show overcloud | grep Debug

3. Pass a parameter overriding the current value, e.g. set Debug to false

  cat param_env.yaml
  parameters:
    Debug: false

  openstack overcloud deploy --templates -e param_env.yaml

4. Verify the value has been updated and the stack is UPDATE_COMPLETE (this may take a few minutes)

  heat stack-list 
  heat stack-show overcloud | grep Debug

5. Log on to a node, e.g. a controller, and check the various "debug" hieradata values in /etc/puppet/hieradata

  cd /etc/puppet/hieradata
  grep debug ./*

Optionally, also do this after step (2); you should then see the value switch from true to false.

6. Inspect a service configuration file, e.g. /etc/heat/heat.conf, and check that the value has been switched from true to false.
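
Steps 5 and 6 could also be scripted on the node. The following is only a sketch under stated assumptions: the function names are hypothetical, the hieradata files are assumed to hold simple one-line "key: value" entries whose keys end in "debug", and the service config is assumed to be INI-style with the option in [DEFAULT].

```python
import configparser
import re
from pathlib import Path

# Matches simple one-line entries such as "nova::debug: false" (assumed format).
DEBUG_LINE = re.compile(r"^\s*([\w:]*debug)\s*:\s*(\S+)\s*$", re.IGNORECASE)


def hieradata_debug_values(hieradata_dir):
    """Map each key ending in 'debug' found under hieradata_dir to its value."""
    found = {}
    for path in sorted(Path(hieradata_dir).iterdir()):
        if not path.is_file():
            continue
        for line in path.read_text(errors="replace").splitlines():
            match = DEBUG_LINE.match(line)
            if match:
                # Normalise quoting and case so 'False' and false compare equal.
                found[match.group(1)] = match.group(2).strip("'\"").lower()
    return found


def service_debug_value(conf_path):
    """Return the lowercased [DEFAULT] debug option, or None if it is absent."""
    # interpolation=None keeps literal '%' characters in values from raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read(conf_path)
    value = parser.get("DEFAULT", "debug", fallback=None)
    return value.lower() if value is not None else None
```

For example, hieradata_debug_values("/etc/puppet/hieradata") and service_debug_value("/etc/heat/heat.conf") can be compared before and after the update; if the manifests were reapplied, both should report the new value.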

Comment 8 Steven Hardy 2015-07-22 15:36:41 UTC

Ok, apologies, that approach won't actually work, because the oscplugin hard-codes a bunch of parameters which aren't configurable yet and take precedence over the environment parameters. (I'll raise a bug about that).

Instead, we need to alter the hard-coded default for "Debug":

/usr/lib/python2.7/site-packages/rdomanager_oscplugin/v1/overcloud_deploy.py

Change the 'Debug': 'True' line to 'Debug': 'False'

Then run openstack overcloud deploy --templates (no need to pass the env file above).

The remaining validation steps (on the node) remain the same.
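
To illustrate the precedence problem described in this comment, here is a small Python model. It is not the oscplugin's code; the function name and the merge order are assumptions chosen to reproduce the described behaviour, where hard-coded parameters win over values from the environment file.

```python
def effective_parameters(environment_params, hard_coded_params):
    """Merge parameters so that hard-coded values override environment values.

    Hypothetical model of the behaviour described above, not the plugin's
    actual implementation.
    """
    merged = dict(environment_params)
    merged.update(hard_coded_params)
    return merged


# Passing Debug in an environment file has no effect while the
# hard-coded value is still 'True'.
from_env_file = {"Debug": "False"}
hard_coded = {"Debug": "True"}
before_edit = effective_parameters(from_env_file, hard_coded)

# Editing the hard-coded default is what actually changes the result.
hard_coded["Debug"] = "False"
after_edit = effective_parameters({}, hard_coded)
```

Under this model "before_edit" still carries 'True' despite the environment file, which is why the verification steps switch to editing the hard-coded default.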

Comment 9 Marius Cornea 2015-07-22 22:43:47 UTC

OK, I tried changing Debug to False in /usr/lib/python2.7/site-packages/rdomanager_oscplugin/v1/overcloud_deploy.py and then updating using the initial deploy command:

openstack overcloud deploy --control-scale 3 --compute-scale 1 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml --templates

This resulted in an UPDATE_FAILED stack. I'm attaching the output of heat deployment-show. There appears to be an issue with restarting the openstack-nova-novncproxy resources, and some failed actions also show up in the output of pcs status.

On a second run of the deploy command the stack ended up with UPDATE_COMPLETE status, but the openstack-nova-novncproxy resources still show up as unmanaged:

Clone Set: openstack-nova-novncproxy-clone [openstack-nova-novncproxy]
     openstack-nova-novncproxy	(systemd:openstack-nova-novncproxy):	FAILED overcloud-controller-0 (unmanaged) 
     openstack-nova-novncproxy	(systemd:openstack-nova-novncproxy):	FAILED overcloud-controller-2 (unmanaged) 
     openstack-nova-novncproxy	(systemd:openstack-nova-novncproxy):	FAILED overcloud-controller-1 (unmanaged) 

Failed actions:
    openstack-nova-novncproxy_stop_0 on overcloud-controller-0 'OCF_TIMEOUT' (198): call=395, status=Timed Out, exit-reason='none', last-rc-change='Wed Jul 22 18:20:13 2015', queued=12ms, exec=2ms
    openstack-nova-novncproxy_stop_0 on overcloud-controller-0 'OCF_TIMEOUT' (198): call=395, status=Timed Out, exit-reason='none', last-rc-change='Wed Jul 22 18:20:13 2015', queued=12ms, exec=2ms
    neutron-openvswitch-agent_monitor_60000 on overcloud-controller-0 'not running' (7): call=370, status=complete, exit-reason='none', last-rc-change='Wed Jul 22 18:19:31 2015', queued=0ms, exec=0ms
    openstack-nova-api_monitor_60000 on overcloud-controller-2 'OCF_PENDING' (196): call=234, status=complete, exit-reason='none', last-rc-change='Wed Jul 22 18:18:55 2015', queued=0ms, exec=0ms
    openstack-nova-novncproxy_stop_0 on overcloud-controller-2 'OCF_TIMEOUT' (198): call=352, status=Timed Out, exit-reason='none', last-rc-change='Wed Jul 22 18:19:11 2015', queued=12ms, exec=1ms
    openstack-nova-novncproxy_stop_0 on overcloud-controller-2 'OCF_TIMEOUT' (198): call=352, status=Timed Out, exit-reason='none', last-rc-change='Wed Jul 22 18:19:11 2015', queued=12ms, exec=1ms
    neutron-openvswitch-agent_monitor_60000 on overcloud-controller-2 'not running' (7): call=342, status=complete, exit-reason='none', last-rc-change='Wed Jul 22 18:19:31 2015', queued=0ms, exec=0ms
    openstack-nova-api_monitor_60000 on overcloud-controller-1 'OCF_PENDING' (196): call=235, status=complete, exit-reason='none', last-rc-change='Wed Jul 22 18:19:55 2015', queued=0ms, exec=0ms
    openstack-nova-novncproxy_stop_0 on overcloud-controller-1 'OCF_TIMEOUT' (198): call=352, status=Timed Out, exit-reason='none', last-rc-change='Wed Jul 22 18:20:13 2015', queued=13ms, exec=4ms
    openstack-nova-novncproxy_stop_0 on overcloud-controller-1 'OCF_TIMEOUT' (198): call=352, status=Timed Out, exit-reason='none', last-rc-change='Wed Jul 22 18:20:13 2015', queued=13ms, exec=4ms
    neutron-openvswitch-agent_monitor_60000 on overcloud-controller-1 'not running' (7): call=327, status=complete, exit-reason='none', last-rc-change='Wed Jul 22 18:19:31 2015', queued=0ms, exec=0ms

Comment 10 Marius Cornea 2015-07-22 22:44:24 UTC

Created attachment 1055108 [details]
heat deployment-show output

Comment 11 James Slagle 2015-07-23 00:14:39 UTC

What about the change to the Debug value, though? Did that take effect?

Comment 12 Marius Cornea 2015-07-23 08:11:01 UTC

Yes, it did take effect:

[stack@instack ~]$  heat stack-show overcloud | grep Debug
|                       |   "Debug": "False",

Comment 13 Mike Burns 2015-07-23 11:43:45 UTC

Based on these comments, this is verified. The failed update is a separate bug that we need to track and debug separately.

Comment 15 errata-xmlrpc 2015-08-05 13:53:42 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1549
