Bug 1231970 - Overcloud updates do not apply puppet hieradata changes
Summary: Overcloud updates do not apply puppet hieradata changes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ga
Target Release: Director
Assignee: Jay Dobies
QA Contact: Marius Cornea
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-06-15 17:45 UTC by Steven Hardy
Modified: 2016-03-04 04:55 UTC

Fixed In Version: openstack-tripleo-heat-templates-0.8.6-18.el7ost openstack-tripleo-image-elements-0.9.6-4.el7ost
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-08-05 13:53:42 UTC
Target Upstream Version:
yeylon: needinfo+


Attachments
heat deployment-show output (12.20 KB, text/plain)
2015-07-22 22:44 UTC, Marius Cornea


Links
System | ID | Priority | Status | Summary | Last Updated
Launchpad | 1463092 | None | None | None | Never
OpenStack gerrit | 183085 | None | None | None | Never
OpenStack gerrit | 183086 | None | None | None | Never
OpenStack gerrit | 183087 | None | None | None | Never
OpenStack gerrit | 183089 | None | None | None | Never
OpenStack gerrit | 188022 | None | None | None | Never
OpenStack gerrit | 190282 | None | None | None | Never
OpenStack gerrit | 191146 | None | None | None | Never
Red Hat Product Errata | RHEA-2015:1549 | normal | SHIPPED_LIVE | Red Hat Enterprise Linux OpenStack Platform director Release | 2015-08-05 17:49:10 UTC

Description Steven Hardy 2015-06-15 17:45:43 UTC
Description of problem:
Currently, if you attempt to update many input parameters which are applied to the deployed nodes via puppet, those changes do not result in the correct re-application of the relevant puppet manifests, so the update appears to work but no changes are made to the deployed nodes.

Version-Release number of selected component (if applicable):


How reproducible:
Always.

Steps to Reproduce:
1. Deploy an overcloud
2. Make a change to an input parameter which is applied to the nodes via puppet hieradata, e.g. change "Debug" to true
3. Observe that /etc/puppet/hieradata on the node reflects the change, but the related system configuration (for example /etc/nova/nova.conf) has not been updated and the services have not been restarted (because the manifest hasn't been reapplied).

Actual results:
No changes are observed.

Expected results:
Overcloud updates should apply config changes related to input parameter updates.


Additional info:


Upstream bug, patches posted: https://bugs.launchpad.net/tripleo/+bug/1463092

Comment 3 Steven Hardy 2015-06-17 06:50:39 UTC
I've posted some patches upstream which aim to address this:

https://review.openstack.org/#/c/190282/
https://review.openstack.org/#/c/191146/

The idea is that the 99-refresh-completed signalling returns the derived config ID (which changes every time any config input or definition changes) in the deploy_stdout, which is then accessible inside the template so we can wire in the explicit dependency between the hieradata deployments and subsequent puppet manifest applying configs.

I've not had time to heavily test the approach, but initial local tests indicate that it should resolve this issue and allow us to properly reapply the manifests whenever the hieradata changes.
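To illustrate the idea in this comment, here is a minimal sketch of an ID that changes whenever a config definition or any of its inputs changes. It is only a stand-in: the hashing scheme and the function names (derived_config_id, needs_manifest_reapply) are assumptions made for illustration, not how Heat actually computes its derived config ID.

```python
import hashlib
import json


def derived_config_id(config_definition, input_values):
    """Return a digest that changes when the definition or any input changes.

    Illustrative stand-in only; Heat computes its own derived config ID.
    """
    payload = json.dumps(
        {"config": config_definition, "inputs": input_values},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_manifest_reapply(previous_id, current_id):
    """A dependent puppet step would re-run only when the ID has changed."""
    return previous_id != current_id


# Hypothetical definition name and inputs, used only to exercise the sketch.
before = derived_config_id("hieradata-template", {"debug": True})
after = derived_config_id("hieradata-template", {"debug": False})
unchanged = derived_config_id("hieradata-template", {"debug": True})
```

Because the manifest-applying config takes this value as an input, a hieradata change gives it a new input and so triggers a re-run, while an unchanged hieradata deployment leaves it alone.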

Comment 4 Jan Provaznik 2015-06-25 12:42:36 UTC
Besides the patches above, we also need to backport the NO_SIGNAL patches:
https://review.openstack.org/#/c/183085/2

Otherwise signalling doesn't work as expected - switching back to ON_DEV until https://review.openstack.org/#/c/183085/2 is backported.

Comment 5 Steven Hardy 2015-06-25 12:56:12 UTC
OK, having chatted with Jan on IRC, we realized there's a series related to NO_SIGNAL and dependencies which is a prerequisite to the fixes referenced above. This whole series should be backported:

https://review.openstack.org/#/c/183085/2
https://review.openstack.org/#/c/188022/
https://review.openstack.org/#/c/183086/
https://review.openstack.org/#/c/183087/
https://review.openstack.org/#/c/183088/
https://review.openstack.org/#/c/183089/

Then, with the two patches in comment #3, the deployments that apply puppet should re-run when deployments modify hieradata.

Comment 7 Steven Hardy 2015-07-22 11:39:37 UTC
Notes on how this might be verified:

1. Create an overcloud with the defaults for all parameters, e.g.

  openstack overcloud deploy --templates

2. Note the value of a parameter, such as "Debug", after the stack is CREATE_COMPLETE

  heat stack-show overcloud | grep Debug

3. Pass a parameter overriding the current value, e.g. set Debug to false

  cat param_env.yaml
  parameters:
    Debug: false

  openstack overcloud deploy --templates -e param_env.yaml

4. Verify the value has been updated and the stack is UPDATE_COMPLETE (this may take a few minutes)

  heat stack-list 
  heat stack-show overcloud | grep Debug

5. Log on to a node, e.g. a controller, and check the status of the various "debug" hieradata values in /etc/puppet/hieradata

  cd /etc/puppet/hieradata
  grep debug ./*

Optionally, this could also be done after step (2); you should then see the value switch from true to false.

6. Inspect a service configuration file, e.g. /etc/heat/heat.conf, and check that the value has been switched from true to false.
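
The node-side checks in steps 5 and 6 could be scripted roughly as follows. This is a sketch under assumptions: the sample hieradata keys and config contents are illustrative, not taken from a real node, and the helper names are hypothetical. On a node you would read the text from the files under /etc/puppet/hieradata and from /etc/heat/heat.conf instead.

```python
import configparser


def hieradata_debug_lines(hieradata_text):
    """Return the lines mentioning 'debug', like `grep debug` on one file."""
    return [line for line in hieradata_text.splitlines() if "debug" in line]


def service_debug_value(conf_text, section="DEFAULT"):
    """Read the debug option from an INI-style service configuration."""
    # strict=False tolerates repeated options; interpolation=None keeps
    # literal '%' characters in values from being interpreted.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.get(section, "debug", fallback=None)


# Illustrative sample data (assumed, not copied from a deployed node).
SAMPLE_HIERADATA = (
    "heat::debug: false\n"
    "heat::rabbit_port: 5672\n"
    "nova::debug: false\n"
)
SAMPLE_CONF = "[DEFAULT]\ndebug = False\nverbose = True\n"

debug_lines = hieradata_debug_lines(SAMPLE_HIERADATA)
conf_debug = service_debug_value(SAMPLE_CONF)
```

If the hieradata shows the new value but the service configuration still shows the old one, the manifest was not reapplied, which is the failure this bug describes.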

Comment 8 Steven Hardy 2015-07-22 15:36:41 UTC
Ok, apologies, that approach won't actually work, because the oscplugin hard-codes a bunch of parameters which aren't configurable yet and take precedence over the environment parameters. (I'll raise a bug about that).

Instead, we need to alter the hard-coded default for "Debug":

/usr/lib/python2.7/site-packages/rdomanager_oscplugin/v1/overcloud_deploy.py

Change the 'Debug': 'True' line to 'Debug': 'False'

then do openstack overcloud deploy --templates (no need to pass the env file above).

The remaining validation steps (on the node) remain the same.
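
The precedence problem described in this comment can be modelled with a small sketch. It is not the plugin's code: effective_parameters is a hypothetical helper that only mirrors the described behaviour, where hard-coded values are applied last and therefore override anything supplied through an environment file.

```python
def effective_parameters(environment_params, hardcoded_params):
    """Merge parameters so that hard-coded values win over environment ones."""
    merged = dict(environment_params)
    merged.update(hardcoded_params)  # applied last, so these take precedence
    return merged


# Values from param_env.yaml versus the hard-coded default in the plugin.
from_env_file = {"Debug": "false"}
hardcoded = {"Debug": "True"}

result = effective_parameters(from_env_file, hardcoded)
```

With this ordering the environment file cannot change Debug, which is why the hard-coded default itself has to be edited for the verification.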

Comment 9 Marius Cornea 2015-07-22 22:43:47 UTC
Ok, I tried changing Debug to False in /usr/lib/python2.7/site-packages/rdomanager_oscplugin/v1/overcloud_deploy.py and then updated using the initial deploy command.

openstack overcloud deploy --control-scale 3 --compute-scale 1 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml --templates

It resulted in an UPDATE_FAILED stack. Attaching the output of heat deployment-show. There looks to be an issue with restarting the openstack-nova-novncproxy resources, and some failed actions also show up in the output of pcs status.

At a 2nd run of the deploy command the stack ended up with UPDATE_COMPLETE status but the openstack-nova-novncproxy resources still show up as unmanaged. 

Clone Set: openstack-nova-novncproxy-clone [openstack-nova-novncproxy]
     openstack-nova-novncproxy	(systemd:openstack-nova-novncproxy):	FAILED overcloud-controller-0 (unmanaged) 
     openstack-nova-novncproxy	(systemd:openstack-nova-novncproxy):	FAILED overcloud-controller-2 (unmanaged) 
     openstack-nova-novncproxy	(systemd:openstack-nova-novncproxy):	FAILED overcloud-controller-1 (unmanaged) 

Failed actions:
    openstack-nova-novncproxy_stop_0 on overcloud-controller-0 'OCF_TIMEOUT' (198): call=395, status=Timed Out, exit-reason='none', last-rc-change='Wed Jul 22 18:20:13 2015', queued=12ms, exec=2ms
    openstack-nova-novncproxy_stop_0 on overcloud-controller-0 'OCF_TIMEOUT' (198): call=395, status=Timed Out, exit-reason='none', last-rc-change='Wed Jul 22 18:20:13 2015', queued=12ms, exec=2ms
    neutron-openvswitch-agent_monitor_60000 on overcloud-controller-0 'not running' (7): call=370, status=complete, exit-reason='none', last-rc-change='Wed Jul 22 18:19:31 2015', queued=0ms, exec=0ms
    openstack-nova-api_monitor_60000 on overcloud-controller-2 'OCF_PENDING' (196): call=234, status=complete, exit-reason='none', last-rc-change='Wed Jul 22 18:18:55 2015', queued=0ms, exec=0ms
    openstack-nova-novncproxy_stop_0 on overcloud-controller-2 'OCF_TIMEOUT' (198): call=352, status=Timed Out, exit-reason='none', last-rc-change='Wed Jul 22 18:19:11 2015', queued=12ms, exec=1ms
    openstack-nova-novncproxy_stop_0 on overcloud-controller-2 'OCF_TIMEOUT' (198): call=352, status=Timed Out, exit-reason='none', last-rc-change='Wed Jul 22 18:19:11 2015', queued=12ms, exec=1ms
    neutron-openvswitch-agent_monitor_60000 on overcloud-controller-2 'not running' (7): call=342, status=complete, exit-reason='none', last-rc-change='Wed Jul 22 18:19:31 2015', queued=0ms, exec=0ms
    openstack-nova-api_monitor_60000 on overcloud-controller-1 'OCF_PENDING' (196): call=235, status=complete, exit-reason='none', last-rc-change='Wed Jul 22 18:19:55 2015', queued=0ms, exec=0ms
    openstack-nova-novncproxy_stop_0 on overcloud-controller-1 'OCF_TIMEOUT' (198): call=352, status=Timed Out, exit-reason='none', last-rc-change='Wed Jul 22 18:20:13 2015', queued=13ms, exec=4ms
    openstack-nova-novncproxy_stop_0 on overcloud-controller-1 'OCF_TIMEOUT' (198): call=352, status=Timed Out, exit-reason='none', last-rc-change='Wed Jul 22 18:20:13 2015', queued=13ms, exec=4ms
    neutron-openvswitch-agent_monitor_60000 on overcloud-controller-1 'not running' (7): call=327, status=complete, exit-reason='none', last-rc-change='Wed Jul 22 18:19:31 2015', queued=0ms, exec=0ms
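
The failed actions above contain repeated lines, so a per-node summary can make them easier to read. The following is a minimal sketch that assumes the line layout shown in this comment; the regular expression and the summarize_failed_actions helper are illustrative and not part of pcs.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   <action> on <node> '<result>' (<code>): call=..., status=...
FAILED_ACTION = re.compile(
    r"^\s*(?P<action>\S+) on (?P<node>\S+) '(?P<result>[^']+)' \((?P<code>\d+)\)"
)


def summarize_failed_actions(lines):
    """Group unique (action, result) pairs by node, dropping repeated lines."""
    by_node = defaultdict(set)
    for line in lines:
        match = FAILED_ACTION.match(line)
        if match:
            by_node[match.group("node")].add(
                (match.group("action"), match.group("result"))
            )
    return {node: sorted(actions) for node, actions in by_node.items()}


# The three overcloud-controller-0 lines from the output above.
sample = [
    "    openstack-nova-novncproxy_stop_0 on overcloud-controller-0 'OCF_TIMEOUT' (198): call=395, status=Timed Out, exit-reason='none', last-rc-change='Wed Jul 22 18:20:13 2015', queued=12ms, exec=2ms",
    "    openstack-nova-novncproxy_stop_0 on overcloud-controller-0 'OCF_TIMEOUT' (198): call=395, status=Timed Out, exit-reason='none', last-rc-change='Wed Jul 22 18:20:13 2015', queued=12ms, exec=2ms",
    "    neutron-openvswitch-agent_monitor_60000 on overcloud-controller-0 'not running' (7): call=370, status=complete, exit-reason='none', last-rc-change='Wed Jul 22 18:19:31 2015', queued=0ms, exec=0ms",
]

summary = summarize_failed_actions(sample)
```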

Comment 10 Marius Cornea 2015-07-22 22:44:24 UTC
Created attachment 1055108 [details]
heat deployment-show output

Comment 11 James Slagle 2015-07-23 00:14:39 UTC
What about the change to the debug value though? Did that take effect?

Comment 12 Marius Cornea 2015-07-23 08:11:01 UTC
Yes, it did take effect:

[stack@instack ~]$  heat stack-show overcloud | grep Debug
|                       |   "Debug": "False",

Comment 13 Mike Burns 2015-07-23 11:43:45 UTC
Based on these comments, this is verified. The failed update is a separate bug that we need to track and debug separately.

Comment 15 errata-xmlrpc 2015-08-05 13:53:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1549

