Description of problem:
Currently, for many of the input parameters that are applied to the deployed nodes via puppet, an update does not result in the relevant puppet manifests being re-applied, so the update appears to work but no changes are made to the deployed nodes.

Version-Release number of selected component (if applicable):

How reproducible:
Always.

Steps to Reproduce:
1. Deploy an overcloud.
2. Make a change to an input parameter which is applied to the nodes via puppet hieradata, e.g. change "Debug" to true.
3. Observe that /etc/puppet/hieradata on the node reflects the change, but the related system configuration (for example /etc/nova/nova.conf) has not been updated and the services have not been restarted (because the manifest hasn't been reapplied).

Actual results:
No changes are observed.

Expected results:
Overcloud updates should apply config changes related to input parameter updates.

Additional info:
Upstream bug, patches posted: https://bugs.launchpad.net/tripleo/+bug/1463092
I've posted some patches upstream which aim to address this:

https://review.openstack.org/#/c/190282/
https://review.openstack.org/#/c/191146/

The idea is that the 99-refresh-completed signalling returns the derived config ID (which changes every time any config input or definition changes) in the deploy_stdout. That value is then accessible inside the template, so we can wire in an explicit dependency between the hieradata deployments and the subsequent configs that apply the puppet manifests. I've not had time to test the approach heavily, but initial local tests indicate that it should resolve this issue and allow us to properly reapply the manifests whenever the hieradata changes.
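To illustrate why passing the derived config ID along creates the dependency, here is a minimal sketch. It is not Heat's implementation: `derived_config_id`, `needs_reapply` and the `update_identifier` input name are hypothetical, and a SHA-256 digest simply stands in for "an ID that changes whenever any config input or definition changes".

```python
import hashlib
import json


def derived_config_id(definition, inputs):
    """Return a digest that changes when the definition or any input changes.

    Hypothetical stand-in for the derived config ID; Heat computes its own.
    """
    payload = json.dumps({"definition": definition, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_reapply(previous_id, current_id):
    """A deployment is re-run only when its derived ID differs from last time."""
    return previous_id != current_id


# Hieradata deployment before and after changing the "Debug" parameter.
hieradata_before = derived_config_id("hieradata", {"Debug": "True"})
hieradata_after = derived_config_id("hieradata", {"Debug": "False"})

# The manifest definition itself is unchanged, but it takes the hieradata ID
# as an input (assumed name: update_identifier), so its own ID changes too.
manifest_before = derived_config_id("puppet-manifest", {"update_identifier": hieradata_before})
manifest_after = derived_config_id("puppet-manifest", {"update_identifier": hieradata_after})

# Without that wiring, the manifest deployment sees identical inputs.
manifest_unwired = derived_config_id("puppet-manifest", {})

print(needs_reapply(manifest_before, manifest_after))
print(needs_reapply(manifest_unwired, manifest_unwired))
```

The second comparison models the behaviour described in this bug: with no input tied to the hieradata, nothing about the manifest deployment changes and it is never re-run.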
Besides the patches above, we also need to backport the NO_SIGNAL patches: https://review.openstack.org/#/c/183085/2 Otherwise signalling doesn't work as expected. Switching back to ON_DEV until https://review.openstack.org/#/c/183085/2 is backported.
Ok, having chatted to Jan on IRC, we realized there's a series related to NO_SIGNAL and dependencies which is a prerequisite to the fixes referenced above. This whole series should be backported:

https://review.openstack.org/#/c/183085/2
https://review.openstack.org/#/c/188022/
https://review.openstack.org/#/c/183086/
https://review.openstack.org/#/c/183087/
https://review.openstack.org/#/c/183088/
https://review.openstack.org/#/c/183089/

Then, with the two patches in comment #3, the deployments that apply puppet should re-run when deployments modify hieradata.
Notes on how this might be verified:

1. Create an overcloud with the defaults for all parameters, e.g.

   openstack overcloud deploy --templates

2. Note the value of a parameter, such as "Debug", after the stack is CREATE_COMPLETE:

   heat stack-show overcloud | grep Debug

3. Pass a parameter overriding the current value, e.g. set Debug to false:

   cat param_env.yaml
   parameters:
     Debug: false

   openstack overcloud deploy --templates -e param_env.yaml

4. Verify the value has been updated and the stack is UPDATE_COMPLETE (this may take a few minutes):

   heat stack-list
   heat stack-show overcloud | grep Debug

5. Log on to e.g. a controller node, and check the status of the various "debug" hieradata values in /etc/puppet/hieradata:

   cd /etc/puppet/hieradata
   grep debug ./*

   Optionally this could also be done after step (2); you should see the value switch from true to false.

6. Inspect a service configuration file, e.g. /etc/heat/heat.conf, and see that the value has been switched from true to false.
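The on-node checks in steps 5 and 6 could be scripted roughly as follows. This is only a sketch: the helper names are hypothetical, and it assumes the service config is INI-style with the option named "debug" in the [DEFAULT] section, which should be confirmed for the service being checked.

```python
import configparser
from pathlib import Path


def hieradata_debug_lines(hieradata_dir):
    """Return (file name, line) pairs for each hieradata line containing 'debug'.

    Mirrors `grep debug ./*`: case-sensitive, top-level files only.
    """
    matches = []
    for path in sorted(Path(hieradata_dir).iterdir()):
        if not path.is_file():
            continue
        for line in path.read_text(errors="replace").splitlines():
            if "debug" in line:
                matches.append((path.name, line.strip()))
    return matches


def service_debug_value(conf_path, section="DEFAULT", option="debug"):
    """Return the option as a string from an INI-style service config, or None.

    Assumes the option lives in [DEFAULT]; a missing file also yields None.
    """
    # interpolation=None avoids errors on literal '%' in values;
    # strict=False tolerates duplicated options.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read(conf_path)
    return parser.get(section, option, fallback=None)
```

On a controller node this might be called as hieradata_debug_lines("/etc/puppet/hieradata") and service_debug_value("/etc/heat/heat.conf"). If the hieradata shows the new value but the service config still shows the old one, the manifest has not been reapplied.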
Ok, apologies, that approach won't actually work, because the oscplugin hard-codes a bunch of parameters which aren't configurable yet and take precedence over the environment parameters. (I'll raise a bug about that.)

Instead, we need to alter the hard-coded default for "Debug" in:

   /usr/lib/python2.7/site-packages/rdomanager_oscplugin/v1/overcloud_deploy.py

Change the 'Debug': 'True' line to 'Debug': 'False', then run:

   openstack overcloud deploy --templates

(no need to pass the env file above). The remaining validation steps (on the node) remain the same.
Ok, I tried changing Debug to False in /usr/lib/python2.7/site-packages/rdomanager_oscplugin/v1/overcloud_deploy.py and then updated using the initial deploy command:

openstack overcloud deploy --control-scale 3 --compute-scale 1 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml --templates

It resulted in an UPDATE_FAILED stack. Attaching the output of heat deployment show. There looks to be an issue with restarting the openstack-nova-novncproxy resources, and some failed actions also show up in the output of pcs status.

On a 2nd run of the deploy command the stack ended up with UPDATE_COMPLETE status, but the openstack-nova-novncproxy resources still show up as unmanaged.

 Clone Set: openstack-nova-novncproxy-clone [openstack-nova-novncproxy]
     openstack-nova-novncproxy (systemd:openstack-nova-novncproxy): FAILED overcloud-controller-0 (unmanaged)
     openstack-nova-novncproxy (systemd:openstack-nova-novncproxy): FAILED overcloud-controller-2 (unmanaged)
     openstack-nova-novncproxy (systemd:openstack-nova-novncproxy): FAILED overcloud-controller-1 (unmanaged)

Failed actions:
    openstack-nova-novncproxy_stop_0 on overcloud-controller-0 'OCF_TIMEOUT' (198): call=395, status=Timed Out, exit-reason='none', last-rc-change='Wed Jul 22 18:20:13 2015', queued=12ms, exec=2ms
    openstack-nova-novncproxy_stop_0 on overcloud-controller-0 'OCF_TIMEOUT' (198): call=395, status=Timed Out, exit-reason='none', last-rc-change='Wed Jul 22 18:20:13 2015', queued=12ms, exec=2ms
    neutron-openvswitch-agent_monitor_60000 on overcloud-controller-0 'not running' (7): call=370, status=complete, exit-reason='none', last-rc-change='Wed Jul 22 18:19:31 2015', queued=0ms, exec=0ms
    openstack-nova-api_monitor_60000 on overcloud-controller-2 'OCF_PENDING' (196): call=234, status=complete, exit-reason='none', last-rc-change='Wed Jul 22 18:18:55 2015', queued=0ms, exec=0ms
    openstack-nova-novncproxy_stop_0 on overcloud-controller-2 'OCF_TIMEOUT' (198): call=352, status=Timed Out, exit-reason='none', last-rc-change='Wed Jul 22 18:19:11 2015', queued=12ms, exec=1ms
    openstack-nova-novncproxy_stop_0 on overcloud-controller-2 'OCF_TIMEOUT' (198): call=352, status=Timed Out, exit-reason='none', last-rc-change='Wed Jul 22 18:19:11 2015', queued=12ms, exec=1ms
    neutron-openvswitch-agent_monitor_60000 on overcloud-controller-2 'not running' (7): call=342, status=complete, exit-reason='none', last-rc-change='Wed Jul 22 18:19:31 2015', queued=0ms, exec=0ms
    openstack-nova-api_monitor_60000 on overcloud-controller-1 'OCF_PENDING' (196): call=235, status=complete, exit-reason='none', last-rc-change='Wed Jul 22 18:19:55 2015', queued=0ms, exec=0ms
    openstack-nova-novncproxy_stop_0 on overcloud-controller-1 'OCF_TIMEOUT' (198): call=352, status=Timed Out, exit-reason='none', last-rc-change='Wed Jul 22 18:20:13 2015', queued=13ms, exec=4ms
    openstack-nova-novncproxy_stop_0 on overcloud-controller-1 'OCF_TIMEOUT' (198): call=352, status=Timed Out, exit-reason='none', last-rc-change='Wed Jul 22 18:20:13 2015', queued=13ms, exec=4ms
    neutron-openvswitch-agent_monitor_60000 on overcloud-controller-1 'not running' (7): call=327, status=complete, exit-reason='none', last-rc-change='Wed Jul 22 18:19:31 2015', queued=0ms, exec=0ms
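Since several of the failed actions above are listed twice, a small helper can reduce the listing to distinct (action, node, status, return code) entries when triaging. This is a sketch only: the regular expression is written against the lines pasted in this comment and is an assumption about the format, not a general parser for pcs status output.

```python
import re

# Matches entries such as:
#   openstack-nova-novncproxy_stop_0 on overcloud-controller-0 'OCF_TIMEOUT' (198): ...
FAILED_ACTION = re.compile(
    r"(?P<action>\S+) on (?P<node>\S+) '(?P<status>[^']+)' \((?P<rc>\d+)\)"
)


def failed_actions(pcs_status_text):
    """Return distinct (action, node, status, rc) tuples in first-seen order."""
    seen = []
    for match in FAILED_ACTION.finditer(pcs_status_text):
        entry = (match["action"], match["node"], match["status"], int(match["rc"]))
        if entry not in seen:
            seen.append(entry)
    return seen
```

Feeding it the "Failed actions" section pasted above would collapse the repeated openstack-nova-novncproxy_stop_0 entries to one per controller.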
Created attachment 1055108
heat deployment-show output
What about the change to the debug value, though? Did that take effect?
Yes, it did take effect:

[stack@instack ~]$ heat stack-show overcloud | grep Debug
| | "Debug": "False",
Based on these comments, this is verified. The failed update is a separate bug that we need to track and debug separately.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2015:1549