Description of problem:
Puppet runs during scale out when --skip-deploy-identifier is used

Version-Release number of selected component (if applicable):
python-tripleoclient-9.2.6-2.el7ost.noarch
openstack-tripleo-common-8.6.6-2.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy a split-stack overcloud with 3 controllers + 3 computes + 3 ceph nodes
2. On compute-0, check: journalctl -u os-collect-config | grep "Run puppet host configuration for step" | wc -l
3. Remove one compute node
4. openstack overcloud node delete
5. Re-run overcloud deploy with the initial number of nodes
6. Wait for the deploy to succeed.
7. On compute-0, check: journalctl -u os-collect-config | grep "Run puppet host configuration for step" | wc -l

Actual results:
Double the count from step 2

Expected results:
Same number of occurrences as in step 2

Additional info:
Attaching job artifacts.
During a scale out it would change some of the IP/host lists though right? So wouldn't you expect puppet to actually run during this time?
(In reply to Dan Prince from comment #2)
> During a scale out it would change some of the IP/host lists though right?
> So wouldn't you expect puppet to actually run during this time?

We had a chat with Alex Gurenko (who originally tested this), and the expectation is that puppet won't run on existing compute nodes. The same test also used to pass previously, so how can we proceed to determine what's going on?
James do you know if puppet runs are expected on existing nodes during scale-out?
(In reply to Steve Baker from comment #4)
> James do you know if puppet runs are expected on existing nodes during
> scale-out?

Assuming no other changes, if --skip-deploy-identifier is passed then puppet should not be run on existing nodes. Nothing in the SoftwareConfig should change in that case, so Heat wouldn't trigger a new SoftwareDeployment to run puppet. If that's no longer the case, then either there is a bug around --skip-deploy-identifier, or something changed in the templates such that Heat is detecting a change in the SoftwareConfig during scale out even when --skip-deploy-identifier is passed.
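To make that expectation concrete, here is a minimal sketch of the decision being described, assuming a simplified model in which a deployment is only re-applied when the config it points at actually changes. The helper names are made up; the real check is done inside Heat when it decides whether to create a new SoftwareDeployment.

import hashlib
import json

def config_identity(software_config):
    # Hypothetical stand-in for whatever identity Heat derives
    # for a SoftwareConfig resource.
    return hashlib.sha256(
        json.dumps(software_config, sort_keys=True).encode()
    ).hexdigest()

def puppet_should_rerun(previous_config, current_config):
    # With --skip-deploy-identifier and no other changes, the two configs
    # should be identical, so no new SoftwareDeployment is created and
    # puppet does not re-run on existing nodes.
    return config_identity(previous_config) != config_identity(current_config)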
I'm not spotting any reason why the configs would be re-run. The deployment identifier is correctly being set to "". The only thing I'm noticing is that the key ordering of the SoftwareConfig JSON seems to change between runs. Rabi, is data ordering taken into consideration when determining whether a software config needs to be re-applied?
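Purely as an illustration of why serialization order could matter (whether Heat actually compares or hashes the raw JSON text rather than the parsed data is exactly the open question here):

import json

config_a = {'step': 5, 'hosts': ['compute-0', 'compute-1']}
config_b = {'hosts': ['compute-0', 'compute-1'], 'step': 5}

# Identical as data structures...
assert config_a == config_b

# ...but the serialized text differs unless keys are sorted, so anything
# that hashes or compares the raw JSON would see a "change" where none exists.
print(json.dumps(config_a) == json.dumps(config_b))   # False
print(json.dumps(config_a, sort_keys=True)
      == json.dumps(config_b, sort_keys=True))        # True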
AFAIK os-refresh-config does not make that decision based on the config data. If the config_id has changed [1] (which I assume is the case here), it will try to re-apply the config. We seem to generate a new derived config every time something changes for a deployment.

Are we sure the DeployIdentifier parameter is being set to '' both before and after? I suspect [2] is a backward-incompatible change and has broken this. If the client was upgraded in between, and the re-deploy was then run with --skip-deploy-identifier, it would reset DeployIdentifier from a unique value to '' and create new configs that would be re-deployed.

[1] https://github.com/openstack/heat-agents/blob/master/heat-config/os-refresh-config/configure.d/55-heat-config#L138
[2] https://review.openstack.org/#/c/583079/1/tripleoclient/v1/overcloud_deploy.py
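A rough sketch of the suspected post-[2] behaviour, using a hypothetical function name; the real logic lives in tripleoclient's overcloud_deploy.py and may differ in detail:

import uuid

def deploy_identifier_param(skip_deploy_identifier):
    # Assumed post-change behaviour: the parameter is always sent, and is
    # reset to '' when --skip-deploy-identifier is passed. On a stack that
    # previously stored a unique value, that reset is itself a parameter
    # change, so new derived configs get generated and re-applied on every
    # existing node.
    value = '' if skip_deploy_identifier else uuid.uuid4().hex
    return {'DeployIdentifier': value}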
This does seem broken, as --skip-deploy-identifier would then behave the opposite way on an update. I've submitted a patch that will probably fix this: https://review.openstack.org/#/c/631204/
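For context, one plausible shape of such a fix, again sketched with a hypothetical function; this is an assumption about the general direction, not the content of the linked patch:

import uuid

def deploy_identifier_param(skip_deploy_identifier):
    # Assumed fix direction: when skipping, don't touch the parameter at
    # all, so Heat keeps whatever value the stack already stores and the
    # derived configs stay unchanged; otherwise set a fresh unique value.
    if skip_deploy_identifier:
        return {}
    return {'DeployIdentifier': uuid.uuid4().hex}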
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0448