Description of problem:
Puppet runs during scale out when --skip-deploy-identifier is used

Version-Release number of selected component (if applicable):
python-tripleoclient-9.2.6-2.el7ost.noarch
openstack-tripleo-common-8.6.6-2.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy a split-stack overcloud with 3 controllers + 3 computes + 3 ceph nodes
2. On compute-0, check: journalctl -u os-collect-config | grep "Run puppet host configuration for step" | wc -l
3. Remove one compute node
4. openstack overcloud node delete
5. Re-run overcloud deploy with the initial number of nodes
6. Wait for the deploy to succeed.
7. On compute-0, check: journalctl -u os-collect-config | grep "Run puppet host configuration for step" | wc -l

Actual results:
Double the count from step 2

Expected results:
Same number of occurrences as in step 2

Additional info:
Attaching job artifacts.
During a scale out it would change some of the IP/host lists though right? So wouldn't you expect puppet to actually run during this time?
(In reply to Dan Prince from comment #2)
> During a scale out it would change some of the IP/host lists though right?
> So wouldn't you expect puppet to actually run during this time?

We had a chat with Alex Gurenko (who originally tested this), and the expectation is that puppet won't run on existing compute nodes. The same test also used to pass previously, so how can we proceed to determine what's going on?
James do you know if puppet runs are expected on existing nodes during scale-out?
(In reply to Steve Baker from comment #4)
> James do you know if puppet runs are expected on existing nodes during
> scale-out?

Assuming no other changes, if --skip-deploy-identifier is passed then puppet should not be run on existing nodes. Nothing in the SoftwareConfig should change in that case, so Heat wouldn't trigger a new SoftwareDeployment to run puppet. If that's no longer the case, then either there is a bug around --skip-deploy-identifier, or something changed in the templates such that Heat is detecting a change in the SoftwareConfig during scale out even when --skip-deploy-identifier is passed.
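To make that expectation concrete, here is a minimal sketch of the decision being described, assuming a simplified model in which a deployment is only re-applied when the config it points at actually changes. The helper names are made up; the real check is done inside Heat when it decides whether to create a new SoftwareDeployment.

import hashlib
import json

def config_identity(software_config):
    # Hypothetical stand-in for whatever identity Heat derives
    # for a SoftwareConfig resource.
    return hashlib.sha256(
        json.dumps(software_config, sort_keys=True).encode()
    ).hexdigest()

def puppet_should_rerun(previous_config, current_config):
    # With --skip-deploy-identifier and no other changes, the two configs
    # should be identical, so no new SoftwareDeployment is created and
    # puppet does not re-run on existing nodes.
    return config_identity(previous_config) != config_identity(current_config)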
I'm not spotting any reason why the configs would be re-run. The deployment identifier is correctly being set to "". The only thing I'm noticing is that the key ordering of the SoftwareConfig JSON seems to change between runs. Rabi, is data ordering taken into consideration when determining whether a software config needs to be re-applied?
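Purely as an illustration of why serialization order could matter (whether Heat actually compares or hashes the raw JSON text rather than the parsed data is exactly the open question here):

import json

config_a = {'step': 5, 'hosts': ['compute-0', 'compute-1']}
config_b = {'hosts': ['compute-0', 'compute-1'], 'step': 5}

# Identical as data structures...
assert config_a == config_b

# ...but the serialized text differs unless keys are sorted, so anything
# that hashes or compares the raw JSON would see a "change" where none exists.
print(json.dumps(config_a) == json.dumps(config_b))   # False
print(json.dumps(config_a, sort_keys=True)
      == json.dumps(config_b, sort_keys=True))        # True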
AFAIK os-refresh-config does not make that decision based on the config data. If the config_id has changed [1] (which I assume is the case here), it will try to re-apply the config. We seem to generate a new derived config every time something changes for a deployment.

Are we sure the DeployIdentifier parameter is being set to '' both before and after? I suspect [2] is a backward-incompatible change and has broken this. If the client was upgraded in between, and the re-deploy was then run with --skip-deploy-identifier, it would reset DeployIdentifier from a unique value to '' and create new configs that would be re-deployed.

[1] https://github.com/openstack/heat-agents/blob/master/heat-config/os-refresh-config/configure.d/55-heat-config#L138
[2] https://review.openstack.org/#/c/583079/1/tripleoclient/v1/overcloud_deploy.py
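A rough sketch of the suspected post-[2] behaviour, using a hypothetical function name; the real logic lives in tripleoclient's overcloud_deploy.py and may differ in detail:

import uuid

def deploy_identifier_param(skip_deploy_identifier):
    # Assumed post-change behaviour: the parameter is always sent, and is
    # reset to '' when --skip-deploy-identifier is passed. On a stack that
    # previously stored a unique value, that reset is itself a parameter
    # change, so new derived configs get generated and re-applied on every
    # existing node.
    value = '' if skip_deploy_identifier else uuid.uuid4().hex
    return {'DeployIdentifier': value}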
This does seem broken, as --skip-deploy-identifier would then behave the opposite way on an update. I've submitted a patch that will probably fix this: https://review.openstack.org/#/c/631204/
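For context, one plausible shape of such a fix, again sketched with a hypothetical function; this is an assumption about the general direction, not the content of the linked patch:

import uuid

def deploy_identifier_param(skip_deploy_identifier):
    # Assumed fix direction: when skipping, don't touch the parameter at
    # all, so Heat keeps whatever value the stack already stores and the
    # derived configs stay unchanged; otherwise set a fresh unique value.
    if skip_deploy_identifier:
        return {}
    return {'DeployIdentifier': uuid.uuid4().hex}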
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0448