1493303 – OS::Heat::SoftwareConfig - missing group property can permanently break software deployments and os-collect-config on deployed nodes

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	os-apply-config
Sub Component:
Version:	8.0 (Liberty)
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	z3
Target Release:	13.0 (Queens)
Assignee:	James Slagle
QA Contact:	Gurenko Alex
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1653510 1653512 1653513

Reported:	2017-09-19 21:08 UTC by Matt Flusche
Modified:	2020-12-14 10:08 UTC (History)
CC List:	8 users (show)
Fixed In Version:	os-apply-config-8.3.1-0.20180308131116.62fdfc1.el7.centos
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1653510 1653512 1653513 (view as bug list)
Environment:
Last Closed:	2018-11-09 11:43:17 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
OpenStack gerrit	506328	0	None	MERGED	fixes how os-apply-config handles invalid json	2020-09-16 17:19:48 UTC

Description Matt Flusche 2017-09-19 21:08:36 UTC

Description of problem:

Using the example here:
https://access.redhat.com/documentation/en/red-hat-openstack-platform/8/custom-block-storage-back-end-deployment-guide/custom-block-storage-back-end-deployment-guide/

Section 3.2 - /home/stack/templates/custom-config.yaml

  CinderRestartConfig: # 3
    type: OS::Heat::SoftwareConfig
    properties:
      config: |
        #!/bin/sh
        sudo pcs resource restart openstack-cinder-volume

The property: "group: script" is missing.
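A missing group property like this could be caught before deployment. The following is a minimal sketch, assuming the template has already been parsed into a Python dict (for example by a YAML loader). The helper name find_ungrouped_configs is hypothetical and is not part of Heat or TripleO.

```python
# Illustrative pre-flight check (hypothetical helper, not part of Heat or TripleO):
# report OS::Heat::SoftwareConfig resources that do not set "group".

def find_ungrouped_configs(template):
    """Return the sorted names of SoftwareConfig resources with no 'group' property."""
    missing = []
    for name, resource in template.get("resources", {}).items():
        if resource.get("type") != "OS::Heat::SoftwareConfig":
            continue
        properties = resource.get("properties") or {}
        if "group" not in properties:
            missing.append(name)
    return sorted(missing)


# The resource from the guide, as it would look after parsing the YAML.
template = {
    "resources": {
        "CinderRestartConfig": {
            "type": "OS::Heat::SoftwareConfig",
            "properties": {
                "config": "#!/bin/sh\nsudo pcs resource restart openstack-cinder-volume\n",
            },
        },
    },
}

print(find_ungrouped_configs(template))
```

With "group": "script" added to the properties, the same call returns an empty list.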

Using this resource in a deployment breaks os-collect-config on the deployed nodes and stalls all subsequent software deployments.

Note: a doc bz has been filed to address the missing config:  https://bugzilla.redhat.com/show_bug.cgi?id=1493243

This BZ is for the underlying heat / os-collect-config issues that result from using this misconfigured resource.

Once this software deployment runs, /etc/os-collect-config.conf on the target nodes is broken: the collectors entries and the [cfn] section are gone. This halts any additional software deployments until the stack times out.

[root@overcloud-controller-0 ~]# cat /etc/os-collect-config.conf 
[DEFAULT]
command = os-refresh-config

On the deployed node, the following files are found:

[root@overcloud-controller-0 ~]# ls  /var/lib/os-collect-config/overcloud-ControllerNodesPostDeployment-oyacs2thnul5-ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh.json*
/var/lib/os-collect-config/overcloud-ControllerNodesPostDeployment-oyacs2thnul5-ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh.json
/var/lib/os-collect-config/overcloud-ControllerNodesPostDeployment-oyacs2thnul5-ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh.json.last
/var/lib/os-collect-config/overcloud-ControllerNodesPostDeployment-oyacs2thnul5-ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh.json.orig

[root@overcloud-controller-0 ~]# cat /var/lib/os-collect-config/overcloud-ControllerNodesPostDeployment-oyacs2thnul5-ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh.json
"#!/bin/sh\nsudo pcs resource restart openstack-cinder-volume\n"
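The file content shown above is a JSON string literal, not a JSON object. One plausible reading of the failure is that os-apply-config expects each metadata file to hold an object. The sketch below only illustrates that distinction; is_metadata_object is a hypothetical helper and does not reproduce os-apply-config's actual parsing code.

```python
import json


def is_metadata_object(raw_text):
    """Return True only if raw_text parses as JSON and the top level is an object."""
    try:
        parsed = json.loads(raw_text)
    except ValueError:
        # Not JSON at all.
        return False
    return isinstance(parsed, dict)


# Exact content of the CinderRestartConfig .json file from this report
# (raw string, so the \n sequences stay as JSON escapes).
raw = r'"#!/bin/sh\nsudo pcs resource restart openstack-cinder-volume\n"'

print(type(json.loads(raw)).__name__)   # the top-level JSON type after decoding
print(is_metadata_object(raw))
print(is_metadata_object('{"example": "value"}'))  # placeholder object, not real metadata
```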

Deleting these files and running os-apply-config will temporarily fix the os-collect-config.conf file.

[root@overcloud-controller-0 ~]# rm -f /var/lib/os-collect-config/overcloud-ControllerNodesPostDeployment-oyacs2thnul5-ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh.json*

[root@overcloud-controller-0 ~]# os-apply-config 
[2017/09/19 08:46:50 PM] [INFO] writing /etc/os-net-config/config.json
[2017/09/19 08:46:50 PM] [INFO] writing /var/run/heat-config/heat-config
[2017/09/19 08:46:50 PM] [INFO] writing /etc/puppet/hiera.yaml
[2017/09/19 08:46:50 PM] [INFO] writing /etc/os-collect-config.conf
[2017/09/19 08:46:50 PM] [INFO] success

[root@overcloud-controller-0 ~]# cat /etc/os-collect-config.conf 
[DEFAULT]
command = os-refresh-config
collectors = ec2
collectors = cfn
collectors = local

[cfn]
metadata_url = http://172.16.1.1:8000/v1/
stack_name = overcloud-Controller-q6gd3h72wt4g-0-fnmmcrrzext7
secret_access_key = 89d28cccc33748a78c960ac9fe33d133
access_key_id = fc47334db43c4e778945e295d0d5b85d
path = Controller.Metadata

Here is the heat resource metadata:

[stack@undercloud8 templates]$ heat resource-metadata overcloud-Controller-q6gd3h72wt4g-0-fnmmcrrzext7 Controller
[...]
      "group": "Heat::Ungrouped", 
      "name": "overcloud-ControllerNodesPostDeployment-oyacs2thnul5-ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh", 
      "outputs": [], 
      "creation_time": "2017-09-19T20:31:33", 
      "options": {}, 
      "config": "#!/bin/sh\nsudo pcs resource restart openstack-cinder-volume\n", 
      "id": "10a65d60-3473-4a46-89d7-b0e847e01e46"
[...]

I assume this "Heat::Ungrouped" is not handled properly.
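As far as I know, "Heat::Ungrouped" is the value Heat assigns when a SoftwareConfig omits the group property. A quick way to spot such entries in a node's metadata is sketched below. It assumes the elided output above contains a list of deployment entries shaped like the one shown; that layout, the helper name ungrouped_deployments, and the second sample entry are illustrative assumptions.

```python
def ungrouped_deployments(deployments):
    """Return the names of deployment entries whose group is Heat::Ungrouped."""
    return [d.get("name") for d in deployments
            if d.get("group") == "Heat::Ungrouped"]


deployments = [
    {
        # Entry taken from the resource metadata in this report.
        "group": "Heat::Ungrouped",
        "name": "overcloud-ControllerNodesPostDeployment-oyacs2thnul5-"
                "ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh",
        "config": "#!/bin/sh\nsudo pcs resource restart openstack-cinder-volume\n",
    },
    {
        # Hypothetical well-formed entry, for contrast only.
        "group": "script",
        "name": "hypothetical-well-formed-deployment",
        "config": "#!/bin/sh\ntrue\n",
    },
]

for name in ungrouped_deployments(deployments):
    print(name)
```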

There may be a better way, but this is the workaround I've found:

- cancel deployment
- restart heat-engine
- fix software deployment resource
- run a new deployment and manually signal all software deployment resources (the old resource seems to hang around in the node's metadata until a second deployment completes).
- on all nodes, fix os-collect-config.conf:
    - rm -f /var/lib/os-collect-config/*CinderRestartConfig*
    - os-apply-config
    - cat /etc/os-collect-config.conf # verify correct config
    - systemctl restart os-collect-config
- run deployment again.
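The "verify correct config" step in the workaround above could be automated along these lines. This is a rough sketch: the criteria (at least one collectors entry plus a [cfn] section) are an assumption drawn from the broken and repaired listings in this report, and collect_config_looks_healthy is a hypothetical helper.

```python
def collect_config_looks_healthy(conf_text):
    """True if the text has at least one 'collectors =' line and a [cfn] section."""
    has_collectors = False
    has_cfn_section = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped == "[cfn]":
            has_cfn_section = True
        elif stripped.startswith("collectors") and "=" in stripped:
            has_collectors = True
    return has_collectors and has_cfn_section


# Broken file as shown in this report.
broken = "[DEFAULT]\ncommand = os-refresh-config\n"

# Repaired file as shown in this report (credentials omitted here).
repaired = (
    "[DEFAULT]\n"
    "command = os-refresh-config\n"
    "collectors = ec2\n"
    "collectors = cfn\n"
    "collectors = local\n"
    "\n"
    "[cfn]\n"
    "metadata_url = http://172.16.1.1:8000/v1/\n"
    "path = Controller.Metadata\n"
)

print(collect_config_looks_healthy(broken))
print(collect_config_looks_healthy(repaired))
```

On a node, the text would come from reading /etc/os-collect-config.conf after running os-apply-config.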

Version-Release number of selected component (if applicable):
heat on Director:
openstack-heat-templates-0-0.2.20170112.el7ost.noarch
openstack-heat-api-cfn-5.0.3-2.el7ost.noarch
openstack-heat-engine-5.0.3-2.el7ost.noarch
openstack-heat-api-cloudwatch-5.0.3-2.el7ost.noarch
openstack-heat-api-5.0.3-2.el7ost.noarch
openstack-heat-common-5.0.3-2.el7ost.noarch

os-collect-config on deployed nodes:
os-collect-config-0.1.37-2.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. see above

Actual results:
Heat software deployment is completely broken until manual intervention.

Expected results:
fail friendly...er :)

Additional info:

Comment 2 Matt Flusche 2017-09-20 13:22:24 UTC

Is this something that could be handled better by os-apply-config?

Comment 3 Matt Flusche 2017-09-21 19:14:26 UTC

I submitted the following to address this issue:

https://review.openstack.org/#/c/506328/

Guess I should submit a launchpad bug also.

I tested the patch against this specific issue on OSP 8 and it resolves it.

Comment 4 Zane Bitter 2017-10-12 14:38:45 UTC

The fix to os-apply-config upstream looks plausible to me; changing component.

Comment 7 Steve Baker 2018-11-05 21:54:01 UTC

The fix landed upstream 9 months ago

Comment 10 Lon Hohberger 2018-11-06 11:44:00 UTC

According to our records, this should be resolved by os-apply-config-8.3.1-0.20180308131116.62fdfc1.el7ost.  This build is available now.

Comment 11 Gurenko Alex 2018-11-08 16:29:44 UTC

Verified on puddle 2018-11-07.3

[stack@undercloud-0 ~]$ rpm -q os-apply-config
os-apply-config-8.3.1-0.20180308131116.62fdfc1.el7ost.noarch
