Bug 1493303 - OS::Heat::SoftwareConfig - missing group property can permanently break software deployments and os-collect-config on deployed nodes
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: os-apply-config
Version: 8.0 (Liberty)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: z3
Target Release: 13.0 (Queens)
Assignee: James Slagle
QA Contact: Gurenko Alex
URL:
Whiteboard:
Depends On:
Blocks: 1653510 1653512 1653513
Reported: 2017-09-19 21:08 UTC by Matt Flusche
Modified: 2020-12-14 10:08 UTC (History)

Fixed In Version: os-apply-config-8.3.1-0.20180308131116.62fdfc1.el7.centos
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1653510 1653512 1653513
Environment:
Last Closed: 2018-11-09 11:43:17 UTC
Target Upstream Version:


Links
System ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit 506328 | 0 | None | MERGED | fixes how os-apply-config handles invalid json | 2020-09-16 17:19:48 UTC

Description Matt Flusche 2017-09-19 21:08:36 UTC
Description of problem:

Using the example here:
https://access.redhat.com/documentation/en/red-hat-openstack-platform/8/custom-block-storage-back-end-deployment-guide/custom-block-storage-back-end-deployment-guide/

Section 3.2 - /home/stack/templates/custom-config.yaml

  CinderRestartConfig: # 3
    type: OS::Heat::SoftwareConfig
    properties:
      config: |
        #!/bin/sh
        sudo pcs resource restart openstack-cinder-volume

The property: "group: script" is missing.
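
A missing "group" property is easy to overlook in review, so it can help to check templates before deploying. The following is a minimal sketch of such a check, not part of Heat or TripleO: find_ungrouped_configs is a hypothetical helper, and it assumes the template has already been parsed into a dict (for example with PyYAML's yaml.safe_load, which is not used here). The sample data reproduces the resource from the guide next to a copy that sets "group: script".

```python
# Hypothetical lint helper; not part of Heat or TripleO.
SOFTWARE_CONFIG_TYPE = "OS::Heat::SoftwareConfig"


def find_ungrouped_configs(template):
    """Return names of SoftwareConfig resources that do not set `group`.

    `template` is a Heat template already parsed into a dict.
    """
    missing = []
    for name, resource in (template.get("resources") or {}).items():
        if not isinstance(resource, dict):
            continue
        if resource.get("type") != SOFTWARE_CONFIG_TYPE:
            continue
        properties = resource.get("properties") or {}
        if not properties.get("group"):
            missing.append(name)
    return sorted(missing)


SCRIPT = "#!/bin/sh\nsudo pcs resource restart openstack-cinder-volume\n"

# The resource from the guide, followed by a copy with the property added.
template = {
    "resources": {
        "CinderRestartConfig": {
            "type": "OS::Heat::SoftwareConfig",
            "properties": {"config": SCRIPT},
        },
        "CinderRestartConfigFixed": {
            "type": "OS::Heat::SoftwareConfig",
            "properties": {"group": "script", "config": SCRIPT},
        },
    }
}

print(find_ungrouped_configs(template))  # ['CinderRestartConfig']
```

Only the resource without the property is reported; the second resource name is made up for the comparison.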

Using this resource in a deployment breaks os-collect-config on the deployed nodes.

Note: a doc bz has been filed to address the missing config:  https://bugzilla.redhat.com/show_bug.cgi?id=1493243

This BZ is for the underlying heat / os-collect-config issues that result from using this misconfigured resource.

Once this software deployment runs, /etc/os-collect-config.conf on the target nodes is broken, which halts any additional software deployments until the stack times out.

[root@overcloud-controller-0 ~]# cat /etc/os-collect-config.conf 
[DEFAULT]
command = os-refresh-config

On the deployed node, the following files are found:

[root@overcloud-controller-0 ~]# ls  /var/lib/os-collect-config/overcloud-ControllerNodesPostDeployment-oyacs2thnul5-ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh.json*
/var/lib/os-collect-config/overcloud-ControllerNodesPostDeployment-oyacs2thnul5-ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh.json
/var/lib/os-collect-config/overcloud-ControllerNodesPostDeployment-oyacs2thnul5-ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh.json.last
/var/lib/os-collect-config/overcloud-ControllerNodesPostDeployment-oyacs2thnul5-ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh.json.orig

[root@overcloud-controller-0 ~]# cat /var/lib/os-collect-config/overcloud-ControllerNodesPostDeployment-oyacs2thnul5-ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh.json
"#!/bin/sh\nsudo pcs resource restart openstack-cinder-volume\n"

Deleting these files and running os-apply-config will temporarily fix the os-collect-config.conf file.

[root@overcloud-controller-0 ~]# rm -f /var/lib/os-collect-config/overcloud-ControllerNodesPostDeployment-oyacs2thnul5-ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh.json*

[root@overcloud-controller-0 ~]# os-apply-config 
[2017/09/19 08:46:50 PM] [INFO] writing /etc/os-net-config/config.json
[2017/09/19 08:46:50 PM] [INFO] writing /var/run/heat-config/heat-config
[2017/09/19 08:46:50 PM] [INFO] writing /etc/puppet/hiera.yaml
[2017/09/19 08:46:50 PM] [INFO] writing /etc/os-collect-config.conf
[2017/09/19 08:46:50 PM] [INFO] success

[root@overcloud-controller-0 ~]# cat /etc/os-collect-config.conf 
[DEFAULT]
command = os-refresh-config
collectors = ec2
collectors = cfn
collectors = local

[cfn]
metadata_url = http://172.16.1.1:8000/v1/
stack_name = overcloud-Controller-q6gd3h72wt4g-0-fnmmcrrzext7
secret_access_key = 89d28cccc33748a78c960ac9fe33d133
access_key_id = fc47334db43c4e778945e295d0d5b85d
path = Controller.Metadata

Here is the heat resource metadata:

[stack@undercloud8 templates]$ heat resource-metadata overcloud-Controller-q6gd3h72wt4g-0-fnmmcrrzext7 Controller
[...]
      "group": "Heat::Ungrouped", 
      "name": "overcloud-ControllerNodesPostDeployment-oyacs2thnul5-ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh", 
      "outputs": [], 
      "creation_time": "2017-09-19T20:31:33", 
      "options": {}, 
      "config": "#!/bin/sh\nsudo pcs resource restart openstack-cinder-volume\n", 
      "id": "10a65d60-3473-4a46-89d7-b0e847e01e46"
[...]

I assume this "Heat::Ungrouped" is not handled properly.
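
To illustrate the suspected failure mode: the cached file for this resource holds a bare JSON string rather than a JSON object, so code that merges metadata documents has to decide what to do with it. The sketch below shows one defensive approach, skipping anything that is not an object. It is not the os-apply-config code or the upstream patch; merge_metadata is a hypothetical function and the "os-collect-config" key in the sample data is only illustrative.

```python
import json


def merge_metadata(documents):
    """Merge JSON documents (given as strings) into one dict.

    Returns (merged, skipped), where `skipped` lists the indexes of
    documents that were invalid JSON or whose top-level value was not
    a JSON object.
    """
    merged = {}
    skipped = []
    for index, text in enumerate(documents):
        try:
            value = json.loads(text)
        except ValueError:
            skipped.append(index)
            continue
        if not isinstance(value, dict):
            skipped.append(index)
            continue
        merged.update(value)  # shallow merge; later documents win
    return merged, skipped


docs = [
    # Illustrative metadata object (structure assumed for the example).
    '{"os-collect-config": {"collectors": ["ec2", "cfn", "local"]}}',
    # Same content as the CinderRestartConfig file shown above.
    json.dumps("#!/bin/sh\nsudo pcs resource restart openstack-cinder-volume\n"),
]

merged, skipped = merge_metadata(docs)
print(sorted(merged), skipped)
```

With this approach the bare-string document is reported in skipped and the remaining metadata is still available for rendering.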

There may be a better way, but this is the workaround I've found:

- cancel deployment
- restart heat-engine
- fix software deployment resource
- run new deployment and manually signal all software deployment resources (the old resource seems to hang around in the node's metadata until a second deployment completes).
- on all nodes fix os-collect-config.conf
    - rm -f /var/lib/os-collect-config/*CinderRestartConfig*
    - os-apply-config
    - cat /etc/os-collect-config.conf # verify correct config
    - systemctl restart os-collect-config
- run deployment again.
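
The per-node part of the workaround (remove the cached files, re-run os-apply-config, restart os-collect-config) could be scripted roughly as follows. This is a sketch under assumptions: reset_collect_config and its run_commands flag are hypothetical names, the commands are the ones listed above, and they must be run as root on the affected node. It does not perform the manual "cat" verification step.

```python
import glob
import os
import subprocess


def reset_collect_config(pattern, run_commands=True):
    """Delete cached metadata files matching `pattern`, then re-render config.

    Mirrors the manual steps: rm -f <pattern>, os-apply-config, and
    systemctl restart os-collect-config. Returns the removed paths.
    """
    removed = []
    for path in sorted(glob.glob(pattern)):
        os.remove(path)
        removed.append(path)
    if run_commands:
        # Both commands must exist on the node and need root privileges.
        subprocess.run(["os-apply-config"], check=True)
        subprocess.run(["systemctl", "restart", "os-collect-config"], check=True)
    return removed
```

On a node this would be called as reset_collect_config("/var/lib/os-collect-config/*CinderRestartConfig*"). To inspect /etc/os-collect-config.conf before restarting the service, as in the manual steps, call it with run_commands=False and run the two commands by hand.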

Version-Release number of selected component (if applicable):
heat on Director:
openstack-heat-templates-0-0.2.20170112.el7ost.noarch
openstack-heat-api-cfn-5.0.3-2.el7ost.noarch
openstack-heat-engine-5.0.3-2.el7ost.noarch
openstack-heat-api-cloudwatch-5.0.3-2.el7ost.noarch
openstack-heat-api-5.0.3-2.el7ost.noarch
openstack-heat-common-5.0.3-2.el7ost.noarch

os-collect-config on deployed nodes:
os-collect-config-0.1.37-2.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. see above

Actual results:
heat software deployment is completely broken until manual intervention

Expected results:
fail friendly...er :)

Additional info:

Comment 2 Matt Flusche 2017-09-20 13:22:24 UTC
Is this something that could be handled better by os-apply-config?

Comment 3 Matt Flusche 2017-09-21 19:14:26 UTC
I submitted the following to address this issue:

https://review.openstack.org/#/c/506328/

Guess I should submit a launchpad bug also.

I tested it against this specific issue on OSP 8, and it resolves it.

Comment 4 Zane Bitter 2017-10-12 14:38:45 UTC
The fix to os-apply-config upstream looks plausible to me; changing component.

Comment 7 Steve Baker 2018-11-05 21:54:01 UTC
The fix landed upstream 9 months ago

Comment 10 Lon Hohberger 2018-11-06 11:44:00 UTC
According to our records, this should be resolved by os-apply-config-8.3.1-0.20180308131116.62fdfc1.el7ost.  This build is available now.

Comment 11 Gurenko Alex 2018-11-08 16:29:44 UTC
Verified on puddle 2018-11-07.3

[stack@undercloud-0 ~]$ rpm -q os-apply-config
os-apply-config-8.3.1-0.20180308131116.62fdfc1.el7ost.noarch

