Description of problem:

Using the example here:

https://access.redhat.com/documentation/en/red-hat-openstack-platform/8/custom-block-storage-back-end-deployment-guide/custom-block-storage-back-end-deployment-guide/

Section 3.2 - /home/stack/templates/custom-config.yaml

  CinderRestartConfig: # 3
    type: OS::Heat::SoftwareConfig
    properties:
      config: |
        #!/bin/sh
        sudo pcs resource restart openstack-cinder-volume

The property "group: script" is missing. Using this resource in a deployment can create chaos on the deployed nodes.

Note: a doc bz has been filed to address the missing config:
https://bugzilla.redhat.com/show_bug.cgi?id=1493243

This BZ is for the underlying heat / os-collect-config issue that results from using this misconfigured resource.

Once this software deployment runs, /etc/os-collect-config.conf on the target nodes is broken, halting any additional software deployments until the stack times out.

[root@overcloud-controller-0 ~]# cat /etc/os-collect-config.conf
[DEFAULT]
command = os-refresh-config

On the deployed node the following is found:

[root@overcloud-controller-0 ~]# ls /var/lib/os-collect-config/overcloud-ControllerNodesPostDeployment-oyacs2thnul5-ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh.json*
/var/lib/os-collect-config/overcloud-ControllerNodesPostDeployment-oyacs2thnul5-ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh.json
/var/lib/os-collect-config/overcloud-ControllerNodesPostDeployment-oyacs2thnul5-ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh.json.last
/var/lib/os-collect-config/overcloud-ControllerNodesPostDeployment-oyacs2thnul5-ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh.json.orig

[root@overcloud-controller-0 ~]# cat /var/lib/os-collect-config/overcloud-ControllerNodesPostDeployment-oyacs2thnul5-ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh.json
"#!/bin/sh\nsudo pcs resource restart openstack-cinder-volume\n"

Deleting these files and running os-apply-config will temporarily fix the os-collect-config.conf file.

[root@overcloud-controller-0 ~]# rm -f /var/lib/os-collect-config/overcloud-ControllerNodesPostDeployment-oyacs2thnul5-ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh.json*
[root@overcloud-controller-0 ~]# os-apply-config
[2017/09/19 08:46:50 PM] [INFO] writing /etc/os-net-config/config.json
[2017/09/19 08:46:50 PM] [INFO] writing /var/run/heat-config/heat-config
[2017/09/19 08:46:50 PM] [INFO] writing /etc/puppet/hiera.yaml
[2017/09/19 08:46:50 PM] [INFO] writing /etc/os-collect-config.conf
[2017/09/19 08:46:50 PM] [INFO] success
[root@overcloud-controller-0 ~]# cat /etc/os-collect-config.conf
[DEFAULT]
command = os-refresh-config
collectors = ec2
collectors = cfn
collectors = local
[cfn]
metadata_url = http://172.16.1.1:8000/v1/
stack_name = overcloud-Controller-q6gd3h72wt4g-0-fnmmcrrzext7
secret_access_key = 89d28cccc33748a78c960ac9fe33d133
access_key_id = fc47334db43c4e778945e295d0d5b85d
path = Controller.Metadata

Here is the heat resource metadata:

[stack@undercloud8 templates]$ heat resource-metadata overcloud-Controller-q6gd3h72wt4g-0-fnmmcrrzext7 Controller
[...]
"group": "Heat::Ungrouped",
"name": "overcloud-ControllerNodesPostDeployment-oyacs2thnul5-ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh",
"outputs": [],
"creation_time": "2017-09-19T20:31:33",
"options": {},
"config": "#!/bin/sh\nsudo pcs resource restart openstack-cinder-volume\n",
"id": "10a65d60-3473-4a46-89d7-b0e847e01e46"
[...]

I assume this "Heat::Ungrouped" group is not handled properly.

There may be a better way, but this is the work-around I've found:

- cancel deployment
- restart heat-engine
- fix software deployment resource
- run new deployment and manually signal all software deployment resources (the old resource seems to hang around in the node's metadata until a second deployment completes).
- on all nodes fix os-collect-config.conf
  - rm -f /var/lib/os-collect-config/*CinderRestartConfig*
  - os-apply-config
  - cat /etc/os-collect-config.conf # verify correct config
  - systemctl restart os-collect-config
- run deployment again.

Version-Release number of selected component (if applicable):

heat on Director:
openstack-heat-templates-0-0.2.20170112.el7ost.noarch
openstack-heat-api-cfn-5.0.3-2.el7ost.noarch
openstack-heat-engine-5.0.3-2.el7ost.noarch
openstack-heat-api-cloudwatch-5.0.3-2.el7ost.noarch
openstack-heat-api-5.0.3-2.el7ost.noarch
openstack-heat-common-5.0.3-2.el7ost.noarch

os-collect-config on deployed nodes:
os-collect-config-0.1.37-2.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. see above

Actual results:
heat software deployment is completely broken until manual intervention

Expected results:
fail friendly...er :)

Additional info:
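The per-node check in the work-around above could be scripted roughly as follows. This is only a sketch: the function names conf_has_collectors and list_non_object_metadata are made up for illustration, and treating any metadata file that does not start with "{" as suspect is a heuristic assumed from the files shown above (the broken one holds a bare JSON string), not something heat or os-collect-config documents.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any OpenStack tool.

# Succeeds when the given os-collect-config.conf contains at least one
# "collectors = ..." line, as the healthy file in this report does.
conf_has_collectors() {
    grep -Eq '^[[:space:]]*collectors[[:space:]]*=' "$1"
}

# Prints each *.json file in the given directory whose first character is
# not "{". Assumption: well-formed metadata files are JSON objects, while
# the broken CinderRestartConfig file starts with a double quote.
list_non_object_metadata() {
    for f in "$1"/*.json; do
        [ -f "$f" ] || continue
        first=$(head -c 1 "$f")
        [ "$first" = "{" ] || printf '%s\n' "$f"
    done
}
```

On a node this might be used as "conf_has_collectors /etc/os-collect-config.conf || echo broken" and "list_non_object_metadata /var/lib/os-collect-config". Both helpers only read files, so the rm -f, os-apply-config and systemctl restart steps stay manual.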
Is this something that could be handled better by os-apply-config?
I submitted the following to address this issue:

https://review.openstack.org/#/c/506328/

Guess I should submit a Launchpad bug also. I tested it against this specific issue on OSP 8 and it resolves it.
The fix to os-apply-config upstream looks plausible to me; changing component.
The fix landed upstream 9 months ago.
According to our records, this should be resolved by os-apply-config-8.3.1-0.20180308131116.62fdfc1.el7ost. This build is available now.
Verified on puddle 2018-11-07.3

[stack@undercloud-0 ~]$ rpm -q os-apply-config
os-apply-config-8.3.1-0.20180308131116.62fdfc1.el7ost.noarch