Description of problem:
Using the example here:
https://access.redhat.com/documentation/en/red-hat-openstack-platform/8/custom-block-storage-back-end-deployment-guide/custom-block-storage-back-end-deployment-guide/
Section 3.2 - /home/stack/templates/custom-config.yaml
  CinderRestartConfig: # 3
    type: OS::Heat::SoftwareConfig
    properties:
      config: |
        #!/bin/sh
        sudo pcs resource restart openstack-cinder-volume
The "group: script" property is missing from the resource's properties.
Using this resource in a deployment breaks os-collect-config on the deployed nodes and blocks all further software deployments.
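For comparison, here is a sketch of the same resource with the missing property added. The heredoc only prints the YAML fragment so it can be pasted into custom-config.yaml. The indentation is an assumption based on the usual Heat template layout, and the function name is made up for this example.

```shell
#!/bin/sh
# Print the CinderRestartConfig resource with "group: script" added.
# Indentation is assumed; adjust it to match the surrounding template.
print_fixed_resource() {
cat <<'EOF'
  CinderRestartConfig:
    type: OS::Heat::SoftwareConfig
    properties:
      group: script
      config: |
        #!/bin/sh
        sudo pcs resource restart openstack-cinder-volume
EOF
}

print_fixed_resource
```

With "group: script" set, the config is handed to the script hook instead of ending up as "Heat::Ungrouped".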
Note: a doc bz has been filed to address the missing config: https://bugzilla.redhat.com/show_bug.cgi?id=1493243
This BZ is for the underlying heat / os-collect-config issues that result from using this misconfigured resource.
Once this software deployment runs, /etc/os-collect-config.conf on the target nodes is broken, which halts any additional software deployments until the stack times out.
[root@overcloud-controller-0 ~]# cat /etc/os-collect-config.conf
[DEFAULT]
command = os-refresh-config
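A quick way to spot the broken state shown above is to check whether the file still has any "collectors" entry. This is only a sketch: the function name is hypothetical, and it assumes a healthy file always lists at least one collector, as in the repaired output later in this report.

```shell
#!/bin/sh
# Return 0 when the given os-collect-config.conf has no "collectors" entry,
# which is the broken state shown above.
conf_is_broken() {
    conf=${1:-/etc/os-collect-config.conf}
    ! grep -Eq '^[[:space:]]*collectors[[:space:]]*=' "$conf"
}

if [ -r /etc/os-collect-config.conf ] && conf_is_broken; then
    echo "os-collect-config.conf has no collectors; node needs manual repair"
fi
```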
On the deployed node the following is found:
[root@overcloud-controller-0 ~]# ls /var/lib/os-collect-config/overcloud-ControllerNodesPostDeployment-oyacs2thnul5-ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh.json*
/var/lib/os-collect-config/overcloud-ControllerNodesPostDeployment-oyacs2thnul5-ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh.json
/var/lib/os-collect-config/overcloud-ControllerNodesPostDeployment-oyacs2thnul5-ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh.json.last
/var/lib/os-collect-config/overcloud-ControllerNodesPostDeployment-oyacs2thnul5-ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh.json.orig
[root@overcloud-controller-0 ~]# cat /var/lib/os-collect-config/overcloud-ControllerNodesPostDeployment-oyacs2thnul5-ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh.json
"#!/bin/sh\nsudo pcs resource restart openstack-cinder-volume\n"
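The file above holds a bare JSON string rather than a JSON object. The following sketch flags such files; it assumes that valid metadata files in this directory are JSON objects starting with "{", which is my reading of the symptom and not something confirmed against the os-collect-config source. The function name is made up for illustration.

```shell
#!/bin/sh
# Return 0 when the file's first non-whitespace character is not "{",
# i.e. it is not a JSON object. Assumption: valid metadata files here
# are JSON objects; the CinderRestartConfig file above is a bare string.
is_not_json_object() {
    first=$(tr -d '[:space:]' < "$1" | cut -c1)
    [ "$first" != "{" ]
}

for f in /var/lib/os-collect-config/*.json; do
    [ -f "$f" ] || continue
    if is_not_json_object "$f"; then
        echo "suspect metadata file: $f"
    fi
done
```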
Deleting these files and running os-apply-config will temporarily fix the os-collect-config.conf file.
[root@overcloud-controller-0 ~]# rm -f /var/lib/os-collect-config/overcloud-ControllerNodesPostDeployment-oyacs2thnul5-ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh.json*
[root@overcloud-controller-0 ~]# os-apply-config
[2017/09/19 08:46:50 PM] [INFO] writing /etc/os-net-config/config.json
[2017/09/19 08:46:50 PM] [INFO] writing /var/run/heat-config/heat-config
[2017/09/19 08:46:50 PM] [INFO] writing /etc/puppet/hiera.yaml
[2017/09/19 08:46:50 PM] [INFO] writing /etc/os-collect-config.conf
[2017/09/19 08:46:50 PM] [INFO] success
[root@overcloud-controller-0 ~]# cat /etc/os-collect-config.conf
[DEFAULT]
command = os-refresh-config
collectors = ec2
collectors = cfn
collectors = local
[cfn]
metadata_url = http://172.16.1.1:8000/v1/
stack_name = overcloud-Controller-q6gd3h72wt4g-0-fnmmcrrzext7
secret_access_key = 89d28cccc33748a78c960ac9fe33d133
access_key_id = fc47334db43c4e778945e295d0d5b85d
path = Controller.Metadata
Here is the heat resource metadata:
[stack@undercloud8 templates]$ heat resource-metadata overcloud-Controller-q6gd3h72wt4g-0-fnmmcrrzext7 Controller
[...]
"group": "Heat::Ungrouped",
"name": "overcloud-ControllerNodesPostDeployment-oyacs2thnul5-ExtraConfig-x66w67lchms7-CinderRestartConfig-7zukzjtujsnh",
"outputs": [],
"creation_time": "2017-09-19T20:31:33",
"options": {},
"config": "#!/bin/sh\nsudo pcs resource restart openstack-cinder-volume\n",
"id": "10a65d60-3473-4a46-89d7-b0e847e01e46"
[...]
I assume this "Heat::Ungrouped" is not handled properly.
There may be a better way, but this is the work-around I've found:
- cancel deployment
- restart heat-engine
- fix software deployment resource
- run a new deployment and manually signal all software deployment resources (the old resource seems to hang around in the node's metadata until a second deployment completes).
- on all nodes fix os-collect-config.conf
  - rm -f /var/lib/os-collect-config/*CinderRestartConfig*
  - os-apply-config
  - cat /etc/os-collect-config.conf # verify correct config
  - systemctl restart os-collect-config
- run deployment again.
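The node-side steps of the work-around can be wrapped in a small script. This is a sketch only: the DRY_RUN switch and the function names are additions for illustration, and by default nothing is executed, the commands are just printed.

```shell
#!/bin/sh
# Node-side part of the workaround above. DRY_RUN defaults to 1 so the
# script only prints the commands; set DRY_RUN=0 to execute them as root.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repair_node() {
    # Remove the stale metadata files (.json, .json.last, .json.orig).
    run sh -c 'rm -f /var/lib/os-collect-config/*CinderRestartConfig*'
    # Regenerate /etc/os-collect-config.conf from the remaining metadata.
    run os-apply-config
    # Verify the result by eye: collectors and [cfn] should be back.
    run cat /etc/os-collect-config.conf
    run systemctl restart os-collect-config
}

repair_node
```

Run it on every affected node after the software deployment resource has been fixed, then start the deployment again.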
Version-Release number of selected component (if applicable):
heat on Director:
openstack-heat-templates-0-0.2.20170112.el7ost.noarch
openstack-heat-api-cfn-5.0.3-2.el7ost.noarch
openstack-heat-engine-5.0.3-2.el7ost.noarch
openstack-heat-api-cloudwatch-5.0.3-2.el7ost.noarch
openstack-heat-api-5.0.3-2.el7ost.noarch
openstack-heat-common-5.0.3-2.el7ost.noarch
os-collect-config on deployed nodes:
os-collect-config-0.1.37-2.el7ost.noarch
How reproducible:
100%
Steps to Reproduce:
1. see above
Actual results:
heat software deployment is completely broken until manual intervention
Expected results:
fail friendly...er :)
Additional info:
I submitted the following to address this issue:
https://review.openstack.org/#/c/506328/
Guess I should submit a launchpad bug also.
I tested the change against this specific issue on OSP 8 and it resolves it.