Description of problem:

Following the Ceph storage node replacement procedure at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/single/red-hat-ceph-storage-for-the-overcloud/#Replacing_Ceph_Storage_Nodes
the "openstack overcloud node delete" step fails with the following error:

overcloud.CephStorageAllNodesDeployment:
  resource_type: OS::Heat::StructuredDeployments
  physical_resource_id: 75b6c232-21c9-46e5-9b46-c07ef8d7b7af
  status: UPDATE_FAILED
  status_reason: |
    StackValidationFailed: resources.CephStorageAllNodesDeployment: Property error: CephStorageAllNodesDeployment.Properties.input_values: The Referenced Attribute (CephStorage resource.0.hostname) is incorrect.
overcloud.ComputeAllNodesDeployment:
  resource_type: OS::Heat::StructuredDeployments
  physical_resource_id: 7927559f-55f1-4c7f-b58d-1fe2fab9705c
  status: UPDATE_FAILED
  status_reason: |
    UPDATE aborted

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-5.0.0-1.3.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy the overcloud with 3 Ceph storage nodes.
2. Stop one of the Ceph storage nodes.
3. Disable the OSDs running on the stopped node and remove them from the CRUSH map, following the procedure at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/single/red-hat-ceph-storage-for-the-overcloud/#Replacing_Ceph_Storage_Nodes
4. Delete the Ceph node:

source ~/stackrc
export THT=/usr/share/openstack-tripleo-heat-templates
openstack overcloud node delete --stack overcloud --templates $THT \
  -e $THT/environments/network-isolation.yaml \
  -e ~/templates/network-environment.yaml \
  -e $THT/environments/storage-environment.yaml \
  -e ~/templates/disk-layout.yaml \
  03915d83-6026-4a4f-9e93-a3807c9e0d8e

Actual results:
The stack update fails with:
StackValidationFailed: resources.CephStorageAllNodesDeployment: Property error: CephStorageAllNodesDeployment.Properties.input_values: The Referenced Attribute (CephStorage resource.0.hostname) is incorrect.

Expected results:
The stack update completes successfully.

Additional info:
Can you provide:
- all your custom templates
- heat-api.log and heat-engine.log from the undercloud
- the plan contents (download the overcloud container contents from Swift and tgz that)
I'd also be interested in which Ceph node the UUID 03915d83-6026-4a4f-9e93-a3807c9e0d8e corresponds to. Is it the first one? Does the issue reproduce if you try to delete the last Ceph node instead?

Also, for OSP 10, I don't think you have to pass --templates and all the -e's to the node delete command.
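To check which node that UUID maps to, something like this on the undercloud should do it (a sketch; the "ceph" name filter is only illustrative and depends on your hostname scheme):

```shell
# Sketch: list the overcloud Ceph nodes and match the UUID to a name.
# Assumes the undercloud credentials are in ~/stackrc.
source ~/stackrc
openstack server list --name ceph -c ID -c Name -c Status
```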
(In reply to James Slagle from comment #2)
> Also, for OSP 10, I don't think you have to pass --templates and all the
> -e's to the node delete command.

Brad, can you confirm this bit ^?
(In reply to James Slagle from comment #3)
> (In reply to James Slagle from comment #2)
> > Also, for OSP 10, I don't think you have to pass --templates and all the
> > -e's to the node delete command.
>
> Brad, can you confirm this bit ^?

Checked with him on IRC and he confirmed that you no longer need to pass --templates or the -e's to the openstack overcloud node delete command.
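For reference, the simplified OSP 10 invocation would then be just the following (a sketch, assuming the same stack name and node UUID as in the original report):

```shell
# Sketch of the simplified OSP 10 command: no --templates or -e options,
# since the deployment plan is already stored on the undercloud.
source ~/stackrc
openstack overcloud node delete --stack overcloud 03915d83-6026-4a4f-9e93-a3807c9e0d8e
```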
Created attachment 1218844 [details] Logs and templates
This is because we now set the bootstrap node for all roles (to enable deployment of any puppet profile which expects to detect the first node in the cluster, aka the bootstrap node). Previously only the Controller role set this, but now we have a hard-coded reference to node "0" in the overcloud template:

https://github.com/openstack/tripleo-heat-templates/blob/master/overcloud.j2.yaml#L234

  input_values:
    bootstrap_nodeid: {get_attr: [{{role.name}}, resource.0.hostname]}
    bootstrap_nodeid_ip: {get_attr: [{{role.name}}, resource.0.ip_address]}

We need some way for the node delete workflow to change this index when replacing node "0", or another way to detect the first node in the group without using the node name. (This looks like an index, but I think it actually refers to the resource name within the resource group, so it should be e.g. "1" after this removal; ideally we'd use a list lookup here instead, which may be a possible way to fix this.)
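As an illustration of the list-lookup idea (a sketch only, not the actual fix), Heat's yaql function could select the first element of the role's aggregated attribute lists instead of addressing resource index 0, so the reference survives deletion of node "0":

```yaml
# Illustrative sketch only: derive the bootstrap node from the first
# entry of the role's hostname/ip_address lists, rather than the
# hard-coded resource.0 index.
input_values:
  bootstrap_nodeid:
    yaql:
      expression: $.data.hostnames.first()
      data:
        hostnames: {get_attr: [{{role.name}}, hostname]}
  bootstrap_nodeid_ip:
    yaql:
      expression: $.data.ips.first()
      data:
        ips: {get_attr: [{{role.name}}, ip_address]}
```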
https://review.openstack.org/#/c/395699/ posted upstream, which I believe resolves this issue. I've done some local testing, but feedback is welcome.
(In reply to Steven Hardy from comment #7)
> https://review.openstack.org/#/c/395699/ posted upstream which I believe
> resolves this issue, done some local testing but feedback welcome.

Tested it on my env as well and it looks good.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html