Bug 1392995 - Replacing a Ceph storage node fails with StackValidationFailed: resources.CephStorageAllNodesDeployment: Property error: CephStorageAllNodesDeployment.Properties.input_values: The Referenced Attribute (CephStorage resource.0.hostname) is incorrect.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: 10.0 (Newton)
Assignee: Steven Hardy
QA Contact: Marius Cornea
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-11-08 16:20 UTC by Marius Cornea
Modified: 2016-12-14 16:31 UTC
CC List: 16 users

Fixed In Version: openstack-tripleo-heat-templates-5.0.0-1.7.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-12-14 16:31:00 UTC


Attachments
Logs and templates (143.51 KB, application/x-gzip)
2016-11-09 08:18 UTC, Marius Cornea


Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2016:2948 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 10 enhancement update | 2016-12-14 19:55:27 UTC
OpenStack gerrit | 396280 | None | None | None | 2016-11-10 15:19:51 UTC
Launchpad | 1640449 | None | None | None | 2016-11-09 11:27:44 UTC

Description Marius Cornea 2016-11-08 16:20:19 UTC
Description of problem:

Following the Ceph storage node replacement procedure at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/single/red-hat-ceph-storage-for-the-overcloud/#Replacing_Ceph_Storage_Nodes
the openstack overcloud node delete step fails with the following error:

overcloud.CephStorageAllNodesDeployment:
  resource_type: OS::Heat::StructuredDeployments
  physical_resource_id: 75b6c232-21c9-46e5-9b46-c07ef8d7b7af
  status: UPDATE_FAILED
  status_reason: |
    StackValidationFailed: resources.CephStorageAllNodesDeployment: Property error: CephStorageAllNodesDeployment.Properties.input_values: The Referenced Attribute (CephStorage resource.0.hostname) is incorrect.
overcloud.ComputeAllNodesDeployment:
  resource_type: OS::Heat::StructuredDeployments
  physical_resource_id: 7927559f-55f1-4c7f-b58d-1fe2fab9705c
  status: UPDATE_FAILED
  status_reason: |
    UPDATE aborted


Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-5.0.0-1.3.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy an overcloud with 3 Ceph storage nodes
2. Stop one of the Ceph storage nodes
3. Disable the OSDs running on the stopped node and remove them from the CRUSH map, following the procedure in
https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/single/red-hat-ceph-storage-for-the-overcloud/#Replacing_Ceph_Storage_Nodes
4. Delete the Ceph node:
source ~/stackrc
export THT=/usr/share/openstack-tripleo-heat-templates
openstack overcloud node delete --stack overcloud --templates $THT \
-e $THT/environments/network-isolation.yaml \
-e ~/templates/network-environment.yaml \
-e $THT/environments/storage-environment.yaml \
-e ~/templates/disk-layout.yaml \
03915d83-6026-4a4f-9e93-a3807c9e0d8e
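Step 3 can be sketched as a dry-run Python helper that prints the OSD removal commands for the OSDs that lived on the stopped node. It assumes the standard manual OSD removal sequence from the upstream Ceph documentation (mark out, remove from CRUSH, delete the auth key, remove the OSD). The helper name osd_removal_commands is hypothetical, the OSD ids are illustrative, and the Red Hat procedure linked above remains the authoritative source. Stopping the OSD daemons is skipped because the node is already down after step 2.

```python
# Hypothetical dry-run helper for step 3: it only builds and prints commands.
# Assumes the standard upstream Ceph manual OSD-removal sequence; follow the
# linked Red Hat procedure for the authoritative steps.


def osd_removal_commands(osd_ids):
    """Return, for each OSD id, the commands that mark it out and purge it."""
    commands = []
    for osd_id in osd_ids:
        commands.extend([
            ["ceph", "osd", "out", str(osd_id)],                  # disable the OSD
            ["ceph", "osd", "crush", "remove", f"osd.{osd_id}"],  # drop from CRUSH map
            ["ceph", "auth", "del", f"osd.{osd_id}"],             # delete its auth key
            ["ceph", "osd", "rm", str(osd_id)],                   # remove from the cluster
        ])
    return commands


if __name__ == "__main__":
    # Illustrative ids: assume OSDs 3 and 4 ran on the stopped node.
    for cmd in osd_removal_commands([3, 4]):
        print(" ".join(cmd))
```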


Actual results:
Stack update fails with:
    StackValidationFailed: resources.CephStorageAllNodesDeployment: Property error: CephStorageAllNodesDeployment.Properties.input_values: The Referenced Attribute (CephStorage resource.0.hostname) is incorrect.

Expected results:
Stack update completes successfully.

Additional info:

Comment 1 James Slagle 2016-11-08 19:13:09 UTC
Can you provide:

all your custom templates
heat-api.log and heat-engine.log from the undercloud
the plan contents (download the overcloud container contents from Swift and tar/gzip them)

Comment 2 James Slagle 2016-11-08 22:32:25 UTC
I'd also be interested in which Ceph node the UUID 03915d83-6026-4a4f-9e93-a3807c9e0d8e corresponds to. Is it the first one? Does the issue reproduce if you try to delete the last Ceph node instead?

Also, for OSP 10, I don't think you have to pass --templates and all the -e's to the node delete command.

Comment 3 James Slagle 2016-11-08 22:34:38 UTC
(In reply to James Slagle from comment #2)

> Also, for OSP 10, I don't think you have to pass --templates and all the
> -e's to the node delete command.

Brad, can you confirm this bit ^?

Comment 4 James Slagle 2016-11-09 01:11:00 UTC
(In reply to James Slagle from comment #3)
> (In reply to James Slagle from comment #2)
> 
> > Also, for OSP 10, I don't think you have to pass --templates and all the
> > -e's to the node delete command.
> 
> Brad, can you confirm this bit ^?

Checked with him on IRC; he confirmed that you no longer need to pass --templates or the -e's to the openstack overcloud node delete command.
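Per comment 4, on OSP 10 the delete step from the description reduces to the stack name plus the node UUID(s). A minimal Python sketch that builds that command line follows; node_delete_argv and run are hypothetical helper names, the sketch assumes ~/stackrc has already been sourced in the calling shell, and it only prints the command unless dry_run=False is passed.

```python
import shlex
import subprocess


def node_delete_argv(node_ids, stack="overcloud"):
    """Hypothetical helper: build the OSP 10 'node delete' argv.

    Per comment 4, --templates and the -e environment files are no longer
    needed; only the stack name and the node UUID(s) are passed.
    """
    if not node_ids:
        raise ValueError("at least one node UUID is required")
    return ["openstack", "overcloud", "node", "delete", "--stack", stack, *node_ids]


def run(argv, dry_run=True):
    """Print the command; execute it only when dry_run is False.

    Assumes the undercloud credentials (~/stackrc) are already exported.
    """
    print(shlex.join(argv))  # shlex.join needs Python 3.8+
    if not dry_run:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(node_delete_argv(["03915d83-6026-4a4f-9e93-a3807c9e0d8e"]))
```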

Comment 5 Marius Cornea 2016-11-09 08:18:41 UTC
Created attachment 1218844 [details]
Logs and templates

Comment 6 Steven Hardy 2016-11-09 08:21:48 UTC
This is because we now set the bootstrap node for all roles (to enable deployment of any Puppet profile that expects to detect the first node in the cluster, a.k.a. the bootstrap node).

Previously only the Controller set this, but now we have a hard-coded reference to node "0" here in the overcloud template:

https://github.com/openstack/tripleo-heat-templates/blob/master/overcloud.j2.yaml#L234

  input_values:
    bootstrap_nodeid: {get_attr: [{{role.name}}, resource.0.hostname]}
    bootstrap_nodeid_ip: {get_attr: [{{role.name}}, resource.0.ip_address]}

We need some way for the node delete workflow to change this index when replacing node "0", or another way to detect the first node in the group without using the node name. This looks like an index, but I think it refers to the resource name in the resource group, so after this removal it should be e.g. "1". Ideally we'd use a list lookup here instead; perhaps that's a possible way to fix this.
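The difference between the name-based resource.0 reference and the list lookup suggested above can be illustrated with a small Python toy model. ToyResourceGroup and its methods are hypothetical stand-ins for a Heat ResourceGroup, not Heat's API; the hostname pattern and the numeric ordering of the list are assumptions. Removing member "0" breaks the name lookup but not "first element of the list", while removing the last member (the case asked about in comment 2) breaks neither.

```python
# Toy model, not Heat code: it only illustrates why a name-based lookup
# such as resource.0 breaks once member "0" is removed, while taking the
# first element of a list-valued attribute keeps working.


class ToyResourceGroup:
    """Members are keyed by resource name; names are never renumbered."""

    def __init__(self, count):
        # Hostnames follow an assumed overcloud-cephstorage-<index> pattern.
        self.members = {str(i): f"overcloud-cephstorage-{i}" for i in range(count)}

    def remove(self, name):
        del self.members[name]

    def resource_attr(self, name):
        # Analogous to {get_attr: [CephStorage, resource.<name>.hostname]}.
        return self.members[name]

    def hostname_list(self):
        # Analogous to a list-valued attribute over the current members,
        # assumed here to be ordered by numeric resource name.
        return [self.members[n] for n in sorted(self.members, key=int)]


if __name__ == "__main__":
    group = ToyResourceGroup(3)
    group.remove("0")  # replace node "0", as in the description
    try:
        group.resource_attr("0")
    except KeyError:
        print("resource.0 no longer exists")  # cf. StackValidationFailed
    print(group.hostname_list()[0])  # overcloud-cephstorage-1
```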

Comment 7 Steven Hardy 2016-11-10 11:45:35 UTC
https://review.openstack.org/#/c/395699/ posted upstream; I believe it resolves this issue. I've done some local testing, but feedback is welcome.

Comment 8 Marius Cornea 2016-11-10 18:03:37 UTC
(In reply to Steven Hardy from comment #7)
> https://review.openstack.org/#/c/395699/ posted upstream which I believe
> resolves this issue, done some local testing but feedback welcome.

Tested it on my env as well and it looks good.

Comment 16 errata-xmlrpc 2016-12-14 16:31:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html

