Bug 1544088

Summary:	Run the correct command to delete the overcloud node, but the wrong overcloud node is deleted by heat command
Product:	Red Hat OpenStack	Reporter:	liuwei <wliu>
Component:	openstack-tripleo-common	Assignee:	Rabi Mishra <ramishra>
Status:	CLOSED ERRATA	QA Contact:	Gurenko Alex <agurenko>
Severity:	high	Docs Contact:
Priority:	high
Version:	10.0 (Newton)	CC:	agurenko, akaris, coldford, cshastri, dhill, ebarrera, gkadam, kiyyappa, mburns, mschuppe, pkundal, ramishra, rcernin, rhel-osp-director-maint, sandyada, sbaker, segutier, shardy, slinaber, srevivo, ssmolyak
Target Milestone:	z8	Keywords:	Triaged, ZStream
Target Release:	10.0 (Newton)
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	openstack-tripleo-common-5.4.7-3.el7ost	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-05-17 15:48:40 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Comment 4 Robin Cernin 2018-02-12 04:16:27 UTC

Version:

OSP10 (Newton)

heat-cfntools-1.3.0-2.el7ost.noarch
openstack-heat-api-7.0.6-1.el7ost.noarch
openstack-heat-api-cfn-7.0.6-1.el7ost.noarch
openstack-heat-api-cloudwatch-7.0.6-1.el7ost.noarch
openstack-heat-common-7.0.6-1.el7ost.noarch
openstack-heat-engine-7.0.6-1.el7ost.noarch
openstack-heat-templates-0-0.14.1e6015dgit.el7ost.noarch
openstack-tripleo-heat-templates-5.3.3-1.el7ost.noarch
openstack-tripleo-heat-templates-compat-2.0.0-58.el7ost.noarch
puppet-heat-9.5.0-2.el7ost.noarch
python-heat-agent-0-0.14.1e6015dgit.el7ost.noarch
python-heat-tests-7.0.6-1.el7ost.noarch
python-heatclient-1.5.2-1.el7ost.noarch

How to reproduce:

0) Scale out with an additional node (for example, index 10)
1) The node with index 10 is created successfully in the Heat database
2) The node with index 10 is assigned an instance_uuid in Nova

Now a problem occurs, for example a hardware issue: the node cannot boot from disk.

3) The node goes into ERROR state in Nova
4) Try to remove the node with 'overcloud node delete ... [instance_uuid]'
5) Heat removes the last node instead

Actual results:

 Heat removes the last node instead

Expected results:

 Heat should remove the specified node and perform the update
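
The gap between the actual and expected results can be pictured with a small model. This is only a sketch, not Heat or tripleo-common code: `pick_members_to_remove` and its arguments are hypothetical names, and the fallback to the highest index is an assumption chosen to match the observed "last node removed" behavior. It does not claim to be the root cause of this bug.

```python
def pick_members_to_remove(members, removal_list, remove_count):
    """Toy model of shrinking a node group (hypothetical, not Heat code).

    members: group member indexes as ints, e.g. [0, 1, 2, 3].
    removal_list: indexes explicitly requested for removal.
    remove_count: how many members the group must lose.
    """
    # Honor explicit requests first, ignoring indexes not in the group.
    chosen = [m for m in removal_list if m in members][:remove_count]
    # Assumption: any shortfall is taken from the highest indexes.
    remaining = sorted((m for m in members if m not in chosen), reverse=True)
    chosen += remaining[:remove_count - len(chosen)]
    return sorted(chosen)


nodes = [0, 1, 2, 3]  # illustrative indexes only

# Expected result: the requested member is the one that goes away.
expected = pick_members_to_remove(nodes, removal_list=[1], remove_count=1)

# Actual result in this model: no usable removal list, so the last member goes.
actual = pick_members_to_remove(nodes, removal_list=[], remove_count=1)

print("expected:", expected)
print("actual:  ", actual)
```

In this model the only difference between the two calls is whether the requested member reaches the removal list, which is why the mapping from instance_uuid to a group member matters.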

We guess this behavior is because the stack is already in FAILED state.

Comment 5 Rabi Mishra 2018-02-12 04:50:29 UTC

> We guess this behavior is because the stack is already in FAILED state.

Yes, by default Heat tries to _replace_ all FAILED resources/nodes on a stack update.

Assuming that a node is in FAILED state, I would expect it to both remove the blacklisted node (03c28a44-979b-4ed2-9463-04661df11570) and try to replace the node in FAILED state.
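
The expectation above (remove the blacklisted node and replace any FAILED node in the same update) can be sketched as follows. This is a toy model, not Heat's implementation: `plan_update`, the state strings, and the node ids are hypothetical and exist only for illustration.

```python
def plan_update(node_states, blacklist):
    """Toy model of the expected stack update (hypothetical, not Heat code).

    node_states: dict mapping node id -> "COMPLETE" or "FAILED".
    blacklist: set of node ids requested for removal.
    Returns (removed, replaced) as sorted lists of node ids.
    """
    # Blacklisted nodes are removed regardless of their state.
    removed = sorted(n for n in node_states if n in blacklist)
    # FAILED nodes that are not being removed are replaced by default.
    replaced = sorted(
        n for n, state in node_states.items()
        if state == "FAILED" and n not in blacklist
    )
    return removed, replaced


# Hypothetical node ids; only the states matter here.
states = {"node-0": "COMPLETE", "node-1": "COMPLETE", "node-2": "FAILED"}
removed, replaced = plan_update(states, blacklist={"node-1"})
print("removed: ", removed)
print("replaced:", replaced)
```

If the blacklisted node is itself the FAILED one, this model removes it and replaces nothing.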

Comment 33 Gurenko Alex 2018-05-10 12:54:00 UTC

Verified on puddle 2018-05-09.2

Comment 38 errata-xmlrpc 2018-05-17 15:48:40 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1597