Bug 1544088

Summary: Run the correct command to delete the overcloud node, but the wrong overcloud node is deleted by heat command
Product: Red Hat OpenStack Reporter: liuwei <wliu>
Component: openstack-tripleo-common Assignee: Rabi Mishra <ramishra>
Status: CLOSED ERRATA QA Contact: Gurenko Alex <agurenko>
Severity: high Docs Contact:
Priority: high    
Version: 10.0 (Newton) CC: agurenko, akaris, coldford, cshastri, dhill, ebarrera, gkadam, kiyyappa, mburns, mschuppe, pkundal, ramishra, rcernin, rhel-osp-director-maint, sandyada, sbaker, segutier, shardy, slinaber, srevivo, ssmolyak
Target Milestone: z8 Keywords: Triaged, ZStream
Target Release: 10.0 (Newton)   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: openstack-tripleo-common-5.4.7-3.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-05-17 15:48:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Comment 4 Robin Cernin 2018-02-12 04:16:27 UTC
Version:

OSP10 (Newton)

heat-cfntools-1.3.0-2.el7ost.noarch
openstack-heat-api-7.0.6-1.el7ost.noarch
openstack-heat-api-cfn-7.0.6-1.el7ost.noarch
openstack-heat-api-cloudwatch-7.0.6-1.el7ost.noarch
openstack-heat-common-7.0.6-1.el7ost.noarch
openstack-heat-engine-7.0.6-1.el7ost.noarch
openstack-heat-templates-0-0.14.1e6015dgit.el7ost.noarch
openstack-tripleo-heat-templates-5.3.3-1.el7ost.noarch
openstack-tripleo-heat-templates-compat-2.0.0-58.el7ost.noarch
puppet-heat-9.5.0-2.el7ost.noarch
python-heat-agent-0-0.14.1e6015dgit.el7ost.noarch
python-heat-tests-7.0.6-1.el7ost.noarch
python-heatclient-1.5.2-1.el7ost.noarch

How to reproduce:

0) Scale out with an additional node (for example, index 10)
1) Node with index 10 is created successfully in the Heat database
2) Node with index 10 is assigned an instance_uuid in Nova

Now suppose a problem appears, for example a hardware issue that prevents the node from booting from disk.

3) Node goes into ERROR state in Nova
4) Try to remove node with 'overcloud node delete ... [instance_uuid]'
5) Heat removes the last node instead
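
To illustrate the expectation behind step 4, here is a minimal Python sketch, not the tripleo-common implementation. It maps the requested instance UUIDs to resource indexes and refuses to continue when a UUID cannot be resolved, rather than letting the scale-down fall through to some other node. The function name `resolve_removal_indexes` and the sample UUID strings are hypothetical placeholders.

```python
from typing import Dict, List


def resolve_removal_indexes(index_to_uuid: Dict[int, str],
                            requested: List[str]) -> List[int]:
    """Map requested Nova instance UUIDs to resource indexes.

    Hypothetical helper for illustration only.
    """
    uuid_to_index = {uuid: idx for idx, uuid in index_to_uuid.items()}
    missing = [u for u in requested if u not in uuid_to_index]
    if missing:
        # Refuse to guess: in this report, a different (the last) node
        # ended up being removed instead of the requested one.
        raise LookupError(f"no resource found for instance UUID(s): {missing}")
    return sorted(uuid_to_index[u] for u in requested)


# Placeholder data: index 10 is the node that failed to boot.
nodes = {0: "uuid-a", 1: "uuid-b", 10: "uuid-k"}
print(resolve_removal_indexes(nodes, ["uuid-k"]))
```

In this sketch the caller gets an explicit error for an unknown UUID, so a scale-down is never issued without naming the exact index to remove.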

Actual results:

 Heat removes the last node instead

Expected results:

 Heat should remove the specified node and perform the update

We guess this behavior is because the stack is already in FAILED state.

Comment 5 Rabi Mishra 2018-02-12 04:50:29 UTC
> We guess this behavior is because the stack is already in FAILED state.

Yes, by default Heat tries to _replace_ all FAILED resources/nodes on a stack update.

Assuming that a node is in FAILED state, I would expect it to do both: remove the blacklisted node (03c28a44-979b-4ed2-9463-04661df11570) and try to replace the node in FAILED state.
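
The behavior described above can be sketched as a toy model in Python. This is not Heat code. It only assumes what the comment states: on an update, blacklisted nodes are removed and FAILED nodes are replaced. The function `plan_update` and the node names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def plan_update(states: Dict[str, str],
                blacklist: Set[str]) -> List[Tuple[str, str]]:
    """Return (action, node) pairs for a simplified stack update."""
    plan = []
    for node, state in sorted(states.items()):
        if node in blacklist:
            # Explicitly blacklisted nodes are removed from the group.
            plan.append(("remove", node))
        elif state == "FAILED":
            # Remaining FAILED nodes are replaced by default.
            plan.append(("replace", node))
    return plan


# Placeholder data: node-9 is blacklisted, node-10 is in FAILED state.
states = {"node-0": "COMPLETE", "node-9": "COMPLETE", "node-10": "FAILED"}
for action, node in plan_update(states, blacklist={"node-9"}):
    print(action, node)
```

Under these assumptions a single update yields two actions, one removal and one replacement, which matches the "both" expectation in this comment.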

Comment 33 Gurenko Alex 2018-05-10 12:54:00 UTC
Verified on puddle 2018-05-09.2

Comment 38 errata-xmlrpc 2018-05-17 15:48:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1597