Bug 1266102

Summary: overcloud node delete after failure can remove too many nodes
Product: Red Hat OpenStack
Reporter: Zane Bitter <zbitter>
Component: openstack-tripleo
Assignee: Jan Provaznik <jprovazn>
Status: CLOSED ERRATA
QA Contact: Udi Kalifon <ukalifon>
Severity: unspecified
Docs Contact:
Priority: high
Version: 7.0 (Kilo)
CC: calfonso, dmacpher, mburns, ohochman, rhel-osp-director-maint, yeylon, zbitter
Target Milestone: y2
Keywords: Triaged
Target Release: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-common-0.0.1.dev6-4.git49b57eb.el7ost
Doc Type: Bug Fix
Doc Text:
The director updated Heat parameters even when a stack-update operation failed, so the Heat stack *Count parameters (e.g. ComputeCount) might not reflect the real number of nodes in the Overcloud. This caused the "overcloud node delete" command to delete an incorrect number of nodes. This fix modifies the "overcloud node delete" command to compute the current node count from the real number of servers in the ResourceGroup instead of from the stack parameters. The director now deletes the correct number of nodes.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-12-21 16:55:29 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Zane Bitter 2015-09-24 13:16:49 UTC
Copied from a comment on bug 1255910:

It turns out that under certain circumstances BZ 1258967 is not a sufficient fix for this issue [bug 1255910]:
- if a user previously tried to delete a node and the operation failed, then using the ComputeCount parameter to compute the new node count is insufficient.

An upstream patch that computes the new node count from the actual nodes in the ResourceGroup is here:
https://review.openstack.org/226682
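
To illustrate the difference between the two ways of computing the count, here is a minimal sketch. It is not the code from the upstream patch: the function names, the plain lists standing in for ResourceGroup members, and the example node names are all hypothetical, and it assumes the failed delete left the ComputeCount parameter already decremented while all servers still exist.

```python
def count_from_parameter(count_param, nodes_to_delete):
    """Old approach (hypothetical sketch): trust the stored *Count parameter."""
    return count_param - len(nodes_to_delete)


def count_from_group(group_members, nodes_to_delete):
    """Fixed approach (hypothetical sketch): count the servers that actually
    exist in the ResourceGroup, then subtract those being removed."""
    # Only nodes that are really in the group reduce the count.
    removed = set(nodes_to_delete) & set(group_members)
    return len(group_members) - len(removed)


# Assumed scenario: three compute nodes exist, and an earlier attempt to
# delete one of them failed but still left ComputeCount at 2.
actual_members = ["compute-0", "compute-1", "compute-2"]  # hypothetical names
stale_compute_count = 2

# Retrying the delete of a single node:
old_result = count_from_parameter(stale_compute_count, ["compute-2"])
new_result = count_from_group(actual_members, ["compute-2"])
```

Under these assumptions the parameter-based calculation asks for one remaining node, which would remove two servers, while the group-based calculation asks for two remaining nodes, which removes only the requested one.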

Comment 4 Udi Kalifon 2015-11-29 10:58:16 UTC
I tried to scale up from 1 compute to 2, and the operation failed (nova returned "Message: Unknown, Code: Unknown"). I then used "nova delete" to delete the failed node. In stack-show I still see ComputeCount = 2, but at least it didn't delete all the computes there are.

How do I bring the stack to a consistent state with ComputeCount = 1? How do I verify this bug? Thanks.

Comment 5 Zane Bitter 2015-11-30 21:44:45 UTC
Use the "openstack overcloud node delete" command to remove it from Heat's model.

Comment 7 errata-xmlrpc 2015-12-21 16:55:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:2651