Bug 1266102 - overcloud node delete after failure can remove too many nodes
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: y2
Target Release: 7.0 (Kilo)
Assigned To: Jan Provaznik
Keywords: Triaged
Reported: 2015-09-24 09:16 EDT by Zane Bitter
Modified: 2015-12-21 11:55 EST
CC: 7 users

Fixed In Version: openstack-tripleo-common-0.0.1.dev6-4.git49b57eb.el7ost
Doc Type: Bug Fix
Doc Text:
The director updates Heat parameters even if a stack-update operation fails. As a result, the Heat stack *Count parameters (for example, ComputeCount) might not reflect the real number of nodes in the Overcloud, which caused the "overcloud node delete" command to delete an incorrect number of nodes. This fix modifies the "overcloud node delete" command to compute the current node count from the real number of servers in the ResourceGroup instead of from the stack parameters. Now the director deletes the correct number of nodes.
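A minimal sketch of the two ways of computing the new node count, in Python. The function names and the ResourceGroup member names "0", "1", "2" are hypothetical and only illustrate the idea; this is not the actual tripleo-common code. The scenario assumes an earlier failed delete left ComputeCount at 2 while three compute servers still exist.

```python
from typing import Sequence


def count_from_parameter(compute_count: int, nodes_to_delete: Sequence[str]) -> int:
    # Old behaviour (hypothetical helper): trust the stack's ComputeCount
    # parameter, which may already have been lowered by a failed update.
    return compute_count - len(nodes_to_delete)


def count_from_members(group_members: Sequence[str], nodes_to_delete: Sequence[str]) -> int:
    # Fixed behaviour (hypothetical helper): start from the servers that
    # actually exist in the ResourceGroup.
    return len(group_members) - len(nodes_to_delete)


members = ["0", "1", "2"]  # three compute servers still exist
print(count_from_parameter(2, ["1"]))      # 1: one node too few
print(count_from_members(members, ["1"]))  # 2: the intended result
```

Counting real members makes the result independent of whatever value a failed update left in the stack parameters.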
Last Closed: 2015-12-21 11:55:29 EST
Type: Bug

External Trackers:
OpenStack gerrit 226682 (Priority: None; Status: None; Summary: None; Last Updated: Never)

Description Zane Bitter 2015-09-24 09:16:49 EDT
Copied from a comment on bug 1255910:

It turns out that under certain circumstances BZ 1258967 is not a sufficient fix for this issue [bug 1255910]:
- if a user previously tried to delete a node and that operation failed, then using the ComputeCount parameter to compute the new node count is insufficient.
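To see why a stale count removes too many nodes, the following Python sketch models a ResourceGroup shrink in a simplified way. It assumes that members named in the removal list are dropped and that, of the remaining members, the group keeps the lowest-indexed ones up to the requested count. The helper name is hypothetical, and Heat's real logic is more involved.

```python
from typing import List, Sequence


def surviving_members(members: Sequence[str], removal_list: Sequence[str], new_count: int) -> List[str]:
    # Simplified model (assumption): skip members listed for removal, then keep
    # the lowest-indexed remaining members until new_count is reached.
    candidates = sorted((m for m in members if m not in removal_list), key=int)
    return candidates[:new_count]


# A previous attempt to delete member "1" failed, but ComputeCount was already lowered to 2.
members = ["0", "1", "2"]
print(surviving_members(members, ["1"], 2 - 1))  # ['0']: members "1" and "2" are both removed
print(surviving_members(members, ["1"], 3 - 1))  # ['0', '2']: only member "1" is removed
```

In this model, deriving the count from the stale parameter silently removes an extra, healthy node.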

An upstream patch that computes the new node count from the actual nodes in the ResourceGroup is here:
Comment 4 Udi 2015-11-29 05:58:16 EST
I tried to scale up from 1 compute node to 2, and the operation failed (nova returned "Message: Unknown, Code: Unknown"). I then used "nova delete" to delete the failed node. In stack-show I still see ComputeCount = 2, but at least it didn't delete all of the existing compute nodes.

How do I bring the stack to a consistent state with ComputeCount = 1? How do I verify this bug? Thanks.
Comment 5 Zane Bitter 2015-11-30 16:44:45 EST
Use the "openstack overcloud node delete" command to remove it from Heat's model.
Comment 7 errata-xmlrpc 2015-12-21 11:55:29 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.


Note You need to log in before you can comment on or make changes to this bug.