Bug 1266102 - overcloud node delete after failure can remove too many nodes
Summary: overcloud node delete after failure can remove too many nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: y2
Target Release: 7.0 (Kilo)
Assignee: Jan Provaznik
QA Contact: Udi Kalifon
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-09-24 13:16 UTC by Zane Bitter
Modified: 2015-12-21 16:55 UTC (History)

Fixed In Version: openstack-tripleo-common-0.0.1.dev6-4.git49b57eb.el7ost
Doc Type: Bug Fix
Doc Text:
The director updated Heat parameters even if a stack-update operation failed, so the Heat stack *Count parameters (e.g. ComputeCount) might not reflect the real number of nodes in an Overcloud. This caused the "overcloud node delete" command to delete an incorrect number of nodes. This fix modifies the "overcloud node delete" command to compute the current node count from the real number of servers in the ResourceGroup instead of from the stack parameters. Now the director deletes the correct number of nodes.
Clone Of:
Environment:
Last Closed: 2015-12-21 16:55:29 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 226682 0 None None None Never
Red Hat Product Errata RHBA-2015:2651 0 normal SHIPPED_LIVE Red Hat Enterprise Linux OSP 7 director Bug Fix Advisory 2015-12-21 21:50:26 UTC

Description Zane Bitter 2015-09-24 13:16:49 UTC
Copied from a comment on bug 1255910:

It turns out that under certain circumstances BZ 1258967 is not a sufficient fix for this issue [bug 1255910]:
- if a user previously tried to delete a node and that operation failed, then using the ComputeCount parameter to compute the new node count is insufficient.

An upstream patch which computes the new node count from the actual nodes in the ResourceGroup is here:
https://review.openstack.org/226682
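
To illustrate the difference between the two approaches, here is a minimal sketch in plain Python. It is not the code from the upstream patch and does not call Heat; the function names, the "compute-N" member names, and the dict/list data model are all hypothetical, chosen only to show why a stale ComputeCount leads to too many nodes being removed.

```python
def count_from_parameter(stack_parameters, count_param, nodes_to_delete):
    """Old approach (hypothetical sketch): trust the *Count stack parameter.

    If an earlier delete already lowered the parameter but the stack update
    failed, the parameter is smaller than the real number of servers.
    """
    return int(stack_parameters[count_param]) - len(nodes_to_delete)


def count_from_resource_group(group_members, nodes_to_delete):
    """Fixed approach (hypothetical sketch): count the servers that are
    actually present in the ResourceGroup and subtract the ones being removed.
    """
    doomed = set(nodes_to_delete)
    remaining = [member for member in group_members if member not in doomed]
    return len(remaining)


# Assumed scenario: three compute nodes exist, a previous delete attempt
# failed after ComputeCount had already been lowered from 3 to 2.
stack_parameters = {"ComputeCount": "2"}
group_members = ["compute-0", "compute-1", "compute-2"]
nodes_to_delete = ["compute-2"]

stale_count = count_from_parameter(stack_parameters, "ComputeCount", nodes_to_delete)
real_count = count_from_resource_group(group_members, nodes_to_delete)

print("new count from stale parameter:", stale_count)
print("new count from ResourceGroup members:", real_count)
```

In this assumed scenario the parameter-based count asks Heat to shrink the group to one node, which removes two servers, while the member-based count keeps two nodes and removes only the requested one.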

Comment 4 Udi Kalifon 2015-11-29 10:58:16 UTC
I tried to scale up from 1 compute to 2, and the operation failed (nova returned "Message: Unknown, Code: Unknown"). I then used "nova delete" to delete the failed node. In stack-show I still see ComputeCount = 2, but at least it didn't delete all the computes there are.

How do I bring the stack to a consistent state with ComputeCount = 1? How do I verify this bug? Thanks.

Comment 5 Zane Bitter 2015-11-30 21:44:45 UTC
Use the "openstack overcloud node delete" command to remove it from Heat's model.

Comment 7 errata-xmlrpc 2015-12-21 16:55:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:2651

