Red Hat Bugzilla – Bug 1476233
[RFE] Increment for overcloud nodes
Last modified: 2018-05-24 21:57:59 EDT
Description of problem:
The overcloud node index always increments; it is never decremented or reused when scaling down or deleting a node.
Version-Release number of selected component (if applicable):
In a deployed overcloud, try to reuse freed node indexes, such as the NN in overcloud-controller-NN.
Steps to Reproduce:
1. Deploy an overcloud
2. Remove a node from configuration
3. Try to reuse the index number that belonged to this host
Actual results:
TripleO skips this number.
Expected results:
TripleO reuses this number.
Fundamentally, this is the nature of how Heat ResourceGroups scale up and down. It would have to be addressed in Heat if this were to be fixed.
AFAIK, TripleO marks the resources to be removed as blacklisted in the ResourceGroup (RG) when running 'overcloud node delete'.
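To illustrate why a freed index is skipped, here is a minimal Python sketch of how a ResourceGroup-style group could choose member names when some names are blacklisted. It is a simplified model written for this discussion, not Heat's actual code: the function name `resource_names` is hypothetical, and it assumes that blacklisted names persist across later updates.

```python
import itertools


def resource_names(size, blacklist):
    """Return the member names a ResourceGroup-like group would use.

    Simplified model (assumption): candidate names are "0", "1", "2", ...
    in order; any name in the blacklist is skipped, and the first `size`
    remaining names are used.
    """
    candidates = (str(i) for i in itertools.count())
    allowed = (name for name in candidates if name not in blacklist)
    return list(itertools.islice(allowed, size))


# Initial deployment of three controllers.
blacklist = set()
print(resource_names(3, blacklist))  # ['0', '1', '2']

# Deleting member "1": it is blacklisted and the count drops to 2.
blacklist.add("1")
print(resource_names(2, blacklist))  # ['0', '2']

# Scaling back up to 3 does not reuse "1"; index "3" is allocated instead.
print(resource_names(3, blacklist))  # ['0', '2', '3']
```

Under this model the only way to get index "1" back would be to remove it from the blacklist, which is why the behavior would have to change on the Heat side.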
Heat has a feature called 'resource-mark-unhealthy' that marks a resource (a controller or compute index) as CHECK_FAILED, so that it is replaced on the next stack update.
TripleO could probably leverage this instead of blacklisting resources, though I don't know whether TripleO has any other implications when a new node uses an old index.
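The mark-unhealthy path can be modelled the same way. In this hypothetical sketch, `mark_unhealthy` and `stack_update` are illustrative stand-ins, not Heat or TripleO APIs, and the physical IDs are made up. It assumes, as described above, that a member flagged CHECK_FAILED is replaced on the next update under the same name, so the index is kept while the underlying node is new.

```python
import itertools

# Hypothetical physical IDs; only used to show that the server is new.
_physical_ids = itertools.count(100)


def create_member():
    return {"id": next(_physical_ids), "status": "CREATE_COMPLETE"}


def mark_unhealthy(group, name):
    """Model of 'resource-mark-unhealthy': flag one member as CHECK_FAILED."""
    group[name]["status"] = "CHECK_FAILED"


def stack_update(group):
    """Model of the next update: CHECK_FAILED members are replaced in place.

    The replacement keeps the same member name (index); only the
    underlying physical resource changes.
    """
    for name in list(group):
        if group[name]["status"] == "CHECK_FAILED":
            group[name] = create_member()


group = {str(i): create_member() for i in range(3)}
old_id = group["1"]["id"]
mark_unhealthy(group, "1")
stack_update(group)
print(sorted(group))  # ['0', '1', '2']: index "1" is reused
print(group["1"]["id"] != old_id)  # True: but it is a new node
```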
I forgot to mention one drawback of using mark-unhealthy with an RG, though:
If you have 5 nodes, mark 'node-2' as unhealthy, and then reduce the count (size) to 4, expecting 'node-2' to be removed, the update instead replaces 'node-2' with a new node and chops 'node-5' off the top, which may be unacceptable.
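The drawback can be reproduced with a small toy model. This sketch assumes, as described above, that an update replaces CHECK_FAILED members and that a lower count always removes the highest-numbered members. `GroupModel` is hypothetical, the physical IDs are made up, and members are numbered from 1 to match the node-1 to node-5 example (Heat itself numbers group members from 0).

```python
import itertools


class GroupModel:
    """Toy model of a ResourceGroup (assumption, not Heat's code)."""

    def __init__(self, size):
        self._ids = itertools.count(100)  # hypothetical physical IDs
        self.members = {}
        self.update(size)

    def mark_unhealthy(self, name):
        self.members[name]["status"] = "CHECK_FAILED"

    def update(self, size):
        # Keep (or create) the lowest `size` names; anything above is removed.
        wanted = [str(i) for i in range(1, size + 1)]
        new_members = {}
        for name in wanted:
            member = self.members.get(name)
            if member is None or member["status"] == "CHECK_FAILED":
                # Missing or unhealthy members get a brand-new physical resource.
                member = {"id": next(self._ids), "status": "CREATE_COMPLETE"}
            new_members[name] = member
        self.members = new_members


group = GroupModel(5)
old_node2 = group.members["2"]["id"]
group.mark_unhealthy("2")
group.update(4)  # the operator expects node-2 to disappear
print(sorted(group.members))  # ['1', '2', '3', '4']: node-5 was removed
print(group.members["2"]["id"] != old_node2)  # True: node-2 was replaced
```

In other words, marking a member unhealthy and shrinking the count are independent operations in this model: the first causes an in-place replacement and the second always trims from the top.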
This is similar to https://bugzilla.redhat.com/show_bug.cgi?id=1426563: a procedure was tested that uses the Heat mark-unhealthy feature to do an in-place node replacement that reuses the same IPs, etc., but the decision was made not to document that process for general use.
Maybe we need to revisit that discussion, but if the decision stays the same, this might be considered a duplicate of that earlier bug, because the observed behavior is basically expected.