Bug 1761364 - Playing with removal policies can lead to deletion of wrong node ...
Summary: Playing with removal policies can lead to deletion of wrong node ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Rabi Mishra
QA Contact: Arik Chernetsky
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-10-14 09:17 UTC by Andreas Karis
Modified: 2020-03-10 11:22 UTC (History)

Fixed In Version: openstack-tripleo-common-8.7.1-4.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-10 11:22:02 UTC
Target Upstream Version:
Embargoed:


Attachments
database dump 1 (15.00 MB, application/gzip)
2019-10-14 09:38 UTC, Andreas Karis
database dump b (15.00 MB, application/octet-stream)
2019-10-14 09:54 UTC, Andreas Karis
dump 3/3 (1.51 MB, application/octet-stream)
2019-10-14 11:33 UTC, Andreas Karis
templates (10.79 KB, application/gzip)
2019-10-14 12:48 UTC, Andreas Karis


Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 689326 | 0 | 'None' | MERGED | Reset *RemovalPoliciesMode for node delete | 2020-12-10 14:59:24 UTC
Red Hat Knowledge Base (Solution) | 4232971 | 0 | None | None | None | 2019-10-14 12:51:03 UTC
Red Hat Product Errata | RHBA-2020:0760 | 0 | None | None | None | 2020-03-10 11:22:54 UTC

Description Andreas Karis 2019-10-14 09:17:03 UTC
Description of problem:

Hi,

I am currently experimenting with OSP 13 and removal policies. I ran a few scale-outs and scale-downs with different removal policy settings. In order to reset node indexes, I created and included:
~~~
#-e ${template_base_dir}/removal-policies.yaml \
(undercloud) [stack@undercloud-0 ~]$ cat dpdk-ips-from-pool/removal-policies.yaml 
parameter_defaults:
  ComputeOvsDpdkRemovalPolicies: []
  ComputeOvsDpdkRemovalPoliciesMode: update
~~~

After a while, I removed this file:
~~~
(undercloud) [stack@undercloud-0 ~]$ cat overcloud_deploy.sh 
#!/bin/bash
template_base_dir=/home/stack/dpdk-ips-from-pool

if ! [ -f ${template_base_dir}/overcloud_images.yaml ]; then
  openstack overcloud container image prepare \
    --namespace=registry.access.redhat.com/rhosp13 \
    --prefix=openstack- \
    -e /usr/share/openstack-tripleo-heat-templates/environments/services-docker/neutron-ovs-dpdk.yaml \
    --set ceph_namespace=registry.access.redhat.com/rhceph \
    --set ceph_image=rhceph-3-rhel7 \
    --tag-from-label {version}-{release} \
    --output-env-file=${template_base_dir}/overcloud_images.yaml
fi
~~~

I then removed compute-0 again and scaled out, so that I had compute-1 and compute-2:
~~~
(undercloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| acc313ce-18c8-4ff6-9fae-cb7f5cda6f03 | computeovsdpdk-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| 38552a1e-d0dc-426f-b045-1ad0b111bc2f | computeovsdpdk-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.29 |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
~~~

~~~
(undercloud) [stack@undercloud-0 ~]$ heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+--------------------------------------+------------+-----------------+----------------------+----------------------+----------------------------------+
| id                                   | stack_name | stack_status    | creation_time        | updated_time         | project                          |
+--------------------------------------+------------+-----------------+----------------------+----------------------+----------------------------------+
| f2e1165f-a768-48e7-b6bb-24ac13e11dfb | overcloud  | UPDATE_COMPLETE | 2019-10-08T14:52:05Z | 2019-10-09T14:43:35Z | 1f9155cb6d314ff4b0933d42afd01204 |
+--------------------------------------+------------+-----------------+----------------------+----------------------+----------------------------------+
~~~

I then decided to delete computeovsdpdk-1:
~~~
(undercloud) [stack@undercloud-0 ~]$  openstack overcloud node delete --stack overcloud acc313ce-18c8-4ff6-9fae-cb7f5cda6f03
Deleting the following nodes from stack overcloud:
- acc313ce-18c8-4ff6-9fae-cb7f5cda6f03
Started Mistral Workflow tripleo.scale.v1.delete_node. Execution ID: 5c78e3a3-c8e0-4e8c-a59a-92769bd36b59
Waiting for messages on queue 'tripleo' with no timeout.
~~~

Once the node deletion was done, heat had scaled down compute-2 and **replaced** it with compute-0!!
~~~
(undercloud) [stack@undercloud-0 ~]$ nova list
ironic node-list
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 383d0aeb-614c-41b5-8219-2a16c3f9923c | computeovsdpdk-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
~~~

Comment 1 Andreas Karis 2019-10-14 09:38:25 UTC
Created attachment 1625517 [details]
database dump 1

Comment 2 Andreas Karis 2019-10-14 09:54:01 UTC
Created attachment 1625521 [details]
database dump b

Comment 3 Andreas Karis 2019-10-14 11:33:56 UTC
Created attachment 1625573 [details]
dump 3/3

Comment 4 Rabi Mishra 2019-10-14 12:11:06 UTC
We use PATCH updates of the heat stack in TripleO. Removing removal-policies.yaml does not reset 'ComputeOvsDpdkRemovalPoliciesMode'. You have to change it explicitly to 'append' in a subsequent update.

Here is what has happened. All along the policy mode has been 'update'.

1. Initial update after removing the file, empty blacklist
2. Node delete of computeovsdpdk-0, count is reduced to 2,  blacklist is reset (nothing to do) and ['0'] added. So you get indexes '1' and '2'.
3. Node delete of computeovsdpdk-1, count is reduced to 1, blacklist is reset ('0' removed) and ['1'] added. So you get index ['0'].

Using node delete reduces the role count automatically, so it is better to set the 'ComputeOvsDpdkRemovalPolicies' parameter.

Resetting 'ComputeOvsDpdkRemovalPoliciesMode' to 'append' (default) would have given you only index '2' in step 3.

This is expected behaviour.
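
The three steps above can be reproduced with a small model. This is only a sketch: it assumes that the resource group picks the lowest non-blacklisted indexes until the count is reached, and that 'update' replaces the stored blacklist while 'append' extends it. The function names `group_indexes` and `apply_removal` are hypothetical and are not part of heat or TripleO.

```python
from itertools import count, islice


def group_indexes(size, blacklist):
    """Return the first `size` index names that are not blacklisted, counting up from 0."""
    blocked = set(blacklist)
    candidates = (str(i) for i in count())
    return list(islice((n for n in candidates if n not in blocked), size))


def apply_removal(old_blacklist, new_entries, mode):
    """Combine the stored blacklist with new entries according to the policies mode."""
    if mode == "update":
        # 'update' discards the stored blacklist and keeps only the new entries.
        return list(new_entries)
    if mode == "append":
        # 'append' keeps every index that was blacklisted before.
        return list(old_blacklist) + [n for n in new_entries if n not in old_blacklist]
    raise ValueError("unknown mode: %s" % mode)


# Step 2: node delete of index 0, count reduced to 2.
step2 = apply_removal([], ["0"], "update")
print(group_indexes(2, step2))          # ['1', '2']

# Step 3 with mode 'update': '0' is dropped from the blacklist, '1' is added.
step3_update = apply_removal(step2, ["1"], "update")
print(group_indexes(1, step3_update))   # ['0']

# Step 3 with mode 'append': both '0' and '1' stay blacklisted.
step3_append = apply_removal(step2, ["1"], "append")
print(group_indexes(1, step3_append))   # ['2']
```

Under these assumptions the model gives index '0' for 'update' and index '2' for 'append', matching the outcome described in step 3.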

Comment 5 Andreas Karis 2019-10-14 12:35:24 UTC
Hi,

I totally understand how PATCH updates work. I kept the setting on purpose, because that's what users will *likely* do. It's easy to forget to set the setting back when just removing an environment file or commenting out the parameter.

However, RemovalPoliciesMode and RemovalPolicies are super dangerous. How can we tell users to run this in production if something like this so easily happens? If they keep RemovalPoliciesMode "update" and then they run a node delete, possibly weeks later, and we just remove an existing node with important VMs on it?

I really can't accept the answer "This is expected behaviour" when this feature has the potential to wreak havoc in customer environments.

- Andreas

Comment 6 Andreas Karis 2019-10-14 12:42:13 UTC
I reran the same test again just to visualize it:

Baseline:
~~~
(undercloud) [stack@undercloud-0 ~]$ heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+--------------------------------------+------------+-----------------+----------------------+----------------------+----------------------------------+
| id                                   | stack_name | stack_status    | creation_time        | updated_time         | project                          |
+--------------------------------------+------------+-----------------+----------------------+----------------------+----------------------------------+
| f2e1165f-a768-48e7-b6bb-24ac13e11dfb | overcloud  | UPDATE_COMPLETE | 2019-10-08T14:52:05Z | 2019-10-14T09:46:35Z | 1f9155cb6d314ff4b0933d42afd01204 |
+--------------------------------------+------------+-----------------+----------------------+----------------------+----------------------------------+
(undercloud) [stack@undercloud-0 ~]$ openstack stack show overcloud | grep RemovalPolicies | cut -b1-200
|                       | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["1"]}]'
|                       | ComputeOvsDpdkRemovalPoliciesMode: update
|                       | ControllerRemovalPolicies: '[]'
|                       | ControllerRemovalPoliciesMode: append
(undercloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 383d0aeb-614c-41b5-8219-2a16c3f9923c | computeovsdpdk-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| 64112b16-c73c-41b6-b278-341ef6f3b908 | computeovsdpdk-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.18 |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
(undercloud) [stack@undercloud-0 ~]$  openstack overcloud node delete --stack overcloud 383d0aeb-614c-41b5-8219-2a16c3f9923c
Deleting the following nodes from stack overcloud:
- 383d0aeb-614c-41b5-8219-2a16c3f9923c
Started Mistral Workflow tripleo.scale.v1.delete_node. Execution ID: e8c4c624-fba5-490e-993b-e2e8988be0c0
(...)
~~~

So now node count gets internally set to 1, and I instructed tripleo to delete  383d0aeb-614c-41b5-8219-2a16c3f9923c (computeovsdpdk-0).

Here's what happens during the update:
~~~
(undercloud) [stack@undercloud-0 ~]$  while true ; do nova list; ironic node-list  ; openstack stack show overcloud | grep RemovalPolicies | cut -b1-200; sleep 30 ; done
 +--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 383d0aeb-614c-41b5-8219-2a16c3f9923c | computeovsdpdk-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| 64112b16-c73c-41b6-b278-341ef6f3b908 | computeovsdpdk-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.18 |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | 383d0aeb-614c-41b5-8219-2a16c3f9923c | power on    | active             | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | 64112b16-c73c-41b6-b278-341ef6f3b908 | power on    | active             | False       |
| 47fb7e86-1f3d-4e0d-b573-0be1418f9a0d | controller-0 | 996f98c0-8ee9-4e66-9273-35651dde1e84 | power on    | active             | False       |
| 320769fe-fb7c-4f3e-af1a-67a00442a96c | controller-1 | 4141033e-83de-489d-8856-9e068324fe5c | power on    | active             | False       |
| 0526bb87-2317-4eac-9f68-89d6d4b57aab | controller-2 | a45ec4f4-4798-4538-9ea0-9eea88762cd2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
|                       | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["1"]}]'                                                                                                                   
|                       | ComputeOvsDpdkRemovalPoliciesMode: update                                                                                                                                     
|                       | ControllerRemovalPolicies: '[]'                                                                                                                                               
|                       | ControllerRemovalPoliciesMode: append                                                                                                                                         
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 383d0aeb-614c-41b5-8219-2a16c3f9923c | computeovsdpdk-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| 64112b16-c73c-41b6-b278-341ef6f3b908 | computeovsdpdk-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.18 |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | 383d0aeb-614c-41b5-8219-2a16c3f9923c | power on    | active             | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | 64112b16-c73c-41b6-b278-341ef6f3b908 | power on    | active             | False       |
| 47fb7e86-1f3d-4e0d-b573-0be1418f9a0d | controller-0 | 996f98c0-8ee9-4e66-9273-35651dde1e84 | power on    | active             | False       |
| 320769fe-fb7c-4f3e-af1a-67a00442a96c | controller-1 | 4141033e-83de-489d-8856-9e068324fe5c | power on    | active             | False       |
| 0526bb87-2317-4eac-9f68-89d6d4b57aab | controller-2 | a45ec4f4-4798-4538-9ea0-9eea88762cd2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
|                       | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["0"]}]'                                                                                                                   
|                       | ComputeOvsDpdkRemovalPoliciesMode: update                                                                                                                                     
|                       | ControllerRemovalPolicies: '[]'                                                                                                                                               
|                       | ControllerRemovalPoliciesMode: append                                                                                                                                         
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 383d0aeb-614c-41b5-8219-2a16c3f9923c | computeovsdpdk-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| 64112b16-c73c-41b6-b278-341ef6f3b908 | computeovsdpdk-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.18 |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | 383d0aeb-614c-41b5-8219-2a16c3f9923c | power on    | active             | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | 64112b16-c73c-41b6-b278-341ef6f3b908 | power on    | active             | False       |
| 47fb7e86-1f3d-4e0d-b573-0be1418f9a0d | controller-0 | 996f98c0-8ee9-4e66-9273-35651dde1e84 | power on    | active             | False       |
| 320769fe-fb7c-4f3e-af1a-67a00442a96c | controller-1 | 4141033e-83de-489d-8856-9e068324fe5c | power on    | active             | False       |
| 0526bb87-2317-4eac-9f68-89d6d4b57aab | controller-2 | a45ec4f4-4798-4538-9ea0-9eea88762cd2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
~~~

First, we can see that the RemovalPolicies are updated: ['1'] is replaced with ['0'], so the removal policies now contain only index '0'. Note that at no point did I specify the UUID for computeovsdpdk-2, nor did I remove or blacklist index 2 in the RemovalPolicies.

Heat now continues and deletes computeovsdpdk-0 as instructed. But it also deletes computeovsdpdk-2. Why? For a human administrator who doesn't understand (and shouldn't have to understand) how heat operates internally, this seems completely out of place. Heat then replaces computeovsdpdk-2 with a new node. Given that the RemovalPolicy blacklists -0, of course this is -1:
~~~
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 383d0aeb-614c-41b5-8219-2a16c3f9923c | computeovsdpdk-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| 64112b16-c73c-41b6-b278-341ef6f3b908 | computeovsdpdk-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.18 |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | 383d0aeb-614c-41b5-8219-2a16c3f9923c | power on    | active             | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | 64112b16-c73c-41b6-b278-341ef6f3b908 | power on    | active             | False       |
| 47fb7e86-1f3d-4e0d-b573-0be1418f9a0d | controller-0 | 996f98c0-8ee9-4e66-9273-35651dde1e84 | power on    | active             | False       |
| 320769fe-fb7c-4f3e-af1a-67a00442a96c | controller-1 | 4141033e-83de-489d-8856-9e068324fe5c | power on    | active             | False       |
| 0526bb87-2317-4eac-9f68-89d6d4b57aab | controller-2 | a45ec4f4-4798-4538-9ea0-9eea88762cd2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
|                       | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["0"]}]'                                                                                                                   
|                       | ComputeOvsDpdkRemovalPoliciesMode: update                                                                                                                                     
|                       | ControllerRemovalPolicies: '[]'                                                                                                                                               
|                       | ControllerRemovalPoliciesMode: append                                                                                                                                         
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | computeovsdpdk-1 | BUILD  | spawning   | NOSTATE     | ctlplane=192.168.24.13 |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | None                                 | power off   | available          | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | power off   | deploying          | False       |
| 47fb7e86-1f3d-4e0d-b573-0be1418f9a0d | controller-0 | 996f98c0-8ee9-4e66-9273-35651dde1e84 | power on    | active             | False       |
| 320769fe-fb7c-4f3e-af1a-67a00442a96c | controller-1 | 4141033e-83de-489d-8856-9e068324fe5c | power on    | active             | False       |
| 0526bb87-2317-4eac-9f68-89d6d4b57aab | controller-2 | a45ec4f4-4798-4538-9ea0-9eea88762cd2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
~~~
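
The transition shown above (nodes 0 and 2 before, node 1 after) can be modelled as a diff between the existing indexes and the indexes the group wants after the update. This is a simplified sketch under the same assumption as comment 4: the lowest non-blacklisted indexes are chosen until the count is reached. `plan_node_delete` is a hypothetical helper, not a TripleO function.

```python
from itertools import count, islice


def group_indexes(size, blacklist):
    """Return the first `size` index names that are not blacklisted."""
    blocked = set(blacklist)
    names = (str(i) for i in count() if str(i) not in blocked)
    return list(islice(names, size))


def plan_node_delete(existing, stored_blacklist, mode, deleted):
    """Model one node delete; return (new_blacklist, removed, created)."""
    if mode == "update":
        # The stored blacklist is replaced by the indexes being deleted.
        new_blacklist = list(deleted)
    else:
        # 'append': the stored blacklist is kept and extended.
        new_blacklist = list(stored_blacklist) + [
            d for d in deleted if d not in stored_blacklist
        ]
    new_size = len(existing) - len(deleted)
    target = group_indexes(new_size, new_blacklist)
    removed = sorted(set(existing) - set(target))
    created = sorted(set(target) - set(existing))
    return new_blacklist, removed, created


# Baseline from this comment: indexes 0 and 2 exist, '1' is blacklisted.
print(plan_node_delete(["0", "2"], ["1"], "update", ["0"]))
# (['0'], ['0', '2'], ['1'])
print(plan_node_delete(["0", "2"], ["1"], "append", ["0"]))
# (['1', '0'], ['0'], [])
```

With mode 'update' the model removes both 0 and 2 and creates 1, which is what the transcript shows. With mode 'append' only index 0 would be removed.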

After all of this, the user ends up with:
~~~
|                       | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["0"]}]'                                                                                                                   
|                       | ComputeOvsDpdkRemovalPoliciesMode: update                                                                                                                                     
|                       | ControllerRemovalPolicies: '[]'                                                                                                                                               
|                       | ControllerRemovalPoliciesMode: append                                                                                                                                         
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | computeovsdpdk-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | None                                 | power off   | available          | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | power on    | active             | False       |
| 47fb7e86-1f3d-4e0d-b573-0be1418f9a0d | controller-0 | 996f98c0-8ee9-4e66-9273-35651dde1e84 | power on    | active             | False       |
| 320769fe-fb7c-4f3e-af1a-67a00442a96c | controller-1 | 4141033e-83de-489d-8856-9e068324fe5c | power on    | active             | False       |
| 0526bb87-2317-4eac-9f68-89d6d4b57aab | controller-2 | a45ec4f4-4798-4538-9ea0-9eea88762cd2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
~~~

This is an extremely dangerous feature.

Comment 7 Andreas Karis 2019-10-14 12:48:42 UTC
Created attachment 1625608 [details]
templates

Comment 8 Andreas Karis 2019-10-14 13:00:47 UTC
I hope that my concern makes sense. If a user decides to delete node X, our software can't go in and remove nodes X and Y. The idea behind exposing RemovalPolicies was that users had an easy way to reuse node indexes. That's a feature that users have long been looking for. But it seems that if they are not extremely cautious, they can involuntarily, and quite easily through omission, instruct Director to rebuild another node (or several other nodes?) as well.
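
One possible guard against this kind of omission is to check the stack parameters before running a node delete. The sketch below only parses text in the format printed by `openstack stack show overcloud | grep RemovalPolicies` in comment 6; the helper name `roles_in_update_mode` is hypothetical, and nothing like it is claimed to exist in TripleO.

```python
import re

# Matches e.g. "ComputeOvsDpdkRemovalPoliciesMode: update" and captures role and mode.
_MODE_LINE = re.compile(r"(\w+)RemovalPoliciesMode:\s*(\w+)")


def roles_in_update_mode(stack_show_output):
    """Return the role names whose *RemovalPoliciesMode is 'update' in the given text."""
    roles = []
    for line in stack_show_output.splitlines():
        match = _MODE_LINE.search(line)
        if match and match.group(2) == "update":
            roles.append(match.group(1))
    return roles


sample = """\
|                       | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["1"]}]'
|                       | ComputeOvsDpdkRemovalPoliciesMode: update
|                       | ControllerRemovalPolicies: '[]'
|                       | ControllerRemovalPoliciesMode: append
"""

print(roles_in_update_mode(sample))  # ['ComputeOvsDpdk']
```

A non-empty result would be a reason to stop and reset the mode to 'append' before deleting any node of that role.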

Comment 9 Andreas Karis 2019-10-14 13:05:02 UTC
Also note that computeovsdpdk-1 was in the above case (comment 6) rebuilt on:
 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34

That's the same ironic node that computeovsdpdk-2 was on. If this was a production environment, all data on computeovsdpdk-2 would now have been lost forever.

Comment 10 Rabi Mishra 2019-10-14 13:48:40 UTC
> I really can't accept the answer "This is expected behaviour" when this feature has the potential to wreak havoc in customer environments.

Well, I don't know what the expectation is here. 'RemovalPolicies/RemovalPoliciesMode' won't work together with node delete and will surely lead to issues. Also, 'node delete' is really buggy and we never suggest it to customers (it should probably be deprecated and removed).

I would suggest that customers explicitly set blacklists and node counts, rather than using node delete.
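
As a rough illustration of what setting blacklists and node counts explicitly could look like, the sketch below builds the parameters for a scale-down. It is an assumption-laden example: the `<Role>Count` parameter name follows the usual TripleO role naming but does not appear in this report, the helper `scale_down_environment` is hypothetical, and the output is printed as JSON, which would still need to be saved as an environment file and passed to the deploy command.

```python
import json


def scale_down_environment(role, current_count, indexes_to_remove, mode="append"):
    """Build parameter_defaults for removing specific indexes of one role."""
    params = {
        # Assumed parameter name, following the <Role>Count convention.
        "%sCount" % role: current_count - len(indexes_to_remove),
        # Same structure as shown by 'openstack stack show' in comment 6.
        "%sRemovalPolicies" % role: [{"resource_list": list(indexes_to_remove)}],
        # Set the mode explicitly so a stale 'update' value is not kept.
        "%sRemovalPoliciesMode" % role: mode,
    }
    return {"parameter_defaults": params}


env = scale_down_environment("ComputeOvsDpdk", 2, ["1"])
print(json.dumps(env, indent=2, sort_keys=True))
```

The mode is written out on every call because, with PATCH updates, a value that is simply omitted keeps whatever was stored before.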

Comment 11 Rabi Mishra 2019-10-14 14:00:58 UTC
We probably should update the documentation and kb articles.

Comment 12 Andreas Karis 2019-10-14 14:10:34 UTC
Hi Rabi, 

The official documentation for OSP 13 has:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html-single/director_installation_and_usage/index#sect-Removing_Compute_Nodes
~~~
(undercloud) $ openstack overcloud node delete --stack [STACK_UUID] [NODE1_UUID] [NODE2_UUID] [NODE3_UUID]
~~~

The official documentation for hyperconverged has:
https://access.redhat.com/documentation/en-us/red_hat_hyperconverged_infrastructure_for_cloud/13/html/operations_guide/removing-a-node-from-the-overcloud#removing-the-nova-compute-services-from-the-overcloud-rhhi
~~~
 Delete the compute node by UUID from the overcloud:

openstack overcloud node delete --stack OSP_NAME NOVA_UUID
~~~


If that's buggy, then we have a huge issue in our documentation. 

What's the correct way for deleting nodes? And what's the correct way for deleting nodes and preserving their index, so that customers can reuse it? How should I modify https://access.redhat.com/solutions/4232971 so that it isn't risky?

Comment 13 Andreas Karis 2019-10-14 14:12:48 UTC
FYI, I ran a scale-out and set the policy to "append". Now, I'm deleting a node, to reproduce (or not) the initial issue:
~~~
(undercloud) [stack@undercloud-0 ~]$ nova list
ironic node-list
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | computeovsdpdk-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
| 11d337e6-cf3b-420b-a3e8-76c355d13742 | computeovsdpdk-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
(undercloud) [stack@undercloud-0 ~]$ ironic node-list
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | 11d337e6-cf3b-420b-a3e8-76c355d13742 | power on    | active             | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | power on    | active             | False       |
| 47fb7e86-1f3d-4e0d-b573-0be1418f9a0d | controller-0 | 996f98c0-8ee9-4e66-9273-35651dde1e84 | power on    | active             | False       |
| 320769fe-fb7c-4f3e-af1a-67a00442a96c | controller-1 | 4141033e-83de-489d-8856-9e068324fe5c | power on    | active             | False       |
| 0526bb87-2317-4eac-9f68-89d6d4b57aab | controller-2 | a45ec4f4-4798-4538-9ea0-9eea88762cd2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
(undercloud) [stack@undercloud-0 ~]$ heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+--------------------------------------+------------+-----------------+----------------------+----------------------+----------------------------------+
| id                                   | stack_name | stack_status    | creation_time        | updated_time         | project                          |
+--------------------------------------+------------+-----------------+----------------------+----------------------+----------------------------------+
| f2e1165f-a768-48e7-b6bb-24ac13e11dfb | overcloud  | UPDATE_COMPLETE | 2019-10-08T14:52:05Z | 2019-10-14T13:01:55Z | 1f9155cb6d314ff4b0933d42afd01204 |
+--------------------------------------+------------+-----------------+----------------------+----------------------+----------------------------------+
(undercloud) [stack@undercloud-0 ~]$ openstack stack show overcloud | grep RemovalPolicies | cut -b1-200
|                       | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["0"]}]'
|                       | ComputeOvsDpdkRemovalPoliciesMode: append
|                       | ControllerRemovalPolicies: '[]'
|                       | ControllerRemovalPoliciesMode: append
(undercloud) [stack@undercloud-0 ~]$  openstack overcloud node delete --stack overcloud 46c3670a-f86a-4b10-a2ef-4ffc018ee565
Deleting the following nodes from stack overcloud:
- 46c3670a-f86a-4b10-a2ef-4ffc018ee565
Started Mistral Workflow tripleo.scale.v1.delete_node. Execution ID: 5a197c80-249c-43d6-9a83-935cd313dcc2
Waiting for messages on queue 'tripleo' with no timeout.

~~~
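
To see which bare-metal node backs which overcloud server before running a delete, the two listings above can be joined on the instance UUID. The following is only a sketch: it parses rows pasted from the `nova list` and `ironic node-list` output in this comment, and the helper names `parse_rows` and `map_servers_to_nodes` are made up for illustration.

```python
# Data rows copied from the `nova list` and `ironic node-list` output above
# (compute servers only; header and border lines are left out).
NOVA_ROWS = """
| 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | computeovsdpdk-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
| 11d337e6-cf3b-420b-a3e8-76c355d13742 | computeovsdpdk-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
"""
IRONIC_ROWS = """
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | 11d337e6-cf3b-420b-a3e8-76c355d13742 | power on    | active             | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | power on    | active             | False       |
"""


def parse_rows(text):
    """Split ASCII-table data rows into lists of stripped cell values."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines ("+---+") and blank lines
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    return rows


def map_servers_to_nodes(nova_text, ironic_text):
    """Return {server name: ironic node name}, joined on the instance UUID."""
    # ironic columns: UUID, Name, Instance UUID, ...
    node_by_instance = {row[2]: row[1] for row in parse_rows(ironic_text)}
    # nova columns: ID, Name, Status, ...
    return {row[1]: node_by_instance.get(row[0]) for row in parse_rows(nova_text)}


mapping = map_servers_to_nodes(NOVA_ROWS, IRONIC_ROWS)
for server, node in sorted(mapping.items()):
    print(f"{server} -> {node}")
```

With the rows above, computeovsdpdk-1 maps to compute-34 and computeovsdpdk-2 maps to compute-33.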

Comment 14 Andreas Karis 2019-10-14 14:24:33 UTC
That new test doesn't start out too well:
~~~
(undercloud) [stack@undercloud-0 ~]$  while true ; do nova list; ironic node-list  ; openstack stack show overcloud | grep RemovalPolicies | cut -b1-200; sleep 30 ; done | tee output.txt
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | computeovsdpdk-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
| 11d337e6-cf3b-420b-a3e8-76c355d13742 | computeovsdpdk-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | 11d337e6-cf3b-420b-a3e8-76c355d13742 | power on    | active             | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | power on    | active             | False       |
| 47fb7e86-1f3d-4e0d-b573-0be1418f9a0d | controller-0 | 996f98c0-8ee9-4e66-9273-35651dde1e84 | power on    | active             | False       |
| 320769fe-fb7c-4f3e-af1a-67a00442a96c | controller-1 | 4141033e-83de-489d-8856-9e068324fe5c | power on    | active             | False       |
| 0526bb87-2317-4eac-9f68-89d6d4b57aab | controller-2 | a45ec4f4-4798-4538-9ea0-9eea88762cd2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
|                       | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["0"]}]'                                                                                                                   
|                       | ComputeOvsDpdkRemovalPoliciesMode: append                                                                                                                                     
|                       | ControllerRemovalPolicies: '[]'                                                                                                                                               
|                       | ControllerRemovalPoliciesMode: append                                                                                                                                         
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | computeovsdpdk-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
| 11d337e6-cf3b-420b-a3e8-76c355d13742 | computeovsdpdk-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | 11d337e6-cf3b-420b-a3e8-76c355d13742 | power on    | active             | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | power on    | active             | False       |
| 47fb7e86-1f3d-4e0d-b573-0be1418f9a0d | controller-0 | 996f98c0-8ee9-4e66-9273-35651dde1e84 | power on    | active             | False       |
| 320769fe-fb7c-4f3e-af1a-67a00442a96c | controller-1 | 4141033e-83de-489d-8856-9e068324fe5c | power on    | active             | False       |
| 0526bb87-2317-4eac-9f68-89d6d4b57aab | controller-2 | a45ec4f4-4798-4538-9ea0-9eea88762cd2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
|                       | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["0"]}]'                                                                                                                   
|                       | ComputeOvsDpdkRemovalPoliciesMode: append                                                                                                                                     
|                       | ControllerRemovalPolicies: '[]'                                                                                                                                               
|                       | ControllerRemovalPoliciesMode: append                                                                                                                                         
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | computeovsdpdk-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
| 11d337e6-cf3b-420b-a3e8-76c355d13742 | computeovsdpdk-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | 11d337e6-cf3b-420b-a3e8-76c355d13742 | power on    | active             | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | power on    | active             | False       |
| 47fb7e86-1f3d-4e0d-b573-0be1418f9a0d | controller-0 | 996f98c0-8ee9-4e66-9273-35651dde1e84 | power on    | active             | False       |
| 320769fe-fb7c-4f3e-af1a-67a00442a96c | controller-1 | 4141033e-83de-489d-8856-9e068324fe5c | power on    | active             | False       |
| 0526bb87-2317-4eac-9f68-89d6d4b57aab | controller-2 | a45ec4f4-4798-4538-9ea0-9eea88762cd2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
|                       | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["1"]}]'                                                                                                                   
|                       | ComputeOvsDpdkRemovalPoliciesMode: append                                                                                                                                     
|                       | ControllerRemovalPolicies: '[]'                                                                                                                                               
|                       | ControllerRemovalPoliciesMode: append                                                                                                                                         
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | computeovsdpdk-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
| 11d337e6-cf3b-420b-a3e8-76c355d13742 | computeovsdpdk-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | 11d337e6-cf3b-420b-a3e8-76c355d13742 | power on    | active             | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | power on    | active             | False       |
| 47fb7e86-1f3d-4e0d-b573-0be1418f9a0d | controller-0 | 996f98c0-8ee9-4e66-9273-35651dde1e84 | power on    | active             | False       |
| 320769fe-fb7c-4f3e-af1a-67a00442a96c | controller-1 | 4141033e-83de-489d-8856-9e068324fe5c | power on    | active             | False       |
| 0526bb87-2317-4eac-9f68-89d6d4b57aab | controller-2 | a45ec4f4-4798-4538-9ea0-9eea88762cd2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
~~~

Even though I flipped PoliciesMode back to "append" on the last scale-out, the resource list gets updated from ['0'] to ['1']. Shouldn't it append to ['0','1'] ???

Comment 15 Rabi Mishra 2019-10-14 14:39:20 UTC
> What's the correct way for deleting nodes?

- Update (append indexes to) *RemovalPolicies to delete nodes (the user has to identify the index for a nova instance).
- Toggle *RemovalPoliciesMode to reset the blacklist, as only updating *RemovalPolicies won't do it.
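
As a rough illustration of the first step, an environment file for one role could be generated as below. This is a sketch under assumptions: the `<Role>RemovalPolicies` and `<Role>RemovalPoliciesMode` names appear in this bug, while the `<Role>Count` parameter name, the count value and the helper `removal_environment` are assumptions made for the example. The output is JSON, which YAML parsers generally accept.

```python
import json


def removal_environment(role, indexes, count, mode="append"):
    """Build a parameter_defaults mapping for one role.

    role    -- role name used as the parameter prefix, e.g. "ComputeOvsDpdk"
    indexes -- resource indexes to blacklist, e.g. [1]
    count   -- desired node count for the role after the removal (assumed
               to be passed through a <Role>Count parameter)
    mode    -- value for <Role>RemovalPoliciesMode
    """
    return {
        "parameter_defaults": {
            f"{role}Count": count,
            # Heat expects the indexes as strings inside resource_list.
            f"{role}RemovalPolicies": [
                {"resource_list": [str(i) for i in indexes]}
            ],
            f"{role}RemovalPoliciesMode": mode,
        }
    }


# Hypothetical values: blacklist index 1 and shrink the role to one node.
env = removal_environment("ComputeOvsDpdk", indexes=[1], count=1)
print(json.dumps(env, indent=2))
```

The resulting file would be passed with `-e` on the usual deploy command, in the same way as removal-policies.yaml in the description.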

> If that's buggy, then we have a huge issue in our documentation. 

Both *RemovalPolicies and 'node delete' are used to blacklist nodes, and using both in 'non-append' mode can overwrite stuff and lead to issues. As far as 'node delete' is concerned, it definitely has issues in corner cases. But it has been used by users to date.


> Even though I flipped PoliciesMode back to "append" on the last scale-out, the resource list gets updated from ['0'] to ['1']. Shouldn't it append to ['0','1'] ???

Parameters won't change; the blacklist in the db would change with 'node delete'.
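
A toy model of the two modes may make the difference easier to follow. It is based on how Heat documents `removal_policies_mode` for resource groups ("append" adds to the internal list, "update" replaces it) and is not taken from the tripleo-common code; the function name `apply_removal_policy` is made up. On one reading of the reply above, the parameter shown by `openstack stack show` is only the last value passed in, while the accumulated blacklist lives in the database.

```python
def apply_removal_policy(blacklist, resource_list, mode):
    """Return the new internal blacklist after one stack update.

    "append" merges the passed indexes into the existing blacklist;
    "update" replaces the blacklist with the passed indexes.
    """
    if mode == "append":
        merged = list(blacklist)
        for index in resource_list:
            if index not in merged:
                merged.append(index)
        return merged
    if mode == "update":
        return list(resource_list)
    raise ValueError(f"unknown mode: {mode}")


# Index "0" is already blacklisted; the next update passes only ["1"].
blacklist = ["0"]
after_append = apply_removal_policy(blacklist, ["1"], "append")
after_update = apply_removal_policy(blacklist, ["1"], "update")
print(after_append)  # ['0', '1']
print(after_update)  # ['1']
```

In this model the passed parameter reads ['1'] in both cases, but only "append" keeps index 0 blacklisted.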

Comment 16 Andreas Karis 2019-10-14 14:44:02 UTC
That looks better indeed with append:

+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | computeovsdpdk-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
| 11d337e6-cf3b-420b-a3e8-76c355d13742 | computeovsdpdk-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | 11d337e6-cf3b-420b-a3e8-76c355d13742 | power on    | active             | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | power on    | active             | False       |
| 47fb7e86-1f3d-4e0d-b573-0be1418f9a0d | controller-0 | 996f98c0-8ee9-4e66-9273-35651dde1e84 | power on    | active             | False       |
| 320769fe-fb7c-4f3e-af1a-67a00442a96c | controller-1 | 4141033e-83de-489d-8856-9e068324fe5c | power on    | active             | False       |
| 0526bb87-2317-4eac-9f68-89d6d4b57aab | controller-2 | a45ec4f4-4798-4538-9ea0-9eea88762cd2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
|                       | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["1"]}]'                                                                                                                   
|                       | ComputeOvsDpdkRemovalPoliciesMode: append                                                                                                                                     
|                       | ControllerRemovalPolicies: '[]'                                                                                                                                               
|                       | ControllerRemovalPoliciesMode: append                                                                                                                                         
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | computeovsdpdk-1 | ACTIVE | deleting   | Running     |                        |
| 11d337e6-cf3b-420b-a3e8-76c355d13742 | computeovsdpdk-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | 11d337e6-cf3b-420b-a3e8-76c355d13742 | power on    | active             | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | power on    | deleting           | False       |
| 47fb7e86-1f3d-4e0d-b573-0be1418f9a0d | controller-0 | 996f98c0-8ee9-4e66-9273-35651dde1e84 | power on    | active             | False       |
| 320769fe-fb7c-4f3e-af1a-67a00442a96c | controller-1 | 4141033e-83de-489d-8856-9e068324fe5c | power on    | active             | False       |
| 0526bb87-2317-4eac-9f68-89d6d4b57aab | controller-2 | a45ec4f4-4798-4538-9ea0-9eea88762cd2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
|                       | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["1"]}]'                                                                                                                   
|                       | ComputeOvsDpdkRemovalPoliciesMode: append                                                                                                                                     
|                       | ControllerRemovalPolicies: '[]'                                                                                                                                               
|                       | ControllerRemovalPoliciesMode: append                                                                                                 

|                       | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["1"]}]'                                                                                                                   
|                       | ComputeOvsDpdkRemovalPoliciesMode: append                                                                                                                                     
|                       | ControllerRemovalPolicies: '[]'                                                                                                                                               
|                       | ControllerRemovalPoliciesMode: append                                                                                                                                         
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 11d337e6-cf3b-420b-a3e8-76c355d13742 | computeovsdpdk-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | 11d337e6-cf3b-420b-a3e8-76c355d13742 | power on    | active             | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | None                                 | power off   | available          | False       |
| 47fb7e86-1f3d-4e0d-b573-0be1418f9a0d | controller-0 | 996f98c0-8ee9-4e66-9273-35651dde1e84 | power on    | active             | False       |
| 320769fe-fb7c-4f3e-af1a-67a00442a96c | controller-1 | 4141033e-83de-489d-8856-9e068324fe5c | power on    | active             | False       |
| 0526bb87-2317-4eac-9f68-89d6d4b57aab | controller-2 | a45ec4f4-4798-4538-9ea0-9eea88762cd2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
|                       | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["1"]}]'                                                                                                                   
|                       | ComputeOvsDpdkRemovalPoliciesMode: append                                                                                                                                     
|                       | ControllerRemovalPolicies: '[]'                                                                                                                                               
|                       | ControllerRemovalPoliciesMode: append                                                                                                                                         
^C
(undercloud) [stack@undercloud-0 ~]$ heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+--------------------------------------+------------+-----------------+----------------------+----------------------+----------------------------------+
| id                                   | stack_name | stack_status    | creation_time        | updated_time         | project                          |
+--------------------------------------+------------+-----------------+----------------------+----------------------+----------------------------------+
| f2e1165f-a768-48e7-b6bb-24ac13e11dfb | overcloud  | UPDATE_COMPLETE | 2019-10-08T14:52:05Z | 2019-10-14T14:14:02Z | 1f9155cb6d314ff4b0933d42afd01204 |
+--------------------------------------+------------+-----------------+----------------------+----------------------+----------------------------------+
(undercloud) [stack@undercloud-0 ~]$
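
A check along these lines could confirm that a delete removed only the requested server. It is a sketch that compares the server IDs from the first and last `nova list` snapshots in this comment; the helper name `unexpected_removals` is made up.

```python
def unexpected_removals(before, after, requested):
    """Return IDs of servers that disappeared without being requested."""
    removed = set(before) - set(after)
    return sorted(removed - set(requested))


# Server ID -> name, from the first `nova list` snapshot in this comment.
before = {
    "46c3670a-f86a-4b10-a2ef-4ffc018ee565": "computeovsdpdk-1",
    "11d337e6-cf3b-420b-a3e8-76c355d13742": "computeovsdpdk-2",
    "4141033e-83de-489d-8856-9e068324fe5c": "controller-0",
    "996f98c0-8ee9-4e66-9273-35651dde1e84": "controller-1",
    "a45ec4f4-4798-4538-9ea0-9eea88762cd2": "controller-2",
}
# Same listing from the last snapshot, after the delete finished.
after = {
    "11d337e6-cf3b-420b-a3e8-76c355d13742": "computeovsdpdk-2",
    "4141033e-83de-489d-8856-9e068324fe5c": "controller-0",
    "996f98c0-8ee9-4e66-9273-35651dde1e84": "controller-1",
    "a45ec4f4-4798-4538-9ea0-9eea88762cd2": "controller-2",
}
# The ID passed to `openstack overcloud node delete` in comment 13.
requested = ["46c3670a-f86a-4b10-a2ef-4ffc018ee565"]

extra = unexpected_removals(before, after, requested)
print(extra or "only the requested node was removed")
```

Note that this compares server IDs only; a server that was rebuilt under a new ID would show up as an unexpected removal, which is the case this bug is concerned with.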

Comment 17 Andreas Karis 2019-10-17 14:29:36 UTC
Hi,

I really don't understand how the RemovalPolicies work.

In the following example, all nodes are from the same role. Just hostname mapping is different for the first 3 computes ...

I had compute-0, compute-1, compute-2. Then, I deleted compute-1:
~~~
+--------------------------------------+-----------------------------------+--------+------------+-------------+------------------------+
| ID                                   | Name                              | Status | Task State | Power State | Networks               |
+--------------------------------------+-----------------------------------+--------+------------+-------------+------------------------+
| 5fe2160a-0665-445e-88ce-d73d50e10042 | 0123456789-overcloud-compute-0    | ACTIVE | -          | Running     | ctlplane=192.168.24.35 |
| 278e5a1b-a785-4a49-bd03-dbd5d2b7ee0e | 0123456789-overcloud-compute-2    | ACTIVE | -          | Running     | ctlplane=192.168.24.19 |
| 2d47dd59-fc58-48f1-9ba8-0f990974f6ac | 0123456789-overcloud-controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| e6603878-6909-4dce-8525-cdee1da02a63 | 0123456789-overcloud-controller-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| f9774cb6-837a-44ca-8a66-4de0e5650c3f | 0123456789-overcloud-controller-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.10 |
+--------------------------------------+-----------------------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| fe4b5498-26e6-4893-b26f-5dd5ca2f6881 | compute-30   | 278e5a1b-a785-4a49-bd03-dbd5d2b7ee0e | power on    | active             | False       |
| 690a7161-c893-49b8-b09e-73bc177a1efc | compute-31   | None                                 | power off   | available          | False       |
| 47059e88-1b4d-47e2-bdcf-a2641b7e5303 | compute-32   | 5fe2160a-0665-445e-88ce-d73d50e10042 | power on    | active             | False       |
| 80afb6f5-a855-4117-ab77-607f4a429ebf | controller-0 | 2d47dd59-fc58-48f1-9ba8-0f990974f6ac | power on    | active             | False       |
| af0c1a1f-ffba-4080-b8ae-049d3d2197d7 | controller-1 | e6603878-6909-4dce-8525-cdee1da02a63 | power on    | active             | False       |
| 23b0b128-8085-4cb0-b473-35be4cb3778f | controller-2 | f9774cb6-837a-44ca-8a66-4de0e5650c3f | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
|                       | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["1"]}]'                                                                                                                   
|                       | ComputeOvsDpdkRemovalPoliciesMode: append                                                                                                                                     
|                       | ControllerRemovalPolicies: '[]'                                                                                                                                               
|                       | ControllerRemovalPoliciesMode: append 
~~~

So the Policies contains resource_list 1, that makes sense.

Then, I deleted -0 and the removal policies went to '0' ...
~~~
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| fe4b5498-26e6-4893-b26f-5dd5ca2f6881 | compute-30   | 278e5a1b-a785-4a49-bd03-dbd5d2b7ee0e | power on    | active             | False       |
| 690a7161-c893-49b8-b09e-73bc177a1efc | compute-31   | None                                 | power off   | available          | False       |
| 47059e88-1b4d-47e2-bdcf-a2641b7e5303 | compute-32   | None                                 | power off   | available          | False       |
| 80afb6f5-a855-4117-ab77-607f4a429ebf | controller-0 | 2d47dd59-fc58-48f1-9ba8-0f990974f6ac | power on    | active             | False       |
| af0c1a1f-ffba-4080-b8ae-049d3d2197d7 | controller-1 | e6603878-6909-4dce-8525-cdee1da02a63 | power on    | active             | False       |
| 23b0b128-8085-4cb0-b473-35be4cb3778f | controller-2 | f9774cb6-837a-44ca-8a66-4de0e5650c3f | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
|                       | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["0"]}]'                                                                                                                   
|                       | ComputeOvsDpdkRemovalPoliciesMode: append                                                                                                                                     
|                       | ControllerRemovalPolicies: '[]'                                                                                                                                               
|                       | ControllerRemovalPoliciesMode: append                                                                                                                                         
+--------------------------------------+-----------------------------------+--------+------------+-------------+------------------------+
| ID                                   | Name                              | Status | Task State | Power State | Networks               |
+--------------------------------------+-----------------------------------+--------+------------+-------------+------------------------+
| 278e5a1b-a785-4a49-bd03-dbd5d2b7ee0e | 0123456789-overcloud-compute-2    | ACTIVE | -          | Running     | ctlplane=192.168.24.19 |
| 2d47dd59-fc58-48f1-9ba8-0f990974f6ac | 0123456789-overcloud-controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| e6603878-6909-4dce-8525-cdee1da02a63 | 0123456789-overcloud-controller-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| f9774cb6-837a-44ca-8a66-4de0e5650c3f | 0123456789-overcloud-controller-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.10 |
+--------------------------------------+-----------------------------------+--------+------------+-------------+------------------------+
~~~

Then, I scaled out and it remained on '0' ... Why didn't it deploy node -1, then ?!
~~~
+--------------------------------------+-----------------------------------+--------+------------+-------------+------------------------+
| 278e5a1b-a785-4a49-bd03-dbd5d2b7ee0e | 0123456789-overcloud-compute-2    | ACTIVE | -          | Running     | ctlplane=192.168.24.19 |
| 2d47dd59-fc58-48f1-9ba8-0f990974f6ac | 0123456789-overcloud-controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| e6603878-6909-4dce-8525-cdee1da02a63 | 0123456789-overcloud-controller-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| f9774cb6-837a-44ca-8a66-4de0e5650c3f | 0123456789-overcloud-controller-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.10 |
| cd1fe4d7-462c-4903-8602-585d3bbfe0b5 | computeovsdpdk-3                  | ACTIVE | -          | Running     | ctlplane=192.168.24.14 |
| 2ec6da5a-4221-4d5f-9d5f-459a5bfa15ed | computeovsdpdk-4                  | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
+--------------------------------------+-----------------------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| fe4b5498-26e6-4893-b26f-5dd5ca2f6881 | compute-30   | 278e5a1b-a785-4a49-bd03-dbd5d2b7ee0e | power on    | active             | False       |
| 690a7161-c893-49b8-b09e-73bc177a1efc | compute-31   | cd1fe4d7-462c-4903-8602-585d3bbfe0b5 | power on    | active             | False       |
| 47059e88-1b4d-47e2-bdcf-a2641b7e5303 | compute-32   | 2ec6da5a-4221-4d5f-9d5f-459a5bfa15ed | power on    | active             | False       |
| 80afb6f5-a855-4117-ab77-607f4a429ebf | controller-0 | 2d47dd59-fc58-48f1-9ba8-0f990974f6ac | power on    | active             | False       |
| af0c1a1f-ffba-4080-b8ae-049d3d2197d7 | controller-1 | e6603878-6909-4dce-8525-cdee1da02a63 | power on    | active             | False       |
| 23b0b128-8085-4cb0-b473-35be4cb3778f | controller-2 | f9774cb6-837a-44ca-8a66-4de0e5650c3f | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
|                       | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["0"]}]'                                                                                                                   
|                       | ComputeOvsDpdkRemovalPoliciesMode: append                                                                                                                                     
|                       | ControllerRemovalPolicies: '[]'                                                                                                                                               
|                       | ControllerRemovalPoliciesMode: append   
~~~

Then, I deleted compute-3 and [{"resource_list": ["3"]}] ...
~~~
+--------------------------------------+-----------------------------------+--------+------------+-------------+------------------------+
| ID                                   | Name                              | Status | Task State | Power State | Networks               |
+--------------------------------------+-----------------------------------+--------+------------+-------------+------------------------+
| 278e5a1b-a785-4a49-bd03-dbd5d2b7ee0e | 0123456789-overcloud-compute-2    | ACTIVE | -          | Running     | ctlplane=192.168.24.19 |
| 2d47dd59-fc58-48f1-9ba8-0f990974f6ac | 0123456789-overcloud-controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| e6603878-6909-4dce-8525-cdee1da02a63 | 0123456789-overcloud-controller-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| f9774cb6-837a-44ca-8a66-4de0e5650c3f | 0123456789-overcloud-controller-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.10 |
| 2ec6da5a-4221-4d5f-9d5f-459a5bfa15ed | computeovsdpdk-4                  | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
+--------------------------------------+-----------------------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| fe4b5498-26e6-4893-b26f-5dd5ca2f6881 | compute-30   | 278e5a1b-a785-4a49-bd03-dbd5d2b7ee0e | power on    | active             | False       |
| 690a7161-c893-49b8-b09e-73bc177a1efc | compute-31   | None                                 | power off   | available          | False       |
| 47059e88-1b4d-47e2-bdcf-a2641b7e5303 | compute-32   | 2ec6da5a-4221-4d5f-9d5f-459a5bfa15ed | power on    | active             | False       |
| 80afb6f5-a855-4117-ab77-607f4a429ebf | controller-0 | 2d47dd59-fc58-48f1-9ba8-0f990974f6ac | power on    | active             | False       |
| af0c1a1f-ffba-4080-b8ae-049d3d2197d7 | controller-1 | e6603878-6909-4dce-8525-cdee1da02a63 | power on    | active             | False       |
| 23b0b128-8085-4cb0-b473-35be4cb3778f | controller-2 | f9774cb6-837a-44ca-8a66-4de0e5650c3f | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
|                       | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["3"]}]'                                                                                                                   
|                       | ComputeOvsDpdkRemovalPoliciesMode: append                                                                                                                                     
|                       | ControllerRemovalPolicies: '[]'                                                                                                                                               
|                       | ControllerRemovalPoliciesMode: append 
~~~

Question 1:
How do I deploy ID -0 again, with the above setup?

Question 2:
How do I deploy ID -1 again, with the above setup?

Question 3:
How do I deploy ID -3 again, with the above setup?

Thanks,

Andreas

Comment 18 James Slagle 2019-10-17 15:20:13 UTC
Based on what you've described, I don't see any new issue or unexpected change in behavior. However, we'd need to see the commands you actually ran and the templates used to see if there's an actual issue here.

As you describe it, I don't directly see a problem. Heat won't reuse previously deleted indexes, which is what I think you're asking for. You can certainly reuse a previously used hostname, though, by mapping the new index to the hostname you want. For instance, if you wanted to reuse the compute-1 hostname, but the next index is actually "-5", then HostnameMap would need something like:

overcloud-compute-5: compute-1

Likewise for the FixedIPs list, if you're using those: the IPs for the new compute-1 would need to be in the list position for overcloud-compute-5.

The switching of the hostname format in the middle of the scale up/down seems to be unnecessarily complicating things, but I don't think that's the direct cause of any issue.

Comment 19 Andreas Karis 2019-10-17 15:45:50 UTC
Hi James,

This bug report is about using RemovalPolicies. This BZ has a full database backup + templates attached. This is about documenting https://access.redhat.com/solutions/4232971 in a way so that our customers can make use of this feature. However, I found out during my tests that setting the RemovalPoliciesMode to 'update', then reusing an index, and then forgetting to set it back to 'append' will cause a node replacement **of the wrong node** when running openstack overcloud node delete again. That's described in detail, with full templates and the database backup, in my posts up to comment 9.

Rabi then said that I need to flip back to 'append' from 'update', otherwise openstack overcloud node delete will cause issues, e.g. replacing the wrong node. He said that everything worked as designed. Comments 10, 11, 15.

I then showed that flipping it back to append indeed does not replace the wrong node: Comments 13,14,16

I then asked Rabi: if node delete is indeed buggy, what is the correct way to *not* use node delete, and how should we update our documentation, which recommends using node delete?

Also, I'm still looking for how to reuse node indexes. This entire BZ is about finding a procedure to use RemovalPolicies so that users can reuse the internal indexes: https://access.redhat.com/solutions/4232971 
I created a more complex scenario and want to use this feature in a safe way. The concrete question is in comment 17.

I also wonder:
* if overcloud node delete is buggy and we should use another command, why is this not documented and the first time I hear about this
* we explicitly exposed RemovalPolicies via environment files in tripleo so that users can reuse indexes. Again, https://access.redhat.com/solutions/4232971 --- but the feature seems to be very dangerous to me (see above). So I need to find a way to document this, so that our users do not shoot themselves in the foot. I also would like to find some more complex scenarios for how to use this feature. E.g., in a scenario where indexes 0,1 and 3 are removed, how can users reuse index 1. I'm talking about internal indexes. 

I'm well aware of the workarounds with FixedIPs and HostnameMaps. However, these are insufficient and not satisfying for a very large number of our customers. Note that this ticket is not about using HostnameMaps. That's just unfortunate in my last environment as I reused a different lab environment which had it. The task here is: how to reuse internal node indexes safely with RemovalPolicies.

Comment 20 Andreas Karis 2019-10-17 15:48:34 UTC
Just realized that the overcloud_deploy.sh may be missing. It's:

openstack overcloud deploy --templates \
-r ${template_base_dir}/roles_data.yaml \
-e ${template_base_dir}/overcloud_images.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/host-config-and-reboot.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/services-docker/neutron-ovs-dpdk.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ovs-dpdk-permissions.yaml \
-e ${template_base_dir}/network-environment.yaml \
-e ${template_base_dir}/node-count.yaml \
-e ${template_base_dir}/dpdk-conf.yaml \
-e ${template_base_dir}/custom-hostnames.yaml \
--log-file /home/stack/overcloud_install.log

#  +/-  -e ${template_base_dir}/removal-policies.yaml \

Comment 21 James Slagle 2019-10-17 16:20:52 UTC
Thanks for the background information, as it was previously not clear to me based on the summary of the bug. Sounds like you're trying to validate the procedure from https://access.redhat.com/solutions/4232971 and, while doing so, found an issue with node deletion.

I'll re-assign the bug so it can be prioritized correctly.

Comment 23 Rabi Mishra 2019-10-18 05:44:45 UTC
I thought I clarified it in my earlier comments..

> Then, I deleted -0 and the removal policies went to '0'

I think you're making the mistake of looking only at the stack parameters. In 'append' mode, the effective blacklist is the union of the blacklist history (in the heat db) and what's in the *RemovalPolicies parameter.

blacklist=['0', '1'], count=1, nodes=['2']

> Then, I scaled out and it remained on '0' ... Why didn't it deploy node -1, then ?!

blacklist=['0', '1'], count=3, nodes=['2', '3', '4']

> Then, I deleted compute-3 and [{"resource_list": ["3"]}] ...

blacklist=['0', '1', '3'], count=2, nodes=['2', '4']
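
The bookkeeping above can be replayed with a small Python sketch. It is only a simplified model and not Heat code: it assumes that in 'append' mode every index passed in *RemovalPolicies is added to a persistent blacklist, and that the group then consists of the first `count` indexes that are not blacklisted. The names `members`, `steps`, `history` and `trace` are made up for this illustration, and indexes are plain integers rather than strings.

```python
from itertools import count, islice


def members(node_count, blacklist):
    """Return the first node_count indexes that are not blacklisted."""
    free = (i for i in count() if i not in blacklist)
    return list(islice(free, node_count))


# Each step: (resource_list shown in *RemovalPolicies, resulting node count).
steps = [
    ([1], 2),  # delete compute-1
    ([0], 1),  # delete compute-0
    ([0], 3),  # scale out; the parameter still shows '0'
    ([3], 2),  # delete compute-3
]

history = set()  # modeled blacklist history (assumption: kept across updates)
trace = []
for resource_list, node_count in steps:
    # 'append' mode: union of the history and the current parameter value.
    history |= set(resource_list)
    trace.append((sorted(history), members(node_count, history)))

for blacklist, nodes in trace:
    print(f"blacklist={blacklist} nodes={nodes}")
```

Under these assumptions the last three lines printed match the three states listed above: the parameter only ever shows the most recent entry, while the effective blacklist keeps growing.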

> How do I deploy ID -0 again, with the above setup?

Scaling out with the parameters below would flush the blacklist history and use only the entries you provide in the parameters.

ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["1", "3"]}]'                                                                                                                   
ComputeOvsDpdkRemovalPoliciesMode: update 
ComputeCount: 3

nodes=['0', '2', '4'] blacklist=["1", "3"]

> How do I deploy ID -1 again, with the above setup?

ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["0", "3"]}]'                                                                                                                   
ComputeOvsDpdkRemovalPoliciesMode: update 
ComputeCount: 3

nodes=['1', '2', '4'] blacklist=["0", "3"]

> How do I deploy ID -3 again, with the above setup?

ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["0", "1"]}]'                                                                                                                   
ComputeOvsDpdkRemovalPoliciesMode: update 
ComputeCount: 3

nodes=['2', '3', '4'] blacklist=["0", "1"]
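
As a rough cross-check of the three answers, here is a minimal Python sketch of the same rule. It is a model under stated assumptions, not the Heat implementation: 'update' mode is taken to mean that only the parameter value counts and any history is discarded, and the group is the first `count` non-blacklisted indexes. `effective_blacklist` and `members` are hypothetical helper names.

```python
from itertools import count, islice


def effective_blacklist(history, resource_list, mode):
    """Model of the blacklist that applies for one stack update."""
    requested = set(resource_list)
    if mode == "append":
        return set(history) | requested
    if mode == "update":
        return requested  # history is flushed
    raise ValueError(f"unknown mode: {mode!r}")


def members(node_count, blacklist):
    """Return the first node_count indexes that are not blacklisted."""
    free = (i for i in count() if i not in blacklist)
    return list(islice(free, node_count))


history = {0, 1, 3}  # state after the deletions discussed above

# One entry per question: the resource_list to pass with mode 'update'.
answers = {}
for wanted, resource_list in ((0, [1, 3]), (1, [0, 3]), (3, [0, 1])):
    blacklist = effective_blacklist(history, resource_list, "update")
    answers[wanted] = members(3, blacklist)

print(answers)
```

With count 3, the sketch yields indexes 0/2/4, 1/2/4 and 2/3/4 for the three parameter sets, in line with the node lists given above.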


Unless you increase the ComputeCount, when using 'update' mode, it would remove higher indexes to achieve the count. 'node delete' decreases the count along with changing the blacklist, so with 'update' mode you can end up with unsatisfactory results. Let me explain that...

Suppose you've count=3, nodes=['0', '1', '2'], blacklist=[], policy=append

a. node delete  ['0']

count=2, nodes=['1', '2'], blacklist=['0'], policy=append, *RemovalPolicies: [{"resource_list": ["0"]}]

b. node delete ['1']

count=1, nodes=['2'], blacklist=['0', '1'], policy=append, *RemovalPolicies: [{"resource_list": ["1"]}]

Note: node delete resets the *RemovalPolicies parameter and it works fine as long as you use 'append' mode as the blacklist history is in heat database.

c. overcloud update with policy mode change to 'update' and no increase in *Count

count=1, nodes=['0'], blacklist=['1'], policy=update, *RemovalPolicies: [{"resource_list": ["1"]}]

Note: Because it's a PATCH update, it would use the *RemovalPolicies parameter set during the last node delete. As you're asking to flush the blacklist history and have count=1, '2' would be deleted and '0' created. But the user probably would not have expected that.

We should explicitly mention that when using 'update' (flush all history of blacklisting) mode you should reset  *RemovalPolicies and increase the *Count as required to not see any undesired behaviour, as it will always flush the blacklist history in the heat database.

d. overcloud update with *Count: 3

count=3, nodes=['0', '2', '3'], blacklist=['1'], policy=update, *RemovalPolicies: [{"resource_list": ["1"]}]

e. node delete ['3']

count=2, nodes=['0', '1'], blacklist=['3'], policy=update, *RemovalPolicies: [{"resource_list": ["3"]}]

Both '2' and '3' would be deleted and '1' would be created, and this is logical as you're only asking for 2 nodes and to flush the blacklist history.

That's the reason I mentioned earlier not to use 'node delete' when you've set the removal mode to 'update'. It would be good to keep track of all blacklists in the template itself by appending to *RemovalPolicies and to use simple stack updates.

Comment 24 Rabi Mishra 2019-10-18 06:55:39 UTC
> if overcloud node delete is buggy and we should use another command, why is this not documented and the first time I hear about this

Well, overcloud node delete resets the *Count and *RemovalPolicies parameters, which would then be different from what's in the template and can result in issues. We've been working around that to date.

Point (c) in the last comment is not correct as it would use node count=3 in the template, so it would result in what's mentioned in (d). 

Updated example I used above:

Suppose you've count=3, nodes=['0', '1', '2'], blacklist=[], policy=append

a. node delete ['0']

count=2, nodes=['1', '2'], blacklist=['0'], mode=append, *RemovalPolicies: [{"resource_list": ["0"]}]

b. node delete ['1']

count=1, nodes=['2'], blacklist=['0', '1'], mode=append, *RemovalPolicies: [{"resource_list": ["1"]}]

c. overcloud update with policy mode change to 'update' (It will use count=3 in template)

count=3, nodes=['0', '2', '3'], blacklist=['1'], mode=update, *RemovalPolicies: [{"resource_list": ["1"]}]

d. node delete ['3']

count=2, nodes=['0', '1'], blacklist=['3'], mode=update, *RemovalPolicies: [{"resource_list": ["3"]}]

Both '2' and '3' would be deleted and '1' would be created. This seems logical as you're only asking for 2 nodes and to flush the blacklist history.

However, we should force mode=append with 'node delete' to avoid it.

count=2, nodes=['0', '2'], blacklist=['1', '3'], mode=append, *RemovalPolicies: [{"resource_list": ["3"]}]
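
The difference between step (d) and the forced-append variant can be illustrated with a short Python sketch. This is a simplified model only, not the tripleo-common or Heat code: `node_delete` is a hypothetical function that decrements the count, resets the parameter to the deleted index, and optionally forces the mode to 'append', and group membership is assumed to be the first `count` non-blacklisted indexes.

```python
from itertools import count, islice


def members(node_count, blacklist):
    """Return the first node_count indexes that are not blacklisted."""
    free = (i for i in count() if i not in blacklist)
    return list(islice(free, node_count))


def node_delete(state, index, force_append):
    """Model of 'node delete' for a single index (assumed behaviour)."""
    mode = "append" if force_append else state["mode"]
    requested = {index}  # the parameter is reset to the deleted index
    if mode == "append":
        blacklist = state["blacklist"] | requested
    else:  # 'update': history is flushed
        blacklist = requested
    new_count = state["count"] - 1
    return {
        "mode": mode,
        "count": new_count,
        "blacklist": blacklist,
        "nodes": members(new_count, blacklist),
    }


# State after step (c): mode 'update', blacklist ['1'], nodes 0, 2, 3.
state = {"mode": "update", "count": 3, "blacklist": {1}, "nodes": [0, 2, 3]}

without_fix = node_delete(state, 3, force_append=False)
with_fix = node_delete(state, 3, force_append=True)

print("mode kept as update:", without_fix["nodes"])
print("mode forced to append:", with_fix["nodes"])
```

In this model, keeping 'update' ends with nodes 0 and 1 (node 2 is lost even though only 3 was deleted), while forcing 'append' ends with nodes 0 and 2.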


I've submitted a patch to force the mode to 'append' for 'node delete' https://review.opendev.org/#/c/689326/

Comment 25 Andreas Karis 2019-10-18 10:43:16 UTC
Rabi, 

I made your comments public, I hope that's o.k. - that way, my customer can follow up.

I'll work on the knowledge base article as well.

Thanks,

- Andreas

Comment 28 errata-xmlrpc 2020-03-10 11:22:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0760

