Bug 1761364
Summary: | Playing with removal policies can lead to deletion of wrong node ... | |
---|---|---|---
Product: | Red Hat OpenStack | Reporter: | Andreas Karis <akaris>
Component: | openstack-tripleo-common | Assignee: | Rabi Mishra <ramishra>
Status: | CLOSED ERRATA | QA Contact: | Arik Chernetsky <achernet>
Severity: | medium | Docs Contact: |
Priority: | medium | |
Version: | 13.0 (Queens) | CC: | jschluet, jslagle, mburns, ramishra, slinaber
Target Milestone: | --- | Keywords: | Reopened, Triaged, ZStream
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | openstack-tripleo-common-8.7.1-4.el7ost | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2020-03-10 11:22:02 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Description
Andreas Karis
2019-10-14 09:17:03 UTC
Created attachment 1625517: database dump 1
Created attachment 1625521: database dump b
Created attachment 1625573: dump 3/3
We use PATCH updates of the Heat stack in TripleO. Removing removal-policies.yaml does not reset 'ComputeOvsDpdkRemovalPoliciesMode'; you have to change it explicitly back to 'append' in a subsequent update.

Here is what happened. All along, the policy mode has been 'update':
1. Initial update after removing the file, with an empty blacklist.
2. Node delete of computeovsdpdk-0: the count is reduced to 2, the blacklist is reset (nothing to do) and ['0'] is added. So you get indexes '1' and '2'.
3. Node delete of computeovsdpdk-1: the count is reduced to 1, the blacklist is reset ('0' removed) and ['1'] is added. So you get index '0'.

Using node delete reduces the role count automatically, so it is better to set the 'ComputeOvsDpdkRemovalPolicies' parameter explicitly. Resetting 'ComputeOvsDpdkRemovalPoliciesMode' to 'append' (the default) would have left you with only index '2' in step 3. This is expected behaviour.

Hi,

I totally understand how PATCH updates work. I kept the setting on purpose, because that's what users will *likely* do. It's easy to forget to set it back when you simply remove an environment file or comment out the parameter.

However, RemovalPoliciesMode and RemovalPolicies are super dangerous. How can we tell users to run this in production if something like this happens so easily? What if they keep RemovalPoliciesMode set to "update", then run a node delete, possibly weeks later, and we just remove an existing node with important VMs on it? I really can't accept the answer "This is expected behaviour" when this feature has the potential to wreak havoc in customer environments.
- Andreas

I reran the same test again just to visualize it.

Baseline:
~~~
(undercloud) [stack@undercloud-0 ~]$ heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+--------------------------------------+------------+-----------------+----------------------+----------------------+----------------------------------+
| id                                   | stack_name | stack_status    | creation_time        | updated_time         | project                          |
+--------------------------------------+------------+-----------------+----------------------+----------------------+----------------------------------+
| f2e1165f-a768-48e7-b6bb-24ac13e11dfb | overcloud  | UPDATE_COMPLETE | 2019-10-08T14:52:05Z | 2019-10-14T09:46:35Z | 1f9155cb6d314ff4b0933d42afd01204 |
+--------------------------------------+------------+-----------------+----------------------+----------------------+----------------------------------+
(undercloud) [stack@undercloud-0 ~]$ openstack stack show overcloud | grep RemovalPolicies | cut -b1-200
| | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["1"]}]'
| | ComputeOvsDpdkRemovalPoliciesMode: update
| | ControllerRemovalPolicies: '[]'
| | ControllerRemovalPoliciesMode: append
(undercloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 383d0aeb-614c-41b5-8219-2a16c3f9923c | computeovsdpdk-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| 64112b16-c73c-41b6-b278-341ef6f3b908 | computeovsdpdk-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.18 |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
(undercloud) [stack@undercloud-0 ~]$ openstack overcloud node delete --stack overcloud 383d0aeb-614c-41b5-8219-2a16c3f9923c
Deleting the following nodes from stack overcloud:
- 383d0aeb-614c-41b5-8219-2a16c3f9923c
Started Mistral Workflow tripleo.scale.v1.delete_node. Execution ID: e8c4c624-fba5-490e-993b-e2e8988be0c0
(...)
~~~

So now the node count is internally set to 1, and I instructed TripleO to delete 383d0aeb-614c-41b5-8219-2a16c3f9923c (computeovsdpdk-0). Here's what happens during the update:
~~~
(undercloud) [stack@undercloud-0 ~]$ while true ; do nova list; ironic node-list ; openstack stack show overcloud | grep RemovalPolicies | cut -b1-200; sleep 30 ; done
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 383d0aeb-614c-41b5-8219-2a16c3f9923c | computeovsdpdk-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| 64112b16-c73c-41b6-b278-341ef6f3b908 | computeovsdpdk-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.18 |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | 383d0aeb-614c-41b5-8219-2a16c3f9923c | power on    | active             | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | 64112b16-c73c-41b6-b278-341ef6f3b908 | power on    | active             | False       |
| 47fb7e86-1f3d-4e0d-b573-0be1418f9a0d | controller-0 | 996f98c0-8ee9-4e66-9273-35651dde1e84 | power on    | active             | False       |
| 320769fe-fb7c-4f3e-af1a-67a00442a96c | controller-1 | 4141033e-83de-489d-8856-9e068324fe5c | power on    | active             | False       |
| 0526bb87-2317-4eac-9f68-89d6d4b57aab | controller-2 | a45ec4f4-4798-4538-9ea0-9eea88762cd2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["1"]}]'
| | ComputeOvsDpdkRemovalPoliciesMode: update
| | ControllerRemovalPolicies: '[]'
| | ControllerRemovalPoliciesMode: append
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 383d0aeb-614c-41b5-8219-2a16c3f9923c | computeovsdpdk-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| 64112b16-c73c-41b6-b278-341ef6f3b908 | computeovsdpdk-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.18 |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | 383d0aeb-614c-41b5-8219-2a16c3f9923c | power on    | active             | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | 64112b16-c73c-41b6-b278-341ef6f3b908 | power on    | active             | False       |
| 47fb7e86-1f3d-4e0d-b573-0be1418f9a0d | controller-0 | 996f98c0-8ee9-4e66-9273-35651dde1e84 | power on    | active             | False       |
| 320769fe-fb7c-4f3e-af1a-67a00442a96c | controller-1 | 4141033e-83de-489d-8856-9e068324fe5c | power on    | active             | False       |
| 0526bb87-2317-4eac-9f68-89d6d4b57aab | controller-2 | a45ec4f4-4798-4538-9ea0-9eea88762cd2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["0"]}]'
| | ComputeOvsDpdkRemovalPoliciesMode: update
| | ControllerRemovalPolicies: '[]'
| | ControllerRemovalPoliciesMode: append
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 383d0aeb-614c-41b5-8219-2a16c3f9923c | computeovsdpdk-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| 64112b16-c73c-41b6-b278-341ef6f3b908 | computeovsdpdk-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.18 |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | 383d0aeb-614c-41b5-8219-2a16c3f9923c | power on    | active             | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | 64112b16-c73c-41b6-b278-341ef6f3b908 | power on    | active             | False       |
| 47fb7e86-1f3d-4e0d-b573-0be1418f9a0d | controller-0 | 996f98c0-8ee9-4e66-9273-35651dde1e84 | power on    | active             | False       |
| 320769fe-fb7c-4f3e-af1a-67a00442a96c | controller-1 | 4141033e-83de-489d-8856-9e068324fe5c | power on    | active             | False       |
| 0526bb87-2317-4eac-9f68-89d6d4b57aab | controller-2 | a45ec4f4-4798-4538-9ea0-9eea88762cd2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
~~~

So we can first see that the RemovalPolicies are updated: ['1'] is replaced with ['0'].
The removal policies now contain index ['0']. Note that at no point did I specify the UUID of computeovsdpdk-2, nor did I remove or blacklist index 2 in the RemovalPolicies.

Heat now continues and deletes computeovsdpdk-0, as instructed. But it also deletes computeovsdpdk-2. Why? For a human administrator who doesn't (and shouldn't need to) understand how Heat operates internally, this seems completely out of place. And it replaces computeovsdpdk-2 with a new node. Given that the RemovalPolicy blacklists -0, that new node is, of course, -1:
~~~
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 383d0aeb-614c-41b5-8219-2a16c3f9923c | computeovsdpdk-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| 64112b16-c73c-41b6-b278-341ef6f3b908 | computeovsdpdk-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.18 |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | 383d0aeb-614c-41b5-8219-2a16c3f9923c | power on    | active             | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | 64112b16-c73c-41b6-b278-341ef6f3b908 | power on    | active             | False       |
| 47fb7e86-1f3d-4e0d-b573-0be1418f9a0d | controller-0 | 996f98c0-8ee9-4e66-9273-35651dde1e84 | power on    | active             | False       |
| 320769fe-fb7c-4f3e-af1a-67a00442a96c | controller-1 | 4141033e-83de-489d-8856-9e068324fe5c | power on    | active             | False       |
| 0526bb87-2317-4eac-9f68-89d6d4b57aab | controller-2 | a45ec4f4-4798-4538-9ea0-9eea88762cd2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["0"]}]'
| | ComputeOvsDpdkRemovalPoliciesMode: update
| | ControllerRemovalPolicies: '[]'
| | ControllerRemovalPoliciesMode: append
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | computeovsdpdk-1 | BUILD  | spawning   | NOSTATE     | ctlplane=192.168.24.13 |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | None                                 | power off   | available          | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | power off   | deploying          | False       |
| 47fb7e86-1f3d-4e0d-b573-0be1418f9a0d | controller-0 | 996f98c0-8ee9-4e66-9273-35651dde1e84 | power on    | active             | False       |
| 320769fe-fb7c-4f3e-af1a-67a00442a96c | controller-1 | 4141033e-83de-489d-8856-9e068324fe5c | power on    | active             | False       |
| 0526bb87-2317-4eac-9f68-89d6d4b57aab | controller-2 | a45ec4f4-4798-4538-9ea0-9eea88762cd2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
~~~

After all of this, the user ends up with:
~~~
| | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["0"]}]'
| | ComputeOvsDpdkRemovalPoliciesMode: update
| | ControllerRemovalPolicies: '[]'
| | ControllerRemovalPoliciesMode: append
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | computeovsdpdk-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | None                                 | power off   | available          | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | power on    | active             | False       |
| 47fb7e86-1f3d-4e0d-b573-0be1418f9a0d | controller-0 | 996f98c0-8ee9-4e66-9273-35651dde1e84 | power on    | active             | False       |
| 320769fe-fb7c-4f3e-af1a-67a00442a96c | controller-1 | 4141033e-83de-489d-8856-9e068324fe5c | power on    | active             | False       |
| 0526bb87-2317-4eac-9f68-89d6d4b57aab | controller-2 | a45ec4f4-4798-4538-9ea0-9eea88762cd2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
~~~

This is an extremely dangerous feature.

Created attachment 1625608: templates
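The index arithmetic behind both runs (the three steps explained earlier and this re-run) can be reproduced with a small model. The sketch below is a simplification, not Heat's actual code. It assumes that a ResourceGroup names its members with the lowest non-negative integer indexes that are not blacklisted, that removal policies mode 'update' replaces the stored blacklist while 'append' adds to it, and that existing members whose index is no longer wanted are deleted while missing indexes are created. The helpers next_blacklist, member_indexes and plan_update are hypothetical names used only for illustration.

~~~python
from itertools import count, islice


def next_blacklist(stored, resource_list, mode):
    """Blacklist after an update (simplified model of the removal policies mode)."""
    if mode == "update":
        # 'update' discards everything that was blacklisted before.
        return set(resource_list)
    if mode == "append":
        # 'append' (the default) keeps earlier entries and adds the new ones.
        return set(stored) | set(resource_list)
    raise ValueError(f"unknown removal policies mode: {mode!r}")


def member_indexes(size, blacklist):
    """The `size` lowest non-negative integer indexes (as strings) not blacklisted."""
    candidates = (str(i) for i in count())
    return list(islice((c for c in candidates if c not in blacklist), size))


def plan_update(existing, size, stored, resource_list, mode):
    """Split member indexes into those kept, deleted and created by an update."""
    blacklist = next_blacklist(stored, resource_list, mode)
    wanted = set(member_indexes(size, blacklist))
    current = set(existing)
    return {
        "keep": sorted(current & wanted, key=int),
        "delete": sorted(current - wanted, key=int),
        "create": sorted(wanted - current, key=int),
    }


# Re-run above: members '0' and '2' exist, the stored blacklist is ['1'],
# and node delete of computeovsdpdk-0 passes ['0'] and lowers the count to 1.
print(plan_update(["0", "2"], 1, ["1"], ["0"], "update"))
# -> {'keep': [], 'delete': ['0', '2'], 'create': ['1']}
print(plan_update(["0", "2"], 1, ["1"], ["0"], "append"))
# -> {'keep': ['2'], 'delete': ['0'], 'create': []}
~~~

With mode 'update' the model deletes both '0' and '2' and creates '1', which matches the output above; with 'append' it would have deleted only '0'. Likewise, plan_update(["1", "2"], 1, ["0"], ["1"], mode) reproduces step 3 of the earlier explanation: 'update' leaves only index '0', while 'append' keeps only index '2'.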
I hope that my concern makes sense. If a user decides to delete node X, our software can't go in and remove nodes X and Y.

The idea behind exposing RemovalPolicies was to give users an easy way to reuse node indexes, a feature users have long been looking for. But it seems that, unless they are extremely cautious, they can involuntarily and quite easily, through a simple omission, instruct Director to rebuild another node (or several other nodes?) as well.

Also note that in the case above (comment 6), computeovsdpdk-1 was rebuilt on:
63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34
That's the same ironic node that computeovsdpdk-2 was on. If this had been a production environment, all data on computeovsdpdk-2 would now be lost forever.

> I really can't accept the answer "This is expected behaviour" when this feature has the potential to wreak havoc in customer environments.
Well, I'm not sure what the expectation is here. 'RemovalPolicies/RemovalPoliciesMode' won't work together with node delete and will surely lead to issues. Also, 'node delete' is really buggy and we never suggest it to customers (it should probably be deprecated and removed).
I would suggest that customers explicitly set blacklists and node counts rather than use node delete.
We should probably update the documentation and KB articles.

Hi Rabi,

The official documentation for OSP 13 has:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html-single/director_installation_and_usage/index#sect-Removing_Compute_Nodes
~~~
(undercloud) $ openstack overcloud node delete --stack [STACK_UUID] [NODE1_UUID] [NODE2_UUID] [NODE3_UUID]
~~~
The official documentation for hyperconverged infrastructure has:
https://access.redhat.com/documentation/en-us/red_hat_hyperconverged_infrastructure_for_cloud/13/html/operations_guide/removing-a-node-from-the-overcloud#removing-the-nova-compute-services-from-the-overcloud-rhhi
~~~
Delete the compute node by UUID from the overcloud:
openstack overcloud node delete --stack OSP_NAME NOVA_UUID
~~~
If that's buggy, then we have a huge issue in our documentation. What's the correct way to delete nodes? And what's the correct way to delete nodes while preserving their index, so that customers can reuse it? How should I modify https://access.redhat.com/solutions/4232971 so that it isn't risky?

FYI, I ran a scale-out and set the policy to "append".
Now, I'm deleting a node, to reproduce (or not) the initial issue:
~~~
(undercloud) [stack@undercloud-0 ~]$ nova list
ironic node-list
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | computeovsdpdk-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
| 11d337e6-cf3b-420b-a3e8-76c355d13742 | computeovsdpdk-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
(undercloud) [stack@undercloud-0 ~]$ ironic node-list
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | 11d337e6-cf3b-420b-a3e8-76c355d13742 | power on    | active             | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | power on    | active             | False       |
| 47fb7e86-1f3d-4e0d-b573-0be1418f9a0d | controller-0 | 996f98c0-8ee9-4e66-9273-35651dde1e84 | power on    | active             | False       |
| 320769fe-fb7c-4f3e-af1a-67a00442a96c | controller-1 | 4141033e-83de-489d-8856-9e068324fe5c | power on    | active             | False       |
| 0526bb87-2317-4eac-9f68-89d6d4b57aab | controller-2 | a45ec4f4-4798-4538-9ea0-9eea88762cd2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
(undercloud) [stack@undercloud-0 ~]$ heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+--------------------------------------+------------+-----------------+----------------------+----------------------+----------------------------------+
| id                                   | stack_name | stack_status    | creation_time        | updated_time         | project                          |
+--------------------------------------+------------+-----------------+----------------------+----------------------+----------------------------------+
| f2e1165f-a768-48e7-b6bb-24ac13e11dfb | overcloud  | UPDATE_COMPLETE | 2019-10-08T14:52:05Z | 2019-10-14T13:01:55Z | 1f9155cb6d314ff4b0933d42afd01204 |
+--------------------------------------+------------+-----------------+----------------------+----------------------+----------------------------------+
(undercloud) [stack@undercloud-0 ~]$ openstack stack show overcloud | grep RemovalPolicies | cut -b1-200
| | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["0"]}]'
| | ComputeOvsDpdkRemovalPoliciesMode: append
| | ControllerRemovalPolicies: '[]'
| | ControllerRemovalPoliciesMode: append
(undercloud) [stack@undercloud-0 ~]$ openstack overcloud node delete --stack overcloud 46c3670a-f86a-4b10-a2ef-4ffc018ee565
Deleting the following nodes from stack overcloud:
- 46c3670a-f86a-4b10-a2ef-4ffc018ee565
Started Mistral Workflow tripleo.scale.v1.delete_node. Execution ID: 5a197c80-249c-43d6-9a83-935cd313dcc2
Waiting for messages on queue 'tripleo' with no timeout.
~~~

That new test doesn't start out too well:
~~~
(undercloud) [stack@undercloud-0 ~]$ while true ; do nova list; ironic node-list ; openstack stack show overcloud | grep RemovalPolicies | cut -b1-200; sleep 30 ; done | tee output.txt
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | computeovsdpdk-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
| 11d337e6-cf3b-420b-a3e8-76c355d13742 | computeovsdpdk-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | 11d337e6-cf3b-420b-a3e8-76c355d13742 | power on    | active             | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | power on    | active             | False       |
| 47fb7e86-1f3d-4e0d-b573-0be1418f9a0d | controller-0 | 996f98c0-8ee9-4e66-9273-35651dde1e84 | power on    | active             | False       |
| 320769fe-fb7c-4f3e-af1a-67a00442a96c | controller-1 | 4141033e-83de-489d-8856-9e068324fe5c | power on    | active             | False       |
| 0526bb87-2317-4eac-9f68-89d6d4b57aab | controller-2 | a45ec4f4-4798-4538-9ea0-9eea88762cd2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["0"]}]'
| | ComputeOvsDpdkRemovalPoliciesMode: append
| | ControllerRemovalPolicies: '[]'
| | ControllerRemovalPoliciesMode: append
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | computeovsdpdk-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
| 11d337e6-cf3b-420b-a3e8-76c355d13742 | computeovsdpdk-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | 11d337e6-cf3b-420b-a3e8-76c355d13742 | power on    | active             | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | power on    | active             | False       |
| 47fb7e86-1f3d-4e0d-b573-0be1418f9a0d | controller-0 | 996f98c0-8ee9-4e66-9273-35651dde1e84 | power on    | active             | False       |
| 320769fe-fb7c-4f3e-af1a-67a00442a96c | controller-1 | 4141033e-83de-489d-8856-9e068324fe5c | power on    | active             | False       |
| 0526bb87-2317-4eac-9f68-89d6d4b57aab | controller-2 | a45ec4f4-4798-4538-9ea0-9eea88762cd2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["0"]}]'
| | ComputeOvsDpdkRemovalPoliciesMode: append
| | ControllerRemovalPolicies: '[]'
| | ControllerRemovalPoliciesMode: append
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | computeovsdpdk-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
| 11d337e6-cf3b-420b-a3e8-76c355d13742 | computeovsdpdk-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | 11d337e6-cf3b-420b-a3e8-76c355d13742 | power on    | active             | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | power on    | active             | False       |
| 47fb7e86-1f3d-4e0d-b573-0be1418f9a0d | controller-0 | 996f98c0-8ee9-4e66-9273-35651dde1e84 | power on    | active             | False       |
| 320769fe-fb7c-4f3e-af1a-67a00442a96c | controller-1 | 4141033e-83de-489d-8856-9e068324fe5c | power on    | active             | False       |
| 0526bb87-2317-4eac-9f68-89d6d4b57aab | controller-2 | a45ec4f4-4798-4538-9ea0-9eea88762cd2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["1"]}]'
| | ComputeOvsDpdkRemovalPoliciesMode: append
| | ControllerRemovalPolicies: '[]'
| | ControllerRemovalPoliciesMode: append
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | computeovsdpdk-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
| 11d337e6-cf3b-420b-a3e8-76c355d13742 | computeovsdpdk-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | 11d337e6-cf3b-420b-a3e8-76c355d13742 | power on    | active             | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | power on    | active             | False       |
| 47fb7e86-1f3d-4e0d-b573-0be1418f9a0d | controller-0 | 996f98c0-8ee9-4e66-9273-35651dde1e84 | power on    | active             | False       |
| 320769fe-fb7c-4f3e-af1a-67a00442a96c | controller-1 | 4141033e-83de-489d-8856-9e068324fe5c | power on    | active             | False       |
| 0526bb87-2317-4eac-9f68-89d6d4b57aab | controller-2 | a45ec4f4-4798-4538-9ea0-9eea88762cd2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
~~~

Even though I flipped PoliciesMode back to "append" on the last scale-out, the resource list gets updated from ['0'] to ['1']. Shouldn't it append to ['0','1'] ???

> What's the correct way for deleting nodes?

- Update (append indexes to) *RemovalPolicies to delete nodes (users have to identify the index for a nova instance).
- Toggle *RemovalPoliciesMode to reset the blacklist, as only updating *RemovalPolicies won't do it.

> If that's buggy, then we have a huge issue in our documentation.

Both *RemovalPolicies and 'node delete' are used to blacklist nodes, and using both in 'non-append' mode can overwrite the blacklist and lead to issues. As far as 'node delete' is concerned, it definitely has issues in corner cases, but users have relied on it to date.
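
The first recommendation above (append indexes to *RemovalPolicies and apply them with a normal stack update) can be scripted so that the complete blacklist lives in the environment file rather than only in the Heat database. Below is a minimal Python sketch, assuming the parameter value has the JSON form shown in the stack parameters above, e.g. '[{"resource_list": ["0"]}]'. The function names and the idea of regenerating a removal-policies.yaml this way are hypothetical illustrations, not a TripleO tool. The role count is passed in explicitly because it has to be lowered together with the new blacklist entries; otherwise the group would simply fill the gap with a new member at the next free index.

```python
import json


def append_removal_policies(current: str, new_indexes: list) -> str:
    """Return a <Role>RemovalPolicies value with new_indexes appended.

    `current` is the value as shown in the stack parameters, for example
    '[{"resource_list": ["0"]}]'. An empty string is treated as '[]'.
    Indexes are normalised to strings and duplicates are dropped, so the
    list only ever grows (the same idea as 'append' mode).
    """
    policies = json.loads(current) if current.strip() else []
    merged = []
    for policy in policies:
        for idx in policy.get("resource_list", []):
            if str(idx) not in merged:
                merged.append(str(idx))
    for idx in new_indexes:
        if str(idx) not in merged:
            merged.append(str(idx))
    return json.dumps([{"resource_list": merged}])


def render_environment(role: str, policies: str, count: int) -> str:
    """Render a hypothetical removal-policies.yaml for one role.

    The JSON list is written unquoted because JSON flow syntax is valid
    YAML. The mode is pinned to 'append' so the stack is not left in
    'update' mode for a later PATCH update or 'node delete'.
    """
    return (
        "parameter_defaults:\n"
        f"  {role}RemovalPolicies: {policies}\n"
        f"  {role}RemovalPoliciesMode: append\n"
        f"  {role}Count: {count}\n"
    )


if __name__ == "__main__":
    value = append_removal_policies('[{"resource_list": ["0"]}]', ["1"])
    print(value)  # [{"resource_list": ["0", "1"]}]
    print(render_environment("ComputeOvsDpdk", value, 1), end="")
```

Because every removed index stays in the file, a later switch to 'update' mode would replace the stored history with the same complete list instead of silently dropping entries.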

> Even though I flipped PoliciesMode back to "append" on the last scale-out, the resource list gets updated from ['0'] to ['1']. Shouldn't it append to ['0','1'] ???

The parameter won't accumulate like that; it's the blacklist in the Heat DB that changes with 'node delete'.

That looks better indeed with append:

+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | computeovsdpdk-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
| 11d337e6-cf3b-420b-a3e8-76c355d13742 | computeovsdpdk-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | 11d337e6-cf3b-420b-a3e8-76c355d13742 | power on    | active             | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | power on    | active             | False       |
| 47fb7e86-1f3d-4e0d-b573-0be1418f9a0d | controller-0 | 996f98c0-8ee9-4e66-9273-35651dde1e84 | power on    | active             | False       |
| 320769fe-fb7c-4f3e-af1a-67a00442a96c | controller-1 | 4141033e-83de-489d-8856-9e068324fe5c | power on    | active             | False       |
| 0526bb87-2317-4eac-9f68-89d6d4b57aab | controller-2 | a45ec4f4-4798-4538-9ea0-9eea88762cd2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["1"]}]'
| | ComputeOvsDpdkRemovalPoliciesMode: append
| | ControllerRemovalPolicies: '[]'
| | ControllerRemovalPoliciesMode: append
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | computeovsdpdk-1 | ACTIVE | deleting   | Running     |                        |
| 11d337e6-cf3b-420b-a3e8-76c355d13742 | computeovsdpdk-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | 11d337e6-cf3b-420b-a3e8-76c355d13742 | power on    | active             | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | 46c3670a-f86a-4b10-a2ef-4ffc018ee565 | power on    | deleting           | False       |
| 47fb7e86-1f3d-4e0d-b573-0be1418f9a0d | controller-0 | 996f98c0-8ee9-4e66-9273-35651dde1e84 | power on    | active             | False       |
| 320769fe-fb7c-4f3e-af1a-67a00442a96c | controller-1 | 4141033e-83de-489d-8856-9e068324fe5c | power on    | active             | False       |
| 0526bb87-2317-4eac-9f68-89d6d4b57aab | controller-2 | a45ec4f4-4798-4538-9ea0-9eea88762cd2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["1"]}]'
| | ComputeOvsDpdkRemovalPoliciesMode: append
| | ControllerRemovalPolicies: '[]'
| | ControllerRemovalPoliciesMode: append
| | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["1"]}]'
| | ComputeOvsDpdkRemovalPoliciesMode: append
| | ControllerRemovalPolicies: '[]'
| | ControllerRemovalPoliciesMode: append
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| ID                                   | Name             | Status | Task State | Power State | Networks               |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
| 11d337e6-cf3b-420b-a3e8-76c355d13742 | computeovsdpdk-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| 4141033e-83de-489d-8856-9e068324fe5c | controller-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 996f98c0-8ee9-4e66-9273-35651dde1e84 | controller-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| a45ec4f4-4798-4538-9ea0-9eea88762cd2 | controller-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.26 |
+--------------------------------------+------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 46cbbfe1-3d86-42b8-88f0-eea94d3cc6ad | compute-33   | 11d337e6-cf3b-420b-a3e8-76c355d13742 | power on    | active             | False       |
| 63742c70-6750-4bdd-a6f3-8eef8e0addef | compute-34   | None                                 | power off   | available          | False       |
| 47fb7e86-1f3d-4e0d-b573-0be1418f9a0d | controller-0 | 996f98c0-8ee9-4e66-9273-35651dde1e84 | power on    | active             | False       |
| 320769fe-fb7c-4f3e-af1a-67a00442a96c | controller-1 | 4141033e-83de-489d-8856-9e068324fe5c | power on    | active             | False       |
| 0526bb87-2317-4eac-9f68-89d6d4b57aab | controller-2 | a45ec4f4-4798-4538-9ea0-9eea88762cd2 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["1"]}]'
| | ComputeOvsDpdkRemovalPoliciesMode: append
| | ControllerRemovalPolicies: '[]'
| | ControllerRemovalPoliciesMode: append
^C
(undercloud) [stack@undercloud-0 ~]$ heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+--------------------------------------+------------+-----------------+----------------------+----------------------+----------------------------------+
| id                                   | stack_name | stack_status    | creation_time        | updated_time         | project                          |
+--------------------------------------+------------+-----------------+----------------------+----------------------+----------------------------------+
| f2e1165f-a768-48e7-b6bb-24ac13e11dfb | overcloud  | UPDATE_COMPLETE | 2019-10-08T14:52:05Z | 2019-10-14T14:14:02Z | 1f9155cb6d314ff4b0933d42afd01204 |
+--------------------------------------+------------+-----------------+----------------------+----------------------+----------------------------------+
(undercloud) [stack@undercloud-0 ~]$

Hi,

I really don't understand how the RemovalPolicies work. In the following example, all nodes are from the same role. Only the hostname mapping is different for the first 3 computes ... I had compute-0, compute-1, compute-2.
Then, I deleted compute-1:

~~~
+--------------------------------------+-----------------------------------+--------+------------+-------------+------------------------+
| ID                                   | Name                              | Status | Task State | Power State | Networks               |
+--------------------------------------+-----------------------------------+--------+------------+-------------+------------------------+
| 5fe2160a-0665-445e-88ce-d73d50e10042 | 0123456789-overcloud-compute-0    | ACTIVE | -          | Running     | ctlplane=192.168.24.35 |
| 278e5a1b-a785-4a49-bd03-dbd5d2b7ee0e | 0123456789-overcloud-compute-2    | ACTIVE | -          | Running     | ctlplane=192.168.24.19 |
| 2d47dd59-fc58-48f1-9ba8-0f990974f6ac | 0123456789-overcloud-controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| e6603878-6909-4dce-8525-cdee1da02a63 | 0123456789-overcloud-controller-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| f9774cb6-837a-44ca-8a66-4de0e5650c3f | 0123456789-overcloud-controller-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.10 |
+--------------------------------------+-----------------------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| fe4b5498-26e6-4893-b26f-5dd5ca2f6881 | compute-30   | 278e5a1b-a785-4a49-bd03-dbd5d2b7ee0e | power on    | active             | False       |
| 690a7161-c893-49b8-b09e-73bc177a1efc | compute-31   | None                                 | power off   | available          | False       |
| 47059e88-1b4d-47e2-bdcf-a2641b7e5303 | compute-32   | 5fe2160a-0665-445e-88ce-d73d50e10042 | power on    | active             | False       |
| 80afb6f5-a855-4117-ab77-607f4a429ebf | controller-0 | 2d47dd59-fc58-48f1-9ba8-0f990974f6ac | power on    | active             | False       |
| af0c1a1f-ffba-4080-b8ae-049d3d2197d7 | controller-1 | e6603878-6909-4dce-8525-cdee1da02a63 | power on    | active             | False       |
| 23b0b128-8085-4cb0-b473-35be4cb3778f | controller-2 | f9774cb6-837a-44ca-8a66-4de0e5650c3f | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["1"]}]'
| | ComputeOvsDpdkRemovalPoliciesMode: append
| | ControllerRemovalPolicies: '[]'
| | ControllerRemovalPoliciesMode: append
~~~

So the policies contain resource_list ["1"], which makes sense. Then, I deleted -0 and the removal policies went to '0' ...
~~~
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| fe4b5498-26e6-4893-b26f-5dd5ca2f6881 | compute-30   | 278e5a1b-a785-4a49-bd03-dbd5d2b7ee0e | power on    | active             | False       |
| 690a7161-c893-49b8-b09e-73bc177a1efc | compute-31   | None                                 | power off   | available          | False       |
| 47059e88-1b4d-47e2-bdcf-a2641b7e5303 | compute-32   | None                                 | power off   | available          | False       |
| 80afb6f5-a855-4117-ab77-607f4a429ebf | controller-0 | 2d47dd59-fc58-48f1-9ba8-0f990974f6ac | power on    | active             | False       |
| af0c1a1f-ffba-4080-b8ae-049d3d2197d7 | controller-1 | e6603878-6909-4dce-8525-cdee1da02a63 | power on    | active             | False       |
| 23b0b128-8085-4cb0-b473-35be4cb3778f | controller-2 | f9774cb6-837a-44ca-8a66-4de0e5650c3f | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["0"]}]'
| | ComputeOvsDpdkRemovalPoliciesMode: append
| | ControllerRemovalPolicies: '[]'
| | ControllerRemovalPoliciesMode: append
+--------------------------------------+-----------------------------------+--------+------------+-------------+------------------------+
| ID                                   | Name                              | Status | Task State | Power State | Networks               |
+--------------------------------------+-----------------------------------+--------+------------+-------------+------------------------+
| 278e5a1b-a785-4a49-bd03-dbd5d2b7ee0e | 0123456789-overcloud-compute-2    | ACTIVE | -          | Running     | ctlplane=192.168.24.19 |
| 2d47dd59-fc58-48f1-9ba8-0f990974f6ac | 0123456789-overcloud-controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| e6603878-6909-4dce-8525-cdee1da02a63 | 0123456789-overcloud-controller-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| f9774cb6-837a-44ca-8a66-4de0e5650c3f | 0123456789-overcloud-controller-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.10 |
+--------------------------------------+-----------------------------------+--------+------------+-------------+------------------------+
~~~

Then, I scaled out and it remained on '0' ... Why didn't it deploy node -1, then ?!

~~~
+--------------------------------------+-----------------------------------+--------+------------+-------------+------------------------+
| 278e5a1b-a785-4a49-bd03-dbd5d2b7ee0e | 0123456789-overcloud-compute-2    | ACTIVE | -          | Running     | ctlplane=192.168.24.19 |
| 2d47dd59-fc58-48f1-9ba8-0f990974f6ac | 0123456789-overcloud-controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| e6603878-6909-4dce-8525-cdee1da02a63 | 0123456789-overcloud-controller-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| f9774cb6-837a-44ca-8a66-4de0e5650c3f | 0123456789-overcloud-controller-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.10 |
| cd1fe4d7-462c-4903-8602-585d3bbfe0b5 | computeovsdpdk-3                  | ACTIVE | -          | Running     | ctlplane=192.168.24.14 |
| 2ec6da5a-4221-4d5f-9d5f-459a5bfa15ed | computeovsdpdk-4                  | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
+--------------------------------------+-----------------------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| fe4b5498-26e6-4893-b26f-5dd5ca2f6881 | compute-30   | 278e5a1b-a785-4a49-bd03-dbd5d2b7ee0e | power on    | active             | False       |
| 690a7161-c893-49b8-b09e-73bc177a1efc | compute-31   | cd1fe4d7-462c-4903-8602-585d3bbfe0b5 | power on    | active             | False       |
| 47059e88-1b4d-47e2-bdcf-a2641b7e5303 | compute-32   | 2ec6da5a-4221-4d5f-9d5f-459a5bfa15ed | power on    | active             | False       |
| 80afb6f5-a855-4117-ab77-607f4a429ebf | controller-0 | 2d47dd59-fc58-48f1-9ba8-0f990974f6ac | power on    | active             | False       |
| af0c1a1f-ffba-4080-b8ae-049d3d2197d7 | controller-1 | e6603878-6909-4dce-8525-cdee1da02a63 | power on    | active             | False       |
| 23b0b128-8085-4cb0-b473-35be4cb3778f | controller-2 | f9774cb6-837a-44ca-8a66-4de0e5650c3f | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["0"]}]'
| | ComputeOvsDpdkRemovalPoliciesMode: append
| | ControllerRemovalPolicies: '[]'
| | ControllerRemovalPoliciesMode: append
~~~

Then, I deleted compute-3 and got [{"resource_list": ["3"]}] ...
~~~
+--------------------------------------+-----------------------------------+--------+------------+-------------+------------------------+
| ID                                   | Name                              | Status | Task State | Power State | Networks               |
+--------------------------------------+-----------------------------------+--------+------------+-------------+------------------------+
| 278e5a1b-a785-4a49-bd03-dbd5d2b7ee0e | 0123456789-overcloud-compute-2    | ACTIVE | -          | Running     | ctlplane=192.168.24.19 |
| 2d47dd59-fc58-48f1-9ba8-0f990974f6ac | 0123456789-overcloud-controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| e6603878-6909-4dce-8525-cdee1da02a63 | 0123456789-overcloud-controller-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| f9774cb6-837a-44ca-8a66-4de0e5650c3f | 0123456789-overcloud-controller-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.10 |
| 2ec6da5a-4221-4d5f-9d5f-459a5bfa15ed | computeovsdpdk-4                  | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
+--------------------------------------+-----------------------------------+--------+------------+-------------+------------------------+
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| fe4b5498-26e6-4893-b26f-5dd5ca2f6881 | compute-30   | 278e5a1b-a785-4a49-bd03-dbd5d2b7ee0e | power on    | active             | False       |
| 690a7161-c893-49b8-b09e-73bc177a1efc | compute-31   | None                                 | power off   | available          | False       |
| 47059e88-1b4d-47e2-bdcf-a2641b7e5303 | compute-32   | 2ec6da5a-4221-4d5f-9d5f-459a5bfa15ed | power on    | active             | False       |
| 80afb6f5-a855-4117-ab77-607f4a429ebf | controller-0 | 2d47dd59-fc58-48f1-9ba8-0f990974f6ac | power on    | active             | False       |
| af0c1a1f-ffba-4080-b8ae-049d3d2197d7 | controller-1 | e6603878-6909-4dce-8525-cdee1da02a63 | power on    | active             | False       |
| 23b0b128-8085-4cb0-b473-35be4cb3778f | controller-2 | f9774cb6-837a-44ca-8a66-4de0e5650c3f | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| | ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["3"]}]'
| | ComputeOvsDpdkRemovalPoliciesMode: append
| | ControllerRemovalPolicies: '[]'
| | ControllerRemovalPoliciesMode: append
~~~

Question 1: How do I deploy ID -0 again, with the above setup?
Question 2: How do I deploy ID -1 again, with the above setup?
Question 3: How do I deploy ID -3 again, with the above setup?

Thanks,
Andreas

Based on what you've described, I don't see any new issue or unexpected change in behavior. However, we'd need to see the commands you actually ran and the templates you used to determine whether there's an actual issue here. As you describe it, I don't directly see a problem. Heat won't reuse previously deleted indexes, which is what I think you're asking for.
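
Why Heat does not reuse deleted indexes, and why the scale-out above produced -3 and -4 rather than -1, can be illustrated with a toy model. This Python sketch assumes, as the later replies in this thread describe, that the group keeps a blacklist of removed indexes and always uses the first `count` indexes that are not on it. It is a simplification for illustration only, not Heat's actual OS::Heat::ResourceGroup code, and `member_indexes` is a hypothetical name.

```python
from itertools import count, islice


def member_indexes(size: int, blacklist: set) -> list:
    """Return the first `size` indexes (as strings) that are not blacklisted.

    Toy model of how a resource group picks member names: candidates are
    '0', '1', '2', ... in order, and blacklisted names are skipped.
    """
    candidates = (str(i) for i in count())
    allowed = (name for name in candidates if name not in blacklist)
    return list(islice(allowed, size))


# Replay the scenario above in 'append' mode: every deleted index is
# added to the stored blacklist and stays there.
blacklist = set()
print(member_indexes(3, blacklist))  # ['0', '1', '2']

blacklist |= {"1"}                   # node delete of -1, count 3 -> 2
print(member_indexes(2, blacklist))  # ['0', '2']

blacklist |= {"0"}                   # node delete of -0, count 2 -> 1
print(member_indexes(1, blacklist))  # ['2']

# Scale out to 3: '0' and '1' are still blacklisted, so the new members get 3 and 4.
print(member_indexes(3, blacklist))  # ['2', '3', '4']

blacklist |= {"3"}                   # node delete of -3, count 3 -> 2
print(member_indexes(2, blacklist))  # ['2', '4']
```

In this model it is the accumulated set, not the per-delete list visible in the stack parameters (['1'], then ['0'], then ['3']), that decides which members exist.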
You can certainly reuse a previously used hostname, though, by mapping the new index to the hostname you want. For instance, if you wanted to reuse the compute-1 hostname, but the next index is actually "-5", then HostnameMap would need something like:

overcloud-compute-5: compute-1

Likewise for the fixed IP lists, if you're using those: the IPs for the new compute-1 would need to be at the list position for overcloud-compute-5. Switching the hostname format in the middle of the scale up/down seems to complicate things unnecessarily, but I don't think that's the direct cause of any issue.

Hi James,

This bug report is about using RemovalPolicies. This BZ has a full database backup and templates attached. It is about documenting https://access.redhat.com/solutions/4232971 in a way that lets our customers make use of this feature.

However, I found out during my tests that setting RemovalPoliciesMode to 'update', then reusing an index, and then forgetting to set it back to 'append' will cause a node replacement **of the wrong node** when running openstack overcloud node delete again. That's described in detail, with full templates and a database backup, in my posts up to comment 9.

Rabi then said that:
a) I need to flip back to 'append' from 'update'; otherwise, openstack overcloud node delete will cause issues, e.g. replacing the wrong node. He said that everything worked as designed (comments 10, 11, 15).

I then showed that flipping it back to append indeed does not replace the wrong node (comments 13, 14, 16).

I then asked Rabi: if node delete is indeed buggy, what is the correct way to delete nodes without using node delete, and how should we update our documentation, which recommends using node delete? Also, I'm still looking for how to reuse node indexes.
This entire BZ is about finding a procedure to use RemovalPolicies so that users can reuse the internal indexes: https://access.redhat.com/solutions/4232971

I created a more complex scenario and want to use this feature in a safe way. The concrete question is in comment 17.

I also wonder:
* if overcloud node delete is buggy and we should use another command, why is this not documented, and why is this the first time I hear about it?
* we explicitly exposed RemovalPolicies via environment files in TripleO so that users can reuse indexes (again, https://access.redhat.com/solutions/4232971), but the feature seems very dangerous to me (see above).

So I need to find a way to document this so that our users do not shoot themselves in the foot. I would also like to work out some more complex scenarios for how to use this feature, e.g. in a scenario where indexes 0, 1 and 3 are removed, how can users reuse index 1? I'm talking about internal indexes. I'm well aware of the workarounds with FixedIPs and HostnameMaps; however, these are insufficient and unsatisfying for a very large number of our customers.

Note that this ticket is not about using HostnameMaps. The HostnameMap in my last environment is just an unfortunate leftover: I reused a different lab environment that had one. The task here is: how to reuse internal node indexes safely with RemovalPolicies.

Just realized that the overcloud_deploy.sh may be missing.
It's:

openstack overcloud deploy --templates \
  -r ${template_base_dir}/roles_data.yaml \
  -e ${template_base_dir}/overcloud_images.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/host-config-and-reboot.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/services-docker/neutron-ovs-dpdk.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ovs-dpdk-permissions.yaml \
  -e ${template_base_dir}/network-environment.yaml \
  -e ${template_base_dir}/node-count.yaml \
  -e ${template_base_dir}/dpdk-conf.yaml \
  -e ${template_base_dir}/custom-hostnames.yaml \
  --log-file /home/stack/overcloud_install.log
# +/- -e ${template_base_dir}/removal-policies.yaml \

Thanks for the background information, as it was previously not clear to me based on the summary of the bug. Sounds like you're trying to validate the procedure from https://access.redhat.com/solutions/4232971 and, while doing so, found an issue with node deletion. I'll re-assign the bug so it can be prioritized correctly.

I thought I clarified it in my earlier comments.

> Then, I deleted -0 and the removal policies went to '0'

I think you're making the mistake of looking at the stack parameters. In 'append' mode, the effective blacklist is the union of the blacklist history (in the Heat DB) and what's in the *RemovalPolicies parameter.

blacklist=['0', '1'], count=1, nodes=['2']

> Then, I scaled out and it remained on '0' ... Why didn't it deploy node -1, then ?!

blacklist=['0', '1'], count=3, nodes=['2', '3', '4']

> Then, I deleted compute-3 and [{"resource_list": ["3"]}] ...

blacklist=['0', '1', '3'], count=2, nodes=['2', '4']

> How do I deploy ID -0 again, with the above setup?

Scaling out with the parameters below would flush the blacklist history and use only the entries you provide in the parameters:
ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["1", "3"]}]'
ComputeOvsDpdkRemovalPoliciesMode: update
ComputeOvsDpdkCount: 3

nodes=['0', '2', '4'] blacklist=["1", "3"]

> How do I deploy ID -1 again, with the above setup?

ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["0", "3"]}]'
ComputeOvsDpdkRemovalPoliciesMode: update
ComputeOvsDpdkCount: 3

nodes=['1', '2', '4'] blacklist=["0", "3"]

> How do I deploy ID -3 again, with the above setup?

ComputeOvsDpdkRemovalPolicies: '[{"resource_list": ["0", "1"]}]'
ComputeOvsDpdkRemovalPoliciesMode: update
ComputeOvsDpdkCount: 3

nodes=['2', '3', '4'] blacklist=["0", "1"]

When using 'update' mode, unless you increase the *Count, Heat removes the higher indexes to reach the count. 'node delete' decreases the count along with changing the blacklist, so with 'update' mode you can end up with unsatisfactory results. Let me explain that...

Suppose you have count=3, nodes=['0', '1', '2'], blacklist=[], policy=append

a. node delete ['0']
count=2, nodes=['1', '2'], blacklist=['0'], policy=append, *RemovalPolicies: [{"resource_list": ["0"]}]

b. node delete ['1']
count=1, nodes=['2'], blacklist=['0', '1'], policy=append, *RemovalPolicies: [{"resource_list": ["1"]}]

Note: node delete resets the *RemovalPolicies parameter, and it works fine as long as you use 'append' mode, as the blacklist history is kept in the Heat database.

c. overcloud update with policy mode changed to 'update' and no increase in *Count
count=1, nodes=['0'], blacklist=['1'], policy=update, *RemovalPolicies: [{"resource_list": ["1"]}]

Note: Because it's a PATCH update, it would use the *RemovalPolicies parameter set during the last node delete. As you're asking to flush the blacklist history and have count=1, '2' would be deleted and '0' created, which the user probably would not expect.
We should explicitly mention that when using 'update' mode (which flushes the whole blacklist history), you should reset *RemovalPolicies and increase *Count as required to avoid undesired behaviour, because it always flushes the blacklist history in the Heat database.

d. overcloud update with *Count: 3
count=3, nodes=['0', '2', '3'], blacklist=['1'], policy=update, *RemovalPolicies: [{"resource_list": ["1"]}]

e. node delete ['3']
count=2, nodes=['0', '1'], blacklist=['3'], policy=update, *RemovalPolicies: [{"resource_list": ["3"]}]
Both '2' and '3' would be deleted and '1' would be created. This is logical, as you're only asking for 2 nodes and to flush the blacklist history.

That's the reason I mentioned earlier not to use 'node delete' when you've set the removal mode to 'update'. It would be good to keep track of all blacklists in the template itself by appending to *RemovalPolicies and to use simple stack updates.

> if overcloud node delete is buggy and we should use another command, why is this not documented and the first time I hear about this

Well, 'overcloud node delete' resets the *Count and *RemovalPolicies parameters, which then differ from what's in the templates and can result in issues. We've been working around that to date.

Point (c) in the last comment is not correct, as it would use node count=3 from the template, so it would result in what's mentioned in (d). Updated example I used above:

Suppose you have count=3, nodes=['0', '1', '2'], blacklist=[], policy=append

a. node delete ['0']
count=2, nodes=['1', '2'], blacklist=['0'], mode=append, *RemovalPolicies: [{"resource_list": ["0"]}]

b. node delete ['1']
count=1, nodes=['2'], blacklist=['0', '1'], mode=append, *RemovalPolicies: [{"resource_list": ["1"]}]

c. overcloud update with policy mode changed to 'update' (it will use count=3 from the template)
count=3, nodes=['0', '2', '3'], blacklist=['1'], mode=update, *RemovalPolicies: [{"resource_list": ["1"]}]

d. node delete ['3']
count=2, nodes=['0', '1'], blacklist=['3'], mode=update, *RemovalPolicies: [{"resource_list": ["3"]}]
Both '2' and '3' would be deleted and '1' would be created. This seems logical, as you're only asking for 2 nodes and to flush the blacklist history. However, we should force mode=append with 'node delete' to avoid it:
count=2, nodes=['0', '2'], blacklist=['1', '3'], mode=append, *RemovalPolicies: [{"resource_list": ["3"]}]

I've submitted a patch to force the mode to 'append' for 'node delete': https://review.opendev.org/#/c/689326/

Rabi, I made your comments public; I hope that's OK. That way, my customer can follow up. I'll work on the knowledge base article as well.

Thanks,
- Andreas

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0760
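
The difference between 'append' and 'update' in Rabi's updated example (steps a to d), and the effect of forcing 'append' for 'node delete', can be replayed with a slightly larger toy model. This Python sketch assumes, as described in this thread, that 'append' merges the new resource_list into the stored blacklist while 'update' replaces the stored blacklist with it, and that the members are the first `count` non-blacklisted indexes. It ignores details such as resolving physical resource IDs in resource_list; `update_blacklist`, `member_indexes` and `simulate_update` are hypothetical names, not Heat or TripleO functions.

```python
from itertools import count, islice


def update_blacklist(stored: set, resource_list: list, mode: str) -> set:
    """Combine the stored blacklist with a new resource_list."""
    if mode == "append":
        return stored | {str(i) for i in resource_list}
    if mode == "update":
        return {str(i) for i in resource_list}
    raise ValueError(f"unknown removal policies mode: {mode!r}")


def member_indexes(size: int, blacklist: set) -> list:
    """First `size` indexes, as strings, that are not blacklisted."""
    allowed = (str(i) for i in count() if str(i) not in blacklist)
    return list(islice(allowed, size))


def simulate_update(state: dict, size: int, resource_list: list, mode: str):
    """Simulate one stack update; return (new_state, removed, created)."""
    blacklist = update_blacklist(state["blacklist"], resource_list, mode)
    members = member_indexes(size, blacklist)
    removed = sorted(set(state["members"]) - set(members), key=int)
    created = sorted(set(members) - set(state["members"]), key=int)
    return {"blacklist": blacklist, "members": members}, removed, created


start = {"blacklist": set(), "members": ["0", "1", "2"]}
a, _, _ = simulate_update(start, 2, ["0"], "append")  # a. node delete ['0']
b, _, _ = simulate_update(a, 1, ["1"], "append")      # b. node delete ['1']
# c. switch to 'update'; the PATCH update picks up count=3 from the template
c, _, created_c = simulate_update(b, 3, ["1"], "update")
print(c["members"], created_c)  # ['0', '2', '3'] ['0', '3']

# d. node delete ['3'] while the stack is still in 'update' mode
d_old, removed, created = simulate_update(c, 2, ["3"], "update")
print(d_old["members"], removed, created)  # ['0', '1'] ['2', '3'] ['1']

# d. the same delete with the mode forced to 'append'
d_new, removed, created = simulate_update(c, 2, ["3"], "append")
print(d_new["members"], removed, created)  # ['0', '2'] ['3'] []
```

The same helpers reproduce Rabi's answers to Questions 1 to 3. For example, starting from blacklist {'0', '1', '3'} and members ['2', '4'], an 'update' with resource_list ["1", "3"] and count 3 yields members ['0', '2', '4'].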