Bug 1399429

Summary: Unable to delete overcloud node when identifying --stack by UUID, using name works however
Product: Red Hat OpenStack
Reporter: John Fulton <johfulto>
Component: python-tripleoclient
Assignee: RHOS Maint <rhos-maint>
Status: CLOSED ERRATA
QA Contact: Gurenko Alex <agurenko>
Severity: low
Docs Contact:
Priority: low
Version: 10.0 (Newton)
CC: agurenko, ddomingo, hbrock, johfulto, jschluet, jslagle, mburns, mcornea, rhel-osp-director-maint
Target Milestone: z1
Keywords: Triaged, ZStream
Target Release: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: python-tripleoclient-5.4.1-1.el7ost
Doc Type: No Doc Update
Doc Text: None
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-02-01 14:46:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description John Fulton 2016-11-29 04:33:58 UTC
I. Description of problem:

I created an OsdCompute custom role [1] and was able to deploy [2] a physical 7-node overcloud with 3 Controllers and 4 OsdComputes. However, I was unable to delete one of my OsdComputes using `openstack overcloud node delete`; the command rejected '-r' as an unrecognized argument:

[stack@hci-director ~]$ echo $stack_id
23e7c364-7303-4af6-b54d-cfbf1b737680
[stack@hci-director ~]$ echo $nova_id
5fa641cf-b290-4a2a-b15e-494ab9d10d8a
[stack@hci-director ~]$ time openstack overcloud node delete --stack $stack_id --templates \
> -r ~/custom-templates/custom-roles.yaml \
> -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml \
> -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
> -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
> -e ~/custom-templates/network.yaml \
> -e ~/custom-templates/ceph.yaml \
> -e ~/custom-templates/layout.yaml $nova_id
usage: openstack overcloud node delete [-h] [--stack STACK]
                                       [--templates [TEMPLATES]]
                                       [-e <HEAT ENVIRONMENT FILE>]
                                       <node> [<node> ...]
openstack overcloud node delete: error: unrecognized arguments: -r 5fa641cf-b290-4a2a-b15e-494ab9d10d8a

real	0m0.758s
user	0m0.501s
sys	0m0.085s
[stack@hci-director ~]$

[1] https://github.com/RHsyseng/hci/blob/master/custom-templates/custom-roles.yaml#L168
[2] https://github.com/RHsyseng/hci/blob/master/scripts/deploy.sh


II. Version-Release number of selected component (if applicable):

Reproduced using the puddle from 10.0-RHEL-7/2016-11-19.4. 

[stack@hci-director ~]$ rpm -qa | egrep tripleo | sort 
openstack-tripleo-0.0.8-0.2.4de13b3git.el7ost.noarch
openstack-tripleo-common-5.4.0-2.el7ost.noarch
openstack-tripleo-heat-templates-5.1.0-3.el7ost.noarch
openstack-tripleo-image-elements-5.1.0-1.el7ost.noarch
openstack-tripleo-puppet-elements-5.1.0-2.el7ost.noarch
openstack-tripleo-ui-1.0.5-1.el7ost.noarch
openstack-tripleo-validations-5.1.0-5.el7ost.noarch
puppet-tripleo-5.4.0-2.el7ost.noarch
python-tripleoclient-5.4.0-1.el7ost.noarch
[stack@hci-director ~]$ 


III. How reproducible:
Deterministic

IV. Steps to Reproduce:
1. Deploy an overcloud which has nodes from a custom role as described in our docs*
2. Try to delete one of the nodes from the custom role while keeping the rest of the overcloud running

* https://access.redhat.com/documentation/en/red-hat-openstack-platform/10-beta/single/advanced-overcloud-customization/#example_3_creating_a_new_role

V. Actual results:
The node from the custom role (OsdCompute in this case) is not deleted; an error is returned instead.

VI. Expected results:
The node from the custom role (OsdCompute in this case) is deleted, just as a node from a non-custom role, e.g. Compute, would be.


VII. Additional info:

All of my Heat templates can be seen at: 

 https://github.com/RHsyseng/hci/tree/master/custom-templates

Attempting the delete without the -r produces the following error:

[stack@hci-director ~]$ time openstack overcloud node delete --stack $stack_id --templates -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e ~/custom-templates/network.yaml -e ~/custom-templates/ceph.yaml -e ~/custom-templates/layout.yaml $nova_id 
deleting nodes [u'5fa641cf-b290-4a2a-b15e-494ab9d10d8a'] from stack 23e7c364-7303-4af6-b54d-cfbf1b737680
Started Mistral Workflow. Execution ID: 0bf0e91c-2f49-4f84-a46d-f251ee99e9fe
{u'execution': {u'id': u'0bf0e91c-2f49-4f84-a46d-f251ee99e9fe',
                u'input': {u'container': u'23e7c364-7303-4af6-b54d-cfbf1b737680',
                           u'nodes': [u'5fa641cf-b290-4a2a-b15e-494ab9d10d8a'],
                           u'queue_name': u'668f46fb-1b76-46c8-898d-b260cc2c0996',
                           u'timeout': 240},
                u'name': u'tripleo.scale.v1.delete_node',
                u'params': {},
                u'spec': {u'description': u'deletes given overcloud nodes and updates the stack',
                          u'input': [u'container',
                                     u'nodes',
                                     {u'timeout': 240},
                                     {u'queue_name': u'tripleo'}],
                          u'name': u'delete_node',
                          u'tasks': {u'delete_node': {u'action': u'tripleo.scale.delete_node nodes=<% $.nodes %> timeout=<% $.timeout %> container=<% $.container %>',
                                                      u'name': u'delete_node',
                                                      u'on-error': u'set_delete_node_failed',
                                                      u'on-success': u'send_message',
                                                      u'type': u'direct',
                                                      u'version': u'2.0'},
                                     u'send_message': {u'action': u'zaqar.queue_post',
                                                       u'input': {u'messages': {u'body': {u'payload': {u'execution': u'<% execution() %>',
                                                                                                       u'message': u"<% $.get('message', '') %>",
                                                                                                       u'status': u"<% $.get('status', 'SUCCESS') %>"},
                                                                                          u'type': u'tripleo.scale.v1.delete_node'}},
                                                                  u'queue_name': u'<% $.queue_name %>'},
                                                       u'name': u'send_message',
                                                       u'retry': u'count=5 delay=1',
                                                       u'type': u'direct',
                                                       u'version': u'2.0'},
                                     u'set_delete_node_failed': {u'name': u'set_delete_node_failed',
                                                                 u'on-success': u'send_message',
                                                                 u'publish': {u'message': u'<% task(delete_node).result %>',
                                                                              u'status': u'FAILED'},
                                                                 u'type': u'direct',
                                                                 u'version': u'2.0'}},
                          u'version': u'2.0'}},
 u'message': u"Failed to run action [action_ex_id=0d09a2ab-007d-4efa-893d-f7e26c2a8c91, action_cls='<class 'mistral.actions.action_factory.ScaleDownAction'>', attributes='{}', params='{u'nodes': [u'5fa641cf-b290-4a2a-b15e-494ab9d10d8a'], u'container': u'23e7c364-7303-4af6-b54d-cfbf1b737680', u'timeout': 240}']\n Environment not found [name=23e7c364-7303-4af6-b54d-cfbf1b737680]",
 u'status': u'FAILED'}

real	1m37.188s
user	0m0.515s
sys	0m0.074s
[stack@hci-director ~]$ 

No errors were reported by `openstack stack failures list overcloud`.

[stack@hci-director ~]$ openstack stack failures list overcloud
[stack@hci-director ~]$ openstack stack list
+--------------------------------------+------------+-----------------+----------------------+--------------+
| ID                                   | Stack Name | Stack Status    | Creation Time        | Updated Time |
+--------------------------------------+------------+-----------------+----------------------+--------------+
| 23e7c364-7303-4af6-b54d-cfbf1b737680 | overcloud  | CREATE_COMPLETE | 2016-11-24T03:24:56Z | None         |
+--------------------------------------+------------+-----------------+----------------------+--------------+
[stack@hci-director ~]$ openstack server list
+--------------------------------------+-------------------------+--------+-----------------------+----------------+
| ID                                   | Name                    | Status | Networks              | Image Name     |
+--------------------------------------+-------------------------+--------+-----------------------+----------------+
| fc8686c1-a675-4c89-a508-cc1b34d5d220 | overcloud-controller-2  | ACTIVE | ctlplane=192.168.1.37 | overcloud-full |
| 7c6ae5f3-7e18-4aa2-a1f8-53145647a3de | overcloud-osd-compute-2 | ACTIVE | ctlplane=192.168.1.30 | overcloud-full |
| 5fa641cf-b290-4a2a-b15e-494ab9d10d8a | overcloud-osd-compute-3 | ACTIVE | ctlplane=192.168.1.21 | overcloud-full |
| 851f76db-427c-42b3-8e0b-e8b4b19770f8 | overcloud-controller-0  | ACTIVE | ctlplane=192.168.1.33 | overcloud-full |
| e2906507-6a06-4c4d-bd15-9f7de455e91d | overcloud-controller-1  | ACTIVE | ctlplane=192.168.1.29 | overcloud-full |
| 0f93a712-b9eb-4f42-bc05-f2c8c2edfd81 | overcloud-osd-compute-0 | ACTIVE | ctlplane=192.168.1.32 | overcloud-full |
| 8f266c17-ff39-422e-a935-effb219c7782 | overcloud-osd-compute-1 | ACTIVE | ctlplane=192.168.1.24 | overcloud-full |
+--------------------------------------+-------------------------+--------+-----------------------+----------------+
[stack@hci-director ~]$ 

Our staged documentation: 

https://access.redhat.com/documentation/en/red-hat-openstack-platform/10-beta/single/director-installation-and-usage/#sect-Removing_Compute_Nodes

suggests running "openstack overcloud node delete --stack [STACK_UUID] --templates -e [ENVIRONMENT_FILE] [NODE1_UUID] [NODE2_UUID] [NODE3_UUID]" and states: "If you passed any extra environment files when you created the Overcloud, pass them here again using the -e or --environment-file option to avoid making undesired manual changes to the Overcloud." Does -r apply in the same way? If not, the docs will probably need an update too.

Comment 1 John Fulton 2016-11-29 05:19:38 UTC
I was able to delete my extra node by updating the Heat templates to shrink the node count and then re-running the deploy. Is my issue really just a docbug? 

Can someone confirm this is the recommended way to delete an overcloud node and then the BZ can be changed to a docbug? 

Details:

I had originally set my OsdComputeCount to 4 and had pre-assigned IPs for the 4th node (overcloud-osd-compute-3). I then decremented the count and commented out the extra IPs:

[stack@hci-director ~]$ egrep "\#|OsdComputeCount" ~/custom-templates/layout.yaml
  OsdComputeCount: 3 
      #- 192.168.2.206      
      #- 192.168.3.206      
      #- 172.16.1.206
      #- 172.16.2.206
[stack@hci-director ~]$ 

From there I just re-ran my deploy script [1] and after that I had the desired behavior. Because the node to be deleted was a Ceph OSD, I had followed our procedure to manually remove the OSDs from the Ceph cluster as per our doc [2]. The same doc will also need an update since it suggests using `openstack overcloud node delete`. Here [3] are details on my overcloud after running the deploy again with the node count changed. 

[1] https://github.com/RHsyseng/hci/blob/master/scripts/deploy.sh
[2] https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/single/red-hat-ceph-storage-for-the-overcloud#Replacing_Ceph_Storage_Nodes

[3] 

[stack@hci-director ~]$ openstack server list
+--------------------------------------+-------------------------+--------+-----------------------+----------------+
| ID                                   | Name                    | Status | Networks              | Image Name     |
+--------------------------------------+-------------------------+--------+-----------------------+----------------+
| fc8686c1-a675-4c89-a508-cc1b34d5d220 | overcloud-controller-2  | ACTIVE | ctlplane=192.168.1.37 | overcloud-full |
| 7c6ae5f3-7e18-4aa2-a1f8-53145647a3de | overcloud-osd-compute-2 | ACTIVE | ctlplane=192.168.1.30 | overcloud-full |
| 851f76db-427c-42b3-8e0b-e8b4b19770f8 | overcloud-controller-0  | ACTIVE | ctlplane=192.168.1.33 | overcloud-full |
| e2906507-6a06-4c4d-bd15-9f7de455e91d | overcloud-controller-1  | ACTIVE | ctlplane=192.168.1.29 | overcloud-full |
| 0f93a712-b9eb-4f42-bc05-f2c8c2edfd81 | overcloud-osd-compute-0 | ACTIVE | ctlplane=192.168.1.32 | overcloud-full |
| 8f266c17-ff39-422e-a935-effb219c7782 | overcloud-osd-compute-1 | ACTIVE | ctlplane=192.168.1.24 | overcloud-full |
+--------------------------------------+-------------------------+--------+-----------------------+----------------+
[stack@hci-director ~]$ 

[stack@hci-director ~]$ openstack stack list
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| ID                                   | Stack Name | Stack Status    | Creation Time        | Updated Time         |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| 23e7c364-7303-4af6-b54d-cfbf1b737680 | overcloud  | UPDATE_COMPLETE | 2016-11-24T03:24:56Z | 2016-11-29T04:47:03Z |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
[stack@hci-director ~]$
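
The count change described above can be scripted. The following is only a sketch: `set_role_count` is a hypothetical helper (not part of any TripleO tooling), and it assumes GNU sed and that the parameter sits on its own line in the environment file, as in the egrep output earlier in this comment. The pre-assigned IPs still have to be commented out by hand.

```shell
# Hypothetical helper: set "<Role>Count: N" in a Heat environment file.
# Assumes a line of the form "  OsdComputeCount: 4" and GNU sed (-i, -E).
set_role_count() {
    src_file="$1"
    src_param="$2"
    src_count="$3"
    # Keep the indentation and parameter name; replace only the number.
    sed -i -E "s/^([[:space:]]*${src_param}:[[:space:]]*)[0-9]+/\1${src_count}/" "$src_file"
}

# Example, using the file from this report:
#   set_role_count ~/custom-templates/layout.yaml OsdComputeCount 3
# followed by re-running the original deploy command.
```

Only the numeric value is rewritten, so the rest of the file is left as it was.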

Comment 2 John Fulton 2016-11-30 15:59:33 UTC
As per shardy the following is the correct way to delete a node, even if it's from a custom role: 

`openstack overcloud node delete --stack $ID $node_id`

The above should be run without any -r or -e options. I will test this next and update the bug.

Comment 6 John Fulton 2016-11-30 17:21:02 UTC
I made a mistake in my testing with the IDs and comments 3,4,5 should be ignored.

Comment 7 John Fulton 2016-11-30 17:36:11 UTC
I was able to delete a node but I had to provide the stack name "overcloud" and not the UUID. 

Here's an example of it working: 

1. Identify the node ID

[stack@hci-director ~]$ openstack server list | grep osd-compute-3
| 6b2a2e71-f9c8-4d5b-aaf8-dada97c90821 | overcloud-osd-compute-3 | ACTIVE | ctlplane=192.168.1.27 | overcloud-full |
[stack@hci-director ~]$

2. Start a Mistral workflow to delete the node by ID from the stack by name: 

[stack@hci-director ~]$ time openstack overcloud node delete --stack overcloud 6b2a2e71-f9c8-4d5b-aaf8-dada97c90821
deleting nodes [u'6b2a2e71-f9c8-4d5b-aaf8-dada97c90821'] from stack overcloud
Started Mistral Workflow. Execution ID: 396f123d-df5b-4f37-b137-83d33969b52b

real    1m50.662s
user    0m0.563s
sys     0m0.099s
[stack@hci-director ~]$ 

3. Observe that the stack is being updated: 

[stack@hci-director ~]$ heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+--------------------------------------+------------+--------------------+----------------------+----------------------+
| id                                   | stack_name | stack_status       | creation_time        | updated_time         |
+--------------------------------------+------------+--------------------+----------------------+----------------------+
| 23e7c364-7303-4af6-b54d-cfbf1b737680 | overcloud  | UPDATE_IN_PROGRESS | 2016-11-24T03:24:56Z | 2016-11-30T17:16:48Z |
+--------------------------------------+------------+--------------------+----------------------+----------------------+
[stack@hci-director ~]$

4. Observe that the update is complete: 

[stack@hci-director ~]$ heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| id                                   | stack_name | stack_status    | creation_time        | updated_time         |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| 23e7c364-7303-4af6-b54d-cfbf1b737680 | overcloud  | UPDATE_COMPLETE | 2016-11-24T03:24:56Z | 2016-11-30T17:16:48Z |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
[stack@hci-director ~]$ 

5. Observe that the node was deleted as desired. 

[stack@hci-director ~]$ nova list
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks              |
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
| 851f76db-427c-42b3-8e0b-e8b4b19770f8 | overcloud-controller-0  | ACTIVE | -          | Running     | ctlplane=192.168.1.33 |
| e2906507-6a06-4c4d-bd15-9f7de455e91d | overcloud-controller-1  | ACTIVE | -          | Running     | ctlplane=192.168.1.29 |
| fc8686c1-a675-4c89-a508-cc1b34d5d220 | overcloud-controller-2  | ACTIVE | -          | Running     | ctlplane=192.168.1.37 |
| 0f93a712-b9eb-4f42-bc05-f2c8c2edfd81 | overcloud-osd-compute-0 | ACTIVE | -          | Running     | ctlplane=192.168.1.32 |
| 8f266c17-ff39-422e-a935-effb219c7782 | overcloud-osd-compute-1 | ACTIVE | -          | Running     | ctlplane=192.168.1.24 |
| 7c6ae5f3-7e18-4aa2-a1f8-53145647a3de | overcloud-osd-compute-2 | ACTIVE | -          | Running     | ctlplane=192.168.1.30 |
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
[stack@hci-director ~]$ 
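
Steps 3 and 4 above can be automated by polling the stack status instead of re-running the list command by hand. This is a minimal sketch: `wait_for_stack` is a hypothetical helper, and it assumes the `openstack` client is on PATH with the undercloud credentials already sourced.

```shell
# Hypothetical helper: poll a Heat stack until it leaves *_IN_PROGRESS.
# Prints the final status; returns 0 on *_COMPLETE, 1 on any other final
# state (e.g. UPDATE_FAILED), 2 if the status query itself fails.
wait_for_stack() {
    wfs_stack="$1"
    wfs_interval="${2:-30}"   # seconds between polls
    while :; do
        wfs_status=$(openstack stack show "$wfs_stack" -f value -c stack_status) || return 2
        case "$wfs_status" in
            *_IN_PROGRESS) sleep "$wfs_interval" ;;
            *_COMPLETE)    echo "$wfs_status"; return 0 ;;
            *)             echo "$wfs_status" >&2; return 1 ;;
        esac
    done
}

# Example:
#   wait_for_stack overcloud && openstack server list
```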

Warning: if you identify the stack by its UUID, as I did originally, you may run into the issue below. The first line of output shows that the command correctly identifies the node ID and the stack ID, but the workflow then fails to find an environment with that name: "Environment not found [name=23e7c364-7303-4af6-b54d-cfbf1b737680]". I think this is a minor bug, and since the workaround is simple I'll update the title accordingly.

[stack@hci-director ~]$ nova_id=$(openstack server list | grep compute-3 | awk {'print $2'} | egrep -vi 'id|^$')
[stack@hci-director ~]$ stack_id=$(openstack stack list | awk {'print $2'} | egrep -vi 'id|^$')
[stack@hci-director ~]$ time openstack overcloud node delete --stack $stack_id $nova_id
deleting nodes [u'6b2a2e71-f9c8-4d5b-aaf8-dada97c90821'] from stack 23e7c364-7303-4af6-b54d-cfbf1b737680
Started Mistral Workflow. Execution ID: 4864b1df-a170-4d51-b411-79f839d11ecd
{u'execution': {u'id': u'4864b1df-a170-4d51-b411-79f839d11ecd',
                u'input': {u'container': u'23e7c364-7303-4af6-b54d-cfbf1b737680',
                           u'nodes': [u'6b2a2e71-f9c8-4d5b-aaf8-dada97c90821'],
                           u'queue_name': u'b0c40c06-be37-402d-9636-6071ba3e28b2',
                           u'timeout': 240},
                u'name': u'tripleo.scale.v1.delete_node',
                u'params': {},
                u'spec': {u'description': u'deletes given overcloud nodes and updates the stack',
                          u'input': [u'container',
                                     u'nodes',
                                     {u'timeout': 240},
                                     {u'queue_name': u'tripleo'}],
                          u'name': u'delete_node',
                          u'tasks': {u'delete_node': {u'action': u'tripleo.scale.delete_node nodes=<% $.nodes %> timeout=<% $.timeout %> container=<% $.container %>',
                                                      u'name': u'delete_node',
                                                      u'on-error': u'set_delete_node_failed',
                                                      u'on-success': u'send_message',
                                                      u'type': u'direct',
                                                      u'version': u'2.0'},
                                     u'send_message': {u'action': u'zaqar.queue_post',
                                                       u'input': {u'messages': {u'body': {u'payload': {u'execution': u'<% execution() %>',
                                                                                                       u'message': u"<% $.get('message', '') %>",
                                                                                                       u'status': u"<% $.get('status', 'SUCCESS') %>"},
                                                                                          u'type': u'tripleo.scale.v1.delete_node'}},
                                                                  u'queue_name': u'<% $.queue_name %>'},
                                                       u'name': u'send_message',
                                                       u'retry': u'count=5 delay=1',
                                                       u'type': u'direct',
                                                       u'version': u'2.0'},
                                     u'set_delete_node_failed': {u'name': u'set_delete_node_failed',
                                                                 u'on-success': u'send_message',
                                                                 u'publish': {u'message': u'<% task(delete_node).result %>',
                                                                              u'status': u'FAILED'},
                                                                 u'type': u'direct',
                                                                 u'version': u'2.0'}},
                          u'version': u'2.0'}},
 u'message': u"Failed to run action [action_ex_id=c2e44ffe-00fc-4131-b29c-981e33f50ea1, action_cls='<class 'mistral.actions.action_factory.ScaleDownAction'>', attributes='{}', params='{u'nodes': [u'6b2a2e71-f9c8-4d5b-aaf8-dada97c90821'], u'container': u'23e7c364-7303-4af6-b54d-cfbf1b737680', u'timeout': 240}']\n Environment not found [name=23e7c364-7303-4af6-b54d-cfbf1b737680]",
 u'status': u'FAILED'}

real    1m39.169s
user    0m0.530s
sys     0m0.104s
[stack@hci-director ~]$
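
The failure message above suggests that the workflow looks up an environment named after the literal `--stack` value. Until a fixed client is installed, one possible workaround is to resolve the UUID to the stack name first. The sketch below assumes that; `delete_node_by_stack_id` is a hypothetical wrapper, not an existing command.

```shell
# Hypothetical wrapper: accept a stack name or UUID, resolve it to the
# stack name, and pass that name to `openstack overcloud node delete`.
delete_node_by_stack_id() {
    dn_stack="$1"
    shift
    # `stack show` accepts either a name or an ID and reports stack_name.
    dn_name=$(openstack stack show "$dn_stack" -f value -c stack_name) || return 1
    [ -n "$dn_name" ] || return 1
    openstack overcloud node delete --stack "$dn_name" "$@"
}

# Example, with the variables set in the transcript above:
#   delete_node_by_stack_id "$stack_id" "$nova_id"
```

The remaining arguments are forwarded unchanged, so several node IDs can be passed at once.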

Comment 8 John Fulton 2016-12-01 16:20:12 UTC
Because this is a duplicate of the following upstream bug, which already has a fix released in Ocata, I am marking this BZ as MODIFIED. 

 https://bugs.launchpad.net/tripleo/+bug/1640933

Here is the fix from Ocata: 

 https://review.openstack.org/#/c/398289/

If this is backported to Newton, then we could identify the fixed-in and set it to POST as a next step.

Comment 9 Jon Schlueter 2017-01-12 21:48:20 UTC
patch landed in stable/newton

Comment 10 Jon Schlueter 2017-01-12 21:54:37 UTC
Targeting bz to OSP 10 since it's in recent build

Comment 13 John Fulton 2017-01-17 13:15:49 UTC
No release notes required for this bug fix. Flags set.

Comment 16 errata-xmlrpc 2017-02-01 14:46:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0234.html
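
To check whether an undercloud already has the fix, the installed python-tripleoclient version can be compared against the Fixed In Version from this report (5.4.1-1.el7ost). This is a rough sketch: `version_at_least` is a hypothetical helper, and GNU `sort -V` only approximates RPM's own version comparison rules.

```shell
# Hypothetical helper: succeed if version-release $1 is at least $2.
# Relies on GNU `sort -V`, an approximation of RPM version ordering.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example:
#   installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}' python-tripleoclient)
#   version_at_least "$installed" 5.4.1-1.el7ost && echo "fix present"
```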