Bug 1399429
Summary: | Unable to delete overcloud node when identifying `--stack` by its UUID; identifying it by name works | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | John Fulton <johfulto> |
Component: | python-tripleoclient | Assignee: | RHOS Maint <rhos-maint> |
Status: | CLOSED ERRATA | QA Contact: | Gurenko Alex <agurenko> |
Severity: | low | Docs Contact: | |
Priority: | low | ||
Version: | 10.0 (Newton) | CC: | agurenko, ddomingo, hbrock, johfulto, jschluet, jslagle, mburns, mcornea, rhel-osp-director-maint |
Target Milestone: | z1 | Keywords: | Triaged, ZStream |
Target Release: | 10.0 (Newton) | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | python-tripleoclient-5.4.1-1.el7ost | Doc Type: | No Doc Update |
Doc Text: | None |
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2017-02-01 14:46:17 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description (John Fulton, 2016-11-29 04:33:58 UTC)
I was able to delete my extra node by updating the Heat templates to shrink the node count and then re-running the deploy. Is my issue really just a doc bug? Can someone confirm this is the recommended way to delete an overcloud node? If so, this BZ can be changed to a doc bug.

Details: I had originally set my OsdComputeCount to 4 and had pre-assigned IPs for the fourth node (overcloud-osd-compute-3). I then decremented the count and commented out the extra IPs:

```
[stack@hci-director ~]$ egrep "\#|OsdComputeCount" ~/custom-templates/layout.yaml
OsdComputeCount: 3
#- 192.168.2.206
#- 192.168.3.206
#- 172.16.1.206
#- 172.16.2.206
[stack@hci-director ~]$
```

From there I re-ran my deploy script [1], and after that I had the desired behavior. Because the node to be deleted was a Ceph OSD, I had followed our procedure to manually remove the OSDs from the Ceph cluster as per our doc [2]. The same doc will also need an update, since it suggests using `openstack overcloud node delete`. Here [3] are details on my overcloud after running the deploy again with the node count changed.
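The template edit described above can be sketched as a minimal shell sequence. This is a hedged illustration only: a temp file stands in for `~/custom-templates/layout.yaml`, so it can run outside the undercloud; the parameter name `OsdComputeCount` is taken from this report.

```shell
# Sketch of the scale-down template edit; a temp file stands in for the
# real ~/custom-templates/layout.yaml so this runs anywhere.
layout=$(mktemp)
printf 'OsdComputeCount: 4\n' > "$layout"

# Decrement the role count. Re-running the same `openstack overcloud deploy`
# command afterwards makes director remove the extra node on stack update.
sed -i 's/^OsdComputeCount: .*/OsdComputeCount: 3/' "$layout"
grep OsdComputeCount "$layout"
```

After the edit, the original deploy script is simply re-run unchanged; director reconciles the stack against the new count.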
[1] https://github.com/RHsyseng/hci/blob/master/scripts/deploy.sh
[2] https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/single/red-hat-ceph-storage-for-the-overcloud#Replacing_Ceph_Storage_Nodes
[3]
```
[stack@hci-director ~]$ openstack server list
+--------------------------------------+-------------------------+--------+-----------------------+----------------+
| ID                                   | Name                    | Status | Networks              | Image Name     |
+--------------------------------------+-------------------------+--------+-----------------------+----------------+
| fc8686c1-a675-4c89-a508-cc1b34d5d220 | overcloud-controller-2  | ACTIVE | ctlplane=192.168.1.37 | overcloud-full |
| 7c6ae5f3-7e18-4aa2-a1f8-53145647a3de | overcloud-osd-compute-2 | ACTIVE | ctlplane=192.168.1.30 | overcloud-full |
| 851f76db-427c-42b3-8e0b-e8b4b19770f8 | overcloud-controller-0  | ACTIVE | ctlplane=192.168.1.33 | overcloud-full |
| e2906507-6a06-4c4d-bd15-9f7de455e91d | overcloud-controller-1  | ACTIVE | ctlplane=192.168.1.29 | overcloud-full |
| 0f93a712-b9eb-4f42-bc05-f2c8c2edfd81 | overcloud-osd-compute-0 | ACTIVE | ctlplane=192.168.1.32 | overcloud-full |
| 8f266c17-ff39-422e-a935-effb219c7782 | overcloud-osd-compute-1 | ACTIVE | ctlplane=192.168.1.24 | overcloud-full |
+--------------------------------------+-------------------------+--------+-----------------------+----------------+
[stack@hci-director ~]$ openstack stack list
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| ID                                   | Stack Name | Stack Status    | Creation Time        | Updated Time         |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| 23e7c364-7303-4af6-b54d-cfbf1b737680 | overcloud  | UPDATE_COMPLETE | 2016-11-24T03:24:56Z | 2016-11-29T04:47:03Z |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
[stack@hci-director ~]$
```

As per shardy, the following is the correct way to delete a node, even if it is from a custom role:

`openstack overcloud node delete --stack $ID $node_id`

The above should be run without any -r or -e options. I will test this next and update the bug.

I was able to delete a node, but I had to provide the stack name "overcloud" rather than its UUID. Here's an example of it working:

1. Identify the node ID:

```
[stack@hci-director ~]$ openstack server list | grep osd-compute-3
| 6b2a2e71-f9c8-4d5b-aaf8-dada97c90821 | overcloud-osd-compute-3 | ACTIVE | ctlplane=192.168.1.27 | overcloud-full |
[stack@hci-director ~]$
```

2. Start a Mistral workflow to delete the node by ID from the stack by name:

```
[stack@hci-director ~]$ time openstack overcloud node delete --stack overcloud 6b2a2e71-f9c8-4d5b-aaf8-dada97c90821
deleting nodes [u'6b2a2e71-f9c8-4d5b-aaf8-dada97c90821'] from stack overcloud
Started Mistral Workflow. Execution ID: 396f123d-df5b-4f37-b137-83d33969b52b

real    1m50.662s
user    0m0.563s
sys     0m0.099s
[stack@hci-director ~]$
```

3. Observe that the stack is being updated:

```
[stack@hci-director ~]$ heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+--------------------------------------+------------+--------------------+----------------------+----------------------+
| id                                   | stack_name | stack_status       | creation_time        | updated_time         |
+--------------------------------------+------------+--------------------+----------------------+----------------------+
| 23e7c364-7303-4af6-b54d-cfbf1b737680 | overcloud  | UPDATE_IN_PROGRESS | 2016-11-24T03:24:56Z | 2016-11-30T17:16:48Z |
+--------------------------------------+------------+--------------------+----------------------+----------------------+
[stack@hci-director ~]$
```

4.
Observe that the update is complete:

```
[stack@hci-director ~]$ heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| id                                   | stack_name | stack_status    | creation_time        | updated_time         |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| 23e7c364-7303-4af6-b54d-cfbf1b737680 | overcloud  | UPDATE_COMPLETE | 2016-11-24T03:24:56Z | 2016-11-30T17:16:48Z |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
[stack@hci-director ~]$
```

5. Observe that the node was deleted as desired:

```
[stack@hci-director ~]$ nova list
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks              |
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
| 851f76db-427c-42b3-8e0b-e8b4b19770f8 | overcloud-controller-0  | ACTIVE | -          | Running     | ctlplane=192.168.1.33 |
| e2906507-6a06-4c4d-bd15-9f7de455e91d | overcloud-controller-1  | ACTIVE | -          | Running     | ctlplane=192.168.1.29 |
| fc8686c1-a675-4c89-a508-cc1b34d5d220 | overcloud-controller-2  | ACTIVE | -          | Running     | ctlplane=192.168.1.37 |
| 0f93a712-b9eb-4f42-bc05-f2c8c2edfd81 | overcloud-osd-compute-0 | ACTIVE | -          | Running     | ctlplane=192.168.1.32 |
| 8f266c17-ff39-422e-a935-effb219c7782 | overcloud-osd-compute-1 | ACTIVE | -          | Running     | ctlplane=192.168.1.24 |
| 7c6ae5f3-7e18-4aa2-a1f8-53145647a3de | overcloud-osd-compute-2 | ACTIVE | -          | Running     | ctlplane=192.168.1.30 |
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
[stack@hci-director ~]$
```

Warning: if you identify the stack by its UUID, as I did originally, you may run into the issue below. Note in the output of the command below that it correctly identifies the node ID and the stack ID, but it is unable to find the environment by name: "Environment not found [name=23e7c364-7303-4af6-b54d-cfbf1b737680]". So I think this is a minor bug, and I'll update the title, as the workaround is simple.

```
[stack@hci-director ~]$ nova_id=$(openstack server list | grep compute-3 | awk {'print $2'} | egrep -vi 'id|^$')
[stack@hci-director ~]$ stack_id=$(openstack stack list | awk {'print $2'} | egrep -vi 'id|^$')
[stack@hci-director ~]$ time openstack overcloud node delete --stack $stack_id $nova_id
deleting nodes [u'6b2a2e71-f9c8-4d5b-aaf8-dada97c90821'] from stack 23e7c364-7303-4af6-b54d-cfbf1b737680
Started Mistral Workflow. Execution ID: 4864b1df-a170-4d51-b411-79f839d11ecd
{u'execution': {u'id': u'4864b1df-a170-4d51-b411-79f839d11ecd',
 u'input': {u'container': u'23e7c364-7303-4af6-b54d-cfbf1b737680',
            u'nodes': [u'6b2a2e71-f9c8-4d5b-aaf8-dada97c90821'],
            u'queue_name': u'b0c40c06-be37-402d-9636-6071ba3e28b2',
            u'timeout': 240},
 u'name': u'tripleo.scale.v1.delete_node',
 u'params': {},
 u'spec': {u'description': u'deletes given overcloud nodes and updates the stack',
           u'input': [u'container', u'nodes', {u'timeout': 240}, {u'queue_name': u'tripleo'}],
           u'name': u'delete_node',
           u'tasks': {u'delete_node': {u'action': u'tripleo.scale.delete_node nodes=<% $.nodes %> timeout=<% $.timeout %> container=<% $.container %>',
                                       u'name': u'delete_node',
                                       u'on-error': u'set_delete_node_failed',
                                       u'on-success': u'send_message',
                                       u'type': u'direct',
                                       u'version': u'2.0'},
                      u'send_message': {u'action': u'zaqar.queue_post',
                                        u'input': {u'messages': {u'body': {u'payload': {u'execution': u'<% execution() %>',
                                                                                        u'message': u"<% $.get('message', '') %>",
                                                                                        u'status': u"<% $.get('status', 'SUCCESS') %>"},
                                                                           u'type': u'tripleo.scale.v1.delete_node'}},
                                                   u'queue_name': u'<% $.queue_name %>'},
                                        u'name': u'send_message',
                                        u'retry': u'count=5 delay=1',
                                        u'type': u'direct',
                                        u'version': u'2.0'},
                      u'set_delete_node_failed': {u'name': u'set_delete_node_failed',
                                                  u'on-success': u'send_message',
                                                  u'publish': {u'message': u'<% task(delete_node).result %>',
                                                               u'status': u'FAILED'},
                                                  u'type': u'direct',
                                                  u'version': u'2.0'}},
           u'version': u'2.0'}},
 u'message': u"Failed to run action [action_ex_id=c2e44ffe-00fc-4131-b29c-981e33f50ea1, action_cls='<class 'mistral.actions.action_factory.ScaleDownAction'>', attributes='{}', params='{u'nodes': [u'6b2a2e71-f9c8-4d5b-aaf8-dada97c90821'], u'container': u'23e7c364-7303-4af6-b54d-cfbf1b737680', u'timeout': 240}']\n Environment not found [name=23e7c364-7303-4af6-b54d-cfbf1b737680]",
 u'status': u'FAILED'}

real    1m39.169s
user    0m0.530s
sys     0m0.104s
[stack@hci-director ~]$
```

Because this is a duplicate of the following upstream bug, which already has a fix released in Ocata, I am marking this BZ as MODIFIED: https://bugs.launchpad.net/tripleo/+bug/1640933

Here is the fix from Ocata: https://review.openstack.org/#/c/398289/ If this is backported to Newton, then we could identify the fixed-in version and set the BZ to POST as a next step.

The patch landed in stable/newton.

Targeting the bz to OSP 10 since the fix is in a recent build.

No release notes are required for this bug fix. Flags set.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2017-0234.html
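Until the fix is available, the workaround is simply to pass the stack name instead of its UUID. A minimal pre-flight check for scripts can be sketched as below; this helper is hypothetical (not part of tripleoclient) and only illustrates catching the failure mode before the Mistral workflow is started.

```shell
# Hypothetical guard (not part of tripleoclient): affected Newton builds
# resolve the --stack argument only as a name, so warn when the caller
# passes the stack UUID instead.
is_uuid() {
  printf '%s' "$1" | grep -Eiq '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

stack_arg=23e7c364-7303-4af6-b54d-cfbf1b737680
if is_uuid "$stack_arg"; then
  echo "warning: --stack $stack_arg looks like a UUID; use the stack name (e.g. overcloud) on unfixed builds"
fi
```

With the guard in place, `openstack overcloud node delete --stack overcloud <node-id>` succeeds as shown in the working example above.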