Bug 1312989
| Summary: | Failed to delete overcloud stack: stuck on RHELUnregistrationDeployment resource | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Guillaume Chenuet <gchenuet> |
| Component: | rhosp-director | Assignee: | Angus Thomas <athomas> |
| Status: | CLOSED DUPLICATE | QA Contact: | Arik Chernetsky <achernet> |
| Severity: | medium | Priority: | medium |
| Version: | 7.0 (Kilo) | Target Release: | 10.0 (Newton) |
| Hardware: | Unspecified | OS: | Unspecified |
| Doc Type: | Bug Fix | Type: | Bug |
| Last Closed: | 2016-10-14 15:23:20 UTC | | |
| CC: | athomas, dbecker, djuran, ipilcher, jslagle, mburns, morazi, mschuppe, rhel-osp-director-maint, sbaker, shardy, srevivo | | |
Description
Guillaume Chenuet
2016-02-29 17:15:47 UTC
Heat doesn't unregister the server; a shell script triggered by Heat unregisters the server. We'd need a lot more information about why it is not doing so to diagnose this, starting with the output of the script (which is available from "heat deployment-show"). A good bet is a network connectivity problem, though.

This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.

Hi, one scenario in which I have reproduced this is when a node fails for some reason (e.g. a defective disk) and you try to remove the failed node:

```
[stack@undercloud ~]$ nova list
+--------------------------------------+----------------+--------+------------+-------------+-----------------------+
| ID                                   | Name           | Status | Task State | Power State | Networks              |
+--------------------------------------+----------------+--------+------------+-------------+-----------------------+
| b4c62f25-7f12-4e1b-9fcd-3046631c295c | BALOSCOMPINT00 | ACTIVE | -          | Running     | ctlplane=10.254.17.37 |
| 7f7296c5-faeb-42d6-82c7-c3eb144de8e8 | BALOSCOMPINT01 | ACTIVE | -          | Running     | ctlplane=10.254.17.38 |
| abf95736-75b5-4d48-a02b-f4f641eab188 | BALOSCTL00     | ACTIVE | -          | Running     | ctlplane=10.254.17.39 |
+--------------------------------------+----------------+--------+------------+-------------+-----------------------+

[stack@undercloud ~]$ ironic node-list
+--------------------------------------+-----------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name            | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+-----------------+--------------------------------------+-------------+--------------------+-------------+
| 955a408e-8cf1-41a5-a777-d630b6cbe533 | overcloud-node3 | 7f7296c5-faeb-42d6-82c7-c3eb144de8e8 | power on    | active             | False       |
| 8aca0579-add8-40a7-947c-5e1b8171e465 | overcloud-node4 | None                                 | power off   | available          | False       |
| a34cbd6d-6102-40ae-b20a-a6f886a53cc1 | overcloud-node5 | None                                 | power off   | available          | False       |
| 8a604223-779a-496f-b374-3b7fe7ffe000 | overcloud-node2 | b4c62f25-7f12-4e1b-9fcd-3046631c295c | power on    | active             | False       |
| b5a50354-ac17-47dd-a8b7-aa8528d2070e | overcloud-node1 | abf95736-75b5-4d48-a02b-f4f641eab188 | power on    | active             | False       |
+--------------------------------------+-----------------+--------------------------------------+-------------+--------------------+-------------+
```

Let's use BALOSCOMPINT01 (ironic node overcloud-node3) and fail its disk:

```
# virsh destroy overcloud-node3
Domain overcloud-node3 destroyed
```

Modified the HD source file:

```
<source file='/var/lib/libvirt/images/overcloud-node3.qcow2'/>
<source file='/var/lib/libvirt/images/overcloud-node3-disable.qcow2'/>
```

The power state changed to "power off" in ironic:

```
[stack@undercloud ~]$ ironic node-list
+--------------------------------------+-----------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name            | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+-----------------+--------------------------------------+-------------+--------------------+-------------+
| 955a408e-8cf1-41a5-a777-d630b6cbe533 | overcloud-node3 | 7f7296c5-faeb-42d6-82c7-c3eb144de8e8 | power off   | active             | False       |
| 8aca0579-add8-40a7-947c-5e1b8171e465 | overcloud-node4 | None                                 | power off   | available          | False       |
| a34cbd6d-6102-40ae-b20a-a6f886a53cc1 | overcloud-node5 | None                                 | power off   | available          | False       |
| 8a604223-779a-496f-b374-3b7fe7ffe000 | overcloud-node2 | b4c62f25-7f12-4e1b-9fcd-3046631c295c | power on    | active             | False       |
| b5a50354-ac17-47dd-a8b7-aa8528d2070e | overcloud-node1 | abf95736-75b5-4d48-a02b-f4f641eab188 | power on    | active             | False       |
+--------------------------------------+-----------------+--------------------------------------+-------------+--------------------+-------------+
```

Then I tried to delete the failed node:

```
$ cat delete-node.sh
cd /home/stack
openstack overcloud node delete --stack overcloud --templates \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/templates/network-environment.yaml \
  -e /home/stack/templates/rhel-registration/environment-rhel-registration.yaml \
  -e /home/stack/templates/rhel-registration/rhel-registration-resource-registry.yaml \
  7f7296c5-faeb-42d6-82c7-c3eb144de8e8

[stack@undercloud ~]$ sh ./delete-node.sh
deleting nodes ['7f7296c5-faeb-42d6-82c7-c3eb144de8e8'] from stack overcloud
2016-08-25 05:54:45.616 11994 DEBUG heat.engine.scheduler [-] Task DependencyTaskGroup((destroy_resource) {SoftwareConfig "RHELRegistration" [895cf747-4185-4712-a2fe-6f434e2a67c1] Stack "overcloud-Compute-sogzfuruohoh-1-hlpidtyp3gs3-NodeExtraConfig-4p4qqcxsjrzc" [7022d299-7d4e-40ba-b142-535a2ac40740]: {}, SoftwareDeployment "RHELRegistrationDeployment" [eed44451-e252-4dc7-a391-57fb1d11b54a] Stack "overcloud-Compute-sogzfuruohoh-1-hlpidtyp3gs3-NodeExtraConfig-4p4qqcxsjrzc" [7022d299-7d4e-40ba-b142-535a2ac40740]: {SoftwareConfig "RHELRegistration" [895cf747-4185-4712-a2fe-6f434e2a67c1] Stack "overcloud-Compute-sogzfuruohoh-1-hlpidtyp3gs3-NodeExtraConfig-4p4qqcxsjrzc" [7022d299-7d4e-40ba-b142-535a2ac40740]}, SoftwareDeployment "RHELUnregistrationDeployment" Stack "overcloud-Compute-sogzfuruohoh-1-hlpidtyp3gs3-NodeExtraConfig-4p4qqcxsjrzc" [7022d299-7d4e-40ba-b142-535a2ac40740]: {SoftwareConfig "RHELUnregistration" [a829d9d5-d2ed-411a-8e32-a064dca67ade] Stack "overcloud-Compute-sogzfuruohoh-1-hlpidtyp3gs3-NodeExtraConfig-4p4qqcxsjrzc" [7022d299-7d4e-40ba-b142-535a2ac40740]}, SoftwareConfig "RHELUnregistration" [a829d9d5-d2ed-411a-8e32-a064dca67ade] Stack "overcloud-Compute-sogzfuruohoh-1-hlpidtyp3gs3-NodeExtraConfig-4p4qqcxsjrzc" [7022d299-7d4e-40ba-b142-535a2ac40740]: {}}) running step /usr/lib/python2.7/site-packages/heat/engine/scheduler.py:214
```

After the timeout the update fails:

```
[stack@undercloud rhel-registration]$ heat stack-list
+--------------------------------------+------------+---------------+---------------------+---------------------+
| id                                   | stack_name | stack_status  | creation_time       | updated_time        |
+--------------------------------------+------------+---------------+---------------------+---------------------+
| 1a53f935-d362-4a10-9230-e70b27a96fde | overcloud  | UPDATE_FAILED | 2016-08-25T08:08:42 | 2016-08-25T09:27:16 |
+--------------------------------------+------------+---------------+---------------------+---------------------+
```

We see that the unregistration is still in progress:

```
[stack@undercloud rhel-registration]$ heat resource-list -n 5 overcloud | grep -v COMPLE
| resource_name | physical_resource_id | resource_type | resource_status | updated_time | stack_name |
| 1 | f1023c5b-6b32-445c-98bd-8d61027eb43e | OS::TripleO::Compute | DELETE_FAILED | 2016-08-25T08:09:10 | overcloud-Compute-sogzfuruohoh |
| NodeExtraConfig | 7022d299-7d4e-40ba-b142-535a2ac40740 | OS::TripleO::NodeExtraConfig | DELETE_IN_PROGRESS | 2016-08-25T08:09:11 | overcloud-Compute-sogzfuruohoh-1-hlpidtyp3gs3 |
| RHELUnregistrationDeployment | 47b05a76-c70b-4fdc-b17f-1dbcb3bd3372 | OS::Heat::SoftwareDeployment | DELETE_IN_PROGRESS | 2016-08-25T08:18:15 | overcloud-Compute-sogzfuruohoh-1-hlpidtyp3gs3-NodeExtraConfig-4p4qqcxsjrzc |
| Compute | 1eb1bdad-5044-41bf-b52a-ef7ad34b4eb7 | OS::Heat::ResourceGroup | UPDATE_FAILED | 2016-08-25T09:27:55 | overcloud |
```

I restarted heat-engine to force the nested actions still IN_PROGRESS to fail, too:

```
[stack@undercloud rhel-registration]$ heat resource-list -n 5 overcloud | grep -v COMPLE
| resource_name | physical_resource_id | resource_type | resource_status | updated_time | stack_name |
| 1 | f1023c5b-6b32-445c-98bd-8d61027eb43e | OS::TripleO::Compute | DELETE_FAILED | 2016-08-25T08:09:10 | overcloud-Compute-sogzfuruohoh |
| NodeExtraConfig | 7022d299-7d4e-40ba-b142-535a2ac40740 | OS::TripleO::NodeExtraConfig | DELETE_FAILED | 2016-08-25T08:09:11 | overcloud-Compute-sogzfuruohoh-1-hlpidtyp3gs3 |
| RHELUnregistrationDeployment | 47b05a76-c70b-4fdc-b17f-1dbcb3bd3372 | OS::Heat::SoftwareDeployment | DELETE_FAILED | 2016-08-25T08:18:15 | overcloud-Compute-sogzfuruohoh-1-hlpidtyp3gs3-NodeExtraConfig-4p4qqcxsjrzc |
| Compute | 1eb1bdad-5044-41bf-b52a-ef7ad34b4eb7 | OS::Heat::ResourceGroup | UPDATE_FAILED | 2016-08-25T09:27:55 | overcloud |
```

I then tried to remove the instance from nova and the node from ironic, and after that delete the node from the heat stack. This procedure has worked before, e.g. when the system's IPMI interface is also unreachable due to a defect:

```
[stack@undercloud ~]$ nova delete 7f7296c5-faeb-42d6-82c7-c3eb144de8e8
Request to delete server 7f7296c5-faeb-42d6-82c7-c3eb144de8e8 has been accepted.

[stack@undercloud ~]$ ironic node-delete 955a408e-8cf1-41a5-a777-d630b6cbe533
Deleted node 955a408e-8cf1-41a5-a777-d630b6cbe533

[stack@undercloud ~]$ nova list
+--------------------------------------+----------------+--------+------------+-------------+-----------------------+
| ID                                   | Name           | Status | Task State | Power State | Networks              |
+--------------------------------------+----------------+--------+------------+-------------+-----------------------+
| b4c62f25-7f12-4e1b-9fcd-3046631c295c | BALOSCOMPINT00 | ACTIVE | -          | Running     | ctlplane=10.254.17.37 |
| abf95736-75b5-4d48-a02b-f4f641eab188 | BALOSCTL00     | ACTIVE | -          | Running     | ctlplane=10.254.17.39 |
+--------------------------------------+----------------+--------+------------+-------------+-----------------------+

[stack@undercloud ~]$ ironic node-list
+--------------------------------------+-----------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name            | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+-----------------+--------------------------------------+-------------+--------------------+-------------+
| 8aca0579-add8-40a7-947c-5e1b8171e465 | overcloud-node4 | None                                 | power off   | available          | False       |
| a34cbd6d-6102-40ae-b20a-a6f886a53cc1 | overcloud-node5 | None                                 | power off   | available          | False       |
| 8a604223-779a-496f-b374-3b7fe7ffe000 | overcloud-node2 | b4c62f25-7f12-4e1b-9fcd-3046631c295c | power on    | active             | False       |
| b5a50354-ac17-47dd-a8b7-aa8528d2070e | overcloud-node1 | abf95736-75b5-4d48-a02b-f4f641eab188 | power on    | active             | False       |
+--------------------------------------+-----------------+--------------------------------------+-------------+--------------------+-------------+

[stack@undercloud ~]$ sh ./delete-node.sh
deleting nodes ['7f7296c5-faeb-42d6-82c7-c3eb144de8e8'] from stack overcloud
Two objects are equal when all of the attributes are equal, if you want to identify whether two objects are same one with same id, please use is_same_obj() function.
```
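The manual fallback above (drop the nova instance, then the ironic node, then re-run the stack delete) can be wrapped in a small helper. This is only a sketch: `cleanup_failed_node` is a hypothetical name, and the commands assume an undercloud shell with `stackrc` sourced; the nova/ironic calls themselves are exactly the ones used in this report.

```shell
# Hedged sketch of the manual cleanup attempted above.
# "cleanup_failed_node" is a hypothetical helper name.
cleanup_failed_node() {
    instance_uuid=$1   # nova instance UUID of the failed overcloud node
    node_uuid=$2       # ironic node UUID backing that instance

    # Remove the compute instance first, so the bare-metal node is no
    # longer associated with a deployed server.
    nova delete "$instance_uuid" || return 1

    # Then drop the bare-metal record itself.
    ironic node-delete "$node_uuid" || return 1
}
```

Note that, as the transcript shows, this alone does not resolve the stuck resource: re-running `delete-node.sh` afterwards still hangs on RHELUnregistrationDeployment.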
Same result:

```
[stack@undercloud ~]$ heat stack-list
+--------------------------------------+------------+---------------+---------------------+---------------------+
| id                                   | stack_name | stack_status  | creation_time       | updated_time        |
+--------------------------------------+------------+---------------+---------------------+---------------------+
| 1a53f935-d362-4a10-9230-e70b27a96fde | overcloud  | UPDATE_FAILED | 2016-08-25T08:08:42 | 2016-08-25T14:55:33 |
+--------------------------------------+------------+---------------+---------------------+---------------------+

[stack@undercloud ~]$ heat resource-list -n 5 overcloud | grep -v COMPLE
| resource_name | physical_resource_id | resource_type | resource_status | updated_time | stack_name |
| 1 | f1023c5b-6b32-445c-98bd-8d61027eb43e | OS::TripleO::Compute | DELETE_FAILED | 2016-08-25T08:09:10 | overcloud-Compute-sogzfuruohoh |
| NodeExtraConfig | 7022d299-7d4e-40ba-b142-535a2ac40740 | OS::TripleO::NodeExtraConfig | DELETE_FAILED | 2016-08-25T08:09:11 | overcloud-Compute-sogzfuruohoh-1-hlpidtyp3gs3 |
| RHELUnregistrationDeployment | 47b05a76-c70b-4fdc-b17f-1dbcb3bd3372 | OS::Heat::SoftwareDeployment | DELETE_IN_PROGRESS | 2016-08-25T08:18:15 | overcloud-Compute-sogzfuruohoh-1-hlpidtyp3gs3-NodeExtraConfig-4p4qqcxsjrzc |
| Compute | 1eb1bdad-5044-41bf-b52a-ef7ad34b4eb7 | OS::Heat::ResourceGroup | UPDATE_FAILED | 2016-08-25T14:56:08 | overcloud |
```

It also seems that the delete of the resource does not time out; the unregistration delete was still in progress when I came in the next morning:

```
[stack@undercloud ~]$ date
Fri Aug 26 03:05:35 EDT 2016
```

(that is 9:05 CEST)

```
[stack@undercloud ~]$ heat resource-list -n 5 overcloud | grep -v COMPLE
| resource_name | physical_resource_id | resource_type | resource_status | updated_time | stack_name |
| 1 | f1023c5b-6b32-445c-98bd-8d61027eb43e | OS::TripleO::Compute | DELETE_FAILED | 2016-08-25T08:09:10 | overcloud-Compute-sogzfuruohoh |
| NodeExtraConfig | 7022d299-7d4e-40ba-b142-535a2ac40740 | OS::TripleO::NodeExtraConfig | DELETE_IN_PROGRESS | 2016-08-25T08:09:11 | overcloud-Compute-sogzfuruohoh-1-hlpidtyp3gs3 |
| RHELUnregistrationDeployment | 47b05a76-c70b-4fdc-b17f-1dbcb3bd3372 | OS::Heat::SoftwareDeployment | DELETE_IN_PROGRESS | 2016-08-25T08:18:15 | overcloud-Compute-sogzfuruohoh-1-hlpidtyp3gs3-NodeExtraConfig-4p4qqcxsjrzc |
| Compute | 1eb1bdad-5044-41bf-b52a-ef7ad34b4eb7 | OS::Heat::ResourceGroup | UPDATE_FAILED | 2016-08-25T14:59:25 | overcloud |

[stack@undercloud ~]$ heat resource-list 7022d299-7d4e-40ba-b142-535a2ac40740
+------------------------------+--------------------------------------+------------------------------+--------------------+---------------------+
| resource_name                | physical_resource_id                 | resource_type                | resource_status    | updated_time        |
+------------------------------+--------------------------------------+------------------------------+--------------------+---------------------+
| RHELRegistration             |                                      | OS::Heat::SoftwareConfig     | DELETE_COMPLETE    | 2016-08-25T08:18:15 |
| RHELRegistrationDeployment   |                                      | OS::Heat::SoftwareDeployment | DELETE_COMPLETE    | 2016-08-25T08:18:15 |
| RHELUnregistration           | a829d9d5-d2ed-411a-8e32-a064dca67ade | OS::Heat::SoftwareConfig     | CREATE_COMPLETE    | 2016-08-25T08:18:15 |
| RHELUnregistrationDeployment | 47b05a76-c70b-4fdc-b17f-1dbcb3bd3372 | OS::Heat::SoftwareDeployment | DELETE_IN_PROGRESS | 2016-08-25T08:18:15 |
+------------------------------+--------------------------------------+------------------------------+--------------------+---------------------+

[stack@undercloud ~]$ heat deployment-list | grep -v COMPLETE
| id | config_id | server_id | action | status | creation_time | status_reason |
| 47b05a76-c70b-4fdc-b17f-1dbcb3bd3372 | 93cc2115-9f8f-4d43-8352-2d87903e0c3c | 7f7296c5-faeb-42d6-82c7-c3eb144de8e8 | DELETE | IN_PROGRESS | 2016-08-25T09:28:00 | Deploy data available |

[stack@undercloud ~]$ heat deployment-show 47b05a76-c70b-4fdc-b17f-1dbcb3bd3372
{
  "status": "IN_PROGRESS",
  "server_id": "7f7296c5-faeb-42d6-82c7-c3eb144de8e8",
  "config_id": "93cc2115-9f8f-4d43-8352-2d87903e0c3c",
  "output_values": null,
  "creation_time": "2016-08-25T09:28:00",
  "updated_time": "2016-08-25T14:59:30",
  "input_values": {},
  "action": "DELETE",
  "status_reason": "Deploy data available",
  "id": "47b05a76-c70b-4fdc-b17f-1dbcb3bd3372"
}
```

How can we clean up the stack when such a system is broken and the unregistration script can no longer be run?

BTW, the steps in comment 4 were done on an OSP 8 environment.

This seems to be a duplicate of bug 1313885. The workaround mentioned there works for me.

*** This bug has been marked as a duplicate of bug 1313885 ***
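Since the node is gone and can never run the unregistration script itself, unblocking the stack means convincing Heat to stop waiting on the stuck SoftwareDeployment. The workaround from bug 1313885 is not reproduced in this report; the sketch below only illustrates one plausible direction, hand-signaling the deployment, and both the helper name and the signal payload are assumptions (the payload mirrors the fields os-collect-config normally reports back).

```shell
# Hedged sketch only: manually signal the stuck SoftwareDeployment so Heat
# marks it done instead of waiting for the unreachable node. This is NOT
# necessarily the workaround from bug 1313885; "signal_stuck_deployment"
# is a hypothetical name and the -D payload is an assumption.
signal_stuck_deployment() {
    stack=$1      # nested stack holding the deployment, e.g.
                  # overcloud-Compute-...-NodeExtraConfig-4p4qqcxsjrzc
    resource=$2   # e.g. RHELUnregistrationDeployment
    heat resource-signal "$stack" "$resource" \
        -D '{"deploy_status_code": 0, "deploy_stdout": "", "deploy_stderr": ""}'
}
```

After a signal like this succeeds, the stack delete would be retried with the same `delete-node.sh` as above.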