| Summary: | heat stack-delete of overcloud not cleaning up overcloud baremetal nodes | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | lokesh.jain |
| Component: | rhosp-director | Assignee: | Angus Thomas <athomas> |
| Status: | CLOSED NOTABUG | QA Contact: | Shai Revivo <srevivo> |
| Severity: | high | Docs Contact: | |
| Priority: | low | | |
| Version: | 7.0 (Kilo) | CC: | aguetta, athomas, dbecker, jcoufal, lokesh.jain, mburns, morazi, pcaruana, rhel-osp-director-maint, rkharwar, sbaker |
| Target Milestone: | --- | | |
| Target Release: | 10.0 (Newton) | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-10-13 19:48:23 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
You shouldn't need to redo introspection at step 3. Assuming the stack was deleted successfully, it could be that some Ironic nodes went to an ERROR state. At step 3, please confirm the following:

- "heat stack-list" is empty
- "nova list" is empty
- "ironic node-list" shows all nodes powered off and available

I am not able to confirm the previous stack-delete issue because I am running into this bug now: https://bugzilla.redhat.com//show_bug.cgi?id=1259834. Will update the bug when I am able to reproduce this.

This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.

Able to reproduce this bug with relative ease.

Steve: Here is the info you requested

[stack@undercloud ~]$ heat stack-list
+----+------------+--------------+---------------+--------------+
| id | stack_name | stack_status | creation_time | updated_time |
+----+------------+--------------+---------------+--------------+
+----+------------+--------------+---------------+--------------+

[stack@undercloud ~]$ ironic node-list
+--------------------------------------+-------------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name              | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+-------------------+--------------------------------------+-------------+--------------------+-------------+
| ce438c41-50e1-444b-a60c-3e6e0fdbae77 | over8-controller1 | None                                 | power off   | available          | False       |
| 4aa8d990-3a93-4c4e-98e1-e9f9277a6a1e | over8-controller2 | 77d25a45-8671-41ec-a7e8-aeed5282e9d1 | power on    | active             | False       |
| 98324b7c-0730-4c37-838b-c85fef324a08 | over8-controller3 | 6303ec52-a8d8-4b34-9599-107cc576c521 | power on    | deploy failed      | False       |
| 98c8bd64-095a-4268-8f29-b1bfab41f250 | over8-ceph1       | None                                 | power off   | available          | False       |
| 76e7fe8a-084a-4cfd-b00b-6ab116f4228b | over8-ceph2       | None                                 | power off   | available          | False       |
| b3fc91c4-66a0-4a58-b981-60175f8ed6e4 | over8-ceph3       | a52d88f3-0ea2-4b8d-b558-fe234d6db1ff | power on    | active             | False       |
| dfb7ef4a-0576-4a38-b78d-cd7ade6c595b | over8-compute1    | c5fff957-af09-45b7-9fdb-b6ba780b883c | power on    | active             | False       |
+--------------------------------------+-------------------+--------------------------------------+-------------+--------------------+-------------+

[stack@undercloud ~]$ nova list
+----+------+--------+------------+-------------+----------+
| ID | Name | Status | Task State | Power State | Networks |
+----+------+--------+------------+-------------+----------+
+----+------+--------+------------+-------------+----------+

We have the same issue in an OSPd 8 environment. The behavior is the same; the logs are a bit different (different release), but it is the same 'error 500' message:

2016-08-02 22:06:46 [ControllerClusterDeployment]: CREATE_COMPLETE state changed
2016-08-02 22:06:53 [NovaCompute]: CREATE_FAILED ResourceInError: resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-08-02 22:06:53 [NovaCompute]: DELETE_IN_PROGRESS state changed
2016-08-02 22:06:55 [NovaCompute]: DELETE_COMPLETE state changed
2016-08-02 22:07:13 [NovaCompute]: CREATE_IN_PROGRESS state changed
2016-08-02 22:14:09 [NovaCompute]: CREATE_FAILED ResourceInError: resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-08-02 22:14:09 [NovaCompute]: DELETE_IN_PROGRESS state changed
2016-08-02 22:14:12 [NovaCompute]: DELETE_COMPLETE state changed
2016-08-02 22:14:45 [NovaCompute]: CREATE_IN_PROGRESS state changed
2016-08-02 22:21:55 [NovaCompute]: CREATE_FAILED ResourceInError: resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-08-02 22:21:56 [overcloud-Compute-jinfa62mh2y4-0-wwcirrnn5yhi]: CREATE_FAILED Resource CREATE failed: ResourceInError: resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-08-02 22:21:57 [0]: CREATE_FAILED ResourceInError: resources[0].resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-08-02 22:21:58 [overcloud-Compute-jinfa62mh2y4]: UPDATE_FAILED ResourceInError: resources[0].resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
Stack overcloud CREATE_FAILED
Heat Stack create failed.

After the stack-delete, if there are no results from a "nova list", then as far as Heat is aware the delete was a success. There may be manual cleanup required before attempting the next deploy. This cleanup consists of looking at the output of "ironic node-list" and running ironic node commands to get all nodes back to a good state, specifically:

- Any nodes in "Maintenance: True" need "ironic node-set-maintenance <node> False"
- Any nodes in "Power State: power on" need "ironic node-set-power-state <node> off"
- Any nodes not in "Provision State: available" need "ironic node-set-provision-state <node> deleted"

You can then confirm that Nova has the required capacity to deploy the cloud by running "nova hypervisor-stats". Later versions of OSP do an available-node check before deploying the overcloud, which leads to early failure and a more obvious error message if there are not enough nodes available.

Heat cleans up only the Nova instances; some manual cleanup (especially in Ironic) is required, as per comment #15.
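The manual cleanup above can be sketched as a small script that reads "ironic node-list" output and prints (without running) the ironic commands needed to return each node to a deployable state. This is only an illustration, not part of the product: the heredoc replays three sample rows from this bug, and the /tmp/node-list.txt path is hypothetical; in a real environment you would feed in live "ironic node-list" output instead.

```shell
#!/bin/sh
# Sample "ironic node-list" rows taken from this bug report; in a real
# environment replace the heredoc with:  ironic node-list > /tmp/node-list.txt
cat > /tmp/node-list.txt <<'EOF'
| ce438c41-50e1-444b-a60c-3e6e0fdbae77 | over8-controller1 | None                                 | power off | available     | False |
| 4aa8d990-3a93-4c4e-98e1-e9f9277a6a1e | over8-controller2 | 77d25a45-8671-41ec-a7e8-aeed5282e9d1 | power on  | active        | False |
| 98324b7c-0730-4c37-838b-c85fef324a08 | over8-controller3 | 6303ec52-a8d8-4b34-9599-107cc576c521 | power on  | deploy failed | False |
EOF

# Emit the cleanup commands described above: clear maintenance,
# power nodes off, and reset any node not in "available".
CMDS=$(awk -F'|' '/^\|/ && $2 !~ /UUID/ {
    uuid=$2;  gsub(/ /, "", uuid)
    power=$5; gsub(/^ +| +$/, "", power)
    prov=$6;  gsub(/^ +| +$/, "", prov)
    maint=$7; gsub(/^ +| +$/, "", maint)
    if (maint == "True")      print "ironic node-set-maintenance " uuid " False"
    if (power == "power on")  print "ironic node-set-power-state " uuid " off"
    if (prov != "available")  print "ironic node-set-provision-state " uuid " deleted"
}' /tmp/node-list.txt)
echo "$CMDS"
```

Review the printed commands before running any of them; a node that is already powered off, available, and out of maintenance (like over8-controller1 above) produces no output.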
After heat stack-delete overcloud, baremetal nodes are still showing previous images and failing the subsequent deployment.

Steps to reproduce:

1. Deploy the overcloud on OSP-Director 7.2 with "openstack overcloud deploy --templates --control-scale 1 --compute-scale 2 --ceph-storage-scale 0 --block-storage-scale 0 --swift-storage-scale 0 --ntp-server pool.ntp.org"
2. Do a heat stack-delete overcloud to remove the previous deployment
3. Run introspection of the nodes: "openstack baremetal introspection bulk start"
4. Re-deploy with the same nodes and the same command: "openstack overcloud deploy --templates --control-scale 1 --compute-scale 2 --ceph-storage-scale 0 --block-storage-scale 0 --swift-storage-scale 0 --ntp-server pool.ntp.org"

After these steps, CREATE failed with: "message": "No valid host was found. Exceeded max scheduling attempts 3 for instance e10e19a2-aefb-4a98-9536-a81793773938." The nodes still had the image from the previous deployment.

heat-api.log details:

2016-02-11 17:56:59.340 5612 INFO eventlet.wsgi.server [-] (5612) accepted ('192.0.2.1', 46766)
2016-02-11 17:56:59.342 5612 DEBUG heat.api.middleware.version_negotiation [-] Processing request: GET /v1/df00efd4218041be9abfc70e6c05f210/stacks Accept: application/json process_request /usr/lib/python2.7/site-packages/heat/api/middleware/version_negotiation.py:50
2016-02-11 17:56:59.342 5612 DEBUG heat.api.middleware.version_negotiation [-] Matched versioned URI. Version: 1.0 process_request /usr/lib/python2.7/site-packages/heat/api/middleware/version_negotiation.py:65
2016-02-11 17:56:59.343 5612 DEBUG keystoneclient.auth.identity.v2 [-] Making authentication request to http://192.0.2.1:35357/v2.0/tokens get_auth_ref /usr/lib/python2.7/site-packages/keystoneclient/auth/identity/v2.py:76
2016-02-11 17:56:59.512 5612 DEBUG keystoneclient.session [-] REQ: curl -g -i -X GET http://192.0.2.1:35357/v3/auth/tokens -H "X-Subject-Token: {SHA1}2984da1d915161a3f91a87e64c4bc1d3a7759427" -H "User-Agent: python-keystoneclient" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}03eedc6987f3137e1206798d66044c45fa1ba215" _http_log_request /usr/lib/python2.7/site-packages/keystoneclient/session.py:195
2016-02-11 17:56:59.599 5612 DEBUG keystoneclient.session [-] RESP: [200] content-length: 6349 x-subject-token: {SHA1}2984da1d915161a3f91a87e64c4bc1d3a7759427 vary: X-Auth-Token connection: keep-alive date: Thu, 11 Feb 2016 22:56:59 GMT content-type: application/json x-openstack-request-id: req-4e563a4d-050a-4f90-97c5-93c68d33e9cd RESP BODY: {"token": {"methods": ["password", "token"], "roles": [{"id": "9fe2ff9ee4384b1894a90878d3e92bab", "name": "_member_"}, {"id": "1f36114af125490e964851e05a972259", "name": "admin"}], "expires_at": "2016-02-12T02:56:59.000000Z", "project": {"domain": {"id": "default", "name": "Default"}, "id": "df00efd4218041be9abfc70e6c05f210", "name": "admin"}, "catalog": "<removed>", "extras": {}, "user": {"domain": {"id": "default", "name": "Default"}, "id": "b56faf710c2f476bad2199f2fc6d8127", "name": "admin"}, "audit_ids": ["ypUarebwRgOQZa0zeWMl1g"], "issued_at": "2016-02-11T22:56:59.316033"}} _http_log_response /usr/lib/python2.7/site-packages/keystoneclient/session.py:224
2016-02-11 17:56:59.603 5612 DEBUG heat.openstack.common.policy [req-0839eff5-5166-4694-8b63-a5c72cf10bcf b56faf710c2f476bad2199f2fc6d8127 df00efd4218041be9abfc70e6c05f210] Rules successfully reloaded _load_policy_file /usr/lib/python2.7/site-packages/heat/openstack/common/policy.py:295
2016-02-11 17:56:59.604 5612 INFO heat.openstack.common.policy [req-0839eff5-5166-4694-8b63-a5c72cf10bcf b56faf710c2f476bad2199f2fc6d8127 df00efd4218041be9abfc70e6c05f210] Can not find policy directory: policy.d
2016-02-11 17:56:59.605 5612 DEBUG heat.common.wsgi [req-0839eff5-5166-4694-8b63-a5c72cf10bcf b56faf710c2f476bad2199f2fc6d8127 df00efd4218041be9abfc70e6c05f210] Calling <heat.api.openstack.v1.stacks.StackController object at 0x3152890> : index __call__ /usr/lib/python2.7/site-packages/heat/common/wsgi.py:667
2016-02-11 17:56:59.606 5612 INFO heat.openstack.common.policy [req-0839eff5-5166-4694-8b63-a5c72cf10bcf b56faf710c2f476bad2199f2fc6d8127 df00efd4218041be9abfc70e6c05f210] Can not find policy directory: policy.d
2016-02-11 17:56:59.607 5612 DEBUG oslo_messaging._drivers.amqpdriver [req-0839eff5-5166-4694-8b63-a5c72cf10bcf b56faf710c2f476bad2199f2fc6d8127 df00efd4218041be9abfc70e6c05f210] MSG_ID is a2430edbb84c472da9fe38ed61fff872 _send /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:311
2016-02-11 17:56:59.607 5612 DEBUG oslo_messaging._drivers.amqp [req-0839eff5-5166-4694-8b63-a5c72cf10bcf b56faf710c2f476bad2199f2fc6d8127 df00efd4218041be9abfc70e6c05f210] UNIQUE_ID is 209c449205d44d46a118c994a1bd8513. _add_unique_id /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqp.py:258
2016-02-11 17:56:59.691 5612 DEBUG heat.common.serializers [req-0839eff5-5166-4694-8b63-a5c72cf10bcf b56faf710c2f476bad2199f2fc6d8127 df00efd4218041be9abfc70e6c05f210] JSON response : {"stacks": [{"parent": null, "description": "Nova API,Keystone,Heat Engine and API,Glance,Neutron,Dedicated MySQL server,Dedicated RabbitMQ Server,Group of Nova Computes\n", "links": [{"href": "http://192.0.2.1:8004/v1/df00efd4218041be9abfc70e6c05f210/stacks/overcloud/29f0da81-5e96-43ef-83cc-c31ca9f127f0", "rel": "self"}], "stack_status_reason": "Resource CREATE failed: ResourceInError: resources.Compute.resources[0].resources.NovaCompute: Went to status ERROR due to \"Message: No valid host was found. Exceeded max scheduling attempts 3 for instance 4be4a480-a82b-45d0-b574-e7b558cf600c. Last exception: [u'Traceback (most recent call last): \\n', u' File \"/usr/lib/python2.7/site-packages/nova/compute/manager.py\", line 2261, in _do, Code: 500\"", "stack_name": "overcloud", "stack_user_project_id": "39c7227225eb489b8282175e54aff5cc", "creation_time": "2016-02-05T18:14:44Z", "updated_time": null, "stack_owner": "admin", "stack_status": "CREATE_FAILED", "id": "29f0da81-5e96-43ef-83cc-c31ca9f127f0"}]} to_json /usr/lib/python2.7/site-packages/heat/common/serializers.py:42
2016-02-11 17:56:59.692 5612 INFO eventlet.wsgi.server [req-0839eff5-5166-4694-8b63-a5c72cf10bcf b56faf710c2f476bad2199f2fc6d8127 df00efd4218041be9abfc70e6c05f210] 192.0.2.1 - - [11/Feb/2016 17:56:59] "GET /v1/df00efd4218041be9abfc70e6c05f210/stacks HTTP/1.1" 200 1228 0.351209
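The "No valid host was found" error in the log above generally means nova-scheduler saw fewer deployable Ironic nodes than the deploy command requested (here, --control-scale 1 plus --compute-scale 2 needs 3 nodes). As a rough pre-deploy gate, you can count nodes that are powered off, available, and out of maintenance before re-running step 4. A minimal sketch, again parsing sample rows rather than live output (the /tmp/nodes.txt path is illustrative only):

```shell
#!/bin/sh
NEEDED=3   # --control-scale 1 + --compute-scale 2

# Sample rows; in practice: ironic node-list > /tmp/nodes.txt
cat > /tmp/nodes.txt <<'EOF'
| ce438c41-50e1-444b-a60c-3e6e0fdbae77 | over8-controller1 | None | power off | available     | False |
| 98c8bd64-095a-4268-8f29-b1bfab41f250 | over8-ceph1       | None | power off | available     | False |
| 98324b7c-0730-4c37-838b-c85fef324a08 | over8-controller3 | None | power on  | deploy failed | False |
EOF

# A node is deployable only if powered off, available, and not in maintenance.
AVAILABLE=$(awk -F'|' '/^\|/ && $5 ~ /power off/ && $6 ~ /available/ && $7 ~ /False/ { n++ } END { print n + 0 }' /tmp/nodes.txt)

if [ "$AVAILABLE" -ge "$NEEDED" ]; then
    echo "ok: $AVAILABLE deployable nodes for a $NEEDED-node overcloud"
else
    echo "not enough nodes: $AVAILABLE deployable, $NEEDED needed"
fi
```

With the sample data this prints "not enough nodes: 2 deployable, 3 needed". The in-product check suggested in the comments, "nova hypervisor-stats", answers the same question from Nova's point of view.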