Description of problem:
I wanted to delete a stack that was in CREATE_FAILED because of a bad network isolation configuration (we attempted to deploy without VLANs, only by overloading IPs on the same interface, but os-config doesn't support that). I started with stack-delete and got DELETE_FAILED (that happens almost every time, even when deleting perfectly healthy stacks). I ran "nova list" to see whether any nodes were in error, and indeed one node was in error state (the other 3 nodes were still in ACTIVE state). I deleted the error node with "nova delete" and ran stack-delete again. The second stack-delete also failed, and "nova list" still showed the remaining nodes in ACTIVE.

To find out which resource was in error and preventing the stack from being deleted, I tried "heat resource-list", and it complained that the stack was not found! I tried listing the stack's resources several times, using the stack name and also the stack's UUID, but each time resource-list reported that the stack was not found.
So I tried it with --debug and got:

[stack@puma01 ~]$ heat --debug resource-list overcloud
DEBUG (session) REQ: curl -g -i -X GET http://192.0.2.1:5000/v2.0 -H "Accept: application/json" -H "User-Agent: python-keystoneclient"
INFO (connectionpool) Starting new HTTP connection (1): 192.0.2.1
DEBUG (connectionpool) "GET /v2.0 HTTP/1.1" 200 335
DEBUG (session) RESP: [200] content-length: 335 vary: X-Auth-Token connection: keep-alive date: Sun, 19 Jul 2015 06:22:50 GMT content-type: application/json x-openstack-request-id: req-dd84bdd4-d189-4ed9-9a62-dcb3a85d8b63
RESP BODY: {"version": {"status": "stable", "updated": "2014-04-17T00:00:00Z", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v2.0+json"}], "id": "v2.0", "links": [{"href": "http://192.0.2.1:5000/v2.0/", "rel": "self"}, {"href": "http://docs.openstack.org/", "type": "text/html", "rel": "describedby"}]}}

DEBUG (v2) Making authentication request to http://192.0.2.1:5000/v2.0/tokens
DEBUG (connectionpool) "POST /v2.0/tokens HTTP/1.1" 200 3493
DEBUG (session) REQ: curl -g -i -X GET http://192.0.2.1:8004/v1/97cdf91392364c69b473b4d2c63f971b/stacks/overcloud/resources -H "User-Agent: python-heatclient" -H "Content-Type: application/json" -H "X-Auth-Url: http://192.0.2.1:5000/v2.0" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}afe9cc4604b09047df055dceffddb07c1c5ad2c1"
INFO (connectionpool) Starting new HTTP connection (1): 192.0.2.1
DEBUG (connectionpool) "GET /v1/97cdf91392364c69b473b4d2c63f971b/stacks/overcloud/resources HTTP/1.1" 302 201
DEBUG (session) RESP: [302] content-length: 201 connection: keep-alive location: http://192.0.2.1:8004/v1/97cdf91392364c69b473b4d2c63f971b/stacks/overcloud/4c5abe97-c4fe-4ed7-b963-7877b1974f06/resources date: Sun, 19 Jul 2015 06:22:50 GMT content-type: text/plain; charset=UTF-8 x-openstack-request-id: req-6fb352b5-4f90-45fe-899c-6591d444bbcf
RESP BODY: 302 Found The resource was found at http://192.0.2.1:8004/v1/97cdf91392364c69b473b4d2c63f971b/stacks/overcloud/4c5abe97-c4fe-4ed7-b963-7877b1974f06/resources; you should be redirected automatically.

DEBUG (connectionpool) "GET /v1/97cdf91392364c69b473b4d2c63f971b/stacks/overcloud/4c5abe97-c4fe-4ed7-b963-7877b1974f06/resources HTTP/1.1" 404 970
DEBUG (session) RESP:
Traceback (most recent call last):
  File "/usr/bin/heat", line 10, in <module>
    sys.exit(main())
  File "/usr/lib/python2.7/site-packages/heatclient/shell.py", line 705, in main
    HeatShell().main(args)
  File "/usr/lib/python2.7/site-packages/heatclient/shell.py", line 655, in main
    args.func(client, args)
  File "/usr/lib/python2.7/site-packages/heatclient/v1/shell.py", line 719, in do_resource_list
    raise exc.CommandError(_('Stack not found: %s') % args.id)
heatclient.exc.CommandError: Stack not found: overcloud

Version-Release number of selected component (if applicable):
openstack-heat-api-2015.1.0-4.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch
openstack-heat-common-2015.1.0-4.el7ost.noarch
python-heatclient-0.6.0-1.el7ost.noarch
openstack-heat-engine-2015.1.0-4.el7ost.noarch
openstack-heat-api-cfn-2015.1.0-4.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-35.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.0-4.el7ost.noarch

How reproducible:
may never be reproduced again :(

Steps to Reproduce:
1. Delete a stack. If it fails - delete the error nodes from nova and try the stack-delete again. If it fails again - see if you see resources in error when you run resource-list.

Actual results:
The stack is not found when you run resource-list, but still exists (in DELETE_FAILED state) when you run stack-list.

Expected results:
When I delete a stack it should get deleted without me having to help it resource-by-resource.
I've seen this and have been trying to write some standalone templates to reproduce. Here is what I have learnt so far.

1. The overcloud stack somehow gets itself into a state where a ResourceGroup has resources defined in its template, but no corresponding Resource database records.
2. Calling heat resource-list overcloud is triggering a NotFound on listing resources in the nested stack (the db api raises NotFound for a zero resource count, which is stupid but should be caught in any code calling it.)
3. This NotFound exception is floating all the way up and being seen as an overcloud stack NotFound, which it isn't.

I tried replicating this by writing a simple template which contained a ResourceGroup of things, then deleting the Resource database records, but it didn't trigger the issue. I will continue to look for a simple way to replicate this so a bug can be raised upstream.
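To make the three points above concrete, here is a minimal, self-contained sketch of the failure mode. It is not Heat's actual code: every name in it (NotFound, db_resources, list_resources, api_resource_list, the tolerate_empty flag and the toy dictionaries) is hypothetical and only models the behaviour described, namely a database helper that raises NotFound for zero rows and an API layer that reports any NotFound as a missing stack.

```python
class NotFound(Exception):
    """Hypothetical stand-in for the db-layer 'not found' error."""


# Toy data: the parent stack exists and has one group resource, while the
# nested stack behind that group exists but has no Resource rows at all.
STACKS = {"overcloud", "overcloud-group"}
RESOURCE_ROWS = {"overcloud": ["group"], "overcloud-group": []}
NESTED_STACK_OF = {"group": "overcloud-group"}


def db_resources(stack_id):
    """Mimic the described db api: zero rows is reported as NotFound."""
    rows = RESOURCE_ROWS.get(stack_id, [])
    if not rows:
        raise NotFound("no resources for stack %s" % stack_id)
    return rows


def list_resources(stack_id, tolerate_empty):
    """List resources of a stack, looking into nested stacks on the way."""
    if stack_id not in STACKS:
        raise NotFound("stack %s" % stack_id)
    result = []
    for name in db_resources(stack_id):
        members = []
        nested = NESTED_STACK_OF.get(name)
        if nested is not None:
            try:
                members = db_resources(nested)
            except NotFound:
                # An empty nested stack is not a missing parent stack.
                if not tolerate_empty:
                    raise
        result.append((name, len(members)))
    return result


def api_resource_list(stack_id, tolerate_empty=False):
    """Outer layer that turns any NotFound into 'Stack not found'."""
    try:
        return list_resources(stack_id, tolerate_empty)
    except NotFound:
        return "Stack not found: %s" % stack_id


print(api_resource_list("overcloud"))
print(api_resource_list("overcloud", tolerate_empty=True))
```

With tolerate_empty left at False, the NotFound raised for the empty nested stack escapes and the existing parent stack is misreported as missing. Catching it at the call site, as in point 2, lets the listing succeed and show the group with zero members.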
Note that the issue of Heat failing to delete the servers from Nova on the first attempt is covered by bug 1242796 - we believe there's an upstream fix available for that part.
I've found an easy way to reproduce this.

- skip baremetal introspection steps
- deploy overcloud
- wait for deploy to fail because servers will go to error
- heat stack-delete overcloud
- wait for delete to fail
- heat resource-list overcloud
Verified in:
openstack-heat-common-2015.1.1-1.el7ost.noarch
openstack-heat-api-cfn-2015.1.1-1.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.1-1.el7ost.noarch
openstack-heat-api-2015.1.1-1.el7ost.noarch
openstack-heat-engine-2015.1.1-1.el7ost.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2015:1865