Bug 1244485 - Heat stack partially deleted, it shows in stack-list but isn't found when you do resource-list
Summary: Heat stack partially deleted, it shows in stack-list but isn't found when you...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-heat
Version: Director
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: z2
Target Release: 7.0 (Kilo)
Assignee: Steve Baker
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-07-19 07:18 UTC by Udi Kalifon
Modified: 2023-02-22 23:02 UTC
CC: 5 users

Fixed In Version: openstack-heat-2015.1.1-1.el7ost
Doc Type: Bug Fix
Doc Text:
If a resource did not have an associated nested stack, a NotFound exception would be raised. Whenever this occurred, the 'heat resource-list' incorrectly associated the NotFound exception with the root stack rather than the nested stack. With this release, the internal API contract now correctly returns 'None' when there are missing nested stacks. As such, 'heat resource-list' now works as expected.
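The fix described above can be sketched in a small, self-contained example. This is illustrative only, not actual Heat code: the names `get_nested_stack_old`, `get_nested_stack_fixed`, and `list_resources` are hypothetical stand-ins for the internal API whose contract changed from raising NotFound to returning `None` for a missing nested stack.

```python
class NotFound(Exception):
    """Stand-in for Heat's internal NotFound exception."""


# One root-stack resource whose nested stack is missing (the bug's trigger).
NESTED_STACKS = {"root": {"group0": None}}


def get_nested_stack_old(resource_name):
    # Old contract: a missing nested stack raises, and the exception
    # escapes to callers that cannot tell which stack it refers to.
    nested = NESTED_STACKS["root"].get(resource_name)
    if nested is None:
        raise NotFound(resource_name)
    return nested


def get_nested_stack_fixed(resource_name):
    # Fixed contract: a missing nested stack is a normal result (None),
    # not an error, so resource-list can simply skip it.
    return NESTED_STACKS["root"].get(resource_name)


def list_resources(lookup):
    # Models 'heat resource-list' walking the root stack's resources.
    return [(name, lookup(name)) for name in NESTED_STACKS["root"]]
```

With the old contract, `list_resources(get_nested_stack_old)` raises NotFound even though the root stack exists; with the fixed contract it returns the resource with a `None` nested stack.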
Clone Of:
Environment:
Last Closed: 2015-10-08 12:20:00 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1476834 0 None None None Never
Launchpad 1479579 0 None None None Never
Red Hat Product Errata RHBA-2015:1865 0 normal SHIPPED_LIVE openstack-heat bug fix advisory 2015-10-08 16:06:10 UTC

Description Udi Kalifon 2015-07-19 07:18:29 UTC
Description of problem:
I wanted to delete a stack which was in CREATE_FAILED due to a bad network isolation configuration (we attempted to deploy without VLANs, only by overloading IPs on the same interface, but os-config doesn't support that).

So I started with stack-delete and got a DELETE_FAILED (that happens almost every time, even when deleting perfectly healthy stacks). I ran "nova list" to see if any nodes were in error - and indeed one node was in ERROR state (the other 3 nodes were still ACTIVE). I deleted the failed node with "nova delete" and ran stack-delete again.

The second stack-delete also failed, and "nova list" still showed the remaining nodes as ACTIVE. To find out which resource was in error and preventing the stack from deleting, I tried "heat resource-list" - and it complained that the stack was not found!

I tried to list the resources of the stack several times, using both the stack name and the stack's UUID - but resource-list kept reporting that the stack was not found. So I tried it with --debug and got:


[stack@puma01 ~]$ heat --debug resource-list overcloud
DEBUG (session) REQ: curl -g -i -X GET http://192.0.2.1:5000/v2.0 -H "Accept: application/json" -H "User-Agent: python-keystoneclient"
INFO (connectionpool) Starting new HTTP connection (1): 192.0.2.1
DEBUG (connectionpool) "GET /v2.0 HTTP/1.1" 200 335
DEBUG (session) RESP: [200] content-length: 335 vary: X-Auth-Token connection: keep-alive date: Sun, 19 Jul 2015 06:22:50 GMT content-type: application/json x-openstack-request-id: req-dd84bdd4-d189-4ed9-9a62-dcb3a85d8b63 
RESP BODY: {"version": {"status": "stable", "updated": "2014-04-17T00:00:00Z", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v2.0+json"}], "id": "v2.0", "links": [{"href": "http://192.0.2.1:5000/v2.0/", "rel": "self"}, {"href": "http://docs.openstack.org/", "type": "text/html", "rel": "describedby"}]}}

DEBUG (v2) Making authentication request to http://192.0.2.1:5000/v2.0/tokens
DEBUG (connectionpool) "POST /v2.0/tokens HTTP/1.1" 200 3493
DEBUG (session) REQ: curl -g -i -X GET http://192.0.2.1:8004/v1/97cdf91392364c69b473b4d2c63f971b/stacks/overcloud/resources -H "User-Agent: python-heatclient" -H "Content-Type: application/json" -H "X-Auth-Url: http://192.0.2.1:5000/v2.0" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}afe9cc4604b09047df055dceffddb07c1c5ad2c1"
INFO (connectionpool) Starting new HTTP connection (1): 192.0.2.1
DEBUG (connectionpool) "GET /v1/97cdf91392364c69b473b4d2c63f971b/stacks/overcloud/resources HTTP/1.1" 302 201
DEBUG (session) RESP: [302] content-length: 201 connection: keep-alive location: http://192.0.2.1:8004/v1/97cdf91392364c69b473b4d2c63f971b/stacks/overcloud/4c5abe97-c4fe-4ed7-b963-7877b1974f06/resources date: Sun, 19 Jul 2015 06:22:50 GMT content-type: text/plain; charset=UTF-8 x-openstack-request-id: req-6fb352b5-4f90-45fe-899c-6591d444bbcf 
RESP BODY: 302 Found

The resource was found at http://192.0.2.1:8004/v1/97cdf91392364c69b473b4d2c63f971b/stacks/overcloud/4c5abe97-c4fe-4ed7-b963-7877b1974f06/resources; you should be redirected automatically.  

DEBUG (connectionpool) "GET /v1/97cdf91392364c69b473b4d2c63f971b/stacks/overcloud/4c5abe97-c4fe-4ed7-b963-7877b1974f06/resources HTTP/1.1" 404 970
DEBUG (session) RESP:

Traceback (most recent call last):
  File "/usr/bin/heat", line 10, in <module>
    sys.exit(main())
  File "/usr/lib/python2.7/site-packages/heatclient/shell.py", line 705, in main
    HeatShell().main(args)
  File "/usr/lib/python2.7/site-packages/heatclient/shell.py", line 655, in main
    args.func(client, args)
  File "/usr/lib/python2.7/site-packages/heatclient/v1/shell.py", line 719, in do_resource_list
    raise exc.CommandError(_('Stack not found: %s') % args.id)
heatclient.exc.CommandError: Stack not found: overcloud


Version-Release number of selected component (if applicable):
openstack-heat-api-2015.1.0-4.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch
openstack-heat-common-2015.1.0-4.el7ost.noarch
python-heatclient-0.6.0-1.el7ost.noarch
openstack-heat-engine-2015.1.0-4.el7ost.noarch
openstack-heat-api-cfn-2015.1.0-4.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-35.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.0-4.el7ost.noarch


How reproducible:
may never be reproduced again :(


Steps to Reproduce:
1. Delete a stack. If the delete fails, delete the error nodes from nova and try stack-delete again. If it fails again, check whether resource-list shows any resources in error.


Actual results:
The stack is not found when you run resource-list, but still exists (in DELETE_FAILED state) when you run stack-list.


Expected results:
When I delete a stack it should get deleted without me having to help it resource-by-resource.

Comment 3 Steve Baker 2015-07-20 03:59:08 UTC
I've seen this and have been trying to write some standalone templates to reproduce.

Here is what I have learnt so far.

1. The overcloud stack somehow gets itself into a state where a ResourceGroup has resources defined in its template, but no corresponding Resource database records.
2. Calling heat resource-list overcloud is triggering a NotFound on listing resources in the nested stack (the db api raises NotFound for a zero resource count, which is stupid but should be caught in any code calling it.)
3. This NotFound exception propagates all the way up and is reported as a NotFound for the overcloud stack itself, which it isn't.
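The failure mode in points 2-3 can be sketched as follows. This is a hypothetical model, not the real Heat call path: `db_resource_get_all`, `resource_list`, and `api_resource_list` are illustrative names for the db layer, the recursive resource listing, and the top-level API handler.

```python
class NotFound(Exception):
    """Stand-in for the db-layer NotFound exception."""


def db_resource_get_all(stack_id):
    # Models the db api, which raises NotFound for a zero resource count.
    resources = {"overcloud": ["ResourceGroup"], "nested-group": []}[stack_id]
    if not resources:
        raise NotFound(stack_id)
    return resources


def resource_list(stack_id):
    out = list(db_resource_get_all(stack_id))
    # Recursing into the nested stack lets the db-layer NotFound
    # escape uncaught, because nothing here handles it.
    out += db_resource_get_all("nested-group")
    return out


def api_resource_list(stack_name):
    try:
        return resource_list(stack_name)
    except NotFound:
        # Misattribution: the exception actually concerns the nested
        # stack, but at this level it looks like the root stack is gone.
        raise RuntimeError("Stack not found: %s" % stack_name)
```

Calling `api_resource_list("overcloud")` fails with "Stack not found: overcloud" even though only the nested stack's resources are missing, mirroring the CommandError in the traceback above.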

I tried replicating this by writing a simple template which contained a ResourceGroup of things, then deleting the Resource database records, but it didn't trigger the issue.

I will continue to look for a simple way to replicate this so a bug can be raised upstream.

Comment 4 Zane Bitter 2015-07-20 15:19:35 UTC
Note that the issue of Heat failing to delete the servers from Nova on the first attempt is covered by bug 1242796 - we believe there's an upstream fix available for that part.

Comment 5 Steve Baker 2015-07-21 05:18:17 UTC
I've found an easy way to reproduce this. 

- skip baremetal introspection steps 
- deploy overcloud
- wait for deploy to fail because servers will go to error
- heat stack-delete overcloud
- wait for delete to fail
- heat resource-list overcloud

Comment 9 Udi Kalifon 2015-09-10 08:11:15 UTC
Verified in:
openstack-heat-common-2015.1.1-1.el7ost.noarch
openstack-heat-api-cfn-2015.1.1-1.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.1-1.el7ost.noarch
openstack-heat-api-2015.1.1-1.el7ost.noarch
openstack-heat-engine-2015.1.1-1.el7ost.noarch

Comment 11 errata-xmlrpc 2015-10-08 12:20:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:1865

