Description of problem: heat stack-delete fails with: ERROR: Stack overcloud already has an action (CREATE) in progress. Version-Release number of selected component (if applicable): openstack-heat-engine-2015.1.2-9 puddle 2016-02-12.1 How reproducible: Rare logs are here (you need lrzuntar to open): http://ikook.tlv.redhat.com/unrelated/files/report.tar.lrz Steps to Reproduce: (first was a configuration issue that resulted in a timeout) $ heat stack-list +--------------------------------------+------------+---------------+----------------------+ | id | stack_name | stack_status | creation_time | +--------------------------------------+------------+---------------+----------------------+ | 05cad2a7-582c-47e4-990d-6919a2a9394e | overcloud | CREATE_FAILED | 2016-02-14T12:06:22Z | +--------------------------------------+------------+---------------+----------------------+ $ heat stack-delete overcloud +--------------------------------------+------------+--------------------+----------------------+ | id | stack_name | stack_status | creation_time | +--------------------------------------+------------+--------------------+----------------------+ | 05cad2a7-582c-47e4-990d-6919a2a9394e | overcloud | DELETE_IN_PROGRESS | 2016-02-14T12:06:22Z | +--------------------------------------+------------+--------------------+----------------------+ $ heat stack-list +----+------------+--------------+---------------+ | id | stack_name | stack_status | creation_time | +----+------------+--------------+---------------+ +----+------------+--------------+---------------+ openstack overcloud deploy \ --templates \ -e ${THT}/environments/network-isolation-v6.yaml \ -e /home/stack/network-environment-v6.yaml \ -e /home/stack/ssl-heat-templates/environments/enable-tls.yaml \ -e /home/stack/ssl-heat-templates/environments/inject-trust-anchor.yaml \ --control-scale 3 \ --compute-scale 2 \ --ceph-storage-scale 1 \ --neutron-disable-tunneling \ --neutron-network-type vlan \ --neutron-network-vlan-ranges datacentre:50:60 \ --ntp-server clock.redhat.com \ --debug &> /home/stack/deployment.log <-- attached $ nova list +--------------------------------------+-------------------------+--------+------------+-------------+---------------------+ | ID | Name | Status | Task State | Power State | Networks | +--------------------------------------+-------------------------+--------+------------+-------------+---------------------+ | e4951939-5141-464c-89b0-ac689b8a3616 | overcloud-cephstorage-0 | BUILD | spawning | NOSTATE | ctlplane=192.0.2.14 | | ee58e70a-00f4-44fd-85ca-4366cb58f3b4 | overcloud-compute-0 | BUILD | spawning | NOSTATE | ctlplane=192.0.2.18 | | 52f90063-2caf-4140-bea4-73291afdc0a5 | overcloud-compute-1 | BUILD | spawning | NOSTATE | ctlplane=192.0.2.16 | | ea04e681-e786-40e4-8fb9-4c3c7c06cde5 | overcloud-controller-0 | BUILD | spawning | NOSTATE | ctlplane=192.0.2.17 | | c2ed8f12-f3c5-407e-af4e-fe6245b7dcf4 | overcloud-controller-1 | BUILD | spawning | NOSTATE | ctlplane=192.0.2.15 | | 7387d5fb-2631-41ab-80f1-5c65860070ef | overcloud-controller-2 | BUILD | spawning | NOSTATE | ctlplane=192.0.2.19 | +--------------------------------------+-------------------------+--------+------------+-------------+---------------------+ After a while... $ ironic node-list +--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+ | UUID | Name | Instance UUID | Power State | Provision State | Maintenance | +--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+ | 30eb4f37-c59a-4fd7-9b6f-302f454cf511 | None | ea04e681-e786-40e4-8fb9-4c3c7c06cde5 | power off | deploy failed | False | | 616e603f-7ef5-41f6-b068-a3c0bc529c70 | None | c2ed8f12-f3c5-407e-af4e-fe6245b7dcf4 | power on | active | False | | 80d4f68d-5e85-4d44-881d-dc97961bfce9 | None | e4951939-5141-464c-89b0-ac689b8a3616 | power on | active | False | | 1e7325f0-d41d-4762-9946-2adf403fec5e | None | ee58e70a-00f4-44fd-85ca-4366cb58f3b4 | power on | active | False | | c1abe5f5-d3d5-41bc-bb79-27e701ba8ab0 | None | 52f90063-2caf-4140-bea4-73291afdc0a5 | power on | active | False | | cbba9d9d-73d2-44a2-8bda-30c0d5851c62 | None | None | power off | available | False | | eaf1bfe0-64f1-4811-906f-10af10b66866 | None | None | power off | available | False | | 69d710fe-3722-476c-b5e3-42c0a665e161 | None | 7387d5fb-2631-41ab-80f1-5c65860070ef | power on | active | False | +--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+ $ heat --debug stack-delete overcloud DEBUG (session) REQ: curl -g -i -X GET http://192.0.2.1:5000/v2.0 -H "Accept: application/json" -H "User-Agent: python-keystoneclient" INFO (connectionpool) Starting new HTTP connection (1): 192.0.2.1 DEBUG (connectionpool) "GET /v2.0 HTTP/1.1" 200 335 DEBUG (session) RESP: [200] content-length: 335 vary: X-Auth-Token connection: keep-alive date: Mon, 15 Feb 2016 11:22:42 GMT content-type: application/json x-openstack-request-id: req-b2f9026f-24f0-42f8-a129-77946aad4c9b RESP BODY: {"version": {"status": "stable", "updated": "2014-04-17T00:00:00Z", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v2.0+json"}], "id": "v2.0", "links": [{"href": "http://192.0.2.1:5000/v2.0/", "rel": "self"}, {"href": "http://docs.openstack.org/", "type": "text/html", "rel": "describedby"}]}} DEBUG (v2) Making authentication request to http://192.0.2.1:5000/v2.0/tokens DEBUG (connectionpool) "POST /v2.0/tokens HTTP/1.1" 200 3493 DEBUG (session) REQ: curl -g -i -X DELETE http://192.0.2.1:8004/v1/f49fdcf77d814ed89d2c75b5f5c20bf0/stacks/overcloud -H "User-Agent: python-heatclient" -H "Content-Type: application/octet-stream" -H "X-Auth-Url: http://192.0.2.1:5000/v2.0" -H "X-Auth-Token: {SHA1}c789e865d86eee83e7f528ae15f882232b2fc181" INFO (connectionpool) Starting new HTTP connection (1): 192.0.2.1 DEBUG (connectionpool) "DELETE /v1/f49fdcf77d814ed89d2c75b5f5c20bf0/stacks/overcloud HTTP/1.1" 302 405 DEBUG (session) RESP: [302] content-length: 405 connection: keep-alive location: http://192.0.2.1:8004/v1/f49fdcf77d814ed89d2c75b5f5c20bf0/stacks/overcloud/d9e1b7bb-a749-4085-a547-6a36556e9d29 date: Mon, 15 Feb 2016 11:22:43 GMT content-type: text/html; charset=UTF-8 x-openstack-request-id: req-2b7244e3-f525-4f4c-8efe-953c6afd4b07 RESP BODY: <html> <head> <title>302 Found</title> </head> <body> <h1>302 Found</h1> The resource was found at <a href="http://192.0.2.1:8004/v1/f49fdcf77d814ed89d2c75b5f5c20bf0/stacks/overcloud/d9e1b7bb-a749-4085-a547-6a36556e9d29">http://192.0.2.1:8004/v1/f49fdcf77d814ed89d2c75b5f5c20bf0/stacks/overcloud/d9e1b7bb-a749-4085-a547-6a36556e9d29</a>; you should be redirected automatically. </body> </html> DEBUG (connectionpool) "DELETE /v1/f49fdcf77d814ed89d2c75b5f5c20bf0/stacks/overcloud/d9e1b7bb-a749-4085-a547-6a36556e9d29 HTTP/1.1" 409 1098 DEBUG (session) RESP: Traceback (most recent call last): File "/bin/heat", line 10, in <module> sys.exit(main()) File "/usr/lib/python2.7/site-packages/heatclient/shell.py", line 705, in main HeatShell().main(args) File "/usr/lib/python2.7/site-packages/heatclient/shell.py", line 655, in main args.func(client, args) File "/usr/lib/python2.7/site-packages/heatclient/v1/shell.py", line 289, in do_stack_delete hc.stacks.delete(**fields) File "/usr/lib/python2.7/site-packages/heatclient/v1/stacks.py", line 146, in delete self._delete("/stacks/%s" % stack_id) File "/usr/lib/python2.7/site-packages/heatclient/openstack/common/apiclient/base.py", line 224, in _delete return self.client.delete(url) File "/usr/lib/python2.7/site-packages/heatclient/common/http.py", line 300, in delete return self.raw_request("DELETE", url, **kwargs) File "/usr/lib/python2.7/site-packages/heatclient/common/http.py", line 281, in raw_request return self._http_request(url, method, **kwargs) File "/usr/lib/python2.7/site-packages/heatclient/common/http.py", line 363, in _http_request raise exc.from_response(resp) heatclient.exc.HTTPConflict: ERROR: Stack overcloud already has an action (CREATE) in progress. $ heat stack-list +--------------------------------------+------------+--------------------+----------------------+ | id | stack_name | stack_status | creation_time | +--------------------------------------+------------+--------------------+----------------------+ | d9e1b7bb-a749-4085-a547-6a36556e9d29 | overcloud | CREATE_IN_PROGRESS | 2016-02-15T10:47:38Z | +--------------------------------------+------------+--------------------+----------------------+
The stack was stuck at CREATE_IN_PROGRESS even after I nova deleted and ironic node-deleted all the nodes. only a reboot to the undercloud machine freed it.
Here's what the log shows: 2016-02-15 13:21:54.843 3538 INFO heat.engine.service [req-64025314-8e6d-4f53-bb 06-f7adb9a967ab 793c5d1bf48e4a20bb9172c5c8ebc765 f49fdcf77d814ed89d2c75b5f5c20bf 0] Deleting stack overcloud ... 2016-02-15 13:21:54.889 3538 DEBUG heat.engine.service [req-64025314-8e6d-4f53-bb06-f7adb9a967ab 793c5d1bf48e4a20bb9172c5c8ebc765 f49fdcf77d814ed89d2c75b5f5c20bf0] Successfully stopped remote task on engine e388c109-d2d7-4850-be82-ad54606134f0 delete_stack /usr/lib/python2.7/site-packages/heat/engine/service.py:952 ... 2016-02-15 13:21:54.937 3538 DEBUG heat.engine.stack_lock [req-64025314-8e6d-4f53-bb06-f7adb9a967ab 793c5d1bf48e4a20bb9172c5c8ebc765 f49fdcf77d814ed89d2c75b5f5c20bf0] Lock on stack d9e1b7bb-a749-4085-a547-6a36556e9d29 is owned by engine e388c109-d2d7-4850-be82-ad54606134f0 acquire /usr/lib/python2.7/site-packages/heat/engine/stack_lock.py:87 2016-02-15 13:21:54.938 3538 DEBUG oslo_messaging.rpc.dispatcher [req-64025314-8e6d-4f53-bb06-f7adb9a967ab 793c5d1bf48e4a20bb9172c5c8ebc765 f49fdcf77d814ed89d2c75b5f5c20bf0] Expected exception during message handling (Stack overcloud already has an action (CREATE) in progress.) _dispatch_and_reply /usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py:145 So it claims to have stopped the remote task, which should result in the lock being removed, but the lock still exists when we go to acquire it again in order to do the delete.
I have seen this error message upstream on at least one gate test: http://logs.openstack.org/31/278831/1/check/gate-heat-dsvm-functional-orig-mysql/908bbfd/console.html.gz The test was for a stable/liberty backport of code that only touched the update patch, not create or delete, so it wasn't caused by that particular patch (https://review.openstack.org/#/c/278831). That one looks like it may have been caused by a DB transaction rollback though, and there's nothing similar here.
OK, after reading the description and the log more carefully I think I see what happened here. At some point during the stack creation, Heat lost access to the database, causing a bunch of errors, some of the unexpected variety. Heat is now actually pretty good at cleaning up after an unexpected error - where cleaning up means releasing the lock (in the database) and setting the stack state back to FAILED (in the database). There's pretty much nothing it can do about not being able to access the database though. Anyway, once that has bombed out (and the database has started up again) we're left in the state where the resource is stuck IN_PROGRESS but there's no operation running on it, the lock is still present, and the engine that owns the lock is still alive. It's fairly easy to recover from this situation by just rebooting heat-engine - the engine owning the lock going down will make other engines willing to steal it again, and a new engine starting up will locate any IN_PROGRESS stacks with stealable locks and set them back to FAILED. I filed https://bugs.launchpad.net/heat/+bug/1555840 upstream, but I think it's safe to say that this is unlikely to get fixed in RHOS 7. The HA architecture should largely prevent this from happening on the overcloud, and it's also fairly unlikely to begin with.
Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.