Bug 1308517 - heat stack-delete fails with already has an action (CREATE) in progress
heat stack-delete fails with already has an action (CREATE) in progress
Status: CLOSED WONTFIX
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-heat (Show other bugs)
7.0 (Kilo)
x86_64 Linux
unspecified Severity urgent
: ---
: 7.0 (Kilo)
Assigned To: Zane Bitter
Amit Ugol
: ZStream
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2016-02-15 07:52 EST by Amit Ugol
Modified: 2016-04-26 15:44 EDT (History)
5 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-10 17:55:57 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)


External Trackers
Tracker ID Priority Status Summary Last Updated
Launchpad 1555840 None None None 2016-03-10 17:43 EST

  None (edit)
Description Amit Ugol 2016-02-15 07:52:40 EST
Description of problem:
heat stack-delete fails with:
ERROR: Stack overcloud already has an action (CREATE) in progress.

Version-Release number of selected component (if applicable):
openstack-heat-engine-2015.1.2-9
puddle 2016-02-12.1


How reproducible:
Rare

logs are here (you need lrzuntar to open):

http://ikook.tlv.redhat.com/unrelated/files/report.tar.lrz

Steps to Reproduce:
(first was a configuration issue that resulted in a timeout)
$ heat stack-list 
+--------------------------------------+------------+---------------+----------------------+
| id                                   | stack_name | stack_status  | creation_time        |
+--------------------------------------+------------+---------------+----------------------+
| 05cad2a7-582c-47e4-990d-6919a2a9394e | overcloud  | CREATE_FAILED | 2016-02-14T12:06:22Z |
+--------------------------------------+------------+---------------+----------------------+




$ heat stack-delete overcloud
+--------------------------------------+------------+--------------------+----------------------+
| id                                   | stack_name | stack_status       | creation_time        |
+--------------------------------------+------------+--------------------+----------------------+
| 05cad2a7-582c-47e4-990d-6919a2a9394e | overcloud  | DELETE_IN_PROGRESS | 2016-02-14T12:06:22Z |
+--------------------------------------+------------+--------------------+----------------------+




$ heat stack-list
+----+------------+--------------+---------------+
| id | stack_name | stack_status | creation_time |
+----+------------+--------------+---------------+
+----+------------+--------------+---------------+




openstack overcloud deploy \
    --templates \
    -e ${THT}/environments/network-isolation-v6.yaml \
    -e /home/stack/network-environment-v6.yaml \
    -e /home/stack/ssl-heat-templates/environments/enable-tls.yaml \
    -e /home/stack/ssl-heat-templates/environments/inject-trust-anchor.yaml \
    --control-scale 3 \
    --compute-scale 2 \
    --ceph-storage-scale 1 \
    --neutron-disable-tunneling \
    --neutron-network-type vlan \
    --neutron-network-vlan-ranges datacentre:50:60 \
    --ntp-server clock.redhat.com \
    --debug &> /home/stack/deployment.log  <-- attached




$ nova list
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks            |
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
| e4951939-5141-464c-89b0-ac689b8a3616 | overcloud-cephstorage-0 | BUILD  | spawning   | NOSTATE     | ctlplane=192.0.2.14 |
| ee58e70a-00f4-44fd-85ca-4366cb58f3b4 | overcloud-compute-0     | BUILD  | spawning   | NOSTATE     | ctlplane=192.0.2.18 |
| 52f90063-2caf-4140-bea4-73291afdc0a5 | overcloud-compute-1     | BUILD  | spawning   | NOSTATE     | ctlplane=192.0.2.16 |
| ea04e681-e786-40e4-8fb9-4c3c7c06cde5 | overcloud-controller-0  | BUILD  | spawning   | NOSTATE     | ctlplane=192.0.2.17 |
| c2ed8f12-f3c5-407e-af4e-fe6245b7dcf4 | overcloud-controller-1  | BUILD  | spawning   | NOSTATE     | ctlplane=192.0.2.15 |
| 7387d5fb-2631-41ab-80f1-5c65860070ef | overcloud-controller-2  | BUILD  | spawning   | NOSTATE     | ctlplane=192.0.2.19 |
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+


After a while...


$ ironic node-list
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+
| UUID                                 | Name | Instance UUID                        | Power State | Provision State | Maintenance |
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+
| 30eb4f37-c59a-4fd7-9b6f-302f454cf511 | None | ea04e681-e786-40e4-8fb9-4c3c7c06cde5 | power off   | deploy failed   | False       |
| 616e603f-7ef5-41f6-b068-a3c0bc529c70 | None | c2ed8f12-f3c5-407e-af4e-fe6245b7dcf4 | power on    | active          | False       |
| 80d4f68d-5e85-4d44-881d-dc97961bfce9 | None | e4951939-5141-464c-89b0-ac689b8a3616 | power on    | active          | False       |
| 1e7325f0-d41d-4762-9946-2adf403fec5e | None | ee58e70a-00f4-44fd-85ca-4366cb58f3b4 | power on    | active          | False       |
| c1abe5f5-d3d5-41bc-bb79-27e701ba8ab0 | None | 52f90063-2caf-4140-bea4-73291afdc0a5 | power on    | active          | False       |
| cbba9d9d-73d2-44a2-8bda-30c0d5851c62 | None | None                                 | power off   | available       | False       |
| eaf1bfe0-64f1-4811-906f-10af10b66866 | None | None                                 | power off   | available       | False       |
| 69d710fe-3722-476c-b5e3-42c0a665e161 | None | 7387d5fb-2631-41ab-80f1-5c65860070ef | power on    | active          | False       |
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+





$ heat --debug stack-delete overcloud
DEBUG (session) REQ: curl -g -i -X GET http://192.0.2.1:5000/v2.0 -H "Accept: application/json" -H "User-Agent: python-keystoneclient"
INFO (connectionpool) Starting new HTTP connection (1): 192.0.2.1
DEBUG (connectionpool) "GET /v2.0 HTTP/1.1" 200 335
DEBUG (session) RESP: [200] content-length: 335 vary: X-Auth-Token connection: keep-alive date: Mon, 15 Feb 2016 11:22:42 GMT content-type: application/json x-openstack-request-id: req-b2f9026f-24f0-42f8-a129-77946aad4c9b 
RESP BODY: {"version": {"status": "stable", "updated": "2014-04-17T00:00:00Z", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v2.0+json"}], "id": "v2.0", "links": [{"href": "http://192.0.2.1:5000/v2.0/", "rel": "self"}, {"href": "http://docs.openstack.org/", "type": "text/html", "rel": "describedby"}]}}

DEBUG (v2) Making authentication request to http://192.0.2.1:5000/v2.0/tokens
DEBUG (connectionpool) "POST /v2.0/tokens HTTP/1.1" 200 3493
DEBUG (session) REQ: curl -g -i -X DELETE http://192.0.2.1:8004/v1/f49fdcf77d814ed89d2c75b5f5c20bf0/stacks/overcloud -H "User-Agent: python-heatclient" -H "Content-Type: application/octet-stream" -H "X-Auth-Url: http://192.0.2.1:5000/v2.0" -H "X-Auth-Token: {SHA1}c789e865d86eee83e7f528ae15f882232b2fc181"
INFO (connectionpool) Starting new HTTP connection (1): 192.0.2.1
DEBUG (connectionpool) "DELETE /v1/f49fdcf77d814ed89d2c75b5f5c20bf0/stacks/overcloud HTTP/1.1" 302 405
DEBUG (session) RESP: [302] content-length: 405 connection: keep-alive location: http://192.0.2.1:8004/v1/f49fdcf77d814ed89d2c75b5f5c20bf0/stacks/overcloud/d9e1b7bb-a749-4085-a547-6a36556e9d29 date: Mon, 15 Feb 2016 11:22:43 GMT content-type: text/html; charset=UTF-8 x-openstack-request-id: req-2b7244e3-f525-4f4c-8efe-953c6afd4b07 
RESP BODY: <html>
 <head>
  <title>302 Found</title>
 </head>
 <body>
  <h1>302 Found</h1>
  The resource was found at <a href="http://192.0.2.1:8004/v1/f49fdcf77d814ed89d2c75b5f5c20bf0/stacks/overcloud/d9e1b7bb-a749-4085-a547-6a36556e9d29">http://192.0.2.1:8004/v1/f49fdcf77d814ed89d2c75b5f5c20bf0/stacks/overcloud/d9e1b7bb-a749-4085-a547-6a36556e9d29</a>;
you should be redirected automatically.


 </body>
</html>

DEBUG (connectionpool) "DELETE /v1/f49fdcf77d814ed89d2c75b5f5c20bf0/stacks/overcloud/d9e1b7bb-a749-4085-a547-6a36556e9d29 HTTP/1.1" 409 1098
DEBUG (session) RESP:
Traceback (most recent call last):
  File "/bin/heat", line 10, in <module>
    sys.exit(main())
  File "/usr/lib/python2.7/site-packages/heatclient/shell.py", line 705, in main
    HeatShell().main(args)
  File "/usr/lib/python2.7/site-packages/heatclient/shell.py", line 655, in main
    args.func(client, args)
  File "/usr/lib/python2.7/site-packages/heatclient/v1/shell.py", line 289, in do_stack_delete
    hc.stacks.delete(**fields)
  File "/usr/lib/python2.7/site-packages/heatclient/v1/stacks.py", line 146, in delete
    self._delete("/stacks/%s" % stack_id)
  File "/usr/lib/python2.7/site-packages/heatclient/openstack/common/apiclient/base.py", line 224, in _delete
    return self.client.delete(url)
  File "/usr/lib/python2.7/site-packages/heatclient/common/http.py", line 300, in delete
    return self.raw_request("DELETE", url, **kwargs)
  File "/usr/lib/python2.7/site-packages/heatclient/common/http.py", line 281, in raw_request
    return self._http_request(url, method, **kwargs)
  File "/usr/lib/python2.7/site-packages/heatclient/common/http.py", line 363, in _http_request
    raise exc.from_response(resp)
heatclient.exc.HTTPConflict: ERROR: Stack overcloud already has an action (CREATE) in progress.





$ heat stack-list
+--------------------------------------+------------+--------------------+----------------------+
| id                                   | stack_name | stack_status       | creation_time        |
+--------------------------------------+------------+--------------------+----------------------+
| d9e1b7bb-a749-4085-a547-6a36556e9d29 | overcloud  | CREATE_IN_PROGRESS | 2016-02-15T10:47:38Z |
+--------------------------------------+------------+--------------------+----------------------+
Comment 1 Amit Ugol 2016-02-15 08:13:01 EST
The stack was stuck at CREATE_IN_PROGRESS even after I nova deleted and ironic node-deleted all the nodes. only a reboot to the undercloud machine freed it.
Comment 2 Zane Bitter 2016-02-15 12:59:39 EST
Here's what the log shows:

2016-02-15 13:21:54.843 3538 INFO heat.engine.service [req-64025314-8e6d-4f53-bb
06-f7adb9a967ab 793c5d1bf48e4a20bb9172c5c8ebc765 f49fdcf77d814ed89d2c75b5f5c20bf
0] Deleting stack overcloud
...
2016-02-15 13:21:54.889 3538 DEBUG heat.engine.service [req-64025314-8e6d-4f53-bb06-f7adb9a967ab 793c5d1bf48e4a20bb9172c5c8ebc765 f49fdcf77d814ed89d2c75b5f5c20bf0] Successfully stopped remote task on engine e388c109-d2d7-4850-be82-ad54606134f0 delete_stack /usr/lib/python2.7/site-packages/heat/engine/service.py:952
...
2016-02-15 13:21:54.937 3538 DEBUG heat.engine.stack_lock [req-64025314-8e6d-4f53-bb06-f7adb9a967ab 793c5d1bf48e4a20bb9172c5c8ebc765 f49fdcf77d814ed89d2c75b5f5c20bf0] Lock on stack d9e1b7bb-a749-4085-a547-6a36556e9d29 is owned by engine e388c109-d2d7-4850-be82-ad54606134f0 acquire /usr/lib/python2.7/site-packages/heat/engine/stack_lock.py:87
2016-02-15 13:21:54.938 3538 DEBUG oslo_messaging.rpc.dispatcher [req-64025314-8e6d-4f53-bb06-f7adb9a967ab 793c5d1bf48e4a20bb9172c5c8ebc765 f49fdcf77d814ed89d2c75b5f5c20bf0] Expected exception during message handling (Stack overcloud already has an action (CREATE) in progress.) _dispatch_and_reply /usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py:145

So it claims to have stopped the remote task, which should result in the lock being removed, but the lock still exists when we go to acquire it again in order to do the delete.
Comment 3 Zane Bitter 2016-02-17 18:33:36 EST
I have seen this error message upstream on at least one gate test:

http://logs.openstack.org/31/278831/1/check/gate-heat-dsvm-functional-orig-mysql/908bbfd/console.html.gz

The test was for a stable/liberty backport of code that only touched the update patch, not create or delete, so it wasn't caused by that particular patch (https://review.openstack.org/#/c/278831).

That one looks like it may have been caused by a DB transaction rollback though, and there's nothing similar here.
Comment 4 Zane Bitter 2016-03-10 17:43:49 EST
OK, after reading the description and the log more carefully I think I see what happened here. At some point during the stack creation, Heat lost access to the database, causing a bunch of errors, some of the unexpected variety. Heat is now actually pretty good at cleaning up after an unexpected error - where cleaning up means releasing the lock (in the database) and setting the stack state back to FAILED (in the database). There's pretty much nothing it can do about not being able to access the database though.

Anyway, once that has bombed out (and the database has started up again) we're left in the state where the resource is stuck IN_PROGRESS but there's no operation running on it, the lock is still present, and the engine that owns the lock is still alive. It's fairly easy to recover from this situation by just rebooting heat-engine - the engine owning the lock going down will make other engines willing to steal it again, and a new engine starting up will locate any IN_PROGRESS stacks with stealable locks and set them back to FAILED.

I filed https://bugs.launchpad.net/heat/+bug/1555840 upstream, but I think it's safe to say that this is unlikely to get fixed in RHOS 7. The HA architecture should largely prevent this from happening on the overcloud, and it's also fairly unlikely to begin with.
Comment 5 RHEL Product and Program Management 2016-03-10 17:55:57 EST
Development Management has reviewed and declined this request.
You may appeal this decision by reopening this request.

Note You need to log in before you can comment on or make changes to this bug.