Bug 1265676 - Cannot signal resource during DELETE
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-heat
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: unspecified
Target Milestone: z2
Target Release: 7.0 (Kilo)
Assigned To: Zane Bitter
QA Contact: Alexander Chuzhoy
Keywords: Triaged, ZStream
Depends On:
Blocks:
Reported: 2015-09-23 09:10 EDT by Jan Provaznik
Modified: 2016-04-26 13:37 EDT (History)

See Also:
Fixed In Version: openstack-heat-2015.1.1-5.el7ost
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-10-08 08:21:13 EDT
Type: Bug


Description Jan Provaznik 2015-09-23 09:10:45 EDT
Description of problem:
If RHEL registration is used on Overcloud nodes, the nodes are unregistered when they are deleted. The problem is that the Heat resource representing the unregistration gets stuck in DELETE_IN_PROGRESS and never finishes, even though the logs on the overcloud node say that unregistration finished successfully and a signal was sent back to Heat. If I try to signal the stuck resource manually, I get an error in the resource's events:
| 1             | fe1abf6f-1cd7-4397-af8d-0700d66be3bd | Cannot signal resource during DELETE | DELETE_IN_PROGRESS | 2015-09-23T12:41:15Z |




Steps to Reproduce:
1. openstack overcloud deploy --templates --rhel-reg --reg-method portal --reg-org xxx --reg-activation-key 'key' --compute-scale 2
2. openstack overcloud node delete --templates --stack overcloud 86894343-a032-45f2-a619-122a23a3d5ba

Actual results:
The stack is stuck in UPDATE_IN_PROGRESS:
| 9f94dcbb-d554-4143-b258-4d89230c0ea2 | overcloud  | UPDATE_IN_PROGRESS | 2015-09-23T11:32:43Z |

[stack@instack ~]$ heat resource-list -n 5 overcloud |grep -i PROGRE
| 1                                           | 88db3645-3c85-4b10-8387-a8df0e9336a1          | OS::Heat::StructuredDeployment                    | DELETE_IN_PROGRESS | 2015-09-23T11:57:35Z | RHELUnregistrationDeployment                |
| ComputeNodesPostDeployment                  | 40d50505-8b49-4c7a-9269-54104000a1c3          | OS::TripleO::ComputePostDeployment                | UPDATE_IN_PROGRESS | 2015-09-23T12:29:11Z |                                             |
| ExtraConfig                                 | 6e6b6f55-338b-4bae-9e43-37d9bb4ed790          | OS::TripleO::NodeExtraConfigPost                  | UPDATE_IN_PROGRESS | 2015-09-23T12:30:02Z | ComputeNodesPostDeployment                  |
| RHELUnregistrationDeployment                | 9ae1cde3-c0fc-426d-8464-fcf01c937eb5          | OS::Heat::StructuredDeployments                   | UPDATE_IN_PROGRESS | 2015-09-23T12:30:06Z | ExtraConfig                                 |
[stack@instack ~]$ heat deployment-show 9ae1cde3-c0fc-426d-8464-fcf01c937eb5
Deployment not found: 9ae1cde3-c0fc-426d-8464-fcf01c937eb5
[stack@instack ~]$ heat deployment-show 88db3645-3c85-4b10-8387-a8df0e9336a1
{
  "status": "IN_PROGRESS",
  "server_id": "86894343-a032-45f2-a619-122a23a3d5ba",
  "config_id": "a9584777-1b1f-4f84-a373-8c15bc3286da",
  "output_values": null,
  "creation_time": "2015-09-23T12:30:09Z",
  "input_values": {},
  "action": "DELETE",
  "status_reason": "Deploy data available",
  "id": "88db3645-3c85-4b10-8387-a8df0e9336a1"
}

[stack@instack ~]$ heat resource-signal 9ae1cde3-c0fc-426d-8464-fcf01c937eb5 1
[stack@instack ~]$ heat deployment-show 88db3645-3c85-4b10-8387-a8df0e9336a1
{
  "status": "IN_PROGRESS", 
  "server_id": "86894343-a032-45f2-a619-122a23a3d5ba", 
  "config_id": "a9584777-1b1f-4f84-a373-8c15bc3286da", 
  "output_values": null, 
  "creation_time": "2015-09-23T12:30:09Z", 
  "input_values": {}, 
  "action": "DELETE",
  "status_reason": "Deploy data available",
  "id": "88db3645-3c85-4b10-8387-a8df0e9336a1"
}
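
The stuck state visible in the JSON above can be checked mechanically. The following is a minimal sketch, assuming the `heat deployment-show` output has been captured as text; the helper name `is_waiting_for_delete_signal` is hypothetical and is not part of heatclient.

```python
import json

# Output of `heat deployment-show`, copied from the report above.
DEPLOYMENT_JSON = """
{
  "status": "IN_PROGRESS",
  "server_id": "86894343-a032-45f2-a619-122a23a3d5ba",
  "config_id": "a9584777-1b1f-4f84-a373-8c15bc3286da",
  "output_values": null,
  "creation_time": "2015-09-23T12:30:09Z",
  "input_values": {},
  "action": "DELETE",
  "status_reason": "Deploy data available",
  "id": "88db3645-3c85-4b10-8387-a8df0e9336a1"
}
"""


def is_waiting_for_delete_signal(deployment):
    """Hypothetical helper: True if a DELETE deployment has not completed yet."""
    return (
        deployment.get("action") == "DELETE"
        and deployment.get("status") == "IN_PROGRESS"
        and deployment.get("output_values") is None
    )


deployment = json.loads(DEPLOYMENT_JSON)
print(is_waiting_for_delete_signal(deployment))
```

Running the same check before and after `heat resource-signal` would show whether the signal changed anything; in this report the two outputs are identical.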



[stack@instack ~]$ heat event-list -r 1 9ae1cde3-c0fc-426d-8464-fcf01c937eb5
+---------------+--------------------------------------+--------------------------------------+--------------------+----------------------+
| resource_name | id                                   | resource_status_reason               | resource_status    | event_time           |
+---------------+--------------------------------------+--------------------------------------+--------------------+----------------------+
| 1             | 7dc9a75c-960d-4137-8fee-712b17b05caa | state changed                        | CREATE_IN_PROGRESS | 2015-09-23T11:57:35Z |
| 1             | e19fe81b-fd50-4a26-9065-c9ecd1a83c1f | state changed                        | CREATE_COMPLETE    | 2015-09-23T11:57:37Z |
| 1             | 33c20b17-7886-4afc-a144-8facf8ded393 | state changed                        | DELETE_IN_PROGRESS | 2015-09-23T12:30:09Z |
| 1             | fe1abf6f-1cd7-4397-af8d-0700d66be3bd | Cannot signal resource during DELETE | DELETE_IN_PROGRESS | 2015-09-23T12:41:15Z |
+---------------+--------------------------------------+--------------------------------------+--------------------+----------------------+
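
To pick the rejected signal out of a longer event list, the table can be parsed as text. This is only a sketch, assuming the pipe-delimited layout shown above; `parse_cli_table` is a hypothetical helper, and the rows are copied from the report with the column padding trimmed.

```python
# Event table from `heat event-list`, copied from the report (padding trimmed).
EVENT_TABLE = """
+---+---+---+---+---+
| resource_name | id | resource_status_reason | resource_status | event_time |
+---+---+---+---+---+
| 1 | 7dc9a75c-960d-4137-8fee-712b17b05caa | state changed | CREATE_IN_PROGRESS | 2015-09-23T11:57:35Z |
| 1 | e19fe81b-fd50-4a26-9065-c9ecd1a83c1f | state changed | CREATE_COMPLETE | 2015-09-23T11:57:37Z |
| 1 | 33c20b17-7886-4afc-a144-8facf8ded393 | state changed | DELETE_IN_PROGRESS | 2015-09-23T12:30:09Z |
| 1 | fe1abf6f-1cd7-4397-af8d-0700d66be3bd | Cannot signal resource during DELETE | DELETE_IN_PROGRESS | 2015-09-23T12:41:15Z |
+---+---+---+---+---+
"""


def parse_cli_table(text):
    """Hypothetical helper: turn a pipe-delimited CLI table into a list of dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +---+ border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first data line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


rows = parse_cli_table(EVENT_TABLE)
rejected = [
    row for row in rows
    if row["resource_status_reason"].startswith("Cannot signal resource")
]
for row in rejected:
    print(row["id"], row["resource_status"], row["event_time"])
```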

Expected results:
The update completes.

Additional info:
openstack-heat-engine-2015.1.1-4.el7ost.noarch
Comment 2 Steven Hardy 2015-09-23 09:21:24 EDT
So, this was previously fixed via:

https://bugs.launchpad.net/heat/+bug/1444087

So, this must've worked at one point on kilo.

I've also proven it works with trunk Liberty RDO:

http://paste.openstack.org/show/473762/

So, I'm guessing we've backported a regression, but I don't yet know what.
Comment 3 Steven Hardy 2015-09-23 09:24:04 EDT
Note, for the minimal reproducer in the example above to work on kilo, you either need these two patches:

https://review.openstack.org/#/c/199652/

https://review.openstack.org/#/c/221656

Or, you need to pass an actual server reference in via the servers property ;)
Comment 4 Jan Provaznik 2015-09-23 11:43:08 EDT
I tried the patches above:
https://review.openstack.org/#/c/199652/
https://review.openstack.org/#/c/221656

and also:
https://review.openstack.org/#/c/225537/

I can now manually signal the unregistration deployment, and after this manual signal the stack-update completes. But without signalling the resource explicitly, the stack is stuck on this resource:

[stack@instack ~]$ heat resource-list -n 5 overcloud |grep PROG
| 1                                           | a2884c7d-1c02-4f5e-83ec-bbfa97891852          | OS::Heat::StructuredDeployment                    | DELETE_IN_PROGRESS | 2015-09-23T15:22:09Z | RHELUnregistrationDeployment                |
| ComputeNodesPostDeployment                  | ea6fbd88-cc95-4230-9233-3f2b9c3a3b34          | OS::TripleO::ComputePostDeployment                | UPDATE_IN_PROGRESS | 2015-09-23T15:30:43Z |                                             |
| ExtraConfig                                 | dfcf7122-64c2-46d1-93d0-dff65880dc21          | OS::TripleO::NodeExtraConfigPost                  | UPDATE_IN_PROGRESS | 2015-09-23T15:31:31Z | ComputeNodesPostDeployment                  |
| RHELUnregistrationDeployment                | 6b9e3d82-28a8-4972-8220-3df567310770          | OS::Heat::StructuredDeployments                   | UPDATE_IN_PROGRESS | 2015-09-23T15:31:35Z | ExtraConfig                                 |


So the signal itself is not the only problem. It may be important that the node (Nova instance) is actually deleted properly: the instance no longer exists while the deployment is still waiting in DELETE_IN_PROGRESS. AIUI that should not happen (if unregistration fails, the instance should not be deleted), so maybe the dependencies between the resources defined in THT are wrong?
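
The dependency question can be illustrated with a small sketch. It assumes that resources are deleted in the reverse of their creation (dependency) order, so a deployment that depends on its server is torn down while the server still exists. The dependency map and the name `NovaCompute` are illustrative assumptions, not taken from the THT templates, and `delete_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical, simplified dependency map: each resource maps to the set of
# resources it depends on. The names are illustrative only.
depends_on = {
    "NovaCompute": set(),
    "RHELUnregistrationDeployment": {"NovaCompute"},
}


def delete_order(depends_on):
    """Return resources in deletion order: the reverse of creation order."""
    create_order = list(TopologicalSorter(depends_on).static_order())
    return list(reversed(create_order))


order = delete_order(depends_on)
print(order)
```

With the dependency declared, the deployment is deleted before the server. If the dependency were missing, nothing would constrain the order, which would be consistent with the instance disappearing while the deployment still waits for its signal.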
Comment 5 Zane Bitter 2015-09-23 13:09:37 EDT
Jan found https://review.openstack.org/#/c/225537/ (https://bugs.launchpad.net/heat/+bug/1458095), which seems almost certain to be the fix.

That would make https://review.openstack.org/#/c/166914/ the cause. Its commit message states "All resources, which have handle_signal, check condition if action in DELETE or SUSPEND state.", which is demonstrably incorrect:

https://review.openstack.org/gitweb?p=openstack/heat.git;a=blob;f=heat/engine/resources/openstack/heat/software_deployment.py;h=590047ddd5cb98b993e5d1f4286ef85f2ac23e8e;hb=664f70383bfdd704322e136de226e9627c48d1a7#l533

It also unintentionally changes the logic with respect to hooks, so it wasn't a great patch all round. That patch was actually merged before the fix for https://bugs.launchpad.net/heat/+bug/1444087, so it's likely that this has never worked on kilo.
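
The difference between the two behaviours can be sketched as follows. This is not Heat's code and not the actual patch; the class and function names are hypothetical. It only illustrates why a blanket "no signals during DELETE or SUSPEND" check breaks a resource that relies on a signal to finish its own DELETE.

```python
class SignalRejected(Exception):
    """Raised when a signal arrives in a state that does not accept it."""


class FakeDeployment:
    """Hypothetical stand-in for a software deployment resource."""

    def __init__(self, action, status="IN_PROGRESS"):
        self.action = action
        self.status = status

    def handle_signal(self, details=None):
        # The deployment waits for this signal to finish its current action,
        # including DELETE.
        self.status = "COMPLETE"
        return self.status


def signal_with_blanket_guard(resource, details=None):
    """Reject every signal during DELETE or SUSPEND (the behaviour at fault)."""
    if resource.action in ("DELETE", "SUSPEND"):
        raise SignalRejected("Cannot signal resource during %s" % resource.action)
    return resource.handle_signal(details)


def signal_delegating(resource, details=None):
    """Let the resource's own handle_signal decide what to do."""
    return resource.handle_signal(details)


stuck = FakeDeployment(action="DELETE")
try:
    signal_with_blanket_guard(stuck)
except SignalRejected as exc:
    print(exc, "->", stuck.status)

print(signal_delegating(stuck), "->", stuck.status)
```

Under the blanket guard the deployment stays IN_PROGRESS indefinitely, which matches the symptom in the description; when the decision is left to the resource, the same signal lets the DELETE complete.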
Comment 8 Alexander Chuzhoy 2015-10-02 16:55:28 EDT
Verified:

Environment:
openstack-heat-common-2015.1.1-5.el7ost.noarch

Deployed while registering the nodes via the portal (with --rhel-reg --reg-method portal).

Deleted the node successfully with: openstack overcloud node delete --templates --stack overcloud <UUID>

Also verified that the node was unregistered from RHN.
Comment 10 errata-xmlrpc 2015-10-08 08:21:13 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:1865
