
Bug 1265676

Summary: Cannot signal resource during DELETE
Product: Red Hat OpenStack
Reporter: Jan Provaznik <jprovazn>
Component: openstack-heat
Assignee: Zane Bitter <zbitter>
Status: CLOSED ERRATA
QA Contact: Alexander Chuzhoy <sasha>
Severity: unspecified
Priority: urgent
Version: 7.0 (Kilo)
CC: kbasil, mburns, ohochman, rhel-osp-director-maint, sasha, sbaker, shardy, yeylon
Target Milestone: z2
Keywords: Triaged, ZStream
Target Release: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-heat-2015.1.1-5.el7ost
Doc Type: Bug Fix
Last Closed: 2015-10-08 12:21:13 UTC
Type: Bug

Description Jan Provaznik 2015-09-23 13:10:45 UTC
Description of problem:
If RHEL registration is used on Overcloud nodes, the nodes are unregistered when they are deleted. The problem is that the heat resource representing unregistration gets stuck in DELETE_IN_PROGRESS and never finishes, even though the logs on the OC node say that unregistration finished successfully and a signal was sent back to heat. If I try to signal the stuck resource manually, I get an error in the resource's events:
| 1             | fe1abf6f-1cd7-4397-af8d-0700d66be3bd | Cannot signal resource during DELETE | DELETE_IN_PROGRESS | 2015-09-23T12:41:15Z |




Steps to Reproduce:
1. openstack overcloud deploy --templates --rhel-reg --reg-method portal --reg-org xxx --reg-activation-key 'key' --compute-scale 2
2. openstack overcloud node delete --templates --stack overcloud 86894343-a032-45f2-a619-122a23a3d5ba

Actual results:
The stack is stuck in UPDATE_IN_PROGRESS:
| 9f94dcbb-d554-4143-b258-4d89230c0ea2 | overcloud  | UPDATE_IN_PROGRESS | 2015-09-23T11:32:43Z |

[stack@instack ~]$ heat resource-list -n 5 overcloud |grep -i PROGRE
| 1                                           | 88db3645-3c85-4b10-8387-a8df0e9336a1          | OS::Heat::StructuredDeployment                    | DELETE_IN_PROGRESS | 2015-09-23T11:57:35Z | RHELUnregistrationDeployment                |
| ComputeNodesPostDeployment                  | 40d50505-8b49-4c7a-9269-54104000a1c3          | OS::TripleO::ComputePostDeployment                | UPDATE_IN_PROGRESS | 2015-09-23T12:29:11Z |                                             |
| ExtraConfig                                 | 6e6b6f55-338b-4bae-9e43-37d9bb4ed790          | OS::TripleO::NodeExtraConfigPost                  | UPDATE_IN_PROGRESS | 2015-09-23T12:30:02Z | ComputeNodesPostDeployment                  |
| RHELUnregistrationDeployment                | 9ae1cde3-c0fc-426d-8464-fcf01c937eb5          | OS::Heat::StructuredDeployments                   | UPDATE_IN_PROGRESS | 2015-09-23T12:30:06Z | ExtraConfig                                 |
[stack@instack ~]$ heat deployment-show 9ae1cde3-c0fc-426d-8464-fcf01c937eb5
Deployment not found: 9ae1cde3-c0fc-426d-8464-fcf01c937eb5
[stack@instack ~]$ heat deployment-show 88db3645-3c85-4b10-8387-a8df0e9336a1
{
  "status": "IN_PROGRESS",
  "server_id": "86894343-a032-45f2-a619-122a23a3d5ba",
  "config_id": "a9584777-1b1f-4f84-a373-8c15bc3286da",
  "output_values": null,
  "creation_time": "2015-09-23T12:30:09Z",
  "input_values": {},
  "action": "DELETE",
  "status_reason": "Deploy data available",
  "id": "88db3645-3c85-4b10-8387-a8df0e9336a1"
}

[stack@instack ~]$ heat resource-signal 9ae1cde3-c0fc-426d-8464-fcf01c937eb5 1
[stack@instack ~]$ heat deployment-show 88db3645-3c85-4b10-8387-a8df0e9336a1
{
  "status": "IN_PROGRESS", 
  "server_id": "86894343-a032-45f2-a619-122a23a3d5ba", 
  "config_id": "a9584777-1b1f-4f84-a373-8c15bc3286da", 
  "output_values": null, 
  "creation_time": "2015-09-23T12:30:09Z", 
  "input_values": {}, 
  "action": "DELETE",
  "status_reason": "Deploy data available",
  "id": "88db3645-3c85-4b10-8387-a8df0e9336a1"
}



[stack@instack ~]$ heat event-list -r 1 9ae1cde3-c0fc-426d-8464-fcf01c937eb5
+---------------+--------------------------------------+--------------------------------------+--------------------+----------------------+
| resource_name | id                                   | resource_status_reason               | resource_status    | event_time           |
+---------------+--------------------------------------+--------------------------------------+--------------------+----------------------+
| 1             | 7dc9a75c-960d-4137-8fee-712b17b05caa | state changed                        | CREATE_IN_PROGRESS | 2015-09-23T11:57:35Z |
| 1             | e19fe81b-fd50-4a26-9065-c9ecd1a83c1f | state changed                        | CREATE_COMPLETE    | 2015-09-23T11:57:37Z |
| 1             | 33c20b17-7886-4afc-a144-8facf8ded393 | state changed                        | DELETE_IN_PROGRESS | 2015-09-23T12:30:09Z |
| 1             | fe1abf6f-1cd7-4397-af8d-0700d66be3bd | Cannot signal resource during DELETE | DELETE_IN_PROGRESS | 2015-09-23T12:41:15Z |
+---------------+--------------------------------------+--------------------------------------+--------------------+----------------------+
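A rejected signal like the one in the last event row can be picked out of the `heat event-list` output with a short script. The following is only a minimal sketch, assuming the ASCII table layout shown above; `parse_table` and `rejected_signals` are hypothetical helper names and are not part of the heat client.

```python
def parse_table(text):
    """Parse an ASCII table (as printed by the heat CLI) into a list of dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip the +----+ border lines; keep only | ... | rows.
        if not line.startswith("|"):
            continue
        # Drop the outer pipes, then split the remaining cells.
        rows.append([cell.strip() for cell in line[1:-1].split("|")])
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def rejected_signals(events):
    """Return the events whose status reason reports a refused signal."""
    return [
        event for event in events
        if event["resource_status_reason"].startswith("Cannot signal resource")
    ]


# Sample input copied from the event list in this report.
EVENT_LIST = """\
+---------------+--------------------------------------+--------------------------------------+--------------------+----------------------+
| resource_name | id                                   | resource_status_reason               | resource_status    | event_time           |
+---------------+--------------------------------------+--------------------------------------+--------------------+----------------------+
| 1             | 7dc9a75c-960d-4137-8fee-712b17b05caa | state changed                        | CREATE_IN_PROGRESS | 2015-09-23T11:57:35Z |
| 1             | e19fe81b-fd50-4a26-9065-c9ecd1a83c1f | state changed                        | CREATE_COMPLETE    | 2015-09-23T11:57:37Z |
| 1             | 33c20b17-7886-4afc-a144-8facf8ded393 | state changed                        | DELETE_IN_PROGRESS | 2015-09-23T12:30:09Z |
| 1             | fe1abf6f-1cd7-4397-af8d-0700d66be3bd | Cannot signal resource during DELETE | DELETE_IN_PROGRESS | 2015-09-23T12:41:15Z |
+---------------+--------------------------------------+--------------------------------------+--------------------+----------------------+
"""

for event in rejected_signals(parse_table(EVENT_LIST)):
    print(event["event_time"], event["id"], event["resource_status_reason"])
```

In practice the text would come from the command output rather than a string constant; the constant just keeps the sketch self-contained.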

Expected results:
The update completes.

Additional info:
openstack-heat-engine-2015.1.1-4.el7ost.noarch

Comment 2 Steven Hardy 2015-09-23 13:21:24 UTC
So, this was previously fixed via:

https://bugs.launchpad.net/heat/+bug/1444087

So, this must've worked at one point on kilo.

I've also proven it works w/trunk liberty RDO:

http://paste.openstack.org/show/473762/

So, I'm guessing we've backported a regression, but I don't yet know what.

Comment 3 Steven Hardy 2015-09-23 13:24:04 UTC
Note, for the minimal reproducer in the example above to work on kilo, you either need these two patches:

https://review.openstack.org/#/c/199652/

https://review.openstack.org/#/c/221656

Or, you need to pass an actual server reference in via the servers property ;)

Comment 4 Jan Provaznik 2015-09-23 15:43:08 UTC
I tried the patches above:
https://review.openstack.org/#/c/199652/
https://review.openstack.org/#/c/221656

and also:
https://review.openstack.org/#/c/225537/

With these patches I can now manually signal the unregistration deployment, and after the manual signal the stack-update completes. But without explicitly signalling the resource, the stack stays stuck on it:

[stack@instack ~]$ heat resource-list -n 5 overcloud |grep PROG
| 1                                           | a2884c7d-1c02-4f5e-83ec-bbfa97891852          | OS::Heat::StructuredDeployment                    | DELETE_IN_PROGRESS | 2015-09-23T15:22:09Z | RHELUnregistrationDeployment                |
| ComputeNodesPostDeployment                  | ea6fbd88-cc95-4230-9233-3f2b9c3a3b34          | OS::TripleO::ComputePostDeployment                | UPDATE_IN_PROGRESS | 2015-09-23T15:30:43Z |                                             |
| ExtraConfig                                 | dfcf7122-64c2-46d1-93d0-dff65880dc21          | OS::TripleO::NodeExtraConfigPost                  | UPDATE_IN_PROGRESS | 2015-09-23T15:31:31Z | ComputeNodesPostDeployment                  |
| RHELUnregistrationDeployment                | 6b9e3d82-28a8-4972-8220-3df567310770          | OS::Heat::StructuredDeployments                   | UPDATE_IN_PROGRESS | 2015-09-23T15:31:35Z | ExtraConfig                                 |


So the signal itself is not the only problem. An important note might be that the node (nova instance) is actually deleted properly: the instance no longer exists while the resource is still waiting in DELETE_IN_PROGRESS. AIUI it should still exist at that point, because if unregistration fails, the instance should not be deleted. Maybe the dependencies between the resources defined in THT are wrong?
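The ordering concern raised here can be illustrated with a small sketch. It assumes a hypothetical two-resource graph in which the unregistration deployment depends on the server; the resource names are made up for the example and are not taken from tripleo-heat-templates. If resources are deleted in reverse dependency order, the deployment's delete step (the unregistration) has to finish before the server is removed.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def delete_order(depends_on):
    """Return resource names in reverse dependency (deletion) order.

    depends_on maps each resource to the set of resources it depends on.
    """
    # static_order() yields a resource only after everything it depends on,
    # which is the creation order; deletion walks it backwards.
    create_order = list(TopologicalSorter(depends_on).static_order())
    return list(reversed(create_order))


# Hypothetical graph: the deployment runs on the server, so it depends on it.
DEPENDS_ON = {
    "UnregistrationDeployment": {"Server"},
    "Server": set(),
}

print(delete_order(DEPENDS_ON))
```

If the dependency were missing from the graph, nothing would stop the server from being deleted while the deployment is still waiting for its signal, which matches the behaviour described above.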

Comment 5 Zane Bitter 2015-09-23 17:09:37 UTC
Jan found https://review.openstack.org/#/c/225537/ (https://bugs.launchpad.net/heat/+bug/1458095), which seems almost certain to be the fix.

That would make the cause https://review.openstack.org/#/c/166914/ . Its commit message states "All resources, which have handle_signal, check condition if action in DELETE or SUSPEND state.", which is demonstrably incorrect (see https://review.openstack.org/gitweb?p=openstack/heat.git;a=blob;f=heat/engine/resources/openstack/heat/software_deployment.py;h=590047ddd5cb98b993e5d1f4286ef85f2ac23e8e;hb=664f70383bfdd704322e136de226e9627c48d1a7#l533 ). It also unintentionally changes the logic with respect to hooks, so it wasn't a great patch all round. That patch was actually merged before the fix for https://bugs.launchpad.net/heat/+bug/1444087 , so it's likely that this has never worked on kilo.
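The effect of a blanket check like the one described above can be modelled in a few lines. This is an illustrative sketch only, not Heat's actual code: the class names, the `no_signal_actions` attribute and the `SignalRejected` exception are assumptions made for the example. It shows why refusing every signal during DELETE breaks a software deployment, which has to accept a signal while it is being deleted in order to learn that the script on the server has finished.

```python
class SignalRejected(Exception):
    """Raised when a resource refuses a signal in its current action."""


class Resource:
    # Actions during which a signal is refused by default (assumed names).
    no_signal_actions = ("DELETE", "SUSPEND")

    def __init__(self, name):
        self.name = name
        self.action = "CREATE"
        self.status = "COMPLETE"
        self.events = []  # (reason, state) tuples, oldest first

    def handle_signal(self, details):
        """Resource-specific reaction to a signal; a no-op by default."""

    def signal(self, details=None):
        if self.action in self.no_signal_actions:
            reason = "Cannot signal resource during %s" % self.action
            # Record the refusal as an event, then reject the signal.
            self.events.append((reason, "%s_%s" % (self.action, self.status)))
            raise SignalRejected(reason)
        self.handle_signal(details)


class SoftwareDeployment(Resource):
    # A deployment waits for the server to report back on delete as well,
    # so it opts out of the blanket check.
    no_signal_actions = ()

    def handle_signal(self, details):
        # The signal means the script on the server has finished.
        self.status = "COMPLETE"


# A plain resource refuses the signal while it is being deleted ...
plain = Resource("1")
plain.action, plain.status = "DELETE", "IN_PROGRESS"
try:
    plain.signal()
except SignalRejected as exc:
    print(exc)

# ... while the deployment accepts it and can finish its delete.
deployment = SoftwareDeployment("1")
deployment.action, deployment.status = "DELETE", "IN_PROGRESS"
deployment.signal({})
print(deployment.action, deployment.status)
```

Making the refused actions a per-class attribute is just one way to express the exception for deployments; the referenced review may implement it differently.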

Comment 8 Alexander Chuzhoy 2015-10-02 20:55:28 UTC
Verified:

Environment:
openstack-heat-common-2015.1.1-5.el7ost.noarch

Deployed while registering the nodes via portal (with --rhel-reg --reg-method portal).

Deleted the node successfully with: openstack overcloud node delete --templates --stack overcloud <UUID>

Also verified that the node was unregistered from RHN.

Comment 10 errata-xmlrpc 2015-10-08 12:21:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:1865