Bug 1293421

Summary: Uncaught exceptions can leave stacks hanging UPDATE_IN_PROGRESS
Product: Red Hat OpenStack
Reporter: Jon Schlueter <jschluet>
Component: openstack-heat
Assignee: Zane Bitter <zbitter>
Status: CLOSED CURRENTRELEASE
QA Contact: Amit Ugol <augol>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.0 (Kilo)
CC: augol, jcoufal, jstransk, kbasil, mburns, mcornea, rhel-osp-director-maint, rybrown, sbaker, sclewis, shardy, yeylon
Target Milestone: ga
Target Release: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1280094
Environment:
Last Closed: 2016-01-06 00:03:33 UTC
Type: Bug

Comment 1 Zane Bitter 2016-01-06 00:03:33 UTC
So the remaining issue here was this:


The end result was that the update failed, and the resource list showed this:

]$ heat resource-list -n 3 overcloud | grep -v COMPLETE
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+--------------------+----------------------+---------------------------------------------+
| resource_name                               | physical_resource_id                          | resource_type                                     | resource_status    | updated_time         | parent_resource                             |
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+--------------------+----------------------+---------------------------------------------+
| 0                                           | f4b63ac6-5461-4bad-a352-637e6ab816ed          | OS::TripleO::Controller                           | UPDATE_IN_PROGRESS | 2015-12-09T09:59:23Z | Controller                                  |
| Compute                                     | 03c29db3-2ada-446e-9636-c41d1aa60fc9          | OS::Heat::ResourceGroup                           | UPDATE_FAILED      | 2015-12-09T09:59:24Z |                                             |
| 0                                           | ee4b419e-512e-481e-8f84-1f114511a7b7          | OS::TripleO::Compute                              | UPDATE_IN_PROGRESS | 2015-12-09T09:59:26Z | Compute                                     |
| Controller                                  | 42661f8f-6f60-4a8e-bc7a-56a7c4f08d5d          | OS::Heat::ResourceGroup                           | UPDATE_FAILED      | 2015-12-09T10:31:05Z |                                             |
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+--------------------+----------------------+---------------------------------------------+


...and it looks to me like this was caused by failing to cancel an operation on a child stack, which is a known problem in Kilo that is fixed in Liberty (see https://bugs.launchpad.net/heat/+bug/1446252 for details). The listing fits that explanation: both ResourceGroups (Compute and Controller) are UPDATE_FAILED, while their nested member stacks (resource 0 in each group) are still UPDATE_IN_PROGRESS. It doesn't appear to be caused by an uncaught exception: the child stacks should eventually time out, so they are unlikely to stay stuck in that state permanently. Therefore, I'm going to close this bug. We can reopen it if we see specific evidence of uncaught exceptions happening again.
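The diagnosis above rests on a pattern visible in the listing: a parent resource has already reached UPDATE_FAILED while a nested resource under it is still UPDATE_IN_PROGRESS. The Python sketch below flags that pattern in saved `heat resource-list` table output. It is an illustration only, not part of Heat or the heat CLI. The function names `parse_table` and `stuck_children` are hypothetical, and the sketch assumes each `parent_resource` value can be matched to a row by `resource_name`. That is only a heuristic, because names such as `0` repeat across groups.

```python
"""Flag nested resources left *_IN_PROGRESS under a *_FAILED parent.

Hypothetical helper sketch; assumes the pipe-delimited table printed by
`heat resource-list -n 3 <stack>` as input, and that parent names are
unique enough to match `parent_resource` to its parent row.
"""


def parse_table(text):
    """Turn a pipe-delimited CLI table into a list of dicts keyed by header."""
    rows = []
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip border lines (+----+), the shell prompt line, and blanks.
        if not (line.startswith("|") and line.endswith("|")):
            continue
        # Drop the outer pipes, then split into cells (empty cells survive).
        cells = [c.strip() for c in line[1:-1].split("|")]
        if header is None:
            header = cells  # first row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def stuck_children(rows):
    """Return (child, parent) name pairs where the child is still
    *_IN_PROGRESS but the parent it names has reached *_FAILED."""
    # Later rows with a repeated name overwrite earlier ones (heuristic).
    status_by_name = {r["resource_name"]: r["resource_status"] for r in rows}
    pairs = []
    for r in rows:
        parent = r.get("parent_resource", "")
        if r["resource_status"].endswith("_IN_PROGRESS") and status_by_name.get(
            parent, ""
        ).endswith("_FAILED"):
            pairs.append((r["resource_name"], parent))
    return pairs
```

Applied to the listing in this comment, the sketch would report resource 0 under both Controller and Compute. It only detects the symptom; it cannot distinguish an uncancelled child operation from any other cause, so it supports rather than replaces the reasoning above.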

Comment 2 Zane Bitter 2016-01-06 21:34:39 UTC
*** Bug 1293117 has been marked as a duplicate of this bug. ***