Description of problem:

When running a stack update in TripleO, overcloud heat stack gets stuck in UPDATE_IN_PROGRESS, even though no operations are happening (no child resources are reported as IN_PROGRESS).

[stack@instack ~]$ heat resource-list overcloud -n5 | grep PROG
[stack@instack ~]$ heat stack-list
+--------------------------------------+------------+--------------------+---------------------+---------------------+
| id                                   | stack_name | stack_status       | creation_time       | updated_time        |
+--------------------------------------+------------+--------------------+---------------------+---------------------+
| f91b188f-7d79-4a05-8ba7-37f218215fa1 | overcloud  | UPDATE_IN_PROGRESS | 2016-01-11T15:52:24 | 2016-01-14T13:32:05 |
+--------------------------------------+------------+--------------------+---------------------+---------------------+

The root cause within Heat could be the same as in OSP 7 bug 1293421.

I found an exception in heat-engine log:

led 14 08:32:05 instack.localdomain heat-engine[18136]: Traceback (most recent call last):
led 14 08:32:05 instack.localdomain heat-engine[18136]:   File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 457, in fire_timers
led 14 08:32:05 instack.localdomain heat-engine[18136]:     timer()
led 14 08:32:05 instack.localdomain heat-engine[18136]:   File "/usr/lib/python2.7/site-packages/eventlet/hubs/timer.py", line 58, in __call__
led 14 08:32:05 instack.localdomain heat-engine[18136]:     cb(*args, **kw)
led 14 08:32:05 instack.localdomain heat-engine[18136]:   File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
led 14 08:32:05 instack.localdomain heat-engine[18136]:     result = function(*args, **kwargs)
led 14 08:32:05 instack.localdomain heat-engine[18136]:   File "/usr/lib/python2.7/site-packages/heat/engine/service.py", line 117, in _start_with_trace
led 14 08:32:05 instack.localdomain heat-engine[18136]:     return func(*args, **kwargs)
led 14 08:32:05 instack.localdomain heat-engine[18136]:   File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 105, in wrapper
led 14 08:32:05 instack.localdomain heat-engine[18136]:     return f(*args, **kwargs)
led 14 08:32:05 instack.localdomain heat-engine[18136]:   File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 974, in update
led 14 08:32:05 instack.localdomain heat-engine[18136]:     updater()
led 14 08:32:05 instack.localdomain heat-engine[18136]:   File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 169, in __call__
led 14 08:32:05 instack.localdomain heat-engine[18136]:     self.start(timeout=timeout)
led 14 08:32:05 instack.localdomain heat-engine[18136]:   File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 194, in start
led 14 08:32:05 instack.localdomain heat-engine[18136]:     self.step()
led 14 08:32:05 instack.localdomain heat-engine[18136]:   File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 217, in step
led 14 08:32:05 instack.localdomain heat-engine[18136]:     next(self._runner)
led 14 08:32:05 instack.localdomain heat-engine[18136]:   File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 285, in wrapper
led 14 08:32:05 instack.localdomain heat-engine[18136]:     subtask = next(parent)
led 14 08:32:05 instack.localdomain heat-engine[18136]:   File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 1206, in update_task
led 14 08:32:05 instack.localdomain heat-engine[18136]:     updater.start(timeout=self.timeout_secs())
led 14 08:32:05 instack.localdomain heat-engine[18136]:   File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 194, in start
led 14 08:32:05 instack.localdomain heat-engine[18136]:     self.step()
led 14 08:32:05 instack.localdomain heat-engine[18136]:   File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 217, in step
led 14 08:32:05 instack.localdomain heat-engine[18136]:     next(self._runner)
led 14 08:32:05 instack.localdomain heat-engine[18136]:   File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 285, in wrapper
led 14 08:32:05 instack.localdomain heat-engine[18136]:     subtask = next(parent)
led 14 08:32:05 instack.localdomain heat-engine[18136]:   File "/usr/lib/python2.7/site-packages/heat/engine/update.py", line 53, in __call__
led 14 08:32:05 instack.localdomain heat-engine[18136]:     self.previous_stack.dependencies,
led 14 08:32:05 instack.localdomain heat-engine[18136]:   File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 282, in dependencies
led 14 08:32:05 instack.localdomain heat-engine[18136]:     six.itervalues(self.resources))
led 14 08:32:05 instack.localdomain heat-engine[18136]:   File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 242, in resources
led 14 08:32:05 instack.localdomain heat-engine[18136]:     self.t.resource_definitions(self).items())
led 14 08:32:05 instack.localdomain heat-engine[18136]:   File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 241, in <genexpr>
led 14 08:32:05 instack.localdomain heat-engine[18136]:     for (name, data) in
led 14 08:32:05 instack.localdomain heat-engine[18136]:   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 137, in __new__
led 14 08:32:05 instack.localdomain heat-engine[18136]:     files=stack.t.files)
led 14 08:32:05 instack.localdomain heat-engine[18136]:   File "/usr/lib/python2.7/site-packages/heat/engine/environment.py", line 435, in get_class
led 14 08:32:05 instack.localdomain heat-engine[18136]:     raise exception.ResourceTypeNotFound(type_name=resource_type)
led 14 08:32:05 instack.localdomain heat-engine[18136]: ResourceTypeNotFound: The Resource Type (OS::TripleO::EndpointMap) could not be found.
Version-Release number of selected component (if applicable):

openstack-heat-engine-5.0.0-1.el7ost.noarch

Steps to Reproduce:

Deploy OSP 7.2 with OSPd 7.2, update undercloud to OSP 8 openstack-heat-engine-5.0.0-1.el7ost.noarch, attempt to run a stack-update on the overcloud stack with stable/liberty tripleo-heat-templates.

Expected results:

If there's a problem detected by Heat during the update, stack should be UPDATE_FAILED, otherwise it should progress forward with the stack update.
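The stuck condition described in this report (the stack is IN_PROGRESS while no child resource is) can be written down as a small check. This is only a sketch: looks_stuck is a hypothetical helper, and it assumes the statuses have already been collected as plain strings (for example from the stack_status column of "heat stack-list" and the status column of "heat resource-list overcloud -n5"). It does not call any Heat API.

```python
def looks_stuck(stack_status, resource_statuses):
    """Return True if the stack claims to be busy but no resource is.

    stack_status: e.g. "UPDATE_IN_PROGRESS" (hypothetical input, a plain string)
    resource_statuses: iterable of status strings for all nested resources
    """
    if not stack_status.endswith("_IN_PROGRESS"):
        # The stack has reached a final state, so it is not stuck.
        return False
    # Stuck means: no child resource is reported as IN_PROGRESS.
    return not any(s.endswith("_IN_PROGRESS") for s in resource_statuses)
```

A True result is only a hint. A stack can briefly be IN_PROGRESS between resource steps, so the check would need to hold for some time before concluding the update is stuck.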
openstack-heat-engine-5.0.0-1.el7ost.noarch was built before the fix for bug 1293421 was implemented, so quite possibly we're just missing a backport here.
The fix for bug 1293421 is in stable/liberty upstream and we're expecting a release of that next week and plan to rebase then to pick it up. That will be sufficient to prevent the stack getting stuck in UPDATE_IN_PROGRESS, but it cannot solve the actual root cause, which is that we should not be getting an exception about a resource type not being found after the update has got underway.
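To illustrate the first half of this (an exception escaping the update task must still move the stack out of IN_PROGRESS), here is a minimal sketch. It is not Heat's code and not the actual patch for bug 1293421. FakeStack, run_update and broken_updater are hypothetical names, and ResourceTypeNotFound is a local stand-in class, not heat.common.exception.ResourceTypeNotFound.

```python
IN_PROGRESS, FAILED, COMPLETE = "IN_PROGRESS", "FAILED", "COMPLETE"


class ResourceTypeNotFound(Exception):
    """Local stand-in for the exception seen in the traceback."""


class FakeStack(object):
    """Hypothetical stack holding only the state fields used here."""

    def __init__(self):
        self.action = "UPDATE"
        self.status = COMPLETE
        self.status_reason = ""

    @property
    def state(self):
        return "%s_%s" % (self.action, self.status)


def run_update(stack, updater):
    """Run updater and always leave the stack in a final state."""
    stack.status = IN_PROGRESS
    try:
        updater()
    except Exception as exc:
        # Without this handler the exception would end the thread and
        # the stack would stay in UPDATE_IN_PROGRESS forever.
        stack.status = FAILED
        stack.status_reason = str(exc)
    else:
        stack.status = COMPLETE
    return stack.state


def broken_updater():
    # Mimics the failure from the log in this report.
    raise ResourceTypeNotFound(
        "The Resource Type (OS::TripleO::EndpointMap) could not be found.")
```

With this shape, run_update(FakeStack(), broken_updater) ends in UPDATE_FAILED with the exception text as the status reason. It does nothing about why the resource type lookup fails in the first place, which is the separate root cause mentioned above.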
*** Bug 1293117 has been marked as a duplicate of this bug. ***
Please, could you back port this fix ASAP on to RHOSP 7.2 or 7.3? We in Telefónica are apparently hitting this bug when trying to update packages in the Overcloud nodes.
(In reply to Felipe Alfaro Solana from comment #5)
> Please, could you back port this fix ASAP on to RHOSP 7.2 or 7.3? We in
> Telefónica are apparently hitting this bug when trying to update packages in
> the Overcloud nodes.

Can you clarify, are you hitting the "ResourceTypeNotFound: The Resource Type (OS::TripleO::EndpointMap) could not be found." error specifically, or are you experiencing a problem where the stack remains stuck IN_PROGRESS even when it is no longer doing anything due to an exception? Because the latter was already fixed in 7.2 as bug 1280094.
The ResourceTypeNotFound part looks like the same issue as bug 1290950 in RHOS 7: https://bugzilla.redhat.com/show_bug.cgi?id=1290950#c36 (Note that I think this only happens after cancelling an update in mid-flight, e.g. by restarting heat-engine).
This is an 8.0 bug. Please keep 7.3 discussion on bug 1290950.
(A 7.2 to 8.0 upgrade is not supported.)

Unable to reproduce the issue when upgrading 7.3 to 8.0:
--------------------------------------------------
openstack-heat-engine-5.0.1-5.el7ost.noarch
openstack-heat-api-cloudwatch-5.0.1-5.el7ost.noarch
openstack-heat-templates-0-0.8.20150605git.el7ost.noarch
python-heatclient-1.0.0-1.el7ost.noarch
openstack-heat-common-5.0.1-5.el7ost.noarch
openstack-heat-api-5.0.1-5.el7ost.noarch
openstack-tripleo-heat-templates-0.8.14-7.el7ost.noarch
openstack-heat-api-cfn-5.0.1-5.el7ost.noarch
heat-cfntools-1.2.8-2.el7.noarch
openstack-tripleo-heat-templates-kilo-0.8.14-7.el7ost.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0636.html