Bug 1298589

Summary: Overcloud update to 8.0 fails with "The Resource Type (OS::TripleO::EndpointMap) could not be found"
Product: Red Hat OpenStack
Component: openstack-heat
Version: 8.0 (Liberty)
Status: CLOSED ERRATA
Severity: urgent
Priority: unspecified
Target Milestone: ga
Target Release: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Reporter: Jiri Stransky <jstransk>
Assignee: Zane Bitter <zbitter>
QA Contact: Omri Hochman <ohochman>
CC: augol, dhill, felipe.alfaro, jcoufal, jschluet, kholden, mburns, mcornea, rhel-osp-director-maint, sbaker, shardy, yeylon, zbitter
Keywords: Regression, TestOnly, ZStream
Fixed In Version: openstack-heat-5.0.1-2.el7ost
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-04-15 13:46:38 UTC
Bug Depends On: 1290950

Description Jiri Stransky 2016-01-14 13:51:45 UTC
Description of problem:

When running a stack update in TripleO, the overcloud Heat stack gets stuck in UPDATE_IN_PROGRESS even though no operations are happening (no child resources are reported as IN_PROGRESS).

[stack@instack ~]$ heat resource-list overcloud -n5 | grep PROG
[stack@instack ~]$ heat stack-list
+--------------------------------------+------------+--------------------+---------------------+---------------------+
| id                                   | stack_name | stack_status       | creation_time       | updated_time        |
+--------------------------------------+------------+--------------------+---------------------+---------------------+
| f91b188f-7d79-4a05-8ba7-37f218215fa1 | overcloud  | UPDATE_IN_PROGRESS | 2016-01-11T15:52:24 | 2016-01-14T13:32:05 |
+--------------------------------------+------------+--------------------+---------------------+---------------------+
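
The symptom above (a stack reported as IN_PROGRESS while the resource listing shows nothing in progress) can be checked mechanically. The following is a minimal sketch, assuming the ASCII table layout shown above; the helper names `parse_table` and `stuck_stacks` are hypothetical and not part of python-heatclient.

```python
def parse_table(text):
    """Parse an ASCII table as printed by the heat CLI into a list of dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip the +----+ border lines
    ]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def stuck_stacks(stack_list_output, in_progress_resource_lines):
    """Return names of stacks that claim to be in progress although the
    (already grepped) resource listing contains no in-progress resource."""
    if any(line.strip() for line in in_progress_resource_lines):
        return []  # something is genuinely still running
    return [
        row["stack_name"]
        for row in parse_table(stack_list_output)
        if row["stack_status"].endswith("_IN_PROGRESS")
    ]
```

Here `stack_list_output` would be the text printed by `heat stack-list`, and `in_progress_resource_lines` the lines left after piping `heat resource-list overcloud -n5` through `grep PROG`.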

The root cause within Heat could be the same as in OSP 7 bug 1293421. I found an exception in the heat-engine log (the timestamps use a Czech locale, so "led" means January):

led 14 08:32:05 instack.localdomain heat-engine[18136]: Traceback (most recent call last):
led 14 08:32:05 instack.localdomain heat-engine[18136]: File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 457, in fire_timers
led 14 08:32:05 instack.localdomain heat-engine[18136]: timer()
led 14 08:32:05 instack.localdomain heat-engine[18136]: File "/usr/lib/python2.7/site-packages/eventlet/hubs/timer.py", line 58, in __call__
led 14 08:32:05 instack.localdomain heat-engine[18136]: cb(*args, **kw)
led 14 08:32:05 instack.localdomain heat-engine[18136]: File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
led 14 08:32:05 instack.localdomain heat-engine[18136]: result = function(*args, **kwargs)
led 14 08:32:05 instack.localdomain heat-engine[18136]: File "/usr/lib/python2.7/site-packages/heat/engine/service.py", line 117, in _start_with_trace
led 14 08:32:05 instack.localdomain heat-engine[18136]: return func(*args, **kwargs)
led 14 08:32:05 instack.localdomain heat-engine[18136]: File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 105, in wrapper
led 14 08:32:05 instack.localdomain heat-engine[18136]: return f(*args, **kwargs)
led 14 08:32:05 instack.localdomain heat-engine[18136]: File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 974, in update
led 14 08:32:05 instack.localdomain heat-engine[18136]: updater()
led 14 08:32:05 instack.localdomain heat-engine[18136]: File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 169, in __call__
led 14 08:32:05 instack.localdomain heat-engine[18136]: self.start(timeout=timeout)
led 14 08:32:05 instack.localdomain heat-engine[18136]: File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 194, in start
led 14 08:32:05 instack.localdomain heat-engine[18136]: self.step()
led 14 08:32:05 instack.localdomain heat-engine[18136]: File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 217, in step
led 14 08:32:05 instack.localdomain heat-engine[18136]: next(self._runner)
led 14 08:32:05 instack.localdomain heat-engine[18136]: File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 285, in wrapper
led 14 08:32:05 instack.localdomain heat-engine[18136]: subtask = next(parent)
led 14 08:32:05 instack.localdomain heat-engine[18136]: File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 1206, in update_task
led 14 08:32:05 instack.localdomain heat-engine[18136]: updater.start(timeout=self.timeout_secs())
led 14 08:32:05 instack.localdomain heat-engine[18136]: File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 194, in start
led 14 08:32:05 instack.localdomain heat-engine[18136]: self.step()
led 14 08:32:05 instack.localdomain heat-engine[18136]: File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 217, in step
led 14 08:32:05 instack.localdomain heat-engine[18136]: next(self._runner)
led 14 08:32:05 instack.localdomain heat-engine[18136]: File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 285, in wrapper
led 14 08:32:05 instack.localdomain heat-engine[18136]: subtask = next(parent)
led 14 08:32:05 instack.localdomain heat-engine[18136]: File "/usr/lib/python2.7/site-packages/heat/engine/update.py", line 53, in __call__
led 14 08:32:05 instack.localdomain heat-engine[18136]: self.previous_stack.dependencies,
led 14 08:32:05 instack.localdomain heat-engine[18136]: File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 282, in dependencies
led 14 08:32:05 instack.localdomain heat-engine[18136]: six.itervalues(self.resources))
led 14 08:32:05 instack.localdomain heat-engine[18136]: File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 242, in resources
led 14 08:32:05 instack.localdomain heat-engine[18136]: self.t.resource_definitions(self).items())
led 14 08:32:05 instack.localdomain heat-engine[18136]: File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 241, in <genexpr>
led 14 08:32:05 instack.localdomain heat-engine[18136]: for (name, data) in
led 14 08:32:05 instack.localdomain heat-engine[18136]: File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 137, in __new__
led 14 08:32:05 instack.localdomain heat-engine[18136]: files=stack.t.files)
led 14 08:32:05 instack.localdomain heat-engine[18136]: File "/usr/lib/python2.7/site-packages/heat/engine/environment.py", line 435, in get_class
led 14 08:32:05 instack.localdomain heat-engine[18136]: raise exception.ResourceTypeNotFound(type_name=resource_type)
led 14 08:32:05 instack.localdomain heat-engine[18136]: ResourceTypeNotFound: The Resource Type (OS::TripleO::EndpointMap) could not be found.
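
To tell whether a given deployment is hitting this specific error, the engine log can be scanned for the final exception line. The following is a minimal sketch, assuming the message format shown in the traceback above; the function name `missing_resource_types` is hypothetical.

```python
import re

# Matches the final line of the traceback above and captures the type name.
MISSING_TYPE = re.compile(
    r"ResourceTypeNotFound: The Resource Type \((?P<type>[^)]+)\) could not be found"
)


def missing_resource_types(log_lines):
    """Return the distinct resource types reported as missing,
    in order of first appearance in the log."""
    seen = []
    for line in log_lines:
        match = MISSING_TYPE.search(line)
        if match and match.group("type") not in seen:
            seen.append(match.group("type"))
    return seen
```

The input can be any iterable of log lines, for example an open file object for a saved copy of the heat-engine journal.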


Version-Release number of selected component (if applicable):

openstack-heat-engine-5.0.0-1.el7ost.noarch


Steps to Reproduce:

1. Deploy OSP 7.2 with OSPd 7.2.
2. Update the undercloud to OSP 8 (openstack-heat-engine-5.0.0-1.el7ost.noarch).
3. Attempt to run a stack-update on the overcloud stack with the stable/liberty tripleo-heat-templates.


Expected results:

If Heat detects a problem during the update, the stack should go to UPDATE_FAILED; otherwise the stack update should make progress.

Comment 2 Jiri Stransky 2016-01-14 13:53:52 UTC
openstack-heat-engine-5.0.0-1.el7ost.noarch was built before the fix for bug 1293421 was implemented, so quite possibly we're just missing a backport here.

Comment 3 Zane Bitter 2016-01-14 14:40:14 UTC
The fix for bug 1293421 is in stable/liberty upstream and we're expecting a release of that next week and plan to rebase then to pick it up.

That will be sufficient to prevent the stack getting stuck in UPDATE_IN_PROGRESS, but it cannot solve the actual root cause, which is that we should not be getting an exception about a resource type not being found after the update has got underway.
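
The distinction between the two problems can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyStack` and its methods are hypothetical and are not Heat's implementation or the actual fix; only the exception name and message are taken from the traceback in the description.

```python
class ResourceTypeNotFound(Exception):
    """Stand-in for the exception seen in the traceback."""


class ToyStack:
    def __init__(self, registry, resource_types):
        self.registry = registry            # set of known resource type names
        self.resource_types = resource_types
        self.status = "CREATE_COMPLETE"

    def _load_resources(self):
        # Root cause in this model: a type needed mid-update is not registered.
        for type_name in self.resource_types:
            if type_name not in self.registry:
                raise ResourceTypeNotFound(
                    "The Resource Type (%s) could not be found." % type_name)

    def update_unguarded(self):
        self.status = "UPDATE_IN_PROGRESS"
        self._load_resources()              # exception escapes, status is never reset
        self.status = "UPDATE_COMPLETE"

    def update_guarded(self):
        self.status = "UPDATE_IN_PROGRESS"
        try:
            self._load_resources()
        except Exception:
            self.status = "UPDATE_FAILED"   # stack is no longer stuck
            return
        self.status = "UPDATE_COMPLETE"
```

In this model the guarded variant moves the stack to UPDATE_FAILED instead of leaving it IN_PROGRESS, but the lookup still fails, which is the part that remains open.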

Comment 4 Zane Bitter 2016-01-21 14:22:27 UTC
*** Bug 1293117 has been marked as a duplicate of this bug. ***

Comment 5 Felipe Alfaro Solana 2016-01-28 08:34:08 UTC
Please, could you back port this fix ASAP on to RHOSP 7.2 or 7.3? We in Telefónica are apparently hitting this bug when trying to update packages in the Overcloud nodes.

Comment 6 Zane Bitter 2016-01-28 17:30:13 UTC
(In reply to Felipe Alfaro Solana from comment #5)
> Please, could you back port this fix ASAP on to RHOSP 7.2 or 7.3? We in
> Telefónica are apparently hitting this bug when trying to update packages in
> the Overcloud nodes.

Can you clarify, are you hitting the "ResourceTypeNotFound: The Resource Type (OS::TripleO::EndpointMap) could not be found." error specifically, or are you experiencing a problem where the stack remains stuck IN_PROGRESS even when it is no longer doing anything due to an exception? Because the latter was already fixed in 7.2 as bug 1280094.

Comment 7 Zane Bitter 2016-02-05 13:36:15 UTC
The ResourceTypeNotFound part looks like the same issue as bug 1290950 in RHOS 7: https://bugzilla.redhat.com/show_bug.cgi?id=1290950#c36

(Note that I think this only happens after cancelling an update in mid-flight, e.g. by restarting heat-engine).

Comment 9 Zane Bitter 2016-02-29 17:33:33 UTC
This is an 8.0 bug. Please keep 7.3 discussion on bug 1290950.

Comment 14 Omri Hochman 2016-04-12 19:56:18 UTC
(Upgrading from 7.2 to 8.0 is not supported.)

Unable to reproduce the issue when upgrading from 7.3 to 8.0:
--------------------------------------------------
openstack-heat-engine-5.0.1-5.el7ost.noarch
openstack-heat-api-cloudwatch-5.0.1-5.el7ost.noarch
openstack-heat-templates-0-0.8.20150605git.el7ost.noarch
python-heatclient-1.0.0-1.el7ost.noarch
openstack-heat-common-5.0.1-5.el7ost.noarch
openstack-heat-api-5.0.1-5.el7ost.noarch
openstack-tripleo-heat-templates-0.8.14-7.el7ost.noarch
openstack-heat-api-cfn-5.0.1-5.el7ost.noarch
heat-cfntools-1.2.8-2.el7.noarch
openstack-tripleo-heat-templates-kilo-0.8.14-7.el7ost.noarch
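
The installed packages can be compared against the Fixed In Version of this bug (openstack-heat-5.0.1-2.el7ost). The following is a rough sketch, assuming the subpackages share the version and release of the source package; it compares only the numeric parts and is a simplification, not RPM's own version comparison (it ignores epochs and the dist tag). The helper names are hypothetical.

```python
import re


def version_release(nvr):
    """Extract (version, release) as tuples of ints from a
    name-version-release string such as the ones listed above."""
    match = re.search(r"-(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)", nvr)
    if match is None:
        raise ValueError("no version-release found in %r" % nvr)
    version, release = match.groups()
    return (
        tuple(int(part) for part in version.split(".")),
        tuple(int(part) for part in release.split(".")),
    )


def includes_fix(installed, fixed_in):
    """True if the installed build is at least as new as the fixed build."""
    return version_release(installed) >= version_release(fixed_in)
```

With this sketch, the build listed in this comment compares as newer than the fixed build, while the build from the original report compares as older.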

Comment 16 errata-xmlrpc 2016-04-15 13:46:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0636.html