Bug 1278975 - StackValidationFailed: Unknown resource Type : OS::TripleO::AllNodes::Validation while updating stack in UPDATE_FAILED
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-heat
Version: 7.0 (Kilo)
Hardware: Unspecified OS: Unspecified
Priority: unspecified Severity: unspecified
Target Milestone: z3
Target Release: 7.0 (Kilo)
Assigned To: Steve Baker
QA Contact: Amit Ugol
Keywords: ZStream
Depends On: 1278544
Reported: 2015-11-06 17:04 EST by James Slagle
Modified: 2016-04-26 10:26 EDT
Fixed In Version: openstack-heat-2015.1.2-4.el7ost
Doc Type: Bug Fix
Last Closed: 2015-12-21 12:03:25 EST
Type: Bug

External Trackers
Tracker ID Priority Status Summary Last Updated
Launchpad 1447194 None None None Never
Launchpad 1508096 None None None Never
Launchpad 1514615 None None None Never

Description James Slagle 2015-11-06 17:04:35 EST
If you have a stack in UPDATE_FAILED (for whatever reason, such as misconfigured DNS on the overcloud nodes), and you try to start another update after fixing the issue, heat-engine throws the following traceback:

Nov 06 16:43:50 instack.localdomain heat-engine[4501]: Traceback (most recent call last):
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 457, in fire_timers
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: timer()
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: File "/usr/lib/python2.7/site-packages/eventlet/hubs/timer.py", line 58, in __call__
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: cb(*args, **kw)
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: result = function(*args, **kwargs)
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: File "/usr/lib/python2.7/site-packages/heat/engine/service.py", line 112, in _start_with_trace
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: return func(*args, **kwargs)
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 105, in wrapper
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: return f(*args, **kwargs)
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 865, in update
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: updater()
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 174, in __call__
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: self.start(timeout=timeout)
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 200, in start
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: self.step()
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 223, in step
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: next(self._runner)
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 289, in wrapper
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: subtask = next(parent)
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 918, in update_task
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: updater.start(timeout=self.timeout_secs())
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 200, in start
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: self.step()
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 223, in step
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: next(self._runner)
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 289, in wrapper
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: subtask = next(parent)
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: File "/usr/lib/python2.7/site-packages/heat/engine/update.py", line 55, in __call__
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: self.previous_stack.dependencies,
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 238, in dependencies
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: self.resources.itervalues())
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 201, in resources
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: self.t.resource_definitions(self).items())
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 200, in <genexpr>
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: for (name, data) in
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 141, in __new__
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: resource_name=name)
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: File "/usr/lib/python2.7/site-packages/heat/engine/environment.py", line 416, in get_class
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: raise exception.StackValidationFailed(message=msg)
Nov 06 16:43:50 instack.localdomain heat-engine[4501]: StackValidationFailed: Unknown resource Type : OS::TripleO::AllNodes::Validation

I'm filing this bug against python-rdomanager-oscplugin, because I suspect the problem is caused by a bug there: it may not be sending the correct environment files to the Heat API in this scenario.

Note, however, that on the CLI I am specifying the environment file that should define this resource type:

openstack overcloud update stack overcloud -i --templates templates-y1 -e templates-y1/overcloud-resource-registry-puppet.yaml -e templates-y1/environments/network-isolation.yaml -e templates-y1/environments/net-single-nic-with-vlans.yaml -e custom-environment-7.1.yaml -e /home/stack/update.yaml

I've double-checked, and OS::TripleO::AllNodes::Validation is defined in templates-y1/overcloud-resource-registry-puppet.yaml, where templates-y1 is just a one-for-one copy of the templates from the latest openstack-tripleo-heat-templates package.

So I suspect the client is not sending what I've asked it to send to the Heat API. I'll look into it a bit more and change the bug to Heat or tripleo-heat-templates if I discover differently.
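
One way to reason about what the client should end up sending is to merge the resource_registry sections of the -e files in order and check whether the type is mapped. The following is only a minimal sketch: it assumes later environment files override earlier ones, and the dictionaries, mapped file names, and the helper merge_registries are hypothetical stand-ins for the parsed YAML, not the client's actual code.

```python
def merge_registries(environments):
    """Merge resource_registry sections in order; later files win (assumed)."""
    merged = {}
    for env in environments:
        merged.update(env.get("resource_registry", {}))
    return merged


# Hypothetical parsed contents of the -e files, in command-line order.
environments = [
    # stand-in for overcloud-resource-registry-puppet.yaml
    {"resource_registry": {
        "OS::TripleO::AllNodes::Validation": "all-nodes-validation.yaml"}},
    # stand-in for an environment that only adds other mappings
    {"resource_registry": {
        "OS::TripleO::Network::External": "network/external.yaml"}},
    # stand-in for an environment with parameters only
    {"parameters": {"ExampleParameter": "value"}},
]

merged = merge_registries(environments)
is_mapped = "OS::TripleO::AllNodes::Validation" in merged
print(is_mapped)
```

If the type is present in the merged registry, as it is here, the missing mapping has to come from somewhere other than the files given on the command line.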
Comment 2 James Slagle 2015-11-06 17:07:46 EST
Note that this traceback causes https://bugzilla.redhat.com/show_bug.cgi?id=1278544

which means the stack is stuck in UPDATE_IN_PROGRESS forever, with no way to recover
Comment 3 Zane Bitter 2015-11-06 17:23:42 EST
Nope, this is a Heat bug - it's trying to load the *previous* stack and not finding a type for one of the resources in the environment. This is likely because we don't write the new environment until after a stack update has succeeded, so the previous stack may contain a mixture of old and new resources, but with the old environment.
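
The failure mode described here can be illustrated with a toy model. This is a sketch under stated assumptions, not Heat's code: load_resources, UnknownResourceType, the resource names, and the mapped file names are all hypothetical, and the registry is reduced to a plain dictionary.

```python
class UnknownResourceType(Exception):
    """Stand-in for Heat's StackValidationFailed in this sketch."""


def load_resources(template, registry):
    """Resolve every resource type in a template against a registry."""
    resolved = {}
    for name, res_type in template.items():
        if res_type not in registry:
            raise UnknownResourceType(
                "Unknown resource Type : %s" % res_type)
        resolved[name] = registry[res_type]
    return resolved


# Environment stored by the last successful update (no Validation mapping).
old_env = {"OS::TripleO::Controller": "controller.yaml"}
# Environment sent with the failed update; assumed not to be stored yet.
new_env = dict(old_env)
new_env["OS::TripleO::AllNodes::Validation"] = "all-nodes-validation.yaml"

# After UPDATE_FAILED the previous stack already holds a new resource.
previous_template = {
    "Controller": "OS::TripleO::Controller",
    "AllNodesValidationConfig": "OS::TripleO::AllNodes::Validation",
}


def try_load(env):
    """Return "ok" or the error message for loading the previous stack."""
    try:
        load_resources(previous_template, env)
        return "ok"
    except UnknownResourceType as exc:
        return str(exc)


old_result = try_load(old_env)
new_result = try_load(new_env)
print(old_result)
print(new_result)
```

Loading the mixed template against the old environment fails with the same message as in the traceback, while the new environment resolves every type.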

I thought we had a bug for this already, but I don't see it at the moment.
Comment 4 Zane Bitter 2015-11-06 17:30:40 EST
https://bugs.launchpad.net/heat/+bug/1477812 was a similar problem involving parameters, but the patch would not have fixed this issue with resource type mappings.
Comment 5 Zane Bitter 2015-11-06 17:37:13 EST
Ah, found the other report of this: https://bugs.launchpad.net/heat/+bug/1508096 (from jprovazn, via me).

Now we know how to reproduce it.
Comment 6 Steve Baker 2015-11-08 22:18:35 EST
Regarding StackValidationFailed: Unknown resource Type : OS::TripleO::AllNodes::Validation

A backport of https://review.openstack.org/#/c/176324 would be a prerequisite for diagnosing this further (and it may even fix the problem).

I currently have a stack which is similarly wedged because Step4 went to UPDATE_FAILED after pacemaker failed to bring galera back up after the yum update.
Comment 7 James Slagle 2015-11-09 16:35:47 EST
Not sure if it helps any, but I tried a patched Heat build with https://review.openstack.org/#/c/176324 applied, and I get exactly the same behavior as before.
Comment 8 Steve Baker 2015-11-09 17:38:32 EST
https://review.openstack.org/#/c/176324 results in the correct exceptions being raised, but Resource needs to fall back to TemplateResource for both TemplateNotFound and ResourceTypeNotFound.

I'll be coming up with a fix for this soon.

http://git.openstack.org/cgit/openstack/heat/tree/heat/engine/resource.py#n141
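
The fallback described in this comment can be sketched roughly as follows. This is an illustration of the pattern only, not the eventual patch: the exception classes and TemplateResource below are local stand-ins that borrow the names from the comment, and get_class and make_resource are hypothetical helpers.

```python
class ResourceTypeNotFound(Exception):
    """Local stand-in: the type has no mapping in the environment."""


class TemplateNotFound(Exception):
    """Local stand-in: the mapped template file could not be fetched."""


class TemplateResource(object):
    """Local stand-in for a resource backed by a nested template."""

    def __init__(self, name, type_name):
        self.name = name
        self.type_name = type_name


def get_class(registry, type_name):
    """Look up the implementing class, raising a specific error if absent."""
    try:
        return registry[type_name]
    except KeyError:
        raise ResourceTypeNotFound(type_name)


def make_resource(registry, name, type_name):
    """Build a resource, falling back to TemplateResource on either error."""
    try:
        cls = get_class(registry, type_name)
    except (ResourceTypeNotFound, TemplateNotFound):
        # Assumed behaviour: keep the stack loadable instead of failing.
        return TemplateResource(name, type_name)
    return cls(name, type_name)


resource = make_resource(
    {}, "AllNodesValidationConfig", "OS::TripleO::AllNodes::Validation")
print(type(resource).__name__)
```

With the fallback in place, a type missing from the registry no longer aborts the load of the previous stack in this toy model.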
Comment 10 Zane Bitter 2015-11-18 17:57:03 EST
Since the fixes for this are the same as the fixes for bug 1278544, I'm marking this one as TestOnly.
Comment 12 Amit Ugol 2015-11-23 11:10:44 EST
Updates pass CI, so this is verified.
Comment 13 Amit Ugol 2015-11-23 11:21:37 EST
There is no longer a way to recreate the type of failures that cause these errors while trying to recover from the previous errors (I hope that makes sense).
Comment 14 Steve Baker 2015-12-03 18:52:11 EST
There was an error in the backport due to the different thread_lock arguments on kilo which leads to this error on engine start:

2015-12-03 18:12:24.663 14246 TRACE heat.engine.service Traceback (most recent call last):
2015-12-03 18:12:24.663 14246 TRACE heat.engine.service   File "/usr/lib/python2.7/site-packages/heat/engine/service.py", line 1627, in reset_stack_status
2015-12-03 18:12:24.663 14246 TRACE heat.engine.service     with lock.thread_lock(retry=False):
2015-12-03 18:12:24.663 14246 TRACE heat.engine.service   File "/usr/lib64/python2.7/contextlib.py", line 84, in helper
2015-12-03 18:12:24.663 14246 TRACE heat.engine.service     return GeneratorContextManager(func(*args, **kwds))
2015-12-03 18:12:24.663 14246 TRACE heat.engine.service TypeError: thread_lock() takes at least 2 arguments (2 given)
2015-12-03 18:12:24.663 14246 TRACE heat.engine.service

which is fixed by this patch

diff --git a/heat/engine/service.py b/heat/engine/service.py
index ac85fdf..ea99fff 100644
--- a/heat/engine/service.py
+++ b/heat/engine/service.py
@@ -1624,7 +1624,7 @@ class EngineService(service.Service):
             lock = stack_lock.StackLock(cnxt, stk, self.engine_id)
             engine_id = lock.get_engine_id()
             try:
-                with lock.thread_lock(retry=False):
+                with lock.thread_lock(stack_id, retry=False):
 
                     # refetch stack and confirm it is still I
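
The signature mismatch behind this TypeError can be reproduced in isolation. The sketch below assumes, based only on the patch above, that the Kilo method takes a required stack_id before retry; the StackLock class here is a hypothetical stand-in, not Heat's implementation. Note that Python 3 words the error differently from the Python 2.7 trace above.

```python
import contextlib


class StackLock(object):
    """Hypothetical stand-in with the assumed Kilo-style signature."""

    def __init__(self):
        self.held = []

    @contextlib.contextmanager
    def thread_lock(self, stack_id, retry=True):
        # Record the lock for the duration of the with-block.
        self.held.append(stack_id)
        try:
            yield
        finally:
            self.held.remove(stack_id)


lock = StackLock()

# Backported call without stack_id: fails before the block is entered.
try:
    with lock.thread_lock(retry=False):
        pass
    error = None
except TypeError as exc:
    error = str(exc)

# Corrected call, matching the patched line.
with lock.thread_lock("stack-1", retry=False):
    held_inside = list(lock.held)

print(error)
print(held_inside, lock.held)
```

Because the generator function is called when the with-expression is evaluated, the missing argument surfaces immediately, which matches the failure appearing at engine start.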
Comment 16 Amit Ugol 2015-12-14 03:06:18 EST
Trying the update again. The original issue still cannot be reproduced, and the engine-start error above no longer appears for me. Re-verifying.
Comment 18 errata-xmlrpc 2015-12-21 12:03:25 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:2680
