Bug 1278975
Summary: | StackValidationFailed: Unknown resource Type : OS::TripleO::AllNodes::Validation while updating stack in UPDATE_FAILED | | |
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | James Slagle <jslagle> |
Component: | openstack-heat | Assignee: | Steve Baker <sbaker> |
Status: | CLOSED ERRATA | QA Contact: | Amit Ugol <augol> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | | |
Version: | 7.0 (Kilo) | CC: | calfonso, jcoufal, jslagle, mburns, rbiba, rhel-osp-director-maint, sasha, sbaker, shardy, yeylon, zbitter |
Target Milestone: | z3 | Keywords: | ZStream |
Target Release: | 7.0 (Kilo) | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | openstack-heat-2015.1.2-4.el7ost | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2015-12-21 17:03:25 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | 1278544 | | |
Bug Blocks: | | | |
Description
James Slagle
2015-11-06 22:04:35 UTC
Note that this traceback causes https://bugzilla.redhat.com/show_bug.cgi?id=1278544, which means the stack is stuck in UPDATE_IN_PROGRESS forever, with no way to recover.

Nope, this is a Heat bug - it's trying to load the *previous* stack and not finding a type for one of the resources in the environment. This is likely because we don't write the new environment until after a stack update has succeeded, so the previous stack may contain a mixture of old and new resources, but with the old environment. I thought we had a bug for this already, but I don't see it at the moment. https://bugs.launchpad.net/heat/+bug/1477812 was a similar problem involving parameters, but the patch would not have fixed this issue with resource type mappings.

Ah, found the other report of this: https://bugs.launchpad.net/heat/+bug/1508096 (from jprovazn, via me). Now we know how to reproduce it.

Regarding StackValidationFailed: Unknown resource Type : OS::TripleO::AllNodes::Validation

A backport of https://review.openstack.org/#/c/176324 would be a prerequisite for diagnosing this further (and it may even fix the problem).

I currently have a stack which is similarly wedged because Step4 went to UPDATE_FAILED after pacemaker failed to bring galera back up after the yum update.

Not sure if it helps any, but I tried a patched Heat build with https://review.openstack.org/#/c/176324 applied, and I just get the exact same behavior as before.

https://review.openstack.org/#/c/176324 results in the correct exceptions being raised, but Resource needs to fall back to TemplateResource for both TemplateNotFound and ResourceTypeNotFound. I'll be coming up with a fix for this soon. http://git.openstack.org/cgit/openstack/heat/tree/heat/engine/resource.py#n141

Since the fixes for this are the same as the fixes for bug 1278544, I'm marking this one as TestOnly.
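For readers unfamiliar with the fallback mentioned above, here is a minimal sketch of the idea: when the type lookup raises either exception, the loader falls back to a template-backed resource instead of failing. Everything in it is a hypothetical stand-in written for illustration (the exception classes, `lookup_class`, `load_resource`, and the dict-based registry); it is not Heat's actual code and not the fix that was merged.

```python
# Hypothetical stand-ins for illustration only; these are NOT Heat's classes.
class TemplateNotFound(Exception):
    pass


class ResourceTypeNotFound(Exception):
    pass


class TemplateResource(object):
    """Placeholder for a resource backed by a nested template."""

    def __init__(self, name, type_name):
        self.name = name
        self.type_name = type_name


def lookup_class(registry, type_name):
    """Return the class mapped to type_name, or raise ResourceTypeNotFound."""
    try:
        return registry[type_name]
    except KeyError:
        raise ResourceTypeNotFound(type_name)


def load_resource(registry, name, type_name):
    """Instantiate a resource, falling back to TemplateResource.

    Both exceptions are caught, so a type that is missing from the (old)
    environment no longer aborts loading of the previous stack.
    """
    try:
        cls = lookup_class(registry, type_name)
    except (TemplateNotFound, ResourceTypeNotFound):
        cls = TemplateResource
    return cls(name, type_name)


# The old environment has no mapping for the new resource type.
res = load_resource({}, "validation", "OS::TripleO::AllNodes::Validation")
print(type(res).__name__)
```

Catching only one of the two exceptions would leave the other path raising during stack load, which is the gap the comment above describes.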
Updates pass CI, so this is verified. There is no longer a way to recreate the type of failure that causes these errors while trying to recover from the previous errors (I hope I made it logical).

There was an error in the backport due to the different thread_lock arguments on kilo, which leads to this error on engine start:

2015-12-03 18:12:24.663 14246 TRACE heat.engine.service Traceback (most recent call last):
2015-12-03 18:12:24.663 14246 TRACE heat.engine.service   File "/usr/lib/python2.7/site-packages/heat/engine/service.py", line 1627, in reset_stack_status
2015-12-03 18:12:24.663 14246 TRACE heat.engine.service     with lock.thread_lock(retry=False):
2015-12-03 18:12:24.663 14246 TRACE heat.engine.service   File "/usr/lib64/python2.7/contextlib.py", line 84, in helper
2015-12-03 18:12:24.663 14246 TRACE heat.engine.service     return GeneratorContextManager(func(*args, **kwds))
2015-12-03 18:12:24.663 14246 TRACE heat.engine.service TypeError: thread_lock() takes at least 2 arguments (2 given)
2015-12-03 18:12:24.663 14246 TRACE heat.engine.service

which is fixed by this patch:

diff --git a/heat/engine/service.py b/heat/engine/service.py
index ac85fdf..ea99fff 100644
--- a/heat/engine/service.py
+++ b/heat/engine/service.py
@@ -1624,7 +1624,7 @@ class EngineService(service.Service):
             lock = stack_lock.StackLock(cnxt, stk, self.engine_id)
             engine_id = lock.get_engine_id()
             try:
-                with lock.thread_lock(retry=False):
+                with lock.thread_lock(stack_id, retry=False):
                     # refetch stack and confirm it is still I

Trying again to update. The original issue still cannot be reproduced. The above fix is no longer visible to me. Re-verifying.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:2680
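As a footnote to the thread_lock() backport error reported earlier in this bug, the sketch below shows why the failure surfaces inside contextlib rather than inside the lock code: a generator-based context manager binds its arguments as soon as it is called in the `with` statement, so a missing positional argument raises TypeError before the body runs. `StackLockSketch` and both helper functions are hypothetical; only the `thread_lock(stack_id, retry=False)` call shape is taken from the patch in this report.

```python
import contextlib


class StackLockSketch(object):
    """Hypothetical stand-in for a stack lock; not Heat's StackLock."""

    def __init__(self):
        self.held = []

    @contextlib.contextmanager
    def thread_lock(self, stack_id, retry=True):
        # stack_id is a required positional argument, as on the kilo branch.
        self.held.append(stack_id)
        try:
            yield
        finally:
            self.held.remove(stack_id)


def call_without_stack_id(lock):
    """Mimic the broken backport: stack_id is omitted."""
    try:
        with lock.thread_lock(retry=False):
            return "entered"
    except TypeError:
        return "TypeError"


def call_with_stack_id(lock, stack_id):
    """Mimic the patched call: stack_id is passed explicitly."""
    with lock.thread_lock(stack_id, retry=False):
        return list(lock.held)


lock = StackLockSketch()
print(call_without_stack_id(lock))
print(call_with_stack_id(lock, "stack-1"))
```

The exact wording of the TypeError differs between Python versions, so the sketch checks only the exception type.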