Bug 1301511
Summary: | Updating a failed stack fails with Stack already has an action (UPDATE) in progress. | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Robin Cernin <rcernin> |
Component: | openstack-heat | Assignee: | Zane Bitter <zbitter> |
Status: | CLOSED NOTABUG | QA Contact: | Amit Ugol <augol> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 7.0 (Kilo) | CC: | bschmaus, calfonso, dmaley, ggillies, gkeegan, mburns, mcornea, morazi, nalmond, rcernin, rhel-osp-director-maint, sbaker, shardy, skinjo, yeylon, zbitter |
Target Milestone: | async | Keywords: | ZStream |
Target Release: | 7.0 (Kilo) | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | 1292212 | Environment: | |
Last Closed: | 2016-01-28 18:17:02 UTC | Type: | Bug |
Description
Robin Cernin
2016-01-25 09:24:30 UTC
The fix for bug 1280094 is supposed to prevent this from happening, so it would be good to know exactly what version of Heat you're running in the undercloud. As mentioned in the bug you linked, there were previously some issues with the workaround of restarting heat-engine, but we believe they are fixed.

It's possible that if we are leaving resources IN_PROGRESS but not their containing stacks, then they wouldn't get reset at startup. That ought not to happen, though, and the fact that you're getting the failure "Stack already has an action (UPDATE) in progress." suggests that the stack is also IN_PROGRESS anyway (which means it should get moved to FAILED when heat-engine starts up).

Useful information would be:
- A list of all resources and stacks that are still IN_PROGRESS after 4 hours
- Log of the initial update that put this stuff into the IN_PROGRESS state
- Journal output (`journalctl -u openstack-heat-engine`) from that same update

So far we've established that, as far as moving stacks to FAILED is concerned, everything is working as expected. However, updates are timing out for unknown reasons: the Compute and Controller resource groups are never completing.
One possible issue found in the journal is this:

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
    executor_callback))
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
    executor_callback)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch
    result = func(ctxt, **new_args)
  File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 105, in wrapper
    return f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/heat/common/context.py", line 300, in wrapped
    return func(self, ctx, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/heat/engine/service.py", line 1526, in signal_software_deployment
    updated_at=updated_at)
  File "/usr/lib/python2.7/site-packages/heat/engine/service_software_config.py", line 178, in signal_software_deployment
    raise ValueError(_('deployment_id must be specified'))
ValueError: deployment_id must be specified

This happens when an empty deployment_id is supplied with a signal. There definitely should be better logging of these kinds of errors, but we have no idea yet if this is the cause of the problem, or why we're receiving a signal of this type.

It doesn't appear that those tracebacks were related either.

Created attachment 1118273
[Split] Full heat log 2
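To illustrate the remark about better logging for the `deployment_id must be specified` error, here is a minimal sketch of a guard that logs what arrived before rejecting the signal. It is not Heat's actual code: `check_deployment_signal` and its `details` argument are hypothetical names, and only the error message is taken from the traceback.

```python
import logging

LOG = logging.getLogger(__name__)


def check_deployment_signal(deployment_id, details):
    """Reject a signal that carries no deployment_id, logging context first.

    Hypothetical helper for illustration only; not Heat's implementation.
    """
    if not deployment_id:
        # Record which keys arrived so the sender can be traced in the journal.
        keys = sorted(details) if isinstance(details, dict) else []
        LOG.warning(
            "Signal rejected: empty deployment_id (%r); detail keys: %s",
            deployment_id,
            keys,
        )
        raise ValueError("deployment_id must be specified")
    return deployment_id
```

With a warning like this in the journal, an empty `deployment_id` would show up next to the keys of the payload that carried it, which might help identify where the signal came from.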
The only Heat bug found on this deployment was bug 1302828. In respect of resources being UPDATE_IN_PROGRESS, Heat was found to be behaving as expected.
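The list requested earlier in this report (everything still IN_PROGRESS after 4 hours) could be produced with a filter like the sketch below. It assumes the stack and resource records have already been exported into plain dicts; the keys `name`, `status` and `updated_at` are hypothetical, and the sample data is invented for illustration.

```python
from datetime import datetime, timedelta


def stuck_in_progress(records, now, max_age=timedelta(hours=4)):
    """Return names of records still *_IN_PROGRESS for longer than max_age.

    Each record is a dict with hypothetical keys: 'name', 'status'
    (for example 'UPDATE_IN_PROGRESS') and 'updated_at' (a datetime).
    """
    return [
        rec["name"]
        for rec in records
        if rec["status"].endswith("IN_PROGRESS")
        and now - rec["updated_at"] > max_age
    ]


if __name__ == "__main__":
    # Invented sample data; real records would come from the undercloud.
    now = datetime(2016, 1, 25, 13, 30)
    sample = [
        {"name": "Compute", "status": "UPDATE_IN_PROGRESS",
         "updated_at": datetime(2016, 1, 25, 9, 0)},
        {"name": "Controller", "status": "UPDATE_FAILED",
         "updated_at": datetime(2016, 1, 25, 9, 0)},
    ]
    print(stuck_in_progress(sample, now))
```

Matching on the `IN_PROGRESS` suffix rather than on `UPDATE_IN_PROGRESS` alone is a deliberate choice here, so that a stuck create or delete action is reported as well.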