Bug 1042160 - [RFE][heat]: Update Failure Recovery
Summary: [RFE][heat]: Update Failure Recovery
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-heat
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: Upstream M3
Target Release: 6.0 (Juno)
Assignee: Zane Bitter
QA Contact: Amit Ugol
URL: https://blueprints.launchpad.net/heat...
Whiteboard: upstream_milestone_juno-3 upstream_st...
Depends On:
Blocks:
 
Reported: 2013-12-12 21:14 UTC by RHOS Integration
Modified: 2016-04-27 02:45 UTC
CC List: 8 users

Fixed In Version: openstack-heat-2014.2-1.el7ost
Doc Type: Enhancement
Doc Text:
The Orchestration service now allows the user to update a stack in a FAILED state. Previously, failed stacks could only be deleted, not updated.
Clone Of:
Environment:
Last Closed: 2015-02-09 15:01:29 UTC
Target Upstream Version:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2015:0147 | 0 | normal | SHIPPED_LIVE | openstack-heat enhancement advisory | 2015-02-09 19:53:18 UTC

Description RHOS Integration 2013-12-12 21:14:17 UTC
Cloned from launchpad blueprint https://blueprints.launchpad.net/heat/+spec/update-failure-recovery.

Description:

Currently, stack updates are all-or-nothing. If a failure occurs and rollback is enabled, we attempt to roll back to the previous state. If the rollback fails, we leave the stack in its failed state but accept the old template as a true representation of the current state of the stack; if rollback is disabled, we do the same with the new template. (Either way, we could lose track of some resources and be unable to delete them.) We also prohibit further updates to the stack from this point on: once an update has failed, you can only delete the stack.

We need to incrementally update the current template as resources are added, removed or modified. This will give us a valid picture of the true state when a failure occurs, allowing us to safely run updates in the future.
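
A minimal sketch of the incremental approach described above, not Heat's actual implementation. It assumes the stored template can be modeled as a plain dict from resource name to definition, and it rewrites that dict after each resource operation succeeds rather than once at the end. The names `apply_update`, `stored`, `target`, and `apply_change` are hypothetical.

```python
from typing import Callable, Dict, Optional

Definition = Dict[str, object]
Template = Dict[str, Definition]


def apply_update(
    stored: Template,
    target: Template,
    apply_change: Callable[[str, Optional[Definition]], None],
) -> Template:
    """Move `stored` toward `target` one resource at a time.

    `apply_change(name, definition)` is a hypothetical callback that does the
    real work (definition=None means delete) and raises on failure.
    `stored` is mutated only after a step succeeds, so if a step raises,
    `stored` still reflects every change that completed before it.
    """
    # Create or modify resources whose definition differs from the target.
    for name, definition in target.items():
        if stored.get(name) != definition:
            apply_change(name, definition)
            stored[name] = definition
    # Delete resources that the target template no longer contains.
    for name in [n for n in stored if n not in target]:
        apply_change(name, None)
        del stored[name]
    return stored
```

If `apply_change` raises partway through, the caller is left with a `stored` template that still lists the resources not yet deleted, so a later update or delete can find them. The resource whose change failed keeps its old entry in this sketch; how a real implementation should record a half-applied change is left open here.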

Specification URL (additional information):

None

Comment 3 Zane Bitter 2014-10-20 17:33:25 UTC
The idea is that even if a resource fails during create or update, we should still be able to successfully run another update - with the same or a different template - and have the stack recover to the right state.

So some things that would be interesting to test are:
- Updating after a create failure
- Updating after an update failure with rollback disabled
- Updating after a rollback failure
- Update failures where the new template has added new parameters
- Update failures where the new template has removed existing parameters
- Update failures where parameter values are changing
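
The first three scenarios in the list above each start from a stack in a FAILED state. A minimal sketch of the rule change they exercise, assuming stack state is an (action, status) pair such as ("UPDATE", "FAILED"). The function name, the `allow_failed` flag, and the exact set of states are assumptions for illustration, not Heat's actual check.

```python
# States named in the list above: a failed create, a failed update
# (rollback disabled), and a failed rollback. Assumed set, for illustration.
RECOVERABLE_FAILURES = {
    ("CREATE", "FAILED"),
    ("UPDATE", "FAILED"),
    ("ROLLBACK", "FAILED"),
}


def can_update(action: str, status: str, allow_failed: bool = True) -> bool:
    """Return True if an update may start from this stack state.

    allow_failed=False models the old behaviour (a failed stack could only
    be deleted); allow_failed=True models the behaviour this RFE asks for.
    """
    # Assumption: any successfully completed, non-delete action is updatable.
    if status == "COMPLETE" and action != "DELETE":
        return True
    if allow_failed and (action, status) in RECOVERABLE_FAILURES:
        return True
    return False
```

A test matrix could pair each state in `RECOVERABLE_FAILURES` with the three parameter cases from the list (parameters added, removed, or changed in value).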

Note that when something fails, we now wait up to 4 minutes for other in-progress resources to complete rather than killing them immediately, since we hope to be able to recover.
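
A minimal sketch of that grace period, assuming each resource operation is an independent concurrent job. It uses Python's `asyncio` purely for illustration; Heat's own scheduler is not shown here, and `run_with_grace` and `GRACE_PERIOD_SECONDS` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Iterable

GRACE_PERIOD_SECONDS = 240  # "up to 4 minutes", per the comment above


async def run_with_grace(
    jobs: Iterable[Awaitable[None]],
    grace: float = GRACE_PERIOD_SECONDS,
) -> None:
    """Run jobs concurrently; after the first failure, let the others keep
    running for up to `grace` seconds before cancelling what is left."""
    tasks = [asyncio.ensure_future(job) for job in jobs]
    if not tasks:
        return
    done, pending = await asyncio.wait(
        tasks, return_when=asyncio.FIRST_EXCEPTION
    )
    failed = [t for t in done if not t.cancelled() and t.exception() is not None]
    if failed and pending:
        # The old behaviour would cancel everything in `pending` right here.
        _, still_running = await asyncio.wait(pending, timeout=grace)
        for task in still_running:
            task.cancel()
        # Collect results so cancellations and late errors are consumed.
        await asyncio.gather(*pending, return_exceptions=True)
    if failed:
        raise failed[0].exception()
```

Jobs that finish inside the window end in a known state, which is what makes a follow-up update possible; only jobs still running when the window closes are cancelled.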

Comment 6 errata-xmlrpc 2015-02-09 15:01:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2015-0147.html

