Bug 1253773

Summary: stack-cancel-update doesn't work on overcloud updates
Product: Red Hat OpenStack
Component: openstack-heat
Version: Director
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Status: CLOSED WONTFIX
Target Milestone: async
Target Release: 7.0 (Kilo)
Reporter: Ben Nemec <bnemec>
Assignee: Zane Bitter <zbitter>
QA Contact: Amit Ugol <augol>
CC: calfonso, mcornea, rhosp-bugs-internal, sbaker, shardy, vincent, yeylon, zbitter
Keywords: ZStream
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-09-02 21:45:40 UTC

Description Ben Nemec 2015-08-14 16:15:05 UTC
Description of problem: heat stack-cancel-update overcloud leaves the stack in a ROLLBACK_FAILED state.


Version-Release number of selected component (if applicable): 2015.1.0-5


How reproducible: Always (?)


Steps to Reproduce:
1. Start a stack update of the overcloud
2. heat stack-cancel-update overcloud

Actual results: Stack ends up in a ROLLBACK_FAILED state.


Expected results: The update is cancelled successfully.


Additional info: Error message:

| stack_status_reason   | unicode: Stack overcloud-Networks-ihe2kxijocrw already has an action (UPDATE) in progress.

Comment 3 Zane Bitter 2015-08-19 13:36:42 UTC
The problem is that when a root stack's update is cancelled, we have no mechanism for recursively stopping the updates in its child (nested) stacks as well.

This is a known issue that has been fixed upstream for Liberty (https://bugs.launchpad.net/heat/+bug/1446252), but we can't backport the fix as it causes other problems - we'll need the fix for https://bugs.launchpad.net/heat/+bug/1475057 (still in progress) as well.
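The failure mode Zane describes can be sketched with a toy model. To be clear, the Stack class and its methods below are illustrative stand-ins, not Heat's real internals; they only show why cancelling the root alone leaves a child with UPDATE still in progress, so the subsequent rollback fails:

```python
class Stack:
    """Illustrative model of a Heat stack with nested child stacks."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.action = None          # e.g. "UPDATE", "ROLLBACK"
        self.status = "COMPLETE"

    def start_update(self):
        # An update on the root propagates into every nested stack.
        self.action, self.status = "UPDATE", "IN_PROGRESS"
        for child in self.children:
            child.start_update()

    def cancel_update(self, recursive):
        # Kilo behaviour: only the stack targeted by the RPC call is
        # cancelled; with recursive=False, children keep UPDATE IN_PROGRESS.
        self.action, self.status = "ROLLBACK", "IN_PROGRESS"
        if recursive:
            for child in self.children:
                child.cancel_update(recursive=True)

    def rollback(self):
        # Rolling back the root touches each child; a child that still has
        # an action in progress produces the error seen in this report.
        for child in self.children:
            if child.action == "UPDATE" and child.status == "IN_PROGRESS":
                self.status = "FAILED"   # i.e. ROLLBACK_FAILED overall
                return ("Stack %s already has an action (%s) in progress."
                        % (child.name, child.action))
        self.status = "COMPLETE"
        return None


networks = Stack("overcloud-Networks-ihe2kxijocrw")
overcloud = Stack("overcloud", children=[networks])

overcloud.start_update()
overcloud.cancel_update(recursive=False)   # Kilo: cancel does not recurse
error = overcloud.rollback()
print(error)
print(overcloud.action, overcloud.status)  # ROLLBACK FAILED
```

With a recursive cancel (the Liberty-era behaviour from bug 1446252), the children are cancelled too and the rollback completes, which is why the upstream fix resolves this.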

Comment 4 chris alfonso 2015-08-28 18:45:03 UTC
Zane, is there a workaround for this?

Comment 5 Ben Nemec 2015-08-28 21:15:39 UTC
FWIW, this became a significant issue when we bumped the default Heat timeout to 4 hours.  It means that if a stack update goes bad somehow and you need to re-run it, you can now lose half a day waiting for the update to time out (assuming this is a live cloud that you can't simply stack-delete and re-deploy).

Comment 6 Zane Bitter 2015-08-31 20:25:58 UTC
The fix to the fix has merged upstream, so we can probably squash the two together into a backport. There's no real workaround that I can think of.

Comment 7 Zane Bitter 2015-09-02 21:42:46 UTC
OK, the big problem with this is that the fix requires an RPC API change, which is something we can't safely backport to a stable release. I don't really see a solution here before RHOS 8.

Comment 8 RHEL Program Management 2015-09-02 21:45:40 UTC
Development Management has reviewed and declined this request.
You may appeal this decision by reopening this request.

Comment 9 Zane Bitter 2016-01-07 19:35:37 UTC
*** Bug 1292212 has been marked as a duplicate of this bug. ***