Red Hat Bugzilla – Bug 1253773
stack-cancel-update doesn't work on overcloud updates
Last modified: 2016-05-26 09:45:53 EDT
Description of problem: heat stack-cancel-update overcloud leaves the stack in a ROLLBACK_FAILED state.
Version-Release number of selected component (if applicable): 2015.1.0-5
How reproducible: Always (?)
Steps to Reproduce:
1. Start a stack update of the overcloud
2. heat stack-cancel-update overcloud
Actual results: Stack ends up in a ROLLBACK_FAILED state.
Expected results: Update cancelled successfully
Additional info: Error message:
| stack_status_reason | unicode: Stack overcloud-Networks-ihe2kxijocrw already has an action (UPDATE) in progress.
The problem is that when a root stack's update is cancelled, we have no mechanism for recursively stopping its child stacks as well.
This is a known issue that has been fixed upstream for Liberty (https://bugs.launchpad.net/heat/+bug/1446252), but we can't backport that fix on its own because it causes other problems; we'll also need the fix for https://bugs.launchpad.net/heat/+bug/1475057 (still in progress).
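To make the failure mode concrete, here is a minimal toy sketch in Python of the difference between cancelling only the root stack and cancelling recursively. It is not Heat's code and not the upstream fix: `ToyStack`, `cancel_root_only`, `cancel_recursive`, `still_updating` and `build_overcloud` are hypothetical names invented for this illustration, and the child stack names are shortened placeholders.

```python
from dataclasses import dataclass, field
from typing import List

# Status strings mirror the style of Heat stack statuses; the model itself is a toy.
IN_PROGRESS = "UPDATE_IN_PROGRESS"
ROLLBACK = "ROLLBACK_IN_PROGRESS"


@dataclass
class ToyStack:
    """Hypothetical stand-in for a stack with nested child stacks."""
    name: str
    status: str = IN_PROGRESS
    children: List["ToyStack"] = field(default_factory=list)


def cancel_root_only(stack: ToyStack) -> None:
    """Model of the reported behaviour: only the root stops updating."""
    stack.status = ROLLBACK


def cancel_recursive(stack: ToyStack) -> None:
    """Stop every child (depth-first) before stopping the parent."""
    for child in stack.children:
        cancel_recursive(child)
    stack.status = ROLLBACK


def still_updating(stack: ToyStack) -> List[str]:
    """Names of all stacks in the tree that are still UPDATE_IN_PROGRESS."""
    names = [stack.name] if stack.status == IN_PROGRESS else []
    for child in stack.children:
        names.extend(still_updating(child))
    return names


def build_overcloud() -> ToyStack:
    """Illustrative tree: a root with two nested stacks, all mid-update."""
    return ToyStack("overcloud", children=[
        ToyStack("overcloud-Networks"),
        ToyStack("overcloud-Compute"),
    ])


if __name__ == "__main__":
    root = build_overcloud()
    cancel_root_only(root)
    # Children are still updating, so a rollback that touches them would
    # hit "already has an action (UPDATE) in progress".
    print(still_updating(root))

    root = build_overcloud()
    cancel_recursive(root)
    print(still_updating(root))
```

In the root-only case the nested stacks remain in progress after the cancel, which is the situation the error message above points at; in the recursive case nothing is left updating when the rollback starts.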
Zane, is there a workaround for this?
FWIW, this became a significant issue when we bumped the default Heat timeout to 4 hours. If a stack update goes bad somehow and you need to re-run it, you can now lose half a day waiting for the update to time out (assuming this is a live cloud that you can't just stack-delete and re-deploy).
The fix to the fix has merged upstream, so we can probably squash the two together into a backport. There's no real workaround that I can think of.
OK, the big problem with this is that the fix requires an RPC API change, which is something that we can't safely backport to a stable release. I don't really see a solution here before RHOS 8.
Development Management has reviewed and declined this request.
You may appeal this decision by reopening this request.
*** Bug 1292212 has been marked as a duplicate of this bug. ***