Bug 1253773 - stack-cancel-update doesn't work on overcloud updates
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-heat
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: async
Target Release: 7.0 (Kilo)
Assigned To: Zane Bitter
QA Contact: Amit Ugol
Keywords: ZStream
Duplicates: 1292212
Reported: 2015-08-14 12:15 EDT by Ben Nemec
Modified: 2016-05-26 09:45 EDT
Doc Type: Bug Fix
Last Closed: 2015-09-02 17:45:40 EDT
Type: Bug

External Trackers: Launchpad 1475057

Description Ben Nemec 2015-08-14 12:15:05 EDT
Description of problem: heat stack-cancel-update overcloud leaves the stack in a ROLLBACK_FAILED state.

Version-Release number of selected component (if applicable): 2015.1.0-5

How reproducible: Always (?)

Steps to Reproduce:
1. Start a stack update of the overcloud
2. heat stack-cancel-update overcloud

Actual results: Stack ends up in a ROLLBACK_FAILED state.

Expected results: Update cancelled successfully

Additional info: Error message:

| stack_status_reason   | unicode: Stack overcloud-Networks-ihe2kxijocrw already has an action (UPDATE) in progress.
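The error message above names the nested stack that still has an action running. As a minimal sketch, the helper below pulls the stack name and action out of such a message. The function name `blocking_stack` is hypothetical, and the pattern is inferred only from the single message quoted in this report, so other Heat versions may word it differently.

```python
import re
from typing import Optional, Tuple

# Pattern inferred from the one message quoted in this bug (assumption);
# it is not taken from Heat's source code.
_IN_PROGRESS = re.compile(
    r"Stack (?P<stack>\S+) already has an action "
    r"\((?P<action>\w+)\) in progress"
)


def blocking_stack(status_reason: str) -> Optional[Tuple[str, str]]:
    """Return (stack_name, action) if the reason names a busy stack."""
    match = _IN_PROGRESS.search(status_reason)
    if match is None:
        return None
    return match.group("stack"), match.group("action")


if __name__ == "__main__":
    reason = ("unicode: Stack overcloud-Networks-ihe2kxijocrw "
              "already has an action (UPDATE) in progress.")
    print(blocking_stack(reason))
```

Used on the reason string from this report, the helper identifies the nested Networks stack as the one still running UPDATE.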
Comment 3 Zane Bitter 2015-08-19 09:36:42 EDT
The problem is that when a root stack's update is cancelled, we have no mechanism for also recursively stopping the updates of its child (nested) stacks.

This is a known issue that has been fixed upstream for Liberty (https://bugs.launchpad.net/heat/+bug/1446252), but we can't backport the fix as it causes other problems - we'll need the fix for https://bugs.launchpad.net/heat/+bug/1475057 (still in progress) as well.
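The failure mode described in this comment can be illustrated with a toy model. This is only a sketch: the `Stack` class and its methods are hypothetical and are not Heat's implementation or the upstream fix. It shows why cancelling only the root leaves a child mid-update, so the rollback fails, while a recursive cancel lets the rollback proceed.

```python
from dataclasses import dataclass, field
from typing import List


# Toy model only: hypothetical names, not Heat's real classes.
@dataclass
class Stack:
    name: str
    status: str = "UPDATE_IN_PROGRESS"
    children: List["Stack"] = field(default_factory=list)

    def cancel_root_only(self) -> None:
        """Cancel this stack's update without touching nested stacks."""
        self.status = "ROLLBACK_IN_PROGRESS"

    def cancel_recursive(self) -> None:
        """Cancel nested stacks first, then this stack."""
        for child in self.children:
            child.cancel_recursive()
        self.status = "ROLLBACK_IN_PROGRESS"

    def rollback(self) -> str:
        """Roll back; fail if any child is still in the middle of an update."""
        for child in self.children:
            if child.status == "UPDATE_IN_PROGRESS":
                self.status = "ROLLBACK_FAILED"
                return (f"Stack {child.name} already has an action "
                        "(UPDATE) in progress.")
        for child in self.children:
            child.rollback()
        self.status = "ROLLBACK_COMPLETE"
        return ""


def build() -> Stack:
    """Build a root stack with one nested stack, both mid-update."""
    return Stack("overcloud", children=[Stack("overcloud-Networks")])


if __name__ == "__main__":
    root = build()
    root.cancel_root_only()
    print(root.rollback(), root.status)

    root = build()
    root.cancel_recursive()
    print(root.rollback(), root.status)
```

In the first run the child is never told to stop, so the root's rollback collides with the child's in-progress update, which matches the ROLLBACK_FAILED state in the description.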
Comment 4 chris alfonso 2015-08-28 14:45:03 EDT
Zane, is there a workaround for this?
Comment 5 Ben Nemec 2015-08-28 17:15:39 EDT
FWIW, this became a significant issue when we bumped the default Heat timeout to 4 hours.  It means that if a stack update goes bad somehow and you need to re-run, you can lose half a day waiting for the update to time out now (assuming this is a live cloud that you can't just stack-delete and re-deploy).
Comment 6 Zane Bitter 2015-08-31 16:25:58 EDT
Fix to the fix has merged upstream, so we can probably squash them together into a backport. There's no real workaround that I can think of.
Comment 7 Zane Bitter 2015-09-02 17:42:46 EDT
OK, the big problem with this is that the fix requires an RPC API change, which is something that we can't safely backport to a stable release. I don't really see a solution here before RHOS 8.
Comment 8 RHEL Product and Program Management 2015-09-02 17:45:40 EDT
Development Management has reviewed and declined this request.
You may appeal this decision by reopening this request.
Comment 9 Zane Bitter 2016-01-07 14:35:37 EST
*** Bug 1292212 has been marked as a duplicate of this bug. ***