Bug 1386403 - Failed to update stack overcloud because of an action (ROLLBACK) in progress
Summary: Failed to update stack overcloud because of an action (ROLLBACK) in progress
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-heat
Version: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Zane Bitter
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-10-18 20:35 UTC by bzhou1
Modified: 2020-04-15 14:44 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-20 16:08:49 UTC
Target Upstream Version:


Attachments
output of heat stack-show overcloud (670.93 KB, text/plain)
2016-10-18 20:35 UTC, bzhou1

Description bzhou1 2016-10-18 20:35:25 UTC
Created attachment 1211843
output of heat stack-show overcloud

Description of problem:
A failure occurred while scaling up an established overcloud. I tried to clean up the heat stack and redeploy. After deleting the nova instance and ironic nodes on the undercloud, I tried to deploy the compute nodes again, but the deployment fails immediately with "ERROR: Stack overcloud already has an action (ROLLBACK) in progress". I cannot cancel the heat update either, because of "ERROR: Cancelling update when stack is (u'ROLLBACK', u'IN_PROGRESS') is not supported". The stack eventually moved from rollback in progress to the rollback failed state, but I still cannot run another update.
I am now stuck on this error and cannot add new compute nodes to the overcloud.
To prevent this from happening, users should not be allowed to delete undercloud resources such as nova instances or ironic nodes directly;
or heat should force a cleanup of the failed update.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. 
2.
3.

Actual results:
Cannot cancel the failed undercloud heat stack update;
Cannot do a new undercloud heat stack update;

Expected results:
Requests to delete undercloud resources directly should be rejected if heat is the owner of those resources;
Users should be able to delete or cancel any failed action, such as a heat update or rollback;
 

Additional info:

Comment 1 Zane Bitter 2016-10-18 21:17:36 UTC
Cancelling a stack update is not recommended for TripleO, because it causes a rollback and TripleO is not designed to deal with rollbacks. The best approach is to allow the stack and all of its nested stacks to either succeed, fail, or time out. It is possible to accelerate this process by restarting heat-engine on the undercloud (after which all IN_PROGRESS stacks should move to the FAILED state), and this usually works fine now (although in earlier versions of OSPd it was often problematic).

Once the stack has reached the ROLLBACK_FAILED state you should be able to update it again, although you may have to wait for some nested stacks to time out also (this latter part is fixed in Newton). If you're still having issues, I would suggest restarting heat-engine to ensure all stacks are reset.
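
The check described above can be sketched in Python. This is a minimal illustration, not part of Heat or heatclient: it assumes stack states are available as the `<ACTION>_<STATUS>` strings that Heat reports (for example `ROLLBACK_FAILED`), and the function names and the nested stack name are hypothetical. It only reports which stacks are still IN_PROGRESS and would therefore block a new update.

```python
# Sketch only: decide whether a Heat stack tree has settled, based on
# "<ACTION>_<STATUS>" strings such as ROLLBACK_IN_PROGRESS.
# All function names here are hypothetical helpers, not a Heat API.


def split_stack_status(stack_status):
    """Split e.g. 'ROLLBACK_IN_PROGRESS' into ('ROLLBACK', 'IN_PROGRESS')."""
    # Split on the first underscore only, because the status part
    # (IN_PROGRESS) contains an underscore of its own.
    action, status = stack_status.split("_", 1)
    return action, status


def is_settled(stack_status):
    """A stack is settled once its current action is no longer IN_PROGRESS."""
    _, status = split_stack_status(stack_status)
    return status != "IN_PROGRESS"


def blocking_stacks(statuses_by_name):
    """Return the sorted names of stacks (parent or nested) still IN_PROGRESS."""
    return sorted(
        name for name, status in statuses_by_name.items()
        if not is_settled(status)
    )


if __name__ == "__main__":
    # Hypothetical snapshot; the nested stack name is made up.
    snapshot = {
        "overcloud": "ROLLBACK_FAILED",
        "overcloud-Compute-example": "UPDATE_IN_PROGRESS",
    }
    print(blocking_stacks(snapshot))
```

If the returned list is empty, no stack in the snapshot is IN_PROGRESS and a new update should no longer be rejected for that reason. Otherwise the listed stacks still need to time out, or heat-engine can be restarted as suggested above.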

Comment 2 bzhou1 2016-10-20 14:00:28 UTC
Zane,
Thanks a lot for the prompt response.
After restarting heat-engine I can continue my operation, e.g. adding compute nodes to the existing overcloud. I think we can close the bug as no-problem-found. Should I change the state to closed? Thanks.
Bin


Comment 3 Zane Bitter 2016-10-20 16:08:49 UTC
Great! Thanks for the update.

