Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1314080

Summary: [Heat] Stack failed, resource stuck IN_PROGRESS
Product: Red Hat OpenStack
Reporter: Joe Talerico <jtaleric>
Component: openstack-heat
Assignee: Steve Baker <sbaker>
Status: CLOSED ERRATA
QA Contact: Amit Ugol <augol>
Severity: high
Docs Contact:
Priority: medium
Version: 7.0 (Kilo)
CC: akaris, jcoufal, jschluet, mburns, mlopes, rhel-osp-director-maint, sbaker, shardy, srevivo, zbitter
Target Milestone: rc
Keywords: Triaged
Target Release: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-heat-7.0.0-0.20160923054727.e4c4c56.el7ost
Doc Type: Enhancement
Doc Text:
With this enhancement, `heat-manage` now supports a `heat-manage reset_stack_status` subcommand. This was added to manage situations where `heat-engine` was unable to contact the database, causing any stacks that were in-progress to remain stuck due to outdated database information. When this occurred, administrators needed a way to reset the status to allow these stacks to be updated again. As a result, administrators can now use the `heat-manage reset_stack_status` command to reset a stuck stack.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-12-14 15:25:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Joe Talerico 2016-03-02 20:52:05 UTC
Description of problem:
Unable to update the overcloud deployment or scale it out any further, because Heat resources are stuck reporting that a deployment is still running.

What I am seeing:
https://gist.github.com/jtaleric/9758204a799fc530243b#file-rackspace-scale-issue-log

Version-Release number of selected component (if applicable):
ospd73

How reproducible:
100%

Steps to Reproduce:
1. Deploy the overcloud. MariaDB runs out of file descriptors, which causes the deployment to fail and leaves Heat in a bad state.

Actual results:
https://gist.github.com/jtaleric/9758204a799fc530243b#file-rackspace-scale-issue-log

Expected results:
heat resources to be reaped/cleaned up.

Additional info:

Comment 2 Steve Baker 2016-03-02 21:19:54 UTC
Running out of file descriptors will be difficult to reproduce. This particular state can be replicated by setting some resources to IN_PROGRESS while their stacks are in an UPDATE_FAILED state.
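The inconsistent state described here (a stack that has already failed while some of its resources still claim to be in progress) can be illustrated with a small in-memory model. This is only a sketch: the `Stack` and `Resource` classes and the `stuck_resources` helper are hypothetical names for illustration, not Heat's actual objects or database schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    # Hypothetical stand-in for a Heat resource record.
    name: str
    status: str  # e.g. "COMPLETE", "IN_PROGRESS", "FAILED"


@dataclass
class Stack:
    # Hypothetical stand-in for a Heat stack record.
    name: str
    action: str  # e.g. "UPDATE"
    status: str  # e.g. "FAILED"
    resources: List[Resource] = field(default_factory=list)


def stuck_resources(stack: Stack) -> List[str]:
    """Names of resources still IN_PROGRESS although the stack has FAILED."""
    if stack.status != "FAILED":
        return []
    return [r.name for r in stack.resources if r.status == "IN_PROGRESS"]


# An UPDATE_FAILED stack with one resource left IN_PROGRESS.
overcloud = Stack(
    "overcloud",
    "UPDATE",
    "FAILED",
    [Resource("Compute", "IN_PROGRESS"), Resource("Controller", "COMPLETE")],
)
print(stuck_resources(overcloud))  # prints ['Compute']
```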

Comment 3 Steve Baker 2016-03-09 21:26:04 UTC
I'm suggesting a heat-manage command which acts on a single stack and traverses all nested stacks to put any IN_PROGRESS things to FAILED, and clear hooks.
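As a rough illustration of the traversal proposed in this comment, the sketch below walks a stack and its nested stacks, marks anything IN_PROGRESS as FAILED, and clears hooks. The data model (`Stack`, `Resource`, the `hooks` list, the `nested` link) and the `reset_stack` function are assumptions made for the example; they do not reflect how heat-manage is actually implemented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resource:
    # Hypothetical model: a resource may own a nested stack and hooks.
    name: str
    status: str
    hooks: List[str] = field(default_factory=list)
    nested: Optional["Stack"] = None


@dataclass
class Stack:
    name: str
    status: str
    resources: List[Resource] = field(default_factory=list)


def reset_stack(stack: Stack) -> int:
    """Recursively set IN_PROGRESS items to FAILED and clear hooks.

    Returns the number of stacks and resources whose status changed.
    """
    changed = 0
    if stack.status == "IN_PROGRESS":
        stack.status = "FAILED"
        changed += 1
    for res in stack.resources:
        if res.status == "IN_PROGRESS":
            res.status = "FAILED"
            changed += 1
        res.hooks.clear()  # drop any pending hooks on this resource
        if res.nested is not None:
            changed += reset_stack(res.nested)  # descend into nested stack
    return changed
```

The recursion mirrors the "traverses all nested stacks" part of the proposal; returning a count is just a convenience for checking what was touched.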

Comment 6 Zane Bitter 2016-09-27 14:02:47 UTC
*** Bug 1379716 has been marked as a duplicate of this bug. ***

Comment 7 Andreas Karis 2016-09-27 14:31:16 UTC
*** Bug 1379716 has been marked as a duplicate of this bug. ***

Comment 8 Steve Baker 2016-09-28 11:59:26 UTC
The command to fix a stack landed in the first Newton milestone:

 heat-manage reset_stack_status --help
usage: heat-manage reset_stack_status [-h] stack_id

positional arguments:
  stack_id    Stack id

optional arguments:
  -h, --help  show this help message and exit
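Based on the usage text above, the command takes a single positional stack ID. A minimal sketch of building that invocation from a script follows; the `reset_command` helper and the example ID are hypothetical, and the command would still need to be run on a host where heat-manage is installed and can reach the Heat database.

```python
import shlex
from typing import List


def reset_command(stack_id: str) -> List[str]:
    """Build the argv for `heat-manage reset_stack_status <stack_id>`."""
    # Reject empty values and anything that would be parsed as an option.
    if not stack_id or stack_id.startswith("-"):
        raise ValueError("stack_id must be a non-empty stack identifier")
    return ["heat-manage", "reset_stack_status", stack_id]


# Hypothetical stack ID, used only to show the resulting command line.
cmd = reset_command("11111111-2222-3333-4444-555555555555")
print(shlex.join(cmd))
# The list can be passed to subprocess.run(cmd, check=True) to execute it.
```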

Comment 12 Amit Ugol 2016-11-02 07:15:49 UTC
This bug is fixed, though verifying it uncovered a new one: https://bugs.launchpad.net/heat/+bug/1638476

Comment 14 errata-xmlrpc 2016-12-14 15:25:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html