Bug 1314080 - [Heat] Stack failed, resource stuck IN_PROGRESS
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-heat
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: rc
Target Release: 10.0 (Newton)
Assigned To: Steve Baker
QA Contact: Amit Ugol
Keywords: Triaged
Duplicates: 1379716
Reported: 2016-03-02 15:52 EST by Joe Talerico
Modified: 2016-12-14 10:25 EST (History)
Fixed In Version: openstack-heat-7.0.0-0.20160923054727.e4c4c56.el7ost
Doc Type: Enhancement
Doc Text:
With this enhancement, `heat-manage` now supports a `heat-manage reset_stack_status` subcommand. This was added to manage situations where `heat-engine` was unable to contact the database, causing any stacks that were in-progress to remain stuck due to outdated database information. When this occurred, administrators needed a way to reset the status to allow these stacks to be updated again. As a result, administrators can now use the `heat-manage reset_stack_status` command to reset a stuck stack.
Last Closed: 2016-12-14 10:25:17 EST
Type: Bug


External Trackers
Tracker           ID       Priority  Status  Summary  Last Updated
Launchpad         1561214  None      None    None     2016-03-23 17:26 EDT
Launchpad         1570576  None      None    None     2016-09-27 10:01 EDT
OpenStack gerrit  305306   None      None    None     2016-09-28 07:59 EDT

Description Joe Talerico 2016-03-02 15:52:05 EST
Description of problem:
Unable to update the overcloud deployment or scale it out any further, because Heat resources still report that a deployment is running.

What I am seeing:
https://gist.github.com/jtaleric/9758204a799fc530243b#file-rackspace-scale-issue-log

Version-Release number of selected component (if applicable):
ospd73

How reproducible:
100%

Steps to Reproduce:
1. Deploy the overcloud. MariaDB runs out of file descriptors, which causes the deployment to fail and leaves Heat in a bad state.

Actual results:
https://gist.github.com/jtaleric/9758204a799fc530243b#file-rackspace-scale-issue-log

Expected results:
Heat resources are reaped/cleaned up.

Additional info:
Comment 2 Steve Baker 2016-03-02 16:19:54 EST
Running out of file descriptors will be difficult to reproduce. This particular state can be replicated by setting some resources to IN_PROGRESS while their stacks are in an UPDATE_FAILED state.
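To make the inconsistent state concrete, here is a minimal sketch of a check for it. It uses a hypothetical in-memory dictionary layout (the `status`, `resources`, and `name` keys are assumptions for illustration); real Heat keeps this information in its database, and this is not Heat code.

```python
# Hypothetical in-memory records; real Heat stores stack and resource
# state in its database with a different schema.
def find_stuck_resources(stack):
    """Return names of resources still IN_PROGRESS in a stack that is not."""
    if stack["status"] == "IN_PROGRESS":
        # The stack itself is still busy, so nothing is inconsistent yet.
        return []
    return [r["name"] for r in stack["resources"] if r["status"] == "IN_PROGRESS"]


# A stack in UPDATE_FAILED (action UPDATE, status FAILED) with one
# resource left behind in IN_PROGRESS.
stack = {
    "name": "overcloud",
    "action": "UPDATE",
    "status": "FAILED",
    "resources": [
        {"name": "Controller", "status": "COMPLETE"},
        {"name": "Compute", "status": "IN_PROGRESS"},
    ],
}

print(find_stuck_resources(stack))  # prints ['Compute']
```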
Comment 3 Steve Baker 2016-03-09 16:26:04 EST
I'm suggesting a heat-manage command that acts on a single stack and traverses all of its nested stacks, setting anything IN_PROGRESS to FAILED and clearing hooks.
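The traversal proposed here could look roughly like the sketch below. It is only an illustration on a hypothetical in-memory model (the `Stack` and `Resource` classes, their fields, and `reset_stack` are invented for this example), not the implementation that landed in Heat.

```python
from dataclasses import dataclass, field
from typing import List, Optional

IN_PROGRESS = "IN_PROGRESS"
FAILED = "FAILED"


@dataclass
class Resource:
    # Hypothetical model: a resource may wrap a nested stack.
    name: str
    status: str
    hooks: List[str] = field(default_factory=list)
    nested: Optional["Stack"] = None


@dataclass
class Stack:
    name: str
    status: str
    resources: List[Resource] = field(default_factory=list)


def reset_stack(stack: Stack) -> List[str]:
    """Mark IN_PROGRESS items as FAILED, clear hooks, recurse into nested stacks.

    Returns the names of everything whose status was changed.
    """
    changed = []
    if stack.status == IN_PROGRESS:
        stack.status = FAILED
        changed.append(stack.name)
    for res in stack.resources:
        if res.status == IN_PROGRESS:
            res.status = FAILED
            changed.append(f"{stack.name}/{res.name}")
        res.hooks.clear()  # drop any hooks left on the resource
        if res.nested is not None:
            changed.extend(reset_stack(res.nested))
    return changed


inner = Stack("compute-group", IN_PROGRESS,
              [Resource("node0", IN_PROGRESS, hooks=["pre-update"])])
outer = Stack("overcloud", FAILED,
              [Resource("Compute", IN_PROGRESS, nested=inner),
               Resource("Controller", "COMPLETE")])

changed = reset_stack(outer)
print(changed)
```

Items that are already COMPLETE or FAILED are left alone, so running the function a second time changes nothing.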
Comment 6 Zane Bitter 2016-09-27 10:02:47 EDT
*** Bug 1379716 has been marked as a duplicate of this bug. ***
Comment 7 Andreas Karis 2016-09-27 10:31:16 EDT
*** Bug 1379716 has been marked as a duplicate of this bug. ***
Comment 8 Steve Baker 2016-09-28 07:59:26 EDT
The command to fix a stack landed in the first Newton milestone:

 heat-manage reset_stack_status --help
usage: heat-manage reset_stack_status [-h] stack_id

positional arguments:
  stack_id    Stack id

optional arguments:
  -h, --help  show this help message and exit
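
For scripted use, a thin wrapper around the command might look like the sketch below. It relies only on the usage shown above (one positional `stack_id`); the helper names and the `runner` parameter are hypothetical, and the stack ID in the example is a placeholder.

```python
import subprocess


def build_reset_command(stack_id):
    """Build the argv for `heat-manage reset_stack_status <stack_id>`."""
    if not stack_id or not stack_id.strip():
        raise ValueError("stack_id must be a non-empty string")
    return ["heat-manage", "reset_stack_status", stack_id.strip()]


def reset_stack_status(stack_id, runner=subprocess.run):
    """Run the command; `runner` is injectable so it can be dry-run or tested."""
    # check=True makes subprocess.run raise CalledProcessError on a non-zero exit.
    return runner(build_reset_command(stack_id), check=True)


# Dry run: print the command line instead of executing it.
print(" ".join(build_reset_command("STACK_ID_PLACEHOLDER")))
```

Calling `reset_stack_status(...)` with the default runner requires `heat-manage` to be installed on the host and configured for the Heat database.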
Comment 12 Amit Ugol 2016-11-02 03:15:49 EDT
This bug is fixed, though it did uncover a new one: https://bugs.launchpad.net/heat/+bug/1638476
Comment 14 errata-xmlrpc 2016-12-14 10:25:17 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html
