Bug 1314080 - [Heat] Stack failed, resource stuck IN_PROGRESS
Summary: [Heat] Stack failed, resource stuck IN_PROGRESS
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-heat
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: rc
Target Release: 10.0 (Newton)
Assignee: Steve Baker
QA Contact: Amit Ugol
URL:
Whiteboard:
Duplicates: 1379716
Depends On:
Blocks:
Reported: 2016-03-02 20:52 UTC by Joe Talerico
Modified: 2016-12-14 15:25 UTC

Fixed In Version: openstack-heat-7.0.0-0.20160923054727.e4c4c56.el7ost
Doc Type: Enhancement
Doc Text:
With this enhancement, `heat-manage` now supports a `heat-manage reset_stack_status` subcommand. This was added to manage situations where `heat-engine` was unable to contact the database, causing any stacks that were in-progress to remain stuck due to outdated database information. When this occurred, administrators needed a way to reset the status to allow these stacks to be updated again. As a result, administrators can now use the `heat-manage reset_stack_status` command to reset a stuck stack.
Clone Of:
Environment:
Last Closed: 2016-12-14 15:25:17 UTC


Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2016:2948 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 10 enhancement update | 2016-12-14 19:55:27 UTC
OpenStack gerrit | 305306 | None | None | None | 2016-09-28 11:59:26 UTC
Launchpad | 1561214 | None | None | None | 2016-03-23 21:26:50 UTC
Launchpad | 1570576 | None | None | None | 2016-09-27 14:01:12 UTC

Description Joe Talerico 2016-03-02 20:52:05 UTC
Description of problem:
Unable to update the overcloud deployment or scale it out any further, because Heat resources still report a deployment as running.

What I am seeing:
https://gist.github.com/jtaleric/9758204a799fc530243b#file-rackspace-scale-issue-log

Version-Release number of selected component (if applicable):
ospd73

How reproducible:
100%

Steps to Reproduce:
1. Deploy the overcloud. MariaDB runs out of file descriptors, which causes the deployment to fail and leaves Heat in a bad state.

Actual results:
https://gist.github.com/jtaleric/9758204a799fc530243b#file-rackspace-scale-issue-log

Expected results:
heat resources to be reaped/cleaned up.

Additional info:

Comment 2 Steve Baker 2016-03-02 21:19:54 UTC
Running out of file descriptors will be difficult to reproduce. This particular state can be replicated by setting some resources to IN_PROGRESS while their stacks are in an UPDATE_FAILED state.

Comment 3 Steve Baker 2016-03-09 21:26:04 UTC
I'm suggesting a heat-manage command that acts on a single stack and traverses all of its nested stacks, setting anything IN_PROGRESS to FAILED and clearing hooks.
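
The traversal suggested here can be sketched roughly as follows. This is only an illustration of the idea, not Heat's actual code: the `Stack` and `Resource` classes, the `nested` and `hooks` fields, and the `reset_stack` function are all hypothetical in-memory stand-ins, and the status strings are simplified (real Heat statuses combine an action and a status, such as UPDATE_FAILED).

```python
from dataclasses import dataclass, field
from typing import List, Optional

IN_PROGRESS = "IN_PROGRESS"
FAILED = "FAILED"


@dataclass
class Resource:
    """Hypothetical stand-in for a Heat resource record."""
    name: str
    status: str
    hooks: List[str] = field(default_factory=list)
    # Set when the resource is itself a nested stack (assumption for this sketch).
    nested: Optional["Stack"] = None


@dataclass
class Stack:
    """Hypothetical stand-in for a Heat stack record."""
    name: str
    status: str
    resources: List[Resource] = field(default_factory=list)


def reset_stack(stack: Stack) -> int:
    """Mark everything IN_PROGRESS under `stack` as FAILED and clear hooks.

    Recurses into nested stacks and returns the number of status changes.
    """
    changed = 0
    if stack.status == IN_PROGRESS:
        stack.status = FAILED
        changed += 1
    for res in stack.resources:
        if res.status == IN_PROGRESS:
            res.status = FAILED
            changed += 1
        # Hooks are cleared regardless of the resource's status.
        res.hooks.clear()
        if res.nested is not None:
            changed += reset_stack(res.nested)
    return changed
```

Calling `reset_stack` on the top-level stack is enough in this sketch, since each nested stack is reached through the resource that owns it.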

Comment 6 Zane Bitter 2016-09-27 14:02:47 UTC
*** Bug 1379716 has been marked as a duplicate of this bug. ***

Comment 7 Andreas Karis 2016-09-27 14:31:16 UTC
*** Bug 1379716 has been marked as a duplicate of this bug. ***

Comment 8 Steve Baker 2016-09-28 11:59:26 UTC
The command to fix a stack landed in the first Newton milestone:

 heat-manage reset_stack_status --help
usage: heat-manage reset_stack_status [-h] stack_id

positional arguments:
  stack_id    Stack id

optional arguments:
  -h, --help  show this help message and exit
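
For anyone scripting this, a minimal Python wrapper might look like the sketch below. It relies only on the usage shown in the help output above (`heat-manage reset_stack_status stack_id`). The function names and the `dry_run` flag are hypothetical, and it assumes it runs on a host where `heat-manage` is installed and configured.

```python
import subprocess
from typing import List


def build_reset_command(stack_id: str) -> List[str]:
    """Build the argv for `heat-manage reset_stack_status <stack_id>`."""
    if not stack_id:
        raise ValueError("stack_id must be a non-empty string")
    return ["heat-manage", "reset_stack_status", stack_id]


def reset_stack_status(stack_id: str, dry_run: bool = False) -> List[str]:
    """Run the reset command, or only return it when dry_run is True.

    `dry_run` is a convenience of this sketch, not a heat-manage option.
    """
    cmd = build_reset_command(stack_id)
    if not dry_run:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(cmd, check=True)
    return cmd
```

Passing `dry_run=True` first is a cheap way to confirm the command line before touching the database.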

Comment 12 Amit Ugol 2016-11-02 07:15:49 UTC
This bug is fixed, though it did uncover a new one: https://bugs.launchpad.net/heat/+bug/1638476

Comment 14 errata-xmlrpc 2016-12-14 15:25:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html

