1314080 – [Heat] Stack failed, resource stuck IN_PROGRESS


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-heat
Sub Component:
Version:	7.0 (Kilo)
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	high
Target Milestone:	rc
Target Release:	10.0 (Newton)
Assignee:	Steve Baker
QA Contact:	Amit Ugol
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1379716
Depends On:
Blocks:

Reported:	2016-03-02 20:52 UTC by Joe Talerico
Modified:	2016-12-14 15:25 UTC
CC List:	10 users
Fixed In Version:	openstack-heat-7.0.0-0.20160923054727.e4c4c56.el7ost
Doc Type:	Enhancement
Doc Text:	With this enhancement, `heat-manage` now supports a `reset_stack_status` subcommand. It was added for situations where `heat-engine` was unable to contact the database, which left any in-progress stacks stuck because their status in the database was never updated. When this occurred, administrators needed a way to reset the status so that these stacks could be updated again. Administrators can now run `heat-manage reset_stack_status` to reset a stuck stack.
Clone Of:
Environment:
Last Closed:	2016-12-14 15:25:17 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Launchpad	1561214	None	None	None	2016-03-23 21:26:50 UTC
Launchpad	1570576	None	None	None	2016-09-27 14:01:12 UTC
OpenStack gerrit	305306	'None'	MERGED	Add command to reset one stack status	2021-02-02 03:03:38 UTC
Red Hat Product Errata	RHEA-2016:2948	normal	SHIPPED_LIVE	Red Hat OpenStack Platform 10 enhancement update	2016-12-14 19:55:27 UTC

Description Joe Talerico 2016-03-02 20:52:05 UTC

Description of problem:
Unable to update the overcloud deployment or scale it out any further, because Heat resources are still marked as if a deployment were running.

What I am seeing:
https://gist.github.com/jtaleric/9758204a799fc530243b#file-rackspace-scale-issue-log

Version-Release number of selected component (if applicable):
ospd73

How reproducible:
100%

Steps to Reproduce:
1. Deploy the overcloud. MariaDB runs out of file descriptors, which causes the deployment to fail and leaves Heat in a bad state.

Actual results:
https://gist.github.com/jtaleric/9758204a799fc530243b#file-rackspace-scale-issue-log

Expected results:
Heat resources should be reaped/cleaned up.

Additional info:

Comment 2 Steve Baker 2016-03-02 21:19:54 UTC

Running out of file descriptors will be difficult to reproduce. This particular state can be replicated by setting some resources to IN_PROGRESS while their stacks are in an UPDATE_FAILED state.

Comment 3 Steve Baker 2016-03-09 21:26:04 UTC

I'm suggesting a heat-manage command that acts on a single stack and traverses all of its nested stacks, setting anything IN_PROGRESS to FAILED and clearing hooks.
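
The traversal proposed above can be sketched on a toy in-memory model. This is only an illustration of the idea, not Heat's implementation: the `Stack` and `Resource` classes, their fields, and the hook name are hypothetical stand-ins, and real Heat keeps this state in its database. The example data mirrors the state described in Comment 2 (a failed stack whose resources are still IN_PROGRESS).

```python
from dataclasses import dataclass, field
from typing import List, Optional

IN_PROGRESS = "IN_PROGRESS"
FAILED = "FAILED"
COMPLETE = "COMPLETE"


@dataclass
class Resource:
    """Hypothetical stand-in for a Heat resource row."""
    name: str
    status: str
    hooks: List[str] = field(default_factory=list)
    nested: Optional["Stack"] = None  # set when the resource is itself a stack


@dataclass
class Stack:
    """Hypothetical stand-in for a Heat stack row."""
    name: str
    status: str
    resources: List[Resource] = field(default_factory=list)


def reset_stack_status(stack: Stack) -> int:
    """Mark every IN_PROGRESS stack/resource under `stack` as FAILED.

    Clears hooks on every resource visited and recurses into nested
    stacks. Returns the number of status values that were changed.
    """
    changed = 0
    if stack.status == IN_PROGRESS:
        stack.status = FAILED
        changed += 1
    for res in stack.resources:
        if res.status == IN_PROGRESS:
            res.status = FAILED
            changed += 1
        res.hooks.clear()
        if res.nested is not None:
            changed += reset_stack_status(res.nested)
    return changed


# Stuck state: the stacks have failed, but two resources are still IN_PROGRESS.
inner = Stack("compute-0", FAILED,
              [Resource("server", IN_PROGRESS, hooks=["pre-update"])])
outer = Stack("overcloud", FAILED,
              [Resource("Compute", IN_PROGRESS, nested=inner),
               Resource("Networks", COMPLETE)])

count = reset_stack_status(outer)
```

After the call, nothing in the tree is left IN_PROGRESS, so a later update is no longer blocked by stale status in this model.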

Comment 6 Zane Bitter 2016-09-27 14:02:47 UTC

*** Bug 1379716 has been marked as a duplicate of this bug. ***

Comment 7 Andreas Karis 2016-09-27 14:31:16 UTC

*** Bug 1379716 has been marked as a duplicate of this bug. ***

Comment 8 Steve Baker 2016-09-28 11:59:26 UTC

The command to fix a stack landed in the first Newton milestone:

$ heat-manage reset_stack_status --help
usage: heat-manage reset_stack_status [-h] stack_id

positional arguments:
  stack_id    Stack id

optional arguments:
  -h, --help  show this help message and exit
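
For administrators who script this, a minimal sketch of a Python wrapper around the command follows. It relies only on the usage shown above (a single positional `stack_id`); the function names and the injectable `runner` parameter are assumptions made for illustration, and the sketch does not handle any interactive prompt the command might present.

```python
import subprocess
from typing import Callable, List


def build_reset_command(stack_id: str) -> List[str]:
    """Build the argv for `heat-manage reset_stack_status <stack_id>`."""
    if not stack_id or stack_id.startswith("-"):
        # Reject empty IDs and values that would be parsed as options.
        raise ValueError("stack_id must be a non-empty stack ID")
    return ["heat-manage", "reset_stack_status", stack_id]


def reset_stack(stack_id: str, runner: Callable = subprocess.run):
    """Run the reset command; raises CalledProcessError on a non-zero exit.

    `runner` is injectable (hypothetical design choice) so the call can be
    exercised without heat-manage installed.
    """
    return runner(build_reset_command(stack_id), check=True)
```

On a host where `heat-manage` is installed and configured, `reset_stack("<stack id>")` would run the command for the given stack.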

Comment 12 Amit Ugol 2016-11-02 07:15:49 UTC

This bug is fixed, though it did uncover a new one: https://bugs.launchpad.net/heat/+bug/1638476

Comment 14 errata-xmlrpc 2016-12-14 15:25:17 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html