Description of problem:
Unable to update the overcloud deployment or scale it out any further, because heat resources still report a deployment as running.

What I am seeing:
https://gist.github.com/jtaleric/9758204a799fc530243b#file-rackspace-scale-issue-log

Version-Release number of selected component (if applicable):
ospd73

How reproducible:
100%

Steps to Reproduce:
1. Deploy the overcloud. mariadb runs out of file descriptors, which causes the deployment to fail and leaves heat in a bad state.

Actual results:
https://gist.github.com/jtaleric/9758204a799fc530243b#file-rackspace-scale-issue-log

Expected results:
heat resources to be reaped/cleaned up.

Additional info:
Running out of file descriptors will be difficult to reproduce. This particular state can be replicated by setting some resources to IN_PROGRESS while their stacks are in an UPDATE_FAILED state.
I'm suggesting a heat-manage command which acts on a single stack and traverses all of its nested stacks, setting anything that is IN_PROGRESS to FAILED and clearing any hooks.
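To make the proposed traversal concrete, here is a rough sketch of the idea against a toy in-memory model. This is not Heat's code or its database schema: the `Stack` and `Resource` classes, their fields, and the `reset_stack` function are all hypothetical names made up for illustration. It only shows the recursion: walk one stack, flip anything IN_PROGRESS to FAILED, clear hooks, and descend into nested stacks.

```python
from dataclasses import dataclass, field
from typing import List, Optional

IN_PROGRESS = "IN_PROGRESS"
FAILED = "FAILED"


@dataclass
class Resource:
    # Hypothetical stand-in for a heat resource; not Heat's real model.
    name: str
    action: str                       # e.g. "CREATE", "UPDATE"
    status: str                       # e.g. "IN_PROGRESS", "COMPLETE", "FAILED"
    hooks: List[str] = field(default_factory=list)
    nested: Optional["Stack"] = None  # set when the resource is a nested stack


@dataclass
class Stack:
    # Hypothetical stand-in for a heat stack; not Heat's real model.
    name: str
    action: str
    status: str
    resources: List[Resource] = field(default_factory=list)


def reset_stack(stack: Stack) -> int:
    """Mark every IN_PROGRESS stack and resource under `stack` as FAILED
    and clear all hooks. Returns the number of statuses that were changed."""
    changed = 0
    if stack.status == IN_PROGRESS:
        stack.status = FAILED
        changed += 1
    for res in stack.resources:
        if res.status == IN_PROGRESS:
            res.status = FAILED
            changed += 1
        # Hooks are cleared regardless of status so nothing stays paused.
        res.hooks.clear()
        if res.nested is not None:
            # Recurse so that the whole tree below this stack is covered.
            changed += reset_stack(res.nested)
    return changed
```

The action (UPDATE, CREATE, ...) is left untouched on purpose, so a resource stuck in UPDATE/IN_PROGRESS ends up as UPDATE/FAILED, which matches the state of its parent stack and should let a later update proceed.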
*** Bug 1379716 has been marked as a duplicate of this bug. ***
The command to fix a stack landed in the first Newton milestone:

  heat-manage reset_stack_status --help
  usage: heat-manage reset_stack_status [-h] stack_id

  positional arguments:
    stack_id    Stack id

  optional arguments:
    -h, --help  show this help message and exit
This bug is fixed, though it did uncover a new one: https://bugs.launchpad.net/heat/+bug/1638476
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-2948.html