If an exception occurs after a stack has been put into UPDATE_IN_PROGRESS, but not while running the tasks that perform the actual resource changes, the operation can be aborted, leaving the stack stuck in the IN_PROGRESS state. While this shouldn't happen (and is always a bug if it does), we have encountered bugs of this type when attempting to upgrade rhos-director from 7.0 to 7.x. We should ensure that any exception raised after the stack goes into the IN_PROGRESS state results in the stack being moved to FAILED, with the reason recorded.
If the upstream fix shapes up in time, I'll be asking that this be a blocker for 7.2.
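The requested behaviour can be illustrated with a minimal Python sketch. It is not Heat's actual code: the `Stack` class, `state_set`, `fail_on_exception` and `update_stack` are hypothetical names chosen for illustration, and the real engine's state handling is more involved. The idea is only that everything running after the IN_PROGRESS transition sits inside one guard that records FAILED plus the reason before re-raising.

```python
from contextlib import contextmanager


class Stack:
    """Hypothetical stand-in for a stack object; not Heat's real class."""

    def __init__(self, name):
        self.name = name
        self.action = None
        self.status = None
        self.status_reason = ""

    def state_set(self, action, status, reason):
        # Record the action, its status, and a human-readable reason together.
        self.action, self.status, self.status_reason = action, status, reason


@contextmanager
def fail_on_exception(stack, action):
    """Move the stack to FAILED if anything raises after IN_PROGRESS is set."""
    stack.state_set(action, "IN_PROGRESS", "%s started" % action)
    try:
        yield stack
    except Exception as exc:
        # Never leave the stack in IN_PROGRESS: store the reason, then re-raise
        # so the caller still sees (and logs) the original exception.
        stack.state_set(action, "FAILED", "%s: %s" % (type(exc).__name__, exc))
        raise
    else:
        stack.state_set(action, "COMPLETE", "%s completed" % action)


def update_stack(stack, prepare, apply_changes):
    """Run the pre-task step and the resource tasks under the same guard."""
    with fail_on_exception(stack, "UPDATE"):
        prepare(stack)        # work done before the resource tasks start
        apply_changes(stack)  # the tasks that perform the resource changes
```

With this shape, an exception in `prepare` (the window described above) leaves the stack in UPDATE/FAILED with the exception text as the reason, rather than stuck in IN_PROGRESS.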
Oops: https://bugs.launchpad.net/heat/+bug/1521881 It appears the patch was harmless, but did not fix the issue. (This is really hard to test, because it requires another bug to trigger it.)
We really need the heat-engine.log stack trace of the exception which caused this so that we can replicate it locally by deliberately raising a similar exception.
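As a rough sketch of what "deliberately raising a similar exception" could look like, the following Python example injects a fault into the step that runs after the state change but before the resource tasks. `FakeEngine`, `_prepare` and `reproduce_stuck_state` are hypothetical names, not Heat's code; once the real stack trace is available, the injected exception type and the patched method would be chosen to match it.

```python
from unittest import mock


class FakeEngine:
    """Hypothetical miniature engine; names do not match Heat's code."""

    def __init__(self):
        self.status = None

    def _prepare(self):
        # Work done after the state change but before resource tasks run.
        return None

    def update(self):
        self.status = "UPDATE_IN_PROGRESS"
        self._prepare()
        self.status = "UPDATE_COMPLETE"


def reproduce_stuck_state(exc):
    """Inject `exc` into the pre-task step and return the resulting status."""
    engine = FakeEngine()
    # side_effect makes the patched method raise the given exception instance.
    with mock.patch.object(engine, "_prepare", side_effect=exc):
        try:
            engine.update()
        except type(exc):
            pass  # the operation was aborted; inspect what state is left
    return engine.status
```

Calling `reproduce_stuck_state(RuntimeError("injected"))` on this unguarded engine returns `"UPDATE_IN_PROGRESS"`, which is the stuck state described in this bug. A fixed engine would be expected to report a FAILED status under the same injection.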
Alas, no. Local issues at the lab broke the VMs they were in. I am re-creating the system exactly as it was. If you have a system ready for testing (7.0 GA), please try as well, but do not apply the workaround for bug #1272347.
Doc/workaround works for me.
To clear up the confusion: there were a few issues here that caused a specific state during the update, which ended in a failure. Those issues seem to be fixed, with only one (?) issue remaining, which I hit when I missed a workaround (see comment 7). Since we MUST apply the workaround anyway when upgrading from 7.0 GA, and given what I have tested thus far, it is safe to assume that we don't hit the original issue.
Yes, and I'm testing it. If the issue does not reproduce after a few more attempts, I will mark this as verified.
Marking this as verified because it is enough for RHOS 7. Cloning it to RHOS 8.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2016-0266.html