Description of problem:
Set IN_PROGRESS resources as FAILED when heat engine is restarted

Prior to OSP 10, stopping the heat services and starting them again would make all resources go to the FAILED state. From there, it was easy to delete the stack.

In OSP 10, when a stack is stuck, timing out, ... this method does not bring the stack into FAILED; the stack keeps the DELETE_IN_PROGRESS (or UPDATE_IN_PROGRESS) status which it had before.

While this is less critical on a stack delete (because the nested stacks can be deleted), there are situations in which the stack is stuck in UPDATE_IN_PROGRESS and even heat resource-signal does not move the IN_PROGRESS resources forward. A restart of heat-engine does not help here.

This seems to be a regression from OSP 9 to 10. It is happening with the tripleo overcloud stack (deployed by the undercloud). If the new behavior is an intended improvement, then we should have a command that allows us to set IN_PROGRESS resources to FAILED.

Thanks,
Andreas
FYI, it looks as if we now have to restart the heat services multiple times (at least 2x) before the restart takes effect.
This is definitely not intentional. I wonder if https://review.openstack.org/#/c/320348/ is the cause - there was a problem with an earlier version of the patch (https://bugs.launchpad.net/heat/+bug/1584724). Can you attach some logs so we can see what is going on? There actually is now (since Newton) a command that you can use to reset a stack if necessary: "heat-manage reset_stack_status".
Oh, awesome:

[root@undercloud-1 ~]# systemctl stop openstack-heat-engine
[root@undercloud-1 ~]# heat-manage reset_stack_status
[root@undercloud-1 ~]# heat-manage reset_stack_status 693af527-5d28-479b-bab9-5b2510d80d80
Warning: this command is potentially destructive and only intended to recover from specific crashes.
It is advised to shutdown all Heat engines beforehand.
Continue ? [y/N] y
[root@undercloud-1 ~]# systemctl start openstack-heat-engine
[root@undercloud-1 ~]# . stackrc
[root@undercloud-1 ~]# heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+--------------------------------------+------------+---------------+----------------------+----------------------+
| id                                   | stack_name | stack_status  | creation_time        | updated_time         |
+--------------------------------------+------------+---------------+----------------------+----------------------+
| 693af527-5d28-479b-bab9-5b2510d80d80 | overcloud  | UPDATE_FAILED | 2017-03-25T22:03:59Z | 2017-04-25T18:21:53Z |
+--------------------------------------+------------+---------------+----------------------+----------------------+
[root@undercloud-1 ~]#

Let me try to reproduce this and attach logs.
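For anyone who needs to repeat the recovery above, here is a minimal shell sketch of the same three steps (stop the engine, reset the stack status, start the engine again). The function name reset_stack, the helper run, and the DRY_RUN variable are hypothetical names introduced for this example; only the systemctl and heat-manage invocations come from the session above. When run for real, heat-manage still asks for the interactive "Continue ? [y/N]" confirmation.

```shell
#!/bin/sh
# Sketch only: wraps the manual recovery steps shown in the comment above.
# reset_stack, run and DRY_RUN are hypothetical names, not part of Heat.

# Execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Reset the status of one stack while heat-engine is stopped.
reset_stack() {
    stack_id=${1:-}
    if [ -z "$stack_id" ]; then
        echo "usage: reset_stack STACK_ID" >&2
        return 1
    fi

    # heat-manage advises shutting down all Heat engines beforehand.
    run systemctl stop openstack-heat-engine || return 1

    # Prompts "Continue ? [y/N]" when executed for real.
    run heat-manage reset_stack_status "$stack_id"
    rc=$?

    # Start the engine again even if the reset step failed.
    run systemctl start openstack-heat-engine || return 1
    return $rc
}
```

Setting DRY_RUN=1 before calling reset_stack with a stack ID prints the three commands without executing them, which is a cheap way to check the sequence before running it as root on the undercloud.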
As always, I cannot reproduce this when I need to. In my lab it now behaves well. However, I hit this today in a customer environment (restricted, so it will be hard to get the logs). The next time I hit this issue, I will update this bugzilla.
*** Bug 1608022 has been marked as a duplicate of this bug. ***
Hi there, If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text. If this bug does not require doc text, please set the 'requires_doc_text' flag to -. Thanks, Alex
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2716