Bug 1445484 - Set IN_PROGRESS resources as FAILED when heat engine is restarted
Summary: Set IN_PROGRESS resources as FAILED when heat engine is restarted
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-heat
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z9
Target Release: 10.0 (Newton)
Assignee: Thomas Hervé
QA Contact: Amit Ugol
URL:
Whiteboard:
Duplicates: 1608022
Depends On:
Blocks:
 
Reported: 2017-04-25 18:34 UTC by Andreas Karis
Modified: 2021-12-10 15:05 UTC
CC List: 11 users

Fixed In Version: openstack-heat-7.0.6-3.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-17 16:57:52 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Issue Tracker OSP-11287 (last updated 2021-12-10 15:05:05 UTC)
Red Hat Product Errata RHBA-2018:2716 (last updated 2018-09-17 16:58:43 UTC)

Description Andreas Karis 2017-04-25 18:34:55 UTC
Description of problem:
Set IN_PROGRESS resources as FAILED when heat engine is restarted

Prior to OSP 10, stopping the heat services and starting them again would
make all resources go to the FAILED state. From there, it was easy to
delete the stack. In OSP 10, when a stack is stuck or timing out, this
method does not bring the stack into FAILED; it keeps the
DELETE_IN_PROGRESS (or UPDATE_IN_PROGRESS) state that the stack had before.
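
For context, this is the recovery workflow that used to work prior to OSP 10, as a minimal sketch (commands assumed from a standard undercloud, not taken verbatim from any environment in this report):

systemctl stop openstack-heat-engine
systemctl start openstack-heat-engine
# all IN_PROGRESS resources would now be FAILED, so the stack could be deleted
heat stack-delete overcloud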

While this is less critical for a stack delete (because the nested stacks can still be deleted), there are situations in which the stack is stuck in UPDATE_IN_PROGRESS and even heat resource-signal does not move the IN_PROGRESS resources forward. A restart of heat-engine does not help here. This appears to be a regression from OSP 9 to OSP 10.

This is happening with the tripleo overcloud stack (deployed by the undercloud).


If this behaviour change is intentional, then we should have a command that allows us to set IN_PROGRESS resources to FAILED.
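
As a minimal diagnostic sketch of what such a command would need to look at, assuming the default "heat" database and the usual stack/resource table layout (table and column names here are assumptions, not taken from this report):

mysql heat -e "SELECT id, name, action, status FROM stack WHERE status = 'IN_PROGRESS';"
mysql heat -e "SELECT stack_id, name, action, status FROM resource WHERE status = 'IN_PROGRESS';"

These are read-only checks; the requested command would be the safe way to actually flip those rows to FAILED.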

Thanks,

Andreas

Comment 1 Andreas Karis 2017-04-25 18:37:21 UTC
FYI, it looks as if we now have to restart the heat services multiple times (at least twice) before it takes effect.

Comment 2 Zane Bitter 2017-04-25 19:17:15 UTC
This is definitely not intentional. I wonder if https://review.openstack.org/#/c/320348/ is the cause - there was a problem with an earlier version of the patch (https://bugs.launchpad.net/heat/+bug/1584724). Can you attach some logs so we can see what is going on?
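
A minimal sketch of how the requested logs could be captured around an engine restart, assuming the default OSP 10 log location (the path is an assumption, not confirmed in this report):

systemctl restart openstack-heat-engine
tail -n 200 /var/log/heat/heat-engine.log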

There actually is now (since Newton) a command that you can use to reset a stack if necessary: "heat-manage reset_stack_status".

Comment 3 Andreas Karis 2017-04-25 19:43:01 UTC
Oh, awesome:

[root@undercloud-1 ~]# systemctl stop openstack-heat-engine
[root@undercloud-1 ~]# heat-manage reset_stack_status
[root@undercloud-1 ~]# heat-manage reset_stack_status 693af527-5d28-479b-bab9-5b2510d80d80
Warning: this command is potentially destructive and only intended to recover from specific crashes.
It is advised to shutdown all Heat engines beforehand.
Continue ? [y/N]
y
[root@undercloud-1 ~]# systemctl start openstack-heat-engine
[root@undercloud-1 ~]# . stackrc
[root@undercloud-1 ~]# heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+--------------------------------------+------------+---------------+----------------------+----------------------+
| id                                   | stack_name | stack_status  | creation_time        | updated_time         |
+--------------------------------------+------------+---------------+----------------------+----------------------+
| 693af527-5d28-479b-bab9-5b2510d80d80 | overcloud  | UPDATE_FAILED | 2017-03-25T22:03:59Z | 2017-04-25T18:21:53Z |
+--------------------------------------+------------+---------------+----------------------+----------------------+
[root@undercloud-1 ~]# 
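
For reference, the deprecated heat CLI call above corresponds to the unified client, e.g. (assuming the heat plugin for python-openstackclient is installed, as it is on a standard undercloud):

openstack stack list
openstack stack resource list overcloud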


Let me try to reproduce this and attach logs.

Comment 4 Andreas Karis 2017-04-25 19:57:16 UTC
As always, I cannot reproduce this when I need it. In my lab it now behaves well.

However, I hit this today in a customer environment (restricted, so it will be hard to get the logs). The next time I hit this issue, I will update this Bugzilla.

Comment 6 Mike McClure 2018-07-25 15:34:34 UTC
*** Bug 1608022 has been marked as a duplicate of this bug. ***

Comment 15 Alex McLeod 2018-09-03 07:58:35 UTC
Hi there,

If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field.

The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to -.

Thanks,
Alex

Comment 17 errata-xmlrpc 2018-09-17 16:57:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2716

