Bug 1445484

Summary: Set IN_PROGRESS resources as FAILED when heat engine is restarted
Product: Red Hat OpenStack
Reporter: Andreas Karis <akaris>
Component: openstack-heat
Assignee: Thomas Hervé <therve>
Status: CLOSED ERRATA
QA Contact: Amit Ugol <augol>
Severity: medium
Docs Contact:
Priority: medium
Version: 10.0 (Newton)
CC: akaris, marjones, mburns, mimcclur, ramishra, rhel-osp-director-maint, sbaker, sclewis, shardy, srevivo, therve
Target Milestone: z9
Keywords: Reopened, Triaged, ZStream
Target Release: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-heat-7.0.6-3.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-17 16:57:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Andreas Karis 2017-04-25 18:34:55 UTC
Description of problem:
Set IN_PROGRESS resources as FAILED when heat engine is restarted

Prior to OSP 10, stopping the heat services and starting them again would
move all resources to the FAILED state. From there, it was easy to
delete the stack. In OSP 10, when a stack is stuck, timing out ...
this method does not bring the stack into FAILED; it keeps the
DELETE_IN_PROGRESS (or UPDATE_IN_PROGRESS) status that the stack had before.

While this is less critical on a stack delete (because the nested stacks can be deleted), there are situations in which the stack is stuck in UPDATE_IN_PROGRESS and even heat resource-signal does not bring the IN_PROGRESS resources forward. A restart of heat-engine does not help here. This seems to be a regression from OSP 9 to 10.

This is happening with the tripleo overcloud stack (deployed by the undercloud).


If this is an intended improvement, then we should have a command that allows us to set IN_PROGRESS resources to FAILED.

Thanks,

Andreas

Comment 1 Andreas Karis 2017-04-25 18:37:21 UTC
FYI, it looks as if we now have to restart the heat services multiple times (at least twice) before the restart takes effect.

Comment 2 Zane Bitter 2017-04-25 19:17:15 UTC
This is definitely not intentional. I wonder if https://review.openstack.org/#/c/320348/ is the cause - there was a problem with an earlier version of the patch (https://bugs.launchpad.net/heat/+bug/1584724). Can you attach some logs so we can see what is going on?

There actually is now (since Newton) a command that you can use to reset a stack if necessary: "heat-manage reset_stack_status".

Comment 3 Andreas Karis 2017-04-25 19:43:01 UTC
Oh, awesome:

[root@undercloud-1 ~]# systemctl stop openstack-heat-engine
[root@undercloud-1 ~]# heat-manage reset_stack_status
[root@undercloud-1 ~]# heat-manage reset_stack_status 693af527-5d28-479b-bab9-5b2510d80d80
Warning: this command is potentially destructive and only intended to recover from specific crashes.
It is advised to shutdown all Heat engines beforehand.
Continue ? [y/N]
y
[root@undercloud-1 ~]# systemctl start openstack-heat-engine
[root@undercloud-1 ~]# . stackrc
[root@undercloud-1 ~]# heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+--------------------------------------+------------+---------------+----------------------+----------------------+
| id                                   | stack_name | stack_status  | creation_time        | updated_time         |
+--------------------------------------+------------+---------------+----------------------+----------------------+
| 693af527-5d28-479b-bab9-5b2510d80d80 | overcloud  | UPDATE_FAILED | 2017-03-25T22:03:59Z | 2017-04-25T18:21:53Z |
+--------------------------------------+------------+---------------+----------------------+----------------------+
[root@undercloud-1 ~]# 


Let me try to reproduce this and attach logs.
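
For reference, the recovery sequence from the transcript above could be wrapped in a small script along these lines. This is only a sketch under stated assumptions: the `run` helper, the `reset_stuck_stack` function, and the `DRY_RUN` variable are hypothetical names introduced for this example and are not part of Heat. It assumes it is run as root on the undercloud with credentials (for example `stackrc`) already sourced. With `DRY_RUN=1` (the default here) it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the manual recovery steps shown in the transcript.
# run, reset_stuck_stack and DRY_RUN are hypothetical names for this example.
set -eu

# Default to dry-run so nothing is touched unless explicitly requested.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Reset the status of one stack, identified by its ID.
reset_stuck_stack() {
    stack_id="$1"
    # The tool itself advises shutting down all Heat engines beforehand.
    run systemctl stop openstack-heat-engine
    # Prompts for confirmation when run for real (see the transcript).
    run heat-manage reset_stack_status "$stack_id"
    run systemctl start openstack-heat-engine
    # Non-deprecated replacement for "heat stack-list".
    run openstack stack list
}
```

A dry run such as `reset_stuck_stack 693af527-5d28-479b-bab9-5b2510d80d80` prints the four commands in order. Setting `DRY_RUN=0` executes them, and the confirmation prompt from `heat-manage` stays interactive.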

Comment 4 Andreas Karis 2017-04-25 19:57:16 UTC
As always, I cannot recreate this when I need it. In my lab it now behaves correctly.

However, I hit this today in a customer environment (a restricted one, so it will be hard to get the logs). The next time I hit this issue, I will update this bugzilla.

Comment 6 Mike McClure 2018-07-25 15:34:34 UTC
*** Bug 1608022 has been marked as a duplicate of this bug. ***

Comment 15 Alex McLeod 2018-09-03 07:58:35 UTC
Hi there,

If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field.

The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to -.

Thanks,
Alex

Comment 17 errata-xmlrpc 2018-09-17 16:57:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2716