Bug 1414779 - [UPDATES] ERROR: The "pre-update" hook is not defined on SoftwareDeployment "UpdateDeployment"
Summary: [UPDATES] ERROR: The "pre-update" hook is not defined on SoftwareDeployment "...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-heat
Version: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 11.0 (Ocata)
Assignee: Zane Bitter
QA Contact: Yurii Prokulevych
URL:
Whiteboard:
Depends On:
Blocks: 1394025 1428845 1428877 1428879
Reported: 2017-01-19 12:27 UTC by Yurii Prokulevych
Modified: 2023-02-22 23:02 UTC

Fixed In Version: openstack-heat-8.0.0-6.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, when a pre-update hook was set on a resource that was in a FAILED state, the Orchestration service recorded an event indicating that the hook was active, but then immediately created a replacement resource without waiting for the user to clear the hook. As a result, the tripleoclient service believed the hook to be pending (based on the event) but failed when it tried to clear the hook, because the replacement resource did not have a hook set. This, in turn, prevented the director from completing an overcloud update, with the following message: ERROR: The "pre-update" hook is not defined on SoftwareDeployment "UpdateDeployment". This also affected other client-side applications that used hooks. In the director, this could also have resulted in UpdateDeployment executing on two Controller nodes simultaneously, instead of being serialized so that only one Controller is updated at a time. With this release, the Orchestration service pauses until the user clears the hook, regardless of the state of the resource. This allows director overcloud updates to complete even when there is an UpdateDeployment resource in a FAILED state.
Clone Of:
Clones: 1428845 1428877 1428879
Environment:
Last Closed: 2017-05-17 19:40:49 UTC
Target Upstream Version:
Embargoed:


Attachments
heat-event-list-output (1.06 MB, application/x-gzip)
2017-02-01 01:16 UTC, Chris Paquin


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1665699 | 0 | None | None | None | 2017-02-17 16:35:41 UTC
OpenStack gerrit | 441972 | 0 | 'None' | MERGED | Still wait for hooks on failed resources | 2020-09-12 01:38:53 UTC
Red Hat Knowledge Base (Solution) | 3113321 | 0 | None | None | None | 2017-07-13 08:56:35 UTC
Red Hat Product Errata | RHEA-2017:1245 | 0 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 11.0 Bug Fix and Enhancement Advisory | 2017-05-17 23:01:50 UTC

Description Yurii Prokulevych 2017-01-19 12:27:35 UTC
Description of problem:
-----------------------
Error message during RHOS-11 overcloud update:

Breakpoint reached, continue? Regexp or Enter=proceed (will clear 7070cb7f-4477-479f-aa0d-78e508fcdec2), no=cancel update, C-c=quit interactive mode: WAITING
completed: [u'overcloud-cephstorage-0', u'overcloud-cephstorage-2', u'overcloud-controller-2', u'overcloud-novacompute-0', u'overcloud-controller-0']
on_breakpoint: [u'overcloud-controller-1', u'overcloud-cephstorage-1']
failed to remove breakpoint on overcloud-cephstorage-1: ERROR: The "pre-update" hook is not defined on SoftwareDeployment "UpdateDeployment" [7070cb7f-4477-479f-aa0d-78e508fcdec2] Stack "overcloud-CephStorage-fht6p7lfa3i6-1-pepcndpf46za" [e1321b04-4298-4ef9-8b3f-a2a27e4d6309]
Breakpoint reached, continue? Regexp or Enter=proceed (will clear 7070cb7f-4477-479f-aa0d-78e508fcdec2), no=cancel update, C-c=quit interactive mode: WAITING
completed: [u'overcloud-cephstorage-0', u'overcloud-cephstorage-2', u'overcloud-controller-2', u'overcloud-novacompute-0', u'overcloud-controller-0']
on_breakpoint: [u'overcloud-controller-1', u'overcloud-cephstorage-1']
failed to remove breakpoint on overcloud-cephstorage-1: ERROR: The "pre-update" hook is not defined on SoftwareDeployment "UpdateDeployment" [7070cb7f-4477-479f-aa0d-78e508fcdec2] Stack "overcloud-CephStorage-fht6p7lfa3i6-1-pepcndpf46za" [e1321b04-4298-4ef9-8b3f-a2a27e4d6309]
Breakpoint reached, continue? Regexp or Enter=proceed (will clear 7070cb7f-4477-479f-aa0d-78e508fcdec2), no=cancel update, C-c=quit interactive mode: WAITING
completed: [u'overcloud-cephstorage-0', u'overcloud-cephstorage-2', u'overcloud-controller-2', u'overcloud-novacompute-0', u'overcloud-controller-0']
on_breakpoint: [u'overcloud-controller-1', u'overcloud-cephstorage-1']
failed to remove breakpoint on overcloud-cephstorage-1: ERROR: The "pre-update" hook is not defined on SoftwareDeployment "UpdateDeployment" [7070cb7f-4477-479f-aa0d-78e508fcdec2] Stack "overcloud-CephStorage-fht6p7lfa3i6-1-pepcndpf46za" [e1321b04-4298-4ef9-8b3f-a2a27e4d6309]
Breakpoint reached, continue? Regexp or Enter=proceed (will clear 7070cb7f-4477-479f-aa0d-78e508fcdec2), no=cancel update, C-c=quit interactive mode: WAITING
completed: [u'overcloud-cephstorage-0', u'overcloud-cephstorage-2', u'overcloud-controller-2', u'overcloud-novacompute-0', u'overcloud-controller-0']
on_breakpoint: [u'overcloud-controller-1', u'overcloud-cephstorage-1']


Version-Release number of selected component (if applicable):
----------------------------------
openstack-heat-templates-0.0.1-0.20170109231310.01b1768.el7ost.noarch
openstack-tripleo-heat-templates-6.0.0-0.20170116025719.fa45e05.el7ost.noarch

Steps to Reproduce:
1. Set up the latest repos on the overcloud nodes
2. Run 'openstack overcloud update stack -i overcloud'

Additional info:
----------------
Virtual setup: 3 controllers + 1 compute + 3 ceph

Comment 1 Chris Paquin 2017-01-31 22:59:53 UTC
Also seeing this same error on RHEL 7.3, OSP 8 when attempting to perform a minor update.

Comment 2 Zane Bitter 2017-01-31 23:13:54 UTC
'heat hook-poll -n3 overcloud' shows:

+------------------+--------------------------------------+------------------------------------------------+-----------------+---------------------+--------------------------------------------------+
| resource_name    | id                                   | resource_status_reason                         | resource_status | event_time          | stack_name                                       |
+------------------+--------------------------------------+------------------------------------------------+-----------------+---------------------+--------------------------------------------------+
| UpdateDeployment | 0b664e55-909f-45aa-b71a-b5a76f59699e | UPDATE paused until Hook pre-update is cleared | CREATE_FAILED   | 2017-01-31T22:08:51 | overcloud-Controller-rdppxbh4bla3-1-qjgvcvnvfbyb |
+------------------+--------------------------------------+------------------------------------------------+-----------------+---------------------+--------------------------------------------------+

but when clearing the hook we see:

ERROR: The "pre-update" hook is not defined on SoftwareDeployment "UpdateDeployment" [98363e1a-78e6-44f5-887e-4df05d07eb0b] Stack "overcloud-Controller-rdppxbh4b1a3-1-qjgvcvnvfbyb" [b34c7474-0247-418d-b591-2c241144749b]

Comment 3 Zane Bitter 2017-02-01 00:00:39 UTC
Trivia: when the resource state is FAILED, the hook gets set in the database but Heat does *not* wait.

Oddly, we're seeing the opposite here: the state is indeed FAILED, but Heat is waiting, even though when we try to clear the hook we're told it does not exist in the database.

So a possible scenario is:

- Heat sets the hook in the DB and creates an event, but does not wait for it to be cleared due to being in the FAILED state.
- Heat starts creating a replacement resource (also due to being in a FAILED state).
- The replacement resource has no hooks set, so clearing the hook fails even though it appears to exist based on the event list (which hook-poll uses).
- The replacement deployment never succeeds, due to some problem on the server that caused it to be in the FAILED state in the first place (in this instance we were seeing strange messages from os-collect-config).
- Eventually the whole stack times out, but in the meantime everything looks frozen.

Looking at the event list should help to confirm this.
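The scenario above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyStack`, `Resource`, `HookNotDefined`, and the `wait_on_failed` switch are hypothetical names invented for illustration, and none of this is Heat's actual code. It shows how an event-based view of pending hooks can disagree with the hooks actually set on the current (replacement) resource.

```python
from dataclasses import dataclass, field


class HookNotDefined(Exception):
    """Raised when clearing a hook that the current resource does not have."""


@dataclass
class Resource:
    uuid: str
    status: str
    hooks: set = field(default_factory=set)


@dataclass
class ToyStack:
    """Hypothetical stand-in for a stack; not Heat's real data model."""
    resource: Resource
    events: list = field(default_factory=list)

    def update(self, replacement_uuid, wait_on_failed):
        old = self.resource
        # The hook is recorded and an event is emitted in every case.
        old.hooks.add("pre-update")
        self.events.append(
            (old.uuid, "UPDATE paused until Hook pre-update is cleared"))
        if old.status.endswith("FAILED") and not wait_on_failed:
            # Suspected behaviour: no wait, a replacement with no hooks
            # takes the place of the failed resource straight away.
            self.resource = Resource(replacement_uuid, "CREATE_IN_PROGRESS")
            return "replaced"
        return "paused"

    def pending_hooks(self):
        # Event-based view, in the spirit of hook-poll.
        return [uuid for uuid, reason in self.events
                if "paused until Hook" in reason]

    def clear_hook(self, hook):
        # Clearing acts on the current resource, not on the event list.
        if hook not in self.resource.hooks:
            raise HookNotDefined(
                f'The "{hook}" hook is not defined on {self.resource.uuid}')
        self.resource.hooks.discard(hook)
        return "cleared"


def demo(wait_on_failed):
    stack = ToyStack(Resource("old-uuid", "CREATE_FAILED"))
    outcome = stack.update("new-uuid", wait_on_failed)
    try:
        result = stack.clear_hook("pre-update")
    except HookNotDefined as exc:
        result = f"ERROR: {exc}"
    return outcome, stack.pending_hooks(), result
```

Calling `demo(False)` models the suspected behaviour: the event list still reports a pending hook on the old resource, while clearing fails against the replacement. Calling `demo(True)` models waiting regardless of the FAILED state, in which case the hook clears normally.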

Comment 4 Chris Paquin 2017-02-01 01:16:30 UTC
Created attachment 1246503 [details]
heat-event-list-output

Comment 5 Chris Paquin 2017-02-01 01:16:51 UTC
See attachment. Thanks.

Comment 6 Zane Bitter 2017-02-01 05:37:35 UTC
Looking at only the last few events for the UpdateDeployment in overcloud-Controller-rdppxbh4b1a3-1-qjgvcvnvfbyb:

Engine went down during resource CREATE        | CREATE_FAILED      | 22:07:05
UPDATE paused until Hook pre-update is cleared | CREATE_FAILED      | 22:08:51
state changed                                  | CREATE_IN_PROGRESS | 22:08:53
Signal: deployment succeeded                   | SIGNAL_IN_PROGRESS | 23:54:33
state changed                                  | CREATE_COMPLETE    | 23:54:34
Unknown                                        | SIGNAL_COMPLETE    | 00:01:01

So it was in CREATE_FAILED due to the engine restart during the previous update. The hook was recorded in the database, but Heat immediately proceeded to creating a replacement because it was in the FAILED state (I'd consider this a bug, because the point of using breakpoints here is that we don't want the UpdateDeployment happening on two controllers simultaneously). Eventually the create actually succeeded(!), but it took 1 3/4 hours.

Looking at the rest of the log, it appears that everything proceeded normally after that until heat-engine was restarted shortly afterwards (at 00:06:04).
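The timings quoted above can be checked with a few lines of Python. This is a minimal sketch that assumes the six rows are in chronological order and that a time earlier than its predecessor means the clock passed midnight; the `EVENTS` list is simply the excerpt above retyped, and `elapsed` and `time_of` are helper names introduced here.

```python
from datetime import datetime, timedelta

# Rows retyped from the event excerpt above: (reason, status, time).
EVENTS = [
    ("Engine went down during resource CREATE", "CREATE_FAILED", "22:07:05"),
    ("UPDATE paused until Hook pre-update is cleared", "CREATE_FAILED", "22:08:51"),
    ("state changed", "CREATE_IN_PROGRESS", "22:08:53"),
    ("Signal: deployment succeeded", "SIGNAL_IN_PROGRESS", "23:54:33"),
    ("state changed", "CREATE_COMPLETE", "23:54:34"),
    ("Unknown", "SIGNAL_COMPLETE", "00:01:01"),
]


def elapsed(start, end):
    """Time from start to end (HH:MM:SS), assuming at most one midnight rollover."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    if delta < timedelta(0):
        delta += timedelta(days=1)
    return delta


def time_of(status):
    """Timestamp of the first event with the given status."""
    return next(t for _, s, t in EVENTS if s == status)


# Gap between the hook event and the start of the replacement create.
hook_to_replace = elapsed(EVENTS[1][2], time_of("CREATE_IN_PROGRESS"))
# How long the replacement create took.
create_duration = elapsed(time_of("CREATE_IN_PROGRESS"), time_of("CREATE_COMPLETE"))

print(hook_to_replace)  # 0:00:02
print(create_duration)  # 1:45:41
```

The two-second gap is consistent with Heat not waiting on the hook at all, and 1:45:41 matches the "1 3/4 hours" figure.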

Comment 7 Chris Paquin 2017-02-01 14:41:52 UTC
Zane - does this mean that we should be able to kick off the update and expect not to run into the hook issue?

Comment 8 Chris Paquin 2017-02-01 14:42:06 UTC
Zane - does this mean that we should be able to kick off the update and expect not to run into the hook issue?

Comment 9 Zane Bitter 2017-02-01 15:44:29 UTC
(In reply to Chris Paquin from comment #8)
> Zane - does this mean that we should be able to kick off the update and
> expect not to run into the hook issue?

Yes.

Comment 11 Sofer Athlan-Guyot 2017-03-21 17:29:07 UTC
Hi Yurii,

upstream has merged in stable/ocata, moving to POST.

Comment 20 Yurii Prokulevych 2017-04-07 13:41:50 UTC
Verified with:
openstack-heat-engine-8.0.0-7.el7ost.noarch
openstack-heat-api-cfn-8.0.0-7.el7ost.noarch
openstack-heat-common-8.0.0-7.el7ost.noarch
openstack-heat-api-8.0.0-7.el7ost.noarch

openstack stack list
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| ID                                   | Stack Name | Stack Status    | Creation Time        | Updated Time         |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| a906c7a0-3e20-49f4-acfc-585dfefe9452 | overcloud  | UPDATE_COMPLETE | 2017-04-07T08:08:02Z | 2017-04-07T10:58:01Z |
+--------------------------------------+------------+-----------------+----------------------+----------------------+

nova list
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks               |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| 469c7b43-b4e8-44fb-8c2c-3e41da55aa3f | ceph-0       | ACTIVE | -          | Running     | ctlplane=192.168.24.10 |
| 4f397ab2-8e9d-429c-b839-930bfe506b22 | ceph-1       | ACTIVE | -          | Running     | ctlplane=192.168.24.19 |
| e66945fa-9441-4815-a00d-259bedb9f34e | ceph-2       | ACTIVE | -          | Running     | ctlplane=192.168.24.12 |
| 3e7ce3c9-1b1b-4869-adf7-11968cfcf549 | compute-0    | ACTIVE | -          | Running     | ctlplane=192.168.24.7  |
| 038bb09b-4a0a-4d84-adf8-2f465a4d16a9 | compute-1    | ACTIVE | -          | Running     | ctlplane=192.168.24.9  |
| 409fd440-527c-46f0-9295-71d0e0810493 | controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| 3635c8e6-ecbb-4d23-9f23-cd1dc52d868b | controller-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.15 |
| 5f35c248-a1b4-473b-be2a-e991f1a44c70 | controller-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.16 |
| 6df65fb3-fcaa-4c59-b8b2-68e3af0aca1c | galera-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.24 |
| 9d75252c-c6e9-4a66-9bbd-6c22db341580 | galera-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.17 |
| b7dc4b7c-0c9f-4e94-b9b4-d403b47b3063 | galera-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.18 |
| 4626ab11-0983-4991-bd1d-c6691908a3fe | messaging-0  | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| f931298e-4927-4bee-bcad-296d0da5bb92 | messaging-1  | ACTIVE | -          | Running     | ctlplane=192.168.24.23 |
| c0f4407c-89f5-4e25-8c0c-129f6698cc9c | messaging-2  | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| d8587085-ddf8-40d7-8236-b8923f7ef0ff | networker-0  | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| a39935f4-76fe-453d-9a33-de5657582473 | networker-1  | ACTIVE | -          | Running     | ctlplane=192.168.24.20 |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+

Comment 21 Randy Perryman 2017-04-14 12:33:30 UTC
Was this backported to OSP 8, 9 and 10? If so, what are the bug numbers?

Comment 22 Thomas Hervé 2017-04-14 19:46:09 UTC
This was backported, and you can find the bug numbers in the clones list.

Comment 24 errata-xmlrpc 2017-05-17 19:40:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1245

Comment 25 Radosław Śmigielski 2017-06-06 12:00:22 UTC
I am still on OSP 8 and doing a minor upgrade from 8.0 to the latest available 8.0. I am also subscribed to the rhel-7-server-openstack-8-director-rpms and rhel-7-server-openstack-8-rpms repos.
The fix is supposed to be in openstack-heat-5.0.3-2.el7ost, but the latest packages I see in the official repository are:

[stack@undercloud ~]$ rpm -qa | grep   openstack-heat
openstack-heat-api-5.0.1-9.el7ost.noarch
openstack-heat-engine-5.0.1-9.el7ost.noarch
openstack-heat-api-cloudwatch-5.0.1-9.el7ost.noarch
openstack-heat-api-cfn-5.0.1-9.el7ost.noarch
openstack-heat-templates-0-0.1.20151019.el7ost.noarch
openstack-heat-common-5.0.1-9.el7ost.noarch

So was this fix released for OSP 8.0 too? or maybe it's been forgotten and hasn't been added to official OSP 8.0 repos rhel-7-server-openstack-8-director-rpms, rhel-7-server-openstack-8-rpms?
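For reference, the comparison made in this comment can be sketched in Python. This is an illustration only: it uses a naive numeric comparison of version and release, not rpm's own version-comparison algorithm, and `version_release` is a hypothetical helper. The fixed-in value is the openstack-heat-5.0.3-2.el7ost build quoted above.

```python
import re

# Fixed-in build quoted above: openstack-heat-5.0.3-2.el7ost
FIXED_IN = (5, 0, 3, 2)

# Package list copied from the rpm -qa output above.
INSTALLED = [
    "openstack-heat-api-5.0.1-9.el7ost.noarch",
    "openstack-heat-engine-5.0.1-9.el7ost.noarch",
    "openstack-heat-api-cloudwatch-5.0.1-9.el7ost.noarch",
    "openstack-heat-api-cfn-5.0.1-9.el7ost.noarch",
    "openstack-heat-templates-0-0.1.20151019.el7ost.noarch",
    "openstack-heat-common-5.0.1-9.el7ost.noarch",
]


def version_release(package):
    """Return (version..., release) as a tuple of ints, or None if the
    package does not follow the simple X.Y.Z-N.el7ost pattern."""
    match = re.search(r"-(\d+(?:\.\d+)*)-(\d+)\.el7ost", package)
    if match is None:
        return None
    version = tuple(int(part) for part in match.group(1).split("."))
    return version + (int(match.group(2)),)


# Packages that are older than the fixed-in build under this naive ordering.
older = []
for package in INSTALLED:
    parsed = version_release(package)
    if parsed is not None and parsed < FIXED_IN:
        older.append(package)

for package in older:
    print(package)
```

The openstack-heat-templates package is skipped because it carries a separate, date-based version scheme; every other installed package is older than the build that is supposed to contain the fix.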

Comment 26 Zane Bitter 2017-06-06 13:05:30 UTC
(In reply to Radosław Śmigielski from comment #25)
> So was this fix released for OSP 8.0 too? or maybe it's been forgotten and
> hasn't been added to official OSP 8.0 repos
> rhel-7-server-openstack-8-director-rpms, rhel-7-server-openstack-8-rpms?

The bugzilla for this issue in OSP 8 is bug 1428845; it hasn't been released yet.

