Bug 1414779

Summary:

[UPDATES] ERROR: The "pre-update" hook is not defined on SoftwareDeployment "UpdateDeployment"

Product:

Red Hat OpenStack

Reporter:

Yurii Prokulevych <yprokule>

Component:

openstack-heat

Assignee:

Zane Bitter <zbitter>

Status:

CLOSED ERRATA

QA Contact:

Yurii Prokulevych <yprokule>

Severity:

high

Priority:

high

Version:

11.0 (Ocata)

CC:

aschultz, cpaquin, cwolfe, ddomingo, jcoufal, jschluet, lbezdick, mburns, mcornea, pneedle, radoslaw.smigielski, ramishra, randy_perryman, rhel-osp-director-maint, sathlang, sbaker, sclewis, shardy, srevivo, therve, yprokule, zbitter

Keywords:

Triaged

Target Release:

11.0 (Ocata)

Hardware:

Unspecified

OS:

Unspecified

Fixed In Version:

openstack-heat-8.0.0-6.el7ost

Doc Type:

Bug Fix

Doc Text:

Previously, when a pre-update hook was set on a resource that was in a FAILED state, the Orchestration service recorded an event indicating that the hook was active, but then immediately created a replacement resource without waiting for the user to clear the hook. As a result, tripleoclient believed the hook to be pending (based on the event) but failed when trying to clear it, because the replacement resource did not have a hook set. This, in turn, prevented the director from completing an overcloud update, with the following message: ERROR: The "pre-update" hook is not defined on SoftwareDeployment "UpdateDeployment". This also affected other client-side applications that used hooks. In the director, this could also have resulted in UpdateDeployment running on two Controller nodes simultaneously, instead of being serialized so that only one Controller is updated at a time. With this release, the Orchestration service pauses until the user clears the hook, regardless of the state of the resource. This allows director overcloud updates to complete even when an UpdateDeployment resource is in a FAILED state.

Clones:

1428845, 1428877, 1428879

Last Closed:

2017-05-17 19:40:49 UTC

Type:

Bug

Bug Blocks:

1394025, 1428845, 1428877, 1428879

Attachments:

Description	Flags
heat-event-list-output	none

Description Yurii Prokulevych 2017-01-19 12:27:35 UTC

Description of problem:
-----------------------
Error message during RHOS-11 overcloud update:

Breakpoint reached, continue? Regexp or Enter=proceed (will clear 7070cb7f-4477-479f-aa0d-78e508fcdec2), no=cancel update, C-c=quit interactive mode: WAITING
completed: [u'overcloud-cephstorage-0', u'overcloud-cephstorage-2', u'overcloud-controller-2', u'overcloud-novacompute-0', u'overcloud-controller-0']
on_breakpoint: [u'overcloud-controller-1', u'overcloud-cephstorage-1']
failed to remove breakpoint on overcloud-cephstorage-1: ERROR: The "pre-update" hook is not defined on SoftwareDeployment "UpdateDeployment" [7070cb7f-4477-479f-aa0d-78e508fcdec2] Stack "overcloud-CephStorage-fht6p7lfa3i6-1-pepcndpf46za" [e1321b04-4298-4ef9-8b3f-a2a27e4d6309]
Breakpoint reached, continue? Regexp or Enter=proceed (will clear 7070cb7f-4477-479f-aa0d-78e508fcdec2), no=cancel update, C-c=quit interactive mode: WAITING
completed: [u'overcloud-cephstorage-0', u'overcloud-cephstorage-2', u'overcloud-controller-2', u'overcloud-novacompute-0', u'overcloud-controller-0']
on_breakpoint: [u'overcloud-controller-1', u'overcloud-cephstorage-1']
failed to remove breakpoint on overcloud-cephstorage-1: ERROR: The "pre-update" hook is not defined on SoftwareDeployment "UpdateDeployment" [7070cb7f-4477-479f-aa0d-78e508fcdec2] Stack "overcloud-CephStorage-fht6p7lfa3i6-1-pepcndpf46za" [e1321b04-4298-4ef9-8b3f-a2a27e4d6309]
Breakpoint reached, continue? Regexp or Enter=proceed (will clear 7070cb7f-4477-479f-aa0d-78e508fcdec2), no=cancel update, C-c=quit interactive mode: WAITING
completed: [u'overcloud-cephstorage-0', u'overcloud-cephstorage-2', u'overcloud-controller-2', u'overcloud-novacompute-0', u'overcloud-controller-0']
on_breakpoint: [u'overcloud-controller-1', u'overcloud-cephstorage-1']
failed to remove breakpoint on overcloud-cephstorage-1: ERROR: The "pre-update" hook is not defined on SoftwareDeployment "UpdateDeployment" [7070cb7f-4477-479f-aa0d-78e508fcdec2] Stack "overcloud-CephStorage-fht6p7lfa3i6-1-pepcndpf46za" [e1321b04-4298-4ef9-8b3f-a2a27e4d6309]
Breakpoint reached, continue? Regexp or Enter=proceed (will clear 7070cb7f-4477-479f-aa0d-78e508fcdec2), no=cancel update, C-c=quit interactive mode: WAITING
completed: [u'overcloud-cephstorage-0', u'overcloud-cephstorage-2', u'overcloud-controller-2', u'overcloud-novacompute-0', u'overcloud-controller-0']
on_breakpoint: [u'overcloud-controller-1', u'overcloud-cephstorage-1']


Version-Release number of selected component (if applicable):
----------------------------------
openstack-heat-templates-0.0.1-0.20170109231310.01b1768.el7ost.noarch
openstack-tripleo-heat-templates-6.0.0-0.20170116025719.fa45e05.el7ost.noarch

Steps to Reproduce:
1. Set up the latest repos on the overcloud nodes
2. Run 'openstack overcloud update stack -i overcloud'

Additional info:
----------------
Virtual setup: 3 controllers + 1 compute + 3 Ceph nodes

Comment 1 Chris Paquin 2017-01-31 22:59:53 UTC

Also seeing this same error on RHEL 7.3, OSP 8 when attempting to perform a minor update.

Comment 2 Zane Bitter 2017-01-31 23:13:54 UTC

'heat hook-poll -n3 overcloud' shows:

+------------------+--------------------------------------+------------------------------------------------+-----------------+---------------------+--------------------------------------------------+
| resource_name    | id                                   | resource_status_reason                         | resource_status | event_time          | stack_name                                       |
+------------------+--------------------------------------+------------------------------------------------+-----------------+---------------------+--------------------------------------------------+
| UpdateDeployment | 0b664e55-909f-45aa-b71a-b5a76f59699e | UPDATE paused until Hook pre-update is cleared | CREATE_FAILED   | 2017-01-31T22:08:51 | overcloud-Controller-rdppxbh4bla3-1-qjgvcvnvfbyb |
+------------------+--------------------------------------+------------------------------------------------+-----------------+---------------------+--------------------------------------------------+

but when clearing the hook we see:

ERROR: The "pre-update" hook is not defined on SoftwareDeployment "UpdateDeployment" [98363e1a-78e6-44f5-887e-4df05d07eb0b] Stack "overcloud-Controller-rdppxbh4b1a3-1-qjgvcvnvfbyb" [b34c7474-0247-418d-b591-2c241144749b]

Comment 3 Zane Bitter 2017-02-01 00:00:39 UTC

Trivia: when the resource state is FAILED, the hook gets set in the database but Heat does *not* wait.

Oddly, we're seeing the opposite here: the state is indeed FAILED, but Heat is waiting, even though when we try to clear the hook we're told it does not exist in the database.

So a possible scenario is:

- Heat sets the hook in the DB and creates an event, but does not wait for it to be cleared due to being in the FAILED state.
- Heat starts creating a replacement resource (also due to being in a FAILED state).
- The replacement resource has no hooks set, so clearing the hook fails even though it appears to exist based on the event list (which hook-poll uses).
- The replacement deployment never succeeds, due to some problem on the server that caused it to be in the FAILED state in the first place (in this instance we were seeing strange messages from os-collect-config).
- Eventually the whole stack times out, but in the meantime everything looks frozen.

Looking at the event list should help to confirm this.
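To make the suspected race concrete, here is a toy Python model. All names are hypothetical and this is not Heat's engine or heatclient code; it only assumes, as described above, that the poller infers pending hooks from the event list while clearing checks the hooks on the resource the engine is currently working on. With `wait_when_failed=False` (the behaviour suspected here) the poller still reports the hook but clearing fails; with `wait_when_failed=True` (pausing regardless of state, as the Doc Text describes for the fix) the hook clears normally.

```python
"""Toy model of the hook/replacement race (hypothetical names, not Heat code)."""
from dataclasses import dataclass, field

PAUSE_REASON = "UPDATE paused until Hook pre-update is cleared"
CLEARED_REASON = "Hook pre-update is cleared"


class HookNotDefined(Exception):
    pass


@dataclass
class Resource:
    name: str
    status: str
    hooks: set = field(default_factory=set)


def start_update(resource, events, wait_when_failed):
    """Set the hook, record the event, and return the resource now being worked on."""
    resource.hooks.add("pre-update")
    events.append((resource.name, PAUSE_REASON))
    if resource.status.endswith("_FAILED") and not wait_when_failed:
        # Suspected bug: go straight to a replacement, which carries no hooks.
        return Resource(resource.name, "CREATE_IN_PROGRESS")
    # Fixed behaviour: stay paused on the original resource until the hook is cleared.
    return resource


def pending_hooks_from_events(events):
    """Pending hooks as seen by a poller that only reads the event list."""
    pending = set()
    for name, reason in events:
        if reason == PAUSE_REASON:
            pending.add(name)
        elif reason == CLEARED_REASON:
            pending.discard(name)
    return pending


def clear_hook(resource, events, hook="pre-update"):
    """Clearing checks the hooks actually set on the current resource."""
    if hook not in resource.hooks:
        raise HookNotDefined(f'The "{hook}" hook is not defined on "{resource.name}"')
    resource.hooks.discard(hook)
    events.append((resource.name, CLEARED_REASON))


for fixed in (False, True):
    events = []
    original = Resource("UpdateDeployment", "CREATE_FAILED")
    current = start_update(original, events, wait_when_failed=fixed)
    label = "fixed" if fixed else "buggy"
    print(label, "poll sees:", pending_hooks_from_events(events))
    try:
        clear_hook(current, events)
        print("  cleared; poll now sees:", pending_hooks_from_events(events))
    except HookNotDefined as exc:
        print("  clear failed:", exc)
```

In the "buggy" run the poller keeps reporting UpdateDeployment as paused while every clear attempt raises, which matches the client looping on the same breakpoint in the original description.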

Comment 4 Chris Paquin 2017-02-01 01:16:30 UTC

Created attachment 1246503
heat-event-list-output

Comment 5 Chris Paquin 2017-02-01 01:16:51 UTC

See attachment. Thanks.

Comment 6 Zane Bitter 2017-02-01 05:37:35 UTC

Looking at only the last few events for the UpdateDeployment in overcloud-Controller-rdppxbh4b1a3-1-qjgvcvnvfbyb:

Engine went down during resource CREATE        | CREATE_FAILED      | 22:07:05
UPDATE paused until Hook pre-update is cleared | CREATE_FAILED      | 22:08:51
state changed                                  | CREATE_IN_PROGRESS | 22:08:53
Signal: deployment succeeded                   | SIGNAL_IN_PROGRESS | 23:54:33
state changed                                  | CREATE_COMPLETE    | 23:54:34
Unknown                                        | SIGNAL_COMPLETE    | 00:01:01

So it was in CREATE_FAILED due to the engine restart during the previous update. The hook was recorded in the database, but Heat immediately proceeded to create a replacement because the resource was in the FAILED state. (I'd consider this a bug, because the point of using breakpoints here is that we don't want the UpdateDeployment happening on two controllers simultaneously.) Eventually the create actually succeeded(!), but it took 1 3/4 hours.

Looking at the rest of the log, it appears that everything proceeded normally after that until heat-engine was restarted shortly afterwards (at 00:06:04).
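The timings in the event excerpt above can be checked with a few lines of Python: the hook event at 22:08:51 is followed two seconds later by CREATE_IN_PROGRESS (so Heat did not wait on the hook), and the replacement create ran for roughly 1 h 45 min. The SIGNAL_COMPLETE event at 00:01:01 crosses midnight and is left out to keep the arithmetic same-day; the helper names are illustrative only.

```python
from datetime import datetime, timedelta

# Event excerpt from comment 6: (reason, status, time). All times are on the same day.
EVENTS = [
    ("Engine went down during resource CREATE", "CREATE_FAILED", "22:07:05"),
    ("UPDATE paused until Hook pre-update is cleared", "CREATE_FAILED", "22:08:51"),
    ("state changed", "CREATE_IN_PROGRESS", "22:08:53"),
    ("Signal: deployment succeeded", "SIGNAL_IN_PROGRESS", "23:54:33"),
    ("state changed", "CREATE_COMPLETE", "23:54:34"),
]


def first_time(status, reason_prefix=""):
    """Time of the first event with this status whose reason starts with the prefix."""
    for reason, st, hms in EVENTS:
        if st == status and reason.startswith(reason_prefix):
            return datetime.strptime(hms, "%H:%M:%S")
    raise LookupError(status)


hook_set = first_time("CREATE_FAILED", "UPDATE paused")
create_started = first_time("CREATE_IN_PROGRESS")
create_done = first_time("CREATE_COMPLETE")

wait_on_hook = create_started - hook_set        # how long Heat actually paused
create_duration = create_done - create_started  # how long the replacement took

print("waited on hook:", wait_on_hook)          # 0:00:02
print("replacement create took:", create_duration)  # 1:45:41
```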

Comment 8 Chris Paquin 2017-02-01 14:42:06 UTC

Zane - does this mean that we should be able to kick off the update and expect not to run into the hook issue?

Comment 9 Zane Bitter 2017-02-01 15:44:29 UTC

(In reply to Chris Paquin from comment #8)
> Zane - does this mean that we should be able to kick off the update and
> expect not to run into the hook issue?

Yes.

Comment 11 Sofer Athlan-Guyot 2017-03-21 17:29:07 UTC

Hi Yurii,

The fix has merged upstream in stable/ocata; moving to POST.

Comment 20 Yurii Prokulevych 2017-04-07 13:41:50 UTC

Verified with:
openstack-heat-engine-8.0.0-7.el7ost.noarch
openstack-heat-api-cfn-8.0.0-7.el7ost.noarch
openstack-heat-common-8.0.0-7.el7ost.noarch
openstack-heat-api-8.0.0-7.el7ost.noarch

openstack stack list
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| ID                                   | Stack Name | Stack Status    | Creation Time        | Updated Time         |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| a906c7a0-3e20-49f4-acfc-585dfefe9452 | overcloud  | UPDATE_COMPLETE | 2017-04-07T08:08:02Z | 2017-04-07T10:58:01Z |
+--------------------------------------+------------+-----------------+----------------------+----------------------+

nova list
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks               |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| 469c7b43-b4e8-44fb-8c2c-3e41da55aa3f | ceph-0       | ACTIVE | -          | Running     | ctlplane=192.168.24.10 |
| 4f397ab2-8e9d-429c-b839-930bfe506b22 | ceph-1       | ACTIVE | -          | Running     | ctlplane=192.168.24.19 |
| e66945fa-9441-4815-a00d-259bedb9f34e | ceph-2       | ACTIVE | -          | Running     | ctlplane=192.168.24.12 |
| 3e7ce3c9-1b1b-4869-adf7-11968cfcf549 | compute-0    | ACTIVE | -          | Running     | ctlplane=192.168.24.7  |
| 038bb09b-4a0a-4d84-adf8-2f465a4d16a9 | compute-1    | ACTIVE | -          | Running     | ctlplane=192.168.24.9  |
| 409fd440-527c-46f0-9295-71d0e0810493 | controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| 3635c8e6-ecbb-4d23-9f23-cd1dc52d868b | controller-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.15 |
| 5f35c248-a1b4-473b-be2a-e991f1a44c70 | controller-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.16 |
| 6df65fb3-fcaa-4c59-b8b2-68e3af0aca1c | galera-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.24 |
| 9d75252c-c6e9-4a66-9bbd-6c22db341580 | galera-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.17 |
| b7dc4b7c-0c9f-4e94-b9b4-d403b47b3063 | galera-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.18 |
| 4626ab11-0983-4991-bd1d-c6691908a3fe | messaging-0  | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| f931298e-4927-4bee-bcad-296d0da5bb92 | messaging-1  | ACTIVE | -          | Running     | ctlplane=192.168.24.23 |
| c0f4407c-89f5-4e25-8c0c-129f6698cc9c | messaging-2  | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| d8587085-ddf8-40d7-8236-b8923f7ef0ff | networker-0  | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| a39935f4-76fe-453d-9a33-de5657582473 | networker-1  | ACTIVE | -          | Running     | ctlplane=192.168.24.20 |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+

Comment 21 Randy Perryman 2017-04-14 12:33:30 UTC

Was this backported to OSP 8, 9, and 10? If so, what are the bug numbers?

Comment 22 Thomas Hervé 2017-04-14 19:46:09 UTC

This was backported, and you can find the bug numbers in the clones list.

Comment 24 errata-xmlrpc 2017-05-17 19:40:49 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1245

Comment 25 Radosław Śmigielski 2017-06-06 12:00:22 UTC

I am still on OSP 8 and doing a minor upgrade from 8.0 to the latest available 8.0. I am also subscribed to the rhel-7-server-openstack-8-director-rpms and rhel-7-server-openstack-8-rpms repos.
The fix is supposed to be in openstack-heat-5.0.3-2.el7ost, but the latest packages I see in the official repositories are:

[stack@undercloud ~]$ rpm -qa | grep   openstack-heat
openstack-heat-api-5.0.1-9.el7ost.noarch
openstack-heat-engine-5.0.1-9.el7ost.noarch
openstack-heat-api-cloudwatch-5.0.1-9.el7ost.noarch
openstack-heat-api-cfn-5.0.1-9.el7ost.noarch
openstack-heat-templates-0-0.1.20151019.el7ost.noarch
openstack-heat-common-5.0.1-9.el7ost.noarch

So was this fix released for OSP 8.0 too? or maybe it's been forgotten and hasn't been added to official OSP 8.0 repos rhel-7-server-openstack-8-director-rpms, rhel-7-server-openstack-8-rpms?
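The question above boils down to comparing the installed version-release against the one expected to carry the fix. The sketch below is a simplified, rpmvercmp-inspired comparison written for illustration only (hypothetical helper names; it ignores epochs and rpm's special handling of `~`); on a real system rpm's own comparison, for example `rpmdev-vercmp` from rpmdevtools, is the authoritative check. It takes the 5.0.3-2.el7ost version cited in this comment as the target and skips openstack-heat-templates, which is a separately versioned package.

```python
import re

# Target taken from comment 25 (assumed to be the first OSP 8 build with the fix).
FIXED = "5.0.3-2.el7ost"

INSTALLED = [
    "openstack-heat-api-5.0.1-9.el7ost.noarch",
    "openstack-heat-engine-5.0.1-9.el7ost.noarch",
    "openstack-heat-api-cloudwatch-5.0.1-9.el7ost.noarch",
    "openstack-heat-api-cfn-5.0.1-9.el7ost.noarch",
    "openstack-heat-common-5.0.1-9.el7ost.noarch",
]


def version_release(nvra):
    """'name-version-release.arch' -> 'version-release' (no epoch handling)."""
    nvr = nvra.rsplit(".", 1)[0]  # drop the '.noarch' suffix
    _name, version, release = nvr.rsplit("-", 2)
    return f"{version}-{release}"


def vercmp(a, b):
    """Simplified rpmvercmp-style comparison of one version or release string.

    Returns -1, 0 or 1. Digit runs compare numerically, letter runs compare as
    strings, a digit run beats a letter run, and separators are ignored.
    """
    sa = re.findall(r"\d+|[A-Za-z]+", a)
    sb = re.findall(r"\d+|[A-Za-z]+", b)
    for x, y in zip(sa, sb):
        if x.isdigit() and y.isdigit():
            x, y = int(x), int(y)
        elif x.isdigit() != y.isdigit():
            return 1 if x.isdigit() else -1
        if x != y:
            return -1 if x < y else 1
    # Every shared segment matched: the string with segments left over is newer.
    return (len(sa) > len(sb)) - (len(sa) < len(sb))


def evr_cmp(a, b):
    """Compare 'version-release' strings: version first, then release."""
    va, _, ra = a.rpartition("-")
    vb, _, rb = b.rpartition("-")
    return vercmp(va, vb) or vercmp(ra, rb)


for pkg in INSTALLED:
    ok = evr_cmp(version_release(pkg), FIXED) >= 0
    print(f"{pkg}: {'includes fix' if ok else 'older than fix'}")
```

Comparing version and release separately matters: as a single string, "1.0-2" would wrongly sort after "1.0.1-1".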

Comment 26 Zane Bitter 2017-06-06 13:05:30 UTC

(In reply to Radosław Śmigielski from comment #25)
> So was this fix released for OSP 8.0 too? or maybe it's been forgotten and
> hasn't been added to official OSP 8.0 repos
> rhel-7-server-openstack-8-director-rpms, rhel-7-server-openstack-8-rpms?

The bugzilla for this issue in OSP 8 is bug 1428845; it hasn't been released yet.