Description of problem:
-----------------------
Error message during RHOS-11 overcloud update:

Breakpoint reached, continue? Regexp or Enter=proceed (will clear 7070cb7f-4477-479f-aa0d-78e508fcdec2), no=cancel update, C-c=quit interactive mode: WAITING
completed: [u'overcloud-cephstorage-0', u'overcloud-cephstorage-2', u'overcloud-controller-2', u'overcloud-novacompute-0', u'overcloud-controller-0']
on_breakpoint: [u'overcloud-controller-1', u'overcloud-cephstorage-1']
failed to remove breakpoint on overcloud-cephstorage-1: ERROR: The "pre-update" hook is not defined on SoftwareDeployment "UpdateDeployment" [7070cb7f-4477-479f-aa0d-78e508fcdec2] Stack "overcloud-CephStorage-fht6p7lfa3i6-1-pepcndpf46za" [e1321b04-4298-4ef9-8b3f-a2a27e4d6309]

(The breakpoint prompt and the "failed to remove breakpoint" error above repeat on every attempt to clear the breakpoint; repeated copies omitted.)

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-heat-templates-0.0.1-0.20170109231310.01b1768.el7ost.noarch
openstack-tripleo-heat-templates-6.0.0-0.20170116025719.fa45e05.el7ost.noarch

Steps to Reproduce:
1. Set up the latest repos on the overcloud nodes.
2. Run 'openstack overcloud update stack -i overcloud'.

Additional info:
----------------
Virtual setup: 3 controllers + 1 compute + 3 ceph
Also seeing this same error on RHEL 7.3 / OSP 8 when attempting to perform a minor update.
'heat hook-poll -n3 overcloud' shows:

+------------------+--------------------------------------+------------------------------------------------+-----------------+---------------------+--------------------------------------------------+
| resource_name    | id                                   | resource_status_reason                         | resource_status | event_time          | stack_name                                       |
+------------------+--------------------------------------+------------------------------------------------+-----------------+---------------------+--------------------------------------------------+
| UpdateDeployment | 0b664e55-909f-45aa-b71a-b5a76f59699e | UPDATE paused until Hook pre-update is cleared | CREATE_FAILED   | 2017-01-31T22:08:51 | overcloud-Controller-rdppxbh4bla3-1-qjgvcvnvfbyb |
+------------------+--------------------------------------+------------------------------------------------+-----------------+---------------------+--------------------------------------------------+

But when clearing the hook we see:

ERROR: The "pre-update" hook is not defined on SoftwareDeployment "UpdateDeployment" [98363e1a-78e6-44f5-887e-4df05d07eb0b] Stack "overcloud-Controller-rdppxbh4b1a3-1-qjgvcvnvfbyb" [b34c7474-0247-418d-b591-2c241144749b]
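For reference, the manual hook workflow looks like the following sketch. The nested stack and resource names are the ones from this report and will differ per deployment; the clear command is only echoed here (not executed) since it needs a live undercloud.

```shell
# Names taken from the hook-poll output in this report; substitute your own.
STACK="overcloud-Controller-rdppxbh4b1a3-1-qjgvcvnvfbyb"
RESOURCE="UpdateDeployment"

# To inspect pending hooks across nested stacks (depth 3, as used above):
#   heat hook-poll -n 3 overcloud
# To clear the pre-update hook on the paused deployment resource:
#   heat hook-clear --pre-update "$STACK" "$RESOURCE"

# Echoed instead of executed so this sketch is safe to run anywhere:
echo "heat hook-clear --pre-update $STACK $RESOURCE"
```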
Trivia: when the resource state is FAILED, the hook gets set in the database but Heat does *not* wait. Oddly, we're seeing the opposite here: the state is indeed FAILED, but Heat is waiting, even though when we try to clear the hook we're told it does not exist in the database.

So a possible scenario is:
- Heat sets the hook in the DB and creates an event, but does not wait for it to be cleared due to being in the FAILED state.
- Heat starts creating a replacement resource (also due to being in a FAILED state).
- The replacement resource has no hooks set, so clearing the hook fails even though it appears to exist based on the event list (which hook-poll uses).
- The replacement deployment never succeeds, due to some problem on the server that caused it to be in the FAILED state in the first place (in this instance we were seeing strange messages from os-collect-config).
- Eventually the whole stack times out, but in the meantime everything looks frozen.

Looking at the event list should help to confirm this.
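The suspected race can be illustrated with a small standalone model. This is purely illustrative pseudo-Heat, not Heat's actual code; all class and function names are hypothetical. The point is that the hook is recorded against the original resource, but the replacement created for the FAILED resource starts with no hooks, so a later clear finds nothing.

```python
# Illustrative model of the suspected race (NOT Heat's real implementation):
# hooks are recorded per resource, a FAILED resource is replaced instead of
# waited on, and the replacement starts with no hooks.

class Resource:
    def __init__(self, name, state):
        self.name = name
        self.state = state
        self.hooks = set()  # stands in for the hook records in the DB

def set_hook_and_maybe_wait(resource, hook):
    """Record the hook; only wait if the resource is not FAILED."""
    resource.hooks.add(hook)           # hook lands in the "database"
    return resource.state != "FAILED"  # False -> proceed without waiting

def replace(resource):
    """A replacement resource is created fresh, with no hooks set."""
    return Resource(resource.name, "CREATE_IN_PROGRESS")

original = set_hook_and_maybe_wait.__self__ if False else Resource("UpdateDeployment", "FAILED")
waited = set_hook_and_maybe_wait(original, "pre-update")
assert not waited  # FAILED state: proceeds immediately despite the hook

replacement = replace(original)
# Clearing the hook now targets the replacement, which never had one,
# matching the 'hook is not defined' error seen in the report:
assert "pre-update" not in replacement.hooks
assert "pre-update" in original.hooks  # yet the event list still shows it
```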
Created attachment 1246503 [details] heat-event-list-output
See attachment. Thanks.
Looking at only the last few events for the UpdateDeployment in overcloud-Controller-rdppxbh4b1a3-1-qjgvcvnvfbyb:

Engine went down during resource CREATE        | CREATE_FAILED      | 22:07:05
UPDATE paused until Hook pre-update is cleared | CREATE_FAILED      | 22:08:51
state changed                                  | CREATE_IN_PROGRESS | 22:08:53
Signal: deployment succeeded                   | SIGNAL_IN_PROGRESS | 23:54:33
state changed                                  | CREATE_COMPLETE    | 23:54:34
Unknown                                        | SIGNAL_COMPLETE    | 00:01:01

So it was in CREATE_FAILED due to the engine restart during the previous update. The hook was recorded in the database, but Heat immediately proceeded to create a replacement because it was in the FAILED state (I'd consider this a bug, because the point of using breakpoints here is that we don't want the UpdateDeployment happening on two controllers simultaneously). Eventually the create actually succeeded(!), but it took 1 3/4 hours.

Looking at the rest of the log, it appears that everything proceeded normally after that until heat-engine was restarted shortly afterwards (at 00:06:04).
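For anyone repeating this analysis, the relevant events can be pulled out of a saved event list with ordinary text tools. A sketch over the rows quoted above (the file name is local to this sketch, not part of the report):

```shell
# The UpdateDeployment rows quoted above, saved locally for filtering
cat > update-deployment-events.txt <<'EOF'
Engine went down during resource CREATE | CREATE_FAILED | 22:07:05
UPDATE paused until Hook pre-update is cleared | CREATE_FAILED | 22:08:51
state changed | CREATE_IN_PROGRESS | 22:08:53
Signal: deployment succeeded | SIGNAL_IN_PROGRESS | 23:54:33
state changed | CREATE_COMPLETE | 23:54:34
Unknown | SIGNAL_COMPLETE | 00:01:01
EOF

# Count the CREATE_FAILED events that preceded the replacement resource
grep -c 'CREATE_FAILED' update-deployment-events.txt
```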
Zane - does this mean that we should be able to kick off the update and expect not to run into the hook issue?
(In reply to Chris Paquin from comment #8)
> Zane - does this mean that we should be able to kick off the update and
> expect not to run into the hook issue?

Yes.
Hi Yurii, the fix has merged upstream in stable/ocata; moving to POST.
Verified with:
openstack-heat-engine-8.0.0-7.el7ost.noarch
openstack-heat-api-cfn-8.0.0-7.el7ost.noarch
openstack-heat-common-8.0.0-7.el7ost.noarch
openstack-heat-api-8.0.0-7.el7ost.noarch

openstack stack list
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| ID                                   | Stack Name | Stack Status    | Creation Time        | Updated Time         |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| a906c7a0-3e20-49f4-acfc-585dfefe9452 | overcloud  | UPDATE_COMPLETE | 2017-04-07T08:08:02Z | 2017-04-07T10:58:01Z |
+--------------------------------------+------------+-----------------+----------------------+----------------------+

nova list
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks               |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| 469c7b43-b4e8-44fb-8c2c-3e41da55aa3f | ceph-0       | ACTIVE | -          | Running     | ctlplane=192.168.24.10 |
| 4f397ab2-8e9d-429c-b839-930bfe506b22 | ceph-1       | ACTIVE | -          | Running     | ctlplane=192.168.24.19 |
| e66945fa-9441-4815-a00d-259bedb9f34e | ceph-2       | ACTIVE | -          | Running     | ctlplane=192.168.24.12 |
| 3e7ce3c9-1b1b-4869-adf7-11968cfcf549 | compute-0    | ACTIVE | -          | Running     | ctlplane=192.168.24.7  |
| 038bb09b-4a0a-4d84-adf8-2f465a4d16a9 | compute-1    | ACTIVE | -          | Running     | ctlplane=192.168.24.9  |
| 409fd440-527c-46f0-9295-71d0e0810493 | controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| 3635c8e6-ecbb-4d23-9f23-cd1dc52d868b | controller-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.15 |
| 5f35c248-a1b4-473b-be2a-e991f1a44c70 | controller-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.16 |
| 6df65fb3-fcaa-4c59-b8b2-68e3af0aca1c | galera-0     | ACTIVE | -          | Running     | ctlplane=192.168.24.24 |
| 9d75252c-c6e9-4a66-9bbd-6c22db341580 | galera-1     | ACTIVE | -          | Running     | ctlplane=192.168.24.17 |
| b7dc4b7c-0c9f-4e94-b9b4-d403b47b3063 | galera-2     | ACTIVE | -          | Running     | ctlplane=192.168.24.18 |
| 4626ab11-0983-4991-bd1d-c6691908a3fe | messaging-0  | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| f931298e-4927-4bee-bcad-296d0da5bb92 | messaging-1  | ACTIVE | -          | Running     | ctlplane=192.168.24.23 |
| c0f4407c-89f5-4e25-8c0c-129f6698cc9c | messaging-2  | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| d8587085-ddf8-40d7-8236-b8923f7ef0ff | networker-0  | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| a39935f4-76fe-453d-9a33-de5657582473 | networker-1  | ACTIVE | -          | Running     | ctlplane=192.168.24.20 |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
Was this backported to OSP 8, 9, and 10? If so, what are the bug numbers?
This was backported, and you can find the bug numbers in the clones list.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1245
I am still on OSP 8 and doing a minor upgrade from 8.0 to the latest available 8.0. I am also subscribed to the rhel-7-server-openstack-8-director-rpms and rhel-7-server-openstack-8-rpms repos. So the fix is supposed to be in openstack-heat-5.0.3-2.el7ost, but the latest I see in the official repository is:

[stack@undercloud ~]$ rpm -qa | grep openstack-heat
openstack-heat-api-5.0.1-9.el7ost.noarch
openstack-heat-engine-5.0.1-9.el7ost.noarch
openstack-heat-api-cloudwatch-5.0.1-9.el7ost.noarch
openstack-heat-api-cfn-5.0.1-9.el7ost.noarch
openstack-heat-templates-0-0.1.20151019.el7ost.noarch
openstack-heat-common-5.0.1-9.el7ost.noarch

So was this fix released for OSP 8.0 too? Or has it perhaps been forgotten and never added to the official OSP 8.0 repos rhel-7-server-openstack-8-director-rpms and rhel-7-server-openstack-8-rpms?
(In reply to Radosław Śmigielski from comment #25)
> So was this fix released for OSP 8.0 too? or maybe it's been forgotten and
> hasn't been added to official OSP 8.0 repos
> rhel-7-server-openstack-8-director-rpms, rhel-7-server-openstack-8-rpms?

The bugzilla for this issue in OSP 8 is bug 1428845; it hasn't been released yet.