Red Hat Bugzilla – Bug 1267558
Breakpoints are not deleted after stack-update operation
Last modified: 2018-02-08 05:57:42 EST
Description of problem: If any breakpoints are set on the overcloud stack (by running "openstack overcloud update stack"), this operation then fails, and the user continues with any other stack-update operation, the breakpoints are still present. This is new behavior introduced (or uncovered) by re-using the heat stack's existing environment: CLI commands no longer send a new environment when updating the stack, only the bits which have changed. A solution would be to explicitly clear existing breakpoints when running any stack-update CLI command. This is not optimal though, because knowledge about command-specific heat params is then spread across all CLI commands. This is also a more general problem, because the same situation may happen with any other heat params set by CLI commands (e.g. when deleting a particular node, the RemovalPolicies param is used).
Just to clarify, the issue here is not so much that the breakpoints remain set across operations (they don't) but that the breakpoints, which we use as a temporary thing to apply to the current operation, are configured in the environment and the environment is now maintained between operations instead of re-sent each time.
Jan confirmed that the way we set the breakpoints now is to generate a snippet of JSON that gets merged into the environment file to send. One option to stop this from sticking around would be for all other commands (i.e. the ones that don't want breakpoints) to generate a similar snippet with *no* breakpoints set and merge that into the environment, so that it overrides any stored breakpoint configuration. This isn't a great long-term solution because it means that every command has to know about every other command's use of breakpoints. However, we agreed that this is probably the best short-term solution.
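The short-term option described above can be sketched as follows. This is a minimal illustration, not the actual patch: the helper name deep_merge, the example parameter, and the exact key layout under resource_registry (wildcard stack paths with an UpdateDeployment entry) are assumptions made for the example. Only the general idea, merging a "no breakpoints" snippet over the stored environment, comes from the discussion.

```python
import copy
import json

# Assumed layout: Heat environments carry hooks under
# resource_registry -> resources -> <resource path> -> hooks.
# The wildcard path and the UpdateDeployment name are illustrative.
NO_BREAKPOINTS = {
    "resource_registry": {
        "resources": {"*": {"*": {"UpdateDeployment": {"hooks": []}}}}
    }
}


def deep_merge(base, override):
    """Return a copy of base with override merged in recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Non-dict values (including the empty hook list) replace
            # whatever was stored before.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical stored environment left behind by a failed update.
stored_env = {
    "parameters": {"ControllerCount": 3},
    "resource_registry": {
        "resources": {
            "*": {"*": {"UpdateDeployment": {"hooks": "pre-update"}}}
        }
    },
}

new_env = deep_merge(stored_env, NO_BREAKPOINTS)
print(json.dumps(new_env, indent=2, sort_keys=True))
```

The merge leaves unrelated keys such as the parameters untouched, which is why every command that does not want breakpoints would have to send the same snippet.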
I was able to update my overcloud successfully with:
openstack overcloud update stack --templates -e <yaml> -i overcloud
Then, when I tried to re-run the overcloud deployment command (only without the yaml files), it got stuck. I see the following for a long time:
heat resource-list -n 5 overcloud|grep -v COMPLE
+---------------+--------------------------------------+-------------------------+--------------------+----------------------+-----------------+
| resource_name | physical_resource_id                 | resource_type           | resource_status    | updated_time         | parent_resource |
+---------------+--------------------------------------+-------------------------+--------------------+----------------------+-----------------+
| Controller    | deba9153-d96c-4fae-8061-eb2cbe5ce390 | OS::Heat::ResourceGroup | UPDATE_IN_PROGRESS | 2015-10-01T13:33:14Z |                 |
| 2             | 33504937-1746-434a-bb58-b76b8923bb81 | OS::TripleO::Controller | UPDATE_IN_PROGRESS | 2015-10-01T13:33:21Z | Controller      |
| Compute       | 1f8c1164-ddaf-4ff7-981a-e9c3c61c7094 | OS::Heat::ResourceGroup | UPDATE_IN_PROGRESS | 2015-10-01T13:33:21Z |                 |
| 0             | bd529704-fec6-4de5-857f-ceada7c21d78 | OS::TripleO::Compute    | UPDATE_IN_PROGRESS | 2015-10-01T13:33:24Z | Compute         |
| 0             | 8ad0cb1b-efbc-4cda-9cde-f7b7f932a24f | OS::TripleO::Controller | UPDATE_IN_PROGRESS | 2015-10-01T13:33:29Z | Controller      |
| 1             | 3ab44c19-87f6-4a4e-8702-be08b222698b | OS::TripleO::Controller | UPDATE_IN_PROGRESS | 2015-10-01T13:33:40Z | Controller      |
+---------------+--------------------------------------+-------------------------+--------------------+----------------------+-----------------+
This is starting to look like a y1 blocker.
This bug will manifest itself on every operation that does a Heat stack update (e.g. a subsequent overcloud deploy, scaling up or down, removing a node) after a package update. The workaround for now is to manually clear the breakpoints (hooks) using the "heat hook-clear" command. This will have to be repeated each time.
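As a rough aid for the manual workaround, the sketch below builds one hook-clearing command line per nested stack from a list of pending hooks. It only prints the commands and runs nothing. The function name is made up for this example, and it assumes the heatclient subcommand is "heat hook-clear" with a --pre-update flag, and that a nested stack name is accepted as the stack argument; check "heat help hook-clear" on your undercloud before relying on it.

```python
import shlex


def hook_clear_commands(pending, hook="pre-update"):
    """Build one `heat hook-clear` argv per nested stack.

    pending is an iterable of (stack_name, resource_name) pairs, e.g.
    taken from the stack_name and resource_name columns of hook-poll.
    """
    by_stack = {}
    for stack, resource in pending:
        by_stack.setdefault(stack, []).append(resource)
    # Assumed CLI shape: heat hook-clear --pre-update <stack> <resource>...
    return [
        ["heat", "hook-clear", "--" + hook, stack] + resources
        for stack, resources in by_stack.items()
    ]


# Example pairs; the stack names follow the pattern of nested overcloud
# stacks and are used here purely as sample input.
pending = [
    ("overcloud-Controller-gm34wwhcni7u-0-sm4gdmxjfesg", "UpdateDeployment"),
    ("overcloud-Compute-ia5kvmciy4x2-0-se7fq5l3alx5", "UpdateDeployment"),
]

for argv in hook_clear_commands(pending):
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing rather than executing keeps the operator in control of which hooks are actually cleared.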
*** Bug 1268252 has been marked as a duplicate of this bug. ***
https://code.engineering.redhat.com/gerrit/59752
*** Bug 1279544 has been marked as a duplicate of this bug. ***
We can see that the breakpoints get cleared during the update process:
stack@instack:~>>> heat hook-poll -n5 overcloud
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+---------------------------------------------------+
| resource_name    | id                                   | resource_status_reason                         | resource_status | event_time           | stack_name                                        |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+---------------------------------------------------+
| UpdateDeployment | 183b4a66-4521-4c70-90b9-4733ce017b3d | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:44:21Z | overcloud-Controller-gm34wwhcni7u-0-sm4gdmxjfesg  |
| UpdateDeployment | 409c6074-25d1-4901-97e8-689e81f61e62 | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:43:58Z | overcloud-Controller-gm34wwhcni7u-1-zgqpydv2oawr  |
| UpdateDeployment | c8ed8054-4ebd-40a5-9d62-b665f3a1b754 | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:43:46Z | overcloud-CephStorage-n3ft7va6txum-2-zfayegy6nz5u |
| UpdateDeployment | 19fcc094-9b79-46de-8fca-2c2a39cf61cb | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:43:39Z | overcloud-CephStorage-n3ft7va6txum-1-dmkkixd5whud |
| UpdateDeployment | 0d0fa35e-6bdd-4e3f-81b0-43fdcb43bd66 | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:43:49Z | overcloud-Compute-ia5kvmciy4x2-0-se7fq5l3alx5     |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+---------------------------------------------------+
stack@instack:~>>>
stack@instack:~>>>
stack@instack:~>>> heat hook-poll -n5 overcloud
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+---------------------------------------------------+
| resource_name    | id                                   | resource_status_reason                         | resource_status | event_time           | stack_name                                        |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+---------------------------------------------------+
| UpdateDeployment | 183b4a66-4521-4c70-90b9-4733ce017b3d | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:44:21Z | overcloud-Controller-gm34wwhcni7u-0-sm4gdmxjfesg  |
| UpdateDeployment | 409c6074-25d1-4901-97e8-689e81f61e62 | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:43:58Z | overcloud-Controller-gm34wwhcni7u-1-zgqpydv2oawr  |
| UpdateDeployment | c8ed8054-4ebd-40a5-9d62-b665f3a1b754 | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:43:46Z | overcloud-CephStorage-n3ft7va6txum-2-zfayegy6nz5u |
| UpdateDeployment | 0d0fa35e-6bdd-4e3f-81b0-43fdcb43bd66 | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:43:49Z | overcloud-Compute-ia5kvmciy4x2-0-se7fq5l3alx5     |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+---------------------------------------------------+
stack@instack:~>>> heat hook-poll -n5 overcloud
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+---------------------------------------------------+
| resource_name    | id                                   | resource_status_reason                         | resource_status | event_time           | stack_name                                        |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+---------------------------------------------------+
| UpdateDeployment | 183b4a66-4521-4c70-90b9-4733ce017b3d | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:44:21Z | overcloud-Controller-gm34wwhcni7u-0-sm4gdmxjfesg  |
| UpdateDeployment | c8ed8054-4ebd-40a5-9d62-b665f3a1b754 | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:43:46Z | overcloud-CephStorage-n3ft7va6txum-2-zfayegy6nz5u |
| UpdateDeployment | 0d0fa35e-6bdd-4e3f-81b0-43fdcb43bd66 | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:43:49Z | overcloud-Compute-ia5kvmciy4x2-0-se7fq5l3alx5     |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+---------------------------------------------------+
stack@instack:~>>> heat hook-poll -n5 overcloud
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+---------------------------------------------------+
| resource_name    | id                                   | resource_status_reason                         | resource_status | event_time           | stack_name                                        |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+---------------------------------------------------+
| UpdateDeployment | 183b4a66-4521-4c70-90b9-4733ce017b3d | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:44:21Z | overcloud-Controller-gm34wwhcni7u-0-sm4gdmxjfesg  |
| UpdateDeployment | c8ed8054-4ebd-40a5-9d62-b665f3a1b754 | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:43:46Z | overcloud-CephStorage-n3ft7va6txum-2-zfayegy6nz5u |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+---------------------------------------------------+
stack@instack:~>>> heat hook-poll -n5 overcloud
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+--------------------------------------------------+
| resource_name    | id                                   | resource_status_reason                         | resource_status | event_time           | stack_name                                       |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+--------------------------------------------------+
| UpdateDeployment | 183b4a66-4521-4c70-90b9-4733ce017b3d | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:44:21Z | overcloud-Controller-gm34wwhcni7u-0-sm4gdmxjfesg |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+--------------------------------------------------+
I don't think the verification above is sufficient for this BZ. This BZ addresses the issue where "openstack overcloud update" fails, which leaves some breakpoints on the stack; if some *other* command is then executed on the stack in the failed state, it hangs on the breakpoints left over from the update command. So the right verification of this BZ should be something like this:
1) Run "openstack overcloud update" and cause it to fail, e.g. on the first node.
2) Check with "heat hook-poll -n5 overcloud" that some breakpoints were left on the stack.
3) Run "openstack overcloud deploy" (or "openstack overcloud node delete"). If this BZ is fixed, the breakpoints should be cleared when you run this command; if it is not fixed, the command will hang and "heat hook-poll -n5 overcloud" will show breakpoints still set on the stack.
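The check in steps 2 and 3 comes down to whether "heat hook-poll" prints any rows. A small sketch of that check, assuming the command output has been captured as text, is shown below. The function names are hypothetical and the parser only understands the ASCII table layout that appears in this report.

```python
def parse_hook_poll(output):
    """Parse the ASCII table printed by `heat hook-poll` into dicts."""
    rows = []
    header = None
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, shell prompts and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first "|" line is the column header
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def breakpoints_cleared(output):
    """True when hook-poll output lists no pending hooks."""
    return len(parse_hook_poll(output)) == 0


# Sample captures copied from hook-poll output in this report.
PENDING_POLL = """\
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+--------------------------------------------------+
| resource_name    | id                                   | resource_status_reason                         | resource_status | event_time           | stack_name                                       |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+--------------------------------------------------+
| UpdateDeployment | 183b4a66-4521-4c70-90b9-4733ce017b3d | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:44:21Z | overcloud-Controller-gm34wwhcni7u-0-sm4gdmxjfesg |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+--------------------------------------------------+
"""

EMPTY_POLL = """\
+----+------------------------+-----------------+------------+------------+
| id | resource_status_reason | resource_status | event_time | stack_name |
+----+------------------------+-----------------+------------+------------+
+----+------------------------+-----------------+------------+------------+
"""

for name, text in (("pending", PENDING_POLL), ("empty", EMPTY_POLL)):
    print(name, "cleared" if breakpoints_cleared(text) else "hooks still set")
```

An empty list after step 3 is only meaningful if step 2 showed leftover hooks first; otherwise the test never exercised the failure case.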
(In reply to Jan Provaznik from comment #16)
> I don't think the verification above is sufficient for this BZ. This BZ
> addresses the issue where "openstack overcloud update" fails, which leaves
> some breakpoints on the stack; if some *other* command is then executed on
> the stack in the failed state, it hangs on the breakpoints left over from
> the update command. So the right verification of this BZ should be
> something like this:
> 1) Run "openstack overcloud update" and cause it to fail, e.g. on the first
> node.
> 2) Check with "heat hook-poll -n5 overcloud" that some breakpoints were left
> on the stack.
> 3) Run "openstack overcloud deploy" (or "openstack overcloud node delete").
> If this BZ is fixed, the breakpoints should be cleared when you run this
> command; if it is not fixed, the command will hang and "heat hook-poll -n5
> overcloud" will show breakpoints still set on the stack.

1. I triggered an update that failed on the first node of the stack:
overcloud | UPDATE_FAILED
2. I couldn't check the breakpoint status because at this point the command reports:
Stack status UPDATE_FAILED not IN_PROGRESS
3. I reran the initial deploy command and checked the breakpoints; the list is empty:
stack@instack:~>>> heat hook-poll -n5 overcloud
+----+------------------------+-----------------+------------+------------+
| id | resource_status_reason | resource_status | event_time | stack_name |
+----+------------------------+-----------------+------------+------------+
+----+------------------------+-----------------+------------+------------+
But the deploy command fails:
stack@instack:~>>> time bash deploy.command
Deploying templates in the directory /home/stack/templates/my-overcloud
Stack failed with status: resources.Compute: Stack overcloud-Compute-fpeywqgic2le already has an action (UPDATE) in progress.
ERROR: openstack Heat Stack update failed.
I'm not sure if this relates to the initial report of this bug or it's another issue (there are some heat stacks which are currently UPDATE_IN_PROGRESS). Can you confirm that the steps that I did were enough for the verification of this ticket? Thanks.
Hi Marius, I agree that the issue you hit is unrelated to this BZ, and the fact that "heat hook-poll -n5 overcloud" shows an empty list after re-running the deploy command proves that this issue is fixed. Thanks.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2015:2650