Bug 1267558 - Breakpoints are not deleted after stack-update operation
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-rdomanager-oscplugin
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: y2
Target Release: 7.0 (Kilo)
Assigned To: Jan Provaznik
QA Contact: Marius Cornea
Keywords: Triaged
Duplicates: 1268252 1279544
Reported: 2015-09-30 08:06 EDT by Jan Provaznik
Modified: 2018-02-08 05:57 EST

Fixed In Version: openstack-tripleo-common-0.0.1.dev6-4.git49b57eb.el7ost python-rdomanager-oscplugin-0.0.10-12.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, breakpoints were not removed when an update operation failed. If a user ran the "openstack overcloud update" command and it failed, a subsequent stack-update command (e.g. "openstack overcloud deploy") could get stuck in the 'IN_PROGRESS' state waiting for the removal of breakpoints. With this update, all existing CLI commands explicitly remove any existing breakpoints when running a stack-update operation. As a result, stack-update operations no longer get stuck in the 'IN_PROGRESS' state while waiting for breakpoint removal.
Last Closed: 2015-12-21 11:49:41 EST
Type: Bug
External Trackers
Tracker           ID      Priority  Status  Summary  Last Updated
OpenStack gerrit  230964  None      None    None     Never
OpenStack gerrit  230969  None      None    None     Never

Description Jan Provaznik 2015-09-30 08:06:13 EDT
Description of problem:
If breakpoints are set on the overcloud stack (by running "openstack overcloud update stack"), the operation then fails, and the user continues with any other stack-update operation, the breakpoints are still present.

This is new behavior introduced (or uncovered) by re-using the heat stack's existing environment: CLI commands no longer send a complete new environment when updating the stack, only the bits which have changed.

A solution would be to explicitly clear existing breakpoints when running any stack-update CLI command. This is not optimal, though, because knowledge about command-specific heat params then ends up spread across all CLI commands. It is also a more general problem, because the same situation may occur with any other heat params set by CLI commands (e.g. when deleting a particular node, the RemovalPolicies param is used).
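
To illustrate the mechanism described above, here is a minimal Python sketch that models the stored environment as a nested dict. It is not the plugin's code: deep_merge is a hypothetical helper, the parameter names and values are made up, and the hooks entry is only modelled on Heat's resource_registry hooks syntax. It shows that a partial update which does not mention the hooks leaves them in the effective environment.

```python
import copy


def deep_merge(stored, update):
    """Return a copy of `stored` with `update` merged in recursively.

    Values from `update` win on conflict; keys that `update` does not
    mention are kept unchanged.
    """
    merged = copy.deepcopy(stored)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Environment kept with the stack after the package update command ran.
# Illustrative content only: the hooks entry stands for the breakpoints.
stored_env = {
    "parameters": {"ControllerCount": 3},
    "resource_registry": {
        "resources": {"*": {"*": {"UpdateDeployment": {"hooks": "pre-update"}}}}
    },
}

# A later command sends only the bits it changes (here: a scale-up).
partial_env = {"parameters": {"ComputeCount": 2}}

effective_env = deep_merge(stored_env, partial_env)
registry = effective_env["resource_registry"]["resources"]
hooks = registry["*"]["*"]["UpdateDeployment"]["hooks"]
print(hooks)  # the breakpoint is still configured
```

Because the second command never mentions the hooks key, nothing overrides it, which matches the stuck behavior reported here.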
Comment 3 Zane Bitter 2015-10-01 08:58:09 EDT
Just to clarify, the issue here is not so much that the breakpoints remain set across operations (they don't) but that the breakpoints, which we use as a temporary thing to apply to the current operation, are configured in the environment and the environment is now maintained between operations instead of re-sent each time.
Comment 4 Zane Bitter 2015-10-01 09:27:23 EDT
Jan confirmed that the way we set the breakpoints now is to generate a snippet of JSON that gets merged into the environment file to send. One option to stop this from sticking around would be for all other commands (i.e. the ones that don't want breakpoints) to generate a similar snippet with *no* breakpoints set and merge that into the environment, so that it overrides any stored breakpoint configuration.

This isn't a great long-term solution because it means that every command has to know about every other command's use of breakpoints. However, we agreed that this is probably the best short-term solution.
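
A rough Python sketch of the short-term approach discussed in this comment, under the same assumptions as before: the environment is modelled as a nested dict, and deep_merge, breakpoint_cleanup_snippet and env_to_send are hypothetical names, not functions from the plugin or from Heat. Every command that does not want breakpoints merges an "empty hooks" snippet into what it sends, which overrides the stored setting.

```python
import copy


def deep_merge(base, override):
    """Return a copy of `base` with `override` merged in recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def breakpoint_cleanup_snippet():
    """Snippet with *no* breakpoints set (assumed layout of the hooks entry)."""
    return {
        "resource_registry": {
            "resources": {"*": {"*": {"UpdateDeployment": {"hooks": []}}}}
        }
    }


def env_to_send(command_env):
    """Environment a non-update command would send: its own bits plus cleanup."""
    return deep_merge(command_env, breakpoint_cleanup_snippet())


# Stored environment left behind by a failed update (illustrative values).
stored_env = {
    "parameters": {"ControllerCount": 3},
    "resource_registry": {
        "resources": {"*": {"*": {"UpdateDeployment": {"hooks": "pre-update"}}}}
    },
}

sent = env_to_send({"parameters": {"ComputeCount": 2}})
effective_env = deep_merge(stored_env, sent)
registry = effective_env["resource_registry"]["resources"]
print(registry["*"]["*"]["UpdateDeployment"]["hooks"])  # []
```

The drawback named in the comment is visible in the sketch: env_to_send has to know the exact key the update command uses for its breakpoints.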
Comment 5 Alexander Chuzhoy 2015-10-01 10:26:58 EDT
I was able to update my overcloud successfully with: openstack overcloud update stack --templates -e <yaml> -i overcloud.
Then, when I tried to re-run the overcloud deployment command (this time without the yaml files), it got stuck. I have been seeing the following for a long time:
heat resource-list -n 5 overcloud|grep -v COMPLE
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+--------------------+----------------------+---------------------------------------------+
| resource_name                               | physical_resource_id                          | resource_type                                     | resource_status    | updated_time         | parent_resource                             |
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+--------------------+----------------------+---------------------------------------------+
| Controller                                  | deba9153-d96c-4fae-8061-eb2cbe5ce390          | OS::Heat::ResourceGroup                           | UPDATE_IN_PROGRESS | 2015-10-01T13:33:14Z |                                             |
| 2                                           | 33504937-1746-434a-bb58-b76b8923bb81          | OS::TripleO::Controller                           | UPDATE_IN_PROGRESS | 2015-10-01T13:33:21Z | Controller                                  |
| Compute                                     | 1f8c1164-ddaf-4ff7-981a-e9c3c61c7094          | OS::Heat::ResourceGroup                           | UPDATE_IN_PROGRESS | 2015-10-01T13:33:21Z |                                             |
| 0                                           | bd529704-fec6-4de5-857f-ceada7c21d78          | OS::TripleO::Compute                              | UPDATE_IN_PROGRESS | 2015-10-01T13:33:24Z | Compute                                     |
| 0                                           | 8ad0cb1b-efbc-4cda-9cde-f7b7f932a24f          | OS::TripleO::Controller                           | UPDATE_IN_PROGRESS | 2015-10-01T13:33:29Z | Controller                                  |
| 1                                           | 3ab44c19-87f6-4a4e-8702-be08b222698b          | OS::TripleO::Controller                           | UPDATE_IN_PROGRESS | 2015-10-01T13:33:40Z | Controller                                  |
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+--------------------+----------------------+---------------------------------------------+
Comment 6 Zane Bitter 2015-10-01 10:28:11 EDT
This is starting to look like a y1 blocker.
Comment 7 Zane Bitter 2015-10-01 15:28:10 EDT
This bug will manifest itself on every operation that does a Heat stack update (e.g. a subsequent overcloud deploy, scaling up or down, removing a node) after a package update.

The workaround for now is to manually clear the breakpoints (hooks) using the "heat hook-clear" command. This will have to be repeated each time such an operation is run.
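
As a sketch of the manual workaround, the Python snippet below only builds the command lines and does not run them. Two assumptions need checking against your heatclient version: that the subcommand is spelled "heat hook-clear" and takes "--pre-update <stack> <resource> [<resource> ...]", and that a nested stack name can be passed as the stack argument. hook_clear_commands is a hypothetical helper; the stack names are taken from the hook-poll output later in this report.

```python
import shlex


def hook_clear_commands(pending, hook="pre-update"):
    """Build one hook-clear argv per stack from (stack_name, resource_name) pairs."""
    by_stack = {}
    for stack_name, resource_name in pending:
        by_stack.setdefault(stack_name, []).append(resource_name)
    return [
        ["heat", "hook-clear", "--" + hook, stack_name] + resources
        for stack_name, resources in by_stack.items()
    ]


# Pending breakpoints as (nested stack, resource) pairs.
pending = [
    ("overcloud-Controller-gm34wwhcni7u-0-sm4gdmxjfesg", "UpdateDeployment"),
    ("overcloud-Compute-ia5kvmciy4x2-0-se7fq5l3alx5", "UpdateDeployment"),
]

commands = hook_clear_commands(pending)
for argv in commands:
    # Print a shell-safe command line for review instead of executing it.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing rather than executing keeps the sketch safe to run anywhere; review the generated commands before pasting them into the undercloud shell.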
Comment 8 Jan Provaznik 2015-10-02 07:40:03 EDT
*** Bug 1268252 has been marked as a duplicate of this bug. ***
Comment 9 Jan Provaznik 2015-10-20 06:46:12 EDT
https://code.engineering.redhat.com/gerrit/59752
Comment 10 Zane Bitter 2015-11-09 12:51:54 EST
*** Bug 1279544 has been marked as a duplicate of this bug. ***
Comment 15 Marius Cornea 2015-12-12 18:37:42 EST
We can see that the breakpoints get cleared during the update process:

stack@instack:~>>> heat hook-poll -n5 overcloud
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+---------------------------------------------------+
| resource_name    | id                                   | resource_status_reason                         | resource_status | event_time           | stack_name                                        |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+---------------------------------------------------+
| UpdateDeployment | 183b4a66-4521-4c70-90b9-4733ce017b3d | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:44:21Z | overcloud-Controller-gm34wwhcni7u-0-sm4gdmxjfesg  |
| UpdateDeployment | 409c6074-25d1-4901-97e8-689e81f61e62 | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:43:58Z | overcloud-Controller-gm34wwhcni7u-1-zgqpydv2oawr  |
| UpdateDeployment | c8ed8054-4ebd-40a5-9d62-b665f3a1b754 | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:43:46Z | overcloud-CephStorage-n3ft7va6txum-2-zfayegy6nz5u |
| UpdateDeployment | 19fcc094-9b79-46de-8fca-2c2a39cf61cb | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:43:39Z | overcloud-CephStorage-n3ft7va6txum-1-dmkkixd5whud |
| UpdateDeployment | 0d0fa35e-6bdd-4e3f-81b0-43fdcb43bd66 | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:43:49Z | overcloud-Compute-ia5kvmciy4x2-0-se7fq5l3alx5     |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+---------------------------------------------------+
stack@instack:~>>> 
stack@instack:~>>> 
stack@instack:~>>> heat hook-poll -n5 overcloud
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+---------------------------------------------------+
| resource_name    | id                                   | resource_status_reason                         | resource_status | event_time           | stack_name                                        |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+---------------------------------------------------+
| UpdateDeployment | 183b4a66-4521-4c70-90b9-4733ce017b3d | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:44:21Z | overcloud-Controller-gm34wwhcni7u-0-sm4gdmxjfesg  |
| UpdateDeployment | 409c6074-25d1-4901-97e8-689e81f61e62 | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:43:58Z | overcloud-Controller-gm34wwhcni7u-1-zgqpydv2oawr  |
| UpdateDeployment | c8ed8054-4ebd-40a5-9d62-b665f3a1b754 | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:43:46Z | overcloud-CephStorage-n3ft7va6txum-2-zfayegy6nz5u |
| UpdateDeployment | 0d0fa35e-6bdd-4e3f-81b0-43fdcb43bd66 | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:43:49Z | overcloud-Compute-ia5kvmciy4x2-0-se7fq5l3alx5     |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+---------------------------------------------------+
stack@instack:~>>> heat hook-poll -n5 overcloud
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+---------------------------------------------------+
| resource_name    | id                                   | resource_status_reason                         | resource_status | event_time           | stack_name                                        |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+---------------------------------------------------+
| UpdateDeployment | 183b4a66-4521-4c70-90b9-4733ce017b3d | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:44:21Z | overcloud-Controller-gm34wwhcni7u-0-sm4gdmxjfesg  |
| UpdateDeployment | c8ed8054-4ebd-40a5-9d62-b665f3a1b754 | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:43:46Z | overcloud-CephStorage-n3ft7va6txum-2-zfayegy6nz5u |
| UpdateDeployment | 0d0fa35e-6bdd-4e3f-81b0-43fdcb43bd66 | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:43:49Z | overcloud-Compute-ia5kvmciy4x2-0-se7fq5l3alx5     |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+---------------------------------------------------+
stack@instack:~>>> heat hook-poll -n5 overcloud
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+---------------------------------------------------+
| resource_name    | id                                   | resource_status_reason                         | resource_status | event_time           | stack_name                                        |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+---------------------------------------------------+
| UpdateDeployment | 183b4a66-4521-4c70-90b9-4733ce017b3d | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:44:21Z | overcloud-Controller-gm34wwhcni7u-0-sm4gdmxjfesg  |
| UpdateDeployment | c8ed8054-4ebd-40a5-9d62-b665f3a1b754 | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:43:46Z | overcloud-CephStorage-n3ft7va6txum-2-zfayegy6nz5u |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+---------------------------------------------------+
stack@instack:~>>> heat hook-poll -n5 overcloud
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+--------------------------------------------------+
| resource_name    | id                                   | resource_status_reason                         | resource_status | event_time           | stack_name                                       |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+--------------------------------------------------+
| UpdateDeployment | 183b4a66-4521-4c70-90b9-4733ce017b3d | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:44:21Z | overcloud-Controller-gm34wwhcni7u-0-sm4gdmxjfesg |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+--------------------------------------------------+
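
The shrinking tables above can also be checked mechanically. This is a small Python sketch, assuming the ASCII table layout shown in this comment; parse_table and pending_hooks are hypothetical helpers, and the sample is two rows copied from the output above.

```python
def parse_table(text):
    """Parse a '|'-delimited CLI table into a list of dicts keyed by header."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip '+----+' separator lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def pending_hooks(text):
    """Return (stack_name, resource_name) for every resource paused on a hook."""
    return [
        (row["stack_name"], row["resource_name"])
        for row in parse_table(text)
        if "paused until Hook" in row.get("resource_status_reason", "")
    ]


SAMPLE = """
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+---------------------------------------------------+
| resource_name    | id                                   | resource_status_reason                         | resource_status | event_time           | stack_name                                        |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+---------------------------------------------------+
| UpdateDeployment | 183b4a66-4521-4c70-90b9-4733ce017b3d | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:44:21Z | overcloud-Controller-gm34wwhcni7u-0-sm4gdmxjfesg  |
| UpdateDeployment | c8ed8054-4ebd-40a5-9d62-b665f3a1b754 | UPDATE paused until Hook pre-update is cleared | CREATE_COMPLETE | 2015-12-12T22:43:46Z | overcloud-CephStorage-n3ft7va6txum-2-zfayegy6nz5u |
+------------------+--------------------------------------+------------------------------------------------+-----------------+----------------------+---------------------------------------------------+
"""

pending = pending_hooks(SAMPLE)
print(len(pending))  # 2
```

Running pending_hooks on each successive poll and comparing the counts shows whether the breakpoints are actually being cleared.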
Comment 16 Jan Provaznik 2015-12-14 03:36:38 EST
I don't think the verification above is sufficient for this BZ. This BZ addresses the case where "openstack overcloud update" fails and leaves some breakpoints on the stack; if some *other* command is then executed on the stack in the failed state, it hangs on the breakpoints left over from the update command. So the right verification of this BZ should be something like this:
1) run "openstack overcloud update" and cause it to fail on e.g. the first node
2) check with "heat hook-poll -n5 overcloud" that some breakpoints were left on the stack
3) run "openstack overcloud deploy" (or "openstack overcloud node delete"). If this BZ is fixed, the breakpoints should be cleared when you run this command; if it's not fixed, the command will hang and "heat hook-poll -n5 overcloud" will show breakpoints still set on the stack
Comment 17 Marius Cornea 2015-12-14 08:27:47 EST
(In reply to Jan Provaznik from comment #16)
> I don't think the verification above is sufficient for this BZ. This BZ
> addresses the case where "openstack overcloud update" fails and leaves
> some breakpoints on the stack; if some *other* command is then executed
> on the stack in the failed state, it hangs on the breakpoints left over
> from the update command. So the right verification of this BZ should be
> something like this:
> 1) run "openstack overcloud update" and cause it to fail on e.g. the
> first node
> 2) check with "heat hook-poll -n5 overcloud" that some breakpoints were left
> on the stack
> 3) run "openstack overcloud deploy" (or "openstack overcloud node delete").
> If this BZ is fixed, the breakpoints should be cleared when you run this
> command; if it's not fixed, the command will hang and
> "heat hook-poll -n5 overcloud" will show breakpoints still set on the stack

1. I triggered an update that failed on the first node of the stack
overcloud  | UPDATE_FAILED

2. I couldn't check the breakpoints status because at this point:
Stack status UPDATE_FAILED not IN_PROGRESS

3. I reran the initial deploy command; checking the breakpoints afterwards shows an empty list:

stack@instack:~>>> heat hook-poll -n5 overcloud
+----+------------------------+-----------------+------------+------------+
| id | resource_status_reason | resource_status | event_time | stack_name |
+----+------------------------+-----------------+------------+------------+
+----+------------------------+-----------------+------------+------------+

But the deploy command fails:

stack@instack:~>>> time bash deploy.command 
Deploying templates in the directory /home/stack/templates/my-overcloud
Stack failed with status: resources.Compute: Stack overcloud-Compute-fpeywqgic2le already has an action (UPDATE) in progress.
ERROR: openstack Heat Stack update failed.

I'm not sure if this relates to the initial report of this bug or it's another issue (there are some heat stacks which are currently UPDATE_IN_PROGRESS). Can you confirm that the steps that I did were enough for the verification of this ticket? Thanks.
Comment 18 Jan Provaznik 2015-12-14 08:33:28 EST
Hi Marius,
I agree that the issue you hit is unrelated to this BZ. The fact that "heat hook-poll -n5 overcloud" shows no breakpoints after re-running the deploy command proves that this issue is fixed. Thanks.
Comment 20 errata-xmlrpc 2015-12-21 11:49:41 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:2650
