Bug 1427969 - Heat stack delete fails due to clock skew
Summary: Heat stack delete fails due to clock skew
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-heat
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
low
medium
Target Milestone: z4
: 10.0 (Newton)
Assignee: Zane Bitter
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks: 1459981
TreeView+ depends on / blocked
 
Reported: 2017-03-01 15:40 UTC by Jeremy
Modified: 2020-06-11 13:21 UTC (History)
6 users (show)

Fixed In Version: openstack-heat-7.0.3-2.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1459981 (view as bug list)
Environment:
Last Closed: 2017-09-06 17:13:53 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1669608 0 None None None 2017-03-02 23:17:18 UTC
Launchpad 1688327 0 None None None 2017-05-04 15:37:39 UTC
OpenStack gerrit 469937 0 None None None 2017-06-08 18:08:27 UTC
Red Hat Product Errata RHBA-2017:2655 0 normal SHIPPED_LIVE openstack-heat bug fix advisory 2017-09-06 20:56:09 UTC

Description Jeremy 2017-03-01 15:40:05 UTC
Description of problem:
[stack@cpspospdrt01 sshserver]$ openstack stack list
+--------------------------+---------------+-----------------+----------------------+--------------+
| ID                       | Stack Name    | Stack Status    | Creation Time        | Updated Time |
+--------------------------+---------------+-----------------+----------------------+--------------+
| 18e08c4d-6e73-49fd-a97c- | timeout-test3 | CREATE_COMPLETE | 2017-03-01T14:36:01Z | None         |
| b39a3cd20f15             |               |                 |                      |              |
| d6bf85f4-4c11-4c2e-a86c- | web-asg-demo  | CREATE_COMPLETE | 2017-02-24T16:42:08Z | None         |
| 93f3e4c1982a             |               |                 |                      |              |
+--------------------------+---------------+-----------------+----------------------+--------------+
[stack@cpspospdrt01 sshserver]$ openstack stack delete timeout-test3 --wait --yes
2017-03-01 14:52:00Z [timeout-test3]: DELETE_FAILED  Timed out
2017-03-01 14:52:05Z [timeout-test3]: DELETE_IN_PROGRESS  Stack DELETE started
2017-03-01 14:52:05Z [timeout-test3.cpu_alarm_low]: DELETE_IN_PROGRESS  state changed
2017-03-01 14:52:05Z [timeout-test3.floating_ip]: DELETE_IN_PROGRESS  state changed
2017-03-01 14:52:05Z [timeout-test3.cpu_alarm_high]: DELETE_IN_PROGRESS  state changed
2017-03-01 14:52:05Z [timeout-test3.monitor]: DELETE_IN_PROGRESS  state changed
2017-03-01 14:52:06Z [timeout-test3.cpu_alarm_low]: DELETE_COMPLETE  state changed
2017-03-01 14:52:06Z [timeout-test3.cpu_alarm_high]: DELETE_COMPLETE  state changed
2017-03-01 14:52:06Z [timeout-test3.scaleup_policy]: DELETE_IN_PROGRESS  state changed

 Stack timeout-test3 DELETE_FAILED

Unable to delete 1 of the 1 stacks.

[stack@cpspospdrt01 sshserver]$ heat resource-list timeout-test3
WARNING (shell) "heat resource-list" is deprecated, please use "openstack stack resource list" instead
+------------------+--------------------------------------+-----------------------------------+-----------------+----------------------+
| resource_name    | physical_resource_id                 | resource_type                     | resource_status | updated_time         |
+------------------+--------------------------------------+-----------------------------------+-----------------+----------------------+
| floating_ip      | a03ba81a-9bdb-4beb-9a23-e320288f6c9d | OS::Neutron::FloatingIP           | DELETE_COMPLETE | 2017-03-01T14:36:03Z |
| group            | f77f0184-0086-4310-af27-39fe0aa44626 | OS::Heat::AutoScalingGroup        | CREATE_COMPLETE | 2017-03-01T14:36:03Z |
| lb               | 94ec43c3-b649-4209-8e71-377804f4c312 | OS::Neutron::LBaaS::LoadBalancer  | CREATE_COMPLETE | 2017-03-01T14:36:03Z |
| listener         | acc577f8-a770-4e23-94e0-6f6258506a2a | OS::Neutron::LBaaS::Listener      | CREATE_COMPLETE | 2017-03-01T14:36:03Z |
| monitor          | e26e5579-12ca-4fc4-8f9e-ebbcb9063405 | OS::Neutron::LBaaS::HealthMonitor | DELETE_COMPLETE | 2017-03-01T14:36:03Z |
| pool             | 634d9c52-8e3f-4830-83a5-1050d1b28659 | OS::Neutron::LBaaS::Pool          | CREATE_COMPLETE | 2017-03-01T14:36:03Z |
| scaledown_policy | 29c64fcbe72443878d6fb5e541b4bb61     | OS::Heat::ScalingPolicy           | CREATE_COMPLETE | 2017-03-01T14:36:03Z |
| scaleup_policy   | 31a217fe087c4d48acbd84617eb612df     | OS::Heat::ScalingPolicy           | DELETE_COMPLETE | 2017-03-01T14:36:03Z |
| sec_group        | f084442d-0bb7-48f7-a343-fb19341209a0 | OS::Neutron::SecurityGroup        | CREATE_COMPLETE | 2017-03-01T14:36:03Z |
+------------------+--------------------------------------+-----------------------------------+-----------------+------------

Version-Release number of selected component (if applicable):
grep -i heat sosreport-20170227-183334/overcloud-controller-0.localdomain/installed-rpms 
heat-cfntools-1.3.0-2.el7ost.noarch                         Mon Dec 12 17:58:23 2016
openstack-heat-api-7.0.0-7.el7ost.noarch                    Mon Dec 12 18:33:24 2016
openstack-heat-api-cfn-7.0.0-7.el7ost.noarch                Mon Dec 12 18:33:25 2016
openstack-heat-api-cloudwatch-7.0.0-7.el7ost.noarch         Mon Dec 12 18:33:23 2016
openstack-heat-common-7.0.0-7.el7ost.noarch                 Mon Dec 12 18:14:24 2016
openstack-heat-engine-7.0.0-7.el7ost.noarch                 Mon Dec 12 18:33:24 2016
puppet-heat-9.4.1-1.el7ost.noarch                           Mon Dec 12 18:37:58 2016
python-heat-agent-0-0.9.1e6015dgit.el7ost.noarch            Mon Dec 12 17:59:59 2016
python-heat-agent-puppet-0-0.9.1e6015dgit.el7ost.noarch     Mon Dec 12 18:01:49 2016
python-heatclient-1.5.0-1.el7ost.noarch 

How reproducible:
100%


Actual results:

delete fails
Expected results:
deletes 

Additional info:

Stack create for the stack also sometimes fails.

Comment 1 Red Hat Bugzilla Rules Engine 2017-03-01 15:40:12 UTC
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.

Comment 3 Zane Bitter 2017-03-01 22:27:18 UTC
One main issue here seems to be that once the nested stack goes into a DELETE_FAILED state, subsequent attempts to delete the parent resource can poll the nested stack state before it goes back to DELETE_IN_PROGRESS, see the old error, and immediately mark the parent resource failed again. There is code in place to ensure this doesn't happen on UPDATE, but it is not invoked on DELETE.

It's not clear to me why the operation is timing out in the first place, though. I take it the stack is not being created with a really short timeout value?

Comment 4 Jeremy 2017-03-02 15:16:46 UTC
the customer isn't setting short timeout. he is just using : openstack stack create timeout-test3 -t autoscaling_web_demo.yaml

Comment 5 Jeremy 2017-03-14 15:33:24 UTC
Hello,
Any update on this?
THanks,
Jeremy

Comment 6 Zane Bitter 2017-05-04 15:37:40 UTC
Aha, I've tracked down the cause. There's a skew between the clocks of the different nodes of around 5s or so (controller_0 is the slow one), and a bug in Heat that causes timeouts when the difference between the start time and the current time is negative.

I'll fix the bug upstream. In the meantime, an easy workaround would be to sync the clocks.

Comment 10 errata-xmlrpc 2017-09-06 17:13:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2655


Note You need to log in before you can comment on or make changes to this bug.