Bug 1459981 - Heat stack delete fails due to clock skew
Heat stack delete fails due to clock skew
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-heat (Show other bugs)
10.0 (Newton)
Unspecified Unspecified
medium Severity low
: z1
: 11.0 (Ocata)
Assigned To: Zane Bitter
Amit Ugol
: Triaged, ZStream
Depends On: 1427969
Blocks:
  Show dependency treegraph
 
Reported: 2017-06-08 14:05 EDT by Zane Bitter
Modified: 2017-07-19 13:04 EDT (History)
9 users (show)

See Also:
Fixed In Version: openstack-heat-8.0.0-9.el7ost
Doc Type: Bug Fix
Doc Text:
Cause: Heat erroneously calculated the time difference between two events if one of them appeared to be in the future. Consequence: A significant clock skew between two servers each running heat-engine processes could cause a stack operation to time out immediately if it was started on the server with the faster clock and one of the tasks was immediately picked up on the server with the slower clock. Fix: Negative time differences in events are now handled correctly. Result: Operations won't immediately time out due to clock skews.
Story Points: ---
Clone Of: 1427969
Environment:
Last Closed: 2017-07-19 13:04:31 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)


External Trackers
Tracker ID Priority Status Summary Last Updated
Launchpad 1669608 None None None 2017-06-08 14:05 EDT
Launchpad 1688327 None None None 2017-06-08 14:05 EDT
OpenStack gerrit 465727 None None None 2017-06-08 14:06 EDT

  None (edit)
Description Zane Bitter 2017-06-08 14:05:10 EDT
+++ This bug was initially created as a clone of Bug #1427969 +++

Description of problem:
[stack@cpspospdrt01 sshserver]$ openstack stack list
+--------------------------+---------------+-----------------+----------------------+--------------+
| ID                       | Stack Name    | Stack Status    | Creation Time        | Updated Time |
+--------------------------+---------------+-----------------+----------------------+--------------+
| 18e08c4d-6e73-49fd-a97c- | timeout-test3 | CREATE_COMPLETE | 2017-03-01T14:36:01Z | None         |
| b39a3cd20f15             |               |                 |                      |              |
| d6bf85f4-4c11-4c2e-a86c- | web-asg-demo  | CREATE_COMPLETE | 2017-02-24T16:42:08Z | None         |
| 93f3e4c1982a             |               |                 |                      |              |
+--------------------------+---------------+-----------------+----------------------+--------------+
[stack@cpspospdrt01 sshserver]$ openstack stack delete timeout-test3 --wait --yes
2017-03-01 14:52:00Z [timeout-test3]: DELETE_FAILED  Timed out
2017-03-01 14:52:05Z [timeout-test3]: DELETE_IN_PROGRESS  Stack DELETE started
2017-03-01 14:52:05Z [timeout-test3.cpu_alarm_low]: DELETE_IN_PROGRESS  state changed
2017-03-01 14:52:05Z [timeout-test3.floating_ip]: DELETE_IN_PROGRESS  state changed
2017-03-01 14:52:05Z [timeout-test3.cpu_alarm_high]: DELETE_IN_PROGRESS  state changed
2017-03-01 14:52:05Z [timeout-test3.monitor]: DELETE_IN_PROGRESS  state changed
2017-03-01 14:52:06Z [timeout-test3.cpu_alarm_low]: DELETE_COMPLETE  state changed
2017-03-01 14:52:06Z [timeout-test3.cpu_alarm_high]: DELETE_COMPLETE  state changed
2017-03-01 14:52:06Z [timeout-test3.scaleup_policy]: DELETE_IN_PROGRESS  state changed

 Stack timeout-test3 DELETE_FAILED

Unable to delete 1 of the 1 stacks.

[stack@cpspospdrt01 sshserver]$ heat resource-list timeout-test3
WARNING (shell) "heat resource-list" is deprecated, please use "openstack stack resource list" instead
+------------------+--------------------------------------+-----------------------------------+-----------------+----------------------+
| resource_name    | physical_resource_id                 | resource_type                     | resource_status | updated_time         |
+------------------+--------------------------------------+-----------------------------------+-----------------+----------------------+
| floating_ip      | a03ba81a-9bdb-4beb-9a23-e320288f6c9d | OS::Neutron::FloatingIP           | DELETE_COMPLETE | 2017-03-01T14:36:03Z |
| group            | f77f0184-0086-4310-af27-39fe0aa44626 | OS::Heat::AutoScalingGroup        | CREATE_COMPLETE | 2017-03-01T14:36:03Z |
| lb               | 94ec43c3-b649-4209-8e71-377804f4c312 | OS::Neutron::LBaaS::LoadBalancer  | CREATE_COMPLETE | 2017-03-01T14:36:03Z |
| listener         | acc577f8-a770-4e23-94e0-6f6258506a2a | OS::Neutron::LBaaS::Listener      | CREATE_COMPLETE | 2017-03-01T14:36:03Z |
| monitor          | e26e5579-12ca-4fc4-8f9e-ebbcb9063405 | OS::Neutron::LBaaS::HealthMonitor | DELETE_COMPLETE | 2017-03-01T14:36:03Z |
| pool             | 634d9c52-8e3f-4830-83a5-1050d1b28659 | OS::Neutron::LBaaS::Pool          | CREATE_COMPLETE | 2017-03-01T14:36:03Z |
| scaledown_policy | 29c64fcbe72443878d6fb5e541b4bb61     | OS::Heat::ScalingPolicy           | CREATE_COMPLETE | 2017-03-01T14:36:03Z |
| scaleup_policy   | 31a217fe087c4d48acbd84617eb612df     | OS::Heat::ScalingPolicy           | DELETE_COMPLETE | 2017-03-01T14:36:03Z |
| sec_group        | f084442d-0bb7-48f7-a343-fb19341209a0 | OS::Neutron::SecurityGroup        | CREATE_COMPLETE | 2017-03-01T14:36:03Z |
+------------------+--------------------------------------+-----------------------------------+-----------------+------------

Version-Release number of selected component (if applicable):
grep -i heat sosreport-20170227-183334/overcloud-controller-0.localdomain/installed-rpms 
heat-cfntools-1.3.0-2.el7ost.noarch                         Mon Dec 12 17:58:23 2016
openstack-heat-api-7.0.0-7.el7ost.noarch                    Mon Dec 12 18:33:24 2016
openstack-heat-api-cfn-7.0.0-7.el7ost.noarch                Mon Dec 12 18:33:25 2016
openstack-heat-api-cloudwatch-7.0.0-7.el7ost.noarch         Mon Dec 12 18:33:23 2016
openstack-heat-common-7.0.0-7.el7ost.noarch                 Mon Dec 12 18:14:24 2016
openstack-heat-engine-7.0.0-7.el7ost.noarch                 Mon Dec 12 18:33:24 2016
puppet-heat-9.4.1-1.el7ost.noarch                           Mon Dec 12 18:37:58 2016
python-heat-agent-0-0.9.1e6015dgit.el7ost.noarch            Mon Dec 12 17:59:59 2016
python-heat-agent-puppet-0-0.9.1e6015dgit.el7ost.noarch     Mon Dec 12 18:01:49 2016
python-heatclient-1.5.0-1.el7ost.noarch 

How reproducible:
100%


Actual results:

delete fails
Expected results:
deletes 

Additional info:

Stack create for the stack also sometimes fails.

--- Additional comment from Zane Bitter on 2017-05-04 11:37:40 EDT ---

Aha, I've tracked down the cause. There's a skew between the clocks of the different nodes of around 5s or so (controller_0 is the slow one), and a bug in Heat that causes timeouts when the difference between the start time and the current time is negative.

I'll fix the bug upstream. In the meantime, an easy workaround would be to sync the clocks.
Comment 4 errata-xmlrpc 2017-07-19 13:04:31 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1779

Note You need to log in before you can comment on or make changes to this bug.