Bug 2035406 - Too frequent async task polling causes delay in timeout detection
Summary: Too frequent async task polling causes delay in timeout detection
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z4
Target Release: 16.2 (Train on RHEL 8.4)
Assignee: Takashi Kajinami
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Depends On:
Blocks: 2109931
Reported: 2021-12-23 23:41 UTC by Takashi Kajinami
Modified: 2024-10-01 19:15 UTC (History)
CC List: 3 users

Fixed In Version: openstack-tripleo-heat-templates-11.6.1-2.20220821010130.b1e9bfe.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2109931
Environment:
Last Closed: 2022-12-07 19:21:45 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Launchpad 1955682 0 None None None 2021-12-24 02:46:04 UTC
OpenStack gerrit 849992 0 None MERGED Reduce frequency of task retries 2022-09-30 09:31:39 UTC
Red Hat Issue Tracker OSP-11948 0 None None None 2021-12-23 23:43:30 UTC
Red Hat Product Errata RHBA-2022:8794 0 None None None 2022-12-07 19:22:10 UTC

Description Takashi Kajinami 2021-12-23 23:41:15 UTC
Description of problem:

Currently, some tasks, such as paunch, use the async mechanism in Ansible.
Each consists of two tasks: the first starts a long-running job asynchronously, and the second periodically checks the status of the first.

Currently, the second (polling) task is retried every 3 seconds, but this interval is not guaranteed. Each retry can be delayed, for example when there are multiple nodes, and a shorter interval means more retries and therefore more accumulated overhead. As a result, the timeout is detected much later than it actually occurs.

In a real deployment, the paunch task timed out on a controller node after 1 hour, but Ansible detected the timeout only after 2 hours.
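
The effect can be illustrated with a rough model, written here as a small Python sketch rather than as the actual templates. It assumes the polling task's retry budget is sized as timeout / delay, and that every poll costs some fixed extra overhead on top of the configured delay. The function name `detection_time`, the 3-second overhead, and the 30-second alternative delay are all hypothetical values chosen for illustration; none of them come from this report or from the fix.

```python
def detection_time(timeout_s: int, delay_s: int, overhead_s: float) -> float:
    """Estimate wall-clock seconds until the polling task exhausts its retries.

    Assumption (not taken from the templates): the retry budget is
    timeout_s // delay_s, and each retry costs delay_s plus overhead_s.
    """
    retries = timeout_s // delay_s
    return retries * (delay_s + overhead_s)


if __name__ == "__main__":
    # Hypothetical numbers: 1-hour timeout, 3 s of overhead per poll.
    for delay in (3, 30):
        total = detection_time(timeout_s=3600, delay_s=delay, overhead_s=3.0)
        print(f"delay={delay}s: retries exhausted after {total / 3600:.2f} h")
```

Under these assumed numbers, a 3-second delay doubles the time needed to exhaust the retries, while a longer delay keeps the detection time much closer to the nominal timeout, because the per-poll overhead is paid far fewer times.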


Version-Release number of selected component (if applicable):


How reproducible:
Sometimes, when timeout is detected

Steps to Reproduce:
1.
2.
3.

Actual results:
Timeout is detected a while after the task times out

Expected results:
Timeout is detected immediately after the task times out


Additional info:

Comment 13 errata-xmlrpc 2022-12-07 19:21:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.4), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8794

