Bug 1674054 - Sporadic error 504 when running a deploy on large clouds
Summary: Sporadic error 504 when running a deploy on large clouds
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-tripleoclient
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
: ---
Assignee: Alex Schultz
QA Contact: Sasha Smolyak
URL:
Whiteboard:
Depends On:
Blocks: 1674066
Reported: 2019-02-08 21:45 UTC by David Vallee Delisle
Modified: 2023-09-07 19:46 UTC (History)

Fixed In Version: python-tripleoclient-5.4.6-4.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1674066
Environment:
Last Closed: 2019-10-16 09:40:35 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1833452 0 None None None 2019-06-19 18:11:17 UTC
Red Hat Issue Tracker OSP-28209 0 None None None 2023-09-07 19:46:09 UTC
Red Hat Product Errata RHBA-2019:3112 0 None None None 2019-10-16 09:40:52 UTC

Description David Vallee Delisle 2019-02-08 21:45:19 UTC
Description of problem:
During a deploy, tripleoclient periodically calls heatclient.poll_for_events. In large environments, fetching the events can take a long time [1]. When a request exceeds the 2-minute haproxy timeout, haproxy returns a 504 and tripleoclient exits.

We can work around this issue by raising the haproxy timeout, but this is not desirable because the change is easily forgotten when managing multiple clouds.

We believe that tripleo shouldn't fail when it gets a 504; it should retry instead. Alternatively, we should optimize the way we poll for events.
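
To illustrate the retry behaviour requested above, here is a minimal sketch. It is not the tripleoclient or heatclient code: GatewayTimeout and call_with_retry are hypothetical names, and the sketch only assumes that the client's HTTP error exposes the status as a `code` attribute.

```python
import time


class GatewayTimeout(Exception):
    """Hypothetical stand-in for the HTTP 504 error raised by an API client."""
    code = 504


def call_with_retry(func, retries=3, delay=5.0, sleep=time.sleep):
    """Call func(); on a 504, wait and try again, up to `retries` extra times.

    Assumption: the raised exception carries the HTTP status in `.code`.
    """
    attempt = 0
    while True:
        try:
            return func()
        except Exception as exc:
            # Only a 504 is retried; any other error is re-raised at once,
            # and so is a 504 once the retry budget is used up.
            if getattr(exc, "code", None) != 504 or attempt >= retries:
                raise
            attempt += 1
            sleep(delay * attempt)  # simple linear backoff
```

A caller would wrap the polling call, for example call_with_retry(lambda: poll_once()), where poll_once is whatever function issues the events request. Retrying only hides the symptom; the slow request itself still needs attention.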


Version-Release number of selected component (if applicable):
python-heatclient-1.5.2-1.el7ost.noarch                     Thu May 24 18:44:01 2018
python-tripleoclient-5.4.6-1.el7ost.noarch                  Mon Feb  4 18:51:14 2019


How reproducible:
All the time

Steps to Reproduce:
1. Have hundreds of compute nodes
2. Deploy

Actual results:
tripleoclient will exit with a 504 because GET /v1/19235c66a3cc45c4a58349e1448a9d40/stacks/overcloud/4a40a99e-a258-440a-a5fe-ad3e276e30b1/resources?nested_depth=5 takes more than 2 minutes to complete. 

[1]
~~~
heat-api.log:2019-02-06 06:43:16.157 33257 DEBUG oslo_policy._cache_handler [req-19e6935e-cd27-4b68-9aa8-637de9226ac4 51eefe5b76b2405f990106af93c1c252 19235c66a3cc45c4a58349e1448a9d40 - default default] Reloading cached file /etc/heat/policy.json read_cached_file /usr/lib/python2.7/site-packages/oslo_policy/_cache_handler.py:38
heat-api.log:2019-02-06 06:43:16.202 33257 DEBUG oslo_policy.policy [req-19e6935e-cd27-4b68-9aa8-637de9226ac4 51eefe5b76b2405f990106af93c1c252 19235c66a3cc45c4a58349e1448a9d40 - default default] Reloaded policy file: /etc/heat/policy.json _load_policy_file /usr/lib/python2.7/site-packages/oslo_policy/policy.py:584
heat-api.log:2019-02-06 06:43:16.203 33257 DEBUG heat.common.wsgi [req-19e6935e-cd27-4b68-9aa8-637de9226ac4 51eefe5b76b2405f990106af93c1c252 19235c66a3cc45c4a58349e1448a9d40 - default default] Calling <heat.api.openstack.v1.resources.ResourceController object at 0x7f6b71bde290> : index __call__ /usr/lib/python2.7/site-packages/heat/common/wsgi.py:839
heat-api.log:2019-02-06 06:43:16.204 33257 DEBUG oslo_messaging._drivers.amqpdriver [req-19e6935e-cd27-4b68-9aa8-637de9226ac4 51eefe5b76b2405f990106af93c1c252 19235c66a3cc45c4a58349e1448a9d40 - default default] CALL msg_id: 3baa80f7d602487da77b5be8daf14887 exchange 'heat' topic 'engine' _send /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:568
heat-api.log:2019-02-06 06:45:41.162 33257 INFO eventlet.wsgi.server [req-19e6935e-cd27-4b68-9aa8-637de9226ac4 51eefe5b76b2405f990106af93c1c252 19235c66a3cc45c4a58349e1448a9d40 - default default] 192.168.8.1 - - [06/Feb/2019 06:45:41] "GET /v1/19235c66a3cc45c4a58349e1448a9d40/stacks/overcloud/4a40a99e-a258-440a-a5fe-ad3e276e30b1/resources?nested_depth=5 HTTP/1.1" 200 8436480 145.009166
heat-api.log:2019-02-06 06:45:41.900 33257 DEBUG heat.api.middleware.version_negotiation [req-19e6935e-cd27-4b68-9aa8-637de9226ac4 51eefe5b76b2405f990106af93c1c252 19235c66a3cc45c4a58349e1448a9d40 - default default] Processing request: GET /v1/19235c66a3cc45c4a58349e1448a9d40/stacks/4a40a99e-a258-440a-a5fe-ad3e276e30b1/events Accept: application/json process_request /usr/lib/python2.7/site-packages/heat/api/middleware/version_negotiation.py:50
heat-api.log:2019-02-06 06:45:41.900 33257 DEBUG heat.api.middleware.version_negotiation [req-19e6935e-cd27-4b68-9aa8-637de9226ac4 51eefe5b76b2405f990106af93c1c252 19235c66a3cc45c4a58349e1448a9d40 - default default] Matched versioned URI. Version: 1.0 process_request /usr/lib/python2.7/site-packages/heat/api/middleware/version_negotiation.py:65
~~~
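
For anyone checking their own heat-api.log for the same symptom, the sketch below picks out requests slower than the 2-minute timeout. It assumes the eventlet.wsgi access line ends with status, response size and elapsed seconds, as in the excerpt above (200 8436480 145.009166); slow_requests and the regular expression are illustrative, not part of any OpenStack tool.

```python
import re

# Matches the tail of an eventlet.wsgi access line:
#   "<METHOD> <path> HTTP/x.y" <status> <bytes> <seconds>
ACCESS_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" '
    r'(?P<status>\d{3}) (?P<size>\d+) (?P<seconds>\d+\.\d+)\s*$'
)

PROXY_TIMEOUT = 120.0  # the 2-minute haproxy timeout from this report


def slow_requests(lines, limit=PROXY_TIMEOUT):
    """Yield (path, seconds) for access-log lines slower than `limit`."""
    for line in lines:
        match = ACCESS_RE.search(line)
        if match and float(match.group("seconds")) > limit:
            yield match.group("path"), float(match.group("seconds"))
```

Run over the excerpt above, this flags only the resources?nested_depth=5 request, which took about 145 seconds while the proxy gives up after 120.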
Expected results:
Either the event polling should be optimized so that it stays under the timeout, or tripleoclient should retry instead of failing on a 504.

Additional info:

Comment 21 errata-xmlrpc 2019-10-16 09:40:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3112

