Bug 1733013 - When creating multiple LBs via the Heat SDK, the Octavia API seems to be overwhelmed by the load [NEEDINFO]
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-octavia
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Assaf Muller
QA Contact: Bruna Bonguardo
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-07-25 02:14 UTC by Brendan Shephard
Modified: 2019-09-24 08:26 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-21 14:19:41 UTC
Target Upstream Version:
nmagnezi: needinfo? (bshephar)


Description Brendan Shephard 2019-07-25 02:14:25 UTC
Description of problem:
When using the Heat SDK to create a large number of load balancers (around 30), it appears to overwhelm the Octavia API endpoint.

Version-Release number of selected component (if applicable):
openstack-octavia-api 2.0.3-2
RHOSP13

How reproducible:
Every time, when trying to create a large number of LBs (around 30).

Steps to Reproduce:
1. Use the Heat SDK to create multiple load balancers
2. Observe the errors and timeouts below

Actual results:

worker.log.1:2019-07-23 11:26:43.752 24 ERROR oslo_messaging.rpc.server TimeOutException: contacting the amphora timed out
worker.log.1:           |__Flow 'octavia-delete-member-flow': TimeOutException: contacting the amphora timed out
worker.log.1:2019-07-23 11:26:43.800 24 ERROR octavia.controller.worker.controller_worker TimeOutException: contacting the amphora timed out
worker.log.1:2019-07-23 11:26:43.854 24 ERROR oslo_messaging.rpc.server [req-bd2eda7a-ea75-4544-8a89-e4cbe61de99e - 514d97890ed54b719a6e34496b392dfe - - -] Exception during message handling: TimeOutException: contacting the amphora timed out
worker.log.1:2019-07-23 11:26:43.854 24 ERROR oslo_messaging.rpc.server TimeOutException: contacting the amphora timed out
worker.log.1:           |__Flow 'octavia-delete-member-flow': TimeOutException: contacting the amphora timed out

Expected results:
Multiple load balancers would be created.

Additional info:
When using the Heat CLI, it seems to wait for each stack to be provisioned before creating the next one, so the issue isn't observed there. When using the SDK, however, it appears to overwhelm the API endpoint for Octavia.

Comment 1 Michael Johnson 2019-07-25 15:52:37 UTC
The log snippet is from the controller-worker process and was logged during a member delete workflow. It is from neither the octavia-api nor a load balancer create request.

We need the full logs to identify the issue. Can you please request an sos report?

The Octavia API has been stress tested to handle high rates of requests without issue.

It is likely the Heat template is not honoring the REST API status codes returned by the Octavia API.
A typical lifecycle would be: once the load balancer create request is received, the load balancer will be in an immutable state for a period of time while it is provisioned. During that time, the API requester will receive a 409 status code with an appropriate error message indicating that the load balancer is immutable at the moment and that the request should be tried again.
If the Heat template ignores these status codes, it will progress thinking the follow-on API requests are completing when in fact they are getting 409 "retry" status messages back.
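
For illustration only, a minimal sketch of a caller that honors the 409 response could look like the following. The `send_request` callable, the attempt limit, and the delay are assumptions made up for this example; they are not taken from the reporter's code, from Heat, or from any SDK.

```python
import time

HTTP_CONFLICT = 409  # returned while the load balancer is immutable


def call_with_retry(send_request, max_attempts=30, delay_seconds=5.0,
                    sleep=time.sleep):
    """Repeat one API call until it no longer returns 409.

    send_request is a hypothetical zero-argument callable that performs a
    single API request and returns the HTTP status code as an int.
    """
    for attempt in range(1, max_attempts + 1):
        status = send_request()
        if status != HTTP_CONFLICT:
            # Anything other than 409 is handed back to the caller as-is.
            return status
        if attempt < max_attempts:
            # The load balancer is still being provisioned; wait and retry.
            sleep(delay_seconds)
    raise TimeoutError(
        "still receiving 409 after %d attempts" % max_attempts)
```

The point of the sketch is only that a 409 is treated as "try again later" rather than as success; a real client would also need to handle other error codes.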

We will need the API and worker logs to confirm or identify the issue. It would also be helpful to know if the API call for the load balancer create was a "single-call-create" (creating the whole load balancer, listeners, pools, etc. in one API call) or if it was individual API calls for each object.
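
To make the distinction concrete, here is a rough sketch of what a "single-call-create" request body might look like, built as a Python dict. The field names reflect my understanding of the Octavia v2 API and should be checked against the API reference; the helper name and all values (names, subnet ID, addresses, port) are placeholders invented for the example.

```python
import json


def build_single_call_body(name, vip_subnet_id, member_addresses, port=80):
    """Build one request body describing a load balancer and its children.

    Hypothetical helper: the listener, pool, and members are nested inside
    the load balancer so everything is submitted in a single POST.
    """
    members = [{"address": address, "protocol_port": port}
               for address in member_addresses]
    return {
        "loadbalancer": {
            "name": name,
            "vip_subnet_id": vip_subnet_id,
            "listeners": [{
                "name": name + "-listener",
                "protocol": "HTTP",
                "protocol_port": port,
                "default_pool": {
                    "name": name + "-pool",
                    "protocol": "HTTP",
                    "lb_algorithm": "ROUND_ROBIN",
                    "members": members,
                },
            }],
        }
    }


# Placeholder values, for illustration only.
print(json.dumps(
    build_single_call_body("lb1", "SUBNET_ID", ["192.0.2.10", "192.0.2.11"]),
    indent=2))
```

With individual API calls, by contrast, the listener, pool, and each member would be separate requests, and each of them can hit the 409 response described above while the load balancer is still immutable.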

Comment 2 Brendan Shephard 2019-07-29 00:19:05 UTC
Hey Michael,

Thanks for your reply. My chances of getting a full sos report are fairly slim due to the secure environment requirements. It sounds like we could just be focusing on the API log for Octavia and the heat-engine logs though. Would that be sufficient here?

I can probably get a snippet of the code they are using so that we can better understand how they are constructing the API call. Let me check with them and get back to you on that one.

Comment 3 Michael Johnson 2019-07-29 06:19:36 UTC
Yes, the Octavia API log and a code snippet would be helpful here.

Comment 4 Nir Magnezi 2019-07-31 14:14:14 UTC
(In reply to Michael Johnson from comment #3)
> Yes, the Octavia API log and a code snippet would be helpful here.

Brendan,

Please provide API logs as Michael requested so we can take a closer look into this. Thank you!

