Description of problem:
When using the Heat SDK to create a large number of load balancers (~30), it appears to overwhelm the Octavia API endpoint.

Version-Release number of selected component (if applicable):
openstack-octavia-api 2.0.3-2 (RHOSP 13)

How reproducible:
Every time when trying to create a large number of load balancers (around 30).

Steps to Reproduce:
1. Use the Heat SDK to create multiple load balancers.
2. Observe the errors and timeouts below.

Actual results:
worker.log.1:2019-07-23 11:26:43.752 24 ERROR oslo_messaging.rpc.server TimeOutException: contacting the amphora timed out
worker.log.1: |__Flow 'octavia-delete-member-flow': TimeOutException: contacting the amphora timed out
worker.log.1:2019-07-23 11:26:43.800 24 ERROR octavia.controller.worker.controller_worker TimeOutException: contacting the amphora timed out
worker.log.1:2019-07-23 11:26:43.854 24 ERROR oslo_messaging.rpc.server [req-bd2eda7a-ea75-4544-8a89-e4cbe61de99e - 514d97890ed54b719a6e34496b392dfe - - -] Exception during message handling: TimeOutException: contacting the amphora timed out
worker.log.1:2019-07-23 11:26:43.854 24 ERROR oslo_messaging.rpc.server TimeOutException: contacting the amphora timed out
worker.log.1: |__Flow 'octavia-delete-member-flow': TimeOutException: contacting the amphora timed out

Expected results:
Multiple load balancers are created successfully.

Additional info:
When using the Heat CLI, it waits for each stack to finish provisioning before creating the next one, so the same issue is not observed via the Heat CLI. When using the SDK, however, it appears to overwhelm the Octavia API endpoint.
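For reference, the behavior the Heat CLI exhibits (waiting for each stack to finish provisioning before starting the next) amounts to polling the load balancer's provisioning_status until it reaches ACTIVE. A minimal sketch of that pattern is below; this is hypothetical illustration, not the customer's code, and the SDK lookup is replaced by a stubbed get_status callable. The status strings match Octavia's documented provisioning_status values (PENDING_CREATE, ACTIVE, ERROR).

```python
import time


def wait_for_active(get_status, timeout=300, interval=0.0):
    """Poll until the load balancer's provisioning_status is ACTIVE.

    get_status is a stand-in for an SDK lookup of the load balancer;
    raises on ERROR or if the timeout expires.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = get_status()
        if status == "ACTIVE":
            return True
        if status == "ERROR":
            raise RuntimeError("load balancer went to ERROR")
        time.sleep(interval)  # back off before polling again
    raise TimeoutError("load balancer never became ACTIVE")


# Simulated lookup: still provisioning twice, then ACTIVE.
statuses = iter(["PENDING_CREATE", "PENDING_CREATE", "ACTIVE"])
ok = wait_for_active(lambda: next(statuses))
print(ok)  # True
```

Serializing creates this way keeps each subsequent API call from hitting a still-immutable object.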
The log snippet is from the controller-worker process during a member delete workflow. It is not from the octavia-api process, nor from a load balancer create request. We need the full logs to identify the issue; can you please request an sos report?

The Octavia API has been stress tested to handle high request rates without issue. It is likely the heat template is not honoring the REST API status codes returned by the Octavia API. A typical lifecycle is: once the load balancer create request is received, the load balancer is in an immutable state for a period of time while it is provisioned. During that time the API requester will receive a 409 status code with an appropriate error message indicating that the load balancer is immutable at the moment and the request should be retried later. If the heat template ignores these status codes, it will proceed thinking the follow-on API requests are completing when in fact they are getting 409 "retry" status messages back.

We will need the API and worker logs to confirm or identify the issue. It would also be helpful to know whether the load balancer create was a "single-call-create" (creating the whole load balancer, listeners, pools, etc. in one API call) or individual API calls for each object.
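The 409 handling described above can be sketched as a small retry wrapper. This is a hypothetical illustration only: ConflictError stands in for the SDK's 409 Conflict exception (in openstacksdk that would be openstack.exceptions.ConflictException), and the client is stubbed so the retry path is visible.

```python
import time


class ConflictError(Exception):
    """Stand-in for the SDK's HTTP 409 Conflict exception."""


def create_with_retry(create_fn, retries=10, delay=0.0):
    """Call create_fn, retrying while the target object is immutable.

    A 409 means "immutable right now, try again later"; treating it as
    success is exactly the failure mode described above.
    """
    for _ in range(retries):
        try:
            return create_fn()
        except ConflictError:
            time.sleep(delay)  # back off, then retry the request
    raise RuntimeError("object stayed immutable after %d attempts" % retries)


# Simulated client: first two calls return 409, third succeeds.
calls = {"n": 0}

def fake_create_member():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConflictError("load balancer is immutable (PENDING_UPDATE)")
    return {"id": "member-1"}

result = create_with_retry(fake_create_member)
print(result["id"], calls["n"])  # member-1 3
```

If the caller instead swallows the 409 and moves on, every subsequent object create silently fails, which matches the symptom reported here.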
Hey Michael, thanks for your reply. My chances of getting a full sos report are fairly slim due to the secure environment requirements. It sounds like we could focus on just the Octavia API log and the heat-engine logs, though. Would that be sufficient here? I can probably get a snippet of the code they are using so that we can better understand how they are constructing the API calls. Let me check with them and get back to you on that one.
Yes, the Octavia API log and a code snippet would be helpful here.
(In reply to Michael Johnson from comment #3) > Yes, the Octavia API log and a code snippet would be helpful here. Brendan, Please provide API logs as Michael requested so we can take a closer look into this. Thank you!