1571940 – Unusual number of DescribeLoadBalancers API call

Summary: Unusual number of DescribeLoadBalancers API call

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	3.9.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	3.10.0
Assignee:	Miciah Dashiel Butler Masters
QA Contact:	Hongan Li
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-04-25 18:38 UTC by Hemant Kumar
Modified:	2018-10-03 17:46 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: The service controller was sending a request to the cloud provider every time a service was created. The controller sends these requests in order to check whether the cloud provider has a load balancer that is associated with the service. However, the controller performed this check even for non-LoadBalancer services.
Consequence: On a cluster on which services were frequently created, the service controller was sending many requests to the cloud provider. These requests could dominate the cloud provider API usage.
Fix: The service controller no longer sends a request to the cloud provider when a non-LoadBalancer service is created.
Result: Usage of the cloud provider API is reduced on clusters on which services are frequently created.
Clone Of:
Environment:
Last Closed:	2018-10-03 16:53:07 UTC
Target Upstream Version:
Embargoed:

Comment 2 Meng Bo 2018-05-07 05:50:45 UTC

One possible cause is that a user created a service of type LoadBalancer on the cluster, but the AWS cloud provider configured in the cluster lacks the permissions needed to create an ELB. In that case, the controller keeps retrying the related API requests to AWS.

Comment 3 Miciah Dashiel Butler Masters 2018-05-16 18:23:23 UTC

Here's a tentative fix:

Upstream PR: https://github.com/kubernetes/kubernetes/pull/63926
Origin PR: https://github.com/openshift/origin/pull/19742

Comment 4 openshift-github-bot 2018-05-23 21:29:59 UTC

Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/aaf46c42283d2cd2611474ad72a76c196f09d0d1
UPSTREAM: 63926: Avoid unnecessary calls to the cloud provider

If the service controller is processing a service that does not have type
LoadBalancer and does not have any external load balancer recorded in its
status, do not call the cloud provider to check for an associated load
balancer.

This commit fixes bug 1571940.

https://bugzilla.redhat.com/show_bug.cgi?id=1571940
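A minimal Go sketch of the condition described in the commit message above. The Service type here is a simplified, hypothetical stand-in for the Kubernetes v1.Service fields it refers to (spec.type and status.loadBalancer.ingress), and needsCloudProviderCall is an illustrative name; the actual patch in PR 63926 may be structured differently.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the relevant v1.Service fields.
type ServiceType string

const (
	ServiceTypeClusterIP    ServiceType = "ClusterIP"
	ServiceTypeLoadBalancer ServiceType = "LoadBalancer"
)

type LoadBalancerIngress struct {
	IP       string
	Hostname string
}

type Service struct {
	Name    string
	Type    ServiceType
	Ingress []LoadBalancerIngress // models status.loadBalancer.ingress
}

// needsCloudProviderCall reports whether the controller should ask the cloud
// provider about a load balancer for svc. The call is skipped only when the
// service is not type LoadBalancer AND has no load balancer in its status.
func needsCloudProviderCall(svc Service) bool {
	if svc.Type == ServiceTypeLoadBalancer {
		return true // the load balancer must be ensured
	}
	// Not a LoadBalancer service: call out only if a previously provisioned
	// load balancer is still recorded and may need cleanup.
	return len(svc.Ingress) > 0
}

func main() {
	services := []Service{
		{Name: "web", Type: ServiceTypeClusterIP},
		{Name: "lb", Type: ServiceTypeLoadBalancer},
		// Placeholder address: a service that was changed away from LoadBalancer.
		{Name: "was-lb", Type: ServiceTypeClusterIP,
			Ingress: []LoadBalancerIngress{{IP: "192.0.2.10"}}},
	}
	for _, svc := range services {
		fmt.Printf("%s: call cloud provider = %v\n", svc.Name, needsCloudProviderCall(svc))
	}
}
```

Keeping the status check means a service that used to be a LoadBalancer still gets its cloud resources cleaned up, while the common case (plain ClusterIP or NodePort services) never touches the cloud API.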

Comment 5 Miciah Dashiel Butler Masters 2018-05-25 18:15:13 UTC

Hemant, we now have a patch in Origin, but we need solid before-and-after metrics to justify its inclusion in upstream Kubernetes. Would you be able to collect them when we deploy the new build to prod?

Comment 14 Hongan Li 2018-06-08 08:45:42 UTC

Tested in v3.10.0-0.63.0, but the log shows that the controller keeps retrying to process the deleted LB service.

1. Create an LB service (the controller retries and fails to get the LB in OpenStack, but that is expected):
# oc create service loadbalancer lb --tcp=27011:8080
# oc get svc
NAME      TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)           AGE
lb        LoadBalancer   172.30.106.161   <pending>     27011:32625/TCP   14m

Controllers log:
I0608 07:27:53.901197       1 service_controller.go:305] Ensuring LB for service lha/lb
I0608 07:27:54.402720       1 service_controller.go:718] Finished syncing service "lha/lb" (501.539244ms)
E0608 07:27:54.402833       1 service_controller.go:219] error processing service lha/lb (will retry): failed to ensure load balancer for service lha/lb: error getting loadbalancer a514ed2096aeb11e898f6fa163e78c85: Resource not found

2. Delete the LB service; the controller keeps retrying and logging errors:

I0608 08:12:55.778609       1 service_controller.go:731] Service has been deleted lha/lb. Attempting to cleanup load balancer resources
I0608 08:12:55.930254       1 service_controller.go:718] Finished syncing service "lha/lb" (151.72944ms)
E0608 08:12:55.930473       1 service_controller.go:219] error processing service lha/lb (will retry): Resource not found
I0608 08:17:55.930787       1 service_controller.go:731] Service has been deleted lha/lb. Attempting to cleanup load balancer resources
I0608 08:17:56.130174       1 service_controller.go:718] Finished syncing service "lha/lb" (199.410055ms)
E0608 08:17:56.130363       1 service_controller.go:219] error processing service lha/lb (will retry): Resource not found

Comment 15 Miciah Dashiel Butler Masters 2018-06-08 17:18:03 UTC

The errors on deleting a service with type=LoadBalancer in comment 14 look related to <https://github.com/kubernetes/kubernetes/issues/60658> (fixed in <https://github.com/kubernetes/kubernetes/pull/61002>; see also bug 1587812), which is a defect in the OpenStack cloud provider. This defect causes any GetLoadBalancer request for a non-existent load balancer to return an error instead of reporting that the load balancer was not found. Because an error is returned, Kubernetes retries the GetLoadBalancer request to the OpenStack API, which is why you see the extra requests when you create or delete a LoadBalancer service on OpenShift on OpenStack.
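The difference between the defective and the corrected provider behavior can be sketched in Go as follows. The lookup functions only mimic the (exists, err) shape of a GetLoadBalancer-style call; errNotFound, buggyGet, fixedGet, and syncDeletedService are hypothetical names, and the real OpenStack provider code changed in PR 61002 differs. The point is that a not-found result reported as an error makes the controller requeue the service, while reporting exists=false ends the sync cleanly.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for the provider SDK's
// "Resource not found" error.
var errNotFound = errors.New("Resource not found")

// lookupFunc mimics the (exists, err) shape of a GetLoadBalancer-style call.
type lookupFunc func(name string) (exists bool, err error)

// backendLookup simulates an API call for a load balancer that does not exist.
func backendLookup(name string) error {
	return fmt.Errorf("error getting loadbalancer %s: %w", name, errNotFound)
}

// buggyGet propagates the not-found error, so callers see a failure.
func buggyGet(name string) (bool, error) {
	if err := backendLookup(name); err != nil {
		return false, err
	}
	return true, nil
}

// fixedGet translates not-found into exists=false with no error.
func fixedGet(name string) (bool, error) {
	err := backendLookup(name)
	if errors.Is(err, errNotFound) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return true, nil
}

// syncDeletedService reports whether the controller would requeue the service.
func syncDeletedService(get lookupFunc, name string) (requeue bool) {
	if _, err := get(name); err != nil {
		return true // logged as "error processing service ... (will retry)"
	}
	// exists == false: nothing to clean up. (Deleting an existing
	// load balancer is omitted from this sketch.)
	return false
}

func main() {
	name := "a514ed2096aeb11e898f6fa163e78c85"
	fmt.Println("buggy provider, requeue:", syncDeletedService(buggyGet, name))
	fmt.Println("fixed provider, requeue:", syncDeletedService(fixedGet, name))
}
```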

The original scenario in this Bugzilla report concerned OpenShift on an AWS cluster (not OpenStack) that should have had no LoadBalancer services.  In this case, the root problem is that we were sending requests to the cloud provider for non-LoadBalancer services.  The proposed fix is to avoid requests to the cloud provider for non-LoadBalancer services.

To verify the fix, we want to make sure that Kubernetes does not send any GetLoadBalancer requests to the cloud provider when non-LoadBalancer services are created, modified, or deleted.

Comment 16 Hongan Li 2018-06-11 09:53:17 UTC

Thanks for your clarification, Miciah.

Verified in v3.10.0-0.64.0: no DescribeLoadBalancers AWS API requests are sent when non-LoadBalancer services are created.

In the old version, the following logs appear when non-LoadBalancer services are created:

Jun 11 04:41:08 ip-172-18-0-177.ec2.internal atomic-openshift-master-controllers[11949]: I0611 08:41:08.644303       1 log_handler.go:27] AWS request: elasticloadbalancing DescribeLoadBalancers
Jun 11 04:41:22 ip-172-18-0-177.ec2.internal atomic-openshift-master-controllers[11949]: I0611 08:41:22.242775       1 log_handler.go:32] AWS API Send: elasticloadbalancing DescribeLoadBalancers &{DescribeLoadBalancers POST / 0xc4275b27d0 <nil>} {
Jun 11 04:41:22 ip-172-18-0-177.ec2.internal atomic-openshift-master-controllers[11949]: I0611 08:41:22.248988       1 log_handler.go:37] AWS API ValidateResponse: elasticloadbalancing DescribeLoadBalancers &{DescribeLoadBalancers POST / 0xc4275b27d0 <nil>} {


OS: Red Hat Enterprise Linux Server release 7.5 (Maipo)
kernel: Linux ip-172-18-1-195.ec2.internal 3.10.0-862.3.2.el7.x86_64 #1 SMP Tue May 15 18:22:15 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
