One possible reason is that the user created a service with type LoadBalancer on the cluster, but the AWS cloud provider configured in the cluster does not have permission to create an ELB. The controller then keeps retrying the related API requests to AWS.
Here's a tentative fix:
Upstream PR: https://github.com/kubernetes/kubernetes/pull/63926
Origin PR: https://github.com/openshift/origin/pull/19742
Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/aaf46c42283d2cd2611474ad72a76c196f09d0d1
UPSTREAM: 63926: Avoid unnecessary calls to the cloud provider

If the service controller is processing a service that does not have type LoadBalancer and does not have any external load balancer recorded in its status, do not call the cloud provider to check for an associated load balancer.

This commit fixes bug 1571940.
https://bugzilla.redhat.com/show_bug.cgi?id=1571940
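The condition described in the commit message can be sketched as a small Go predicate. This is not the code from the PR: `Service`, `wantsLoadBalancer`, and `needsCloudProviderCall` are hypothetical stand-ins, and the `Ingress` field only models the load balancer entries recorded in the service's status.

```go
package main

// Service is a hypothetical, stripped-down stand-in for a Kubernetes
// Service object; only the fields this check needs are modeled.
type Service struct {
	// Type is the service type, e.g. "ClusterIP" or "LoadBalancer".
	Type string
	// Ingress models the external load balancer entries recorded in
	// the service's status (hypothetical simplification).
	Ingress []string
}

// wantsLoadBalancer reports whether the service requests an external LB.
func wantsLoadBalancer(svc *Service) bool {
	return svc.Type == "LoadBalancer"
}

// needsCloudProviderCall sketches the rule from the commit message: skip
// the cloud provider when the service is not of type LoadBalancer AND has
// no external load balancer recorded in its status.
func needsCloudProviderCall(svc *Service) bool {
	if wantsLoadBalancer(svc) {
		// The controller must ensure the load balancer exists.
		return true
	}
	// A non-LoadBalancer service that still has a recorded load balancer
	// (e.g. its type was changed) may need that load balancer cleaned up.
	return len(svc.Ingress) > 0
}
```

With this rule, creating, modifying, or deleting a plain ClusterIP service never reaches the cloud provider, which is what the verification below checks for.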
Hemant, we now have a patch in Origin, but we need to make sure we get solid before-and-after metrics to justify inclusion in Kubernetes upstream. Is this something you would be able to do when we deploy the new build to prod?
Tested in v3.10.0-0.63.0, but the log shows that the controller keeps retrying to process the deleted LB service.

1. Create an LB service (it retries and fails to get the LB in OpenStack, but that's fine):

# oc create service loadbalancer lb --tcp=27011:8080
# oc get svc
NAME   TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)           AGE
lb     LoadBalancer   172.30.106.161   <pending>     27011:32625/TCP   14m

Controllers log:
I0608 07:27:53.901197 1 service_controller.go:305] Ensuring LB for service lha/lb
I0608 07:27:54.402720 1 service_controller.go:718] Finished syncing service "lha/lb" (501.539244ms)
E0608 07:27:54.402833 1 service_controller.go:219] error processing service lha/lb (will retry): failed to ensure load balancer for service lha/lb: error getting loadbalancer a514ed2096aeb11e898f6fa163e78c85: Resource not found

2. Delete the LB service. The controller keeps retrying and logging errors:

I0608 08:12:55.778609 1 service_controller.go:731] Service has been deleted lha/lb. Attempting to cleanup load balancer resources
I0608 08:12:55.930254 1 service_controller.go:718] Finished syncing service "lha/lb" (151.72944ms)
E0608 08:12:55.930473 1 service_controller.go:219] error processing service lha/lb (will retry): Resource not found
I0608 08:17:55.930787 1 service_controller.go:731] Service has been deleted lha/lb. Attempting to cleanup load balancer resources
I0608 08:17:56.130174 1 service_controller.go:718] Finished syncing service "lha/lb" (199.410055ms)
E0608 08:17:56.130363 1 service_controller.go:219] error processing service lha/lb (will retry): Resource not found
The errors on deleting a service with type=LoadBalancer in comment 14 look related to <https://github.com/kubernetes/kubernetes/issues/60658> (fixed in <https://github.com/kubernetes/kubernetes/pull/61002>; see also bug 1587812), which is a defect in the OpenStack cloud provider. This defect causes any GetLoadBalancer request for a non-existent load balancer to return an error instead of reporting that the load balancer was not found. Returning an error causes Kubernetes to retry the GetLoadBalancer request to the OpenStack API. Thus you see the extra requests when you create or delete a LoadBalancer service on OpenShift on OpenStack. The original scenario in this Bugzilla report concerned OpenShift on an AWS cluster (not OpenStack) that should have had no LoadBalancer services. In this case, the root problem is that we were sending requests to the cloud provider for non-LoadBalancer services. The proposed fix is to avoid requests to the cloud provider for non-LoadBalancer services. To verify the fix, we want to make sure that Kubernetes does not send any GetLoadBalancer requests to the cloud provider when non-LoadBalancer services are created, modified, or deleted.
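To illustrate the OpenStack defect: the cloud provider's GetLoadBalancer returns a status, an `exists` flag, and an error, and a missing load balancer is supposed to be reported as `exists == false` with a nil error. The sketch below contrasts the two behaviours. It is not the OpenStack provider's code; `errNotFound`, `LoadBalancerStatus`, `lookupFunc`, and both `getLoadBalancer*` functions are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel standing in for the
// "Resource not found" error returned by the cloud API client.
var errNotFound = errors.New("Resource not found")

// LoadBalancerStatus is a placeholder for the real status type.
type LoadBalancerStatus struct {
	Ingress []string
}

// lookupFunc is a hypothetical client call that fetches a load balancer by name.
type lookupFunc func(name string) (*LoadBalancerStatus, error)

// getLoadBalancerBuggy mirrors the defect: a missing load balancer is
// surfaced as an error, so the service controller treats it as a failure
// and retries the request.
func getLoadBalancerBuggy(lookup lookupFunc, name string) (*LoadBalancerStatus, bool, error) {
	status, err := lookup(name)
	if err != nil {
		return nil, false, fmt.Errorf("error getting loadbalancer %s: %w", name, err)
	}
	return status, true, nil
}

// getLoadBalancerFixed reports a missing load balancer as exists=false
// with a nil error, leaving the controller nothing to retry.
func getLoadBalancerFixed(lookup lookupFunc, name string) (*LoadBalancerStatus, bool, error) {
	status, err := lookup(name)
	if errors.Is(err, errNotFound) {
		return nil, false, nil
	}
	if err != nil {
		// Any other error is a genuine failure and is still worth retrying.
		return nil, false, err
	}
	return status, true, nil
}
```

Only genuine API failures propagate as errors in the fixed variant, so the deleted-service case in the log above would stop producing "will retry" messages.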
Thanks for the clarification, Miciah. Verified in v3.10.0-0.64.0: there is no DescribeLoadBalancers AWS API request when non-LoadBalancer services are created.

In the old version, the following logs appear when non-LoadBalancer services are created:
Jun 11 04:41:08 ip-172-18-0-177.ec2.internal atomic-openshift-master-controllers[11949]: I0611 08:41:08.644303 1 log_handler.go:27] AWS request: elasticloadbalancing DescribeLoadBalancers
Jun 11 04:41:22 ip-172-18-0-177.ec2.internal atomic-openshift-master-controllers[11949]: I0611 08:41:22.242775 1 log_handler.go:32] AWS API Send: elasticloadbalancing DescribeLoadBalancers &{DescribeLoadBalancers POST / 0xc4275b27d0 <nil>} {
Jun 11 04:41:22 ip-172-18-0-177.ec2.internal atomic-openshift-master-controllers[11949]: I0611 08:41:22.248988 1 log_handler.go:37] AWS API ValidateResponse: elasticloadbalancing DescribeLoadBalancers &{DescribeLoadBalancers POST / 0xc4275b27d0 <nil>} {

OS: Red Hat Enterprise Linux Server release 7.5 (Maipo)
kernel: Linux ip-172-18-1-195.ec2.internal 3.10.0-862.3.2.el7.x86_64 #1 SMP Tue May 15 18:22:15 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux