Description of problem:
We're seeing a flood of AWS calls of the type 'DescribeLoadBalancers' that is causing us to be API throttled when viewing/managing the AWS account.

Version-Release number of selected component (if applicable):
atomic-openshift-master-3.3.1.5-1.git.0.62700af.el7.x86_64
atomic-openshift-3.3.1.5-1.git.0.62700af.el7.x86_64
atomic-openshift-sdn-ovs-3.3.1.5-1.git.0.62700af.el7.x86_64
atomic-openshift-clients-3.3.1.5-1.git.0.62700af.el7.x86_64
atomic-openshift-node-3.3.1.5-1.git.0.62700af.el7.x86_64
tuned-profiles-atomic-openshift-node-3.3.1.5-1.git.0.62700af.el7.x86_64

How reproducible:
It isn't clear when this issue started occurring (we didn't have AWS CloudTrail enabled to go back in history), but it is currently affecting us without any relief.

Steps to Reproduce:
1. Install an OpenShift cluster onto AWS.
2. <something triggers processing on OpenShift that causes it to repeatedly query the AWS API with DescribeLoadBalancers.>
3. Try to view/list the EC2 ELBs associated with the AWS account through the AWS web UI.

Actual results:
Witness the API throttling/rate-limiting (unable to view/list EC2 ELBs, for instance). Message from the AWS interface:
An error occurred fetching load balancer data: Rate exceeded

Expected results:
Should be able to view/manage the AWS account without OpenShift consuming the account's API usage quota.

Additional info:
Amazon tells us we are making approximately 10 API calls per second, which is exhausting the API usage limits. After enabling CloudTrail, we can see the specific calls, which look like:

{"eventVersion":"1.04","userIdentity":{"type":"IAMUser","principalId":"AWSIDHERE","arn":"arn:aws:iam::323879391493:user/cloud_provider","accountId":"9999999999","accessKeyId":"AWSACCESSKEYIDHERE","userName":"cloud_provider"},"eventTime":"2017-01-16T14:35:43Z","eventSource":"elasticloadbalancing.amazonaws.com","eventName":"DescribeLoadBalancers","awsRegion":"us-east-1","sourceIPAddress":"SOURCEIPHERE","userAgent":"aws-sdk-go/1.0.8 (go1.6.3; linux; amd64)","errorCode":"AccessDenied","errorMessage":"User: arn:aws:iam::323879391493:user/cloud_provider is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers","requestParameters":null,"responseElements":null,"requestID":"0fabb6d3-dbf9-11e6-aa9e-ad05bef77b66","eventID":"89a6c447-5d91-4e2a-ab98-23686b68a272","eventType":"AwsApiCall","recipientAccountId":"323879391493"}
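Not part of the original report, but for anyone trying to quantify the flood: a minimal Go sketch along these lines can tally the DescribeLoadBalancers/AccessDenied events out of a CloudTrail log file. The file name "cloudtrail.json" and the top-level "Records" wrapper are assumptions based on the standard CloudTrail-to-S3 delivery format; the field names match the event record shown above.

package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// cloudTrailEvent holds just the fields we need from each CloudTrail record.
type cloudTrailEvent struct {
	EventTime string `json:"eventTime"`
	EventName string `json:"eventName"`
	ErrorCode string `json:"errorCode"`
}

// cloudTrailFile matches the {"Records":[...]} wrapper CloudTrail uses when
// delivering log files to S3.
type cloudTrailFile struct {
	Records []cloudTrailEvent `json:"Records"`
}

func main() {
	data, err := os.ReadFile("cloudtrail.json") // assumed local copy of one log file
	if err != nil {
		panic(err)
	}

	var f cloudTrailFile
	if err := json.Unmarshal(data, &f); err != nil {
		panic(err)
	}

	denied := 0
	for _, ev := range f.Records {
		if ev.EventName == "DescribeLoadBalancers" && ev.ErrorCode == "AccessDenied" {
			denied++
			fmt.Println(ev.EventTime, ev.EventName, ev.ErrorCode)
		}
	}
	fmt.Printf("total DescribeLoadBalancers AccessDenied events: %d\n", denied)
}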
I don't have access to the previous list of AWS IAM permissions that the cloud_provider user had while the excessive AWS API calls were being made, but this current set of IAM permissions lets DescribeLoadBalancers succeed and keeps OpenShift from repeatedly retrying:

"ec2:AttachVolume",
"ec2:CreateTags",
"ec2:CreateVolume",
"ec2:DeleteVolume",
"ec2:DescribeInstances",
"elasticloadbalancing:DescribeLoadBalancerAttributes",
"elasticloadbalancing:DescribeLoadBalancers",
"ec2:DescribeVolumes",
"ec2:DetachVolume",
"kms:CreateGrant"
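For completeness, a minimal sketch (using a current aws-sdk-go, not the aws-sdk-go/1.0.8 seen in the CloudTrail user agent) that checks whether the credentials in the default credential chain can make the DescribeLoadBalancers call. The region and the AccessDenied error-code check are assumptions for illustration, not something taken from the bug itself.

package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/elb"
)

func main() {
	// Uses the same credential chain the cloud provider would (env vars,
	// shared credentials file, or instance profile).
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
	svc := elb.New(sess)

	_, err := svc.DescribeLoadBalancers(&elb.DescribeLoadBalancersInput{})
	if aerr, ok := err.(awserr.Error); ok && aerr.Code() == "AccessDenied" {
		fmt.Println("still missing elasticloadbalancing:DescribeLoadBalancers")
		return
	}
	if err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	fmt.Println("DescribeLoadBalancers succeeded with the current IAM policy")
}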
For the record, the issue here is that this version of the service controller doesn't trust the state it gets from the API server on deletes. So it calls EnsureLoadBalancerDeleted (which calls DescribeLoadBalancers) to ensure that any load balancer that was created is now deleted. If we need a mode where these calls are never made, we need to get a change in upstream that:

1. Uses an admission controller to ensure that services with type ExternalLoadBalancer are never persisted in the API server
2. Adds a config item to the service controller that tells the controller never to make any load-balancer-related calls

Barring that, the 3.3.1 service controller will need to be permissioned to make the call, or you will get a new, indefinitely repeating call for each deleted service.
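To make that behavior concrete, here is an illustrative-only Go sketch of the retry pattern; none of these names are the actual service controller symbols. It just shows why a single deleted service whose DescribeLoadBalancers call is denied turns into a steady stream of API requests when each failure is re-queued with no back-off.

package main

import (
	"errors"
	"fmt"
	"time"
)

// ensureLoadBalancerDeleted stands in for the cloud-provider cleanup call,
// whose first step is DescribeLoadBalancers; with the permission missing it
// fails every time.
func ensureLoadBalancerDeleted(service string) error {
	return errors.New("AccessDenied: not authorized to perform elasticloadbalancing:DescribeLoadBalancers")
}

func main() {
	// One deleted Service waiting for its load balancer to be cleaned up.
	queue := []string{"myproject/frontend"}

	// Capped at a handful of attempts here; the real controller kept going,
	// roughly matching the ~10 calls/second seen in CloudTrail.
	for attempt := 0; attempt < 5 && len(queue) > 0; attempt++ {
		svc := queue[0]
		queue = queue[1:]

		if err := ensureLoadBalancerDeleted(svc); err != nil {
			fmt.Printf("cleanup of %s failed (%v); re-queuing with no back-off\n", svc, err)
			queue = append(queue, svc) // immediate retry -> another AWS call
		}
		time.Sleep(100 * time.Millisecond)
	}
}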
Joel Diaz and Paul Morie, I thought we discussed closing this bug because it turned out to be caused by the missing permissions. Once we fixed the permissions, we are no longer seeing an excessive number of API calls. Do we still want to close this bug?
Yes, that is what we had discussed. There is still the issue of the missing exponential back-off logic; it was that gap that let OpenShift make an endless stream of DescribeLoadBalancers AWS API calls. Should this bug be used to track the need for exponential back-off, or is there already another bug/feature tracker for that?
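For reference, the kind of exponential back-off being asked for looks roughly like the sketch below. retryWithBackoff and the awsCall closure are hypothetical names standing in for the cloud-provider calls such as DescribeLoadBalancers, not actual OpenShift code; the initial delay and cap are made-up values for illustration.

package main

import (
	"errors"
	"fmt"
	"time"
)

// retryWithBackoff retries awsCall, doubling the delay after each failure up
// to a cap, instead of issuing the next call immediately.
func retryWithBackoff(awsCall func() error, maxAttempts int) error {
	delay := 500 * time.Millisecond
	const maxDelay = 30 * time.Second

	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = awsCall(); err == nil {
			return nil
		}
		fmt.Printf("attempt %d failed (%v); backing off %v\n", attempt, err, delay)
		time.Sleep(delay)
		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}
	return err
}

func main() {
	// Simulate a call that keeps failing, as DescribeLoadBalancers did while
	// the IAM permission was missing.
	err := retryWithBackoff(func() error { return errors.New("AccessDenied") }, 4)
	fmt.Println("gave up:", err)
}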
@jdiaz That effort is being addressed by RFE: https://trello.com/c/plDazQbH/640-provide-comprehensive-aws-api-exponential-backoff-ops-rfe So I think this bug can be closed.