Bug 1413650 - excessive number of DescribeLoadBalancer AWS API calls from OpenShift
Summary: excessive number of DescribeLoadBalancer AWS API calls from OpenShift
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Paul Morie
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-01-16 15:48 UTC by Joel Diaz
Modified: 2018-04-17 12:12 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-24 19:11:20 UTC
Target Upstream Version:
Embargoed:


Description Joel Diaz 2017-01-16 15:48:37 UTC
Description of problem:
We're seeing a flood of AWS 'DescribeLoadBalancers' API calls that is causing us to be throttled when viewing/managing the AWS account.


Version-Release number of selected component (if applicable):
atomic-openshift-master-3.3.1.5-1.git.0.62700af.el7.x86_64
atomic-openshift-3.3.1.5-1.git.0.62700af.el7.x86_64
atomic-openshift-sdn-ovs-3.3.1.5-1.git.0.62700af.el7.x86_64
atomic-openshift-clients-3.3.1.5-1.git.0.62700af.el7.x86_64
atomic-openshift-node-3.3.1.5-1.git.0.62700af.el7.x86_64
tuned-profiles-atomic-openshift-node-3.3.1.5-1.git.0.62700af.el7.x86_64


How reproducible:
It isn't clear when this issue started (we didn't have AWS CloudTrail enabled, so we can't go back in history), but it is currently affecting us without any relief.


Steps to Reproduce:
1. Install an OpenShift cluster onto AWS.
2. <something triggers processing on OpenShift that causes it to repeatedly query the AWS API with DescribeLoadBalancers.>
3. Try to view/list the EC2 ELBs associated with the AWS account through the AWS web UI.

Actual results:
Witness the API throttling/rate-limiting (unable to view/list EC2 ELBs for instance). Message from AWS interface: An error occurred fetching load balancer data: Rate exceeded

Expected results:
Should be able to view/manage the AWS account without OpenShift consuming the account's API usage quota.

Additional info:

Amazon tells us we are making approximately 10 API calls per second, which is exhausting the API usage limits.

After enabling CloudTrail, we can see the specific calls, which look like:

{"eventVersion":"1.04","userIdentity":{"type":"IAMUser","principalId":"AWSIDHERE","arn":"arn:aws:iam::323879391493:user/cloud_provider","accountId":"9999999999","accessKeyId":"AWSACCESSKEYIDHERE","userName":"cloud_provider"},"eventTime":"2017-01-16T14:35:43Z","eventSource":"elasticloadbalancing.amazonaws.com","eventName":"DescribeLoadBalancers","awsRegion":"us-east-1","sourceIPAddress":"SOURCEIPHERE","userAgent":"aws-sdk-go/1.0.8 (go1.6.3; linux; amd64)","errorCode":"AccessDenied","errorMessage":"User: arn:aws:iam::323879391493:user/cloud_provider is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers","requestParameters":null,"responseElements":null,"requestID":"0fabb6d3-dbf9-11e6-aa9e-ad05bef77b66","eventID":"89a6c447-5d91-4e2a-ab98-23686b68a272","eventType":"AwsApiCall","recipientAccountId":"323879391493"}
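
Events like the one above can be tallied to confirm which user and which error account for the call volume. This is a minimal Python sketch, assuming the events have been exported as one JSON object per line; that input format and the function name count_denied_calls are assumptions made for illustration, not part of any AWS tooling.

```python
import json
from collections import Counter


def count_denied_calls(lines, event_name="DescribeLoadBalancers"):
    """Count events named `event_name` per (userName, errorCode) pair.

    `lines` is an iterable of strings, each holding one CloudTrail event
    as JSON (an assumed input format for this sketch).
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        event = json.loads(line)
        if event.get("eventName") != event_name:
            continue
        user = event.get("userIdentity", {}).get("userName", "<unknown>")
        # Events without an errorCode field are counted under "<none>".
        error = event.get("errorCode", "<none>")
        counts[(user, error)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '{"eventName": "DescribeLoadBalancers", "errorCode": "AccessDenied", '
        '"userIdentity": {"userName": "cloud_provider"}}',
        '{"eventName": "DescribeInstances", '
        '"userIdentity": {"userName": "cloud_provider"}}',
    ]
    for (user, error), n in count_denied_calls(sample).items():
        print(user, error, n)
```

A count dominated by a single (user, AccessDenied) pair would point at a permissions problem rather than at legitimate load.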

Comment 1 Joel Diaz 2017-01-17 16:44:11 UTC
I don't have access to the previous list of AWS IAM permissions that the cloud_provider user had while the excessive AWS API calls were being made, but this current set of IAM permissions lets the DescribeLoadBalancers call succeed and keeps OpenShift from repeatedly retrying:

"ec2:AttachVolume",
"ec2:CreateTags",
"ec2:CreateVolume",
"ec2:DeleteVolume",
"ec2:DescribeInstances",
"elasticloadbalancing:DescribeLoadBalancerAttributes",
"elasticloadbalancing:DescribeLoadBalancers",
"ec2:DescribeVolumes",
"ec2:DetachVolume",
"kms:CreateGrant"
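
For reference, the actions listed above could be wrapped into an IAM policy document along these lines. This is only a sketch: the helper name build_policy is hypothetical, and the "Resource": "*" scope is an assumption, since the comment does not say how the real policy was scoped.

```python
import json

# Actions copied from the list in this comment.
ACTIONS = [
    "ec2:AttachVolume",
    "ec2:CreateTags",
    "ec2:CreateVolume",
    "ec2:DeleteVolume",
    "ec2:DescribeInstances",
    "elasticloadbalancing:DescribeLoadBalancerAttributes",
    "elasticloadbalancing:DescribeLoadBalancers",
    "ec2:DescribeVolumes",
    "ec2:DetachVolume",
    "kms:CreateGrant",
]


def build_policy(actions):
    """Return an IAM policy document (as a dict) that allows `actions`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(actions),
                # Assumption: the report does not say which resources the
                # real policy covered, so "*" is only a placeholder.
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy(ACTIONS), indent=2))
```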

Comment 2 Paul Morie 2017-01-17 17:03:29 UTC
For the record, the issue here is that this version of the service controller doesn't trust the state it gets from the API server on deletes.  So, it calls EnsureLoadBalancerDeleted (which calls DescribeLoadBalancers) to ensure that any load balancer that was created is now deleted.

If we need to have a mode where these calls are never made, we need to get a change in upstream that:

1.  Uses an admission controller to ensure that services with type ExternalLoadBalancer are never persisted in the API server
2.  Adds a config item to the service controller that tells the controller never to make any load-balancer related calls

Barring that, the 3.3.1 service controller will need to be permissioned to make the call, or you will get a new, indefinitely repeating call for each deleted service.

Comment 3 Thomas Wiest 2017-01-23 20:53:15 UTC
Joel Diaz and Paul Morie, I thought we discussed closing this bug because this turned out to be caused by the lack of perms. Once we fixed the perms, we stopped getting an excessive number of API calls.

Do we still want to close this bug?

Comment 4 Joel Diaz 2017-01-24 14:38:14 UTC
Yes, that is what we had discussed. There is still the issue that the lack of exponential back-off logic is what let OpenShift make an endless number of DescribeLoadBalancers AWS API calls.

Should this bug be used to track the need for the exponential back-off, or is there already another bug/feature tracker for that?
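
To illustrate the kind of exponential back-off being asked for, here is a generic Python sketch. It is not the service controller's code; the names backoff_delays and call_with_backoff, the default delays, and the choice to stop immediately on non-retryable errors are all assumptions of this example.

```python
import random
import time


def backoff_delays(base=1.0, factor=2.0, cap=300.0, retries=5):
    """Yield the maximum delay (seconds) allowed before each retry."""
    delay = base
    for _ in range(retries):
        yield min(delay, cap)
        delay *= factor


def call_with_backoff(call, is_retryable, retries=5, base=1.0,
                      factor=2.0, cap=300.0, sleep=time.sleep):
    """Run `call()`, retrying retryable failures with growing delays.

    `call` is tried at most `retries + 1` times. Errors for which
    `is_retryable(err)` is false are raised immediately.
    """
    delays = backoff_delays(base, factor, cap, retries)
    while True:
        try:
            return call()
        except Exception as err:
            delay = next(delays, None)
            if delay is None or not is_retryable(err):
                raise  # out of retries, or retrying cannot help
            # Full jitter: sleep a random time in [0, delay] so that many
            # callers do not retry in lockstep.
            sleep(random.uniform(0, delay))
```

With these defaults the delay ceiling grows as 1, 2, 4, 8, 16 seconds, so a persistently failing call is attempted six times in total instead of being repeated indefinitely.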

Comment 5 Thomas Wiest 2017-01-24 14:54:13 UTC
@jdiaz That effort is being addressed by RFE:

https://trello.com/c/plDazQbH/640-provide-comprehensive-aws-api-exponential-backoff-ops-rfe

So I think this bug can be closed.

