Bug 1367229 - OSE 3.3 tries to create load balancers in AWS
Summary: OSE 3.3 tries to create load balancers in AWS
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.3.1
Assignee: Paul Morie
QA Contact: Weihua Meng
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-08-15 23:53 UTC by Stefanie Forrester
Modified: 2016-11-22 22:31 UTC
CC List: 18 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-22 22:31:04 UTC
Target Upstream Version:
Embargoed:



Comment 1 Stefanie Forrester 2016-08-15 23:55:28 UTC
oc v3.3.0.19
kubernetes v1.3.0+507d3a7

Comment 2 Wang Haoran 2016-08-16 02:56:23 UTC
I cannot reproduce this on the AWS platform with:
openshift v3.3.0.21
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git

Could you please provide your master config?

Comment 3 Andy Goldstein 2016-08-16 10:11:37 UTC
Could you please provide the output from 'oc get service/cakephp-mysql-example -o json'?

Comment 4 Andy Goldstein 2016-08-16 10:16:12 UTC
Stefanie, also, is this actually Origin, or is it OCP?

Comment 5 Stefanie Forrester 2016-08-16 14:43:49 UTC
I think it's OCP, since it came from an enterprise repo. Here's the rpm version:

atomic-openshift-3.3.0.19-1.git.0.93380aa.el7.x86_64

[root@dev-preview-int-master-490df ~]# oc get services cakephp-mysql-example -n dakinitest -o json
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
        "name": "cakephp-mysql-example",
        "namespace": "dakinitest",
        "selfLink": "/api/v1/namespaces/dakinitest/services/cakephp-mysql-example",
        "uid": "f61206a8-633f-11e6-a9c9-0a27f5fffde3",
        "resourceVersion": "6274169",
        "creationTimestamp": "2016-08-15T23:28:24Z",
        "labels": {
            "app": "cakephp-mysql-example",
            "template": "cakephp-mysql-example"
        },
        "annotations": {
            "description": "Exposes and load balances the application pods",
            "openshift.io/generated-by": "OpenShiftNewApp"
        }
    },
    "spec": {
        "ports": [
            {
                "name": "web",
                "protocol": "TCP",
                "port": 8080,
                "targetPort": 8080
            }
        ],
        "selector": {
            "name": "cakephp-mysql-example"
        },
        "portalIP": "172.30.46.10",
        "clusterIP": "172.30.46.10",
        "type": "ClusterIP",
        "sessionAffinity": "None"
    },
    "status": {
        "loadBalancer": {}
    }
}
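The relevant fields in this output are spec.type, which is ClusterIP, and status.loadBalancer, which is empty: nothing about this service asks the cloud provider for a load balancer, which is why AWS load balancer lookups for it are surprising. A minimal sketch (Python, standard library only) of checking that from the `oc get service -o json` output; the helpers `needs_cloud_lb` and `lb_ingress` are hypothetical and not part of oc or OpenShift.

```python
import json


def needs_cloud_lb(service_json: str) -> bool:
    """Return True only if the Service asks the cloud provider for a load balancer.

    Hypothetical helper: it reads the JSON printed by
    `oc get service/<name> -o json`.
    """
    svc = json.loads(service_json)
    # Kubernetes defaults spec.type to ClusterIP when it is omitted.
    return svc.get("spec", {}).get("type", "ClusterIP") == "LoadBalancer"


def lb_ingress(service_json: str) -> list:
    """Return status.loadBalancer.ingress (empty when no LB has been provisioned)."""
    svc = json.loads(service_json)
    return svc.get("status", {}).get("loadBalancer", {}).get("ingress", [])


# Trimmed copy of the service shown above.
example = """
{"kind": "Service",
 "metadata": {"name": "cakephp-mysql-example", "namespace": "dakinitest"},
 "spec": {"type": "ClusterIP", "clusterIP": "172.30.46.10"},
 "status": {"loadBalancer": {}}}
"""
print(needs_cloud_lb(example))  # False
print(lb_ingress(example))      # []
```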

The template itself comes from here:
https://github.com/openshift/online/tree/master/templates/examples

Comment 6 Stefanie Forrester 2016-08-16 14:49:31 UTC
Here is the master config:

[root@dev-preview-int-master-490df ~]# cat /etc/origin/master/master-config.yaml
admissionConfig:
  pluginConfig:
    PodNodeConstraints:
      configuration:
        apiVersion: v1
        kind: PodNodeConstraintsConfig
    ProjectRequestLimit:
      configuration:
        apiVersion: v1
        kind: ProjectRequestLimitConfig
        limits:
        - selector:
            admin: 'true'
        - maxProjects: '1'
apiLevels:
- v1
apiVersion: v1
assetConfig:
  extensionScripts:
  - /etc/openshift-online/ui-extensions/assets/extensions/online-extensions.js
  - /etc/openshift-online/ui-extensions/assets/extensions/intercom-widget-extension.js
  extensionStylesheets:
  - /etc/openshift-online/ui-extensions/assets/extensions/online-extensions.css
  logoutURL: ""
  masterPublicURL: https://api.dev-preview-int.openshift.com
  publicURL: https://console.dev-preview-int.openshift.com/console/
  servingInfo:
    bindAddress: 0.0.0.0:443
    bindNetwork: tcp4
    certFile: master.server.crt
    clientCA: ""
    keyFile: master.server.key
    maxRequestsInFlight: 0
    requestTimeoutSeconds: 0
controllerLeaseTTL: 30
controllers: '*'
corsAllowedOrigins:
- 127.0.0.1
- localhost
- 172.31.6.178
- 52.91.93.189
- ip-172-31-6-178.ec2.internal
- kubernetes.default
- kubernetes.default.svc.cluster.local
- kubernetes
- openshift.default
- openshift.default.svc
- api.dev-preview-int.openshift.com
- 172.30.0.1
- internal.api.dev-preview-int.openshift.com
- ec2-52-91-93-189.compute-1.amazonaws.com
- openshift.default.svc.cluster.local
- kubernetes.default.svc
- openshift
- console.dev-preview-int.openshift.com
- api.dev-preview-int.openshift.com
dnsConfig:
  bindAddress: 0.0.0.0:8053
  bindNetwork: tcp4
etcdClientInfo:
  ca: master.etcd-ca.crt
  certFile: master.etcd-client.crt
  keyFile: master.etcd-client.key
  urls:
  - https://ip-172-31-6-178.ec2.internal:2379
  - https://ip-172-31-6-177.ec2.internal:2379
  - https://ip-172-31-6-179.ec2.internal:2379
etcdStorageConfig:
  kubernetesStoragePrefix: kubernetes.io
  kubernetesStorageVersion: v1
  openShiftStoragePrefix: openshift.io
  openShiftStorageVersion: v1
imageConfig:
  format: registry.qe.openshift.com/openshift3/ose-${component}:${version}
  latest: false
imagePolicyConfig:
  disableScheduledImport: true
  maxImagesBulkImportedPerRepository: 3
kind: MasterConfig
kubeletClientInfo:
  ca: ca.crt
  certFile: master.kubelet-client.crt
  keyFile: master.kubelet-client.key
  port: 10250
kubernetesMasterConfig:
  admissionConfig:
    pluginConfig:
      BuildOverrides:
        configuration:
          apiVersion: v1
          forcePull: true
          kind: BuildOverridesConfig
      ClusterResourceOverride:
        configuration:
          apiVersion: v1
          cpuRequestToLimitPercent: '6'
          kind: ClusterResourceOverrideConfig
          limitCPUToMemoryPercent: '200'
          memoryRequestToLimitPercent: '60'
      PodNodeConstraints:
        configuration:
          apiVersion: v1
          kind: PodNodeConstraintsConfig
      RunOnceDuration:
        configuration:
          activeDeadlineSecondsOverride: '3600'
          apiVersion: v1
          kind: RunOnceDurationConfig
    pluginOrderOverride:
    - RunOnceDuration
    - NamespaceLifecycle
    - PodNodeConstraints
    - OriginPodNodeEnvironment
    - ClusterResourceOverride
    - LimitRanger
    - ServiceAccount
    - SecurityContextConstraint
    - BuildDefaults
    - BuildOverrides
    - ResourceQuota
    - SCCExecRestrictions
    - AlwaysPullImages
  apiServerArguments:
    cloud-config:
    - /etc/origin/cloudprovider/aws.conf
    cloud-provider:
    - aws
  controllerArguments:
    cloud-config:
    - /etc/origin/cloudprovider/aws.conf
    cloud-provider:
    - aws
    pvclaimbinder-sync-period:
    - 30s
    terminated-pod-gc-threshold:
    - '3000'
  masterCount: 3
  masterIP: 172.31.6.178
  podEvictionTimeout:
  proxyClientInfo:
    certFile: master.proxy-client.crt
    keyFile: master.proxy-client.key
  schedulerConfigFile: /etc/origin/master/scheduler.json
  servicesNodePortRange: ""
  servicesSubnet: 172.30.0.0/16
  staticNodeNames: []
masterClients:
  externalKubernetesClientConnectionOverrides:
    acceptContentTypes: application/vnd.kubernetes.protobuf,application/json
    burst: 400
    contentType: application/vnd.kubernetes.protobuf
    ops: 200
  externalKubernetesKubeConfig: ""
  openshiftLoopbackClientConnectionOverrides:
    acceptContentTypes: application/vnd.kubernetes.protobuf,application/json
    burst: 600
    contentType: application/vnd.kubernetes.protobuf
    ops: 300
  openshiftLoopbackKubeConfig: openshift-master.kubeconfig
masterPublicURL: https://api.dev-preview-int.openshift.com
networkConfig:
  clusterNetworkCIDR: 10.1.0.0/16
  hostSubnetLength: 8
  networkPluginName: redhat/openshift-ovs-multitenant
  serviceNetworkCIDR: 172.30.0.0/16
oauthConfig:
  alwaysShowProviderSelection: true
  assetPublicURL: https://console.dev-preview-int.openshift.com/console/
  grantConfig:
    method: auto
  identityProviders:
  - challenge: false
    login: true
    mappingMethod: lookup
    name: github
    provider:
      apiVersion: v1
      clientID: ***********
      clientSecret: *******************
      kind: GitHubIdentityProvider
  masterCA: ca.crt
  masterPublicURL: https://api.dev-preview-int.openshift.com
  masterURL: https://internal.api.dev-preview-int.openshift.com
  sessionConfig:
    sessionMaxAgeSeconds: 3600
    sessionName: ssn
    sessionSecretsFile: /etc/origin/master/session-secrets.yaml
  templates:
    error: /etc/openshift-online/ui-extensions/custom-templates/oauth-error-dev.html
    providerSelection: /etc/openshift-online/ui-extensions/custom-templates/provider-selection-dev.html
  tokenConfig:
    accessTokenMaxAgeSeconds: 2678400
    authorizeTokenMaxAgeSeconds: 300
pauseControllers: false
policyConfig:
  bootstrapPolicyFile: /etc/origin/master/policy.json
  openshiftInfrastructureNamespace: openshift-infra
  openshiftSharedResourcesNamespace: openshift
projectConfig:
  defaultNodeSelector: type=compute
  projectRequestMessage: ""
  projectRequestTemplate: default/project-request
  securityAllocator:
    mcsAllocatorRange: s0:/2
    mcsLabelsPerProject: 5
    uidAllocatorRange: 1000000000-1999999999/10000
routingConfig:
  subdomain: 1ec1.dev-preview-int.openshiftapps.com
serviceAccountConfig:
  limitSecretReferences: false
  managedNames:
  - default
  - builder
  - deployer
  masterCA: ca.crt
  privateKeyFile: serviceaccounts.private.key
  publicKeyFiles:
  - serviceaccounts.public.key
servingInfo:
  bindAddress: 0.0.0.0:443
  bindNetwork: tcp4
  certFile: master.server.crt
  clientCA: ca.crt
  keyFile: master.server.key
  maxRequestsInFlight: 1000
  namedCertificates:
  - certFile: /etc/origin/master/named_certificates/star.dev-preview-int.openshift.com.crt
    keyFile: /etc/origin/master/named_certificates/star.dev-preview-int.openshift.com.key
    names:
    - api.dev-preview-int.openshift.com
    - console.dev-preview-int.openshift.com
  requestTimeoutSeconds: 3600
volumeConfig:
  dynamicProvisioningEnabled: false

Comment 7 Andy Goldstein 2016-08-16 19:17:32 UTC
The "error" is confusing but harmless. See https://github.com/kubernetes/kubernetes/issues/30700 for more details. We will add a release note about this for the 3.3 release and try to fix it in a future release.
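A toy model of why a "harmless" lookup turns into a steady stream of API calls. As we read the upstream 1.3 service controller, when it has no cached state for a non-LoadBalancer service it calls GetLoadBalancer to check for a stale balancer to delete, and a failed call is retried without caching anything, so the same lookup repeats. The shape of the fix modelled here (skip the cloud for services not known to be LoadBalancers) is an assumption; it is not the Kubernetes code and not necessarily what openshift/ose#414 does. `DeniedCloud`, `process_cluster_ip_delta` and `simulate` are hypothetical names.

```python
class DeniedCloud:
    """Hypothetical stand-in for an AWS account that may not call DescribeLoadBalancers."""

    def __init__(self):
        self.lookups = 0  # how many LB lookups reached the "cloud"

    def get_load_balancer(self, key):
        self.lookups += 1
        raise PermissionError(f"AccessDenied: DescribeLoadBalancers for {key}")


def process_cluster_ip_delta(cloud, cache, key, skip_unknown):
    """Handle one delta for a ClusterIP service; return True if it must be retried.

    `cache` models the controller's in-memory record of services it has already
    handled successfully. `skip_unknown=True` models the assumed fix.
    """
    if key in cache:
        return False  # state already known: no cloud call needed
    if not skip_unknown:
        try:
            cloud.get_load_balancer(key)  # "is there a stale LB to clean up?"
        except PermissionError:
            return True  # retryable, and nothing is cached, so the lookup repeats
    cache[key] = "ClusterIP"
    return False


def simulate(skip_unknown, services, rounds):
    """Run `rounds` retry passes (one pass ~ one "Retrying in 5s" interval)."""
    cloud, cache = DeniedCloud(), {}
    pending = list(services)
    for _ in range(rounds):
        pending = [k for k in pending
                   if process_cluster_ip_delta(cloud, cache, k, skip_unknown)]
    return cloud.lookups


services = ["dakinitest/cakephp-mysql-example", "jboss/newapp"]
print(simulate(False, services, rounds=12))  # 24: both services fail on every pass
print(simulate(True, services, rounds=12))   # 0
```

Twelve 5-second passes cover one minute, so in this toy model each failing service costs 12 lookups per minute for as long as the permission is missing. The filtered variant makes none, at the cost of not noticing a balancer left behind by a service whose type changed while the controller had no cached state.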

Comment 8 Stefanie Forrester 2016-10-04 21:26:09 UTC
We just installed OCP 3.3.0.33 in prod yesterday, which contains this bug. We're now seeing 27,000 to 35,000 instances of AWS throttling per controller per day in prod because of the repeated requests to view the non-existent AWS load balancer for each service. This is an example of the messages we're seeing that many times per day in the controller logs:

"Failed to process service delta. Retrying in 5s: Error getting LB for service jboss/newapp: Throttling: Rate exceeded"

Because of the heavy AWS API usage, the AWS web console is hard to use: it keeps displaying "rate limit exceeded" when we try to view or change components. This makes ad-hoc manual tasks in the console difficult (but not impossible).

I believe the number of API requests we're generating is also hurting the cluster's performance, since it keeps API calls for volume creates/deletes/attaches/detaches from succeeding immediately.
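To put numbers on "27,000 to 35,000 instances per controller per day", the controller log message quoted above can be tallied per AWS error code and per service. A sketch, assuming each log line contains the message text exactly as quoted (any journald prefix in front of it is skipped by `re.search`); `tally_lb_errors` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches the tail of messages such as:
#   ... Error getting LB for service jboss/newapp: Throttling: Rate exceeded
LB_ERROR = re.compile(
    r"Error getting LB for service (?P<ns>[^/\s]+)/(?P<name>[^:\s]+): (?P<code>\w+):"
)


def tally_lb_errors(lines):
    """Count LB lookup failures by AWS error code and by namespace/service."""
    by_code, by_service = Counter(), Counter()
    for line in lines:
        m = LB_ERROR.search(line)
        if m:
            by_code[m["code"]] += 1
            by_service[f"{m['ns']}/{m['name']}"] += 1
    return by_code, by_service


sample = [
    "Failed to process service delta. Retrying in 5s: Error getting LB for service jboss/newapp: Throttling: Rate exceeded",
    "Failed to process service delta. Retrying in 5s: Error getting LB for service jboss/newapp: Throttling: Rate exceeded",
    "an unrelated log line",
]
codes, services = tally_lb_errors(sample)
print(codes)     # Counter({'Throttling': 2})
print(services)  # Counter({'jboss/newapp': 2})
```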

Comment 9 Stefanie Forrester 2016-10-04 21:37:19 UTC
Opened a related bug for the controllers shutting down under these conditions. https://bugzilla.redhat.com/show_bug.cgi?id=1381745

Comment 10 Derek Carr 2016-10-05 18:53:51 UTC
Related issue: https://github.com/kubernetes/kubernetes/issues/33088

Comment 11 Joel Diaz 2016-10-11 20:37:26 UTC
+1. Seeing behavior similar to what Stefanie is reporting.

I believe we calculated an average of 600 API calls per minute of this type (from the AWS CloudTrail logs):

{"eventVersion":"1.04","userIdentity":{"type":"IAMUser","principalId":"SECRET_ACCESS_KEY","arn":"arn:aws:iam::ACCOUNT_NUMBER:user/cloud_provider","accountId":"ACCOUNT_NUMBER","accessKeyId":"SECRET_ACCESS_KEY_ID","userName":"cloud_provider"},"eventTime":"2016-10-10T23:56:40Z","eventSource":"elasticloadbalancing.amazonaws.com","eventName":"DescribeLoadBalancers","awsRegion":"us-east-1","sourceIPAddress":"SOURCE_IP","userAgent":"aws-sdk-go/1.0.8 (go1.6.2; linux; amd64)","errorCode":"AccessDenied","errorMessage":"User: arn:aws:iam::ACCOUNT_NUMBER:user/cloud_provider is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers","requestParameters":null,"responseElements":null,"requestID":"REQUEST_ID","eventID":"EVENT_ID","eventType":"AwsApiCall","recipientAccountId":"ACCOUNT_ID"}

This is causing API throttling in the AWS web UI and in any manual or automated scripts that make API calls against this AWS account.
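Counts like these can be reproduced by tallying CloudTrail records by eventName and errorCode. A sketch, assuming one CloudTrail log file has already been read into a string (the delivered files wrap events in a top-level "Records" array); `tally_cloudtrail` is a hypothetical helper, and fetching the files from S3 is left out.

```python
import json
from collections import Counter


def tally_cloudtrail(log_file_text: str) -> Counter:
    """Count CloudTrail records by (eventName, errorCode), using 'OK' when no error."""
    records = json.loads(log_file_text).get("Records", [])
    return Counter((r.get("eventName"), r.get("errorCode", "OK")) for r in records)


# A trimmed version of the event above, wrapped the way CloudTrail log files are.
event = {
    "eventSource": "elasticloadbalancing.amazonaws.com",
    "eventName": "DescribeLoadBalancers",
    "awsRegion": "us-east-1",
    "errorCode": "AccessDenied",
    "userAgent": "aws-sdk-go/1.0.8 (go1.6.2; linux; amd64)",
}
log_file = json.dumps({"Records": [event] * 3})
print(tally_cloudtrail(log_file))
# Counter({('DescribeLoadBalancers', 'AccessDenied'): 3})
```

Dividing the DescribeLoadBalancers count by the number of minutes the logs cover gives a per-minute rate like the one quoted above.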

Comment 12 Stefanie Forrester 2016-10-12 16:55:46 UTC
I looked at the AWS account that Joel was using at the time. In region us-east-1, we had 23,317 calls to DescribeLoadBalancers during a 40-minute period (which does indeed come out to about 600 calls per minute). This particular region was hosting two test clusters.

Since some of our Dedicated customers also host multiple clusters in a single AWS account, this has become a blocker for the 3.3 upgrade in Dedicated.

The following API calls occurred between these times:

"eventTime": "2016-10-11T16:54:50Z"
"eventTime": "2016-10-11T17:34:52Z"

Errors present:
            "errorCode": "AccessDenied",

Events present:
            "eventName": "CreateSnapshot",
            "eventName": "CreateTags",
            "eventName": "DeleteSnapshot",
            "eventName": "DescribeInstances",
            "eventName": "DescribeLoadBalancers",
            "eventName": "DescribeSnapshots",
            "eventName": "DescribeVolumes",

Number of calls per event:

DescribeLoadBalancers   23317
DescribeVolumes         12
DescribeInstances       10
CreateTags              12
CreateSnapshot          6
DescribeSnapshots       1
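The per-minute figure follows from the table and the two event timestamps above. A quick check of the arithmetic (Python, standard library only); DeleteSnapshot appears in the event list but has no count in the table, so it is left out.

```python
from datetime import datetime

# Per-event counts from the table above (us-east-1, two test clusters).
counts = {
    "DescribeLoadBalancers": 23317,
    "DescribeVolumes": 12,
    "DescribeInstances": 10,
    "CreateTags": 12,
    "CreateSnapshot": 6,
    "DescribeSnapshots": 1,
}
start = datetime.fromisoformat("2016-10-11T16:54:50")
end = datetime.fromisoformat("2016-10-11T17:34:52")
minutes = (end - start).total_seconds() / 60  # 40 min 2 s, about 40.03 min

total = sum(counts.values())
lb_share = counts["DescribeLoadBalancers"] / total
lb_per_minute = counts["DescribeLoadBalancers"] / minutes

print(total)                   # 23358
print(f"{lb_share:.2%}")       # 99.82%
print(f"{lb_per_minute:.0f}")  # 582
```

So DescribeLoadBalancers accounts for over 99.8% of the calls in that window, at roughly 580 calls per minute.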

Comment 13 Paul Morie 2016-10-13 21:55:12 UTC
I've opened https://github.com/openshift/ose/pull/414 to fix this.

Comment 14 Paul Morie 2016-10-17 16:33:32 UTC
My PR to the OSE 3.3 branch has merged.

Comment 15 Weihua Meng 2016-10-19 03:09:35 UTC
Verified.
    openshift v3.3.1.2
    kubernetes v1.3.0+52492b4
    etcd 2.3.0+git

Watched for 16 minutes; no message like "Error creating load balancer (will retry): Error getting LB for service dakinitest/cakephp-mysql-example: AccessDenied: User: arn:aws:iam::704252977135:user/cloud_provider is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers" was found.

Comment 16 Weihua Meng 2016-10-19 05:49:30 UTC
Steps:
1. Set up an HA cluster with an ELB on AWS.
2. On each master host, switch to an AWS account that does not have permission to describe load balancers:
    cd /etc/sysconfig
    sed -i 's/<old>/<new>/g' atomic-openshift-*
    sed -i "s/<old>/<new>/g" atomic-openshift-*
     
    systemctl restart atomic-openshift-master-api
    systemctl restart atomic-openshift-master-controllers
    systemctl restart atomic-openshift-node

3. Log in and create a project.
4. oc new-app cakephp-mysql-example
5. oc get events -w
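The pass criterion in comment 15 is that no "Error creating load balancer" message shows up while watching. A sketch of that check over captured `oc get events -w` output; it matches only on the message text quoted in comment 15 and assumes nothing about the column layout of `oc get events`. `lb_error_events` and `verdict` are hypothetical names.

```python
def lb_error_events(event_lines, marker="Error creating load balancer"):
    """Return the captured event lines that mention a load balancer creation error."""
    return [line for line in event_lines if marker in line]


def verdict(event_lines):
    failures = lb_error_events(event_lines)
    return "PASS" if not failures else f"FAIL: {len(failures)} matching event(s)"


# Pre-fix example, using the message quoted in comment 15 (account number elided).
before_fix = [
    "Error creating load balancer (will retry): Error getting LB for service "
    "dakinitest/cakephp-mysql-example: AccessDenied: User: ... is not authorized "
    "to perform: elasticloadbalancing:DescribeLoadBalancers",
]
after_fix = []  # nothing matched during the 16-minute watch in comment 15

print(verdict(before_fix))  # FAIL: 1 matching event(s)
print(verdict(after_fix))   # PASS
```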

Comment 17 Bing Li 2016-10-20 02:31:21 UTC
We did not see the CreatingLoadBalancer error messages again in our testing against Online STG 3.3.1:

OpenShift Master: v3.3.1.3
Kubernetes Master: v1.3.0+52492b4

