oc v3.3.0.19 kubernetes v1.3.0+507d3a7
I cannot reproduce this on the AWS platform with: openshift v3.3.0.21, kubernetes v1.3.0+507d3a7, etcd 2.3.0+git. Could you please provide your master config?
Could you please provide the output from 'oc get service/cakephp-mysql-example -o json'?
Stefanie, also, is this actually Origin, or is it OCP?
I think it's OCP, since it came from an enterprise repo. Here's the rpm version: atomic-openshift-3.3.0.19-1.git.0.93380aa.el7.x86_64

[root@dev-preview-int-master-490df ~]# oc get services cakephp-mysql-example -n dakinitest -o json
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
        "name": "cakephp-mysql-example",
        "namespace": "dakinitest",
        "selfLink": "/api/v1/namespaces/dakinitest/services/cakephp-mysql-example",
        "uid": "f61206a8-633f-11e6-a9c9-0a27f5fffde3",
        "resourceVersion": "6274169",
        "creationTimestamp": "2016-08-15T23:28:24Z",
        "labels": {
            "app": "cakephp-mysql-example",
            "template": "cakephp-mysql-example"
        },
        "annotations": {
            "description": "Exposes and load balances the application pods",
            "openshift.io/generated-by": "OpenShiftNewApp"
        }
    },
    "spec": {
        "ports": [
            {
                "name": "web",
                "protocol": "TCP",
                "port": 8080,
                "targetPort": 8080
            }
        ],
        "selector": {
            "name": "cakephp-mysql-example"
        },
        "portalIP": "172.30.46.10",
        "clusterIP": "172.30.46.10",
        "type": "ClusterIP",
        "sessionAffinity": "None"
    },
    "status": {
        "loadBalancer": {}
    }
}

The template itself comes from here: https://github.com/openshift/online/tree/master/templates/examples
Here is the master config:

[root@dev-preview-int-master-490df ~]# cat /etc/origin/master/master-config.yaml
admissionConfig:
  pluginConfig:
    PodNodeConstraints:
      configuration:
        apiVersion: v1
        kind: PodNodeConstraintsConfig
    ProjectRequestLimit:
      configuration:
        apiVersion: v1
        kind: ProjectRequestLimitConfig
        limits:
        - selector:
            admin: 'true'
        - maxProjects: '1'
apiLevels:
- v1
apiVersion: v1
assetConfig:
  extensionScripts:
  - /etc/openshift-online/ui-extensions/assets/extensions/online-extensions.js
  - /etc/openshift-online/ui-extensions/assets/extensions/intercom-widget-extension.js
  extensionStylesheets:
  - /etc/openshift-online/ui-extensions/assets/extensions/online-extensions.css
  logoutURL: ""
  masterPublicURL: https://api.dev-preview-int.openshift.com
  publicURL: https://console.dev-preview-int.openshift.com/console/
  servingInfo:
    bindAddress: 0.0.0.0:443
    bindNetwork: tcp4
    certFile: master.server.crt
    clientCA: ""
    keyFile: master.server.key
    maxRequestsInFlight: 0
    requestTimeoutSeconds: 0
controllerLeaseTTL: 30
controllers: '*'
corsAllowedOrigins:
- 127.0.0.1
- localhost
- 172.31.6.178
- 52.91.93.189
- ip-172-31-6-178.ec2.internal
- kubernetes.default
- kubernetes.default.svc.cluster.local
- kubernetes
- openshift.default
- openshift.default.svc
- api.dev-preview-int.openshift.com
- 172.30.0.1
- internal.api.dev-preview-int.openshift.com
- ec2-52-91-93-189.compute-1.amazonaws.com
- openshift.default.svc.cluster.local
- kubernetes.default.svc
- openshift
- console.dev-preview-int.openshift.com
- api.dev-preview-int.openshift.com
dnsConfig:
  bindAddress: 0.0.0.0:8053
  bindNetwork: tcp4
etcdClientInfo:
  ca: master.etcd-ca.crt
  certFile: master.etcd-client.crt
  keyFile: master.etcd-client.key
  urls:
  - https://ip-172-31-6-178.ec2.internal:2379
  - https://ip-172-31-6-177.ec2.internal:2379
  - https://ip-172-31-6-179.ec2.internal:2379
etcdStorageConfig:
  kubernetesStoragePrefix: kubernetes.io
  kubernetesStorageVersion: v1
  openShiftStoragePrefix: openshift.io
  openShiftStorageVersion: v1
imageConfig:
  format: registry.qe.openshift.com/openshift3/ose-${component}:${version}
  latest: false
imagePolicyConfig:
  disableScheduledImport: true
  maxImagesBulkImportedPerRepository: 3
kind: MasterConfig
kubeletClientInfo:
  ca: ca.crt
  certFile: master.kubelet-client.crt
  keyFile: master.kubelet-client.key
  port: 10250
kubernetesMasterConfig:
  admissionConfig:
    pluginConfig:
      BuildOverrides:
        configuration:
          apiVersion: v1
          forcePull: true
          kind: BuildOverridesConfig
      ClusterResourceOverride:
        configuration:
          apiVersion: v1
          cpuRequestToLimitPercent: '6'
          kind: ClusterResourceOverrideConfig
          limitCPUToMemoryPercent: '200'
          memoryRequestToLimitPercent: '60'
      PodNodeConstraints:
        configuration:
          apiVersion: v1
          kind: PodNodeConstraintsConfig
      RunOnceDuration:
        configuration:
          activeDeadlineSecondsOverride: '3600'
          apiVersion: v1
          kind: RunOnceDurationConfig
    pluginOrderOverride:
    - RunOnceDuration
    - NamespaceLifecycle
    - PodNodeConstraints
    - OriginPodNodeEnvironment
    - ClusterResourceOverride
    - LimitRanger
    - ServiceAccount
    - SecurityContextConstraint
    - BuildDefaults
    - BuildOverrides
    - ResourceQuota
    - SCCExecRestrictions
    - AlwaysPullImages
  apiServerArguments:
    cloud-config:
    - /etc/origin/cloudprovider/aws.conf
    cloud-provider:
    - aws
  controllerArguments:
    cloud-config:
    - /etc/origin/cloudprovider/aws.conf
    cloud-provider:
    - aws
    pvclaimbinder-sync-period:
    - 30s
    terminated-pod-gc-threshold:
    - '3000'
  masterCount: 3
  masterIP: 172.31.6.178
  podEvictionTimeout:
  proxyClientInfo:
    certFile: master.proxy-client.crt
    keyFile: master.proxy-client.key
  schedulerConfigFile: /etc/origin/master/scheduler.json
  servicesNodePortRange: ""
  servicesSubnet: 172.30.0.0/16
  staticNodeNames: []
masterClients:
  externalKubernetesClientConnectionOverrides:
    acceptContentTypes: application/vnd.kubernetes.protobuf,application/json
    burst: 400
    contentType: application/vnd.kubernetes.protobuf
    ops: 200
  externalKubernetesKubeConfig: ""
  openshiftLoopbackClientConnectionOverrides:
    acceptContentTypes: application/vnd.kubernetes.protobuf,application/json
    burst: 600
    contentType: application/vnd.kubernetes.protobuf
    ops: 300
  openshiftLoopbackKubeConfig: openshift-master.kubeconfig
masterPublicURL: https://api.dev-preview-int.openshift.com
networkConfig:
  clusterNetworkCIDR: 10.1.0.0/16
  hostSubnetLength: 8
  networkPluginName: redhat/openshift-ovs-multitenant
  serviceNetworkCIDR: 172.30.0.0/16
oauthConfig:
  alwaysShowProviderSelection: true
  assetPublicURL: https://console.dev-preview-int.openshift.com/console/
  grantConfig:
    method: auto
  identityProviders:
  - challenge: false
    login: true
    mappingMethod: lookup
    name: github
    provider:
      apiVersion: v1
      clientID: ***********
      clientSecret: *******************
      kind: GitHubIdentityProvider
  masterCA: ca.crt
  masterPublicURL: https://api.dev-preview-int.openshift.com
  masterURL: https://internal.api.dev-preview-int.openshift.com
  sessionConfig:
    sessionMaxAgeSeconds: 3600
    sessionName: ssn
    sessionSecretsFile: /etc/origin/master/session-secrets.yaml
  templates:
    error: /etc/openshift-online/ui-extensions/custom-templates/oauth-error-dev.html
    providerSelection: /etc/openshift-online/ui-extensions/custom-templates/provider-selection-dev.html
  tokenConfig:
    accessTokenMaxAgeSeconds: 2678400
    authorizeTokenMaxAgeSeconds: 300
pauseControllers: false
policyConfig:
  bootstrapPolicyFile: /etc/origin/master/policy.json
  openshiftInfrastructureNamespace: openshift-infra
  openshiftSharedResourcesNamespace: openshift
projectConfig:
  defaultNodeSelector: type=compute
  projectRequestMessage: ""
  projectRequestTemplate: default/project-request
  securityAllocator:
    mcsAllocatorRange: s0:/2
    mcsLabelsPerProject: 5
    uidAllocatorRange: 1000000000-1999999999/10000
routingConfig:
  subdomain: 1ec1.dev-preview-int.openshiftapps.com
serviceAccountConfig:
  limitSecretReferences: false
  managedNames:
  - default
  - builder
  - deployer
  masterCA: ca.crt
  privateKeyFile: serviceaccounts.private.key
  publicKeyFiles:
  - serviceaccounts.public.key
servingInfo:
  bindAddress: 0.0.0.0:443
  bindNetwork: tcp4
  certFile: master.server.crt
  clientCA: ca.crt
  keyFile: master.server.key
  maxRequestsInFlight: 1000
  namedCertificates:
  - certFile: /etc/origin/master/named_certificates/star.dev-preview-int.openshift.com.crt
    keyFile: /etc/origin/master/named_certificates/star.dev-preview-int.openshift.com.key
    names:
    - api.dev-preview-int.openshift.com
    - console.dev-preview-int.openshift.com
  requestTimeoutSeconds: 3600
volumeConfig:
  dynamicProvisioningEnabled: false
The "error" is confusing but harmless. See https://github.com/kubernetes/kubernetes/issues/30700 for more details. We will add a release note for this in the 3.3 release and try to fix it in a future release.
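For anyone following the upstream issue, here is a simplified Python model of the behavior (this is not the actual Go controller code; the service name and return strings are illustrative). The point is ordering: the service sync loop queried the cloud provider for an existing load balancer before checking the service type, so even a plain ClusterIP service produced a DescribeLoadBalancers call and, on AccessDenied, was requeued forever.

```python
# Simplified model of the service controller sync loop.
# Assumption: sync is retried every 5 seconds on failure, per the log message.

cloud_calls = []  # records every (simulated) AWS DescribeLoadBalancers call

def describe_load_balancer(name):
    """Stand-in for the AWS call that was being throttled / denied."""
    cloud_calls.append(name)
    raise PermissionError("AccessDenied: not authorized to perform "
                          "elasticloadbalancing:DescribeLoadBalancers")

def sync_service_buggy(svc):
    # Buggy ordering: hit the cloud API even for ClusterIP services.
    try:
        describe_load_balancer(svc["name"])
    except PermissionError:
        return "retry in 5s"   # requeued forever -> constant AWS traffic
    return "ok"

def sync_service_fixed(svc):
    # Fixed ordering: check the type first; ClusterIP never touches the cloud.
    if svc["type"] != "LoadBalancer":
        return "ok"
    try:
        describe_load_balancer(svc["name"])
    except PermissionError:
        return "retry in 5s"
    return "ok"

svc = {"name": "cakephp-mysql-example", "type": "ClusterIP"}
print(sync_service_buggy(svc))   # retry in 5s (one cloud call was made)
print(sync_service_fixed(svc))   # ok (no cloud call)
```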
We just installed OCP 3.3.0.33 in prod yesterday, which contains this bug. We're now seeing 27,000 to 35,000 instances of AWS throttling per controller per day in prod because of the repeated requests to view the non-existent AWS load balancer for each service. This is an example of the message we're seeing that many times per day in the controller logs:

"Failed to process service delta. Retrying in 5s: Error getting LB for service jboss/newapp: Throttling: Rate exceeded"

Because of the heavy usage of the AWS API, it's affecting our ability to use the AWS web console, which keeps displaying "rate limit exceeded" when we try to view or change components. That makes it difficult (but not impossible) to do ad-hoc tasks in the console manually. I believe the volume of API requests we're generating is also interfering with the cluster's performance, since it prevents API calls for volume creates/deletes/attaches/detaches from succeeding immediately.
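As a back-of-the-envelope sanity check on those numbers (assuming, per the log message above, that each failed service is retried every 5 seconds):

```python
# One stuck service alone generates one DescribeLoadBalancers call per retry.
seconds_per_day = 24 * 60 * 60          # 86400
retry_interval = 5                      # seconds, from "Retrying in 5s"
calls_per_service_per_day = seconds_per_day // retry_interval
print(calls_per_service_per_day)        # 17280
```

So a cluster with even two or three affected ClusterIP services easily reaches the 27,000-35,000 throttled requests per controller per day reported above.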
Opened a related bug for the controllers shutting down under these conditions. https://bugzilla.redhat.com/show_bug.cgi?id=1381745
Related issue: https://github.com/kubernetes/kubernetes/issues/33088
+1 Seeing similar behavior to what Stefanie is reporting. I believe we calculated an average of 600 API calls per minute of this type (from the AWS CloudTrail logging):

{
    "eventVersion": "1.04",
    "userIdentity": {
        "type": "IAMUser",
        "principalId": "SECRET_ACCESS_KEY",
        "arn": "arn:aws:iam::ACCOUNT_NUMBER:user/cloud_provider",
        "accountId": "ACCOUNT_NUMBER",
        "accessKeyId": "SECRET_ACCESS_KEY_ID",
        "userName": "cloud_provider"
    },
    "eventTime": "2016-10-10T23:56:40Z",
    "eventSource": "elasticloadbalancing.amazonaws.com",
    "eventName": "DescribeLoadBalancers",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "SOURCE_IP",
    "userAgent": "aws-sdk-go/1.0.8 (go1.6.2; linux; amd64)",
    "errorCode": "AccessDenied",
    "errorMessage": "User: arn:aws:iam::ACCOUNT_NUMBER:user/cloud_provider is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers",
    "requestParameters": null,
    "responseElements": null,
    "requestID": "REQUEST_ID",
    "eventID": "EVENT_ID",
    "eventType": "AwsApiCall",
    "recipientAccountId": "ACCOUNT_ID"
}

This is causing API throttling through the AWS web UI and for any manual/automation scripts that attempt API calls on this AWS account.
I looked at the AWS account that Joel was using at the time. In region us-east-1, we had 23,317 calls to DescribeLoadBalancers during a 40-minute period (which does indeed come out to about 600 calls per minute). This particular region was hosting two test clusters. Since some of our Dedicated customers also host multiple clusters in a single AWS account, this has become a blocker for the 3.3 upgrade in Dedicated.

The API calls occurred between these times:
  "eventTime": "2016-10-11T16:54:50Z"
  "eventTime": "2016-10-11T17:34:52Z"

Errors present:
  "errorCode": "AccessDenied"

Events present:
  "eventName": "CreateSnapshot"
  "eventName": "CreateTags"
  "eventName": "DeleteSnapshot"
  "eventName": "DescribeInstances"
  "eventName": "DescribeLoadBalancers"
  "eventName": "DescribeSnapshots"
  "eventName": "DescribeVolumes"

Number of calls per event:
  DescribeLoadBalancers  23317
  DescribeVolumes           12
  DescribeInstances         10
  CreateTags                12
  CreateSnapshot             6
  DescribeSnapshots          1
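For reference, the tally above can be reproduced from the CloudTrail records with a few lines of Python (the records list here is a tiny illustrative sample, not real data; the field names match the sample event in the earlier comment):

```python
from collections import Counter

# Illustrative sample of CloudTrail records; in practice these come from
# the JSON log files, e.g. json.load(...)["Records"].
records = [
    {"eventName": "DescribeLoadBalancers", "errorCode": "AccessDenied"},
    {"eventName": "DescribeVolumes"},
    {"eventName": "DescribeLoadBalancers", "errorCode": "AccessDenied"},
]

counts = Counter(r["eventName"] for r in records)
print(counts["DescribeLoadBalancers"])  # 2

# Rate from the real numbers quoted above: 23,317 calls in a 40-minute window.
print(round(23317 / 40))  # 583, i.e. roughly 600 calls per minute
```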
I've opened https://github.com/openshift/ose/pull/414 to fix this.
My PR to the OSE 3.3 branch has merged.
Verified with:
  openshift v3.3.1.2
  kubernetes v1.3.0+52492b4
  etcd 2.3.0+git
Watched for 16 minutes; no messages like "Error creating load balancer (will retry): Error getting LB for service dakinitest/cakephp-mysql-example: AccessDenied: User: arn:aws:iam::704252977135:user/cloud_provider is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers" were found.
Steps:
1. Set up an HA cluster with ELB on AWS.
2. Use an account which does not have permission to describe load balancers. On each master host:
     cd /etc/sysconfig
     sed -i 's/<old>/<new>/g' atomic-openshift-*
     sed -i "s/<old>/<new>/g" atomic-openshift-*
     systemctl restart atomic-openshift-master-api
     systemctl restart atomic-openshift-master-controllers
     systemctl restart atomic-openshift-node
3. Log in and create a project.
4. oc new-app cakephp-mysql-example
5. oc get events -w
We did not see the CreatingLoadBalancer error messages again in our test against Online STG 3.3.1:
  OpenShift Master: v3.3.1.3
  Kubernetes Master: v1.3.0+52492b4