Bug 1367229
| Summary: | OSE 3.3 tries to create load balancers in AWS | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Stefanie Forrester <dakini> |
| Component: | Node | Assignee: | Paul Morie <pmorie> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Weihua Meng <wmeng> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 3.3.0 | CC: | acarter, agoldste, aos-bugs, bingli, dakini, decarr, dyocum, haowang, jdiaz, jokerman, mifiedle, mmccomas, tdawson, vrutkovs, wmeng, xxia, yufchang, zhaliu |
| Target Milestone: | --- | Keywords: | NeedsTestCase |
| Target Release: | 3.3.1 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-11-22 22:31:04 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Comment 1
Stefanie Forrester
2016-08-15 23:55:28 UTC
I cannot reproduce this on the AWS platform with:
openshift v3.3.0.21
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git
Could you please provide your master config?
Could you please provide the output from 'oc get service/cakephp-mysql-example -o json'?
Stefanie, also, is this actually Origin, or is it OCP?
I think it's OCP, since it came from an enterprise repo. Here's the rpm version:
atomic-openshift-3.3.0.19-1.git.0.93380aa.el7.x86_64
[root@dev-preview-int-master-490df ~]# oc get services cakephp-mysql-example -n dakinitest -o json
{
"kind": "Service",
"apiVersion": "v1",
"metadata": {
"name": "cakephp-mysql-example",
"namespace": "dakinitest",
"selfLink": "/api/v1/namespaces/dakinitest/services/cakephp-mysql-example",
"uid": "f61206a8-633f-11e6-a9c9-0a27f5fffde3",
"resourceVersion": "6274169",
"creationTimestamp": "2016-08-15T23:28:24Z",
"labels": {
"app": "cakephp-mysql-example",
"template": "cakephp-mysql-example"
},
"annotations": {
"description": "Exposes and load balances the application pods",
"openshift.io/generated-by": "OpenShiftNewApp"
}
},
"spec": {
"ports": [
{
"name": "web",
"protocol": "TCP",
"port": 8080,
"targetPort": 8080
}
],
"selector": {
"name": "cakephp-mysql-example"
},
"portalIP": "172.30.46.10",
"clusterIP": "172.30.46.10",
"type": "ClusterIP",
"sessionAffinity": "None"
},
"status": {
"loadBalancer": {}
}
}
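For context on why the log message is surprising: the service above has "type": "ClusterIP" and an empty status.loadBalancer, so it never asked for a cloud load balancer. A minimal Python sketch of that check follows. The helper name `wants_cloud_load_balancer` is hypothetical, and the JSON is a trimmed copy of the output above; this is an illustration, not code from OpenShift or Kubernetes.

```python
import json

# Trimmed copy of the `oc get services ... -o json` output shown above.
SERVICE_JSON = """
{
  "kind": "Service",
  "metadata": {"name": "cakephp-mysql-example", "namespace": "dakinitest"},
  "spec": {"clusterIP": "172.30.46.10", "type": "ClusterIP"},
  "status": {"loadBalancer": {}}
}
"""


def wants_cloud_load_balancer(service: dict) -> bool:
    """Hypothetical helper: True only when the spec asks for type LoadBalancer."""
    return service.get("spec", {}).get("type") == "LoadBalancer"


service = json.loads(SERVICE_JSON)
print(wants_cloud_load_balancer(service))
```

For this service the check is false, which is why any load balancer lookup against AWS on its behalf looks unnecessary.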
The template itself comes from here:
https://github.com/openshift/online/tree/master/templates/examples
Here is the master config:
[root@dev-preview-int-master-490df ~]# cat /etc/origin/master/master-config.yaml
admissionConfig:
pluginConfig:
PodNodeConstraints:
configuration:
apiVersion: v1
kind: PodNodeConstraintsConfig
ProjectRequestLimit:
configuration:
apiVersion: v1
kind: ProjectRequestLimitConfig
limits:
- selector:
admin: 'true'
- maxProjects: '1'
apiLevels:
- v1
apiVersion: v1
assetConfig:
extensionScripts:
- /etc/openshift-online/ui-extensions/assets/extensions/online-extensions.js
- /etc/openshift-online/ui-extensions/assets/extensions/intercom-widget-extension.js
extensionStylesheets:
- /etc/openshift-online/ui-extensions/assets/extensions/online-extensions.css
logoutURL: ""
masterPublicURL: https://api.dev-preview-int.openshift.com
publicURL: https://console.dev-preview-int.openshift.com/console/
servingInfo:
bindAddress: 0.0.0.0:443
bindNetwork: tcp4
certFile: master.server.crt
clientCA: ""
keyFile: master.server.key
maxRequestsInFlight: 0
requestTimeoutSeconds: 0
controllerLeaseTTL: 30
controllers: '*'
corsAllowedOrigins:
- 127.0.0.1
- localhost
- 172.31.6.178
- 52.91.93.189
- ip-172-31-6-178.ec2.internal
- kubernetes.default
- kubernetes.default.svc.cluster.local
- kubernetes
- openshift.default
- openshift.default.svc
- api.dev-preview-int.openshift.com
- 172.30.0.1
- internal.api.dev-preview-int.openshift.com
- ec2-52-91-93-189.compute-1.amazonaws.com
- openshift.default.svc.cluster.local
- kubernetes.default.svc
- openshift
- console.dev-preview-int.openshift.com
- api.dev-preview-int.openshift.com
dnsConfig:
bindAddress: 0.0.0.0:8053
bindNetwork: tcp4
etcdClientInfo:
ca: master.etcd-ca.crt
certFile: master.etcd-client.crt
keyFile: master.etcd-client.key
urls:
- https://ip-172-31-6-178.ec2.internal:2379
- https://ip-172-31-6-177.ec2.internal:2379
- https://ip-172-31-6-179.ec2.internal:2379
etcdStorageConfig:
kubernetesStoragePrefix: kubernetes.io
kubernetesStorageVersion: v1
openShiftStoragePrefix: openshift.io
openShiftStorageVersion: v1
imageConfig:
format: registry.qe.openshift.com/openshift3/ose-${component}:${version}
latest: false
imagePolicyConfig:
disableScheduledImport: true
maxImagesBulkImportedPerRepository: 3
kind: MasterConfig
kubeletClientInfo:
ca: ca.crt
certFile: master.kubelet-client.crt
keyFile: master.kubelet-client.key
port: 10250
kubernetesMasterConfig:
admissionConfig:
pluginConfig:
BuildOverrides:
configuration:
apiVersion: v1
forcePull: true
kind: BuildOverridesConfig
ClusterResourceOverride:
configuration:
apiVersion: v1
cpuRequestToLimitPercent: '6'
kind: ClusterResourceOverrideConfig
limitCPUToMemoryPercent: '200'
memoryRequestToLimitPercent: '60'
PodNodeConstraints:
configuration:
apiVersion: v1
kind: PodNodeConstraintsConfig
RunOnceDuration:
configuration:
activeDeadlineSecondsOverride: '3600'
apiVersion: v1
kind: RunOnceDurationConfig
pluginOrderOverride:
- RunOnceDuration
- NamespaceLifecycle
- PodNodeConstraints
- OriginPodNodeEnvironment
- ClusterResourceOverride
- LimitRanger
- ServiceAccount
- SecurityContextConstraint
- BuildDefaults
- BuildOverrides
- ResourceQuota
- SCCExecRestrictions
- AlwaysPullImages
apiServerArguments:
cloud-config:
- /etc/origin/cloudprovider/aws.conf
cloud-provider:
- aws
controllerArguments:
cloud-config:
- /etc/origin/cloudprovider/aws.conf
cloud-provider:
- aws
pvclaimbinder-sync-period:
- 30s
terminated-pod-gc-threshold:
- '3000'
masterCount: 3
masterIP: 172.31.6.178
podEvictionTimeout:
proxyClientInfo:
certFile: master.proxy-client.crt
keyFile: master.proxy-client.key
schedulerConfigFile: /etc/origin/master/scheduler.json
servicesNodePortRange: ""
servicesSubnet: 172.30.0.0/16
staticNodeNames: []
masterClients:
externalKubernetesClientConnectionOverrides:
acceptContentTypes: application/vnd.kubernetes.protobuf,application/json
burst: 400
contentType: application/vnd.kubernetes.protobuf
ops: 200
externalKubernetesKubeConfig: ""
openshiftLoopbackClientConnectionOverrides:
acceptContentTypes: application/vnd.kubernetes.protobuf,application/json
burst: 600
contentType: application/vnd.kubernetes.protobuf
ops: 300
openshiftLoopbackKubeConfig: openshift-master.kubeconfig
masterPublicURL: https://api.dev-preview-int.openshift.com
networkConfig:
clusterNetworkCIDR: 10.1.0.0/16
hostSubnetLength: 8
networkPluginName: redhat/openshift-ovs-multitenant
serviceNetworkCIDR: 172.30.0.0/16
oauthConfig:
alwaysShowProviderSelection: true
assetPublicURL: https://console.dev-preview-int.openshift.com/console/
grantConfig:
method: auto
identityProviders:
- challenge: false
login: true
mappingMethod: lookup
name: github
provider:
apiVersion: v1
clientID: ***********
clientSecret: *******************
kind: GitHubIdentityProvider
masterCA: ca.crt
masterPublicURL: https://api.dev-preview-int.openshift.com
masterURL: https://internal.api.dev-preview-int.openshift.com
sessionConfig:
sessionMaxAgeSeconds: 3600
sessionName: ssn
sessionSecretsFile: /etc/origin/master/session-secrets.yaml
templates:
error: /etc/openshift-online/ui-extensions/custom-templates/oauth-error-dev.html
providerSelection: /etc/openshift-online/ui-extensions/custom-templates/provider-selection-dev.html
tokenConfig:
accessTokenMaxAgeSeconds: 2678400
authorizeTokenMaxAgeSeconds: 300
pauseControllers: false
policyConfig:
bootstrapPolicyFile: /etc/origin/master/policy.json
openshiftInfrastructureNamespace: openshift-infra
openshiftSharedResourcesNamespace: openshift
projectConfig:
defaultNodeSelector: type=compute
projectRequestMessage: ""
projectRequestTemplate: default/project-request
securityAllocator:
mcsAllocatorRange: s0:/2
mcsLabelsPerProject: 5
uidAllocatorRange: 1000000000-1999999999/10000
routingConfig:
subdomain: 1ec1.dev-preview-int.openshiftapps.com
serviceAccountConfig:
limitSecretReferences: false
managedNames:
- default
- builder
- deployer
masterCA: ca.crt
privateKeyFile: serviceaccounts.private.key
publicKeyFiles:
- serviceaccounts.public.key
servingInfo:
bindAddress: 0.0.0.0:443
bindNetwork: tcp4
certFile: master.server.crt
clientCA: ca.crt
keyFile: master.server.key
maxRequestsInFlight: 1000
namedCertificates:
- certFile: /etc/origin/master/named_certificates/star.dev-preview-int.openshift.com.crt
keyFile: /etc/origin/master/named_certificates/star.dev-preview-int.openshift.com.key
names:
- api.dev-preview-int.openshift.com
- console.dev-preview-int.openshift.com
requestTimeoutSeconds: 3600
volumeConfig:
dynamicProvisioningEnabled: false
The "error" is confusing but harmless. See https://github.com/kubernetes/kubernetes/issues/30700 for more details. We will add a release note for this in the 3.3 release and try to fix it in a future release.
We just installed OCP 3.3.0.33 in prod yesterday, which contains this bug. We're now seeing 27,000 to 35,000 instances of AWS throttling per controller per day in prod because of the repeated requests to view the non-existent AWS load balancer for each service. This is an example of the message we're seeing that many times per day in the controller logs:
"Failed to process service delta. Retrying in 5s: Error getting LB for service jboss/newapp: Throttling: Rate exceeded"
The heavy AWS API usage is affecting our ability to use the AWS web console, which keeps displaying "rate limit exceeded" when we try to view or change components. That makes it difficult (but not impossible) to do ad-hoc tasks in the console manually. I believe the number of API requests we're generating is also interfering with the cluster's performance, since it prevents API calls for volume creates/deletes/attaches/detaches from succeeding immediately.
Opened a related bug for the controllers shutting down under these conditions: https://bugzilla.redhat.com/show_bug.cgi?id=1381745
Related issue: https://github.com/kubernetes/kubernetes/issues/33088
+1 Seeing similar behavior to what Stefanie is reporting.
I believe we calculated an average of 600 API calls per minute of this type (from the AWS CloudTrail logs):
{"eventVersion":"1.04","userIdentity":{"type":"IAMUser","principalId":"SECRET_ACCESS_KEY","arn":"arn:aws:iam::ACCOUNT_NUMBER:user/cloud_provider","accountId":"ACCOUNT_NUMBER","accessKeyId":"SECRET_ACCESS_KEY_ID","userName":"cloud_provider"},"eventTime":"2016-10-10T23:56:40Z","eventSource":"elasticloadbalancing.amazonaws.com","eventName":"DescribeLoadBalancers","awsRegion":"us-east-1","sourceIPAddress":"SOURCE_IP","userAgent":"aws-sdk-go/1.0.8 (go1.6.2; linux; amd64)","errorCode":"AccessDenied","errorMessage":"User: arn:aws:iam::ACCOUNT_NUMBER:user/cloud_provider is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers","requestParameters":null,"responseElements":null,"requestID":"REQUEST_ID","eventID":"EVENT_ID","eventType":"AwsApiCall","recipientAccountId":"ACCOUNT_ID"}
This is causing API throttling in the AWS web UI and for any manual or automated scripts that attempt API calls on this AWS account.
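To pick records like the one above out of a larger CloudTrail export, a filter on eventName and errorCode is probably enough. The following Python sketch assumes each record is available as a parsed JSON object; `is_denied_lb_describe` is a hypothetical helper name, and the sample is a shortened copy of the record above.

```python
import json

# Shortened copy of the CloudTrail record quoted above.
RECORD_JSON = """
{"eventTime": "2016-10-10T23:56:40Z",
 "eventSource": "elasticloadbalancing.amazonaws.com",
 "eventName": "DescribeLoadBalancers",
 "awsRegion": "us-east-1",
 "errorCode": "AccessDenied",
 "userIdentity": {"type": "IAMUser", "userName": "cloud_provider"}}
"""


def is_denied_lb_describe(record: dict) -> bool:
    """Hypothetical helper: True for DescribeLoadBalancers calls rejected with AccessDenied."""
    return (
        record.get("eventName") == "DescribeLoadBalancers"
        and record.get("errorCode") == "AccessDenied"
    )


record = json.loads(RECORD_JSON)
print(is_denied_lb_describe(record), record["userIdentity"]["userName"])
```

Grouping the matching records by userName or awsRegion would show which cluster credentials are generating the calls.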
I looked at the AWS account that Joel was using at the time. In region us-east-1, we had 23,317 calls to DescribeLoadBalancers during a 40-minute period (which does indeed come out to about 600 calls per minute). This particular region was hosting two test clusters.
Since some of our Dedicated customers also host multiple clusters in a single AWS account, this has become a blocker for the 3.3 upgrade in Dedicated.
The following API calls occurred between these times:
"eventTime": "2016-10-11T16:54:50Z"
"eventTime": "2016-10-11T17:34:52Z"
Errors present:
"errorCode": "AccessDenied",
Events present:
"eventName": "CreateSnapshot",
"eventName": "CreateTags",
"eventName": "DeleteSnapshot",
"eventName": "DescribeInstances",
"eventName": "DescribeLoadBalancers",
"eventName": "DescribeSnapshots",
"eventName": "DescribeVolumes",
Number of calls per event:
DescribeLoadBalancers 23317
DescribeVolumes 12
DescribeInstances 10
CreateTags 12
CreateSnapshot 6
DescribeSnapshots 1
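As a quick check of the "about 600 calls per minute" figure, the rate can be recomputed from the two timestamps and the per-event counts above. This is only a sketch of the arithmetic in Python; the variable names are made up for illustration, and DeleteSnapshot is left out because the list gives no count for it.

```python
from datetime import datetime

# Figures copied from the comment above.
START = "2016-10-11T16:54:50Z"
END = "2016-10-11T17:34:52Z"
CALLS = {
    "DescribeLoadBalancers": 23317,
    "DescribeVolumes": 12,
    "DescribeInstances": 10,
    "CreateTags": 12,
    "CreateSnapshot": 6,
    "DescribeSnapshots": 1,
}


def parse_utc(stamp: str) -> datetime:
    """Parse the CloudTrail-style UTC timestamp format used above."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")


window_seconds = (parse_utc(END) - parse_utc(START)).total_seconds()
lb_per_minute = CALLS["DescribeLoadBalancers"] / window_seconds * 60
lb_share = CALLS["DescribeLoadBalancers"] / sum(CALLS.values())

print(f"window: {window_seconds:.0f} s")
print(f"DescribeLoadBalancers per minute: {lb_per_minute:.0f}")
print(f"share of listed calls: {lb_share:.1%}")
```

The window works out to 2402 seconds, giving roughly 582 DescribeLoadBalancers calls per minute, which is consistent with the estimate of about 600 and accounts for nearly all of the listed calls.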
I've opened https://github.com/openshift/ose/pull/414 to fix this.
My PR to the OSE 3.3 branch has merged.
Verified.
openshift v3.3.1.2
kubernetes v1.3.0+52492b4
etcd 2.3.0+git
Watched for 16 minutes; no message like "Error creating load balancer (will retry): Error getting LB for service dakinitest/cakephp-mysql-example: AccessDenied: User: arn:aws:iam::704252977135:user/cloud_provider is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers" was found.
Steps:
1. Set up an HA cluster with ELB on AWS.
2. On each master host, switch to an account that does not have permission to describe load balancers:
cd /etc/sysconfig
sed -i 's/<old>/<new>/g' atomic-openshift-*
sed -i "s/<old>/<new>/g" atomic-openshift-*
systemctl restart atomic-openshift-master-api
systemctl restart atomic-openshift-master-controllers
systemctl restart atomic-openshift-node
3. Log in and create a project.
4. oc new-app cakephp-mysql-example
5. oc get events -w
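The last step can be made less manual by scanning captured event output for the error text quoted in this report. The Python sketch below assumes the output of `oc get events -w` has been saved as lines of text; `find_lb_errors` is a hypothetical helper, and the second sample line is invented purely as a non-matching example.

```python
import re

# Patterns taken from the error messages quoted in this report.
LB_ERROR = re.compile(r"Error (?:creating load balancer|getting LB for service)")


def find_lb_errors(lines):
    """Hypothetical helper: return the lines that mention a load balancer error."""
    return [line for line in lines if LB_ERROR.search(line)]


# Sample input: the first line is shortened from the message quoted above,
# the second is a made-up unrelated line used only to show a non-match.
sample = [
    "Error creating load balancer (will retry): Error getting LB for service "
    "dakinitest/cakephp-mysql-example: AccessDenied",
    "unrelated event line (hypothetical)",
]
print(len(find_lb_errors(sample)))
```

An empty result over the whole observation window corresponds to the "no message found" outcome described in the verification.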
We did not see the CreatingLoadBalancer error messages again in our test against online STG 3.3.1:
OpenShift Master: v3.3.1.3
Kubernetes Master: v1.3.0+52492b4