+++ This bug was initially created as a clone of Bug #1318917 +++

Description of problem:
Creation of the builder service account, or of both the builder and deployer service accounts, is sometimes delayed after a project is created. The delay is random, anywhere from 6 minutes to 1 hour. During this time, customers cannot run build or deploy operations.

Version-Release number of selected component (if applicable):
oc v3.2.0.4
kubernetes v1.2.0-origin-41-g91d3e75
Docker 1.8.2-el7, build a01dc02/1.8.2
kernel 3.10.0-327.10.1.el7.x86_64

How reproducible:
Sometimes

Steps to Reproduce:
1. Create a new project
2. Check sa
3. Create an application, then check builds.

Actual results:
# oc get sa
NAME       SECRETS   AGE
builder    0         20m
default    2         20m
deployer   2         20m

# oc get builds
NAME                    TYPE     FROM   STATUS                       STARTED   DURATION
ruby22-sample-build-1   Source   Git    New (CannotCreateBuildPod)

# oc describe builds ruby22-sample-build-1
Name:          ruby22-sample-build-1
Created:       14 minutes ago
Labels:        app=ruby22-sample-build,buildconfig=ruby22-sample-build,name=ruby22-sample-build,openshift.io/build-config.name=ruby22-sample-build,template=application-template-stibuild
Annotations:   openshift.io/build.number=1
Build Config:  ruby22-sample-build
Duration:      waiting for 14m7s
Build Pod:     ruby22-sample-build-1-build
Strategy:      Source
URL:           https://github.com/openshift/ruby-hello-world.git
Image Source:  copies /opt from registry.access.redhat.com/openshift3/jenkins-1-rhel7:latest to xiuwangs2i-2
From Image:    DockerImage registry.access.redhat.com/rhscl/ruby-22-rhel7:latest
Output to:     ImageStreamTag origin-ruby22-sample:latest
Status:        New (Failed to create build pod: pods "ruby22-sample-build-1-build" is forbidden: no API token found for service account xiuwang24/builder, retry after the token is automatically created and added to the service account.)
Events:
FirstSeen  LastSeen   Count  From                 SubobjectPath  Type     Reason            Message
---------  --------   -----  ----                 -------------  -------- ------            -------
14m        14m        1      {build-controller }                 Warning  HandleBuildError  Build has error: failed to create build pod: pods "ruby22-sample-build-1-build" is forbidden: no API token found for service account xiuwang24/builder, retry after the token is automatically created and added to the service account
14m        <invalid>  785    {build-controller }                 Warning  FailedCreate      Error creating: pods "ruby22-sample-build-1-build" is forbidden: no API token found for service account xiuwang24/builder, retry after the token is automatically created and added to the service account

Expected results:

Additional info:

--- Additional comment from Jordan Liggitt on 2016-03-18 14:52:19 EDT ---

Are there master logs available? Looking for things logged from "tokens_controller.go".

--- Additional comment from XiuJuan Wang on 2016-03-21 02:59 EDT ---

--- Additional comment from XiuJuan Wang on 2016-03-29 04:14:42 EDT ---

Could reproduce in online 3.2.

# oc get sa -n xiuwang
NAME       SECRETS   AGE
builder    0         5m
default    1         5m
deployer   2         5m

# oc get sa -n xiuwang
NAME       SECRETS   AGE
builder    2         7m
default    2         7m
deployer   2         7m

--- Additional comment from Jordan Liggitt on 2016-03-29 16:00:48 EDT ---

Looks like the secret creation is being rejected by the quota admission controller. What quota is in the project?

--- Additional comment from Stefanie Forrester on 2016-03-29 17:44:15 EDT ---

This seems to be affecting all new app creates on the cluster, though I can work around it by creating a second build after the first one fails ('oc start-build cakephp-example', in my case). Let's work together tomorrow on debugging it, and I can provide any logs needed.
Here are some from my last failed app create:

Build has error: failed to create build pod: pods "cakephp-example-1-build" is forbidden: no API token found for service account dakinitest5/builder, retry after the token is automatically created and added to the service account

[root@dev-preview-int-master-167b1 ~]# oc get quota dakinitest5-quota -n dakinitest5 -o yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  creationTimestamp: 2016-03-29T20:58:05Z
  name: dakinitest5-quota
  namespace: dakinitest5
  resourceVersion: "2451582"
  selfLink: /api/v1/namespaces/dakinitest5/resourcequotas/dakinitest5-quota
  uid: ef22960a-f5f0-11e5-9914-0a2bc7135307
spec:
  hard:
    cpu: "4"
    memory: 2Gi
    persistentvolumeclaims: "2"
    pods: "10"
    replicationcontrollers: "20"
    resourcequotas: "1"
    secrets: "20"
    services: "10"
status:
  hard:
    cpu: "4"
    memory: 2Gi
    persistentvolumeclaims: "2"
    pods: "10"
    replicationcontrollers: "20"
    resourcequotas: "1"
    secrets: "20"
    services: "10"
  used:
    cpu: "0"
    memory: "0"
    persistentvolumeclaims: "0"
    pods: "0"
    replicationcontrollers: "0"
    resourcequotas: "1"
    secrets: "10"
    services: "1"

--- Additional comment from Jordan Liggitt on 2016-03-30 09:22:05 EDT ---

The issue is with token creating controllers not retrying quickly when their attempt to create a secret fails.

When secrets are placed under quota (which is not a normal configuration we test with), the quota admission plugin will reject creation attempts until it has scanned the namespace to determine how many secrets are currently being used.

When a project template includes a quota that limits the number of secrets, two things happen when a new project is created:

1. The quota admission plugin queues a task to scan for all quotaed objects in the namespace to determine how many are being used. When that task runs, the quota object's status is updated with the current "in use" counts. Until that task completes, attempts to create quotaed objects are rejected with a "Status unknown for quota" error.

2. Controllers immediately start trying to create service account tokens and dockercfg secrets for the service accounts in the namespace. If those create calls are rejected, the controllers wait until their resync period (which can be very long) before they retry creating the tokens and dockercfg secrets.
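
The timing interaction described in the two steps above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the controller code and not the fix shipped in the advisory: `FakeQuotaAdmission`, `time_to_create`, the 5-second scan time, the 600-second resync period, and the backoff parameters are all hypothetical values chosen for illustration. It compares a controller that retries a rejected create only on its resync period with one that retries with exponential backoff.

```python
from dataclasses import dataclass, field


class QuotaStatusUnknown(Exception):
    """Raised while the simulated quota has not yet computed its usage."""


@dataclass
class FakeQuotaAdmission:
    # Hypothetical stand-in for the quota admission plugin: it rejects
    # creates until the usage scan finishes at `scan_done_at` seconds.
    scan_done_at: float
    secrets: list = field(default_factory=list)

    def create_secret(self, name: str, now: float) -> None:
        if now < self.scan_done_at:
            raise QuotaStatusUnknown("Status unknown for quota")
        self.secrets.append(name)


def time_to_create(admission, retry_delays):
    """Return the simulated time (seconds) at which the create succeeds.

    `retry_delays` yields how long to wait before each retry.
    """
    now = 0.0
    delays = iter(retry_delays)
    while True:
        try:
            admission.create_secret("builder-token", now)
            return now
        except QuotaStatusUnknown:
            now += next(delays)


def fixed_resync(period):
    # Retry only when the (assumed) resync period elapses.
    while True:
        yield period


def exponential_backoff(initial, factor=2.0, cap=60.0):
    # Retry quickly at first, doubling the wait up to `cap` seconds.
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, cap)


# Assumed numbers: the quota scan completes after 5 seconds.
resync_time = time_to_create(FakeQuotaAdmission(scan_done_at=5.0), fixed_resync(600.0))
backoff_time = time_to_create(FakeQuotaAdmission(scan_done_at=5.0), exponential_backoff(1.0))
print(resync_time, backoff_time)
```

With these assumed numbers, the resync-only controller does not succeed until its first resync at 600 seconds, even though the quota status was ready after 5 seconds, while the backoff variant succeeds on its fourth attempt at 7 seconds. The first case matches the multi-minute delays reported above.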
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1343