Bug 1324179 - Some service accounts are created with a delay after a project is created
Summary: Some service accounts are created with a delay after a project is created
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: apiserver-auth
Version: 3.2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.2.1
Assignee: Jordan Liggitt
QA Contact: weiwei jiang
URL:
Whiteboard:
Depends On:
Blocks: 1383870
 
Reported: 2016-04-05 18:18 UTC by Abhishek Gupta
Modified: 2016-10-12 02:59 UTC
CC List: 12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Creation of the "builder" and "deployer" service accounts could be delayed for newly-created projects, during which time users could not build or deploy applications. This was caused by an issue when project templates defined a quota for secrets. This bug fix ensures that service accounts and their tokens are created quickly in this scenario (within seconds), and as a result users do not have to wait after project creation to build or deploy applications.
Clone Of: 1318917
Clones: 1357248 1383870
Environment:
Last Closed: 2016-06-27 15:05:56 UTC
Target Upstream Version:
Embargoed:




Links:
Red Hat Product Errata RHBA-2016:1343 (normal, SHIPPED_LIVE): Red Hat OpenShift Enterprise 3.2.1.1 bug fix and enhancement update, last updated 2016-06-27 19:04:05 UTC

Description Abhishek Gupta 2016-04-05 18:18:38 UTC
+++ This bug was initially created as a clone of Bug #1318917 +++

Description of problem:
The builder service account, or sometimes both the builder and deployer service accounts, is created only after a delay once a project has been created. The delay is random, roughly 6 minutes to 1 hour, and during that time the customer cannot build or deploy.


Version-Release number of selected component (if applicable):
oc v3.2.0.4
kubernetes v1.2.0-origin-41-g91d3e75
Docker 1.8.2-el7, build a01dc02/1.8.2
kernel 3.10.0-327.10.1.el7.x86_64

How reproducible:
sometimes

Steps to Reproduce:
1. Create a new project.
2. Check the service accounts (a polling sketch is shown after these steps).
3. Create an application, then check the builds.
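
For reference, a minimal polling sketch (bash; the project name "sa-delay-test" is arbitrary) that makes the delay visible by timestamping the service account listing until the builder account reports a non-zero SECRETS count:

oc new-project sa-delay-test
# Poll the service accounts every 30 seconds; interrupt once "builder" shows SECRETS > 0.
while true; do
  date
  oc get sa -n sa-delay-test
  sleep 30
done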

Actual results:
oc get sa
NAME       SECRETS   AGE
builder    0         20m
default    2         20m
deployer   2         20m

# oc get builds
NAME                    TYPE      FROM      STATUS                       STARTED   DURATION
ruby22-sample-build-1   Source    Git       New (CannotCreateBuildPod)  

# oc describe builds ruby22-sample-build-1 
Name:		ruby22-sample-build-1
Created:	14 minutes ago
Labels:		app=ruby22-sample-build,buildconfig=ruby22-sample-build,name=ruby22-sample-build,openshift.io/build-config.name=ruby22-sample-build,template=application-template-stibuild
Annotations:	openshift.io/build.number=1
Build Config:	ruby22-sample-build
Duration:	waiting for 14m7s
Build Pod:	ruby22-sample-build-1-build
Strategy:	Source
URL:		https://github.com/openshift/ruby-hello-world.git
Image Source:	copies /opt from registry.access.redhat.com/openshift3/jenkins-1-rhel7:latest to xiuwangs2i-2
From Image:	DockerImage registry.access.redhat.com/rhscl/ruby-22-rhel7:latest
Output to:	ImageStreamTag origin-ruby22-sample:latest
Status:		New (Failed to create build pod: pods "ruby22-sample-build-1-build" is forbidden: no API token found for service account xiuwang24/builder, retry after the token is automatically created and added to the service account.)
Events:
  FirstSeen	LastSeen	Count	From			SubobjectPath	Type		Reason			Message
  ---------	--------	-----	----			-------------	--------	------			-------
  14m		14m		1	{build-controller }			Warning		HandleBuildError	Build has error: failed to create build pod: pods "ruby22-sample-build-1-build" is forbidden: no API token found for service account xiuwang24/builder, retry after the token is automatically created and added to the service account
  14m		<invalid>	785	{build-controller }			Warning		FailedCreate		Error creating: pods "ruby22-sample-build-1-build" is forbidden: no API token found for service account xiuwang24/builder, retry after the token is automatically created and added to the service account


Expected results:
The builder and deployer service accounts and their tokens are created promptly after the project is created, so builds and deployments can start immediately.

Additional info:

--- Additional comment from Jordan Liggitt on 2016-03-18 14:52:19 EDT ---

are there master logs available? looking for things logged from "tokens_controller.go"

--- Additional comment from XiuJuan Wang on 2016-03-21 02:59 EDT ---



--- Additional comment from XiuJuan Wang on 2016-03-29 04:14:42 EDT ---

Could reproduce in online 3.2.

# oc get  sa  -n xiuwang 
NAME       SECRETS   AGE
builder    0         5m
default    1         5m
deployer   2         5m

# oc get  sa  -n xiuwang 
NAME       SECRETS   AGE
builder    2         7m
default    2         7m
deployer   2         7m

--- Additional comment from Jordan Liggitt on 2016-03-29 16:00:48 EDT ---

looks like the secret creation is being rejected by the quota admission controller. what quota is in the project?

--- Additional comment from Stefanie Forrester on 2016-03-29 17:44:15 EDT ---

This seems to be affecting all new app creates on the cluster. Though I can work around it by creating a second build after the first one fails ('oc start-build cakephp-example', in my case).

Let's work together tomorrow on debugging it, and I can provide any logs needed. Here are some from my last failed app create:

Build has error: failed to create build pod: pods "cakephp-example-1-build" is forbidden: no API token found for service account dakinitest5/builder, retry after the token is automatically created and added to the service account

[root@dev-preview-int-master-167b1 ~]# oc get quota dakinitest5-quota -n dakinitest5 -o yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  creationTimestamp: 2016-03-29T20:58:05Z
  name: dakinitest5-quota
  namespace: dakinitest5
  resourceVersion: "2451582"
  selfLink: /api/v1/namespaces/dakinitest5/resourcequotas/dakinitest5-quota
  uid: ef22960a-f5f0-11e5-9914-0a2bc7135307
spec:
  hard:
    cpu: "4"
    memory: 2Gi
    persistentvolumeclaims: "2"
    pods: "10"
    replicationcontrollers: "20"
    resourcequotas: "1"
    secrets: "20"
    services: "10"
status:
  hard:
    cpu: "4"
    memory: 2Gi
    persistentvolumeclaims: "2"
    pods: "10"
    replicationcontrollers: "20"
    resourcequotas: "1"
    secrets: "20"
    services: "10"
  used:
    cpu: "0"
    memory: "0"
    persistentvolumeclaims: "0"
    pods: "0"
    replicationcontrollers: "0"
    resourcequotas: "1"
    secrets: "10"
    services: "1"

--- Additional comment from Jordan Liggitt on 2016-03-30 09:22:05 EDT ---

The issue is that the token-creating controllers do not retry quickly when their attempt to create a secret fails.

When secrets are placed under quota (which is not a normal configuration we test with), the quota admission plugin will reject creation attempts until it has scanned the namespace to determine how many secrets are currently being used.

When a project template includes a quota that limits the number of secrets, two things happen when a new project is created:

1. The quota admission plugin queues a task to scan for all quotaed objects in the namespace to determine how many are being used. When that task runs, the quota object's status is updated with the current "in use" counts. Until that task completes, attempts to create quotaed objects are rejected with a "Status unknown for quota" error.

2. Controllers immediately start trying to create service account tokens and dockercfg secrets for the service accounts in the namespace. If those create calls are rejected, the controllers wait until their resync period (which can be very long) before they retry creating the tokens and dockercfg secrets.
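
For reference, a rough CLI sketch to observe both effects (the project name "quota-race-test" is arbitrary, and this assumes the project template adds a secrets quota like the one shown above):

oc new-project quota-race-test
# The quota's status.used map stays empty until the admission plugin's scan task has run;
# while it is empty, attempts to create secrets are rejected with the "Status unknown for quota" error.
oc get resourcequota -n quota-race-test -o yaml
# The builder service account keeps SECRETS = 0 until the token controller retries successfully.
oc get sa builder -n quota-race-test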

Comment 6 errata-xmlrpc 2016-06-27 15:05:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1343

