Bug 1704722

Summary: build pods waited too long before running on OCP4
Product: OpenShift Container Platform
Reporter: Hongkai Liu <hongkliu>
Component: Build
Assignee: Ben Parees <bparees>
Status: CLOSED ERRATA
QA Contact: wewang <wewang>
Severity: medium
Priority: unspecified
Version: 4.1.0
CC: akamra, akrzos, aos-bugs, bparees, deads, dwalsh, hongkliu, wewang, wzheng
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: aos-scalability-41
Doc Type: No Doc Update
Type: Bug
Last Closed: 2019-06-04 10:48:18 UTC

Description Hongkai Liu 2019-04-30 12:31:26 UTC
Description of problem:
Concurrent builds on the scale-lab cluster: 250 2xlarge workers.
#builds=250, 500, 750, 1000, 2000

1. Create n projects, each with a nodejs app; during creation, each build is triggered automatically once (a sketch of the per-project flow is below).
2. When all builds from step 1 have completed, trigger them again with `oc start-build -n ...`.
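
For reference, a minimal per-project sketch of the workflow above (project and template names follow the ones that appear later in this report; the real test drives hundreds of projects in parallel):

$ oc new-project svt-nodejs-1                                   # step 1: create a project
$ oc new-app --template=nodejs-mongodb-example -n svt-nodejs-1  # instantiating the app triggers build #1
$ oc get builds -n svt-nodejs-1 -w                              # wait for build #1 to complete
$ oc start-build nodejs-mongodb-example -n svt-nodejs-1         # step 2: re-trigger the build
$ oc get builds --all-namespaces | grep nodejs-mongodb-example  # monitor build status across projects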

We monitored the number of builds with status `Running`:
A. After the creation, we saw many builds with status `New`, e.g., `svt-nodejs-1987   nodejs-mongodb-example-1   Source   Git           New (CannotRetrieveServiceAccount)`
B. After the `oc start-build`, there were at most 200 builds Running (about 10% of the total); the rest were in `New` or `Pending`.

All builds completed eventually, but this is different from what we saw on OCP 3.11. It looks like there is some kind of throttling on the number of running builds (especially for B above).

Is this expected behavior, or is there some parameter we can tune?


How reproducible: always


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

###pbench node
# oc version
Client Version: version.Info{Major:"4", Minor:"1+", GitVersion:"v4.1.0", GitCommit:"da48e8391", GitTreeState:"", BuildDate:"2019-04-18T14:04:32Z", GoVersion:"", Compiler:"", Platform:""}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.4+81fc896", GitCommit:"81fc896", GitTreeState:"clean", BuildDate:"2019-04-21T23:18:54Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
###jump node for project>=250
# oc version
Client Version: version.Info{Major:"4", Minor:"1+", GitVersion:"v4.1.0-201904280051+49f08ba-dirty", GitCommit:"49f08ba", GitTreeState:"dirty", BuildDate:"2019-04-28T05:26:29Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.4+81fc896", GitCommit:"81fc896", GitTreeState:"clean", BuildDate:"2019-04-21T23:18:54Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

# oc get clusterversions.config.openshift.io 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-04-22-005054   True        False         5d15h   Cluster version is 4.1.0-0.nightly-2019-04-22-005054

# oc get pod -n openshift-image-registry image-registry-8ffd766d8-z6sld -o wide
NAME                             READY   STATUS    RESTARTS   AGE     IP             NODE                                         NOMINATED NODE   READINESS GATES
image-registry-8ffd766d8-z6sld   1/1     Running   2          5d21h   10.128.5.253   ip-10-0-129-208.us-west-2.compute.internal   <none>           <none>
# no build pods running on this node
# oc get node ip-10-0-129-208.us-west-2.compute.internal 
NAME                                         STATUS   ROLES   AGE     VERSION
ip-10-0-129-208.us-west-2.compute.internal   Ready    infra   5d21h   v1.13.4+da48e8391

# oc get configs.imageregistry.operator.openshift.io cluster -o json | jq -r .spec.storage
{
  "s3": {
    "bucket": "image-registry-us-west-2-XXX",
    "encrypt": true,
    "keyID": "",
    "region": "us-west-2",
    "regionEndpoint": ""
  }
}

Comment 1 Ben Parees 2019-04-30 12:37:48 UTC
If they are in `New (CannotRetrieveServiceAccount)`, it is because the SA controller has not created the SAs for your new project yet. That is either "working as designed" or a bug against the SA controller to create the SAs faster, depending on how much latency you are seeing between project creation and SA creation.

If they are in New for any other reason, we'd need to see the details of the build yaml and the controller logs to understand why.

If they are in Pending, you should look for the associated build pod and see why it is not running; most likely your cluster doesn't have enough resources to schedule the pod. The pod yaml/description plus events should provide this information.
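
For example, something along these lines should gather that (the namespace and build name here are placeholders; the build pod is named <build-name>-build):

$ oc get pod nodejs-mongodb-example-1-build -n svt-nodejs-1 -o yaml   # full pod spec + status
$ oc describe pod nodejs-mongodb-example-1-build -n svt-nodejs-1      # includes scheduling events
$ oc get events -n svt-nodejs-1 --sort-by=.lastTimestamp              # recent events in the namespace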

Comment 4 Ben Parees 2019-04-30 17:48:46 UTC
Based on these logs it looks like you may have attempted this multiple times.

Can we get a clean run, with timestamps for when you kicked off the run (created the buildconfigs)?

It would also help if you could enable loglevel=4 on the openshift controller manager. You can do this by:

oc edit openshiftcontrollermanagers.operator.openshift.io/cluster

set spec.logLevel: "4"

then wait for the controller manager pods to roll out.

Then kick off your run. Once you see some build pods in a Completed state with their corresponding build still reporting New or Pending, gather the controller logs and tell us the exact namespace+name of the build in question.
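
For reference, a sketch of those follow-up steps, assuming the operand pods run in the openshift-controller-manager namespace:

# after the `oc edit` above, the spec should contain:
#   spec:
#     logLevel: "4"
# watch the controller manager pods roll out:
$ oc get pods -n openshift-controller-manager -w
# once the run is done, gather the logs from each controller manager pod:
$ for p in $(oc get pods -n openshift-controller-manager -o name); do
    oc logs -n openshift-controller-manager $p > ${p#pod/}.log
  done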


My current theory (based on nothing in particular) is that the build controller is falling way behind on processing pod events and updating the builds. So it will also be useful to know whether, over time, your builds eventually make it into a Completed state.

Comment 5 Hongkai Liu 2019-04-30 18:28:05 UTC
# oc get pod -n svt-nodejs-861   nodejs-mongodb-example-8-build
NAME                             READY   STATUS      RESTARTS   AGE
nodejs-mongodb-example-8-build   0/1     Completed   0          4m55s
root@ip-172-31-19-86: ~ # 
root@ip-172-31-19-86: ~ # 
root@ip-172-31-19-86: ~ # oc get build -n svt-nodejs-861   nodejs-mongodb-example-8
NAME                       TYPE     FROM          STATUS    STARTED   DURATION
nodejs-mongodb-example-8   Source   Git@e59fe75   Pending             


logs are on the way.

Comment 6 Ben Parees 2019-04-30 22:41:29 UTC
Some data points/info on this:


The builds do eventually complete.  It appears we have a backlog of events to process and the build controller eventually works its way through them.  It is not clear (due to insufficient log tracing) whether the issue is that we aren't getting the events fast enough, or we aren't processing them fast enough.

We effectively process 4800 events in 5 minutes (both build and pod create/update events), which amounts to about 16/sec (4800 events / 300 s). Since we have 5 queue workers, that's really about 3/sec per worker, i.e. roughly 1/3 of a second per event. That sounds a bit slow to me, but there's quite a range of work to be done: in some cases a no-op, in other cases pods have to be created. It's possible we are also getting client-throttled, which makes the workers take longer to process events that require API calls.

based on output like this:
I0430 21:46:40.074363       1 request.go:530] Throttling request took 293.9029ms, request: GET:https://172.30.0.1:443/api/v1/namespaces/svt-nodejs-939/serviceaccounts/builder


I think we are getting throttled.  Need to figure out how to unthrottle the build controller's client.
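
One quick way to gauge how heavily the client is being throttled, per controller manager pod (the pod name is a placeholder):

$ oc logs -n openshift-controller-manager <controller-manager-pod> | grep -c "Throttling request took"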

Comment 8 Ben Parees 2019-04-30 23:04:23 UTC
David, did we change how client throttling works, or how client rate limiting is configured for controllers, in any meaningful way in 4.1?

If I'm reading the code correctly, it seems like all controllers in the process are sharing a single rate limit of 5 qps (burst 10) for each OpenShift client type (builds, images, etc.), and then sharing another 5/10 rate limit for all k8s resources.

Comment 9 Ben Parees 2019-05-01 13:59:37 UTC
OK, I dumped the rest config that the build controller is using; it looks like it is configured as:

QPS:16, Burst:31

Comment 10 Ben Parees 2019-05-01 16:52:36 UTC
https://github.com/openshift/origin/pull/22734

Comment 12 Hongkai Liu 2019-05-07 15:18:55 UTC
Tested with 11 m5.2xlarge workers (including one dedicated to the image-registry pod); the scale-lab cluster is gone.
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-05-07-085358   True        False         107m    Cluster version is 4.1.0-0.nightly-2019-05-07-085358

Results:
A much larger number of concurrent builds (500-700) showed up:
http://file.rdu.redhat.com/~hongkliu/test_result/bz1704722/20190507/Screenshot%20from%202019-05-07%2009-17-31.png
Although it did not reach the expected 1000, I think that is related to the limited compute resources.
http://file.rdu.redhat.com/~hongkliu/test_result/bz1704722/20190507/Screenshot%20from%202019-05-07%2009-53-52.png
The build pods stayed in Pending and Init status for quite a long time.
So this can only be fully confirmed once I get hold of the scale lab again next round.


That said, I am convinced that Problem B in this bug is verified.

During the test, I still saw Problem A: CannotRetrieveServiceAccount.
I will file a new bug if I encounter it again next round.

Comment 13 Hongkai Liu 2019-05-07 15:21:01 UTC
Restoring the needinfo flag.

Comment 14 Ben Parees 2019-05-07 15:29:16 UTC
The service account bit would need to go to the master team.  The issue there is that you are creating new projects and then, before the service accounts are created, you are creating builds in those projects.  The build needs the service account to proceed, so it sits in New until the SA exists.  It's not a bug per se, but if you think it's taking too long for the SAs to get created, then you'd have to ask the master team to look into it as a separate bug.

Comment 15 Hongkai Liu 2019-05-07 17:23:07 UTC
All clear, Ben.
Thanks for the information.
I am not concerned about the SA issue either (since it won't be a problem after project creation). I mentioned it here because we never hit it on OCP 3.11.

I am testing more with fewer concurrent builds (since I have a much smaller cluster now) and will compare the build/push times with the last round (since you fixed the threshold, I want to know if it has any impact). I will update here.

Comment 16 Ben Parees 2019-05-07 17:30:35 UTC
I'd be surprised if it has any impact on build/push time.

Comment 17 Hongkai Liu 2019-05-07 20:25:07 UTC
This is what I got before your fix, in the last round (4.0.0-0.nightly-2019-04-10-182914):
#500 (n=2) s3
2019-04-11 20:09:59,640 - build_test - MainThread - INFO - Failed builds: 2
2019-04-11 20:09:59,640 - build_test - MainThread - INFO - Invalid builds: 0
2019-04-11 20:09:59,640 - build_test - MainThread - INFO - Good builds included in stats: 998
2019-04-11 20:09:59,640 - build_test - MainThread - INFO - Average build time, all good builds: 1386
2019-04-11 20:09:59,640 - build_test - MainThread - INFO - Minimum build time, all good builds: 299
2019-04-11 20:09:59,640 - build_test - MainThread - INFO - Maximum build time, all good builds: 1850
2019-04-11 20:09:59,640 - build_test - MainThread - INFO - Average push time, all good builds: 13.8296593186
2019-04-11 20:09:59,640 - build_test - MainThread - INFO - Minimum push time, all good builds: 2.0
2019-04-11 20:09:59,640 - build_test - MainThread - INFO - Maximum push time, all good builds: 94.0

Result today, with your fix (4.1.0-0.nightly-2019-05-07-085358):
2019-05-07 19:03:14,219 - build_test - MainThread - INFO - Failed builds: 92
2019-05-07 19:03:14,219 - build_test - MainThread - INFO - Invalid builds: 0
2019-05-07 19:03:14,219 - build_test - MainThread - INFO - Good builds included in stats: 908
2019-05-07 19:03:14,219 - build_test - MainThread - INFO - Average build time, all good builds: 1374
2019-05-07 19:03:14,219 - build_test - MainThread - INFO - Minimum build time, all good builds: 719
2019-05-07 19:03:14,219 - build_test - MainThread - INFO - Maximum build time, all good builds: 2164
2019-05-07 19:03:14,219 - build_test - MainThread - INFO - Average push time, all good builds: 12.0947136564
2019-05-07 19:03:14,219 - build_test - MainThread - INFO - Minimum push time, all good builds: 2.0
2019-05-07 19:03:14,219 - build_test - MainThread - INFO - Maximum push time, all good builds: 77.0

The build/push times do not change much.

In the test we see the number of concurrent builds/pods at 250/500 ... exactly like 3.11.
http://file.rdu.redhat.com/~hongkliu/test_result/bz1704722/20190507/round2/

About the build time, results on OCP 3.11 (500 builds):
2018-09-26 16:25:15,514 - build_test - MainThread - INFO - Average build time, all good builds: 240
2018-09-26 16:25:15,514 - build_test - MainThread - INFO - Minimum build time, all good builds: 70
2018-09-26 16:25:15,514 - build_test - MainThread - INFO - Maximum build time, all good builds: 332
2018-09-26 16:25:15,514 - build_test - MainThread - INFO - Average push time, all good builds: 30.2643333333
2018-09-26 16:25:15,514 - build_test - MainThread - INFO - Minimum push time, all good builds: 1.0
2018-09-26 16:25:15,514 - build_test - MainThread - INFO - Maximum push time, all good builds: 147.0

The avg build time on 3.11 is much smaller, but I think that is caused by the docker -> cri-o/buildah change from 3.11 to 4.0.

Comment 19 errata-xmlrpc 2019-06-04 10:48:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

Comment 20 Red Hat Bugzilla 2023-09-14 05:27:45 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days