Description of problem: Building the same application on 4.0 vs 3.11 takes significantly longer. Using the quickstarts as examples, the difference is: Building the same app on m5.xlarge AWS instances running in us-west-2 on 4.0 and 3.11 yields the following build times. The times below are a) first/last timestamp in build pod log, b) time reported in oc get builds and c) time reported by "time oc start-build --follow --timestamps". A sample pod lifecycle comparison is in Additional Information. 3.11 4.0 a b c a b c nodejs-mongodb-example 18s 22s 26s 32s 54s 58s django-psql-example 15s 26s 31s 38s 63s 68s rails-postgresql-ex 101s 112s 117s 113s 142s 145s dancer-mysql-example 114s 124s 129s 120s 149s 155s cakephp-mysql-example 37s 49s 53s 51s 81s 89s The time from start of Init to the build pod Running seems to consistently be 10 seconds longer on 4.0 regardless of build type. Version-Release number of selected component (if applicable): 4.0.0-0.nightly-2019-03-04-180718 How reproducible: Always Additional info: Sample timeline for cakephp. 3.11: nodejs-mongodb-example-1-build 0/1 Completed 0 11m nodejs-mongodb-example-2-build 0/1 Completed 0 7m nodejs-mongodb-example-3-build 0/1 Pending 0 0s nodejs-mongodb-example-3-build 0/1 Pending 0 0s nodejs-mongodb-example-3-build 0/1 Init:0/2 0 0s nodejs-mongodb-example-3-build 0/1 Init:0/2 0 3s nodejs-mongodb-example-3-build 0/1 Init:1/2 0 8s nodejs-mongodb-example-3-build 0/1 PodInitializing 0 10s nodejs-mongodb-example-3-build 1/1 Running 0 11s nodejs-mongodb-example-3-build 0/1 Completed 0 22s 4.0: nodejs-mongodb-example-3-build 0/1 Pending 0 0s nodejs-mongodb-example-3-build 0/1 Pending 0 0s nodejs-mongodb-example-3-build 0/1 Init:0/2 0 0s nodejs-mongodb-example-3-build 0/1 Init:0/2 0 8s nodejs-mongodb-example-3-build 0/1 Init:0/2 0 8s nodejs-mongodb-example-3-build 0/1 Init:1/2 0 19s nodejs-mongodb-example-3-build 0/1 PodInitializing 0 20s nodejs-mongodb-example-3-build 1/1 Running 0 21s nodejs-mongodb-example-3-build 0/1 Completed 0 53s nodejs-mongodb-example-3-build 0/1 Completed 0 53s django-psql-example: 3.11: django django-psql-example-2-build 0/1 Pending 0 0s django django-psql-example-2-build 0/1 Pending 0 0s django django-psql-example-2-build 0/1 Init:0/2 0 0s django django-psql-example-2-build 0/1 Init:0/2 0 3s django django-psql-example-2-build 0/1 Init:1/2 0 9s django django-psql-example-2-build 0/1 PodInitializing 0 10s django django-psql-example-2-build 1/1 Running 0 11s django django-psql-example-2-build 0/1 Completed 0 26s 4.0: django django-psql-example-2-build 0/1 Pending 0 1s django django-psql-example-2-build 0/1 Pending 0 1s django django-psql-example-2-build 0/1 Init:0/2 0 1s django django-psql-example-2-build 0/1 Init:0/2 0 9s django django-psql-example-2-build 0/1 Init:0/2 0 9s django django-psql-example-2-build 0/1 Init:1/2 0 20s django django-psql-example-2-build 0/1 PodInitializing 0 21s django django-psql-example-2-build 1/1 Running 0 23s django django-psql-example-2-build 0/1 Completed 0 64s django django-psql-example-2-build 0/1 Completed 0 64s
I did a test between 3.11 and 4.1 running the eap72-basic-s2i build. After warming up all of the nodes, I ran 5 builds of each with build-loglevel=5 and compared representative build output. I did not see some of the huge differences mentioned in comment 12, but did breakdown the differences. Most of the descrepancy can be explained by the upfront pull of the jboss-eap72-openshift image but there is a non-trivial difference in additional time in 4.1 after the BUILD SUCCESS message and before the push starts. Details: oc get builds -o wide 3.11: eap-app-9 Source Git@99fa61a Complete 3 minutes ago 36s 4.1: eap-app-9 Source Git@99fa61a Complete 2 minutes ago 1m30s Seconds to reach milestone in build: 3.11 4.1 Start -> Performing Maven build 19s 58s BUILD SUCCESS 7s 7s Pushing image... 1s 10s Push complete 6s 6s I'll attach the logs. Experiments continue.
Created attachment 1574916 [details] 3.11 and 4.1 logs at build-loglevel=5
The other phenomenon (which can be seen in https://bugzilla.redhat.com/show_bug.cgi?id=1704722#c17) is that the greater the number of concurrent builds started simultaneously on a node, the faster average build times degrade on 4.1. The initial pulls of the build images get longer the more concurrent builds are running. I'll try to profile the registry and network tomorrow, but my guess is general competition for pulling and writing to storage. Example: building cakephp avg build time - concurrent builds/node: 3.11 4.1 4.1 - time from build log start to "Starting S2I build" message 1 build/node 15s 46s 14s 3 build/node 13s 82s 50s 5 build/node 17s 108s 76s
Created attachment 1575325 [details] logs comparing binary builds in 3.11 and 4.1 Another example using binary builds. On 3.11 the build runs in 8-10 seconds and in 4.1 it is 55-65 seconds. The logs show the bulk of the difference in the initial pull of the jboss-eap71-openshift image and the delay before the start of the push.
Closing this as DEFERRED because there are multiple initiatives on this front aiming to improve build speeds. We have already seen a 50% improvement in build speed/scale between 4.1 and 4.2 due to fixes in buildah and crio. We still need to address the issue with providing a means of caching images on nodes and/or another persistent storage. This is being tracked in JIRA - https://jira.coreos.com/browse/DEVEXP-467