Bug 1685352 - S2I build times 1.25x - 5x longer on 4.1 compared to 3.11
Summary: S2I build times 1.25x - 5x longer on 4.1 compared to 3.11
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Build
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
urgent
high
Target Milestone: ---
: 4.4.0
Assignee: Adam Kaplan
QA Contact: wewang
URL:
Whiteboard: aos-scalability-40
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-03-05 02:26 UTC by Mike Fiedler
Modified: 2024-03-25 15:14 UTC (History)
15 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-12-03 15:18:28 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
3.11 and 4.1 logs at build-loglevel=5 (2.40 MB, application/x-tar)
2019-05-29 16:51 UTC, Mike Fiedler
no flags Details
logs comparing binary builds in 3.11 and 4.1 (50.00 KB, application/x-tar)
2019-05-30 17:58 UTC, Mike Fiedler
no flags Details

Description Mike Fiedler 2019-03-05 02:26:35 UTC
Description of problem:

Building the same application on 4.0 vs 3.11 takes significantly longer.   Using the quickstarts as examples, the difference is:

Building the same app on m5.xlarge AWS instances running in us-west-2 on 4.0 and 3.11 yields the following build times.   The times below are a) first/last timestamp in build pod log, b) time reported in oc get builds and c) time reported by "time oc start-build --follow --timestamps".    A sample pod lifecycle comparison is in Additional Information.
                                  3.11                       4.0
                          a        b         c       a       b        c
nodejs-mongodb-example   18s      22s       26s     32s     54s      58s
django-psql-example      15s      26s       31s     38s     63s      68s
rails-postgresql-ex     101s     112s      117s    113s    142s     145s
dancer-mysql-example    114s     124s      129s    120s    149s     155s
cakephp-mysql-example    37s      49s       53s     51s     81s      89s


The time from start of Init to the build pod Running seems to consistently be 10 seconds longer on 4.0 regardless of build type.

Version-Release number of selected component (if applicable):  4.0.0-0.nightly-2019-03-04-180718


How reproducible: Always


Additional info:

Sample timeline for cakephp.  

3.11:

nodejs-mongodb-example-1-build   0/1       Completed   0          11m
nodejs-mongodb-example-2-build   0/1       Completed   0          7m
nodejs-mongodb-example-3-build   0/1       Pending   0         0s
nodejs-mongodb-example-3-build   0/1       Pending   0         0s
nodejs-mongodb-example-3-build   0/1       Init:0/2   0         0s
nodejs-mongodb-example-3-build   0/1       Init:0/2   0         3s
nodejs-mongodb-example-3-build   0/1       Init:1/2   0         8s
nodejs-mongodb-example-3-build   0/1       PodInitializing   0         10s
nodejs-mongodb-example-3-build   1/1       Running   0         11s
nodejs-mongodb-example-3-build   0/1       Completed   0         22s


4.0:

nodejs-mongodb-example-3-build   0/1   Pending   0     0s
nodejs-mongodb-example-3-build   0/1   Pending   0     0s
nodejs-mongodb-example-3-build   0/1   Init:0/2   0     0s
nodejs-mongodb-example-3-build   0/1   Init:0/2   0     8s
nodejs-mongodb-example-3-build   0/1   Init:0/2   0     8s
nodejs-mongodb-example-3-build   0/1   Init:1/2   0     19s
nodejs-mongodb-example-3-build   0/1   PodInitializing   0     20s
nodejs-mongodb-example-3-build   1/1   Running   0     21s
nodejs-mongodb-example-3-build   0/1   Completed   0     53s
nodejs-mongodb-example-3-build   0/1   Completed   0     53s

django-psql-example:

3.11:

django    django-psql-example-2-build   0/1       Pending   0         0s
django    django-psql-example-2-build   0/1       Pending   0         0s
django    django-psql-example-2-build   0/1       Init:0/2   0         0s
django    django-psql-example-2-build   0/1       Init:0/2   0         3s
django    django-psql-example-2-build   0/1       Init:1/2   0         9s
django    django-psql-example-2-build   0/1 PodInitializing   0         10s
django    django-psql-example-2-build   1/1       Running   0         11s
django    django-psql-example-2-build   0/1       Completed   0         26s


4.0:

django   django-psql-example-2-build   0/1   Pending   0     1s
django   django-psql-example-2-build   0/1   Pending   0     1s
django   django-psql-example-2-build   0/1   Init:0/2   0     1s
django   django-psql-example-2-build   0/1   Init:0/2   0     9s
django   django-psql-example-2-build   0/1   Init:0/2   0     9s
django   django-psql-example-2-build   0/1   Init:1/2   0     20s
django   django-psql-example-2-build   0/1   PodInitializing   0     21s
django   django-psql-example-2-build   1/1   Running   0     23s
django   django-psql-example-2-build   0/1   Completed   0     64s
django   django-psql-example-2-build   0/1   Completed   0     64s

Comment 14 Mike Fiedler 2019-05-29 16:48:51 UTC
I did a test between 3.11 and 4.1 running the eap72-basic-s2i build.   After warming up all of the nodes, I ran 5 builds of each with build-loglevel=5 and compared representative build output.   I did not see some of the huge differences mentioned in comment 12, but did breakdown the differences.   Most of the descrepancy can be explained by the upfront pull of the jboss-eap72-openshift image but there is a non-trivial difference in additional time in 4.1 after the BUILD SUCCESS message and before the push starts.

Details:

oc get builds -o wide

3.11: eap-app-9   Source   Git@99fa61a   Complete   3 minutes ago       36s
4.1:  eap-app-9   Source   Git@99fa61a   Complete   2 minutes ago       1m30s 

Seconds to reach milestone in build:
                                           3.11             4.1
Start -> Performing Maven build             19s             58s
BUILD SUCCESS                                7s              7s
Pushing image...                             1s             10s
Push complete                                6s              6s

I'll attach the logs.   Experiments continue.

Comment 15 Mike Fiedler 2019-05-29 16:51:51 UTC
Created attachment 1574916 [details]
3.11 and 4.1 logs at build-loglevel=5

Comment 16 Mike Fiedler 2019-05-30 00:52:46 UTC
The other phenomenon (which can be seen in https://bugzilla.redhat.com/show_bug.cgi?id=1704722#c17) is that the greater the number of concurrent builds started simultaneously on a node, the faster average build times degrade on 4.1.   The initial pulls of the build images get longer the more concurrent builds are running.   I'll try to profile the registry and network tomorrow, but my guess is general competition for pulling and writing to storage.

Example:  building cakephp avg build time - concurrent builds/node:
                   3.11               4.1         4.1 - time from build log start to "Starting S2I build" message
1 build/node        15s               46s              14s      
3 build/node        13s               82s              50s
5 build/node        17s              108s              76s

Comment 17 Mike Fiedler 2019-05-30 17:58:16 UTC
Created attachment 1575325 [details]
logs comparing binary builds in 3.11 and 4.1

Another example using binary builds.   On 3.11 the build runs in 8-10 seconds and in 4.1 it is 55-65 seconds.   The logs show the bulk of the difference in the initial pull of the jboss-eap71-openshift image and the delay before the start of the push.

Comment 21 Adam Kaplan 2019-12-03 15:18:28 UTC
Closing this as DEFERRED because there are multiple initiatives on this front aiming to improve build speeds.
We have already seen a 50% improvement in build speed/scale between 4.1 and 4.2 due to fixes in buildah and crio.

We still need to address the issue with providing a means of caching images on nodes and/or another persistent storage.
This is being tracked in JIRA - https://jira.coreos.com/browse/DEVEXP-467


Note You need to log in before you can comment on or make changes to this bug.