Bug 1685352
Summary: | S2I build times 1.25x - 5x longer on 4.1 compared to 3.11 | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Mike Fiedler <mifiedle>
Component: | Build | Assignee: | Adam Kaplan <adam.kaplan>
Status: | CLOSED DEFERRED | QA Contact: | wewang <wewang>
Severity: | high | Docs Contact: |
Priority: | urgent | |
Version: | 4.1.0 | CC: | aos-bugs, bparees, dornelas, dwalsh, erich, hongkliu, nstielau, ochaloup, pkremens, pthomas, rheinzma, ricferna, skordas, vcojot, wzheng
Target Milestone: | --- | |
Target Release: | 4.4.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | aos-scalability-40 | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2019-12-03 15:18:28 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Mike Fiedler
2019-03-05 02:26:35 UTC
I did a test between 3.11 and 4.1 running the eap72-basic-s2i build. After warming up all of the nodes, I ran 5 builds of each with build-loglevel=5 and compared representative build output. I did not see some of the huge differences mentioned in comment 12, but I did break down the differences. Most of the discrepancy can be explained by the upfront pull of the jboss-eap72-openshift image, but there is also a non-trivial amount of additional time in 4.1 after the BUILD SUCCESS message and before the push starts.

Details:

oc get builds -o wide

3.11: eap-app-9   Source   Git@99fa61a   Complete   3 minutes ago   36s
4.1:  eap-app-9   Source   Git@99fa61a   Complete   2 minutes ago   1m30s

Seconds to reach milestone in build:

                                   3.11   4.1
Start -> Performing Maven build    19s    58s
BUILD SUCCESS                      7s     7s
Pushing image...                   1s     10s
Push complete                      6s     6s

I'll attach the logs. Experiments continue.

Created attachment 1574916 [details]
3.11 and 4.1 logs at build-loglevel=5
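The per-milestone figures in the comment above can be compared with a few lines of Python. This is only a sketch for reading the numbers, not part of the original test: the timings are copied from the milestone table, while the `PHASES` layout and the helper names `phase_deltas` and `share_of_slowdown` are illustrative assumptions.

```python
# Timings in seconds, copied from the milestone table in the comment above.
# Each entry is (milestone, 3.11 seconds, 4.1 seconds).
# The data layout and helper names are illustrative, not OpenShift tooling.
PHASES = [
    ("Start -> Performing Maven build", 19, 58),
    ("BUILD SUCCESS", 7, 7),
    ("Pushing image...", 1, 10),
    ("Push complete", 6, 6),
]


def phase_deltas(phases):
    """Return (milestone, extra seconds on 4.1) for each milestone."""
    return [(name, t41 - t311) for name, t311, t41 in phases]


def share_of_slowdown(phases):
    """Return each milestone's fraction of the total extra time on 4.1."""
    deltas = phase_deltas(phases)
    total = sum(delta for _, delta in deltas)
    return {name: delta / total for name, delta in deltas}


if __name__ == "__main__":
    for name, delta in phase_deltas(PHASES):
        print(f"{name:35s} +{delta}s")
    for name, share in share_of_slowdown(PHASES).items():
        print(f"{name:35s} {share:.0%} of the extra time")
```

With these figures, 39 of the 48 extra seconds fall before the Maven build starts, and the remaining 9 seconds fall between BUILD SUCCESS and the start of the push.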
The other phenomenon (which can be seen in https://bugzilla.redhat.com/show_bug.cgi?id=1704722#c17) is that the more concurrent builds are started simultaneously on a node, the more average build times degrade on 4.1. The initial pulls of the build images get longer as more concurrent builds run. I'll try to profile the registry and network tomorrow, but my guess is general contention for pulling images and writing to storage.

Example: building cakephp, average build time by number of concurrent builds per node. The last column is the 4.1 time from the start of the build log to the "Starting S2I build" message.

                3.11   4.1    4.1 (log start to "Starting S2I build")
1 build/node    15s    46s    14s
3 build/node    13s    82s    50s
5 build/node    17s    108s   76s

Created attachment 1575325 [details]
logs comparing binary builds in 3.11 and 4.1
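The cakephp concurrency figures in the comment above can be broken down the same way. The sketch below is an illustration only: the timings are copied from that table, and the `CAKEPHP` layout and the `summarize` helper are assumed names, not part of any OpenShift tool.

```python
# Average cakephp build times in seconds, copied from the comment above.
# builds per node -> (3.11 total, 4.1 total, 4.1 time before "Starting S2I build")
# The data layout and helper name are illustrative assumptions.
CAKEPHP = {
    1: (15, 46, 14),
    3: (13, 82, 50),
    5: (17, 108, 76),
}


def summarize(results):
    """Return, per concurrency level, the 4.1/3.11 ratio and the 4.1 time
    remaining after the "Starting S2I build" message."""
    summary = {}
    for builds_per_node, (t311, t41, t41_startup) in sorted(results.items()):
        summary[builds_per_node] = {
            "slowdown": t41 / t311,
            "after_startup": t41 - t41_startup,
        }
    return summary


if __name__ == "__main__":
    for n, row in summarize(CAKEPHP).items():
        print(f"{n} build(s)/node: {row['slowdown']:.1f}x slower on 4.1, "
              f"{row['after_startup']}s after S2I start")
```

On these numbers the 4.1 time after the "Starting S2I build" message is 32s at every concurrency level, so the growth in total build time comes entirely from the phase before S2I starts, which is consistent with the image pull explanation given above.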
Another example using binary builds. On 3.11 the build runs in 8-10 seconds; on 4.1 it takes 55-65 seconds. The logs show that the bulk of the difference is in the initial pull of the jboss-eap71-openshift image and in the delay before the start of the push.
Closing this as DEFERRED because there are multiple initiatives under way that aim to improve build speeds. We have already seen a 50% improvement in build speed/scale between 4.1 and 4.2 due to fixes in buildah and crio. We still need to provide a means of caching images on nodes and/or on other persistent storage. This is being tracked in JIRA - https://jira.coreos.com/browse/DEVEXP-467