Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1899820

Summary: Running s2i build via buildconfig takes 2-3 hours to complete for openshift 4.4.27
Product: OpenShift Container Platform Reporter: Amit Kumar Das <amdas>
Component: BuildAssignee: Adam Kaplan <adam.kaplan>
Status: CLOSED DUPLICATE QA Contact: XiuJuan Wang <xiuwang>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.4CC: adam.kaplan, aos-bugs, sthaha
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-12-14 18:10:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Amit Kumar Das 2020-11-20 05:59:04 UTC
Description of problem:
Azure RedHat OpenShift  (ARO)
Java build took 2-3 hours to finish.  The build image process took more than 2 hour.

Version-Release number of selected component (if applicable):

Cluster API Address: https://xxx.xxx.xx.com:6443
Cluster ID: c5ac2a2f-26cd-4e6e-9fa6-xxxxxxxx
Provider: Azure
OpenShift Version: 4.4.27

Details: Analyzing the build log shows 

# file: 0070-build-regen-s2i-11.yaml

### status section of the build

 86   duration: 5228000000000  ##  build took  87m ( 1h 27m )
 87   startTimestamp: '2020-11-17T21:39:27Z'
 88   stages:
 89     - name: FetchInputs
 90       startTime: '2020-11-17T21:39:30Z'
 91       durationMilliseconds: 560
  ...
 96     - name: PullImages
 97       startTime: '2020-11-17T21:39:33Z'
 98       durationMilliseconds: 183971   ## 3 mins
 ...
103     - name: Build
104       startTime: '2020-11-17T21:42:37Z'
105       durationMilliseconds: 3855336  ## Image build took: 3855.336s -> 64.25 mins  1h 4m
106       steps:
107         - name: DockerBuild
108           startTime: '2020-11-17T21:42:37Z'
109           durationMilliseconds: 3855336
110     - name: PushImage
111       startTime: '2020-11-17T22:46:52Z'
112       durationMilliseconds: 1182343  ## Pushing the image too: 19m
 ...
117 #  Pods lifecycle
118 #  New      '2020-11-17T21:39:27Z'
119 #  Pending  '2020-11-17T21:39:31Z'
120 #  Running  '2020-11-17T23:06:35Z'
121 #  Complete '2020-11-17T23:06:35Z'  # 21:39 - 23:06  -> 87m or 1h 27m (see: duration)


In below logs approximate ~68 mins are just wait time with no log entries. Mostly wait or delay is seem whenever blob or overlay is involved.

log between '2020-11-17T21:39’ till ‘2020-11-17T23:06’ (~87m build time).
LOG :  0080-regen-s2i-11.log 

## Approx ~17 min delay

#grep -i "2020-11-17T21" 0080-regen-s2i-11.log
\"created\":\"2020-11-17T21:42:38.770589487Z\",\"created_by\":\"/bin/sh -c #(nop) USER root\",\"empty_layer\":true},{
\"created\":\"2020-11-17T21:42:40.037909784Z\",\"created_by\":\"/bin/sh -c #(nop) COPY dir:deeacbaee7f76e446e4d6855a23d2b4d265c1392dfcce0bdd7646ae29b675363 in /tmp/src \",\"empty_layer\":true},{
\"created\":\"2020-11-17T21:42:41.237594799Z\",\"created_by\":\"/bin/sh -c chown -R 185:0 /tmp/src\",\"empty_layer\":true},{
\"created\":\"2020-11-17T21:42:41.237772811Z\",\"created_by\":\"/bin/sh -c #(nop) USER 185\",\"empty_layer\":true},{
\"created\":\"2020-11-17T21:58:08.138889668Z\",\"created_by\":\"/bin/sh -c /usr/local/s2i/assemble\",\"empty_layer\":true},{
\"created\":\"2020-11-17T21:58:08.180508258Z\",\"created_by\":\"/bin/sh -c #(nop) CMD /usr/local/s2i/run\"}]}"
\"created\":\"2020-11-17T21:58:08.180508258Z\",\"container\":\"193e6352fd230713694ec83d608bd99fadbd7e51b45ab798b64f9705b5154c48\",\"container_config\":{


## Approx ~11 min delay

#rep -i "2020-11-17T22" 0080-regen-s2i-11.log
ime="2020-11-17T22:08:22Z" level=debug msg="using \"/tmp/buildah074119291\" to hold temporary data"
time="2020-11-17T22:08:22Z" level=debug msg="Tar with options on /var/lib/containers/storage/overlay/9e58596e7c07b7e388803bb285098370e6b7c1d0dfb4c22b065629b635a17959/diff"
time="2020-11-17T22:19:35Z" level=debug msg="layer \"9e58596e7c07b7e388803bb285098370e6b7c1d0dfb4c22b065629b635a17959\" size is 18796365824 bytes"
time="2020-11-17T22:19:35Z" level=debug msg="OCIv1 config = {
time="2020-11-17T22:19:35Z" level=debug msg="OCIv1 manifest = {
time="2020-11-17T22:19:35Z" level=debug msg="Docker v2s2 config = {

## Approx ~17 min delay 

time="2020-11-17T22:19:35Z" level=debug msg="Skipping blob sha256:1b6a6dcaa1e6085e35334b0b1cdf174da94a31bb7c46a9ebbc88c7fa4d645009 (already present):"
time="2020-11-17T22:19:35Z" level=debug msg="reading layer \"sha256:5b2aa6832b08a06e3a1ade2bee40cdfc442c0dee523287a5817c7f4c70b382f1\""
time="2020-11-17T22:19:35Z" level=debug msg="No compression detected"
time="2020-11-17T22:19:35Z" level=debug msg="Using original blob without modification"
time="2020-11-17T22:35:13Z" level=debug msg="added blob \"sha256:5b2aa6832b08a06e3a1ade2bee40cdfc442c0dee523287a5817c7f4c70b382f1\" to the cache at \"/var/cache/blobs\""
time="2020-11-17T22:35:13Z" level=debug msg="finished reading layer \"sha256:5b2aa6832b08a06e3a1ade2bee40cdfc442c0dee523287a5817c7f4c70b382f1\""
time="2020-11-17T22:35:13Z" level=debug msg="No compression detected"

## Approx ~11 mins delay

time="2020-11-17T22:35:13Z" level=debug msg="Using original blob without modification"
time="2020-11-17T22:35:13Z" level=debug msg="added blob \"sha256:a629615e1df9988da42e117f5cee071b9ced25459c683710b1128a166026dfc6\" to the cache at \"/var/cache/blobs\""
time="2020-11-17T22:35:13Z" level=debug msg="Applying tar in /var/lib/containers/storage/overlay/8655b38682fa3f3e08128accbd130776d992e00f05265e01d500ee8430728ad1/diff"
time="2020-11-17T22:46:25Z" level=debug msg="setting image creation date to 2020-11-17 21:58:08.180508258 +0000 UTC"
time="2020-11-17T22:46:25Z" level=debug msg="created new image ID \"a629615e1df9988da42e117f5cee071b9ced25459c683710b1128a166026dfc6\""
time="2020-11-17T22:46:25Z" level=debug msg="set names of image \"a629615e1df9988da42e117f5cee071b9ced25459c683710b1128a166026dfc6\" to [temp.builder.openshift.io/nilpoc/regen-s2i-11:400e4465]"
time="2020-11-17T22:46:27Z" level=debug msg="saved image metadata \"{

## Approx ~12 mins delay

time="2020-11-17T22:53:50Z" level=debug msg="PATCH https://image-registry.openshift-image-registry.svc:5000/v2/nilpoc/regen-s2i/blobs/uploads/0d40d263-54a4-4fa6-8f58-85d43e0ce327?_state=kp-2UESQ97w26Vz139ellIxIZV0xTqd9LDlZ2gTDpAl7Ik5hbWUiOiJuaWxwb2MvcmVnZW4tczJpIiwiVVVJRCI6IjBkNDBkMjYzLTU0YTQtNGZhNi04ZjU4LTg1ZDQzZTBjZTMyNyIsIk9mZnNldCI6MCwiU3RhcnRlZEF0IjoiMjAyMC0xMS0xN1QyMjo1Mzo1MC43Njg5MTA5MDJaIn0%3D"
time="2020-11-17T22:53:50Z" level=debug msg="... already exists"
time="2020-11-17T22:53:50Z" level=debug msg="Skipping blob sha256:dda497eef22b8fc30d499a43904508a149bcff99bbce302ae1b6690b3358f2b9 (already present):"
Copying blob sha256:dda497eef22b8fc30d499a43904508a149bcff99bbce302ae1b6690b3358f2b9
time="2020-11-17T23:06:32Z" level=debug msg="GET https://image-registry.openshift-image-registry.svc:5000/openshift/token?account=serviceaccount&scope=repository%3Anilpoc%2Fregen-s2i%3Apull%2Cpush"
time="2020-11-17T23:06:32Z" level=debug msg="Increasing token expiration to: 60 seconds"



It seems from looking at the build logs that the issue could be due to a bug in an old buildah version that got shipped with 4.4x-4.5x but seems to be fixed in 4.6.1. 

Ref: https://access.redhat.com/solutions/5565271


However, upgrade to 4.6.1 is not possible since it is an ARO OCP4 cluster; therefore, enduser don't have control over the version. Seems it is bounded with specific ARO channels. We might need to backport the fix into 4.4x-4.5x . 


Additional Info:
Whenever build fails after 2-3 hours of wait, it give space error following error('no space left on device') seen in "pod-regen-s2i-7-build.yaml"


        ===========
          exitCode: 1
          reason: Error

            <..LINES REMOVED..>

            Error processing tar file(exit status 1): write
            /tmp/src/target/REGen2Web-0.0.1-SNAPSHOT/WEB-INF/lib/jvm-exports/jre-1.8.0/sasl-1.8.0.jar:
            no space left on device
          startedAt: '2020-11-11T21:39:35Z'
          finishedAt: '2020-11-11T22:35:00Z'
        ===========


The above build is using ephemeral-storage: 100Gi since it failed with 50Gi.

Comment 3 Amit Kumar Das 2020-11-25 01:42:28 UTC
Per customer, their build time was improved after adding the request-resource (memory, CPU, and ephemeral-storage) for the build. The build will take about ~ 1.5 hours. Earlier it was taking 3-4 hours. Enabling ephemeral-storage solved the issue with build error.

We had to enable the ephemeral storage technology preview. This feature is disabled by default. Per the documentation, It will be a problem to upgrade since the FeatureGates is enabled. We will need to wait for ARO to have OCP 4.6.1 to test and we may not be able to do that in the current ARO cluster since we enable the ephemeral-storage feature.

Any idea if the OCP 4.6.1 fixes[1] will be backport into 4.4x-4.5x ?


  [1]  https://access.redhat.com/solutions/5565271

  Bug 1833328 - [build] OpenShift Build Image - COPY Command is Very Slow
  Bug 1828119 - OpenShift Build Image - COPY Command is Very Slow  
  Bug 1813258 - Copy operation is very slow during a build from dockerfile with buildah-1.9.0-2.el7.x86_64 and
                buildah-1.11.6-8.el7_8.x86_64

Comment 4 Adam Kaplan 2020-12-14 18:10:15 UTC
After reviewing the changelogs for buildah, there are a few performance fixes that are available in the 1.14.11 that we can safely add to OCP 4.4.z. This includes the following:

1. blobcache: avoid an unnecessary NewImage() [1]
2. Fix .dockerignore with globs and ! commands [2]

Item 1 can be observed in the debug logs sent over, where up to 10 minutes is spent extracting blob layers.

There is another important fix in 1.14.11 that was added to OCP 4.5, but wasn't backported to 4.4.z (Make imagebuildah.BuildOptions.Architecture/OS optional). I've filed Bug #1907573 to track this backport, and am going to close this BZ as a duplicate.

[1] https://github.com/containers/buildah/releases/tag/v1.14.11
[2] https://github.com/containers/buildah/releases/tag/v1.14.4
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1907573

*** This bug has been marked as a duplicate of bug 1907573 ***