Bug 1844469

Summary: s2i builds are failing on OCP 4.3.z that are successfully building on OCP 3.7
Product: OpenShift Container Platform Reporter: Anand Paladugu <apaladug>
Component: Build Assignee: Adam Kaplan <adam.kaplan>
Status: CLOSED ERRATA QA Contact: wewang <wewang>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.3.z CC: adam.kaplan, aos-bugs, gmontero, nalin, palonsor, wzheng
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: buildah had an extraneous call that read layers from its blob cache.
Consequence: layers could fail to read, particularly if a layer was large.
Fix: removed the extraneous call to read layers.
Result: buildah builds should succeed and not fail to read an image layer that had already been pulled.
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-10-27 16:05:27 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Anand Paladugu 2020-06-05 13:39:44 UTC
Description of problem:

s2i builds that build successfully on OCP 3.7 are failing on OCP 4.3.z

Version-Release number of selected component (if applicable):

OCP 4.3.z

How reproducible:

Readily

Steps to Reproduce:
1. Run the psap or common build

Actual results:

Builds that produce larger images are failing


Expected results:

All builds should succeed, as they did on OCP 3.7

Additional info:

1. Debug level 6 logs are attached to the case
2. Node and Pod have sufficient ephemeral storage
3. The Dockerfile is common to all builds; only the source code changes from build to build
4. The Dockerfile and build config are attached to the case

Comment 5 Gabe Montero 2020-06-05 17:36:41 UTC
OK I've attached the latest round of debug data from the customer.

There is a hiccup in buildah during the copy after the build completes.

As noted in the description, there certainly appears to be adequate storage available on the host.

For the last few weeks, Nalin from our buildah team has been in the middle of some buildah copy optimizations.

It is quite possible those would have bearing here.

Making him the owner (but will leave under OCP/build for now) so he can look at the data I attached.

Comment 6 Nalin Dahyabhai 2020-06-05 19:22:07 UTC
Some of the errors in the linked issue look similar to bug #1720730, though I'm not familiar enough with what's happening in the assemble scripts to be able to diagnose what part we're playing when they don't succeed.

Comment 7 Gabe Montero 2020-06-05 19:32:35 UTC
Nalin, https://github.com/inteliquent is accessible, but https://github.com/inteliquent/ng911-common from their build config is not.

Nor do I see it in the customer case attachments.

Let's ask for it:  

Anand - we need any s2i scripts that are related to the 

  name: common
  namespace: ng911

Build config.

Comment 12 Anand Paladugu 2020-06-10 13:35:30 UTC
@Nalin 

Please confirm whether the attachments are OK or whether you need any other info.

Comment 13 Anand Paladugu 2020-06-11 00:53:03 UTC
It looks like the customer was able to fix the issue by updating the base gradle-spring-boot image. The new image is optimized for space. The previous image was approximately 400MB; the new one is approximately 30MB, and all builds are succeeding.

I will probably close the case, but let me know if BZ still needs to be open.

Thx

Anand

Comment 14 Anand Paladugu 2020-06-11 14:59:39 UTC
The customer also posted this question today. I think he meant 400 GB.

"Although we have been able to work past our current issue. It appears there may be size limits and/or bugs related to image build size. We do not have plans to deploy images of 400MB anytime soon, but would like to understand if there are size limitations within OpenShift environment"

Comment 16 Anand Paladugu 2020-08-04 13:29:28 UTC
Adam: Any update on this ticket?

Comment 17 Adam Kaplan 2020-08-27 14:38:43 UTC
This bug was caused by an issue with buildah having an extraneous call to read an image from its blob cache [1]. This was fixed in buildah v1.14.11, which was vendored into OpenShift builds in 4.6.0 [2] and 4.5.z [3].

[1] https://github.com/containers/buildah/pull/2502
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1720730
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1868401

Comment 24 errata-xmlrpc 2020-10-27 16:05:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

Comment 25 Red Hat Bugzilla 2023-09-14 06:01:50 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days