Bug 1844469 - s2i builds are failing on OCP 4.3.z that are successfully building on OCP 3.7
Summary: s2i builds are failing on OCP 4.3.z that are successfully building on OCP 3.7
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Build
Version: 4.3.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Adam Kaplan
QA Contact: wewang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-06-05 13:39 UTC by Anand Paladugu
Modified: 2023-12-15 18:05 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: buildah had an extraneous call that read layers from its blob cache.
Consequence: layers could fail to read, particularly if a layer was large.
Fix: removed the extraneous call to read layers.
Result: buildah builds should succeed and not fail to read an image layer that had already been pulled.
Clone Of:
Environment:
Last Closed: 2020-10-27 16:05:27 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHBA-2020:4196
Private: 0
Priority: None
Status: None
Summary: None
Last Updated: 2020-10-27 16:05:49 UTC

Description Anand Paladugu 2020-06-05 13:39:44 UTC
Description of problem:

s2i builds that build successfully on OCP 3.7 are failing on OCP 4.3.z

Version-Release number of selected component (if applicable):

OCP 4.3.z

How reproducible:

Readily

Steps to Reproduce:
1. Run the psap or common builds

Actual results:

Builds that produce larger images are failing


Expected results:

All builds should succeed, as they did on OCP 3.7

Additional info:

1. Debug level 6 logs are attached to the case
2. Node and Pod have sufficient ephemeral storage
3. The Dockerfile is common to all builds; only the source code changes from build to build
4. The Dockerfile and build config are attached to the case
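
For anyone reproducing this, the report does not say how the level 6 logs in item 1 were collected. One common way, sketched below as an assumption, is to set the BUILD_LOGLEVEL environment variable in the build strategy of the BuildConfig. Only the name and namespace come from this report; the rest is an illustrative fragment, not the customer's actual build config.

```yaml
# Sketch only: strategy section of the "common" BuildConfig with the
# build log level raised to 6. Everything except name/namespace is assumed.
apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  name: common
  namespace: ng911
spec:
  strategy:
    type: Source
    sourceStrategy:
      env:
        - name: BUILD_LOGLEVEL
          value: "6"
      # from: (builder image reference omitted; see the attached build config)
```

The source, output, and builder image sections are left out on purpose; they should be taken from the build config attached to the case.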

Comment 5 Gabe Montero 2020-06-05 17:36:41 UTC
OK, I've attached the latest round of debug data from the customer.

There is a hiccup in buildah during the copy after the build completes.

As noted in the description, there certainly appears to be adequate storage available on the host.

For the last few weeks, Nalin from our buildah team has been in the middle of some buildah copy optimizations.

It is quite possible those would have bearing here.

Making him the owner (but will leave under OCP/build for now) so he can look at the data I attached.

Comment 6 Nalin Dahyabhai 2020-06-05 19:22:07 UTC
Some of the errors in the linked issue look similar to bug #1720730, though I'm not familiar enough with what's happening in the assemble scripts to be able to diagnose what part we're playing when they don't succeed.

Comment 7 Gabe Montero 2020-06-05 19:32:35 UTC
Nalin, https://github.com/inteliquent is accessible, but https://github.com/inteliquent/ng911-common from their build config is not.

Nor do I see it in the customer case attachments.

Let's ask for it:  

Anand - we need any s2i scripts that are related to the

  name: common
  namespace: ng911

Build config.

Comment 12 Anand Paladugu 2020-06-10 13:35:30 UTC
@Nalin 

Please confirm whether the attachments are OK or whether you need any other info.

Comment 13 Anand Paladugu 2020-06-11 00:53:03 UTC
It looks like the customer was able to fix the issue by updating the base gradle-spring-boot image. The new image is optimized for space. The previous image was approximately 400MB; the new one is approximately 30MB, and all builds are succeeding.

I will probably close the case, but let me know if BZ still needs to be open.

Thx

Anand

Comment 14 Anand Paladugu 2020-06-11 14:59:39 UTC
The customer posted this question today. I think he meant 400 GB.

"Although we have been able to work past our current issue. It appears there may be size limits and/or bugs related to image build size. We do not have plans to deploy images of 400MB anytime soon, but would like to understand if there are size limitations within OpenShift environment"

Comment 16 Anand Paladugu 2020-08-04 13:29:28 UTC
Adam: Any update on this ticket?

Comment 17 Adam Kaplan 2020-08-27 14:38:43 UTC
This bug was caused by an issue with buildah having an extraneous call to read an image from its blob cache [1]. This was fixed in buildah v1.14.11, which was vendored into OpenShift builds in 4.6.0 [2] and 4.5.z [3].

[1] https://github.com/containers/buildah/pull/2502
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1720730
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1868401

Comment 24 errata-xmlrpc 2020-10-27 16:05:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

Comment 25 Red Hat Bugzilla 2023-09-14 06:01:50 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.
