Bug 1809861

Summary: build failure reason not reported for image input failure
Product: OpenShift Container Platform
Reporter: Ben Parees <bparees>
Component: Build
Assignee: Gabe Montero <gmontero>
Status: CLOSED ERRATA
QA Contact: wewang <wewang>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.4
CC: aos-bugs, gmontero, wzheng
Target Milestone: ---
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Build failures caused by failed image imports or invalid Dockerfiles were only categorized as generic build errors.
Consequence: Users were forced to do an unreasonable amount of diagnosis of build logs (usually with non-standard logging levels) to determine an actionable root cause.
Fix: New failure reasons for failed image imports and invalid Dockerfiles were introduced at the API level and are now used by the OpenShift builder image.
Result: Users can now easily see in the status of their build objects whether a build failed because of a failed image import or an invalid Dockerfile.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-07-13 17:17:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Ben Parees 2020-03-04 03:31:23 UTC
Description of problem:
If a build fails because we could not extract input content from an image, the reason is reported as "genericbuildfailure", whereas if we fail to clone git content we report "fetchsourcefailure".

We should report a more suitable reason when the image fetch init container fails.


Version-Release number of selected component (if applicable):
4.3

How reproducible:
always

Steps to Reproduce:
1. Set up a BuildConfig that includes input from an image, but make the image inaccessible or the extract path invalid.
2. Run the build.


Actual results:
see a build failure reason of genericbuildfailure

Expected results:
a build failure reason that points to the failure that occurred during image content extraction.


Additional info:
compare:
https://github.com/openshift/builder/blob/adc71a7dbf07918033f3b06b75988a71fef5458f/pkg/build/builder/cmd/builder.go#L251-L257

vs:
https://github.com/openshift/builder/blob/adc71a7dbf07918033f3b06b75988a71fef5458f/pkg/build/builder/cmd/builder.go#L285-L292


We should also audit any other init container steps to ensure we are reporting reasons and messages for them.

Comment 1 Gabe Montero 2020-03-19 20:25:35 UTC
So there are 3 init containers for both source and docker strategy builds:

1) git clone
2) extract image
3) manage dockerfile

See https://github.com/openshift/openshift-controller-manager/blob/master/pkg/build/controller/strategy/sti.go#L120-L187 and https://github.com/openshift/openshift-controller-manager/blob/master/pkg/build/controller/strategy/docker.go#L114-L181

Ben covered the first two in his description.

For the third, see https://github.com/openshift/builder/blob/master/pkg/build/builder/source.go#L102-L126

Similar to extract image, it does not set the build status phase/reason/message.

And taking inventory of the existing reasons/messages at https://github.com/openshift/api/blob/master/build/v1/consts.go#L63-L157, I see nothing that could be used for extract image or manage dockerfile.

So I'll craft an openshift/api PR to add the new reason/message constants, and then an openshift/builder PR, with an API vendor bump and associated code changes,
to use the new constants.
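
A rough sketch of what that plan could look like is below. It is illustrative only and not the contents of either PR. The FetchImageContentFailed value matches the verification output later in this bug; the ManageDockerfileFailed name, the other reason values, the init container step names used as map keys, and the reasonForInitStep helper are all assumptions made for the example.

```go
package main

import "fmt"

// StatusReason is a string-typed failure reason, declared locally so the
// sketch is self-contained.
type StatusReason string

const (
	// Assumed spellings of the pre-existing reasons.
	StatusReasonGenericBuildFailed StatusReason = "GenericBuildFailed"
	StatusReasonFetchSourceFailed  StatusReason = "FetchSourceFailed"

	// New reason for the extract image step; this value appears in the
	// "oc get builds" output recorded in this bug.
	StatusReasonFetchImageContentFailed StatusReason = "FetchImageContentFailed"

	// New reason for the manage dockerfile step; the name is an assumption.
	StatusReasonManageDockerfileFailed StatusReason = "ManageDockerfileFailed"
)

// initStepReasons maps each of the three init container steps to the reason
// reported when it fails. The step names are hypothetical keys.
var initStepReasons = map[string]StatusReason{
	"git-clone":             StatusReasonFetchSourceFailed,
	"extract-image-content": StatusReasonFetchImageContentFailed,
	"manage-dockerfile":     StatusReasonManageDockerfileFailed,
}

// reasonForInitStep returns the specific reason for a failed init step,
// falling back to the generic reason for steps that have no mapping.
func reasonForInitStep(step string) StatusReason {
	if reason, ok := initStepReasons[step]; ok {
		return reason
	}
	return StatusReasonGenericBuildFailed
}

func main() {
	steps := []string{"git-clone", "extract-image-content", "manage-dockerfile", "unknown-step"}
	for _, step := range steps {
		fmt.Printf("%s -> %s\n", step, reasonForInitStep(step))
	}
}
```

Keeping the mapping in one table makes it easy to spot an init step that would still fall through to the generic reason.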

Comment 5 wewang 2020-03-23 07:03:28 UTC
Verified in version:
4.5.0-0.nightly-2020-03-22-175100

[wewang@wangwen ~]$ oc get builds
NAME                                   TYPE     FROM          STATUS                             STARTED          DURATION
statusfail-fetchimagecontentdocker-1   Docker                 Failed (FetchImageContentFailed)   15 seconds ago   14s

Comment 7 errata-xmlrpc 2020-07-13 17:17:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409