Bug 1415946

Summary: Should show build failure reason for docker build when builder image is wrong
Product: OpenShift Container Platform Reporter: Yadan Pei <yapei>
Component: BuildAssignee: Corey Daley <cdaley>
Status: CLOSED ERRATA QA Contact: Wang Haoran <haowang>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.5.0CC: aos-bugs, bparees, ipalade, tdawson
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Build failure reason was not getting set/saved correctly Consequence: Build failure reason could not be shown in command output Fix: Updated code to correctly save build failure reason Result: Build failure reason now shows correctly in command output
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-04-12 19:10:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Yadan Pei 2017-01-24 07:27:06 UTC
Description of problem:
Should show build failure reason for docker build when builder image is wrong.
When creating docker build with Wrong Post Commit/Wrong Source, detail failure reason is shown in 'oc get builds' output as expected, this should also applies to docker build when wrong builder image is given

Version-Release number of selected component (if applicable):
oc v3.5.0.7+390ef18
kubernetes v1.5.2+43a9be4

How reproducible:
Always

Steps to Reproduce:
1. Create docker build with Wrong Builder Image
$ wget https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/build/ruby22rhel7-template-docker.json
$ vi ruby22rhel7-template-docker.json
.....
"strategy": {
"type": "Docker",
"dockerStrategy": {
"from": {
"kind": "DockerImage",
"name": "docker.io/openshift/rubyyyy-20-centos7:latest"
},
........
$ oc new-app -f ruby22rhel7-template-docker.json
2.Wait for #1 build finish, check build status via CLI
$ oc get builds


Actual results:
2. Only show "Failed" in Status, detail failure reason was not shown
$ oc get builds
NAME                    TYPE      FROM          STATUS    STARTED         DURATION
ruby22-sample-build-1   Docker    Git@e79d887   Failed    3 minutes ago   11s

Expected results:
2. Should also show failure reason in output like 
$ oc get builds
NAME                    TYPE      FROM          STATUS    STARTED         DURATION
ruby22-sample-build-1   Docker    Git@e79d887   Failed (PullBuilderImageFailed)     3 minutes ago   11s


Additional info:

Comment 1 Victor Ionut Palade 2017-01-24 15:15:50 UTC
i hate to criticize, but the description of this issue makes very little sense.

Anyway, there's an actual test case that covers this exact problem, see: https://github.com/openshift/origin/blob/master/test/extended/builds/failure_status.go#L92-L107

The problem is a race condition in updating the build object and not a missing implementation.
Will update once i have a fix.

Comment 2 Ben Parees 2017-01-30 15:05:25 UTC
Victor what's the status on this bug?  Any theories on where/why the race is occurring?

Comment 3 Victor Ionut Palade 2017-01-30 20:08:53 UTC
i believe somewhere in the controller, we update the build object and overwrite whatever values we pushed to the API from the build pod.
see:
https://github.com/openshift/origin/blob/master/pkg/build/controller/controller.go#L134

Usually the fields are empty for actions involving the build process. s2i and docker strategy.
This happens very rarely and usually happens more often as a test flake than by running the build yourself.
The controller update, should either be updated to use something similar to this https://github.com/openshift/origin/blob/master/pkg/build/builder/sti.go#L121 
Or the only updates that are done to the build object should be done only from the controller itself. But, because the build pod is isolated, the controller doesn't know what happens to it unless it receives information from the API that is updated by the build pod.

Comment 4 Ben Parees 2017-01-30 20:34:53 UTC
the build status reason and build status message get cleared by:
https://github.com/openshift/origin/blob/master/pkg/build/controller/controller.go#L229-L230

but i can't think my way through a path where we'd have a build enter the failed state, and then still go through nextBuildPhase (which is normally only called for build objects that are in the New state. We wouldn't even have a pod yet if the build were in the New state.  And if the build were being updated elsewhere, then the controller's update would fail due to a version conflict, so it wouldn't be able to overwrite the reason/message fields)

Comment 5 Corey Daley 2017-02-08 21:34:05 UTC
Submitted PR https://github.com/openshift/origin/pull/12873

Comment 6 Corey Daley 2017-02-17 21:00:14 UTC
https://github.com/openshift/origin/pull/12873

Comment 7 Troy Dawson 2017-02-21 22:37:37 UTC
This has been merged into ocp and is in OCP v3.5.0.32 or newer.

Comment 9 Yadan Pei 2017-02-23 05:06:39 UTC
Checked against oc v3.5.0.32-1+4f84c83 with the same step in "Steps to Reproduce", after build finishes, 'oc get builds' returns Build Failure reason

$ oc get builds
NAME                    TYPE      FROM          STATUS                            STARTED              DURATION
ruby22-sample-build-1   Docker    Git@e79d887   Failed (PullBuilderImageFailed)   About a minute ago   


Move to VERIFIED

Comment 11 errata-xmlrpc 2017-04-12 19:10:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0884