Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1596449

Summary: Build update that sets phase status failed doesn't contain log snippet data
Product: OpenShift Container Platform Reporter: Clayton Coleman <ccoleman>
Component: BuildAssignee: Ben Parees <bparees>
Status: CLOSED ERRATA QA Contact: wewang <wewang>
Severity: low Docs Contact:
Priority: unspecified    
Version: 3.10.0CC: aos-bugs, bparees, wewang
Target Milestone: ---   
Target Release: 3.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: The logic for updating the build status waited to update the lognsippet containing the tail of the build log only ran after the build status was updated to the failed state. Consequence: The build would first transition to failed, then get updated again with the log snippet. This means code watching for the build to enter a failed state would not see the logsnippet value populated initially. Fix: Code has been changed to populate the logsnippet field when the build transitions to failed, so the build update will contain both the failed state and the logsnippet. Result: Code that watches the build for a transition to failed state will see the logsnippet as part of the update that transitioned the build to failed, instead of seeing a secondary update later.
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-10-11 07:20:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Clayton Coleman 2018-06-29 03:17:41 UTC
I'm observing that when certain pod failures occur I see the build transition to failed status with an empty logSnippet, and then a fraction of a second later the logSnippet is set.

This makes trying to programmatically capture build logSnippet painful because i have to wait an arbitrary amount of time after seeing failed for a logSnippet that may or may not get set.  The logSnippet is useful to my client (ci-operator) so that I can convey that back to the user for summarizing why we're failing a build.

Is this something in the build controller, or something in the pod?

Expected outcome: phase set to Failed at the same time logSnippet is set.

Actual outcome: phase set to failed, and then a subsequent update sets logSnippet.

Impact: Makes it hard to have a good UI for a developer of openshift using our CI tools for when a build fails.

Comment 1 Clayton Coleman 2018-06-29 03:19:21 UTC
I *think* it's because setBuildCompletionData checks:

	if build.Status.Phase == buildapi.BuildPhaseFailed && len(build.Status.LogSnippet) == 0 &&

but should instead check the current and new phase

	if build.Status.Phase == buildapi.BuildPhaseFailed || update.getPhase() == buildapi.BuildPhaseFailed && len(build.Status.LogSnippet) == 0 && ...

Comment 2 Ben Parees 2018-07-11 20:03:39 UTC
https://github.com/openshift/origin/pull/20290

Comment 3 wewang 2018-08-27 08:46:47 UTC
verified in openshift v3.11.0-0.22.0

steps:
1. Download file application-template-dockerbuild.json and updated with wrong post-command
 -----------------

        "postCommit": {
          "args": ["bundle", "exec", "rake1", "test"]
        },
--------------------

2. Create apps
 $ oc new-app application-template-dockerbuild.json

3.Check the build 
$ oc get builds --watch
NAME                  TYPE      FROM          STATUS    STARTED          DURATION
ruby-sample-build-1   Docker    Git@7ccd324   Running   13 seconds ago   
ruby-sample-build-1   Docker    Git@7ccd324   Failed (PostCommitHookFailed)   19 seconds ago 

4. Check the build controller log in master 
[root@qe-wewang-master-etcd-1 ~]# oc logs pod/master-controllers-qe-wewang-master-etcd-1 -n kube-system |grep build_controller.go

I0827 08:23:25.120689       1 build_controller.go:340] Handling build wewang/ruby-sample-build-1 (Failed)
I0827 08:23:25.121315       1 build_controller.go:1192] Patching build wewang/ruby-sample-build-1 (Failed) with buildUpdate(logSnippet: "Successfully built 14c7c1584c28\nRunning post commit hook ...\nbundler: command not found: rake1\nInstall missing gem executables with `bundle install`\nerror: build error: container \"openshift_docker-build_ruby...wang_post-commit_289cef79\" returned non-zero exit code: 127")

@Ben Parees I saw the pr and thought when build controll set status to failed and at the same time gave logSnippet, do you help to check if my verified steps are correct? 
thanks.

Comment 4 Ben Parees 2018-08-27 14:15:37 UTC
Yes that looks fine, thanks!

Comment 6 errata-xmlrpc 2018-10-11 07:20:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652