Bug 1596449 - Build update that sets phase status failed doesn't contain log snippet data
Summary: Build update that sets phase status failed doesn't contain log snippet data
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Build
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
unspecified
low
Target Milestone: ---
: 3.11.0
Assignee: Ben Parees
QA Contact: wewang
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-06-29 03:17 UTC by Clayton Coleman
Modified: 2018-10-11 07:21 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The logic for updating the build status waited to update the lognsippet containing the tail of the build log only ran after the build status was updated to the failed state. Consequence: The build would first transition to failed, then get updated again with the log snippet. This means code watching for the build to enter a failed state would not see the logsnippet value populated initially. Fix: Code has been changed to populate the logsnippet field when the build transitions to failed, so the build update will contain both the failed state and the logsnippet. Result: Code that watches the build for a transition to failed state will see the logsnippet as part of the update that transitioned the build to failed, instead of seeing a secondary update later.
Clone Of:
Environment:
Last Closed: 2018-10-11 07:20:43 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:2652 0 None None None 2018-10-11 07:21:19 UTC

Description Clayton Coleman 2018-06-29 03:17:41 UTC
I'm observing that when certain pod failures occur I see the build transition to failed status with an empty logSnippet, and then a fraction of a second later the logSnippet is set.

This makes trying to programmatically capture build logSnippet painful because i have to wait an arbitrary amount of time after seeing failed for a logSnippet that may or may not get set.  The logSnippet is useful to my client (ci-operator) so that I can convey that back to the user for summarizing why we're failing a build.

Is this something in the build controller, or something in the pod?

Expected outcome: phase set to Failed at the same time logSnippet is set.

Actual outcome: phase set to failed, and then a subsequent update sets logSnippet.

Impact: Makes it hard to have a good UI for a developer of openshift using our CI tools for when a build fails.

Comment 1 Clayton Coleman 2018-06-29 03:19:21 UTC
I *think* it's because setBuildCompletionData checks:

	if build.Status.Phase == buildapi.BuildPhaseFailed && len(build.Status.LogSnippet) == 0 &&

but should instead check the current and new phase

	if build.Status.Phase == buildapi.BuildPhaseFailed || update.getPhase() == buildapi.BuildPhaseFailed && len(build.Status.LogSnippet) == 0 && ...

Comment 2 Ben Parees 2018-07-11 20:03:39 UTC
https://github.com/openshift/origin/pull/20290

Comment 3 wewang 2018-08-27 08:46:47 UTC
verified in openshift v3.11.0-0.22.0

steps:
1. Download file application-template-dockerbuild.json and updated with wrong post-command
 -----------------

        "postCommit": {
          "args": ["bundle", "exec", "rake1", "test"]
        },
--------------------

2. Create apps
 $ oc new-app application-template-dockerbuild.json

3.Check the build 
$ oc get builds --watch
NAME                  TYPE      FROM          STATUS    STARTED          DURATION
ruby-sample-build-1   Docker    Git@7ccd324   Running   13 seconds ago   
ruby-sample-build-1   Docker    Git@7ccd324   Failed (PostCommitHookFailed)   19 seconds ago 

4. Check the build controller log in master 
[root@qe-wewang-master-etcd-1 ~]# oc logs pod/master-controllers-qe-wewang-master-etcd-1 -n kube-system |grep build_controller.go

I0827 08:23:25.120689       1 build_controller.go:340] Handling build wewang/ruby-sample-build-1 (Failed)
I0827 08:23:25.121315       1 build_controller.go:1192] Patching build wewang/ruby-sample-build-1 (Failed) with buildUpdate(logSnippet: "Successfully built 14c7c1584c28\nRunning post commit hook ...\nbundler: command not found: rake1\nInstall missing gem executables with `bundle install`\nerror: build error: container \"openshift_docker-build_ruby...wang_post-commit_289cef79\" returned non-zero exit code: 127")

@Ben Parees I saw the pr and thought when build controll set status to failed and at the same time gave logSnippet, do you help to check if my verified steps are correct? 
thanks.

Comment 4 Ben Parees 2018-08-27 14:15:37 UTC
Yes that looks fine, thanks!

Comment 6 errata-xmlrpc 2018-10-11 07:20:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652


Note You need to log in before you can comment on or make changes to this bug.