1596449 – Build update that sets phase status failed doesn't contain log snippet data

Bug 1596449 - Build update that sets phase status failed doesn't contain log snippet data

Summary: Build update that sets phase status failed doesn't contain log snippet data

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Build
Sub Component:
Version:	3.10.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	low
Target Milestone:	---
Target Release:	3.11.0
Assignee:	Ben Parees
QA Contact:	wewang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2018-06-29 03:17 UTC by Clayton Coleman
Modified:	2018-10-11 07:21 UTC (History)
CC List:	3 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: The logic for updating the build status waited to update the lognsippet containing the tail of the build log only ran after the build status was updated to the failed state. Consequence: The build would first transition to failed, then get updated again with the log snippet. This means code watching for the build to enter a failed state would not see the logsnippet value populated initially. Fix: Code has been changed to populate the logsnippet field when the build transitions to failed, so the build update will contain both the failed state and the logsnippet. Result: Code that watches the build for a transition to failed state will see the logsnippet as part of the update that transitioned the build to failed, instead of seeing a secondary update later.
Clone Of:
Environment:
Last Closed:	2018-10-11 07:20:43 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2018:2652	0	None	None	None	2018-10-11 07:21:19 UTC

Description Clayton Coleman 2018-06-29 03:17:41 UTC

I'm observing that when certain pod failures occur I see the build transition to failed status with an empty logSnippet, and then a fraction of a second later the logSnippet is set.

This makes trying to programmatically capture build logSnippet painful because i have to wait an arbitrary amount of time after seeing failed for a logSnippet that may or may not get set.  The logSnippet is useful to my client (ci-operator) so that I can convey that back to the user for summarizing why we're failing a build.

Is this something in the build controller, or something in the pod?

Expected outcome: phase set to Failed at the same time logSnippet is set.

Actual outcome: phase set to failed, and then a subsequent update sets logSnippet.

Impact: Makes it hard to have a good UI for a developer of openshift using our CI tools for when a build fails.

Comment 1 Clayton Coleman 2018-06-29 03:19:21 UTC

I *think* it's because setBuildCompletionData checks:

	if build.Status.Phase == buildapi.BuildPhaseFailed && len(build.Status.LogSnippet) == 0 &&

but should instead check the current and new phase

	if build.Status.Phase == buildapi.BuildPhaseFailed || update.getPhase() == buildapi.BuildPhaseFailed && len(build.Status.LogSnippet) == 0 && ...

Comment 2 Ben Parees 2018-07-11 20:03:39 UTC

https://github.com/openshift/origin/pull/20290

Comment 3 wewang 2018-08-27 08:46:47 UTC

verified in openshift v3.11.0-0.22.0

steps:
1. Download file application-template-dockerbuild.json and updated with wrong post-command
 -----------------

        "postCommit": {
          "args": ["bundle", "exec", "rake1", "test"]
        },
--------------------

2. Create apps
 $ oc new-app application-template-dockerbuild.json

3.Check the build 
$ oc get builds --watch
NAME                  TYPE      FROM          STATUS    STARTED          DURATION
ruby-sample-build-1   Docker    Git@7ccd324   Running   13 seconds ago   
ruby-sample-build-1   Docker    Git@7ccd324   Failed (PostCommitHookFailed)   19 seconds ago 

4. Check the build controller log in master 
[root@qe-wewang-master-etcd-1 ~]# oc logs pod/master-controllers-qe-wewang-master-etcd-1 -n kube-system |grep build_controller.go

I0827 08:23:25.120689       1 build_controller.go:340] Handling build wewang/ruby-sample-build-1 (Failed)
I0827 08:23:25.121315       1 build_controller.go:1192] Patching build wewang/ruby-sample-build-1 (Failed) with buildUpdate(logSnippet: "Successfully built 14c7c1584c28\nRunning post commit hook ...\nbundler: command not found: rake1\nInstall missing gem executables with `bundle install`\nerror: build error: container \"openshift_docker-build_ruby...wang_post-commit_289cef79\" returned non-zero exit code: 127")

@Ben Parees I saw the pr and thought when build controll set status to failed and at the same time gave logSnippet, do you help to check if my verified steps are correct? 
thanks.

Comment 4 Ben Parees 2018-08-27 14:15:37 UTC

Yes that looks fine, thanks!

Comment 6 errata-xmlrpc 2018-10-11 07:20:43 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652

Note You need to log in before you can comment on or make changes to this bug.