I'm observing that when certain pod failures occur I see the build transition to failed status with an empty logSnippet, and then a fraction of a second later the logSnippet is set.
This makes trying to programmatically capture build logSnippet painful because i have to wait an arbitrary amount of time after seeing failed for a logSnippet that may or may not get set. The logSnippet is useful to my client (ci-operator) so that I can convey that back to the user for summarizing why we're failing a build.
Is this something in the build controller, or something in the pod?
Expected outcome: phase set to Failed at the same time logSnippet is set.
Actual outcome: phase set to failed, and then a subsequent update sets logSnippet.
Impact: Makes it hard to have a good UI for a developer of openshift using our CI tools for when a build fails.
I *think* it's because setBuildCompletionData checks:
if build.Status.Phase == buildapi.BuildPhaseFailed && len(build.Status.LogSnippet) == 0 &&
but should instead check the current and new phase
if build.Status.Phase == buildapi.BuildPhaseFailed || update.getPhase() == buildapi.BuildPhaseFailed && len(build.Status.LogSnippet) == 0 && ...
verified in openshift v3.11.0-0.22.0
1. Download file application-template-dockerbuild.json and updated with wrong post-command
"args": ["bundle", "exec", "rake1", "test"]
2. Create apps
$ oc new-app application-template-dockerbuild.json
3.Check the build
$ oc get builds --watch
NAME TYPE FROM STATUS STARTED DURATION
ruby-sample-build-1 Docker Git@7ccd324 Running 13 seconds ago
ruby-sample-build-1 Docker Git@7ccd324 Failed (PostCommitHookFailed) 19 seconds ago
4. Check the build controller log in master
[root@qe-wewang-master-etcd-1 ~]# oc logs pod/master-controllers-qe-wewang-master-etcd-1 -n kube-system |grep build_controller.go
I0827 08:23:25.120689 1 build_controller.go:340] Handling build wewang/ruby-sample-build-1 (Failed)
I0827 08:23:25.121315 1 build_controller.go:1192] Patching build wewang/ruby-sample-build-1 (Failed) with buildUpdate(logSnippet: "Successfully built 14c7c1584c28\nRunning post commit hook ...\nbundler: command not found: rake1\nInstall missing gem executables with `bundle install`\nerror: build error: container \"openshift_docker-build_ruby...wang_post-commit_289cef79\" returned non-zero exit code: 127")
@Ben Parees I saw the pr and thought when build controll set status to failed and at the same time gave logSnippet, do you help to check if my verified steps are correct?
Yes that looks fine, thanks!
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.