Updated api.ci, got a panic in the build controller:

E0725 01:39:57.220107 1 runtime.go:66] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/tmp/openshift/build-rpms/rpm/BUILD/origin-3.11.0/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/tmp/openshift/build-rpms/rpm/BUILD/origin-3.11.0/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/tmp/openshift/build-rpms/rpm/BUILD/origin-3.11.0/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:573
/usr/local/go/src/runtime/panic.go:502
/usr/local/go/src/runtime/panic.go:63
/usr/local/go/src/runtime/signal_unix.go:388
/tmp/openshift/build-rpms/rpm/BUILD/origin-3.11.0/_output/local/go/src/github.com/openshift/origin/pkg/build/controller/build/build_controller.go:1020
/tmp/openshift/build-rpms/rpm/BUILD/origin-3.11.0/_output/local/go/src/github.com/openshift/origin/pkg/build/controller/build/build_controller.go:1043
/tmp/openshift/build-rpms/rpm/BUILD/origin-3.11.0/_output/local/go/src/github.com/openshift/origin/pkg/build/controller/build/build_controller.go:366
/tmp/openshift/build-rpms/rpm/BUILD/origin-3.11.0/_output/local/go/src/github.com/openshift/origin/pkg/build/controller/build/build_controller.go:285
/tmp/openshift/build-rpms/rpm/BUILD/origin-3.11.0/_output/local/go/src/github.com/openshift/origin/pkg/build/controller/build/build_controller.go:261
/tmp/openshift/build-rpms/rpm/BUILD/origin-3.11.0/_output/local/go/src/github.com/openshift/origin/pkg/build/controller/build/build_controller.go:246
/tmp/openshift/build-rpms/rpm/BUILD/origin-3.11.0/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/tmp/openshift/build-rpms/rpm/BUILD/origin-3.11.0/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/tmp/openshift/build-rpms/rpm/BUILD/origin-3.11.0/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/usr/local/go/src/runtime/asm_amd64.s:2361
This is with the latest master-branch images in origin.
The panic is because handleCompletedBuild() can be passed a nil pod if the pod has been deleted already.
Fix in https://github.com/openshift/origin/pull/20414
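For readers following along, the kind of guard that avoids this class of panic can be sketched as below. This is only an illustration, not the code from the pull request: the Pod and PodStatus types are hypothetical, stripped-down stand-ins for the Kubernetes pod type, and the body of isOOMKilled is a simplified assumption about what such a helper checks.

```go
package main

import "fmt"

// PodStatus is a hypothetical, minimal stand-in for the pod status struct.
type PodStatus struct {
	Reason string
}

// Pod is a hypothetical, minimal stand-in for the Kubernetes pod type.
type Pod struct {
	Status PodStatus
}

// isOOMKilled reports whether the pod was terminated for exceeding its
// memory limit. A nil pod (for example, one that was already deleted) is
// treated as "not OOM-killed" instead of being dereferenced.
func isOOMKilled(pod *Pod) bool {
	if pod == nil {
		return false
	}
	return pod.Status.Reason == "OOMKilled"
}

func main() {
	// A deleted pod arrives as nil; the guard keeps this from panicking.
	fmt.Println(isOOMKilled(nil))
	fmt.Println(isOOMKilled(&Pod{Status: PodStatus{Reason: "OOMKilled"}}))
}
```

Checking for nil inside the helper, rather than at every call site, keeps callers such as a completed-build handler unchanged.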
Hi @Clayton Coleman, as I understand it, when a build is in a terminal status (success, failed, or cancelled) and the build pod is then deleted, there should be no "Observed a panic" error, according to https://github.com/smarterclayton/origin/blob/5d12941b2ee9da5ce17c9ec296f483f363182dd5/pkg/build/controller/build/build_controller.go#L1012, right? I want to reproduce the bug in v3.10, and tried the following steps:

1. Create the app
$ oc new-app -f https://raw.githubusercontent.com/openshift/origin/master/examples/sample-app/application-template-dockerbuild.json
$ oc get builds
NAME TYPE FROM STATUS STARTED DURATION
ruby-sample-build-1 Source Git@7ccd324 Running 7 seconds ago

2. Cancel the build
$ oc cancel-build ruby-sample-build-1

3. Delete the build pod
$ oc delete pod ruby-sample-build-1-build

4. Check the logs of the bc
$ oc logs -f bc/ruby-sample-build

How can I get the log for the build controller?
The build controller logs are part of the master logs. (If you have a separate API server and controller server, they are part of the controller server logs.)
Hi @Ben Parees, I want to reproduce it in openshift v3.10.35 with the steps in Comment 5, but I cannot get the error "Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)". Could you check whether my steps need updating? I think it's hard to time the build pod deletion so that it happens before setBuildCompletionData. Thanks so much :>
FYI, please ignore step 4; I checked the build controller log on the controller server.
I think you can recreate this by just starting a build and as soon as the build starts running, deleting the build pod.
@Ben Parees,
a. I know why I cannot reproduce it in openshift v3.10.35: there is no isOOMKilled() function in buildconfig_controller.go.
b. v3.11.0-0.10.0 does have isOOMKilled (https://github.com/openshift/ose/blob/v3.11.0-0.10.0/pkg/build/controller/build/build_controller.go#L1020) and does not include PR 20414, but I still cannot reproduce the "Observed a panic" issue there. You can check my steps below.
c. Should I update api.ci to test the bug? I am not familiar with that. Or should we just check the latest v3.11 (v3.11.0-0.24.0) with my steps and confirm there is no panic in the build controller?

Here are my steps:

1. Start a build
$ oc new-build https://github.com/openshift/ruby-hello-world

2. While the build is running, delete the build pod
$ oc delete pod ruby-hello-world-1-build

3. Check the build
$ oc get builds
ruby-hello-world-1 Docker Git@7ccd324 Error (BuildPodDeleted) 15 seconds ago 8s

4. Check the build controller log on the controller server
[root@qe-wewang2-bugcheckmaster-etcd-nfs-1 ~]# oc logs pod/master-controllers-qe-wewang2-bugcheckmaster-etcd-nfs-1 -n kube-system --loglevel=8 |grep -i panic
After deleting the build pod, edit the build object (change an annotation or something). That should force the build through the "handleCompletedBuild" codepath, which will attempt to reference the pod and trigger the nil pointer error.
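As a rough model of why an edit after the pod deletion triggers the panic, the sketch below simulates the sequence with a tiny in-memory pod cache. Everything here is a simplified assumption for illustration: the Build and Pod types, the podStore map, and handleBuildUpdate are hypothetical, and handleCompletedBuild is only a stand-in that dereferences the pod the way an unguarded handler would.

```go
package main

import "fmt"

// Pod and Build are hypothetical, minimal stand-ins for the real API types.
type Pod struct{ Reason string }

type Build struct {
	Name  string
	Phase string
}

// podStore is a hypothetical in-memory pod cache keyed by pod name.
// Looking up a deleted pod yields a nil *Pod.
type podStore map[string]*Pod

// handleCompletedBuild stands in for an unguarded handler that reads from
// the pod without checking it for nil first.
func handleCompletedBuild(b *Build, pod *Pod) string {
	return pod.Reason // panics when pod is nil
}

// handleBuildUpdate models what happens when a completed build is updated:
// the pod is looked up and passed on. The deferred recover converts the
// panic into an error so the outcome can be observed.
func handleBuildUpdate(b *Build, pods podStore) (result string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("observed a panic: %v", r)
		}
	}()
	pod := pods[b.Name+"-build"]
	return handleCompletedBuild(b, pod), nil
}

func main() {
	build := &Build{Name: "ruby-hello-world-1", Phase: "Complete"}
	pods := podStore{"ruby-hello-world-1-build": {Reason: "Completed"}}

	// Pod still exists: the update is handled normally.
	fmt.Println(handleBuildUpdate(build, pods))

	// Pod deleted, then the build is edited: the handler receives nil.
	delete(pods, "ruby-hello-world-1-build")
	fmt.Println(handleBuildUpdate(build, pods))
}
```

Deleting the pod alone does nothing in this model; the panic only surfaces once a later update sends the completed build through the handler again, which matches the suggestion to edit the build after the deletion.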
Finally, I could reproduce the issue in openshift v3.11.0-0.10.0 and verified the fix in openshift v3.11.0-0.24.0. Thanks, Ben Parees! Here are my reproduction steps:

1. Create a build
$ oc new-build https://github.com/openshift/ruby-hello-world

2. When the build is complete, edit the build object
$ oc get builds
NAME TYPE FROM STATUS STARTED DURATION
ruby-hello-world-1 Docker Git@7ccd324 Complete About a minute ago 1m2s
$ oc edit build ruby-hello-world-1
status:
  completionTimestamp: 2018-08-29T08:19:40Z   # delete completionTimestamp

3. Delete the build pod
$ oc delete pod ruby-hello-world-1-build

4. The completionTimestamp should then be recreated automatically in the build object. Delete it manually again, which forces the build through the "handleCompletedBuild" codepath
$ oc edit build ruby-hello-world-1
status:
  completionTimestamp: 2018-08-29T08:24:34Z   # delete completionTimestamp

5. Check the build controller log on the controller server
$ oc logs pod/master-controllers-qe-wewang2-bugcheckmaster-etcd-nfs-1 -n kube-system --loglevel=8 |grep -A 15 -B 3 pointer
I0829 08:14:37.089836 1 build_controller.go:344] Handling build wen/ruby-hello-world-1 (Complete)
E0829 08:14:37.090007 1 runtime.go:66] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/builddir/build/BUILD/atomic-openshift-git-0.766dbc4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2652