Created attachment 1488287
cluster-monitoring-operator pod logs

Description of problem:
cluster-monitoring-operator pods in CrashLoopBackOff, image wrongly packaged 4.0 Telemeter client

# oc -n openshift-monitoring get pod
.....
cluster-monitoring-operator-56bb5946c4-49v8n   0/1   CrashLoopBackOff   12   1h

# oc -n openshift-monitoring logs cluster-monitoring-operator-56bb5946c4-49v8n
I0929 03:13:21.844419       1 tasks.go:37] running task Updating Telemeter client
I0929 03:13:21.844509       1 decoder.go:224] decoding stream as YAML
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x10d344d]

goroutine 36 [running]:
github.com/openshift/cluster-monitoring-operator/pkg/manifests.(*Factory).TelemeterClientServiceMonitor(0xc4204ccc60, 0xc4200ae840, 0xc420a51cc0, 0x6ae93a)
    /go/src/github.com/openshift/cluster-monitoring-operator/pkg/manifests/manifests.go:1441 +0x11d
github.com/openshift/cluster-monitoring-operator/pkg/tasks.(*TelemeterClientTask).Run(0xc42036a0c0, 0x1, 0x1)
    /go/src/github.com/openshift/cluster-monitoring-operator/pkg/tasks/telemeter.go:36 +0x33
github.com/openshift/cluster-monitoring-operator/pkg/tasks.(*TaskRunner).ExecuteTask(0xc420a51eb8, 0xc4204cce40, 0xf, 0xc420a51d60)
    /go/src/github.com/openshift/cluster-monitoring-operator/pkg/tasks/tasks.go:48 +0x34
github.com/openshift/cluster-monitoring-operator/pkg/tasks.(*TaskRunner).RunAll(0xc420a51eb8, 0xc420473a80, 0x53f961cb21d1086d)
    /go/src/github.com/openshift/cluster-monitoring-operator/pkg/tasks/tasks.go:38 +0x141
github.com/openshift/cluster-monitoring-operator/pkg/operator.(*Operator).sync(0xc42017bf00, 0xc4203b0d50, 0x2e, 0x11a2f40, 0xc42042b790)
    /go/src/github.com/openshift/cluster-monitoring-operator/pkg/operator/operator.go:251 +0x828
github.com/openshift/cluster-monitoring-operator/pkg/operator.(*Operator).processNextWorkItem(0xc42017bf00, 0xc420037f00)
    /go/src/github.com/openshift/cluster-monitoring-operator/pkg/operator/operator.go:201 +0xfb
github.com/openshift/cluster-monitoring-operator/pkg/operator.(*Operator).worker(0xc42017bf00)
    /go/src/github.com/openshift/cluster-monitoring-operator/pkg/operator/operator.go:171 +0x15a
created by github.com/openshift/cluster-monitoring-operator/pkg/operator.(*Operator).Run
    /go/src/github.com/openshift/cluster-monitoring-operator/pkg/operator/operator.go:130 +0x1e2

Version-Release number of selected component (if applicable):
ose-cluster-monitoring-operator/images/v3.11.17-1

# openshift version
openshift v3.11.17

How reproducible:
Always

Steps to Reproduce:
1. Deploy cluster monitoring upon openshift v3.11.17

Actual results:
cluster-monitoring-operator pods in CrashLoopBackOff, image wrongly packaged 4.0 Telemeter client

Expected results:
cluster-monitoring-operator pods should be running well.

Additional info:
Please don't package 3.11 images from the master branch; the master branch is 4.0 now. This issue blocks the etcd monitoring function.
Assigning to lserven.
This issue is the result of a bug introduced in https://github.com/openshift/cluster-monitoring-operator/pull/103. A follow-up PR was made to correct the issue: https://github.com/openshift/cluster-monitoring-operator/pull/109. That PR depended on upstream fixes in https://github.com/openshift/telemeter/pull/35, but the patches were merged out of order. Finally, one last PR, https://github.com/openshift/cluster-monitoring-operator/pull/110, was made to correct everything. From my tests, CMO master is stable again. Please note that CMO master is 4.0 and _not_ 3.11. Please verify again to confirm that the issue is fixed.
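The report does not show what was nil at manifests.go:1441, so the following is only a minimal sketch of the general failure mode behind a "nil pointer dereference" panic, and of a guard that returns an error instead. All types and functions here (ServiceMonitor, Factory, decode, unsafeServiceMonitor, safeServiceMonitor) are hypothetical stand-ins and are not taken from the cluster-monitoring-operator source.

```go
package main

import (
	"errors"
	"fmt"
)

// ServiceMonitor and Factory are hypothetical stand-ins, not the CMO types.
type ServiceMonitor struct {
	Namespace string
}

type Factory struct {
	namespace string
}

// decode simulates a manifest decoder that yields nil when the asset is empty.
func decode(asset []byte) *ServiceMonitor {
	if len(asset) == 0 {
		return nil
	}
	return &ServiceMonitor{}
}

// unsafeServiceMonitor writes through the decoded pointer without checking it.
// When decode returns nil, the field write panics with a nil pointer dereference.
func (f *Factory) unsafeServiceMonitor(asset []byte) *ServiceMonitor {
	sm := decode(asset)
	sm.Namespace = f.namespace
	return sm
}

// safeServiceMonitor checks the pointer first and reports an error instead of panicking.
func (f *Factory) safeServiceMonitor(asset []byte) (*ServiceMonitor, error) {
	sm := decode(asset)
	if sm == nil {
		return nil, errors.New("service monitor manifest missing or empty")
	}
	sm.Namespace = f.namespace
	return sm, nil
}

func main() {
	f := &Factory{namespace: "openshift-monitoring"}
	if _, err := f.safeServiceMonitor(nil); err != nil {
		fmt.Println("error:", err)
	}
}
```

With a guard of this shape, a task runner could log the error and retry on the next sync rather than crashing the whole pod; whether that matches the actual fix in the PRs above is not something this report confirms.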
*** Bug 1635103 has been marked as a duplicate of this bug. ***
As noted in 1635103, the root cause of these issues is that 3.11 OCP images are incorrectly being built from the master branch of the Cluster Monitoring Operator rather than the release-3.11 branch. The commit that caused this external crash should never have ended up in 3.11. The master branch switched to 4.0 development some time ago.
I have just made a PR to the OCP images repo to fix the branch for cluster monitoring operator images for 3.11.
The issue is fixed; the 4.0 Telemeter client has been removed from the 3.11 branch.

Images: ose-cluster-monitoring-operator-v3.11.20-1

Please change to ON_QA, then I will close it.
Thanks!
Per comment 7, moving to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0024