Bug 1634227 - cluster-monitoring-operator pods in CrashLoopBackOff, image wrongly packaged 4.0 Telemeter client
Summary: cluster-monitoring-operator pods in CrashLoopBackOff, image wrongly packaged 4.0 Telemeter client
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Release
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.11.z
Assignee: lserven
QA Contact: Junqi Zhao
URL:
Whiteboard:
Duplicates: 1635103
Depends On:
Blocks:
 
Reported: 2018-09-29 05:19 UTC by Junqi Zhao
Modified: 2019-01-10 09:04 UTC
CC: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-10 09:04:01 UTC
Target Upstream Version:
Embargoed:


Attachments
cluster-monitoring-operator pod logs (13.81 KB, text/plain), 2018-09-29 05:19 UTC, Junqi Zhao


Links
Red Hat Product Errata RHBA-2019:0024, last updated 2019-01-10 09:04:07 UTC

Description Junqi Zhao 2018-09-29 05:19:14 UTC
Created attachment 1488287 [details]
cluster-monitoring-operator pod logs

Description of problem:
cluster-monitoring-operator pods are in CrashLoopBackOff; the image wrongly packaged the 4.0 Telemeter client.
# oc -n openshift-monitoring get pod
.....
cluster-monitoring-operator-56bb5946c4-49v8n   0/1       CrashLoopBackOff   12         1h

# oc -n openshift-monitoring logs cluster-monitoring-operator-56bb5946c4-49v8n
I0929 03:13:21.844419       1 tasks.go:37] running task Updating Telemeter client
I0929 03:13:21.844509       1 decoder.go:224] decoding stream as YAML
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x10d344d]

goroutine 36 [running]:
github.com/openshift/cluster-monitoring-operator/pkg/manifests.(*Factory).TelemeterClientServiceMonitor(0xc4204ccc60, 0xc4200ae840, 0xc420a51cc0, 0x6ae93a)
	/go/src/github.com/openshift/cluster-monitoring-operator/pkg/manifests/manifests.go:1441 +0x11d
github.com/openshift/cluster-monitoring-operator/pkg/tasks.(*TelemeterClientTask).Run(0xc42036a0c0, 0x1, 0x1)
	/go/src/github.com/openshift/cluster-monitoring-operator/pkg/tasks/telemeter.go:36 +0x33
github.com/openshift/cluster-monitoring-operator/pkg/tasks.(*TaskRunner).ExecuteTask(0xc420a51eb8, 0xc4204cce40, 0xf, 0xc420a51d60)
	/go/src/github.com/openshift/cluster-monitoring-operator/pkg/tasks/tasks.go:48 +0x34
github.com/openshift/cluster-monitoring-operator/pkg/tasks.(*TaskRunner).RunAll(0xc420a51eb8, 0xc420473a80, 0x53f961cb21d1086d)
	/go/src/github.com/openshift/cluster-monitoring-operator/pkg/tasks/tasks.go:38 +0x141
github.com/openshift/cluster-monitoring-operator/pkg/operator.(*Operator).sync(0xc42017bf00, 0xc4203b0d50, 0x2e, 0x11a2f40, 0xc42042b790)
	/go/src/github.com/openshift/cluster-monitoring-operator/pkg/operator/operator.go:251 +0x828
github.com/openshift/cluster-monitoring-operator/pkg/operator.(*Operator).processNextWorkItem(0xc42017bf00, 0xc420037f00)
	/go/src/github.com/openshift/cluster-monitoring-operator/pkg/operator/operator.go:201 +0xfb
github.com/openshift/cluster-monitoring-operator/pkg/operator.(*Operator).worker(0xc42017bf00)
	/go/src/github.com/openshift/cluster-monitoring-operator/pkg/operator/operator.go:171 +0x15a
created by github.com/openshift/cluster-monitoring-operator/pkg/operator.(*Operator).Run
	/go/src/github.com/openshift/cluster-monitoring-operator/pkg/operator/operator.go:130 +0x1e2
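
For illustration, here is a minimal, hypothetical Go sketch of the failure mode in the trace above; it is not the actual operator source, and all type and function names in it are stand-ins. The YAML decoding step yields nil because the packaged 4.0 Telemeter client assets do not match what the 3.11 code expects, and TelemeterClientServiceMonitor dereferences the nil result:

// Hypothetical sketch, not the real cluster-monitoring-operator code.
package main

import "fmt"

// ServiceMonitor stands in for the decoded prometheus-operator object.
type ServiceMonitor struct {
	JobLabel string
}

// Factory mirrors the manifests.Factory role: turning embedded YAML
// assets into Kubernetes objects.
type Factory struct{}

// decode stands in for the "decoding stream as YAML" step in the log; it
// returns nil when the embedded asset is missing or has an unexpected shape.
func (f *Factory) decode(asset string) *ServiceMonitor {
	return nil // simulate the mismatched 4.0 asset inside a 3.11 image
}

func (f *Factory) TelemeterClientServiceMonitor() *ServiceMonitor {
	sm := f.decode("assets/telemeter-client/service-monitor.yaml")
	// No nil check: this dereference panics with the same
	// "invalid memory address or nil pointer dereference" as above.
	sm.JobLabel = "telemeter-client"
	return sm
}

func main() {
	f := &Factory{}
	fmt.Println(f.TelemeterClientServiceMonitor())
}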

Version-Release number of selected component (if applicable):
ose-cluster-monitoring-operator/images/v3.11.17-1
# openshift version
openshift v3.11.17

How reproducible:
Always

Steps to Reproduce:
1. Deploy cluster monitoring on OpenShift v3.11.17

Actual results:
cluster-monitoring-operator pods are in CrashLoopBackOff; the image wrongly packaged the 4.0 Telemeter client.

Expected results:
cluster-monitoring-operator pods should be running.

Additional info:

Comment 1 Junqi Zhao 2018-09-29 05:21:56 UTC
Please don't package 3.11 images from the master branch; the master branch is now 4.0.
This issue blocks the etcd monitoring functionality.

Comment 2 minden 2018-10-01 11:50:28 UTC
Assigning to lserven.

Comment 3 lserven 2018-10-01 11:54:26 UTC
This issue is a result of a bug introduced in https://github.com/openshift/cluster-monitoring-operator/pull/103.

A follow-up PR was made to correct the issue: https://github.com/openshift/cluster-monitoring-operator/pull/109. That PR depended on upstream fixes in https://github.com/openshift/telemeter/pull/35, but the patches were merged out of order.

Finally, one last PR https://github.com/openshift/cluster-monitoring-operator/pull/110 was made to correct everything. From my tests, CMO master is stable again.

Please note that CMO master is 4.0 and _not_ 3.11.

Please verify again to confirm that the issue is fixed.
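
For context, the usual defensive fix for this class of panic is to validate the decoded object before use. The sketch below reuses the hypothetical types from the sketch in the description; it is not the actual diff from the PRs above, which remain the authoritative fix:

// Hypothetical guard, reusing the illustrative Factory/ServiceMonitor types.
func (f *Factory) TelemeterClientServiceMonitor() (*ServiceMonitor, error) {
	sm := f.decode("assets/telemeter-client/service-monitor.yaml")
	// Surface a packaging mismatch as an error instead of a SIGSEGV.
	if sm == nil {
		return nil, fmt.Errorf("decoding telemeter-client ServiceMonitor manifest returned nil")
	}
	sm.JobLabel = "telemeter-client"
	return sm, nil
}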

Comment 4 lserven 2018-10-02 15:10:14 UTC
*** Bug 1635103 has been marked as a duplicate of this bug. ***

Comment 5 lserven 2018-10-02 15:12:02 UTC
As noted in bug 1635103, the root cause of these issues is that the 3.11 OCP images are incorrectly being built from the master branch of the Cluster Monitoring Operator rather than the release-3.11 branch. The commit that caused this exact crash should never have ended up in 3.11; the master branch switched to 4.0 development some time ago.

Comment 6 lserven 2018-10-02 15:20:56 UTC
I have just made a PR to the OCP images repo to fix the branch used for the 3.11 cluster-monitoring-operator images.

Comment 7 Junqi Zhao 2018-10-10 03:22:35 UTC
Issue is fixed; the 4.0 Telemeter client has been removed from the 3.11 branch.

Images:
ose-cluster-monitoring-operator-v3.11.20-1

Please change the status to ON_QA, then I will close it.

Comment 8 lserven 2018-10-10 06:34:31 UTC
Thanks!

Comment 9 DeShuai Ma 2018-10-15 02:20:39 UTC
Per comment 7, moving to VERIFIED.

Comment 11 errata-xmlrpc 2019-01-10 09:04:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0024

