
Bug 1635103

Summary: [3.11] cluster-monitoring-operator pod CrashLoopBackOff
Product: OpenShift Container Platform
Component: Monitoring
Version: 3.11.0
Target Release: 3.11.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED DUPLICATE
Severity: high
Priority: high
Reporter: Weihua Meng <wmeng>
Assignee: Frederic Branczyk <fbranczy>
QA Contact: Junqi Zhao <juzhao>
CC: lserven, minden
Type: Bug
Last Closed: 2018-10-02 15:10:14 UTC

Description Weihua Meng 2018-10-02 06:38:46 UTC
Description of problem:
cluster-monitoring-operator pod CrashLoopBackOff

Version-Release number of selected component (if applicable):
openshift v3.11.18

How reproducible:
Always

Steps to Reproduce:
1. Install OCP v3.11

Actual results:
Installation succeeds, but the cluster-monitoring-operator pod enters CrashLoopBackOff.

# oc get pod -n openshift-monitoring
NAME                                           READY     STATUS             RESTARTS   AGE
cluster-monitoring-operator-56bb5946c4-d5b5b   0/1       CrashLoopBackOff   23         2h


# oc describe pod/cluster-monitoring-operator-56bb5946c4-gb7lf

Events:
  Type     Reason     Age               From                                  Message
  ----     ------     ----              ----                                  -------
  Normal   Scheduled  10m               default-scheduler                     Successfully assigned openshift-monitoring/cluster-monitoring-operator-56bb5946c4-gb7lf to ip-172-18-8-81.ec2.internal
  Normal   Pulling    10m               kubelet, ip-172-18-8-81.ec2.internal  pulling image "registry.reg-aws.openshift.com:443/openshift3/ose-cluster-monitoring-operator:v3.11"
  Normal   Pulled     10m               kubelet, ip-172-18-8-81.ec2.internal  Successfully pulled image "registry.reg-aws.openshift.com:443/openshift3/ose-cluster-monitoring-operator:v3.11"
  Normal   Created    1m (x5 over 10m)  kubelet, ip-172-18-8-81.ec2.internal  Created container
  Normal   Pulled     1m (x4 over 5m)   kubelet, ip-172-18-8-81.ec2.internal  Container image "registry.reg-aws.openshift.com:443/openshift3/ose-cluster-monitoring-operator:v3.11" already present on machine
  Normal   Started    1m (x5 over 10m)  kubelet, ip-172-18-8-81.ec2.internal  Started container
  Warning  BackOff    3s (x8 over 4m)   kubelet, ip-172-18-8-81.ec2.internal  Back-off restarting failed container


# oc logs pod/cluster-monitoring-operator-56bb5946c4-gb7lf

I1002 04:52:59.433584       1 decoder.go:224] decoding stream as YAML
I1002 04:53:00.624866       1 tasks.go:37] running task Updating Telemeter client
I1002 04:53:00.624964       1 decoder.go:224] decoding stream as YAML
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x10d344d]

goroutine 13 [running]:
github.com/openshift/cluster-monitoring-operator/pkg/manifests.(*Factory).TelemeterClientServiceMonitor(0xc4203d4a20, 0xc420098420, 0xc4208d3cc0, 0x6ae93a)
	/go/src/github.com/openshift/cluster-monitoring-operator/pkg/manifests/manifests.go:1441 +0x11d
github.com/openshift/cluster-monitoring-operator/pkg/tasks.(*TelemeterClientTask).Run(0xc4203e10a0, 0x1, 0x1)
	/go/src/github.com/openshift/cluster-monitoring-operator/pkg/tasks/telemeter.go:36 +0x33
github.com/openshift/cluster-monitoring-operator/pkg/tasks.(*TaskRunner).ExecuteTask(0xc4208d3eb8, 0xc4203d4ba0, 0xf, 0xc4208d3d60)
	/go/src/github.com/openshift/cluster-monitoring-operator/pkg/tasks/tasks.go:48 +0x34
github.com/openshift/cluster-monitoring-operator/pkg/tasks.(*TaskRunner).RunAll(0xc4208d3eb8, 0xc4200f31c0, 0x351c07d935ca6caa)
	/go/src/github.com/openshift/cluster-monitoring-operator/pkg/tasks/tasks.go:38 +0x141
github.com/openshift/cluster-monitoring-operator/pkg/operator.(*Operator).sync(0xc420348500, 0xc4203e21e0, 0x2e, 0x11a2f40, 0xc42042a060)
	/go/src/github.com/openshift/cluster-monitoring-operator/pkg/operator/operator.go:251 +0x828
github.com/openshift/cluster-monitoring-operator/pkg/operator.(*Operator).processNextWorkItem(0xc420348500, 0xc42003bf00)
	/go/src/github.com/openshift/cluster-monitoring-operator/pkg/operator/operator.go:201 +0xfb
github.com/openshift/cluster-monitoring-operator/pkg/operator.(*Operator).worker(0xc420348500)
	/go/src/github.com/openshift/cluster-monitoring-operator/pkg/operator/operator.go:171 +0x15a
created by github.com/openshift/cluster-monitoring-operator/pkg/operator.(*Operator).Run
	/go/src/github.com/openshift/cluster-monitoring-operator/pkg/operator/operator.go:130 +0x1e2
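
For illustration, here is a minimal Go sketch of the failure mode the trace suggests: dereferencing a nested config pointer that was never populated. All type and field names below are invented for the example; they are not the operator's actual types in pkg/manifests, and only the trace above is authoritative.

package main

// Illustrative stand-ins only; the operator's real config types live in
// pkg/manifests and differ from these.
type TelemeterClientConfig struct {
	ClusterID string
}

type Config struct {
	// Left nil when the Telemeter section is absent from the operator's
	// config map.
	TelemeterClientConfig *TelemeterClientConfig
}

func telemeterClientServiceMonitor(c *Config) string {
	// No nil check before the dereference: if TelemeterClientConfig is
	// nil, this panics with "invalid memory address or nil pointer
	// dereference", the same class of SIGSEGV seen in the log above.
	return c.TelemeterClientConfig.ClusterID
}

func main() {
	cfg := &Config{} // TelemeterClientConfig deliberately left nil
	_ = telemeterClientServiceMonitor(cfg)
}

A guard such as `if c.TelemeterClientConfig == nil { return "" }` (or an error return) before the dereference would avoid the panic.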


Expected results:
The cluster-monitoring-operator pod runs without crashing.

Additional info:

Comment 1 lserven 2018-10-02 15:09:09 UTC
I have been speaking OOB with N. Harrison Ripps. This issue is fundamentally the same as https://bugzilla.redhat.com/show_bug.cgi?id=1634227. The root cause of both issues is that the 3.11 OCP images are incorrectly being built from the master branch of the Cluster Monitoring Operator rather than from the release-3.11 branch. The master branch switched to 4.0 development some time ago, so the commit that caused this crash should never have ended up in 3.11.
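
For reference, one way to confirm which commit an image was built from is to inspect its build metadata labels. This is a hedged sketch: the label name io.openshift.build.commit.id is an assumption based on standard OpenShift build labels and may differ for productized images.

# docker pull registry.reg-aws.openshift.com:443/openshift3/ose-cluster-monitoring-operator:v3.11
# docker inspect --format '{{ index .Config.Labels "io.openshift.build.commit.id" }}' \
    registry.reg-aws.openshift.com:443/openshift3/ose-cluster-monitoring-operator:v3.11

If the reported commit exists only on master and not on release-3.11, the image was built from the wrong branch.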

Comment 2 lserven 2018-10-02 15:10:14 UTC

*** This bug has been marked as a duplicate of bug 1634227 ***