Bug 1963833

Summary:	Cluster monitoring operator crashlooping on single node clusters due to segfault
Product:	OpenShift Container Platform	Reporter:	Omer Tuchfeld <otuchfel>
Component:	Monitoring	Assignee:	Omer Tuchfeld <otuchfel>
Status:	CLOSED ERRATA	QA Contact:	Junqi Zhao <juzhao>
Severity:	urgent	Docs Contact:
Priority:	unspecified
Version:	4.8	CC:	alegrand, anpicker, aos-bugs, erooth, kakkoyun, pkrupa, pmuller, sasha, wking
Target Milestone:	---
Target Release:	4.8.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	No Doc Update
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-07-27 23:09:50 UTC	Type:	Bug

Description Omer Tuchfeld 2021-05-24 07:10:08 UTC

Description of problem:
Cluster monitoring operator crashlooping on single node clusters due to segfault

Version-Release number of selected component (if applicable):
4.8-fc5

How reproducible:
100% of single-node clusters; see the test grid [1], from 2021-05-21 onward

Steps to Reproduce:
Single node live ISO CI, e.g. [3]

Actual results:
Monitoring operator pod crashlooping

Expected results:
Monitoring operator pod not crashlooping
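For anyone triaging a similar report, the following is a minimal sketch of how the crashlooping state could be confirmed from the JSON that `oc get pod <name> -n openshift-monitoring -o json` prints. It relies only on the standard Kubernetes pod status fields (`status.containerStatuses[].state.waiting.reason` and `restartCount`). The pod dictionary below is a hand-written illustration, not output captured from the affected cluster, and the helper name `crashlooping_containers` is hypothetical.

```python
def crashlooping_containers(pod):
    """Return the names of containers currently waiting in CrashLoopBackOff.

    `pod` is a dict shaped like the JSON object for a single pod.
    """
    names = []
    statuses = pod.get("status", {}).get("containerStatuses", [])
    for cs in statuses:
        # "waiting" is only present while the container is not running.
        waiting = cs.get("state", {}).get("waiting") or {}
        if waiting.get("reason") == "CrashLoopBackOff":
            names.append(cs["name"])
    return names


# Illustrative input only (assumed values, not taken from the failing CI run).
example_pod = {
    "metadata": {"name": "cluster-monitoring-operator-example"},
    "status": {
        "containerStatuses": [
            {
                "name": "cluster-monitoring-operator",
                "restartCount": 7,
                "state": {"waiting": {"reason": "CrashLoopBackOff"}},
            },
            {
                "name": "kube-rbac-proxy",
                "restartCount": 0,
                "state": {"running": {"startedAt": "2021-05-24T07:00:00Z"}},
            },
        ]
    },
}

print(crashlooping_containers(example_pod))
```

The function only inspects the current container state, so a pod that has restarted several times but is running at the moment of the query is not reported.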

Additional info:
Seems to come from this PR [2]


[1] https://testgrid.k8s.io/redhat-single-node#periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-single-node-live-iso

[2] https://github.com/openshift/cluster-monitoring-operator/pull/1151#issuecomment-846809313 

[3] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-single-node-live-iso/1395892352853217280

Comment 1 Junqi Zhao 2021-05-24 07:46:35 UTC

Same issue as bug 1963775.

Comment 5 Junqi Zhao 2021-05-25 07:40:12 UTC

Checked with 4.8.0-0.nightly-2021-05-25-041803; the cluster-monitoring-operator pod is running normally now:
# oc get no
NAME                                         STATUS   ROLES           AGE   VERSION
ip-10-0-139-220.us-east-2.compute.internal   Ready    master,worker   64m   v1.21.0-rc.0+ee60d07
# oc get pod -n openshift-monitoring
NAME                                           READY   STATUS    RESTARTS   AGE
alertmanager-main-0                            5/5     Running   0          50m
cluster-monitoring-operator-77d7fdc7cb-jvvtz   2/2     Running   3          62m
grafana-6589f84cd6-7g7dv                       2/2     Running   0          50m
kube-state-metrics-69cc98557f-pgpc5            3/3     Running   0          62m
node-exporter-8dqvh                            2/2     Running   0          62m
openshift-state-metrics-5f54b4ff58-p5rjs       3/3     Running   0          62m
prometheus-adapter-789fbc4d86-6qfhl            1/1     Running   0          54m
prometheus-k8s-0                               7/7     Running   1          50m
prometheus-operator-fd77ffdd8-t22s9            2/2     Running   0          50m
telemeter-client-79976c8cd5-vkd7g              3/3     Running   0          61m
thanos-querier-9d758d9d-kh49k                  5/5     Running   0          50m
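
The check above was done by eye. As a rough sketch, the same verification could be scripted by parsing the plain-text table that `oc get pod` prints. The function names `parse_pod_table` and `unhealthy` are hypothetical, the sample is a three-row excerpt of the listing above, and the parser assumes the column layout shown in this comment (other `oc` versions may format the table differently).

```python
def parse_pod_table(text):
    """Parse the whitespace-separated table from `oc get pod` into dicts."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    pods = []
    for ln in lines[1:]:
        row = dict(zip(header, ln.split()))
        ready, total = row["READY"].split("/")
        pods.append({
            "name": row["NAME"],
            "ready": int(ready),
            "total": int(total),
            "status": row["STATUS"],
            "restarts": int(row["RESTARTS"]),
        })
    return pods


def unhealthy(pods):
    """Names of pods that are not Running or have containers that are not ready."""
    return [p["name"] for p in pods
            if p["status"] != "Running" or p["ready"] != p["total"]]


# Excerpt of the listing in this comment.
SAMPLE = """
NAME                                           READY   STATUS    RESTARTS   AGE
alertmanager-main-0                            5/5     Running   0          50m
cluster-monitoring-operator-77d7fdc7cb-jvvtz   2/2     Running   3          62m
prometheus-k8s-0                               7/7     Running   1          50m
"""

pods = parse_pod_table(SAMPLE)
print(unhealthy(pods))
print([p["name"] for p in pods if p["restarts"] > 0])
```

A non-zero restart count alone does not mean the pod is still crashlooping; the cluster-monitoring-operator pod above shows 3 restarts but is Running with all containers ready.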

Comment 6 Simon Pasquier 2021-05-25 10:18:58 UTC

*** Bug 1963775 has been marked as a duplicate of this bug. ***

Comment 9 errata-xmlrpc 2021-07-27 23:09:50 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container
Platform 4.8.2 bug fix and security update), and where to find
the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438