1963833 – Cluster monitoring operator crashlooping on single node clusters due to segfault

Summary: Cluster monitoring operator crashlooping on single node clusters due to segfault

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Monitoring
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Target Release:	4.8.0
Assignee:	Omer Tuchfeld
QA Contact:	Junqi Zhao
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1963775
Depends On:
Blocks:

Reported:	2021-05-24 07:10 UTC by Omer Tuchfeld
Modified:	2021-07-27 23:10 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-07-27 23:09:50 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-monitoring-operator pull 1176	0	None	open	Bug 1963833: don't attempt to delete nil PodDisruptionBudget object	2021-05-24 07:51:59 UTC
Red Hat Product Errata	RHSA-2021:2438	0	None	None	None	2021-07-27 23:10:03 UTC

Description Omer Tuchfeld 2021-05-24 07:10:08 UTC

Description of problem:
Cluster monitoring operator crashlooping on single node clusters due to segfault

Version-Release number of selected component (if applicable):
4.8-fc5

How reproducible:
100% of single node clusters; see the test grid [1], starting 2021/05/21

Steps to Reproduce:
Run the single node live ISO CI job, e.g. [3]

Actual results:
Monitoring operator pod crashlooping

Expected results:
Monitoring operator pod not crashlooping

Additional info:
The regression seems to come from PR [2]


[1] https://testgrid.k8s.io/redhat-single-node#periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-single-node-live-iso

[2] https://github.com/openshift/cluster-monitoring-operator/pull/1151#issuecomment-846809313

[3] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-single-node-live-iso/1395892352853217280
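The linked fix is titled "don't attempt to delete nil PodDisruptionBudget object". As a rough illustration of that kind of guard, here is a minimal Go sketch. It is not the operator's actual code: the PodDisruptionBudget struct, fakeClient, key, deleteIfPresent and panics are all hypothetical names invented for this example, and the real operator uses the Kubernetes API types and client instead.

```go
package main

import "fmt"

// PodDisruptionBudget is a hypothetical stand-in for the real Kubernetes type.
type PodDisruptionBudget struct {
	Namespace string
	Name      string
}

// fakeClient is a hypothetical client that only records what it deleted.
type fakeClient struct {
	deleted []string
}

// key dereferences pdb without checking it, so a nil pointer here
// triggers Go's "nil pointer dereference" runtime panic.
func key(pdb *PodDisruptionBudget) string {
	return pdb.Namespace + "/" + pdb.Name
}

// deleteIfPresent skips the delete when there is no object to delete.
func deleteIfPresent(c *fakeClient, pdb *PodDisruptionBudget) error {
	if pdb == nil {
		return nil // no object, nothing to delete
	}
	c.deleted = append(c.deleted, key(pdb))
	return nil
}

// panics reports whether calling f panics.
func panics(f func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	c := &fakeClient{}
	var missing *PodDisruptionBudget // nil, as if no object had been built

	fmt.Println("unguarded access panics:", panics(func() { _ = key(missing) }))
	fmt.Println("guarded delete error:", deleteIfPresent(c, missing))
	fmt.Println("objects deleted:", len(c.deleted))
}
```

The sketch treats a nil object as "nothing to delete" and returns no error, which keeps the caller's reconcile path simple. Whether the real fix returns early in the same way is an assumption.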

Comment 1 Junqi Zhao 2021-05-24 07:46:35 UTC

This is the same bug as bug 1963775.

Comment 5 Junqi Zhao 2021-05-25 07:40:12 UTC

Checked with 4.8.0-0.nightly-2021-05-25-041803; the cluster-monitoring-operator pod is running normally now:
# oc get no
NAME                                         STATUS   ROLES           AGE   VERSION
ip-10-0-139-220.us-east-2.compute.internal   Ready    master,worker   64m   v1.21.0-rc.0+ee60d07
# oc get pod -n openshift-monitoring
NAME                                           READY   STATUS    RESTARTS   AGE
alertmanager-main-0                            5/5     Running   0          50m
cluster-monitoring-operator-77d7fdc7cb-jvvtz   2/2     Running   3          62m
grafana-6589f84cd6-7g7dv                       2/2     Running   0          50m
kube-state-metrics-69cc98557f-pgpc5            3/3     Running   0          62m
node-exporter-8dqvh                            2/2     Running   0          62m
openshift-state-metrics-5f54b4ff58-p5rjs       3/3     Running   0          62m
prometheus-adapter-789fbc4d86-6qfhl            1/1     Running   0          54m
prometheus-k8s-0                               7/7     Running   1          50m
prometheus-operator-fd77ffdd8-t22s9            2/2     Running   0          50m
telemeter-client-79976c8cd5-vkd7g              3/3     Running   0          61m
thanos-querier-9d758d9d-kh49k                  5/5     Running   0          50m

Comment 6 Simon Pasquier 2021-05-25 10:18:58 UTC

*** Bug 1963775 has been marked as a duplicate of this bug. ***

Comment 9 errata-xmlrpc 2021-07-27 23:09:50 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container
Platform 4.8.2 bug fix and security update), and where to find
the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
