Bug 1963833 - Cluster monitoring operator crashlooping on single node clusters due to segfault
Summary: Cluster monitoring operator crashlooping on single node clusters due to segfault
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.8.0
Assignee: Omer Tuchfeld
QA Contact: Junqi Zhao
Duplicates: 1963775
Depends On:
Reported: 2021-05-24 07:10 UTC by Omer Tuchfeld
Modified: 2021-07-27 23:10 UTC
CC List: 9 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Last Closed: 2021-07-27 23:09:50 UTC
Target Upstream Version:


System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-monitoring-operator pull 1176 | 0 | None | open | Bug 1963833: don't attempt to delete nil PodDisruptionBudget object | 2021-05-24 07:51:59 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 23:10:03 UTC

Description Omer Tuchfeld 2021-05-24 07:10:08 UTC
Description of problem:
Cluster monitoring operator crashlooping on single node clusters due to segfault

Version-Release number of selected component (if applicable):

How reproducible:
100% of single node clusters, see test grid starting 2021/05/21, link [1]

Steps to Reproduce:
Single node live ISO CI, e.g. [3]

Actual results:
Monitoring operator pod crashlooping

Expected results:
Monitoring operator pod not crashlooping

Additional info:
Seems to come from this PR [2]

[1] https://testgrid.k8s.io/redhat-single-node#periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-single-node-live-iso

[2] https://github.com/openshift/cluster-monitoring-operator/pull/1151#issuecomment-846809313 

[3] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-single-node-live-iso/1395892352853217280
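The linked fix is titled "don't attempt to delete nil PodDisruptionBudget object", which suggests the segfault was a nil pointer dereference on a delete path. Below is a minimal Go sketch of that kind of guard. It is not the operator's actual code: the `PodDisruptionBudget` struct, the `client` type, the `buildPDB` factory and the name `example-pdb` are all hypothetical, and the sketch assumes that no budget object is generated on a single-node cluster.

```go
package main

import "fmt"

// PodDisruptionBudget is a hypothetical stand-in for the Kubernetes policy
// object; it is not the real API type.
type PodDisruptionBudget struct {
	Namespace string
	Name      string
}

// client is a hypothetical wrapper that records which objects it deleted.
type client struct {
	deleted []string
}

// DeletePodDisruptionBudget removes pdb. A nil pdb means there is nothing to
// delete, so the guard returns early instead of dereferencing the pointer.
func (c *client) DeletePodDisruptionBudget(pdb *PodDisruptionBudget) error {
	if pdb == nil {
		return nil
	}
	c.deleted = append(c.deleted, pdb.Namespace+"/"+pdb.Name)
	return nil
}

// buildPDB is a hypothetical factory. Assumption: on a single-node cluster no
// budget is generated and the factory returns nil.
func buildPDB(singleNode bool) *PodDisruptionBudget {
	if singleNode {
		return nil
	}
	return &PodDisruptionBudget{Namespace: "openshift-monitoring", Name: "example-pdb"}
}

func main() {
	c := &client{}
	// Exercise both the single-node (nil) and multi-node (non-nil) paths.
	for _, singleNode := range []bool{true, false} {
		if err := c.DeletePodDisruptionBudget(buildPDB(singleNode)); err != nil {
			fmt.Println("delete failed:", err)
			return
		}
	}
	fmt.Println("deleted:", c.deleted)
}
```

Without the `pdb == nil` check, the first loop iteration would read `pdb.Namespace` through a nil pointer and panic, which in a long-running operator pod shows up as a crash loop.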

Comment 1 Junqi Zhao 2021-05-24 07:46:35 UTC
Same issue as bug 1963775.

Comment 5 Junqi Zhao 2021-05-25 07:40:12 UTC
Checked with 4.8.0-0.nightly-2021-05-25-041803; the cluster-monitoring-operator pod is running normally now.
# oc get no
NAME                                         STATUS   ROLES           AGE   VERSION
ip-10-0-139-220.us-east-2.compute.internal   Ready    master,worker   64m   v1.21.0-rc.0+ee60d07
# oc get pod -n openshift-monitoring
NAME                                           READY   STATUS    RESTARTS   AGE
alertmanager-main-0                            5/5     Running   0          50m
cluster-monitoring-operator-77d7fdc7cb-jvvtz   2/2     Running   3          62m
grafana-6589f84cd6-7g7dv                       2/2     Running   0          50m
kube-state-metrics-69cc98557f-pgpc5            3/3     Running   0          62m
node-exporter-8dqvh                            2/2     Running   0          62m
openshift-state-metrics-5f54b4ff58-p5rjs       3/3     Running   0          62m
prometheus-adapter-789fbc4d86-6qfhl            1/1     Running   0          54m
prometheus-k8s-0                               7/7     Running   1          50m
prometheus-operator-fd77ffdd8-t22s9            2/2     Running   0          50m
telemeter-client-79976c8cd5-vkd7g              3/3     Running   0          61m
thanos-querier-9d758d9d-kh49k                  5/5     Running   0          50m

Comment 6 Simon Pasquier 2021-05-25 10:18:58 UTC
*** Bug 1963775 has been marked as a duplicate of this bug. ***

Comment 9 errata-xmlrpc 2021-07-27 23:09:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

