Bug 1724248 - Cluster Monitoring Operator fails to complete - reconciling Cluster Monitoring Operator ServiceMonitor failed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.2.0
Assignee: Lili Cosic
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-06-26 14:53 UTC by Mark McLoughlin
Modified: 2019-11-19 13:26 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:32:41 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-monitoring-operator pull 451 | 0 | 'None' | closed | Bug 1724248: Schedule prometheus-operator on master nodes | 2021-02-18 16:48:10 UTC
Red Hat Product Errata | RHBA-2019:2922 | 0 | None | None | None | 2019-10-16 06:32:53 UTC

Description Mark McLoughlin 2019-06-26 14:53:07 UTC
Description of problem:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-serial-4.2/1259
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/3115
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/3135

The cluster-monitoring-operator is reporting this in its ClusterOperator:

Failed to rollout the stack. Error: running task Updating Cluster Monitoring Operator failed: reconciling Cluster Monitoring Operator ServiceMonitor failed: creating ServiceMonitor object failed: the server could not find the requested resource (post servicemonitors.monitoring.coreos.com)

and the CVO waits more than 20 minutes for it to complete before timing out.
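
For triage, the error above can be unpacked mechanically. The following is a minimal sketch, not part of any OpenShift tooling: the helper names `parse_failed_request` and `error_chain` are hypothetical, and the sketch assumes the message keeps the trailing "(<verb> <plural>.<group>)" form seen in this report. Because a CustomResourceDefinition is named `<plural>.<group>`, the extracted pair also gives the name of the CRD whose availability is worth checking.

```python
import re

# Error text copied from the ClusterOperator status quoted in this report.
MESSAGE = (
    "Failed to rollout the stack. Error: running task Updating Cluster "
    "Monitoring Operator failed: reconciling Cluster Monitoring Operator "
    "ServiceMonitor failed: creating ServiceMonitor object failed: the server "
    "could not find the requested resource (post "
    "servicemonitors.monitoring.coreos.com)"
)

# Hypothetical pattern: matches the trailing "(<verb> <plural>.<group>)" part.
_REQUEST = re.compile(
    r"\((?P<verb>[a-z]+) (?P<plural>[a-z0-9-]+)\.(?P<group>[a-z0-9.-]+)\)\s*$"
)


def parse_failed_request(message):
    """Return (verb, plural, group) from the trailing parenthetical, or None."""
    match = _REQUEST.search(message)
    if match is None:
        return None
    return match.group("verb"), match.group("plural"), match.group("group")


def error_chain(message):
    """Split the wrapped error into its layers, outermost first."""
    return [part.strip() for part in message.split(": ")]


if __name__ == "__main__":
    verb, plural, group = parse_failed_request(MESSAGE)
    print(f"failed request: {verb} on {plural}.{group}")
    for depth, layer in enumerate(error_chain(MESSAGE)):
        print("  " * depth + layer)
```

The innermost layer is the API server's response to the POST; the outer layers are wrapping added as the error propagates up to the ClusterOperator status.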


Version-Release number of selected component (if applicable):

4.2.0-0.ci-2019-06-26-011753
4.2.0-0.ci-2019-06-25-195655
4.2.0-0.nightly-2019-06-25-222454


How reproducible:

Three 4.2-based installs failed in the past 24 hours with similar signatures.

Comment 1 Matthias Loibl 2019-07-02 12:29:01 UTC
Here are some more specific log lines: 
Could not update servicemonitor "openshift-kube-scheduler-operator/kube-scheduler-operator": the server does not recognize this resource, check extension API servers
Could not update servicemonitor "openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator": the server does not recognize this resource, check extension API servers
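
When scanning a longer log for the same signature, lines like the two above can be grouped by failure reason. This is only a sketch under the assumption that every such line follows the `Could not update servicemonitor "<namespace>/<name>": <reason>` shape quoted here; `parse_update_failure` and `group_by_reason` are hypothetical helper names.

```python
import re

# The two log lines quoted in this comment.
LOG_LINES = [
    'Could not update servicemonitor "openshift-kube-scheduler-operator/'
    'kube-scheduler-operator": the server does not recognize this resource, '
    "check extension API servers",
    'Could not update servicemonitor "openshift-service-catalog-apiserver-operator/'
    'openshift-service-catalog-apiserver-operator": the server does not '
    "recognize this resource, check extension API servers",
]

# Assumed line shape: Could not update servicemonitor "<ns>/<name>": <reason>
_LINE = re.compile(
    r'^Could not update servicemonitor "(?P<namespace>[^/"]+)/(?P<name>[^"]+)": '
    r"(?P<reason>.+)$"
)


def parse_update_failure(line):
    """Return (namespace, name, reason) for a matching line, or None."""
    match = _LINE.match(line.strip())
    if match is None:
        return None
    return match.group("namespace"), match.group("name"), match.group("reason")


def group_by_reason(lines):
    """Map each failure reason to the list of affected namespace/name pairs."""
    grouped = {}
    for line in lines:
        parsed = parse_update_failure(line)
        if parsed is None:
            continue  # skip unrelated log lines
        namespace, name, reason = parsed
        grouped.setdefault(reason, []).append(f"{namespace}/{name}")
    return grouped


if __name__ == "__main__":
    for reason, objects in group_by_reason(LOG_LINES).items():
        print(reason)
        for obj in objects:
            print("  " + obj)
```

Both quoted lines fall under a single reason, which suggests the failure concerns the ServiceMonitor resource type as a whole and not any one object.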

Comment 6 Sergiusz Urbaniak 2019-08-23 15:18:27 UTC
Sounds good, thanks. This is indeed assigned to 4.2.0 and planned for this sprint.

Comment 9 errata-xmlrpc 2019-10-16 06:32:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

