Bug 1724248

Summary: Cluster Monitoring Operator fails to complete - reconciling Cluster Monitoring Operator ServiceMonitor failed
Product: OpenShift Container Platform
Reporter: Mark McLoughlin <markmc>
Component: Monitoring
Assignee: Lili Cosic <lcosic>
Status: CLOSED ERRATA
QA Contact: Junqi Zhao <juzhao>
Severity: high
Priority: unspecified
Version: 4.2.0
CC: alegrand, anpicker, erooth, lserven, mloibl, pkrupa, surbania
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-10-16 06:32:41 UTC
Type: Bug

Description Mark McLoughlin 2019-06-26 14:53:07 UTC
Description of problem:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-serial-4.2/1259
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/3115
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/3135

The cluster-monitoring-operator is reporting this in its ClusterOperator:

Failed to rollout the stack. Error: running task Updating Cluster Monitoring Operator failed: reconciling Cluster Monitoring Operator ServiceMonitor failed: creating ServiceMonitor object failed: the server could not find the requested resource (post servicemonitors.monitoring.coreos.com)

The CVO (Cluster Version Operator) then waits more than 20 minutes for the operator to complete before the run times out.


Version-Release number of selected component (if applicable):

4.2.0-0.ci-2019-06-26-011753
4.2.0-0.ci-2019-06-25-195655
4.2.0-0.nightly-2019-06-25-222454


How reproducible:

Three 4.2-based installs failed in the past 24 hours with similar signatures.

Comment 1 Matthias Loibl 2019-07-02 12:29:01 UTC
Here are some more specific log lines: 
Could not update servicemonitor "openshift-kube-scheduler-operator/kube-scheduler-operator": the server does not recognize this resource, check extension API servers
Could not update servicemonitor "openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator": the server does not recognize this resource, check extension API servers

Comment 6 Sergiusz Urbaniak 2019-08-23 15:18:27 UTC
Sounds good, thanks. This is indeed assigned to 4.2.0 and planned for this sprint.

Comment 9 errata-xmlrpc 2019-10-16 06:32:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922