1717494 – ingress operator fails to integrate with metrics and stops syncing status

Summary: ingress operator fails to integrate with metrics and stops syncing status

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.1.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	4.2.0
Assignee:	Dan Mace
QA Contact:	Hongan Li
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2019-06-05 15:22 UTC by Dan Mace
Modified:	2022-08-04 22:24 UTC
CC List:	1 user
Fixed In Version:
Doc Type:
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-10-16 06:31:10 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-ingress-operator pull 244	0	None	closed	Bug 1717494: Refactor client and cache handling	2020-10-14 14:17:52 UTC
Red Hat Product Errata	RHBA-2019:2922	0	None	None	None	2019-10-16 06:31:26 UTC

Description Dan Mace 2019-06-05 15:22:13 UTC

Description of problem:

The ingress operator was observed failing to integrate with metrics and reporting status sync errors, and the only recovery was to restart the operator. This is the same as https://bugzilla.redhat.com/show_bug.cgi?id=1687640, which we thought was fixed, but the fix was not effective in all cases.

Example error output from an Azure cluster:


2019-06-05T13:16:43.0651753Z 2019-06-05T13:16:43.065Z	ERROR	operator.init.controller-runtime.controller	controller/controller.go:217	Reconciler error	{"controller": "operator-controller", "request": "openshift-ingress-operator/default", "error": "failed to ensure ingresscontroller: failed to integrate metrics with openshift-monitoring for ingresscontroller default: failed to ensure servicemonitor for default: no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\"", "errorCauses": [{"error": "failed to ensure ingresscontroller: failed to integrate metrics with openshift-monitoring for ingresscontroller default: failed to ensure servicemonitor for default: no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\""}]}


Version-Release number of selected component (if applicable):

4.2.0-0.ci-2019-06-04-085838

How reproducible:

Create a new cluster with the installer.

Actual results:

Intermittently, the ingress operator reports these errors indefinitely, depending on the order in which it starts relative to the prometheus operator.

Expected results:

The operator should eventually recover on its own, without needing a restart.

Additional info:

Comment 2 Hongan Li 2019-06-25 10:10:45 UTC

Verified with 4.2.0-0.nightly-2019-06-25-003324; the issue has been fixed.
The ingress operator reported the errors at first, but it eventually recovered on its own.

$ oc -n openshift-ingress get servicemonitor
NAME             AGE
router-default   3h42m


2019-06-25T06:22:02.743Z	ERROR	operator.init.controller-runtime.controller	controller/controller.go:212	Reconciler error	{"controller": "ingress_controller", "request": "openshift-ingress-operator/default", "error": "failed to ensure ingresscontroller: failed to integrate metrics with openshift-monitoring for ingresscontroller default: failed to ensure servicemonitor for default: no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\"", "errorCauses": [{"error": "failed to ensure ingresscontroller: failed to integrate metrics with openshift-monitoring for ingresscontroller default: failed to ensure servicemonitor for default: no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\""}]}
<---snip--->
2019-06-25T06:24:03.512Z	ERROR	operator.init.controller-runtime.controller	controller/controller.go:212	Reconciler error	{"controller": "ingress_controller", "request": "openshift-ingress-operator/default", "error": "failed to ensure ingresscontroller: failed to integrate metrics with openshift-monitoring for ingresscontroller default: failed to ensure servicemonitor for default: no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\"", "errorCauses": [{"error": "failed to ensure ingresscontroller: failed to integrate metrics with openshift-monitoring for ingresscontroller default: failed to ensure servicemonitor for default: no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\""}]}

2019-06-25T06:24:31.702Z	INFO	operator.controller	controller/monitoring.go:30	created servicemonitor	{"namespace": "openshift-ingress", "name": "router-default"}

Comment 4 errata-xmlrpc 2019-10-16 06:31:10 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922
