Bug 1717494 - ingress operator fails to integrate with metrics and stops syncing status
Summary: ingress operator fails to integrate with metrics and stops syncing status
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.2.0
Assignee: Dan Mace
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-06-05 15:22 UTC by Dan Mace
Modified: 2019-10-16 06:31 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:31:10 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-ingress-operator pull 244 | 0 | None | closed | Bug 1717494: Refactor client and cache handling | 2020-10-14 14:17:52 UTC
Red Hat Product Errata | RHBA-2019:2922 | 0 | None | None | None | 2019-10-16 06:31:26 UTC

Description Dan Mace 2019-06-05 15:22:13 UTC
Description of problem:

The ingress operator was observed failing to integrate with metrics and reporting status sync errors, and the only recovery was to restart the operator. This is the same as https://bugzilla.redhat.com/show_bug.cgi?id=1687640, which we thought was fixed, but the fix was not effective in all cases.

Example error output from an Azure cluster:


2019-06-05T13:16:43.0651753Z 2019-06-05T13:16:43.065Z	ERROR	operator.init.controller-runtime.controller	controller/controller.go:217	Reconciler error	{"controller": "operator-controller", "request": "openshift-ingress-operator/default", "error": "failed to ensure ingresscontroller: failed to integrate metrics with openshift-monitoring for ingresscontroller default: failed to ensure servicemonitor for default: no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\"", "errorCauses": [{"error": "failed to ensure ingresscontroller: failed to integrate metrics with openshift-monitoring for ingresscontroller default: failed to ensure servicemonitor for default: no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\""}]}


Version-Release number of selected component (if applicable):

4.2.0-0.ci-2019-06-04-085838

How reproducible:

Create a new cluster with the installer.

Actual results:

Sometimes the ingress operator reports these errors indefinitely, depending on the order in which it starts relative to the prometheus operator.

Expected results:

The operator should eventually recover on its own, without a restart.

Additional info:

Comment 2 Hongan Li 2019-06-25 10:10:45 UTC
Verified with 4.2.0-0.nightly-2019-06-25-003324; the issue has been fixed.
The ingress operator reported the errors at first but eventually recovered on its own.

$ oc -n openshift-ingress get servicemonitor
NAME             AGE
router-default   3h42m


2019-06-25T06:22:02.743Z	ERROR	operator.init.controller-runtime.controller	controller/controller.go:212	Reconciler error	{"controller": "ingress_controller", "request": "openshift-ingress-operator/default", "error": "failed to ensure ingresscontroller: failed to integrate metrics with openshift-monitoring for ingresscontroller default: failed to ensure servicemonitor for default: no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\"", "errorCauses": [{"error": "failed to ensure ingresscontroller: failed to integrate metrics with openshift-monitoring for ingresscontroller default: failed to ensure servicemonitor for default: no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\""}]}
<---snip--->
2019-06-25T06:24:03.512Z	ERROR	operator.init.controller-runtime.controller	controller/controller.go:212	Reconciler error	{"controller": "ingress_controller", "request": "openshift-ingress-operator/default", "error": "failed to ensure ingresscontroller: failed to integrate metrics with openshift-monitoring for ingresscontroller default: failed to ensure servicemonitor for default: no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\"", "errorCauses": [{"error": "failed to ensure ingresscontroller: failed to integrate metrics with openshift-monitoring for ingresscontroller default: failed to ensure servicemonitor for default: no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\""}]}

2019-06-25T06:24:31.702Z	INFO	operator.controller	controller/monitoring.go:30	created servicemonitor	{"namespace": "openshift-ingress", "name": "router-default"}

Comment 4 errata-xmlrpc 2019-10-16 06:31:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

