Bug 1829584 - Ingress operator should log the reason when an IngressController is in degraded state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.5.0
Assignee: Miciah Dashiel Butler Masters
QA Contact: Arvind iyengar
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-04-29 20:08 UTC by Miciah Dashiel Butler Masters
Modified: 2020-07-13 17:33 UTC (History)
8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1827152
Environment:
Last Closed: 2020-07-13 17:32:51 UTC
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-ingress-operator pull 395 0 None closed Bug 1829584: computeDeploymentDegradedCondition: Better errors 2021-02-15 04:14:22 UTC
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-07-13 17:33:14 UTC

Description Miciah Dashiel Butler Masters 2020-04-29 20:08:46 UTC
+++ This bug was initially created as a clone of Bug #1827152 +++

The ingress operator logs show the following:

    2020-04-23T08:50:21.106Z	INFO	operator.ingress_controller	ingress/controller.go:142	reconciling	{"request": "openshift-ingress-operator/default"}
    2020-04-23T08:50:21.161Z	INFO	operator.ingress_controller	ingress/deployment.go:805	updated router deployment	{"namespace": "openshift-ingress", "name": "router-default"}
    2020-04-23T08:50:21.220Z	ERROR	operator.ingress_controller	ingress/controller.go:209	got retryable error; requeueing	{"after": "1m0s", "error": "IngressController is degraded"}

We could benefit from better visibility into why it took the ingress controller 20 minutes to become ready; I'll spin off a Bugzilla report for that.

Comment 1 Miciah Dashiel Butler Masters 2020-05-08 19:28:10 UTC
I plan to open a PR that repeats the status condition message in the log message.
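The linked PR ("computeDeploymentDegradedCondition: Better errors") implements this by including the relevant status conditions in the returned error. A minimal sketch of the idea in Go; the `condition` type and `degradedError` helper are hypothetical names for illustration, not the operator's actual API:

```go
package main

import (
	"fmt"
	"strings"
)

// condition loosely mirrors an operator status condition:
// a condition type and its status ("True"/"False").
type condition struct {
	Type   string
	Status string
}

// degradedError builds a retryable error whose message repeats the
// conditions that caused the degraded state, so the generic
// "got retryable error; requeueing" log line explains itself.
func degradedError(conds []condition) error {
	parts := make([]string, 0, len(conds))
	for _, c := range conds {
		parts = append(parts, fmt.Sprintf("%s=%s", c.Type, c.Status))
	}
	return fmt.Errorf("IngressController is degraded: %s", strings.Join(parts, ", "))
}

func main() {
	err := degradedError([]condition{{"DNSReady", "False"}})
	fmt.Println(err) // IngressController is degraded: DNSReady=False
}
```

With this shape, the log line in Comment 5 ("IngressController is degraded: DNSReady=False") falls out directly from formatting the failing conditions into the error.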

Comment 2 Daneyon Hansen 2020-05-14 00:34:53 UTC
I hit the same issue today while I was testing changes on a locally running ingress operator:

2020-05-13T16:56:06.575-0700	ERROR	operator.ingress_controller	ingress/controller.go:209	got retryable error; requeueing	{"after": "1m0s", "error": "IngressController is degraded"}

Comment 5 Arvind iyengar 2020-05-18 11:20:44 UTC
This feature was tested in "4.5.0-0.nightly-2020-05-14-190315", which includes the code from the merged PR. With this fix, the log message now includes additional details about why the ingress controller is in a degraded state:

* Log excerpt from patched version:
-----
2020-05-18T10:34:35.328Z        INFO    operator.status_controller      status/controller.go:90 Reconciling     {"request": "openshift-ingress-operator/internalapps5"}
2020-05-18T10:34:35.346Z        DEBUG   operator.init.controller-runtime.controller     controller/controller.go:282    Successfully Reconciled {"controller": "status_controller", "request": "openshift-ingress-operator/internalapps5"}
2020-05-18T10:34:35.428Z        INFO    operator.ingress_controller     ingress/deployment.go:805       updated router deployment       {"namespace": "openshift-ingress", "name": "router-internalapps5"}
2020-05-18T10:34:35.563Z        INFO    operator.status_controller      status/controller.go:90 Reconciling     {"request": "openshift-ingress-operator/internalapps5"}
2020-05-18T10:34:35.572Z        ERROR   operator.ingress_controller     ingress/controller.go:209       got retryable error; requeueing {"after": "29.999979646s", "error": "IngressController may become degraded soon: DeploymentDegraded=True, LoadBalancerReady=False"}  <------

2020-05-18T10:35:08.058Z        DEBUG   operator.init.controller-runtime.controller     controller/controller.go:282    Successfully Reconciled {"controller": "status_controller", "request": "openshift-ingress-operator/internalapps5"}
2020-05-18T10:35:08.112Z        INFO    operator.ingress_controller     ingress/deployment.go:805       updated router deployment       {"namespace": "openshift-ingress", "name": "router-internalapps5"}
2020-05-18T10:35:08.194Z        ERROR   operator.ingress_controller     ingress/controller.go:209       got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: DNSReady=False"} <-------
2020-05-18T10:35:08.194Z        INFO    operator.ingress_controller     ingress/controller.go:142       reconciling     {"request": "openshift-ingress-operator/internalapps5"}
-----

* Log excerpt from unpatched version:
-----
2020-05-18T10:34:48.928Z        INFO    operator.ingress_controller     ingress/metrics.go:30   created router stats secret     {"namespace": "openshift-ingress", "name": "router-stats-internalapps5"}
2020-05-18T10:34:48.965Z        INFO    operator.ingress_controller     ingress/monitoring.go:36        created servicemonitor  {"namespace": "openshift-ingress", "name": "router-internalapps5"}
2020-05-18T10:34:48.997Z        ERROR   operator.ingress_controller     ingress/controller.go:209       got retryable error; requeueing {"after": "29.999981532s", "error": "IngressController may become degraded soon"} <---

2020-05-18T10:35:22.064Z        DEBUG   operator.init.controller-runtime.controller     controller/controller.go:282    Successfully Reconciled {"controller": "status_controller", "request": "openshift-ingress-operator/internalapps5"}
2020-05-18T10:35:22.106Z        INFO    operator.ingress_controller     ingress/deployment.go:805       updated router deployment       {"namespace": "openshift-ingress", "name": "router-internalapps5"}
2020-05-18T10:35:22.167Z        ERROR   operator.ingress_controller     ingress/controller.go:209       got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded"} <---
-----

Comment 6 errata-xmlrpc 2020-07-13 17:32:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

