Bug 1829584 - Ingress operator should log the reason when an IngressController is in degraded state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.5.0
Assignee: Miciah Dashiel Butler Masters
QA Contact: Arvind iyengar
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-04-29 20:08 UTC by Miciah Dashiel Butler Masters
Modified: 2020-07-13 17:33 UTC (History)
8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1827152
Environment:
Last Closed: 2020-07-13 17:32:51 UTC
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-ingress-operator pull 395 0 None closed Bug 1829584: computeDeploymentDegradedCondition: Better errors 2021-02-15 04:14:22 UTC
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-07-13 17:33:14 UTC

Description Miciah Dashiel Butler Masters 2020-04-29 20:08:46 UTC
+++ This bug was initially created as a clone of Bug #1827152 +++

The ingress operator logs show the following:

    2020-04-23T08:50:21.106Z	INFO	operator.ingress_controller	ingress/controller.go:142	reconciling	{"request": "openshift-ingress-operator/default"}
    2020-04-23T08:50:21.161Z	INFO	operator.ingress_controller	ingress/deployment.go:805	updated router deployment	{"namespace": "openshift-ingress", "name": "router-default"}
    2020-04-23T08:50:21.220Z	ERROR	operator.ingress_controller	ingress/controller.go:209	got retryable error; requeueing	{"after": "1m0s", "error": "IngressController is degraded"}

We could benefit from better visibility into why it took the ingress controller 20 minutes to become ready; I'll spin off a Bugzilla report for that.

Comment 1 Miciah Dashiel Butler Masters 2020-05-08 19:28:10 UTC
I plan to open a PR that repeats the status condition message in the log message.
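The linked PR ("computeDeploymentDegradedCondition: Better errors") implements this by including the relevant status conditions in the returned error. A minimal sketch of the idea in Go; the `condition` type and `degradedError` helper are hypothetical names for illustration, not the operator's actual API:

```go
package main

import (
	"fmt"
	"strings"
)

// condition loosely mirrors an operator status condition:
// a condition type and its status ("True"/"False").
type condition struct {
	Type   string
	Status string
}

// degradedError builds a retryable error whose message repeats the
// conditions that caused the degraded state, so the generic
// "got retryable error; requeueing" log line explains itself.
func degradedError(conds []condition) error {
	parts := make([]string, 0, len(conds))
	for _, c := range conds {
		parts = append(parts, fmt.Sprintf("%s=%s", c.Type, c.Status))
	}
	return fmt.Errorf("IngressController is degraded: %s", strings.Join(parts, ", "))
}

func main() {
	err := degradedError([]condition{{"DNSReady", "False"}})
	fmt.Println(err) // IngressController is degraded: DNSReady=False
}
```

With this shape, the log line in Comment 5 ("IngressController is degraded: DNSReady=False") falls out directly from formatting the failing conditions into the error.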

Comment 2 Daneyon Hansen 2020-05-14 00:34:53 UTC
I hit the same issue today while I was testing changes on a locally running ingress operator:

2020-05-13T16:56:06.575-0700	ERROR	operator.ingress_controller	ingress/controller.go:209	got retryable error; requeueing	{"after": "1m0s", "error": "IngressController is degraded"}

Comment 5 Arvind iyengar 2020-05-18 11:20:44 UTC
This feature was tested in "4.5.0-0.nightly-2020-05-14-190315", which includes the code from the merged PR. With this fix, the log message now includes additional details about why the ingress controller is in a degraded state:

* Log excerpt from patched version:
-----
2020-05-18T10:34:35.328Z        INFO    operator.status_controller      status/controller.go:90 Reconciling     {"request": "openshift-ingress-operator/internalapps5"}
2020-05-18T10:34:35.346Z        DEBUG   operator.init.controller-runtime.controller     controller/controller.go:282    Successfully Reconciled {"controller": "status_controller", "request": "openshift-ingress-operator/internalapps5"}
2020-05-18T10:34:35.428Z        INFO    operator.ingress_controller     ingress/deployment.go:805       updated router deployment       {"namespace": "openshift-ingress", "name": "router-internalapps5"}
2020-05-18T10:34:35.563Z        INFO    operator.status_controller      status/controller.go:90 Reconciling     {"request": "openshift-ingress-operator/internalapps5"}
2020-05-18T10:34:35.572Z        ERROR   operator.ingress_controller     ingress/controller.go:209       got retryable error; requeueing {"after": "29.999979646s", "error": "IngressController may become degraded soon: DeploymentDegraded=True, LoadBalancerReady=False"}  <------

2020-05-18T10:35:08.058Z        DEBUG   operator.init.controller-runtime.controller     controller/controller.go:282    Successfully Reconciled {"controller": "status_controller", "request": "openshift-ingress-operator/internalapps5"}
2020-05-18T10:35:08.112Z        INFO    operator.ingress_controller     ingress/deployment.go:805       updated router deployment       {"namespace": "openshift-ingress", "name": "router-internalapps5"}
2020-05-18T10:35:08.194Z        ERROR   operator.ingress_controller     ingress/controller.go:209       got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: DNSReady=False"} <-------
2020-05-18T10:35:08.194Z        INFO    operator.ingress_controller     ingress/controller.go:142       reconciling     {"request": "openshift-ingress-operator/internalapps5"}
-----

* Log excerpt from unpatched version:
-----
2020-05-18T10:34:48.928Z        INFO    operator.ingress_controller     ingress/metrics.go:30   created router stats secret     {"namespace": "openshift-ingress", "name": "router-stats-internalapps5"}
2020-05-18T10:34:48.965Z        INFO    operator.ingress_controller     ingress/monitoring.go:36        created servicemonitor  {"namespace": "openshift-ingress", "name": "router-internalapps5"}
2020-05-18T10:34:48.997Z        ERROR   operator.ingress_controller     ingress/controller.go:209       got retryable error; requeueing {"after": "29.999981532s", "error": "IngressController may become degraded soon"} <---

2020-05-18T10:35:22.064Z        DEBUG   operator.init.controller-runtime.controller     controller/controller.go:282    Successfully Reconciled {"controller": "status_controller", "request": "openshift-ingress-operator/internalapps5"}
2020-05-18T10:35:22.106Z        INFO    operator.ingress_controller     ingress/deployment.go:805       updated router deployment       {"namespace": "openshift-ingress", "name": "router-internalapps5"}
2020-05-18T10:35:22.167Z        ERROR   operator.ingress_controller     ingress/controller.go:209       got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded"} <---
-----

Comment 6 errata-xmlrpc 2020-07-13 17:32:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

