Bug 1781948 - Ingress operator logs spurious "failed to sync ingresscontroller status" errors
Summary: Ingress operator logs spurious "failed to sync ingresscontroller status" errors
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.3.0
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: ---
: 4.4.0
Assignee: Miciah Dashiel Butler Masters
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On:
Blocks: 1781950
TreeView+ depends on / blocked
 
Reported: 2019-12-10 23:21 UTC by Miciah Dashiel Butler Masters
Modified: 2020-05-13 21:54 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The ingress operator was logging "failed to sync ingresscontroller status" errors and rapidly rerunning its synchronization loop when an ingresscontroller's "Degraded" status condition was true, even if the operator had in fact successfully updated the ingresscontroller's status. Consequence: The ingress operator logs had spurious error messages. Fix: The error handling logic was fixed to avoid the spurious error messages. Result: The "failed to sync ingresscontroller status" errors should no longer appear in the ingress operator's logs when it updates an ingresscontroller's status.
Clone Of: 1781345
: 1781950 (view as bug list)
Environment:
Last Closed: 2020-05-13 21:54:36 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Github openshift cluster-ingress-operator pull 337 None closed Bug 1781948: Do not wrap errors from syncIngressControllerStatus 2020-04-27 20:45:42 UTC
Red Hat Product Errata RHBA-2020:0581 None None None 2020-05-13 21:54:38 UTC

Description Miciah Dashiel Butler Masters 2019-12-10 23:21:03 UTC
The ingress operator logs spurious "failed to sync ingresscontroller status" errors, even if it has successfully updated the ingresscontroller's status, when the ingresscontroller's "Degraded" status condition is true. 

+++ This bug was initially created as a clone of Bug #1781345 +++

Description of problem:
This on an 4.3 OCP IPI installed cluster on Azure.  When trying to run the node-vertical test where we try to deploy up to 250 gcr.io/google_containers/pause-amd64:3.0 pods per worker node in a single namespace, the ingress operator degraded and 2 worker nodes became NotReady.
This cluster is fips-enabled and using SDN network type.

root@ip-172-31-40-229: ~/openshift-scale/workloads/workloads # oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.3.0-0.nightly-2019-12-09-035405   True        False         False      148m
cloud-credential                           4.3.0-0.nightly-2019-12-09-035405   True        False         False      170m
cluster-autoscaler                         4.3.0-0.nightly-2019-12-09-035405   True        False         False      161m
console                                    4.3.0-0.nightly-2019-12-09-035405   True        False         False      156m
dns                                        4.3.0-0.nightly-2019-12-09-035405   True        False         False      166m
image-registry                             4.3.0-0.nightly-2019-12-09-035405   True        False         False      17m
ingress                                    4.3.0-0.nightly-2019-12-09-035405   False       True          True       23m
insights                                   4.3.0-0.nightly-2019-12-09-035405   True        False         False      167m
kube-apiserver                             4.3.0-0.nightly-2019-12-09-035405   True        False         False      165m
kube-controller-manager                    4.3.0-0.nightly-2019-12-09-035405   True        False         False      164m
kube-scheduler                             4.3.0-0.nightly-2019-12-09-035405   True        False         False      163m
machine-api                                4.3.0-0.nightly-2019-12-09-035405   True        False         False      166m
machine-config                             4.3.0-0.nightly-2019-12-09-035405   True        False         False      161m
marketplace                                4.3.0-0.nightly-2019-12-09-035405   True        False         False      162m
monitoring                                 4.3.0-0.nightly-2019-12-09-035405   False       True          True       22m
network                                    4.3.0-0.nightly-2019-12-09-035405   True        True          True       165m
node-tuning                                4.3.0-0.nightly-2019-12-09-035405   True        False         False      162m
openshift-apiserver                        4.3.0-0.nightly-2019-12-09-035405   True        False         False      161m
openshift-controller-manager               4.3.0-0.nightly-2019-12-09-035405   True        False         False      165m
openshift-samples                          4.3.0-0.nightly-2019-12-09-035405   True        False         False      161m
operator-lifecycle-manager                 4.3.0-0.nightly-2019-12-09-035405   True        False         False      166m
operator-lifecycle-manager-catalog         4.3.0-0.nightly-2019-12-09-035405   True        False         False      166m
operator-lifecycle-manager-packageserver   4.3.0-0.nightly-2019-12-09-035405   True        False         False      162m
service-ca                                 4.3.0-0.nightly-2019-12-09-035405   True        False         False      167m
service-catalog-apiserver                  4.3.0-0.nightly-2019-12-09-035405   True        False         False      164m
service-catalog-controller-manager         4.3.0-0.nightly-2019-12-09-035405   True        False         False      164m
storage                                    4.3.0-0.nightly-2019-12-09-035405   True        False         False      162m
root@ip-172-31-40-229: ~/openshift-scale/workloads/workloads # 


In openshift-ingress-operator logs, I am seeing:

2019-12-09T15:48:01.016Z        ERROR   operator.init.controller-runtime.controller     controller/controller.go:218    Reconciler error        {"controller": "ingress_controller", "request": "openshift-ingress-operator/default", "error": "failed to sync ingresscontroller status: IngressController is degraded", "errorCauses": [{"error": "failed to sync ingresscontroller status: IngressController is degraded"}]}

[...]

Comment 2 Hongan Li 2019-12-23 03:25:35 UTC
verified with 4.4.0-0.nightly-2019-12-20-210709

Comment 4 errata-xmlrpc 2020-05-13 21:54:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581


Note You need to log in before you can comment on or make changes to this bug.