Bug 2075671

Summary:	Cluster Ingress Operator K8S API cache contains duplicate objects
Product:	OpenShift Container Platform	Reporter:	Grant Spence <gspence>
Component:	Networking	Assignee:	Grant Spence <gspence>
Networking sub component:	router	QA Contact:	Melvin Joseph <mjoseph>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	high
Priority:	unspecified	CC:	aos-bugs, hongli, mmasters
Version:	4.11
Target Milestone:	---
Target Release:	4.11.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Cause: The cluster-ingress-operator code called cache.MultiNamespacedCacheBuilder with a namespace list that included "". Because "" means all namespaces, objects from every namespace were cached, and objects in the explicitly listed namespaces were cached twice. Consequence: The cache was 2x to 3x larger than it should have been and contained duplicate objects. The duplication caused the ingress operator to perform more reconciliations than necessary. Fix: Remove "" from the namespace list passed to cache.MultiNamespacedCacheBuilder. Result: The cache contains fewer objects, which decreases the memory utilization of the ingress operator, and the operator is more efficient because it no longer receives extra reconciliation requests.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2022-08-10 11:07:26 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
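The cause described in the Doc Text can be illustrated with a toy model. This is not controller-runtime's code: it assumes, as the Doc Text describes, that a multi-namespace cache keeps one cache per listed namespace, that "" means all namespaces, that a cluster-wide List concatenates the per-namespace results without deduplicating them, and that each per-namespace informer delivers its own events. The types, helper names, namespace lists and object names below are all hypothetical.

```go
package main

import "fmt"

// object is a hypothetical stand-in for a cached Kubernetes object.
type object struct {
	Namespace, Name string
}

// watches reports whether a per-namespace cache for ns would hold o.
// As with Kubernetes list/watch calls, "" means "all namespaces".
func watches(ns string, o object) bool {
	return ns == "" || ns == o.Namespace
}

// multiNamespaceList models a cluster-wide List on a multi-namespace cache:
// it concatenates the contents of every per-namespace cache and does not
// deduplicate, so an object held by two caches is returned twice.
func multiNamespaceList(namespaces []string, cluster []object) []object {
	var out []object
	for _, ns := range namespaces {
		for _, o := range cluster {
			if watches(ns, o) {
				out = append(out, o)
			}
		}
	}
	return out
}

// eventsFor models how many per-namespace informers would deliver an
// event for a single change to o.
func eventsFor(namespaces []string, o object) int {
	n := 0
	for _, ns := range namespaces {
		if watches(ns, o) {
			n++
		}
	}
	return n
}

func main() {
	// Hypothetical cluster contents; names are illustrative only.
	cluster := []object{
		{"openshift-ingress-operator", "default"},
		{"openshift-ingress", "router-default"},
		{"openshift-monitoring", "unrelated"},
	}
	// Hypothetical namespace lists: with and without the "" entry.
	buggy := []string{"", "openshift-ingress-operator", "openshift-ingress"}
	fixed := []string{"openshift-ingress-operator", "openshift-ingress"}

	fmt.Printf("with \"\":    List returns %d items, %d events per change\n",
		len(multiNamespaceList(buggy, cluster)), eventsFor(buggy, cluster[0]))
	fmt.Printf("without \"\": List returns %d items, %d events per change\n",
		len(multiNamespaceList(fixed, cluster)), eventsFor(fixed, cluster[0]))
}
```

With "" in the list, the model returns 5 items (two duplicates plus an object from an unrelated namespace) and delivers 2 events per change; without it, 2 items and 1 event. In a real controller the work queue collapses identical keys that are still waiting to be processed, so extra events do not always become extra reconciles; the model only counts deliveries.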

Comment 4 Melvin Joseph 2022-04-26 12:43:32 UTC

melvinjoseph@mjoseph-mac Downloads % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-04-24-135651   True        False         8h      Cluster version is 4.11.0-0.nightly-2022-04-24-135651

Steps to Reproduce (in detail):
1. Create a vanilla cluster
2. Open the cluster console, go to Observe -> Metrics, and set auto-refresh to 15 seconds.
3. Enter the following query:
   controller_runtime_reconcile_total{controller="ingress_controller"}
4. Reset the counter by scaling the ingress-operator deployment down and back up (1 -> 0 -> 1):
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=0
deployment.apps/ingress-operator scaled
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=1
deployment.apps/ingress-operator scaled
5. Create the following IngressController (saved here as testfile.yaml):
apiVersion: v1
items:
- apiVersion: operator.openshift.io/v1
  kind: IngressController
  metadata:
    name: loadbalancer
    namespace: openshift-ingress-operator
  spec:
    domain: test-apps.gspence-2022-02-14-1013.gcp.devcluster.openshift.com
    endpointPublishingStrategy:
      type: Private
    replicas: 1
    nodePlacement:
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/worker: ""
  status: {}
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
melvinjoseph@mjoseph-mac Downloads % oc create -f testfile.yaml
ingresscontroller.operator.openshift.io/loadbalancer created

6. Observe the reconcile counters increase. Wait about 2 minutes for them to settle to a steady state. The metrics counter should show the following values:
result = error: 0
result = requeue: 2
result = requeue_after: 6
result = success: 4
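To read the same per-result breakdown from a raw Prometheus text-format scrape instead of the console, a small parser along these lines could be used. reconcileCounts is a hypothetical helper, not part of any library; it assumes the controller-runtime metric name controller_runtime_reconcile_total with its controller and result labels, and the sample input simply restates the counts reported in this comment.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

const metricPrefix = "controller_runtime_reconcile_total{"

// reconcileCounts returns the per-result values of
// controller_runtime_reconcile_total for one controller, parsed from
// Prometheus text-format lines. It is a deliberately small parser that
// assumes label values contain no commas, '=' or escaped quotes.
func reconcileCounts(exposition, controller string) (map[string]float64, error) {
	counts := map[string]float64{}
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, metricPrefix) {
			continue // skip # HELP / # TYPE comments and other metrics
		}
		end := strings.IndexByte(line, '}')
		if end < 0 {
			return nil, fmt.Errorf("unterminated label set: %q", line)
		}
		labels := map[string]string{}
		for _, pair := range strings.Split(line[len(metricPrefix):end], ",") {
			kv := strings.SplitN(pair, "=", 2)
			if len(kv) == 2 {
				labels[strings.TrimSpace(kv[0])] = strings.Trim(strings.TrimSpace(kv[1]), `"`)
			}
		}
		if labels["controller"] != controller {
			continue
		}
		// The sample value is the first field after the label set; an
		// optional timestamp may follow it.
		fields := strings.Fields(line[end+1:])
		if len(fields) == 0 {
			return nil, fmt.Errorf("missing value: %q", line)
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			return nil, err
		}
		counts[labels["result"]] = v
	}
	return counts, sc.Err()
}

func main() {
	// Sample input restating the steady-state counts reported in this comment.
	sample := `# TYPE controller_runtime_reconcile_total counter
controller_runtime_reconcile_total{controller="ingress_controller",result="error"} 0
controller_runtime_reconcile_total{controller="ingress_controller",result="requeue"} 2
controller_runtime_reconcile_total{controller="ingress_controller",result="requeue_after"} 6
controller_runtime_reconcile_total{controller="ingress_controller",result="success"} 4
`
	counts, err := reconcileCounts(sample, "ingress_controller")
	if err != nil {
		panic(err)
	}
	for _, r := range []string{"error", "requeue", "requeue_after", "success"} {
		fmt.Printf("result = %s: %g\n", r, counts[r])
	}
}
```

The output uses the same "result = ...: N" layout as the list above, which makes a side-by-side comparison straightforward.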


Memory usage check:
1. Run the Prometheus query:
   container_memory_usage_bytes{container="ingress-operator"}
2. Reset the measurement by scaling the ingress-operator deployment down and back up (1 -> 0 -> 1):
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=0
deployment.apps/ingress-operator scaled
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=1
deployment.apps/ingress-operator scaled
3. The memory usage should be between 35 MB and 50 MB.
Observed value: 41111552 bytes (about 41 MB)
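The pass criterion in step 3 is a unit conversion plus a range check. A minimal sketch, assuming "MB" means decimal megabytes (10^6 bytes); inExpectedRange is a hypothetical helper. Both values reported in the verification comments also fall inside the range if read as MiB.

```go
package main

import "fmt"

// Assumption: "MB" in the verification steps means decimal megabytes.
const bytesPerMB = 1_000_000

// inExpectedRange reports whether a container_memory_usage_bytes sample
// falls within [loMB, hiMB] megabytes.
func inExpectedRange(bytes int64, loMB, hiMB float64) bool {
	mb := float64(bytes) / bytesPerMB
	return mb >= loMB && mb <= hiMB
}

func main() {
	// Values reported in the two verification comments.
	for _, v := range []int64{41111552, 46178304} {
		fmt.Printf("%d bytes = %.1f MB (%.1f MiB), within 35-50 MB: %v\n",
			v, float64(v)/bytesPerMB, float64(v)/(1<<20), inExpectedRange(v, 35, 50))
	}
}
```

This prints 41.1 MB (39.2 MiB) and 46.2 MB (44.0 MiB), both within the expected range.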

Hence marking as verified

Comment 8 Melvin Joseph 2022-06-07 07:38:00 UTC

melvinjoseph@mjoseph-mac Downloads % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-06-025509   True        False         3h15m   Cluster version is 4.11.0-0.nightly-2022-06-06-025509

Steps to Reproduce (in detail):
1. Create a vanilla cluster
2. Open the cluster console, go to Observe -> Metrics, and set auto-refresh to 15 seconds.
3. Enter the following query:
   controller_runtime_reconcile_total{controller="ingress_controller"}
4. Reset the counter by scaling the ingress-operator deployment down and back up (1 -> 0 -> 1):
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=0
deployment.apps/ingress-operator scaled
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=1
deployment.apps/ingress-operator scaled
5. Create the following IngressController (saved here as testfile.yaml):
apiVersion: v1
items:
- apiVersion: operator.openshift.io/v1
  kind: IngressController
  metadata:
    name: loadbalancer
    namespace: openshift-ingress-operator
  spec:
    domain: test-apps.gspence-2022-02-14-1013.gcp.devcluster.openshift.com
    endpointPublishingStrategy:
      type: Private
    replicas: 1
    nodePlacement:
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/worker: ""
  status: {}
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
melvinjoseph@mjoseph-mac Downloads % oc create -f testfile.yaml
ingresscontroller.operator.openshift.io/loadbalancer created

6. Observe the reconcile counters increase. Wait about 2 minutes for them to settle to a steady state. The metrics counter should show the following values:
result = error: 0
result = requeue: 2
result = requeue_after: 7
result = success: 4


Memory usage check:
1. Run the Prometheus query:
   container_memory_usage_bytes{container="ingress-operator"}
2. Reset the measurement by scaling the ingress-operator deployment down and back up (1 -> 0 -> 1):
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=0
deployment.apps/ingress-operator scaled
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=1
deployment.apps/ingress-operator scaled
3. The memory usage should be between 35 MB and 50 MB.
Observed value: 46178304 bytes (about 46 MB)

Hence marking as verified

Comment 10 errata-xmlrpc 2022-08-10 11:07:26 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069