Bug 2075671 - Cluster Ingress Operator K8S API cache contains duplicate objects
Summary: Cluster Ingress Operator K8S API cache contains duplicate objects
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Grant Spence
QA Contact: Melvin Joseph
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-04-14 20:02 UTC by Grant Spence
Modified: 2022-08-10 11:07 UTC (History)
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The cluster-ingress-operator code called cache.MultiNamespacedCacheBuilder with an empty-string ("") entry in its namespace list, which added all Kubernetes objects in the cluster to the cache. Consequence: The cache was 2x to 3x larger than it needed to be and contained duplicate objects. The duplication caused the ingress operator to perform more reconciliations than necessary. Fix: Remove the "" entry from the cache.MultiNamespacedCacheBuilder initialization (a minimal sketch follows the metadata fields below). Result: The cache contains fewer objects, which decreases the memory utilization of the ingress-operator, and the operator is more efficient because it no longer receives extra reconciliation requests.
Clone Of:
Environment:
Last Closed: 2022-08-10 11:07:26 UTC
Target Upstream Version:
Embargoed:
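
For reference, a minimal sketch of the change described in the Doc Text above. This is not the actual cluster-ingress-operator source; the package layout, function name, and namespace names are placeholders chosen to illustrate how an empty-string entry makes MultiNamespacedCacheBuilder cache every namespace.

package operator

import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// newOperatorManager builds a manager whose cache only watches the namespaces
// the operator needs. Namespace names here are placeholders.
func newOperatorManager() (manager.Manager, error) {
	operatorNamespace := "openshift-ingress-operator"
	operandNamespace := "openshift-ingress"

	// Before the fix, the namespace list included the empty string. In the
	// Kubernetes client libraries, "" means "all namespaces", so the cache
	// watched the whole cluster in addition to the named namespaces and held
	// duplicate copies of their objects:
	//
	//   cache.MultiNamespacedCacheBuilder([]string{"", operatorNamespace, operandNamespace})
	//
	// After the fix, only the required namespaces are cached:
	newCache := cache.MultiNamespacedCacheBuilder([]string{operatorNamespace, operandNamespace})

	return ctrl.NewManager(ctrl.GetConfigOrDie(), manager.Options{
		NewCache: newCache,
	})
}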


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-ingress-operator pull 740 0 None Merged Bug 2075671: Fix k8s client cache object global inclusion and duplication. 2022-06-03 21:00:35 UTC
Github openshift cluster-ingress-operator pull 764 0 None Merged Bug 2075671: Add a e2e test that verifies the ingress operator's cache doesn't include everything 2022-06-07 16:35:30 UTC
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-10 11:07:44 UTC

Comment 4 Melvin Joseph 2022-04-26 12:43:32 UTC
melvinjoseph@mjoseph-mac Downloads %    oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-04-24-135651   True        False         8h      Cluster version is 4.11.0-0.nightly-2022-04-24-135651

Steps to Reproduce (in detail):
1. Create a vanilla cluster
2. Open the cluster console and go to Observe -> Metrics. Set the auto-refresh interval to 15 seconds.
3. Apply the following as the query:
   controller_runtime_reconcile_total{controller="ingress_controller"}
4. Reset the counter by scaling the ingress-operator deployment from 1 -> 0 -> 1:
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=0
deployment.apps/ingress-operator scaled
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=1
deployment.apps/ingress-operator scaled
5. Save the following manifest as testfile.yaml and create it:
apiVersion: v1
items:
- apiVersion: operator.openshift.io/v1
  kind: IngressController
  metadata:
    name: loadbalancer
    namespace: openshift-ingress-operator
  spec:
    domain: test-apps.gspence-2022-02-14-1013.gcp.devcluster.openshift.com
    endpointPublishingStrategy:
      type: Private
    replicas: 1
    nodePlacement:
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/worker: ""
  status: {}
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
melvinjoseph@mjoseph-mac Downloads % oc create -f testfile.yaml
ingresscontroller.operator.openshift.io/loadbalancer created

6. Observe the reconcile counters increase, then wait about 2 minutes for them to settle to a steady state. You should see the following numbers in the metrics counter (a sketch of querying this metric programmatically follows these steps):
result = error: 0
result = requeue: 2
result = requeue_after: 6
result = success: 4
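
The steps above read the counter from the web console. If scripting the same check is helpful, here is a minimal sketch using the Prometheus Go client; the endpoint address and authentication are assumptions (on OpenShift the in-cluster metrics are typically reached through the thanos-querier route with a bearer token), not part of the verification above.

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/prometheus/client_golang/api"
	promv1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	// Placeholder address; a real cluster needs the route URL and a token.
	client, err := api.NewClient(api.Config{Address: "https://prometheus.example.com"})
	if err != nil {
		panic(err)
	}
	promAPI := promv1.NewAPI(client)

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Same query as step 3 above; one sample is returned per "result" label
	// (error, requeue, requeue_after, success).
	query := `controller_runtime_reconcile_total{controller="ingress_controller"}`
	result, warnings, err := promAPI.Query(ctx, query, time.Now())
	if err != nil {
		panic(err)
	}
	if len(warnings) > 0 {
		fmt.Println("warnings:", warnings)
	}
	fmt.Println(result)
}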


1. Use the Prometheus query:
container_memory_usage_bytes{container="ingress-operator"}
2. Reset the measurement by scaling the ingress-operator deployment from 1 -> 0 -> 1:
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=0
deployment.apps/ingress-operator scaled
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=1
deployment.apps/ingress-operator scaled
3. The memory usage should be between 35 MB and 50 MB.
value: 41111552 bytes (about 41 MB), within the expected range

Hence marking as verified

Comment 8 Melvin Joseph 2022-06-07 07:38:00 UTC
melvinjoseph@mjoseph-mac Downloads % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-06-025509   True        False         3h15m   Cluster version is 4.11.0-0.nightly-2022-06-06-025509
melvinjoseph@mjoseph-mac Downloads % 

Steps to Reproduce (in detail):
1. Create a vanilla cluster
2. Open the cluster console and go to Observe -> Metrics. Set the auto-refresh interval to 15 seconds.
3. Apply the following as the query:
   controller_runtime_reconcile_total{controller="ingress_controller"}
4. Reset the counter by scaling the ingress-operator deployment from 1 -> 0 -> 1:
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=0
deployment.apps/ingress-operator scaled
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=1
deployment.apps/ingress-operator scaled
5. Save the following manifest as testfile.yaml and create it:
apiVersion: v1
items:
- apiVersion: operator.openshift.io/v1
  kind: IngressController
  metadata:
    name: loadbalancer
    namespace: openshift-ingress-operator
  spec:
    domain: test-apps.gspence-2022-02-14-1013.gcp.devcluster.openshift.com
    endpointPublishingStrategy:
      type: Private
    replicas: 1
    nodePlacement:
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/worker: ""
  status: {}
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
melvinjoseph@mjoseph-mac Downloads % oc create -f testfile.yaml
ingresscontroller.operator.openshift.io/loadbalancer created

6. Observe the reconcile counters increase, then wait about 2 minutes for them to settle to a steady state. You should see the following numbers in the metrics counter:
result = error: 0
result = requeue: 2
result = requeue_after: 7
result = success: 4


1. Use the Prometheus query:
container_memory_usage_bytes{container="ingress-operator"}
2. Reset the measurement by scaling the ingress-operator deployment from 1 -> 0 -> 1:
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=0
deployment.apps/ingress-operator scaled
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=1
deployment.apps/ingress-operator scaled
3. The memory usage should be between 35 MB and 50 MB.
value: 46178304 bytes (about 46 MB), within the expected range

Hence marking as verified

Comment 10 errata-xmlrpc 2022-08-10 11:07:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

