Bug 2075671 - Cluster Ingress Operator K8S API cache contains duplicate objects
Summary: Cluster Ingress Operator K8S API cache contains duplicate objects
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Grant Spence
QA Contact: Melvin Joseph
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-04-14 20:02 UTC by Grant Spence
Modified: 2022-08-10 11:07 UTC (History)
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The cluster-ingress-operator code called cache.MultiNamespacedCacheBuilder with an empty-string ("") entry in its namespace list, which added all Kubernetes objects in the cluster to the cache. Consequence: The cache was 2x to 3x larger than it needed to be and contained duplicate objects. The duplication caused the ingress operator to perform more reconciliations than necessary. Fix: Remove the "" entry from the cache.MultiNamespacedCacheBuilder initialization (a minimal sketch follows the metadata fields below). Result: The cache contains fewer objects, which decreases the memory utilization of the ingress-operator, and the operator is more efficient because it no longer receives extra reconciliation requests.
Clone Of:
Environment:
Last Closed: 2022-08-10 11:07:26 UTC
Target Upstream Version:
Embargoed:
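
For reference, a minimal sketch of the change described in the Doc Text above. This is not the actual cluster-ingress-operator source; the package layout, function name, and namespace names are placeholders chosen to illustrate how an empty-string entry makes MultiNamespacedCacheBuilder cache every namespace.

package operator

import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// newOperatorManager builds a manager whose cache only watches the namespaces
// the operator needs. Namespace names here are placeholders.
func newOperatorManager() (manager.Manager, error) {
	operatorNamespace := "openshift-ingress-operator"
	operandNamespace := "openshift-ingress"

	// Before the fix, the namespace list included the empty string. In the
	// Kubernetes client libraries, "" means "all namespaces", so the cache
	// watched the whole cluster in addition to the named namespaces and held
	// duplicate copies of their objects:
	//
	//   cache.MultiNamespacedCacheBuilder([]string{"", operatorNamespace, operandNamespace})
	//
	// After the fix, only the required namespaces are cached:
	newCache := cache.MultiNamespacedCacheBuilder([]string{operatorNamespace, operandNamespace})

	return ctrl.NewManager(ctrl.GetConfigOrDie(), manager.Options{
		NewCache: newCache,
	})
}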


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-ingress-operator pull 740 0 None Merged Bug 2075671: Fix k8s client cache object global inclusion and duplication. 2022-06-03 21:00:35 UTC
Github openshift cluster-ingress-operator pull 764 0 None Merged Bug 2075671: Add a e2e test that verifies the ingress operator's cache doesn't include everything 2022-06-07 16:35:30 UTC
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-10 11:07:44 UTC

Comment 4 Melvin Joseph 2022-04-26 12:43:32 UTC
melvinjoseph@mjoseph-mac Downloads %    oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-04-24-135651   True        False         8h      Cluster version is 4.11.0-0.nightly-2022-04-24-135651

Steps to Reproduce (in detail):
1. Create a vanilla cluster
2. Open the cluster console and go to Observe -> Metrics. Set the auto-refresh interval to 15 seconds.
3. Apply the following as the query:
   controller_runtime_reconcile_total{controller="ingress_controller"}
4. Reset the counter by scaling the ingress-operator deployment from 1 -> 0 -> 1:
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=0
deployment.apps/ingress-operator scaled
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=1
deployment.apps/ingress-operator scaled
5. Save the following manifest as testfile.yaml and create it:
apiVersion: v1
items:
- apiVersion: operator.openshift.io/v1
  kind: IngressController
  metadata:
    name: loadbalancer
    namespace: openshift-ingress-operator
  spec:
    domain: test-apps.gspence-2022-02-14-1013.gcp.devcluster.openshift.com
    endpointPublishingStrategy:
      type: Private
    replicas: 1
    nodePlacement:
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/worker: ""
  status: {}
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
melvinjoseph@mjoseph-mac Downloads % oc create -f testfile.yaml
ingresscontroller.operator.openshift.io/loadbalancer created

6. Observe the reconcile counters increase, then wait about 2 minutes for them to settle to a steady state. You should see the following numbers in the metrics counter (a sketch of querying this metric programmatically follows these steps):
result = error: 0
result = requeue: 2
result = requeue_after: 6
result = success: 4
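
The steps above read the counter from the web console. If scripting the same check is helpful, here is a minimal sketch using the Prometheus Go client; the endpoint address and authentication are assumptions (on OpenShift the in-cluster metrics are typically reached through the thanos-querier route with a bearer token), not part of the verification above.

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/prometheus/client_golang/api"
	promv1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	// Placeholder address; a real cluster needs the route URL and a token.
	client, err := api.NewClient(api.Config{Address: "https://prometheus.example.com"})
	if err != nil {
		panic(err)
	}
	promAPI := promv1.NewAPI(client)

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Same query as step 3 above; one sample is returned per "result" label
	// (error, requeue, requeue_after, success).
	query := `controller_runtime_reconcile_total{controller="ingress_controller"}`
	result, warnings, err := promAPI.Query(ctx, query, time.Now())
	if err != nil {
		panic(err)
	}
	if len(warnings) > 0 {
		fmt.Println("warnings:", warnings)
	}
	fmt.Println(result)
}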


1. Use the Prometheus query:
container_memory_usage_bytes{container="ingress-operator"}
2. Reset the measurement by scaling the ingress-operator deployment from 1 -> 0 -> 1:
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=0
deployment.apps/ingress-operator scaled
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=1
deployment.apps/ingress-operator scaled
3. The memory usage should be between 35 MB and 50 MB.
value: 41111552 bytes (about 41 MB), within the expected range

Hence marking as verified

Comment 8 Melvin Joseph 2022-06-07 07:38:00 UTC
melvinjoseph@mjoseph-mac Downloads % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-06-025509   True        False         3h15m   Cluster version is 4.11.0-0.nightly-2022-06-06-025509
melvinjoseph@mjoseph-mac Downloads % 

Steps to Reproduce (in detail):
1. Create a vanilla cluster
2. Open the cluster console and go to Observe -> Metrics. Set the auto-refresh interval to 15 seconds.
3. Apply the following as the query:
   controller_runtime_reconcile_total{controller="ingress_controller"}
4. Reset the counter by scaling the ingress-operator deployment from 1 -> 0 -> 1:
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=0
deployment.apps/ingress-operator scaled
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=1
deployment.apps/ingress-operator scaled
5. Save the following manifest as testfile.yaml and create it:
apiVersion: v1
items:
- apiVersion: operator.openshift.io/v1
  kind: IngressController
  metadata:
    name: loadbalancer
    namespace: openshift-ingress-operator
  spec:
    domain: test-apps.gspence-2022-02-14-1013.gcp.devcluster.openshift.com
    endpointPublishingStrategy:
      type: Private
    replicas: 1
    nodePlacement:
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/worker: ""
  status: {}
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
melvinjoseph@mjoseph-mac Downloads % oc create -f testfile.yaml
ingresscontroller.operator.openshift.io/loadbalancer created

6. Observe the reconcile counters increase, then wait about 2 minutes for them to settle to a steady state. You should see the following numbers in the metrics counter:
result = error: 0
result = requeue: 2
result = requeue_after: 7
result = success: 4


1. Use the Prometheus query:
container_memory_usage_bytes{container="ingress-operator"}
2. Reset the measurement by scaling the ingress-operator deployment from 1 -> 0 -> 1:
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=0
deployment.apps/ingress-operator scaled
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=1
deployment.apps/ingress-operator scaled
3. The memory usage should be between 35 MB and 50 MB.
value: 46178304 bytes (about 46 MB), within the expected range

Hence marking as verified

Comment 10 errata-xmlrpc 2022-08-10 11:07:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

