Bug 2075671
Summary: | Cluster Ingress Operator K8S API cache contains duplicate objects | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Grant Spence <gspence> |
Component: | Networking | Assignee: | Grant Spence <gspence> |
Networking sub component: | router | QA Contact: | Melvin Joseph <mjoseph> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | unspecified | CC: | aos-bugs, hongli, mmasters |
Version: | 4.11 | ||
Target Milestone: | --- | ||
Target Release: | 4.11.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: The cluster-ingress-operator code called cache.MultiNamespacedCacheBuilder with "" as one of its parameters. This added all Kubernetes objects to the cache.
Consequence: The cache was 2x to 3x larger than it should have been and contained duplicate objects. The duplication caused the ingress operator to perform more reconciliations than necessary.
Fix: Remove the "" from the cache.MultiNamespacedCacheBuilder initialization.
Result: The cache contains fewer objects, which decreases the memory utilization of the ingress-operator. The ingress-operator is also more efficient because it no longer receives extra reconciliation requests.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2022-08-10 11:07:26 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
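The cause described in the Doc Text can be illustrated with a small toy model. This is only a sketch, not the operator's code and not controller-runtime itself: it assumes a multi-namespace cache keeps one view per configured entry and concatenates them when listing, and that an entry of "" means "all namespaces". The type `object`, both functions, and the namespace and object names are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// object is a hypothetical stand-in for a Kubernetes object,
// identified only by namespace and name.
type object struct {
	namespace string
	name      string
}

// listFromMultiNamespaceCache is a toy model (an assumption, not the real
// implementation): one per-entry view is built for each configured
// namespace and the views are concatenated. An entry of "" matches
// every namespace.
func listFromMultiNamespaceCache(cluster []object, namespaces []string) []object {
	var out []object
	for _, ns := range namespaces {
		for _, o := range cluster {
			if ns == "" || o.namespace == ns {
				out = append(out, o)
			}
		}
	}
	return out
}

// countDuplicates returns how many listed entries repeat an object
// that was already seen earlier in the list.
func countDuplicates(objs []object) int {
	seen := make(map[object]bool)
	dups := 0
	for _, o := range objs {
		if seen[o] {
			dups++
		}
		seen[o] = true
	}
	return dups
}

func main() {
	// Illustrative cluster contents; the names are made up.
	cluster := []object{
		{"openshift-ingress-operator", "default"},
		{"openshift-ingress", "router-default"},
		{"default", "kubernetes"},
		{"kube-system", "dns"},
	}

	// Before the fix (as modelled here): "" is listed next to real namespaces.
	before := listFromMultiNamespaceCache(cluster,
		[]string{"", "openshift-ingress-operator", "openshift-ingress"})

	// After the fix: only the namespaces of interest are listed.
	after := listFromMultiNamespaceCache(cluster,
		[]string{"openshift-ingress-operator", "openshift-ingress"})

	fmt.Printf("before: %d cached, %d duplicates\n", len(before), countDuplicates(before))
	fmt.Printf("after:  %d cached, %d duplicates\n", len(after), countDuplicates(after))
}
```

In this model, the "" entry pulls in every object once more on top of the per-namespace views, so objects in the listed namespaces are cached twice and unrelated objects are cached at all. That matches the reported symptoms of a larger cache and duplicate entries.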
Comment 4
Melvin Joseph
2022-04-26 12:43:32 UTC
melvinjoseph@mjoseph-mac Downloads % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-06-025509   True        False         3h15m   Cluster version is 4.11.0-0.nightly-2022-06-06-025509
melvinjoseph@mjoseph-mac Downloads %

Steps to Reproduce (in detail):

1. Create a vanilla cluster.
2. Open the cluster console and go to Observe -> Metrics. Set auto-refresh to 15 seconds.
3. Apply the following as the query:

    controller_runtime_reconcile_total{controller="ingress_controller"}

4. Reset the counter by scaling the ingress-operator deployment down and up (1->0->1):

    melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=0
    deployment.apps/ingress-operator scaled
    melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=1
    deployment.apps/ingress-operator scaled

5. oc apply the following:

    apiVersion: v1
    items:
    - apiVersion: operator.openshift.io/v1
      kind: IngressController
      metadata:
        name: loadbalancer
        namespace: openshift-ingress-operator
      spec:
        domain: test-apps.gspence-2022-02-14-1013.gcp.devcluster.openshift.com
        endpointPublishingStrategy:
          type: Private
        replicas: 1
        nodePlacement:
          nodeSelector:
            matchLabels:
              node-role.kubernetes.io/worker: ""
      status: {}
    kind: List
    metadata:
      resourceVersion: ""
      selfLink: ""

    melvinjoseph@mjoseph-mac Downloads % oc create -f testfile.yaml
    ingresscontroller.operator.openshift.io/loadbalancer created

6. Observe the reconcile counters increase. Wait about 2 minutes for them to settle to a steady state. You should see the following numbers in the metrics counter:

    result = error: 0
    result = requeue: 2
    result = requeue_after: 7
    result = success: 4

Memory check:

1. Use the Prometheus query:

    container_memory_usage_bytes{container="ingress-operator"}

2. Reset the query by scaling the ingress-operator deployment down and up (1->0->1):

    melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=0
    deployment.apps/ingress-operator scaled
    melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=1
    deployment.apps/ingress-operator scaled

3. The memory should be between 35 MB and 50 MB. Observed value: 46178304

Hence marking as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069 |
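The memory check in the verification comment above is a simple range comparison on the value returned by container_memory_usage_bytes. A minimal sketch follows, assuming the metric is reported in bytes and that "MB" in the comment means 10^6 bytes; the helper names are hypothetical.

```go
package main

import "fmt"

const (
	bytesPerMB  = 1000 * 1000 // decimal megabyte (assumed meaning of "MB")
	bytesPerMiB = 1024 * 1024 // binary mebibyte, shown for comparison
)

// toMB converts a byte count to decimal megabytes.
func toMB(b int64) float64 { return float64(b) / bytesPerMB }

// toMiB converts a byte count to mebibytes.
func toMiB(b int64) float64 { return float64(b) / bytesPerMiB }

// withinExpectedRange reports whether usage in bytes falls inside the
// 35 to 50 MB window named in the verification steps.
func withinExpectedRange(b int64) bool {
	mb := toMB(b)
	return mb >= 35 && mb <= 50
}

func main() {
	const observed int64 = 46178304 // value reported in the comment

	fmt.Printf("%.2f MB / %.2f MiB, within range: %v\n",
		toMB(observed), toMiB(observed), withinExpectedRange(observed))
}
```

The observed value falls inside the window under either unit interpretation, so the choice of MB versus MiB does not change the verification result here.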