Bug 2111167

Summary: Project auth cache is fully invalidated on changes to namespaces and namespaced RBAC
Product: OpenShift Container Platform Reporter: Ben Luddy <bluddy>
Component: openshift-apiserver Assignee: Abu Kashem <akashem>
Status: CLOSED WONTFIX QA Contact: Rahul Gangwar <rgangwar>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.2.0 CC: akashem, alkazako, amisevsk, ibuziuk, mfojtik, rgangwar, wking
Target Milestone: ---   
Target Release: 4.11.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 2111165 Environment:
Last Closed: 2023-01-16 11:57:46 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2111165    
Bug Blocks:    

Description Ben Luddy 2022-07-26 16:02:10 UTC
+++ This bug was initially created as a clone of Bug #2111165 +++

Description of problem:

The OpenShift API server maintains a cache used to scope project list and watch requests to the namespaces that are visible to the requesting user. A periodic task runs in each openshift-apiserver process and updates the cache when namespaces, roles, rolebindings, clusterroles, or clusterrolebindings change. The cache can be updated in parts when a namespace, role, or rolebinding changes, because the effect of these resources is limited to specific namespaces. Changes to clusterroles and clusterrolebindings trigger a full invalidation, since they may affect any or all namespaces.
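
The intended scoping rule described above can be illustrated with a minimal Go sketch. This is not the openshift-apiserver implementation: the Kind, Change, and Cache types and the Invalidate method are hypothetical names introduced only for illustration, and the cache entry is reduced to a single "fresh" flag per namespace.

```go
package main

import "fmt"

// Kind is a hypothetical label for the type of resource that changed.
type Kind string

const (
	Namespace          Kind = "namespaces"
	Role               Kind = "roles"
	RoleBinding        Kind = "rolebindings"
	ClusterRole        Kind = "clusterroles"
	ClusterRoleBinding Kind = "clusterrolebindings"
)

// Change describes one observed update (illustrative type).
// Namespace names the affected namespace and is empty for cluster-scoped RBAC.
type Change struct {
	Kind      Kind
	Namespace string
}

// Cache is a stand-in for the project auth cache: one entry per namespace,
// where true means the entry is still fresh.
type Cache struct {
	fresh map[string]bool
}

// NewCache returns a cache with a fresh entry for each given namespace.
func NewCache(namespaces ...string) *Cache {
	c := &Cache{fresh: make(map[string]bool)}
	for _, ns := range namespaces {
		c.fresh[ns] = true
	}
	return c
}

// Invalidate marks entries stale and returns how many entries it touched.
// Namespaced changes touch one entry; cluster-scoped changes touch all.
func (c *Cache) Invalidate(ch Change) int {
	switch ch.Kind {
	case Namespace, Role, RoleBinding:
		// The effect is limited to a single namespace.
		c.fresh[ch.Namespace] = false
		return 1
	default:
		// clusterroles and clusterrolebindings may affect any namespace.
		for ns := range c.fresh {
			c.fresh[ns] = false
		}
		return len(c.fresh)
	}
}

func main() {
	c := NewCache("team-a", "team-b", "team-c")
	fmt.Println(c.Invalidate(Change{Kind: RoleBinding, Namespace: "team-a"})) // prints 1
	fmt.Println(c.Invalidate(Change{Kind: ClusterRole}))                      // prints 3
}
```

Under this sketch, the cost of a namespaced change stays constant as the cluster grows, while a full invalidation scales with the number of namespaces.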

Today, the cache sync task always performs a full invalidation, even for namespace, role, and rolebinding changes, which is particularly expensive on clusters with many namespaces.

Version-Release number of selected component (if applicable): 4

How reproducible: Always

Steps to Reproduce:

It's difficult to observe directly, because the full invalidation still produces the correct behavior, but the secondary effect of increased CPU consumption in all openshift-apiserver processes is easy to observe.

1a. Create 100 namespaces (not necessary, but it makes the effect more obvious).

1b. Repeatedly update a namespace about once per second (suggest patching an annotation with a current timestamp as the value).

$ while true; do sleep 1; kubectl annotate namespace default --overwrite "timestamp=$(date)"; done

2. While continuing to update the namespace, monitor the CPU utilization metrics for openshift-apiserver.

rate(container_cpu_usage_seconds_total{namespace="openshift-apiserver",container="openshift-apiserver"}[1m])

Actual results:

Significant CPU utilization increase over idle: at least a doubling, and I see about a 6-7x increase on a cluster with 1000 namespaces.

Expected results:

Little or no CPU utilization change.

Comment 1 Ilya Buziuk 2022-10-22 12:53:58 UTC
Hello, could you please clarify in which OpenShift 4.11 z-stream it is going to be backported? The issue is currently affecting Developer Sandbox clusters https://developers.redhat.com/developer-sandbox and Dev Spaces / workspaces.openshift.com

Comment 2 Michal Fojtik 2023-01-16 11:57:46 UTC
Dear reporter, we greatly appreciate the bug you have reported here. Unfortunately, due to migration to a new issue-tracking system (https://issues.redhat.com/), we cannot continue triaging bugs reported in Bugzilla. Since this bug has been stale for multiple days, we, therefore, decided to close this bug.
If you think this is a mistake or this bug has a higher priority or severity than is set today, please feel free to reopen this bug and tell us why. We are going to move every re-opened bug to https://issues.redhat.com.

Thank you for your patience and understanding.