Bug 2111167 - Project auth cache is fully invalidated on changes to namespaces and namespaced RBAC
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: openshift-apiserver
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.11.z
Assignee: Abu Kashem
QA Contact: Rahul Gangwar
URL:
Whiteboard:
Depends On: 2111165
Blocks:
Reported: 2022-07-26 16:02 UTC by Ben Luddy
Modified: 2023-03-24 08:01 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2111165
Environment:
Last Closed: 2023-01-16 11:57:46 UTC
Target Upstream Version:
Embargoed:



Description Ben Luddy 2022-07-26 16:02:10 UTC
+++ This bug was initially created as a clone of Bug #2111165 +++

Description of problem:

The OpenShift API server maintains a cache used to scope project list and watch requests to the namespaces that are visible to the requesting user. A periodic task runs in each openshift-apiserver process and updates the cache when namespaces, roles, rolebindings, clusterroles, or clusterrolebindings change. When a namespace, role, or rolebinding changes, the cache can be updated incrementally, because the effect of these resources is limited to specific namespaces. Changes to clusterroles and clusterrolebindings trigger a full invalidation, since they may affect any or all namespaces.
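The invalidation rule described above can be modeled as a small decision function. This is a toy sketch of the rule as stated in this report, not the openshift-apiserver implementation; the function name `invalidation_scope` and its output strings are hypothetical.

```shell
#!/usr/bin/env bash
# Toy model of the cache invalidation rule; NOT the openshift-apiserver code.
# The function name and its output strings are hypothetical.
# Usage: invalidation_scope KIND [NAMESPACE]
invalidation_scope() {
  local kind="$1" namespace="${2:-}"
  case "$kind" in
    Namespace|Role|RoleBinding)
      # Effect is confined to one namespace: refresh only that entry.
      echo "partial:${namespace}"
      ;;
    ClusterRole|ClusterRoleBinding)
      # May grant access in any namespace: rebuild the whole cache.
      echo "full"
      ;;
    *)
      # Other kinds do not affect project visibility in this model.
      echo "ignored"
      return 1
      ;;
  esac
}
```

For example, `invalidation_scope RoleBinding team-a` prints `partial:team-a`, while `invalidation_scope ClusterRole` prints `full`. The problem reported here is that the sync task behaves as if every change took the `full` branch.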

Currently, the cache sync task always performs a full invalidation, even when only namespaces or namespaced RBAC resources have changed. This is particularly expensive on clusters with many namespaces.

Version-Release number of selected component (if applicable): 4

How reproducible: Always

Steps to Reproduce:

The full invalidation is difficult to observe directly, because it still produces correct behavior, but its secondary effect, increased CPU consumption in every openshift-apiserver process, is easy to observe.

1. (Optional) Create 100 namespaces. This is not required, but it makes the effect more obvious.

2. Repeatedly update a namespace about once per second (for example, by patching an annotation whose value is the current timestamp).

$ while true; do sleep 1; kubectl annotate namespace default --overwrite "timestamp=$(date)"; done

3. While continuing to update the namespace, monitor the CPU utilization metrics for openshift-apiserver.

rate(container_cpu_usage_seconds_total{namespace="openshift-apiserver",container="openshift-apiserver"}[1m])
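Steps 1 and 3 can be scripted. The following is a minimal sketch, assuming a logged-in `oc`/`kubectl` session with permission to create namespaces and to read cluster monitoring data, and assuming the Thanos Querier route (`thanos-querier` in `openshift-monitoring`) is reachable from your workstation. The helper names and the `bz2111167-` namespace prefix are hypothetical, and `oc whoami -t` only returns a token for token-based logins.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the namespace prefix are hypothetical.

# Step 1: create COUNT throwaway namespaces named PREFIX1..PREFIXCOUNT.
create_test_namespaces() {
  local count="${1:-100}" prefix="${2:-bz2111167-}" i
  for ((i = 1; i <= count; i++)); do
    kubectl create namespace "${prefix}${i}"
  done
}

# Cleanup: delete the same namespaces without waiting for finalization.
delete_test_namespaces() {
  local count="${1:-100}" prefix="${2:-bz2111167-}" i
  for ((i = 1; i <= count; i++)); do
    kubectl delete namespace "${prefix}${i}" --wait=false
  done
}

# Step 3: evaluate the CPU query from this report once through the
# Thanos Querier route and print the raw Prometheus HTTP API JSON.
# Assumes token-based login (oc whoami -t) and cluster-monitoring access.
query_apiserver_cpu() {
  local host token
  host="$(oc -n openshift-monitoring get route thanos-querier -o jsonpath='{.spec.host}')"
  token="$(oc whoami -t)"
  # -k skips TLS verification for brevity; drop it if the router CA is trusted.
  curl -sSkG \
    -H "Authorization: Bearer ${token}" \
    --data-urlencode 'query=rate(container_cpu_usage_seconds_total{namespace="openshift-apiserver",container="openshift-apiserver"}[1m])' \
    "https://${host}/api/v1/query"
}
```

For example, run `create_test_namespaces 100`, start the annotate loop from step 2 in another terminal, call `query_apiserver_cpu` every minute or so to compare against the idle baseline, and finish with `delete_test_namespaces 100`.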

Actual results:

Significant CPU utilization increase over idle: at least a doubling, and I see about a 6-7x increase on a cluster with 1000 namespaces.

Expected results:

Little or no change in CPU utilization.

Comment 1 Ilya Buziuk 2022-10-22 12:53:58 UTC
Hello, could you please clarify which OpenShift 4.11 z-stream release this fix will be backported to? The issue is currently affecting Developer Sandbox clusters (https://developers.redhat.com/developer-sandbox) and Dev Spaces / workspaces.openshift.com.

Comment 2 Michal Fojtik 2023-01-16 11:57:46 UTC
Dear reporter, we greatly appreciate the bug you have reported here. Unfortunately, due to the migration to a new issue-tracking system (https://issues.redhat.com/), we cannot continue triaging bugs reported in Bugzilla. Because this bug has been stale for multiple days, we have decided to close it.
If you think this is a mistake, or that this bug has a higher priority or severity than is currently set, please feel free to reopen it and tell us why. We are going to move every reopened bug to https://issues.redhat.com.

Thank you for your patience and understanding.

