2111167 – Project auth cache is fully invalidated on changes to namespaces and namespaced RBAC

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	openshift-apiserver
Sub Component:
Version:	4.2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	4.11.z
Assignee:	Abu Kashem
QA Contact:	Rahul Gangwar
Docs Contact:
URL:
Whiteboard:
Depends On:	2111165
Blocks:

Reported:	2022-07-26 16:02 UTC by Ben Luddy
Modified:	2023-03-24 08:01 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	2111165
Environment:
Last Closed:	2023-01-16 11:57:46 UTC
Target Upstream Version:
Embargoed:

Description Ben Luddy 2022-07-26 16:02:10 UTC

+++ This bug was initially created as a clone of Bug #2111165 +++

Description of problem:

The OpenShift API server maintains a cache used to scope project list and watch requests to the namespaces that are visible to the requesting user. A periodic task runs in each openshift-apiserver process and updates the cache when namespaces, roles, rolebindings, clusterroles, or clusterrolebindings change. When a namespace, role, or rolebinding changes, the cache can be updated in part, because the effect of these resources is limited to specific namespaces. Changes to clusterroles and clusterrolebindings require a full invalidation, since they may affect any or all namespaces.

Today, the cache sync task always performs a full invalidation, even for changes to namespaces and namespaced RBAC, which is particularly expensive on clusters with many namespaces.

Version-Release number of selected component (if applicable): 4

How reproducible: Always

Steps to Reproduce:

The bug is difficult to observe directly, because the full invalidation still produces correct behavior, but its secondary effect, increased CPU consumption in all openshift-apiserver processes, is easy to observe.

1a. Create 100 namespaces (not necessary, but it makes the effect more obvious).

1b. Repeatedly update a namespace about once per second (for example, by patching an annotation with the current timestamp as its value).

$ while true; do sleep 1; kubectl annotate namespace default --overwrite "timestamp=$(date)"; done

2. While continuing to update the namespace, monitor the CPU utilization metrics for openshift-apiserver.

rate(container_cpu_usage_seconds_total{namespace="openshift-apiserver",container="openshift-apiserver"}[1m])
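
Step 1a can be scripted roughly as follows. This is a sketch and not part of the original report: the authcache-repro name prefix and the NS_PREFIX, NS_COUNT, and KUBECTL variables are hypothetical names chosen for illustration. It only defines functions, so nothing touches the cluster until one of them is called.

```shell
# Sketch for step 1a. NS_PREFIX, NS_COUNT, and the KUBECTL override are
# hypothetical names introduced for this example, not from the bug report.
NS_PREFIX="${NS_PREFIX:-authcache-repro}"
NS_COUNT="${NS_COUNT:-100}"

# Print the namespace names to create, one per line.
ns_names() {
  i=1
  while [ "$i" -le "$NS_COUNT" ]; do
    printf '%s-%03d\n' "$NS_PREFIX" "$i"
    i=$((i + 1))
  done
}

# Step 1a: create the namespaces.
create_namespaces() {
  ns_names | while read -r ns; do
    "${KUBECTL:-kubectl}" create namespace "$ns"
  done
}

# Cleanup: delete the namespaces again without waiting for finalization.
delete_namespaces() {
  ns_names | while read -r ns; do
    "${KUBECTL:-kubectl}" delete namespace "$ns" --wait=false
  done
}
```

After sourcing these definitions, run create_namespaces before starting the annotate loop from step 1b, and delete_namespaces once the measurement is done. Setting KUBECTL=echo prints the commands instead of running them.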

Actual results:

Significant CPU utilization increase over idle: at least double, and I see about a 6-7x increase on a cluster with 1000 namespaces.

Expected results:

Little or no CPU utilization change.

Comment 1 Ilya Buziuk 2022-10-22 12:53:58 UTC

Hello, could you please clarify which OpenShift 4.11 z-stream release the fix will be backported to? The issue currently affects Developer Sandbox clusters (https://developers.redhat.com/developer-sandbox) and Dev Spaces / workspaces.openshift.com.

Comment 2 Michal Fojtik 2023-01-16 11:57:46 UTC

Dear reporter, we greatly appreciate the bug you have reported here. Unfortunately, due to the migration to a new issue-tracking system (https://issues.redhat.com/), we cannot continue triaging bugs reported in Bugzilla. Since this bug has been stale for multiple days, we have decided to close it.
If you think this is a mistake, or that this bug has a higher priority or severity than is set today, please feel free to reopen it and tell us why. We are going to move every reopened bug to https://issues.redhat.com.

Thank you for your patience and understanding.
