Bug 2111165
| Summary: | Project auth cache is fully invalidated on changes to namespaces and namespaced RBAC | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Ben Luddy <bluddy> | |
| Component: | openshift-apiserver | Assignee: | Abu Kashem <akashem> | |
| Status: | CLOSED ERRATA | QA Contact: | Rahul Gangwar <rgangwar> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 4.2.0 | CC: | mfojtik, vsolanki, wking | |
| Target Milestone: | --- | |||
| Target Release: | 4.12.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | No Doc Update | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 2111167 (view as bug list) | Environment: | ||
| Last Closed: | 2023-01-17 19:53:34 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 2111167 | |||
oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.12.0-0.nightly-2022-07-27-133042 True False 4h47m Cluster version is 4.12.0-0.nightly-2022-07-27-133042
CPU utilisation before creating 1000 namespaces:
oc adm top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
rgangwar-28t3-vngb5-master-0.c.openshift-qe.internal 680m 19% 7053Mi 51%
rgangwar-28t3-vngb5-master-1.c.openshift-qe.internal 946m 27% 8818Mi 64%
rgangwar-28t3-vngb5-master-2.c.openshift-qe.internal 1005m 28% 10001Mi 72%
rgangwar-28t3-vngb5-worker-a-5qcr8.c.openshift-qe.internal 325m 9% 3808Mi 27%
rgangwar-28t3-vngb5-worker-b-pngcz.c.openshift-qe.internal 311m 8% 3397Mi 24%
rgangwar-28t3-vngb5-worker-c-6qtpb.c.openshift-qe.internal 207m 5% 2102Mi 15%
CPU utilisation after creating 1000 namespaces:
oc get namespace|grep -i "test-"|wc -l
1000
while true; do sleep 1; oc annotate namespace default --overwrite "timestamp=$(date)"; done
oc adm top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
rgangwar-28t3-vngb5-master-0.c.openshift-qe.internal 731m 20% 8259Mi 60%
rgangwar-28t3-vngb5-master-1.c.openshift-qe.internal 919m 26% 10733Mi 78%
rgangwar-28t3-vngb5-master-2.c.openshift-qe.internal 1071m 30% 12024Mi 87%
rgangwar-28t3-vngb5-worker-a-5qcr8.c.openshift-qe.internal 323m 9% 4066Mi 29%
rgangwar-28t3-vngb5-worker-b-pngcz.c.openshift-qe.internal 402m 11% 3403Mi 24%
rgangwar-28t3-vngb5-worker-c-6qtpb.c.openshift-qe.internal 178m 5% 2115Mi 15%
There is not much of a spike in CPU utilisation.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:7399
Description of problem:
The OpenShift API server maintains a cache used to scope project list and watch requests to namespaces that are visible to the requesting user. A periodic task runs in each openshift-apiserver process and updates the cache when namespaces, roles, rolebindings, clusterroles, or clusterrolebindings change. The cache can be updated in parts when a namespace, role, or rolebinding changes, because the effect of these resources is limited to specific namespaces. Changes to clusterroles and clusterrolebindings perform a full invalidation, since they may impact any or all namespaces. Today, the cache sync task always performs a full invalidation, which is particularly expensive on clusters with many namespaces.

Version-Release number of selected component (if applicable):
4

How reproducible:
Always

Steps to Reproduce:
The bug is difficult to observe directly, because the full invalidation still produces the correct behavior, but the secondary effect of increased CPU consumption in all openshift-apiserver processes is easy to observe.
1a. Create 100 namespaces (not necessary, but it makes the effect more obvious).
1b. Repeatedly update a namespace about once per second (suggest patching an annotation with a current timestamp as the value).
$ while true; do sleep 1; kubectl annotate namespace default --overwrite "timestamp=$(date)"; done
3. While continuing to update the namespace, monitor the CPU utilization metrics for openshift-apiserver.
rate(container_cpu_usage_seconds_total{namespace="openshift-apiserver",container="openshift-apiserver"}[1m])

Actual results:
Significant CPU utilization increase over idle: at least a doubling, and about a 6-7x increase on a cluster with 1000 namespaces.

Expected results:
Little or no CPU utilization change.
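
The intended behavior from the description of the problem (partial update for namespaced changes, full invalidation for cluster-scoped ones) can be illustrated with a small Go sketch. This is not the openshift-apiserver code: the names `Kind`, `Change`, `AuthCache`, `Review`, and `newDemoCache` are hypothetical, and the "review" step is a stub standing in for whatever access evaluation the real cache performs. The `Reviews` counter only exists to make the cost difference visible.

```go
package main

import "fmt"

// Kind names the resource type whose change triggered a cache sync.
type Kind string

const (
	Namespace          Kind = "namespaces"
	Role               Kind = "roles"
	RoleBinding        Kind = "rolebindings"
	ClusterRole        Kind = "clusterroles"
	ClusterRoleBinding Kind = "clusterrolebindings"
)

// Change is a hypothetical event record: what changed and, for
// namespaced kinds, which namespace is affected.
type Change struct {
	Kind      Kind
	Namespace string
}

// AuthCache is a toy stand-in for the project auth cache. It maps each
// namespace to the users allowed to see it.
type AuthCache struct {
	Namespaces []string                        // all known namespaces
	Review     func(namespace string) []string // hypothetical access evaluation
	Visible    map[string][]string             // cached result per namespace
	Reviews    int                             // number of recomputations so far
}

// refresh recomputes the cached entry for a single namespace.
func (c *AuthCache) refresh(namespace string) {
	c.Visible[namespace] = c.Review(namespace)
	c.Reviews++
}

// Apply updates only the affected namespace for namespaced kinds and
// falls back to refreshing every namespace for anything else.
func (c *AuthCache) Apply(ch Change) {
	switch ch.Kind {
	case Namespace, Role, RoleBinding:
		c.refresh(ch.Namespace)
	default:
		// Cluster-scoped (or unrecognized) changes may affect any namespace.
		for _, ns := range c.Namespaces {
			c.refresh(ns)
		}
	}
}

// newDemoCache builds a cache with n namespaces and a stub reviewer.
func newDemoCache(n int) *AuthCache {
	c := &AuthCache{
		Review:  func(string) []string { return []string{"alice"} },
		Visible: map[string][]string{},
	}
	for i := 0; i < n; i++ {
		c.Namespaces = append(c.Namespaces, fmt.Sprintf("test-%d", i))
	}
	return c
}

func main() {
	c := newDemoCache(1000)
	c.Apply(Change{Kind: Namespace, Namespace: "test-7"})
	fmt.Println("reviews after namespace change:", c.Reviews)
	c.Apply(Change{Kind: ClusterRole})
	fmt.Println("reviews after clusterrole change:", c.Reviews)
}
```

In this sketch a namespace annotation costs one recomputation, while a clusterrole change costs one per namespace. The bug corresponds to every change taking the `default` branch, so the per-second annotation loop in the reproduction steps would trigger a full pass over all namespaces each time.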