Bug 1858798 - rangeallocations.data is never updated when a project is removed
Summary: rangeallocations.data is never updated when a project is removed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-controller-manager
Version: 4.2.z
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.5.z
Assignee: Maciej Szulik
QA Contact: RamaKasturi
URL:
Whiteboard:
Depends On: 1808588
Blocks: 1858800
 
Reported: 2020-07-20 12:38 UTC by Maciej Szulik
Modified: 2020-08-10 13:50 UTC
20 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The UID range allocation was never updated when a project was removed; only restarting the kube-controller-manager pod triggered the repair procedure that cleared the range. Consequence: It was possible to exhaust the UID range on a cluster with high namespace create+remove turnover. Fix: The repair job now runs periodically. Result: The UID range allocation is freed periodically (currently every 8 hours) without requiring additional kube-controller-manager restarts, which should also ensure that the range is not exhausted.
Clone Of: 1808588
: 1858800 (view as bug list)
Environment:
Last Closed: 2020-08-10 13:50:20 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Github openshift cluster-policy-controller pull 32 None closed [release-4.5] Bug 1858798: add UID deallocation logic 2020-09-18 17:06:57 UTC
Red Hat Product Errata RHBA-2020:3188 None None None 2020-08-10 13:50:39 UTC

Comment 3 RamaKasturi 2020-08-02 16:56:12 UTC
Tried verifying the bug on 4.5.0-0.nightly-2020-07-29-051236. I could create more than 10K projects; rangeallocations.data is updated as projects are created, and entries are removed as projects are deleted. However, the value of (oc get rangeallocations scc-uid -o yaml | grep -o "/" | wc -l) before project creation does not match the value after project deletion, i.e. after deleting the 10K projects I see the value of the above query drop to zero.

@Tomas, could you please confirm whether the above is expected? Thanks!! Raising needinfo on you since Maciej is on leave.

Comment 4 Tomáš Nožička 2020-08-03 11:59:29 UTC
We've synced offline; there is an 8-hour reclaim period.
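The mechanism behind this reclaim period can be sketched as follows. This is a simplified Python model, not the actual cluster-policy-controller code: `allocate()` marks a UID block as used, project deletion leaks that mark (the original bug), and a `repair()` pass rebuilds the allocation map from the namespaces that still exist, returning leaked blocks to the free pool. In the shipped fix this repair pass runs on a fixed period (every 8 hours) rather than only at kube-controller-manager restart. All names here are illustrative.

```python
# Simplified model (NOT the real controller code) of the UID-range leak
# and the periodic repair pass that reclaims leaked blocks.

class UIDRangeAllocator:
    def __init__(self, blocks):
        self.free = set(range(blocks))   # free block indices
        self.used = {}                   # namespace -> block index

    def allocate(self, namespace):
        # Hand out an arbitrary free block; raises KeyError when exhausted,
        # which models running out of the UID range.
        block = self.free.pop()
        self.used[namespace] = block
        return block

    def repair(self, live_namespaces):
        # Rebuild the allocation map from the namespaces that still exist,
        # returning any leaked blocks to the free pool. This is the step
        # the periodic repair job performs; returns how many were reclaimed.
        leaked = {ns: b for ns, b in self.used.items()
                  if ns not in live_namespaces}
        for ns, block in leaked.items():
            del self.used[ns]
            self.free.add(block)
        return len(leaked)


alloc = UIDRangeAllocator(blocks=16)
live = set()
for i in range(10):
    ns = f"project-{i}"
    alloc.allocate(ns)
    live.add(ns)

# Projects are deleted, but (as in the bug) nothing deallocates their blocks:
live.clear()
print(len(alloc.used))       # 10 -- blocks still marked as used
print(alloc.repair(live))    # 10 -- the repair pass reclaims the leaked blocks
print(len(alloc.free))       # 16 -- the full range is free again
```

Without the periodic `repair()` call, `self.free` only ever shrinks under create+delete churn, which is exactly the exhaustion scenario the Doc Text describes.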

Comment 5 RamaKasturi 2020-08-03 16:35:29 UTC
Verified the bug with the payload below 

[ramakasturinarra@dhcp35-60 cucushift]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-07-29-051236   True        False         35h     Cluster version is 4.5.0-0.nightly-2020-07-29-051236

Below are the steps I followed to verify the bug:
===========================================================
1) Install a 4.5 cluster with the above payload
2) Check rangeallocations.data by running the command below

[ramakasturinarra@dhcp35-60 cucushift]$ oc get rangeallocations scc-uid -o yaml | grep -o "/" | wc -l
16

3) Created 10K projects
4) Ran the command below to check the rangeallocations.data

[ramakasturinarra@dhcp35-60 cucushift]$ oc get rangeallocations scc-uid -o yaml | grep -o "/" | wc -l 
1707

5) Deleted all 10K projects; rangeallocations.data drops back to its original value of 16 about 4 hours after the project deletions.
[ramakasturinarra@dhcp35-60 cucushift]$ oc get rangeallocations scc-uid -o yaml | grep -o "/" | wc -l
16

P.S.: The reclaim period for rangeallocations.data to return to its original value once all the created projects are deleted appears to be 8 hours (see comment 4); in the QE test it took around 4-5 hours.

Based on the above, moving the bug to the verified state. Thanks!!

Comment 6 RamaKasturi 2020-08-04 15:37:34 UTC
Tested the bug again, and this time I did not see any delay, i.e. there was no need to wait 8 hours for rangeallocations.data to be updated once all projects were deleted.

Before 10K project creation:
=================================
[ramakasturinarra@dhcp35-60 cucushift]$ oc get projects | wc -l
58
[ramakasturinarra@dhcp35-60 cucushift]$ oc get rangeallocations scc-uid -o yaml | grep -o "/" | wc -l
16
[ramakasturinarra@dhcp35-60 cucushift]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-08-01-204100   True        False         12m     Cluster version is 4.5.0-0.nightly-2020-08-01-204100

After 10K project creation:
===================================
[ramakasturinarra@dhcp35-60 cucushift]$ oc get projects | wc -l
10099
[ramakasturinarra@dhcp35-60 cucushift]$ oc get projects | grep knarra | wc -l
10042
[ramakasturinarra@dhcp35-60 cucushift]$ oc get rangeallocations scc-uid -o yaml | grep -o "/" | wc -l
2011

After 10K project deletion:
=================================
[ramakasturinarra@dhcp35-60 cucushift]$ oc get rangeallocations scc-uid -o yaml | grep -o "/" | wc -l
19
[ramakasturinarra@dhcp35-60 cucushift]$ oc get projects | wc -l
58

Comment 8 errata-xmlrpc 2020-08-10 13:50:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.5 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3188

