Bug 1858798

Summary: rangeallocations.data is never updated when a project is removed
Product: OpenShift Container Platform Reporter: Maciej Szulik <maszulik>
Component: kube-controller-manager Assignee: Maciej Szulik <maszulik>
Status: CLOSED ERRATA QA Contact: RamaKasturi <knarra>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 4.2.z CC: aaleman, aos-bugs, arghosh, bleanhar, bmilne, bshirren, calfonso, chuffman, fhirtz, knarra, maszulik, mfojtik, oarribas, pmuller, rhowe, tnozicka, travi, vlaad, wking, yinzhou
Target Milestone: ---   
Target Release: 4.5.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: The UID range allocation was never updated when a project was removed. Only restarting the kube-controller-manager pod triggered the repair procedure that cleared the range.
Consequence: The UID range could be exhausted on a cluster with high namespace create and remove turnover.
Fix: The repair job now runs periodically.
Result: The UID range allocation should be freed periodically (currently every 8 hours), so additional kube-controller-manager restarts should not be required and the range should not be exhausted.
Story Points: ---
Clone Of: 1808588
: 1858800 Environment:
Last Closed: 2020-08-10 13:50:20 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1808588    
Bug Blocks: 1858800    

Comment 3 RamaKasturi 2020-08-02 16:56:12 UTC
Tried verifying the bug on 4.5.0-0.nightly-2020-07-29-051236. I could create more than 10K projects; rangeallocations.data is updated when the projects are created and is cleared again when they are deleted. However, the value of (oc get rangeallocations scc-uid -o yaml | grep -o "/" | wc -l) before project creation does not match the value after project deletion: after deleting the 10K projects, the above query returns zero.

@Tomas, could you please confirm whether the above is expected? Thanks! Raising needinfo on you since Maciej is on leave.

Comment 4 Tomáš Nožička 2020-08-03 11:59:29 UTC
We've synced offline; there is an 8-hour reclaim period.

Comment 5 RamaKasturi 2020-08-03 16:35:29 UTC
Verified the bug with the payload below:

[ramakasturinarra@dhcp35-60 cucushift]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-07-29-051236   True        False         35h     Cluster version is 4.5.0-0.nightly-2020-07-29-051236

Below are the steps I followed to verify the bug:
=================================================
1) Install a 4.5 cluster with the above payload.
2) Check rangeallocations.data by running the command below:

[ramakasturinarra@dhcp35-60 cucushift]$ oc get rangeallocations scc-uid -o yaml | grep -o "/" | wc -l
16

3) Created 10K projects.
4) Ran the command below to check rangeallocations.data:

[ramakasturinarra@dhcp35-60 cucushift]$ oc get rangeallocations scc-uid -o yaml | grep -o "/" | wc -l 
1707

5) Deleted all 10K projects. rangeallocations.data went back down to its original value of 16 about 4 hours after the projects were deleted.
[ramakasturinarra@dhcp35-60 cucushift]$ oc get rangeallocations scc-uid -o yaml | grep -o "/" | wc -l
16
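
The check used in the steps above counts "/" characters in the YAML output. Assuming rangeallocations.data is a base64-encoded bitmap in which each set bit marks one allocated UID block, a "/" in the payload corresponds to six consecutive set bits, so the count is only a rough proxy (it can also pick up slashes elsewhere in the YAML, such as in apiVersion). A minimal sketch of a more direct count follows; count_set_bits is a hypothetical helper name, and it assumes a base64 command that accepts -d:

```shell
# Hypothetical helper: read a base64-encoded bitmap on stdin and print
# the number of bits that are set to 1.
count_set_bits() {
  base64 -d | od -An -v -tu1 | awk '
    {
      # Each field is one byte printed as an unsigned decimal number.
      for (i = 1; i <= NF; i++) {
        b = $i
        while (b > 0) { n += b % 2; b = int(b / 2) }
      }
    }
    END { print n + 0 }'
}

# Usage (not run here; requires a logged-in oc client, and assumes the
# bitmap is exposed as the top-level .data field):
#   oc get rangeallocations scc-uid -o jsonpath='{.data}' | count_set_bits
```

With this helper the payload "////" reports 24 set bits, which is the six-bits-per-slash relationship the grep-based count relies on.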

P.S.: The reclaim period for rangeallocations.data to return to its original value after all created projects are deleted appears to be 8 hours (see comment 4). In the QE test it took around 4-5 hours.
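
Because the reclaim can take hours, the check can be polled rather than rerun by hand. A minimal sketch, assuming a counting command that prints a single integer; wait_for_count and count_slashes are hypothetical names, and the timeout and interval values in the usage comment are illustrative only:

```shell
# Hypothetical helper: run a counting command repeatedly until it reports
# a value at or below BASELINE, or until TIMEOUT seconds have elapsed.
# Prints the last observed value; returns 0 on success, 1 on timeout,
# 2 if the counting command itself fails.
wait_for_count() {
  baseline=$1
  timeout=$2
  interval=$3
  shift 3
  elapsed=0
  while :; do
    current=$("$@") || return 2
    if [ "$current" -le "$baseline" ]; then
      echo "$current"
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "$current"
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}

# Usage (not run here; requires a logged-in oc client):
#   count_slashes() { oc get rangeallocations scc-uid -o yaml | grep -o "/" | wc -l; }
#   wait_for_count 16 28800 600 count_slashes   # baseline 16, up to 8 hours, check every 10 minutes
```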

Based on the above, moving the bug to verified state. Thanks!

Comment 6 RamaKasturi 2020-08-04 15:37:34 UTC
Tested the bug again, and this time I do not see any delay, i.e. I did not have to wait 8 hours for rangeallocations.data to be updated once all projects were deleted.

Before 10K project creation:
=================================
[ramakasturinarra@dhcp35-60 cucushift]$ oc get projects | wc -l
58
[ramakasturinarra@dhcp35-60 cucushift]$ oc get rangeallocations scc-uid -o yaml | grep -o "/" | wc -l
16
[ramakasturinarra@dhcp35-60 cucushift]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-08-01-204100   True        False         12m     Cluster version is 4.5.0-0.nightly-2020-08-01-204100

After 10K project creation:
===================================
[ramakasturinarra@dhcp35-60 cucushift]$ oc get projects | wc -l
10099
[ramakasturinarra@dhcp35-60 cucushift]$ oc get projects | grep knarra | wc -l
10042
[ramakasturinarra@dhcp35-60 cucushift]$ oc get rangeallocations scc-uid -o yaml | grep -o "/" | wc -l
2011

After 10K project deletion:
=================================
[ramakasturinarra@dhcp35-60 cucushift]$ oc get rangeallocations scc-uid -o yaml | grep -o "/" | wc -l
19
[ramakasturinarra@dhcp35-60 cucushift]$ oc get projects | wc -l
58
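
The create and delete phases of this test could be scripted roughly as follows. This is a sketch, not the procedure QE actually used: project_names is a hypothetical helper, and the PREFIX-N naming scheme is an assumption (only the "knarra" substring appears in the output above).

```shell
# Hypothetical helper: print COUNT project names, PREFIX-1 .. PREFIX-COUNT,
# one per line.
project_names() {
  prefix=$1
  count=$2
  i=1
  while [ "$i" -le "$count" ]; do
    echo "${prefix}-${i}"
    i=$((i + 1))
  done
}

# Usage (not run here; requires a logged-in oc client):
#   project_names knarra 10000 | xargs -n 1 oc new-project
#   project_names knarra 10000 | xargs -n 1 oc delete project
```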

Comment 8 errata-xmlrpc 2020-08-10 13:50:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.5 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3188