Bug 1858798 - rangeallocations.data is never updated when a project is removed
Summary: rangeallocations.data is never updated when a project is removed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-controller-manager
Version: 4.2.z
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.5.z
Assignee: Maciej Szulik
QA Contact: RamaKasturi
URL:
Whiteboard:
Depends On: 1808588
Blocks: 1858800
 
Reported: 2020-07-20 12:38 UTC by Maciej Szulik
Modified: 2020-08-10 13:50 UTC
20 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The UID range allocation was never updated when a project was removed; only restarting the kube-controller-manager pod triggered the repair procedure that cleared the range. Consequence: It was possible to exhaust the UID range on a cluster with high namespace create+remove turnover. Fix: The repair job now runs periodically. Result: The UID range allocation is freed periodically (currently every 8 hours) without requiring additional kube-controller-manager restarts, which should also ensure that the range is not exhausted.
Clone Of: 1808588
: 1858800 (view as bug list)
Environment:
Last Closed: 2020-08-10 13:50:20 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Github openshift cluster-policy-controller pull 32 None closed [release-4.5] Bug 1858798: add UID deallocation logic 2020-09-18 17:06:57 UTC
Red Hat Product Errata RHBA-2020:3188 None None None 2020-08-10 13:50:39 UTC

Comment 3 RamaKasturi 2020-08-02 16:56:12 UTC
Tried verifying the bug on 4.5.0-0.nightly-2020-07-29-051236. I could create more than 10K projects; rangeallocations.data is updated as projects are created, and entries are removed as projects are deleted. However, the value of (oc get rangeallocations scc-uid -o yaml | grep -o "/" | wc -l) before project creation does not match the value after project deletion, i.e. after deleting the 10K projects I see the value of the above query drop to zero.

@Tomas, could you please confirm whether the above is expected? Thanks!! Raising needinfo on you since Maciej is on leave.

Comment 4 Tomáš Nožička 2020-08-03 11:59:29 UTC
We've synced offline; there is an 8-hour reclaim period.
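The mechanism behind this reclaim period can be sketched as follows. This is a simplified Python model, not the actual cluster-policy-controller code: `allocate()` marks a UID block as used, project deletion leaks that mark (the original bug), and a `repair()` pass rebuilds the allocation map from the namespaces that still exist, returning leaked blocks to the free pool. In the shipped fix this repair pass runs on a fixed period (every 8 hours) rather than only at kube-controller-manager restart. All names here are illustrative.

```python
# Simplified model (NOT the real controller code) of the UID-range leak
# and the periodic repair pass that reclaims leaked blocks.

class UIDRangeAllocator:
    def __init__(self, blocks):
        self.free = set(range(blocks))   # free block indices
        self.used = {}                   # namespace -> block index

    def allocate(self, namespace):
        # Hand out an arbitrary free block; raises KeyError when exhausted,
        # which models running out of the UID range.
        block = self.free.pop()
        self.used[namespace] = block
        return block

    def repair(self, live_namespaces):
        # Rebuild the allocation map from the namespaces that still exist,
        # returning any leaked blocks to the free pool. This is the step
        # the periodic repair job performs; returns how many were reclaimed.
        leaked = {ns: b for ns, b in self.used.items()
                  if ns not in live_namespaces}
        for ns, block in leaked.items():
            del self.used[ns]
            self.free.add(block)
        return len(leaked)


alloc = UIDRangeAllocator(blocks=16)
live = set()
for i in range(10):
    ns = f"project-{i}"
    alloc.allocate(ns)
    live.add(ns)

# Projects are deleted, but (as in the bug) nothing deallocates their blocks:
live.clear()
print(len(alloc.used))       # 10 -- blocks still marked as used
print(alloc.repair(live))    # 10 -- the repair pass reclaims the leaked blocks
print(len(alloc.free))       # 16 -- the full range is free again
```

Without the periodic `repair()` call, `self.free` only ever shrinks under create+delete churn, which is exactly the exhaustion scenario the Doc Text describes.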

Comment 5 RamaKasturi 2020-08-03 16:35:29 UTC
Verified the bug with the payload below 

[ramakasturinarra@dhcp35-60 cucushift]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-07-29-051236   True        False         35h     Cluster version is 4.5.0-0.nightly-2020-07-29-051236

Below are the steps I followed to verify the bug:
===========================================================
1) Install a 4.5 cluster with the above payload
2) Check rangeallocations.data by running the command below

[ramakasturinarra@dhcp35-60 cucushift]$ oc get rangeallocations scc-uid -o yaml | grep -o "/" | wc -l
16

3) Created 10K projects
4) Ran the command below to check the rangeallocations.data

[ramakasturinarra@dhcp35-60 cucushift]$ oc get rangeallocations scc-uid -o yaml | grep -o "/" | wc -l 
1707

5) Deleted all 10K projects; rangeallocations.data drops back to its original value of 16 about 4 hours after the project deletions.
[ramakasturinarra@dhcp35-60 cucushift]$ oc get rangeallocations scc-uid -o yaml | grep -o "/" | wc -l
16

P.S.: The reclaim period for rangeallocations.data to return to its original value once all the created projects are deleted appears to be 8 hours (see comment 4); in the QE test it took around 4-5 hours.

Based on the above, moving the bug to the verified state. Thanks!!

Comment 6 RamaKasturi 2020-08-04 15:37:34 UTC
Tested the bug again, and this time I did not see any delay, i.e. there was no need to wait 8 hours for rangeallocations.data to be updated once all projects were deleted.

Before 10K project creation:
=================================
[ramakasturinarra@dhcp35-60 cucushift]$ oc get projects | wc -l
58
[ramakasturinarra@dhcp35-60 cucushift]$ oc get rangeallocations scc-uid -o yaml | grep -o "/" | wc -l
16
[ramakasturinarra@dhcp35-60 cucushift]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-08-01-204100   True        False         12m     Cluster version is 4.5.0-0.nightly-2020-08-01-204100

After 10K project creation:
===================================
[ramakasturinarra@dhcp35-60 cucushift]$ oc get projects | wc -l
10099
[ramakasturinarra@dhcp35-60 cucushift]$ oc get projects | grep knarra | wc -l
10042
[ramakasturinarra@dhcp35-60 cucushift]$ oc get rangeallocations scc-uid -o yaml | grep -o "/" | wc -l
2011

After 10K project deletion:
=================================
[ramakasturinarra@dhcp35-60 cucushift]$ oc get rangeallocations scc-uid -o yaml | grep -o "/" | wc -l
19
[ramakasturinarra@dhcp35-60 cucushift]$ oc get projects | wc -l
58

Comment 8 errata-xmlrpc 2020-08-10 13:50:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.5 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3188

