Bug 2042059

Summary: update discovery burst to reflect lots of CRDs on openshift clusters
Product: OpenShift Container Platform
Reporter: Mike Fiedler <mifiedle>
Component: oc
Assignee: Maciej Szulik <maszulik>
Sub Component: oc
QA Contact: zhou ying <yinzhou>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: high
CC: aos-bugs, apurty, augol, bhershbe, bjarolim, dansmall, david.gabrysch, ddelcian, dmoessne, eglottma, fbaudin, jkaur, jlyle, jreimann, jwang, ksathe, llopezmo, maszulik, mchebbi, mfojtik, mifiedle, mleonard, moddi, msweiker, nnosenzo, oarribas, openshift-bugs-escalate, psingour, rautenberg, rdiazgav, rkshirsa, sbhavsar, sgordon, skordas, skudupud, sponnaga, sreber, sttts, ttadala, vhernand, wsun, yprokule
Version: 4.6
Target Milestone: ---
Target Release: 4.10.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The client-side throttling limit for API discovery was too low. Consequence: With the growing number of CRDs installed in a cluster, the requests made during API discovery were throttled by the client code. Fix: The discovery burst limit and QPS were increased. Result: Client-side throttling should appear less frequently.
Story Points: ---
Clone Of: 1906332
Clones: 2045008
Environment:
Last Closed: 2022-09-02 15:01:50 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2045008    

Comment 1 Maciej Szulik 2022-01-19 14:12:23 UTC
Since I'll need to start a new chain of BZs, I'm re-targeting this at 4.10 and will create the appropriate clones for older versions as we backport this change down to 4.6.

Comment 7 Maciej Szulik 2022-02-02 09:24:18 UTC
*** Bug 2049157 has been marked as a duplicate of this bug. ***

Comment 10 Mike Fiedler 2022-02-16 13:32:02 UTC
With the current fix, I see the following behavior, which I believe is correct, but I would like confirmation:

Before test:

# oc api-resources | wc -l
211
# oc get crd | wc -l
109

Create 300 custom CRDs and do an oc get on all of them -> no throttling
Create 325 custom CRDs and do an oc get on all of them -> throttled
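(For reproducibility, a rough sketch of how the custom CRDs can be generated and queried is below. The scaletest/example.com names are made up for illustration and are not necessarily the manifests used in the actual test.)

for i in $(seq 1 300); do
cat <<EOF | oc apply -f -
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: scaletest${i}s.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: scaletest${i}s
    singular: scaletest${i}
    kind: ScaleTest${i}
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        x-kubernetes-preserve-unknown-fields: true
EOF
done

# query every CRD once; client-side throttling, when it occurs, is reported by
# the client as warnings mentioning "client-side throttling" on stderr
for i in $(seq 1 300); do oc get scaletest${i}s; done 2>&1 | grep -i "client-side throttling"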

One thing I see is that as the number of custom CRDs grows, the response time for oc get slows down. In each case below there are no instances returned for oc get <crd>. The average is over 100 oc get invocations:

Custom CRDs created   Avg get response
      20                  0.7s
     100                  0.9s
     200                  1.4s
     300                  2.0s
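(The averages above can be reproduced with a simple timing loop along these lines; this is a sketch of the method, not necessarily the exact harness used, and scaletest1s is the illustrative resource name from the snippet above.)

total=0
for i in $(seq 1 100); do
  start=$(date +%s.%N)
  oc get scaletest1s > /dev/null 2>&1
  end=$(date +%s.%N)
  total=$(echo "$total + $end - $start" | bc)
done
echo "average oc get response: $(echo "scale=2; $total / 100" | bc)s"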

@maszulik Does this look like it is working as you would expect?

Comment 11 Maciej Szulik 2022-02-17 12:16:42 UTC
(In reply to Mike Fiedler from comment #10)

> @maszulik Does this look like it is working as you would expect?

Yes, the numbers look reasonable. What's important to note is that 4.10+ will also benefit from increased QPS, which unfortunately can't be backported to previous releases.
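(For anyone checking this informally on a live cluster: the throttling in question is oc's own client-side rate limiting, and the fix raises the discovery burst, with the QPS increase only in 4.10+. A rough check is to run a discovery-heavy command against a cluster with a few hundred CRDs and watch stderr; before the fix this typically printed warnings mentioning "client-side throttling", and with the raised limits nothing should match at this scale.)

# run a discovery-heavy command and check stderr for throttling warnings;
# on a fixed client with a few hundred CRDs nothing should match
oc api-resources 2>&1 >/dev/null | grep -i "client-side throttling"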

Comment 12 Mike Fiedler 2022-02-22 13:03:57 UTC
Marking verified on 4.10 rc2 based on comment 10 and comment 11

Comment 15 Maciej Szulik 2022-08-26 09:00:46 UTC
Spoke with Poornima Singour on Slack; I'm waiting for additional information about their cluster and this particular problem.