Description of problem:

In large OpenShift 4 environments we are seeing messages "Throttling request took 1.183079627s, request: GET: ..." when running a simple "oc get -A pod". This is related to the number of objects and CRDs in the OpenShift 4 cluster. As increasing `config.Burst` helps, we would like to have this applied in all `oc` versions to prevent the issue from happening.

Version-Release number of selected component (if applicable):
- 4.5 and 4.6

How reproducible:
- Always (depending on the number of CRDs in the OpenShift cluster)

Steps to Reproduce:
1. Add many CRDs to OpenShift 4 and add workloads
2. Run `oc get -A pod`

Actual results:
The `oc` command will run but report `Throttling request took 1.183079627s, request: GET: ...`

Expected results:
The `oc` command should run without client-side throttling.

Additional info:
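For anyone carrying their own client-go based tooling, here is a minimal sketch of the workaround mentioned above - raising the client-side rate limit on the rest.Config before building clients. The QPS/Burst values are illustrative only, not the values oc actually ships with:

package main

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	// client-go defaults to QPS=5, Burst=10; discovery against a cluster
	// with many API groups exhausts that quickly and triggers the
	// client-side "Throttling request took ..." messages.
	cfg.QPS = 50
	cfg.Burst = 250

	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	_ = client
	fmt.Printf("client built with QPS=%v Burst=%v\n", cfg.QPS, cfg.Burst)
}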
We are facing the same problem. Here is what happens when we delete a PV (via oc and tridentctl):

I1120 15:52:30.591442 1304 request.go:621] Throttling request took 1.183797893s, request: GET:https://172.30.0.1:443/apis/machine.openshift.io/v1beta1?timeout=32s
I1120 15:52:40.790566 1304 request.go:621] Throttling request took 8.796730826s, request: GET:https://172.30.0.1:443/apis/operators.coreos.com/v1alpha1?timeout=32s
I1120 15:52:50.990505 1304 request.go:621] Throttling request took 18.996372382s, request: GET:https://172.30.0.1:443/apis/hco.kubevirt.io/v1alpha1?timeout=32s
I1120 15:52:59.468478 1333 request.go:621] Throttling request took 1.162821518s, request: GET:https://172.30.0.1:443/apis/apiextensions.k8s.io/v1beta1?timeout=32s
I1120 15:53:09.667564 1333 request.go:621] Throttling request took 6.598000689s, request: GET:https://172.30.0.1:443/apis/scheduling.k8s.io/v1beta1?timeout=32s
I1120 15:53:19.866742 1333 request.go:621] Throttling request took 16.796448976s, request: GET:https://172.30.0.1:443/apis/triggers.tekton.dev/v1alpha1?timeout=32s
persistentvolume "pvc-07cfc225-1a7d-41f2-996e-c9ecd37fac4f" deleted

and almost every other oc command is throttled, too. Here is a list of the CRDs:

$ oc get crd
I1120 15:55:58.352182 25723 request.go:621] Throttling request took 1.050227415s, request: GET:https://api.dt.ocp.tc.corp:6443/apis/caching.internal.knative.dev/v1alpha1?timeout=32s
NAME   CREATED AT
alertmanagers.monitoring.coreos.com   2020-07-09T19:16:30Z
apiservers.config.openshift.io   2020-07-09T18:58:18Z
authentications.config.openshift.io   2020-07-09T18:58:19Z
authentications.operator.openshift.io   2020-07-09T18:58:34Z
baremetalhosts.metal3.io   2020-07-09T18:59:05Z
builds.config.openshift.io   2020-07-09T18:58:19Z
catalogsources.operators.coreos.com   2020-07-09T18:58:42Z
cdis.cdi.kubevirt.io   2020-10-26T15:09:11Z
certificatedeployments.tollcollect.de   2020-10-15T12:29:05Z
certmanagers.operator.cert-manager.io   2020-08-05T13:34:20Z
checlusters.org.eclipse.che   2020-10-23T07:48:10Z
cloudcredentials.operator.openshift.io   2020-09-16T12:15:54Z
clusterautoscalers.autoscaling.openshift.io   2020-07-09T18:58:33Z
clusternetworks.network.openshift.io   2020-07-09T19:01:53Z
clusteroperators.config.openshift.io   2020-07-09T18:58:17Z
clusterresourceoverrides.operator.autoscaling.openshift.io   2020-11-18T14:01:23Z
clusterresourcequotas.quota.openshift.io   2020-07-09T18:58:18Z
clusterserviceversions.operators.coreos.com   2020-07-09T18:58:39Z
clustertasks.tekton.dev   2020-08-06T09:04:22Z
clustertriggerbindings.triggers.tekton.dev   2020-08-06T09:05:09Z
clusterversions.config.openshift.io   2020-07-09T18:58:17Z
conditions.tekton.dev   2020-08-06T09:04:22Z
config.operator.tekton.dev   2020-08-06T09:03:52Z
configs.imageregistry.operator.openshift.io   2020-07-09T18:58:30Z
configs.operator.openshift.io   2020-07-28T11:04:19Z
configs.samples.operator.openshift.io   2020-07-09T18:58:31Z
consoleclidownloads.console.openshift.io   2020-07-09T18:58:31Z
consoleexternalloglinks.console.openshift.io   2020-07-09T18:58:31Z
consolelinks.console.openshift.io   2020-07-09T18:58:31Z
consolenotifications.console.openshift.io   2020-07-09T18:58:30Z
consoles.config.openshift.io   2020-07-09T18:58:19Z
consoles.operator.openshift.io   2020-07-09T18:58:30Z
consoleyamlsamples.console.openshift.io   2020-07-09T18:58:31Z
containerruntimeconfigs.machineconfiguration.openshift.io   2020-07-09T19:03:38Z
controllerconfigs.machineconfiguration.openshift.io   2020-07-09T19:03:35Z
credentialsrequests.cloudcredential.openshift.io   2020-07-09T18:58:22Z
csisnapshotcontrollers.operator.openshift.io   2020-07-09T18:58:33Z
dmdeployments.dm.<redacted>.de   2020-11-09T09:26:37Z
dnses.config.openshift.io   2020-07-09T18:58:19Z
dnses.operator.openshift.io   2020-07-09T18:58:32Z
dnsrecords.ingress.operator.openshift.io   2020-07-09T18:58:33Z
egressnetworkpolicies.network.openshift.io   2020-07-09T19:01:53Z
<redacted>-repositories.cloud.<redacted>.com   2020-09-16T11:55:40Z
<redacted>-tenants.cloud.<redacted>.com   2020-09-16T11:57:30Z
<redacted>-versions.cloud.<redacted>.com   2020-09-16T11:57:56Z
etcds.operator.openshift.io   2020-07-09T18:58:33Z
eventlisteners.triggers.tekton.dev   2020-08-06T09:05:09Z
featuregates.config.openshift.io   2020-07-09T18:58:19Z
gitlabs.gitlab.com   2020-07-16T13:13:51Z
grafanadashboards.integreatly.org   2020-08-04T09:36:48Z
grafanadatasources.integreatly.org   2020-08-04T09:36:48Z
grafanas.integreatly.org   2020-08-04T09:36:48Z
helloworlds.mv.<redacted>.com   2020-11-04T12:21:40Z
hostpathprovisioners.hostpathprovisioner.kubevirt.io   2020-10-26T15:09:10Z
hostsubnets.network.openshift.io   2020-07-09T19:01:53Z
hyperconvergeds.hco.kubevirt.io   2020-10-26T15:09:10Z
imagecontentsourcepolicies.operator.openshift.io   2020-07-09T18:58:20Z
imagemanifestvulns.secscan.quay.redhat.com   2020-08-24T08:33:05Z
imagepruners.imageregistry.operator.openshift.io   2020-07-09T18:58:34Z
images.caching.internal.knative.dev   2020-08-06T09:04:22Z
images.config.openshift.io   2020-07-09T18:58:20Z
infrastructures.config.openshift.io   2020-07-09T18:58:20Z
ingresscontrollers.operator.openshift.io   2020-07-09T18:58:22Z
ingresses.config.openshift.io   2020-07-09T18:58:20Z
installplans.operators.coreos.com   2020-07-09T18:58:40Z
ippools.whereabouts.cni.cncf.io   2020-07-09T19:01:47Z
kafkabridges.kafka.strimzi.io   2020-09-14T11:57:40Z
kafkaconnectors.kafka.strimzi.io   2020-09-14T11:57:40Z
kafkaconnects.kafka.strimzi.io   2020-09-14T11:57:40Z
kafkaconnects2is.kafka.strimzi.io   2020-09-14T11:57:41Z
kafkamirrormaker2s.kafka.strimzi.io   2020-09-14T11:57:41Z
kafkamirrormakers.kafka.strimzi.io   2020-09-14T11:57:41Z
kafkarebalances.kafka.strimzi.io   2020-09-14T11:57:41Z
kafkas.kafka.strimzi.io   2020-09-14T11:57:41Z
kafkatopics.kafka.strimzi.io   2020-09-14T11:57:41Z
kafkausers.kafka.strimzi.io   2020-09-14T11:57:41Z
kubeapiservers.operator.openshift.io   2020-07-09T18:58:40Z
kubecontrollermanagers.operator.openshift.io   2020-07-09T18:58:31Z
kubedeschedulers.operator.openshift.io   2020-09-08T14:14:58Z
kubeletconfigs.machineconfiguration.openshift.io   2020-07-09T19:03:37Z
kubeschedulers.operator.openshift.io   2020-07-09T18:58:33Z
kubestorageversionmigrators.operator.openshift.io   2020-07-09T18:58:31Z
kubevirtcommontemplatesbundles.ssp.kubevirt.io   2020-10-26T15:09:10Z
kubevirtmetricsaggregations.ssp.kubevirt.io   2020-10-26T15:09:10Z
kubevirtnodelabellerbundles.ssp.kubevirt.io   2020-10-26T15:09:11Z
kubevirts.kubevirt.io   2020-10-26T15:09:11Z
kubevirttemplatevalidators.ssp.kubevirt.io   2020-10-26T15:09:11Z
machineautoscalers.autoscaling.openshift.io   2020-07-09T18:58:35Z
machineconfigpools.machineconfiguration.openshift.io   2020-07-09T19:03:36Z
machineconfigs.machineconfiguration.openshift.io   2020-07-09T19:03:34Z
machinehealthchecks.machine.openshift.io   2020-07-09T18:59:04Z
machines.machine.openshift.io   2020-07-09T18:59:04Z
machinesets.machine.openshift.io   2020-07-09T18:59:04Z
memcachedajs.cache.aj.t-<redacted>.com   2020-11-04T12:47:27Z
memcacheds.cache.example.com   2020-11-04T08:12:29Z
mvoperators.mv.<redacted>.de   2020-11-04T13:18:22Z
mvservices.mv.<redacted>.de   2020-09-23T06:21:10Z
netnamespaces.network.openshift.io   2020-07-09T19:01:53Z
network-attachment-definitions.k8s.cni.cncf.io   2020-07-09T19:01:47Z
networkaddonsconfigs.networkaddonsoperator.network.kubevirt.io   2020-10-26T15:09:10Z
networks.config.openshift.io   2020-07-09T18:58:20Z
networks.operator.openshift.io   2020-07-09T18:58:22Z
nodemaintenances.nodemaintenance.kubevirt.io   2020-10-26T15:09:11Z
nodenetworkconfigurationenactments.nmstate.io   2020-10-27T14:53:13Z
nodenetworkconfigurationpolicies.nmstate.io   2020-10-27T14:53:13Z
nodenetworkstates.nmstate.io   2020-10-27T14:53:13Z
oauths.config.openshift.io   2020-07-09T18:58:21Z
openshiftapiservers.operator.openshift.io   2020-07-09T18:58:33Z
openshiftartifactoryhas.charts.helm.k8s.io   2020-07-16T15:40:29Z
openshiftcontrollermanagers.operator.openshift.io   2020-07-09T18:58:34Z
openshiftxrays.charts.helm.k8s.io   2020-07-30T06:01:35Z
operatorconfigurations.acid.zalan.do   2020-07-10T12:18:34Z
operatorgroups.operators.coreos.com   2020-07-09T18:58:49Z
operatorhubs.config.openshift.io   2020-07-09T18:58:18Z
operatorpkis.network.operator.openshift.io   2020-07-09T18:58:46Z
operatorsources.operators.coreos.com   2020-07-09T18:58:36Z
ovirtproviders.v2v.kubevirt.io   2020-10-26T15:09:10Z
pgclusters.crunchydata.com   2020-11-09T07:25:21Z
pgpolicies.crunchydata.com   2020-11-09T07:25:22Z
pgreplicas.crunchydata.com   2020-11-09T07:25:22Z
pgtasks.crunchydata.com   2020-11-09T07:25:22Z
pipelineresources.tekton.dev   2020-08-06T09:04:22Z
pipelineruns.tekton.dev   2020-08-06T09:04:22Z
pipelines.tekton.dev   2020-08-06T09:04:22Z
podmonitors.monitoring.coreos.com   2020-07-09T19:16:30Z
postgresqls.acid.zalan.do   2020-07-10T12:18:34Z
profiles.tuned.openshift.io   2020-07-09T18:58:35Z
projects.config.openshift.io   2020-07-09T18:58:21Z
prometheuses.monitoring.coreos.com   2020-07-09T19:16:30Z
prometheusrules.monitoring.coreos.com   2020-07-09T19:16:30Z
provisionings.<redacted>.io   2020-07-09T18:59:04Z
proxies.config.openshift.io   2020-07-09T18:58:18Z
rolebindingrestrictions.authorization.openshift.io   2020-07-09T18:58:17Z
schedulers.config.openshift.io   2020-07-09T18:58:21Z
securitycontextconstraints.security.openshift.io   2020-07-09T18:58:18Z
servicecas.operator.openshift.io   2020-07-09T18:58:35Z
servicecatalogapiservers.operator.openshift.io   2020-07-09T18:58:33Z
servicecatalogcontrollermanagers.operator.openshift.io   2020-07-09T18:58:33Z
servicemonitors.monitoring.coreos.com   2020-07-09T19:16:30Z
shes.dt-av-sh.<redacted>.de   2020-10-08T10:55:26Z
shes.et-av-sh.<redacted>.de   2020-10-16T07:13:15Z
storagestates.migration.k8s.io   2020-07-09T18:58:36Z
storageversionmigrations.migration.k8s.io   2020-07-09T18:58:34Z
subscriptions.operators.coreos.com   2020-07-09T18:58:41Z
taskruns.tekton.dev   2020-08-06T09:04:22Z
tasks.tekton.dev   2020-08-06T09:04:22Z
thanosrulers.monitoring.coreos.com   2020-07-28T11:15:20Z
tridentbackends.trident.netapp.io   2020-07-09T21:00:00Z
tridentnodes.trident.netapp.io   2020-07-09T21:00:00Z
tridentsnapshots.trident.netapp.io   2020-07-09T21:00:01Z
tridentstorageclasses.trident.netapp.io   2020-07-09T21:00:01Z
tridenttransactions.trident.netapp.io   2020-07-09T21:00:01Z
tridentversions.trident.netapp.io   2020-07-09T21:00:00Z
tridentvolumes.trident.netapp.io   2020-07-09T21:00:00Z
triggerbindings.triggers.tekton.dev   2020-08-06T09:05:09Z
triggertemplates.triggers.tekton.dev   2020-08-06T09:05:09Z
tuneds.tuned.openshift.io   2020-07-09T18:58:33Z
v2vvmwares.v2v.kubevirt.io   2020-10-26T15:09:10Z
vmimportconfigs.v2v.kubevirt.io   2020-10-26T15:09:11Z
volumesnapshotclasses.snapshot.storage.k8s.io   2020-07-09T19:40:09Z
volumesnapshotcontents.snapshot.storage.k8s.io   2020-07-09T19:40:08Z
volumesnapshots.snapshot.storage.k8s.io   2020-07-09T19:40:07Z
zdmvoperators.zd.zd-mv.<redacted>.de   2020-11-05T09:08:10Z

Cluster utilization seems okay:
CPU: 90-120 / 248
RAM: 900-1200 GB / 2550 GB
Network: ~100 MBps at idle / ~500 MBps peak
Pods: 2.5-3.2k

The setup is 3 master / 2 infra / 12 compute nodes in total. Any suggestion is welcome.
This was moved to https://github.com/kubernetes/kubernetes/pull/96763 and will be picked up after the next k8s bump.
This will be bumped in https://github.com/openshift/oc/pull/648
*** Bug 1894574 has been marked as a duplicate of this bug. ***
*** Bug 1902816 has been marked as a duplicate of this bug. ***
This merged in https://github.com/openshift/oc/pull/660
@sreber Can you estimate the total # of CRDs in the cluster when you hit this? Looking for a reproducer to verify the fix. You can run this Python 2 script to count them: https://github.com/openshift/svt/blob/master/openshift_tooling/list_all_resources/list_all.py

e.g.

python2 list_all.py -c -s all -o count | tee list.out

It will take a while as it iterates over all CRD types.
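If the Python script is awkward to run, a rough count of the CRD types (not instances) can also be taken with oc directly; this only counts definitions, so treat it as an approximation of what the script reports:

$ oc get crd --no-headers | wc -l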
@soltysh Tested this with:

Client Version: 4.7.0-0.nightly-2020-12-09-112139
Server Version: 4.7.0-0.nightly-2020-12-09-112139

It is better than 4.5 but with enough groups/CRDs it starts happening again.

4.5: create 100 CRDs in 100 groups, oc get on any of the CRDs gets throttling messages
4.7: create 100 CRDs in 100 groups, oc get succeeds, no throttling
4.7: create 200 CRDs in 200 groups, oc get on any of the CRDs gets throttling messages

Do we want to call this good enough for 4.7? Raising the point where we hit this? If so, you can move this back to ON_QA
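For anyone trying to reproduce this, below is a rough sketch of how such synthetic CRDs can be generated, one API group per CRD. The widgets.groupN.example.com names are arbitrary placeholders, not what QE actually used:

for i in $(seq 1 200); do
cat <<EOF | oc apply -f -
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.group${i}.example.com
spec:
  group: group${i}.example.com
  scope: Namespaced
  names:
    plural: widgets
    singular: widget
    kind: Widget
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
EOF
done

Each CRD registers its own API group, so discovery has to hit one more endpoint per CRD created this way.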
(In reply to Mike Fiedler from comment #11)
> @soltysh Tested this with:
> 
> Client Version: 4.7.0-0.nightly-2020-12-09-112139
> Server Version: 4.7.0-0.nightly-2020-12-09-112139
> 
> It is better than 4.5 but with enough groups/CRDs it starts happening again.
> 
> 4.5: create 100 CRDs in 100 groups, oc get on any of the CRDs gets
> throttling messages
> 4.7: create 100 CRDs in 100 groups, oc get succeeds, no throttling
> 4.7: create 200 CRDs in 200 groups, oc get on any of the CRDs gets
> throttling messages
> 
> Do we want to call this good enough for 4.7? Raising the point where we
> hit this? If so, you can move this back to ON_QA

The default cluster comes with ~100 CRDs, and getting throttling at that was a problem. If you're saying that we're hitting it at 200, I think that gives us about 100 CRDs of room, which is sufficient at least for the time being. If we notice these numbers are not sufficient we can definitely bump them higher. So I'm moving this back to QA.
As this is not only annoying but also limits the throughput of requests (because throttling actually happens), what is the workaround for 4.6? It took me a while to finally end up in this Bugzilla issue. It would have really helped me if this were listed in the "Known Issues" section -> https://docs.openshift.com/container-platform/4.6/release_notes/ocp-4-6-release-notes.html#ocp-4-6-known-issues
Marking verified based on comment 12 and comment 14.

Verified on 4.7.0-0.nightly-2020-12-09-112139

Follow https://bugzilla.redhat.com/show_bug.cgi?id=1906332 for the 4.6.z backport.
*** Bug 1909280 has been marked as a duplicate of this bug. ***
Maciej, my customer came back with a must-gather over the holidays. It's available from the customer ticket [1] here: https://attachments.access.redhat.com/hydra/rest/cases/02774693/attachments/9e0abd3f-182b-46c5-b47b-ee5f61a44933?usePresignedUrl=true

[1] https://url.corp.redhat.com/486a9c4
We're at 173 CRDs on a test cluster that has been upgraded from 4.1 through to 4.6, and uses OCS plus a few items from Operator Hub (virtualization, JBoss operator, service mesh, pipelines, metering, etc.). CLI operations are noticeably slower than when the cluster was still on 4.4, and they're slower than on our production cluster (which is still at 4.4). 200 seems like a pretty low threshold once you start enabling components that are deployed via Operator Hub.
Same here. Having a 4.1 cluster upgraded all the way to 4.6, with things like Service Mesh, Serverless, SSO, Pipelines, etc. installed via OperatorHub, I am already at 221 CRDs. Assuming that people are expected to install more services/components via Operators, you should probably target a much higher number.
After reconsidering this issue I'm going to bump this slightly, to 250.
The last bump to 250 merged in https://github.com/openshift/oc/pull/696
Hello, I have a customer (tkt#02774693) on 4.6 who started deleting CRDs in the cluster where this was being seen, and noted that the threshold for the throttling messages in his cluster is apparently 158 CRDs: at 159 CRDs, the throttling messages started again in his output. Is this something that's configurable by the user, can this threshold only be set at install, or is it hard-coded? Thanks, m
Verified on 4.7.0-0.nightly-2021-01-18-144603

Created 200 CRDs on top of the OOTB CRDs: no throttling messages.
Created 250 CRDs on top of the OOTB CRDs: saw very rare throttling messages - much less frequent than when creating 200 CRDs with the limit set to 200.
*** Bug 1869847 has been marked as a duplicate of this bug. ***
We are running on 4.6.12 and have 253 CRDs and we are continually seeing this throttling issue. Can we increase the limit over 250, and will this be pulled back into a 4.6.x release?
(In reply to Ann Hayes from comment #35)
> We are running on 4.6.12 and have 253 CRDs and we are continually seeing
> this throttling issue. Can we increase the limit over 250, and will this
> be pulled back into a 4.6.x release?

Yes, this was bumped to 250, and a cherry-pick into 4.6 (https://github.com/openshift/oc/pull/716) should be present in the upcoming .z stream release.
Why isn’t this user-configurable? Better yet, why is there an arbitrary threshold beyond which users get annoying warnings and delays? How does this enhance the user experience? Is there any benefit at all to this behavior, and where is it explained?
(In reply to Chet Hosey from comment #37) > Why isn’t this user-configurable? It is not expected to be user-configurable since this is one of the several layers of preventing the server from being exhausted with requests. Specifically retrieving complete discovery information requires reaching to many endpoints and thus is limited.
(In reply to Maciej Szulik from comment #36)
> Yes, this was bumped to 250, and a cherry-pick into 4.6
> (https://github.com/openshift/oc/pull/716) should be present in the
> upcoming .z stream release.

Sorry, what is the .z stream -- will it be in stable-4.6.x so we can apply it on AMD64?

Why 250 when we have problems at 253? Shouldn't the number be much higher?
> Sorry, what is the .z stream -- will it be in stable-4.6.x so we can
> apply it on AMD64?

Yeah, the next stable release should contain it.

> Why 250 when we have problems at 253? Shouldn't the number be much higher?

That was a safe middle-ground number according to our tests. The actual numbers will differ from installation to installation, but we didn't want to go to extremes with this setting.
(In reply to Maciej Szulik from comment #40)
> Yeah, the next stable release should contain it.
> 
> That was a safe middle-ground number according to our tests. The actual
> numbers will differ from installation to installation, but we didn't want
> to go to extremes with this setting.

Thanks for your reply. What is the next target release -- 4.6.13? Can you please confirm?
> Thanks for your reply. What is the next target release -- 4.6.13? Can you
> please confirm?

Yeah, I think that should be it.
As a user who sees `oc` invocations stalled seemingly at random, I wonder if the cure is worse than the disease. Right now it still isn't clear how this behavior is supposed to be helpful. Is there documentation that would clarify?
(In reply to Maciej Szulik from comment #38)
> It is not expected to be user-configurable since this is one of the
> several layers of preventing the server from being exhausted with
> requests. Specifically, retrieving complete discovery information
> requires reaching many endpoints and thus is limited.

If this is not user-configurable it should be "auto-tuned", depending on the actual number of CRDs!

I am running 4.7.0-fc.5 with fewer than 200 CRDs (after removing Service Mesh - 31 CRDs!) and still get the error:

$ oc version
Client Version: 4.6.8
Server Version: 4.7.0-fc.5
Kubernetes Version: v1.20.0+3b90e69

$ oc get crds|wc -l
187

$ oc get all -n demo
I0203 12:28:28.294458 987490 request.go:645] Throttling request took 1.157750898s, request: GET:https://api.ocp4.openshift.freeddns.org:6443/apis/node.k8s.io/v1beta1?timeout=32s
NAME                                 READY   STATUS    RESTARTS   AGE
pod/nodejs-sample-67f458cd89-4wg9w   1/1     Running   0          18h
...
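A hypothetical sketch of the "auto-tuned" idea from the comment above - this is not something oc implements, just an illustration: fetch the group list first (a single cheap request), then size Burst from the number of group/versions before building the real clients.

package main

import (
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		panic(err)
	}
	// The group list itself is a single request, so it is cheap to fetch
	// even with default rate limits.
	groups, err := dc.ServerGroups()
	if err != nil {
		panic(err)
	}
	versions := 0
	for _, g := range groups.Groups {
		versions += len(g.Versions)
	}
	// Hypothetical auto-tuning: one burst slot per group/version plus some
	// headroom, instead of a fixed compiled-in constant.
	cfg.Burst = versions + 50
	cfg.QPS = float32(cfg.Burst) / 2
	// ...build the real clients from cfg here.
}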
If I understood the problem correctly, it is a client-side issue. I had my cluster updated to 4.6.16 and still had the issue. After updating the "oc" client to 4.6.16, the issue was gone. I guess this is actually an issue in the upstream Go client for Kubernetes, since I saw the same problem in the Keycloak operator, which started to throttle as well. It might also be that "kubectl" has the same limitation.
Is it documented anywhere why “randomly slow down the client when too many Red-Hat-supported features are enabled” is a feature in the first place? It would really be nice to have an explanation around who benefits from this behavior. Why does this exist at all?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633
@maciej, will this fix be backported to 4.6?
I happened to come across this bug and would like to know whether the throttling will also impact application deployment on OCP. Our environment has 400+ CRDs and I can always see the throttling message when I run every single oc command! Worse, one of our applications, which includes tons of operators, has failed to deploy to this environment for quite a long time; some pods never come up successfully. I suspect this is also caused by the throttling, because the same issue would presumably happen in many Go operators that use the Go client for Kubernetes, and they could fail too.
(In reply to morningspace from comment #60)
> I suspect this is also caused by the throttling, because the same issue
> would presumably happen in many Go operators that use the Go client for
> Kubernetes, and they could fail too.

This is indeed the case. I have already seen a few solutions built on top of the Go-based Kubernetes client that throttle as well: ArgoCD, the Keycloak operator, but also tools like the Helm command line tool. Tools not created on top of this library don't suffer from this behavior, so maybe using Rust or Java is an alternative :D

I also noticed that the message that gets written out has improved a bit. It now specifically mentions "client side" throttling, which is an improvement I guess.
Alright, so with that, I guess just upgrading OCP to a newer 4.6.x version, such as 4.6.37, which appears to be the latest 4.6 release at the moment, would not resolve the issue, because the issue comes from the client side. The fix that went into oc can resolve the issue for the command line, but since the Kubernetes Go client is referenced inside each Go operator's code, the issue will still happen there even when both OCP and oc are upgraded, unless we modify the Go operator code.
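For operators built on controller-runtime that hit the same client-side throttling, here is a minimal sketch of raising the limits on the manager's rest.Config before the manager and its clients are created. The values are illustrative; whether raising them is appropriate depends on the cluster:

package main

import (
	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	cfg := ctrl.GetConfigOrDie()
	// The operator otherwise inherits client-go's default QPS=5/Burst=10.
	cfg.QPS = 50
	cfg.Burst = 250

	mgr, err := ctrl.NewManager(cfg, ctrl.Options{})
	if err != nil {
		panic(err)
	}
	// ...register controllers with mgr, then run it.
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}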
Don't forget to update your oc binary. I was having the issue with oc 4.5.0 and a cluster on 4.7.22 with a few additional CRDs (Argo, Jaeger, IPAM, etc.). I updated oc to 4.7.0-202107261701.p0.git.8b4b094.assembly.stream-8b4b094 and the problem disappeared.