Bug 1906332 - update discovery burst to reflect lots of CRDs on openshift clusters [NEEDINFO]
Summary: update discovery burst to reflect lots of CRDs on openshift clusters
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 4.5
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.6.z
Assignee: Maciej Szulik
QA Contact: Mike Fiedler
URL:
Whiteboard:
Duplicates: 1918675
Depends On: 1899575
Blocks: 2049157
Reported: 2020-12-10 09:49 UTC by Maciej Szulik
Modified: 2022-02-01 16:37 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The client-side throttling limit was too low.
Consequence: As the number of CRDs installed in a cluster grew, API discovery requests were throttled by the client code.
Fix: Raise the limit to twice its previous value.
Result: Client-side throttling should occur less frequently.
Clone Of: 1899575
Clones: 2042059 2049157
Environment:
Last Closed: 2022-01-18 18:55:21 UTC
Target Upstream Version:
mifiedle: needinfo? (sgordon)




Links
System ID Private Priority Status Summary Last Updated
Github openshift oc pull 716 0 None closed [release-4.6] Bug 1906332: bump discovery burst to 250 2021-02-17 13:12:13 UTC
Red Hat Knowledge Base (Solution) 5587221 0 None None None 2021-03-09 08:41:39 UTC
Red Hat Product Errata RHSA-2021:0308 0 None None None 2021-02-08 13:51:10 UTC

Comment 1 david.gabrysch 2020-12-22 12:31:34 UTC
We are hitting this issue on 4.6.8 with ~ 160 CRDs

Comment 3 david.gabrysch 2020-12-22 12:55:17 UTC
Sorry, I had browser issues when uploading my first comment. 
We ran oc get crd -A and got 160 CustomResourceDefinitions, but not that many CustomResources. Here is the output of the Python script:
python2 list_all.py -c -s all -o count
hostsubnets.network.openshift.io: 6
operators.operators.coreos.com: 6
kibanas.logging.openshift.io: 0
securitycontextconstraints.security.openshift.io: 9
servicemeshcontrolplanes.maistra.io: 0
clusterserviceversions.operators.coreos.com: 2
elasticsearches.logging.openshift.io: 0
consoleclidownloads.console.openshift.io: 3
clusterversions.config.openshift.io: 1
dnsrecords.ingress.operator.openshift.io: 0
tridentversions.trident.netapp.io: 0
machinehealthchecks.machine.openshift.io: 0
clusterautoscalers.autoscaling.openshift.io: 0
configs.imageregistry.operator.openshift.io: 1
knativeeventings.operator.knative.dev: 0
profiles.tuned.openshift.io: 0
etcds.operator.openshift.io: 1
tridenttransactions.trident.netapp.io: 0
servicecas.operator.openshift.io: 1
kubeletconfigs.machineconfiguration.openshift.io: 0
storageversionmigrations.migration.k8s.io: 0
storages.operator.openshift.io: 1
catalogsources.operators.coreos.com: 0
consoleyamlsamples.console.openshift.io: 0
clusteroperators.config.openshift.io: 30
configs.samples.operator.openshift.io: 1
servicemeshmemberrolls.maistra.io: 0
volumesnapshots.snapshot.storage.k8s.io: 0
credentialsrequests.cloudcredential.openshift.io: 0
ingresses.networking.internal.knative.dev: 0
tridentvolumes.trident.netapp.io: 0
imagepruners.imageregistry.operator.openshift.io: 1
services.serving.knative.dev: 0
operatorgroups.operators.coreos.com: 0
dnses.config.openshift.io: 1
consoles.operator.openshift.io: 1
authentications.operator.openshift.io: 1
provisionings.metal3.io: 0
configs.operator.openshift.io: 1
apiservers.config.openshift.io: 1
openshiftapiservers.operator.openshift.io: 1
projects.config.openshift.io: 1
revisions.serving.knative.dev: 0
networks.config.openshift.io: 1
cloudcredentials.operator.openshift.io: 1
clusterlogforwarders.logging.openshift.io: 0
egressnetworkpolicies.network.openshift.io: 0
featuregates.config.openshift.io: 1
imagecontentsourcepolicies.operator.openshift.io: 0
alertmanagers.monitoring.coreos.com: 0
consoleexternalloglinks.console.openshift.io: 1
authentications.config.openshift.io: 1
ippools.whereabouts.cni.cncf.io: 0
machineconfigs.machineconfiguration.openshift.io: 20
rangeallocations.security.internal.openshift.io: 1
volumesnapshotclasses.snapshot.storage.k8s.io: 1
overlappingrangeipreservations.whereabouts.cni.cncf.io: 0
serverlessservices.networking.internal.knative.dev: 0
tuneds.tuned.openshift.io: 0
servicemonitors.monitoring.coreos.com: 0
machines.machine.openshift.io: 0
volumesnapshotcontents.snapshot.storage.k8s.io: 0
metrics.autoscaling.internal.knative.dev: 0
knativeservings.operator.knative.dev: 0
tridentsnapshots.trident.netapp.io: 0
operatorhubs.config.openshift.io: 1
kubeschedulers.operator.openshift.io: 1
thanosrulers.monitoring.coreos.com: 0
consolelinks.console.openshift.io: 3
netnamespaces.network.openshift.io: 68
subscriptions.operators.coreos.com: 0
csisnapshotcontrollers.operator.openshift.io: 1
machineautoscalers.autoscaling.openshift.io: 0
openshiftcontrollermanagers.operator.openshift.io: 1
images.caching.internal.knative.dev: 0
clusterresourcequotas.quota.openshift.io: 0
installplans.operators.coreos.com: 0
images.config.openshift.io: 1
controllerconfigs.machineconfiguration.openshift.io: 1
clusterloggings.logging.openshift.io: 0
configurations.serving.knative.dev: 0
servicemeshmembers.maistra.io: 0
builds.config.openshift.io: 1
tridentnodes.trident.netapp.io: 0
proxies.config.openshift.io: 1
consoles.config.openshift.io: 1
helmchartrepositories.helm.openshift.io: 1
clustercsidrivers.operator.openshift.io: 0
probes.monitoring.coreos.com: 0
routes.serving.knative.dev: 0
tridentbackends.trident.netapp.io: 0
schedulers.config.openshift.io: 1
prometheuses.monitoring.coreos.com: 0
ingresses.config.openshift.io: 1
kubecontrollermanagers.operator.openshift.io: 1
tridentprovisioners.trident.netapp.io: 0
clusternetworks.network.openshift.io: 1
kubeapiservers.operator.openshift.io: 1
operatorpkis.network.operator.openshift.io: 0
dnses.operator.openshift.io: 1
oauths.config.openshift.io: 1
machineconfigpools.machineconfiguration.openshift.io: 2
tridentstorageclasses.trident.netapp.io: 0
storagestates.migration.k8s.io: 0
kubestorageversionmigrators.operator.openshift.io: 1
baremetalhosts.metal3.io: 0
infrastructures.config.openshift.io: 1
networks.operator.openshift.io: 1
prometheusrules.monitoring.coreos.com: 0
ingresscontrollers.operator.openshift.io: 0
rolebindingrestrictions.authorization.openshift.io: 0
certificates.networking.internal.knative.dev: 0
machinesets.machine.openshift.io: 0
containerruntimeconfigs.machineconfiguration.openshift.io: 0
podautoscalers.autoscaling.internal.knative.dev: 0
network-attachment-definitions.k8s.cni.cncf.io: 0
knativekafkas.operator.serverless.openshift.io: 0
podmonitors.monitoring.coreos.com: 0
consolenotifications.console.openshift.io: 0

We are always hitting the throttling and have oc hanging for ~10 seconds. We have a support ticket open which led me here :)
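
The per-CRD listing above can be regrouped by API group, which is closer to what the throttled requests in this report target (the throttled URLs are per group/version, for example /apis/autoscaling.openshift.io/v1). The following is a minimal Python sketch, assuming the `name: count` output format shown above. `count_groups` is a hypothetical helper for illustration and is not part of list_all.py; the sample lines are copied from the listing.

```python
from collections import Counter

def count_groups(lines):
    """Count CRDs per API group from '<plural>.<group>: <count>' lines."""
    groups = Counter()
    for line in lines:
        # Keep only the CRD name, dropping the resource count after the colon.
        name = line.split(":", 1)[0].strip()
        if "." not in name:
            continue  # not a '<plural>.<group>' CRD name
        # The group is everything after the first dot.
        _plural, group = name.split(".", 1)
        groups[group] += 1
    return groups

# A few lines taken from the listing above.
sample = [
    "hostsubnets.network.openshift.io: 6",
    "netnamespaces.network.openshift.io: 68",
    "tridentversions.trident.netapp.io: 0",
    "network-attachment-definitions.k8s.cni.cncf.io: 0",
]

for group, n in sorted(count_groups(sample).items()):
    print(f"{group}: {n} CRD(s)")
```

Feeding the full listing into `count_groups` gives a rough picture of how many distinct API groups the CRDs add; it does not account for built-in groups or for groups served in several versions.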

Comment 12 Maciej Szulik 2021-01-21 11:46:30 UTC
*** Bug 1918675 has been marked as a duplicate of this bug. ***

Comment 13 mchebbi@redhat.com 2021-01-22 11:40:32 UTC
Hello,

Could you please give any estimation on the date the fix will be available for 4.6.z?
Thanks in advance.

Comment 14 Maciej Szulik 2021-01-22 12:30:36 UTC
(In reply to mchebbi@redhat.com from comment #13)
> Hello,
> 
> Could you please give any estimation on the date the fix will be available
> for 4.6.z?
> Thanks in advance.

I'm hoping to put together a PR today so it will most likely be one of the next .z streams.

Comment 15 mchebbi@redhat.com 2021-01-25 18:52:49 UTC
(In reply to Maciej Szulik from comment #14)
> (In reply to mchebbi@redhat.com from comment #13)
> > Hello,
> > 
> > Could you please give any estimation on the date the fix will be available
> > for 4.6.z?
> > Thanks in advance.
> 
> I'm hoping to put together a PR today so it will most likely be one of the
> next .z streams.

Thanks for your reply.

Comment 18 Mike Fiedler 2021-02-03 16:49:05 UTC
Verified on 4.6.16 

My OOTB cluster (no optional operators) has 93 CRDs. When I add 175 additional custom CRDs with 5 resources each (268 CRDs total), I do not see throttling messages.

When I add more than 175 additional CRDs, I start to see throttling messages, especially on get/delete.

Comment 20 david.gabrysch 2021-02-04 11:54:12 UTC
[xxx@xxx ~]$ oc get operators
I0204 12:52:11.814344 2467516 request.go:645] Throttling request took 1.17338979s, request: GET:https://api.xxxx.xxxxx.xxxxx:6443/apis/autoscaling.openshift.io/v1?timeout=32s
NAME                                                         AGE
cluster-logging.openshift-logging                            51d
elasticsearch-operator.openshift-operators-redhat            51d
grafana-operator.tkstede                                     8d
jaeger-product.openshift-operators                           51d
kiali.openshift-operators                                    51d
openshiftartifactoryha-operator.p224075-test23               36d
percona-server-mongodb-operator-certified.intranet-mongodb   43d
serverless-operator.openshift-serverless                     51d
servicemeshoperator.openshift-operators                      51d
[xxx@xxx ~]$ oc get crd | wc -l
168

This happens on one of our rather "fresh" clusters which is on 4.6.13
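
For anyone collecting these messages across clusters, the throttling line above can be parsed to pull out the delay and the API group/version that was being discovered. This is a minimal Python sketch, not part of oc: the regular expression only covers the plain-seconds form seen in this report (for example `1.17338979s`) and only `/apis/<group>/<version>` URLs, and `parse_throttle` is a hypothetical helper name.

```python
import re

# Matches lines like:
#   ... Throttling request took 1.17338979s, request: GET:https://host:6443/apis/<group>/<version>?timeout=32s
THROTTLE_RE = re.compile(
    r"Throttling request took (?P<seconds>\d+(?:\.\d+)?)s, "
    r"request: (?P<verb>[A-Z]+):https?://[^/]+/apis/(?P<group_version>[^?\s]+)"
)

def parse_throttle(line):
    """Return (seconds, verb, group_version), or None if the line does not match."""
    m = THROTTLE_RE.search(line)
    if m is None:
        return None
    return float(m.group("seconds")), m.group("verb"), m.group("group_version")

# The log line from the output above, split only for readability.
line = (
    "I0204 12:52:11.814344 2467516 request.go:645] Throttling request took "
    "1.17338979s, request: GET:https://api.xxxx.xxxxx.xxxxx:6443/apis/"
    "autoscaling.openshift.io/v1?timeout=32s"
)
print(parse_throttle(line))
```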

Comment 27 errata-xmlrpc 2021-02-08 13:50:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.6.16 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0308

Comment 28 Dan Small 2021-11-30 15:18:14 UTC
Reopening this bug. It appears customers are still hitting this bug in 4.7.24:

$ oc adm must-gather
[must-gather      ] OUT Using must-gather plug-in image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:040f48c020420ff93b227216469f6c2971cf10fac2b0b52ea9853e88ec1964a6
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information.
ClusterID: 670fc965-c512-44dd-b005-d8e418008e33
ClusterVersion: Stable at "4.7.24"
ClusterOperators:
        All healthy and stable


[must-gather      ] OUT namespace/openshift-must-gather-mwvtr created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-wwxrm created
[must-gather      ] OUT pod for plug-in image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:040f48c020420ff93b227216469f6c2971cf10fac2b0b52ea9853e88ec1964a6 created
[must-gather-h6fd7] POD 2021-11-29T18:52:10.344745676Z I1129 18:52:10.344469      52 request.go:655] Throttling request took 1.189284656s, request: GET:https://198.223.0.1:443/apis/hostpathprovisioner.kubevirt.io/v1alpha1?timeout=32s
[must-gather-h6fd7] POD 2021-11-29T18:52:14.564983722Z Gathering data for ns/openshift-cluster-version...
[must-gather-h6fd7] POD 2021-11-29T18:52:20.389993859Z I1129 18:52:20.389874      52 request.go:655] Throttling request took 4.794315983s, request: GET:https://198.223.0.1:443/apis/machine.openshift.io/v1beta1?timeout=32s
[must-gather-h6fd7] POD 2021-11-29T18:52:21.236575249Z Gathering data for ns/default...
[must-gather-h6fd7] POD 2021-11-29T18:52:26.904035255Z Gathering data for ns/openshift...
[must-gather-h6fd7] POD 2021-11-29T18:52:30.535553160Z I1129 18:52:30.535507      52 request.go:655] Throttling request took 3.595756696s, request: GET:https://198.223.0.1:443/apis/cdi.kubevirt.io/v1beta1?timeout=32s
[must-gather-h6fd7] POD 2021-11-29T18:52:32.603482444Z Gathering data for ns/kube-system...
[must-gather-h6fd7] POD 2021-11-29T18:52:51.331727171Z I1129 18:52:51.331690      52 request.go:655] Throttling request took 1.195777303s, request: GET:https://198.223.0.1:443/apis/autoscaling.openshift.io/v1?timeout=32s
[must-gather-h6fd7] POD 2021-11-29T18:52:56.133105402Z Gathering data for ns/openshift-etcd...
[must-gather-h6fd7] POD 2021-11-29T18:53:03.028319229Z I1129 18:53:03.028274      52 request.go:655] Throttling request took 1.193998933s, request: GET:https://198.223.0.1:443/apis/apiextensions.k8s.io/v1beta1?timeout=32s
[must-gather-h6fd7] POD 2021-11-29T18:53:07.715532155Z Gathering data for ns/openshift-kni-infra...
[must-gather-h6fd7] POD 2021-11-29T18:54:55.419257191Z I1129 18:54:55.419206      52 request.go:655] Throttling request took 1.195567744s, request: GET:https://198.223.0.1:443/apis/ingress.operator.openshift.io/v1?timeout=32s


This is one of their smaller clusters with 172 crds:
$ omg get crds | wc -l
172

@maszulik@redhat.com

Comment 29 Maciej Szulik 2021-11-30 16:05:36 UTC
(In reply to Dan Small from comment #28)
> This is one of their smaller clusters with 172 crds:
> $ omg get crds | wc -l
> 172

Discovery covers both built-in resources and CRDs. You can easily verify the total number by running:

$ oc api-resources |wc -l
237
$ oc get crds|wc -l
135

That data is from my 4.9 cluster. As you can see, with barely 135 CRDs I'm already at the 250 limit. We are bumping that limit even higher in 4.10, but I don't envision backporting that.
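
To illustrate why the burst limit matters once the number of discovery requests exceeds it, here is a rough Python simulation of a token-bucket rate limiter. It is a sketch under stated assumptions, not the oc or client-go implementation: the refill rate `qps=5.0` is an assumed value for illustration, the burst of 125 is a hypothetical smaller limit for comparison, 250 is the burst named in the linked PR title, and treating the 268 CRDs from comment 18 as 268 back-to-back requests is a simplification.

```python
def last_request_delay(num_requests, qps, burst):
    """Seconds until the last of num_requests back-to-back requests is admitted.

    Simulates a token bucket that starts full with `burst` tokens and
    refills at `qps` tokens per second. All requests are queued at t=0.
    """
    tokens = float(burst)
    now = 0.0
    for _ in range(num_requests):
        if tokens >= 1.0:
            tokens -= 1.0  # admitted immediately from the burst allowance
        else:
            # Wait just long enough for one token to refill, then spend it.
            now += (1.0 - tokens) / qps
            tokens = 0.0
    return now

# Assumed values, see the note above the code.
for burst in (125, 250):
    delay = last_request_delay(num_requests=268, qps=5.0, burst=burst)
    print(f"burst={burst}: last request admitted after about {delay:.1f}s")
```

Under these assumptions, requests within the burst allowance are admitted immediately, and every request beyond it waits for the refill, which is the kind of multi-second delay reported in the throttling messages.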

Comment 39 Mike Fiedler 2022-01-18 18:55:21 UTC
This should be tracked by a new bug rather than by reopening a bug already included in a shipped errata. The current code change is tested and shipped. Moving this back to CLOSED ERRATA. I will clone this bug to track any additional code changes. cc: @dansmall@redhat.com @sgordon@redhat.com

