Bug 1906332
| Summary: | update discovery burst to reflect lots of CRDs on openshift clusters | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Maciej Szulik <maszulik> |
| Component: | oc | Assignee: | Maciej Szulik <maszulik> |
| Status: | CLOSED ERRATA | QA Contact: | Mike Fiedler <mifiedle> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.5 | CC: | aos-bugs, apurty, augol, bhershbe, bjarolim, dansmall, david.gabrysch, ddelcian, dmoessne, fbaudin, jkaur, jlyle, jreimann, jwang, llopezmo, mchebbi, mfojtik, mifiedle, mleonard, moddi, msweiker, nnosenzo, oarribas, openshift-bugs-escalate, rautenberg, rdiazgav, rkshirsa, sbhavsar, sgordon, skordas, sponnaga, sreber, sttts, vhernand, wsun, yprokule |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | 4.6.z | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Story Points: | --- | | |
| Clone Of: | 1899575 | | |
| | 2042059 2049157 (view as bug list) | Environment: | |
| Last Closed: | 2022-01-18 18:55:21 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1899575 | | |
| Bug Blocks: | 2049157 | | |

Doc Text:

> Cause: The client-side throttling limit was low.
>
> Consequence: As the number of CRDs installed in a cluster grows, requests made during API discovery were throttled by the client code.
>
> Fix: The limit was raised to twice its previous value.
>
> Result: Client-side throttling should occur less frequently.
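The mechanism behind the Doc Text above is client-go's token-bucket rate limiter: requests up to the burst size pass immediately, and anything beyond is paced at the configured QPS. The stdlib-only Go sketch below models that arithmetic; the type, function names, and the QPS/burst numbers are illustrative assumptions for this bug's scenario, not the actual client-go implementation or oc's real settings.

```go
package main

import "fmt"

// tokenBucket is a simplified, hypothetical model of the token-bucket
// rate limiter that client-go uses for client-side throttling
// (k8s.io/client-go/util/flowcontrol). The numbers below are
// illustrative, not oc's actual settings.
type tokenBucket struct {
	qps   float64 // sustained requests per second (token refill rate)
	burst float64 // bucket capacity: requests served without waiting
}

// waitSeconds estimates how long the last request in a batch of n must
// wait: the first `burst` requests consume stored tokens immediately,
// and the remainder are paced at `qps` requests per second.
func (b tokenBucket) waitSeconds(n int) float64 {
	excess := float64(n) - b.burst
	if excess <= 0 {
		return 0
	}
	return excess / b.qps
}

func main() {
	// Assume discovery issues ~230 GET requests (roughly one per API
	// group-version) on a CRD-heavy cluster.
	const discoveryRequests = 230

	before := tokenBucket{qps: 50, burst: 100} // assumed pre-fix limits
	after := tokenBucket{qps: 50, burst: 200}  // burst doubled, as in the fix

	fmt.Printf("burst=100: last request waits ~%.1fs\n", before.waitSeconds(discoveryRequests))
	fmt.Printf("burst=200: last request waits ~%.1fs\n", after.waitSeconds(discoveryRequests))
}
```

With these assumed numbers, doubling the burst shrinks the modeled wait from about 2.6s to about 0.6s, which matches the Doc Text's hedged claim that throttling "should appear less frequently" rather than disappear.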
Comment 1
david.gabrysch
2020-12-22 12:31:34 UTC
We are hitting this issue on 4.6.8 with ~160 CRDs.

Sorry, I had browser issues when uploading my first comment. We ran `oc get crd -A` and got 160 CustomResourceDefinitions, but not that many CustomResources. Here is the output of the python script:

```
python2 list_all.py -c -s all -o count
hostsubnets.network.openshift.io: 6
operators.operators.coreos.com: 6
kibanas.logging.openshift.io: 0
securitycontextconstraints.security.openshift.io: 9
servicemeshcontrolplanes.maistra.io: 0
clusterserviceversions.operators.coreos.com: 2
elasticsearches.logging.openshift.io: 0
consoleclidownloads.console.openshift.io: 3
clusterversions.config.openshift.io: 1
dnsrecords.ingress.operator.openshift.io: 0
tridentversions.trident.netapp.io: 0
machinehealthchecks.machine.openshift.io: 0
clusterautoscalers.autoscaling.openshift.io: 0
configs.imageregistry.operator.openshift.io: 1
knativeeventings.operator.knative.dev: 0
profiles.tuned.openshift.io: 0
etcds.operator.openshift.io: 1
tridenttransactions.trident.netapp.io: 0
servicecas.operator.openshift.io: 1
kubeletconfigs.machineconfiguration.openshift.io: 0
storageversionmigrations.migration.k8s.io: 0
storages.operator.openshift.io: 1
catalogsources.operators.coreos.com: 0
consoleyamlsamples.console.openshift.io: 0
clusteroperators.config.openshift.io: 30
configs.samples.operator.openshift.io: 1
servicemeshmemberrolls.maistra.io: 0
volumesnapshots.snapshot.storage.k8s.io: 0
credentialsrequests.cloudcredential.openshift.io: 0
ingresses.networking.internal.knative.dev: 0
tridentvolumes.trident.netapp.io: 0
imagepruners.imageregistry.operator.openshift.io: 1
services.serving.knative.dev: 0
operatorgroups.operators.coreos.com: 0
dnses.config.openshift.io: 1
consoles.operator.openshift.io: 1
authentications.operator.openshift.io: 1
provisionings.metal3.io: 0
configs.operator.openshift.io: 1
apiservers.config.openshift.io: 1
openshiftapiservers.operator.openshift.io: 1
projects.config.openshift.io: 1
revisions.serving.knative.dev: 0
networks.config.openshift.io: 1
cloudcredentials.operator.openshift.io: 1
clusterlogforwarders.logging.openshift.io: 0
egressnetworkpolicies.network.openshift.io: 0
featuregates.config.openshift.io: 1
imagecontentsourcepolicies.operator.openshift.io: 0
alertmanagers.monitoring.coreos.com: 0
consoleexternalloglinks.console.openshift.io: 1
authentications.config.openshift.io: 1
ippools.whereabouts.cni.cncf.io: 0
machineconfigs.machineconfiguration.openshift.io: 20
rangeallocations.security.internal.openshift.io: 1
volumesnapshotclasses.snapshot.storage.k8s.io: 1
overlappingrangeipreservations.whereabouts.cni.cncf.io: 0
serverlessservices.networking.internal.knative.dev: 0
tuneds.tuned.openshift.io: 0
servicemonitors.monitoring.coreos.com: 0
machines.machine.openshift.io: 0
volumesnapshotcontents.snapshot.storage.k8s.io: 0
metrics.autoscaling.internal.knative.dev: 0
knativeservings.operator.knative.dev: 0
tridentsnapshots.trident.netapp.io: 0
operatorhubs.config.openshift.io: 1
kubeschedulers.operator.openshift.io: 1
thanosrulers.monitoring.coreos.com: 0
consolelinks.console.openshift.io: 3
netnamespaces.network.openshift.io: 68
subscriptions.operators.coreos.com: 0
csisnapshotcontrollers.operator.openshift.io: 1
machineautoscalers.autoscaling.openshift.io: 0
openshiftcontrollermanagers.operator.openshift.io: 1
images.caching.internal.knative.dev: 0
clusterresourcequotas.quota.openshift.io: 0
installplans.operators.coreos.com: 0
images.config.openshift.io: 1
controllerconfigs.machineconfiguration.openshift.io: 1
clusterloggings.logging.openshift.io: 0
configurations.serving.knative.dev: 0
servicemeshmembers.maistra.io: 0
builds.config.openshift.io: 1
tridentnodes.trident.netapp.io: 0
proxies.config.openshift.io: 1
consoles.config.openshift.io: 1
helmchartrepositories.helm.openshift.io: 1
clustercsidrivers.operator.openshift.io: 0
probes.monitoring.coreos.com: 0
routes.serving.knative.dev: 0
tridentbackends.trident.netapp.io: 0
schedulers.config.openshift.io: 1
prometheuses.monitoring.coreos.com: 0
ingresses.config.openshift.io: 1
kubecontrollermanagers.operator.openshift.io: 1
tridentprovisioners.trident.netapp.io: 0
clusternetworks.network.openshift.io: 1
kubeapiservers.operator.openshift.io: 1
operatorpkis.network.operator.openshift.io: 0
dnses.operator.openshift.io: 1
oauths.config.openshift.io: 1
machineconfigpools.machineconfiguration.openshift.io: 2
tridentstorageclasses.trident.netapp.io: 0
storagestates.migration.k8s.io: 0
kubestorageversionmigrators.operator.openshift.io: 1
baremetalhosts.metal3.io: 0
infrastructures.config.openshift.io: 1
networks.operator.openshift.io: 1
prometheusrules.monitoring.coreos.com: 0
ingresscontrollers.operator.openshift.io: 0
rolebindingrestrictions.authorization.openshift.io: 0
certificates.networking.internal.knative.dev: 0
machinesets.machine.openshift.io: 0
containerruntimeconfigs.machineconfiguration.openshift.io: 0
podautoscalers.autoscaling.internal.knative.dev: 0
network-attachment-definitions.k8s.cni.cncf.io: 0
knativekafkas.operator.serverless.openshift.io: 0
podmonitors.monitoring.coreos.com: 0
consolenotifications.console.openshift.io: 0
```

We are always hitting the throttling and have oc hanging for ~10 seconds. We have a support ticket open, which led me here :)

---

*** Bug 1918675 has been marked as a duplicate of this bug. ***

---

Hello,

Could you please give any estimation on the date the fix will be available for 4.6.z?
Thanks in advance.

---

(In reply to mchebbi from comment #13)
> Hello,
>
> Could you please give any estimation on the date the fix will be available
> for 4.6.z?
> Thanks in advance.

I'm hoping to put together a PR today, so it will most likely land in one of the next .z streams.

---

(In reply to Maciej Szulik from comment #14)
> (In reply to mchebbi from comment #13)
> > Could you please give any estimation on the date the fix will be available
> > for 4.6.z?
>
> I'm hoping to put together a PR today so it will most likely be one of the
> next .z streams.

Thanks for your reply.

---

Verified on 4.6.16.

My OOTB cluster (no optional operators) has 93 CRDs. When I add 175 additional custom CRDs with 5 resources each (268 CRDs total), I do not see throttling messages. When I add more than 175 additional CRDs, I start to see throttling messages, especially on get/delete.

```
[xxx@xxx ~]$ oc get operators
I0204 12:52:11.814344 2467516 request.go:645] Throttling request took 1.17338979s, request: GET:https://api.xxxx.xxxxx.xxxxx:6443/apis/autoscaling.openshift.io/v1?timeout=32s
NAME                                                         AGE
cluster-logging.openshift-logging                            51d
elasticsearch-operator.openshift-operators-redhat            51d
grafana-operator.tkstede                                     8d
jaeger-product.openshift-operators                           51d
kiali.openshift-operators                                    51d
openshiftartifactoryha-operator.p224075-test23               36d
percona-server-mongodb-operator-certified.intranet-mongodb   43d
serverless-operator.openshift-serverless                     51d
servicemeshoperator.openshift-operators                      51d
```

---

```
[xxx@xxx ~]$ oc get crd | wc -l
168
```

This happens on one of our rather "fresh" clusters, which is on 4.6.13.

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.6.16 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0308

---

Reopening this bug. It appears customers are still hitting this bug in 4.7.24:
```
$ oc adm must-gather
[must-gather      ] OUT Using must-gather plug-in image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:040f48c020420ff93b227216469f6c2971cf10fac2b0b52ea9853e88ec1964a6
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information.
ClusterID: 670fc965-c512-44dd-b005-d8e418008e33
ClusterVersion: Stable at "4.7.24"
ClusterOperators:
        All healthy and stable
[must-gather      ] OUT namespace/openshift-must-gather-mwvtr created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-wwxrm created
[must-gather      ] OUT pod for plug-in image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:040f48c020420ff93b227216469f6c2971cf10fac2b0b52ea9853e88ec1964a6 created
[must-gather-h6fd7] POD 2021-11-29T18:52:10.344745676Z I1129 18:52:10.344469      52 request.go:655] Throttling request took 1.189284656s, request: GET:https://198.223.0.1:443/apis/hostpathprovisioner.kubevirt.io/v1alpha1?timeout=32s
[must-gather-h6fd7] POD 2021-11-29T18:52:14.564983722Z Gathering data for ns/openshift-cluster-version...
[must-gather-h6fd7] POD 2021-11-29T18:52:20.389993859Z I1129 18:52:20.389874      52 request.go:655] Throttling request took 4.794315983s, request: GET:https://198.223.0.1:443/apis/machine.openshift.io/v1beta1?timeout=32s
[must-gather-h6fd7] POD 2021-11-29T18:52:21.236575249Z Gathering data for ns/default...
[must-gather-h6fd7] POD 2021-11-29T18:52:26.904035255Z Gathering data for ns/openshift...
[must-gather-h6fd7] POD 2021-11-29T18:52:30.535553160Z I1129 18:52:30.535507      52 request.go:655] Throttling request took 3.595756696s, request: GET:https://198.223.0.1:443/apis/cdi.kubevirt.io/v1beta1?timeout=32s
[must-gather-h6fd7] POD 2021-11-29T18:52:32.603482444Z Gathering data for ns/kube-system...
[must-gather-h6fd7] POD 2021-11-29T18:52:51.331727171Z I1129 18:52:51.331690      52 request.go:655] Throttling request took 1.195777303s, request: GET:https://198.223.0.1:443/apis/autoscaling.openshift.io/v1?timeout=32s
[must-gather-h6fd7] POD 2021-11-29T18:52:56.133105402Z Gathering data for ns/openshift-etcd...
[must-gather-h6fd7] POD 2021-11-29T18:53:03.028319229Z I1129 18:53:03.028274      52 request.go:655] Throttling request took 1.193998933s, request: GET:https://198.223.0.1:443/apis/apiextensions.k8s.io/v1beta1?timeout=32s
[must-gather-h6fd7] POD 2021-11-29T18:53:07.715532155Z Gathering data for ns/openshift-kni-infra...
[must-gather-h6fd7] POD 2021-11-29T18:54:55.419257191Z I1129 18:54:55.419206      52 request.go:655] Throttling request took 1.195567744s, request: GET:https://198.223.0.1:443/apis/ingress.operator.openshift.io/v1?timeout=32s
```
This is one of their smaller clusters, with 172 CRDs:

```
$ omg get crds | wc -l
172
```
@maszulik
(In reply to Dan Small from comment #28)
> This is one of their smaller clusters with 172 crds:
> $ omg get crds | wc -l
> 172

Discovery covers both built-in resources and CRDs. You can easily verify the total number by running:

```
$ oc api-resources | wc -l
237
$ oc get crds | wc -l
135
```

Those are data from my 4.9 cluster; as you can see, with barely 135 CRDs I am already at the 250 limit. We are bumping that limit even higher in 4.10, but I am not envisioning backporting that. This should be tracked by a new bug rather than by reopening a bug already included in a shipped errata. The current code change is tested and shipped. Moving this back to CLOSED ERRATA. I will clone this bug to track any additional code changes.

cc: @dansmall @sgordon

---

The needinfo request[s] on this closed bug have been removed, as they have been unresolved for 500 days.