Bug 1894574 - deployment issue and throttling request alerts
Summary: deployment issue and throttling request alerts
Keywords:
Status: CLOSED DUPLICATE of bug 1899575
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Maciej Szulik
QA Contact: zhou ying
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-11-04 14:29 UTC by mchebbi@redhat.com
Modified: 2020-11-26 14:03 UTC
CC: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-26 14:03:36 UTC
Target Upstream Version:
Embargoed:



Description mchebbi@redhat.com 2020-11-04 14:29:05 UTC
must-gather : shorturl.at/azI49

The customer is facing intermittent issues where application deployments to OpenShift are failing. Additionally, some of the get commands randomly produce output indicating a problem.
They are observing the messages below on deployment failures. Some other OpenShift commands also sometimes print similar messages.
Msg1:
[Action] Executing Apply Operation...
command: oc apply --server=https://api.awsnp1.ocpnp.travp.net:6443 --namespace=ics-pipeline-dev --token=**** --filename=pythonhello-redis-deployment.yaml
[Error] Apply operation failed.
I1026 14:26:50.575471   30421 request.go:621] Throttling request took 1.090510627s, request: GET:https://api.awsnp1.ocpnp.travp.net:6443/apis/rbac.istio.io/v1alpha1?timeout=32s

Msg2:
command: oc apply --server=https://api.awsnp1.ocpnp.travp.net:6443 --namespace=bi-scbp-trvpasdatamapper-dev '--token=**** ' --insecure-skip-tls-verify --filename=appdeploy.yaml
[Error] Apply operation failed.
I1026 12:55:59.410679 7068 request.go:621] Throttling request took 1.084130682s, request: GET:https://api.awsnp1.ocpnp.travp.net:6443/apis/machine.openshift.io/v1beta1?timeout=32s

Msg3:
[Action] Executing Apply Operation...
command: oc apply --server=https://api.awsnp1.ocpnp.travp.net:6443 --namespace=ics-pipeline-dev --token=**** --filename=pythonhello-redis-deployment.yaml
[Error] Apply operation failed.
I1026 13:39:36.705902   15598 request.go:621] Throttling request took 1.094693524s, request: GET:https://api.awsnp1.ocpnp.travp.net:6443/apis/cloudcredential.openshift.io/v1?timeout=32s

Msg4:
command: oc apply --server=https://api.awsnp1.ocpnp.travp.net:6443 --namespace=ics-pipeline-dev --token=**** --filename=pythonhello-redis-deployment.yaml
[Error] Apply operation failed.
I1026 14:15:49.845254   24133 request.go:621] Throttling request took 1.101490539s, request: GET:https://api.awsnp1.ocpnp.travp.net:6443/apis/console.openshift.io/v1?timeout=32s
=================================
They upgraded their development cluster to 4.5.15 and are still seeing deployment failures. The main root cause for these failures seems to be the API request throttling message that is printed while executing oc CLI commands. Is this a change in the new version?
See below examples:
Example:
[Action] Executing Apply Operation...
command: oc apply --server=https://api.awsnp1.ocpnp.travp.net:6443 --namespace=ics-pipeline-dev --token=**** --filename=pythonhello-redis-deployment.yaml
[Error] Apply operation failed.
I1031 12:28:16.114611   13520 request.go:621] Throttling request took 1.076172663s, request: GET:https://api.awsnp1.ocpnp.travp.net:6443/apis/networking.internal.knative.dev/v1alpha1?timeout=32s

Command line execution-
Using project "default" on server "https://api.awsnp1.ocpnp.travp.net:6443".
n4ed4f@k0005403 ~ $ oc get co
I1031 12:17:06.814332    5851 request.go:621] Throttling request took 1.017160755s, request: GET:https://api.awsnp1.ocpnp.travp.net:6443/apis/admissionregistration.k8s.io/v1beta1?timeout=32s
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.5.15    True        False         False      242d

After checking the must-gather, I found "Throttling request" log entries for the kubernetes-apiserver-operator and openshift-apiserver-operator:
-------------------------
2020-10-31T17:01:20.735370107Z I1031 17:01:20.735324       1 request.go:621] Throttling request took 1.195878707s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver/pods?labelSelector=apiserver%3Dtrue
2020-10-31T17:01:21.735380902Z I1031 17:01:21.735336       1 request.go:621] Throttling request took 1.197386863s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-config-managed/secrets?labelSelector=encryption.apiserver.operator.openshift.io%2Fcomponent%3Dopenshift-kube-apiserver
2020-10-31T17:01:22.935376312Z I1031 17:01:22.935332       1 request.go:621] Throttling request took 1.193540774s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver/pods?labelSelector=apiserver%3Dtrue
2020-10-31T17:01:24.135340486Z I1031 17:01:24.135284       1 request.go:621] Throttling request took 1.195530242s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver/secrets/encryption-config-78
2020-10-31T17:01:25.135360723Z I1031 17:01:25.135311       1 request.go:621] Throttling request took 1.19475135s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver/secrets/encryption-config-78
2020-10-31T17:01:26.135364273Z I1031 17:01:26.135322       1 request.go:621] Throttling request took 1.188613098s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver/configmaps/config
2020-10-31T17:01:27.335372106Z I1031 17:01:27.335333       1 request.go:621] Throttling request took 1.394951857s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver/secrets/encryption-config-78
2020-10-31T17:01:28.5353632Z I1031 17:01:28.535322       1 request.go:621] Throttling request took 1.395701677s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-config-managed/secrets?

I have recommended that the customer apply workaround 1 and then workaround 2 from the KCS: https://access.redhat.com/solutions/5448851

Is it really safe to upgrade to 4.5.16, and would this fix the original issue and the throttling requests? The customer is using the oc CLI 4.5.15 to perform all deployment activities. Can they continue to use CLI 4.5.15, or do you recommend using 4.5.16 after the cluster upgrade to 4.5.16?

Comment 1 Stefan Schimanski 2020-11-05 13:06:52 UTC
Throttling of requests is normal and to be expected, and is to be handled by the client. The operators do that. If oc doesn't, this is a bug. Changing component.

Comment 2 Maciej Szulik 2020-11-09 12:54:14 UTC
As Stefan mentioned in the previous comment, the throttling is perfectly normal behavior, and oc also respects it since it is built on
the same primitives as the operators, namely the client-go library.

For the client-side oc apply command, I'd need logs from an execution with -v=8 to be able to diagnose what is preventing
oc apply from succeeding. It would also be helpful to have the YAML definitions of those deployments.
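
For reference, a minimal sketch of what I mean (the server, namespace, and file name are just the ones from the messages above; adjust as needed):

# Re-run one failing apply with verbose client logging and capture both streams;
# the klog output (including the throttling lines) goes to stderr.
oc apply -v=8 \
  --server=https://api.awsnp1.ocpnp.travp.net:6443 \
  --namespace=ics-pipeline-dev \
  --token=**** \
  --filename=pythonhello-redis-deployment.yaml \
  > apply-v8.log 2>&1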

Comment 3 mchebbi@redhat.com 2020-11-19 10:38:18 UTC
shorturl.at/azI49 ==> requested information.

I have attached all the requested information at the link above.


It looks like the throttling messages appearing on screen were a bug: https://bugzilla.redhat.com/show_bug.cgi?id=1894574. Could you confirm?

The customer has upgraded the Dev cluster to 4.5.16 and is still seeing throttling messages on stdout. They do not see those messages on other clusters that are on 4.5.16.

The Dev cluster has more load than any of the other clusters at this point.

I have attached the must-gather for the upgraded Dev cluster, the YAML file used for apply, and the apply command output, which makes the IBM UCD plugin fail because the plugin does not expect any messages on stdout/stderr.

Is there a way to suppress these messages?
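
As a side note (my assumption, not a confirmed workaround): the "I..." lines appear to be klog informational output written to stderr rather than stdout, so separating the two streams keeps them out of the captured apply result, for example:

# Sketch only: keep the klog throttling lines (stderr) apart from the apply output (stdout).
oc apply --filename=appdeploy.yaml > apply.out 2> apply.err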

The customer also wanted to share an observation in case it helps the investigation. It looks like the oc CLI version may play a role in displaying the throttling messages as well. They have a cluster which is on 4.4.16
and used two versions of the oc CLI to execute commands. They requested an "invalid resource" because that consistently generates a throttling message.

See below:
1. oc cli version 4.4
$./oc version
Client Version: openshift-clients-4.4.0-202006211643.p0-2-gd89e458c3
Server Version: 4.4.16
Kubernetes Version: v1.17.1+b83bc57
$./oc get ppv   <=== invalid resource
error: the server doesn't have a resource type "ppv"

2.  oc cli version 4.5.16
$oc version
Client Version: 4.5.16
Server Version: 4.4.16
Kubernetes Version: v1.17.1+b83bc57
$ oc get ppv     <=== invalid resource
I1117 10:55:20.750698    9336 request.go:621] Throttling request took 1.023522471s, request: GET:https://api.aws2.ocplb.travp.net:6443/apis/kibana.k8s.elastic.co/v1?timeout=32s
error: the server doesn't have a resource type "ppv"

With oc CLI 4.5.16 I am consistently getting the throttling message. Another observation: the message comes up the first time for a particular API call, but then it does not come up again for subsequent calls for some time if the previous call was successful.
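
(My assumption for context: oc caches API discovery results on the client side, which would explain why the message only shows up on the first call and then stays away for a while. The path below is the usual client-go cache location and may differ per environment.)

# Illustrative only: the client-side discovery cache, one directory per API group/version fetched.
ls ~/.kube/cache/discovery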

Comment 4 mchebbi@redhat.com 2020-11-25 16:17:05 UTC
Hello,

Could you please check the requested information at shorturl.at/azI49 and the description in my previous comment?

Comment 5 Maciej Szulik 2020-11-26 14:03:36 UTC
The throttling is caused by a large number of CRDs during discovery; compare https://bugzilla.redhat.com/show_bug.cgi?id=1899575.
Currently there is no way to bypass these messages. In 4.7 we are increasing the limit so that these should not happen
as frequently.
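
As a rough illustration of the discovery load involved (not part of any fix), the number of served resource types and CRDs on the cluster can be counted like this:

# Illustrative only: gauge how many resource types / CRDs discovery has to fetch.
oc api-resources --no-headers | wc -l
oc get crd --no-headers | wc -l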

*** This bug has been marked as a duplicate of bug 1899575 ***

