Bug 2091902 - unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server has received too many requests and has asked us to try again later
Summary: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.10
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Simon Pasquier
QA Contact: hongyan li
URL:
Whiteboard:
Duplicates: 2091211 (view as bug list)
Depends On:
Blocks: 2098505
 
Reported: 2022-05-31 09:33 UTC by diflores
Modified: 2023-08-30 06:58 UTC (History)
CC List: 18 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 11:15:13 UTC
Target Upstream Version:
Embargoed:
hasun: needinfo-


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-monitoring-operator pull 1692 0 None Merged Bug 2091902: Improve performance of Prometheus Adapter 2022-07-16 04:00:28 UTC
Red Hat Bugzilla 2091211 1 high CLOSED prometheus adapter tokens are not getting validated and causing the load on API server 2022-09-06 14:26:52 UTC
Red Hat Knowledge Base (Solution) 6961619 0 None None None 2022-06-10 12:36:34 UTC
Red Hat Knowledge Base (Solution) 6964075 0 None None None 2022-07-28 17:08:42 UTC
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-10 11:15:34 UTC

Internal Links: 2091211

Description diflores 2022-05-31 09:33:36 UTC
Description of problem:

Unable to delete any project/namespace; deletion fails with the following error:

unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server has received too many requests and has asked us to try again later
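
(For anyone hitting the same symptom: a quick way to confirm that the stuck deletion is caused by the aggregated metrics API is to check the APIService and the namespace's status conditions. The commands below are only a sketch; "myproject" is a placeholder for whichever namespace refuses to delete.)

# Is the aggregated metrics API reported as Available?
oc get apiservice v1beta1.metrics.k8s.io

# Does the stuck namespace report an API-discovery failure in its status conditions?
oc get namespace myproject -o jsonpath='{.status.conditions}'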


kube-apiserver-master01
...
2022-05-25T13:53:36.576108316Z E0525 13:53:36.575877      16 controller.go:116] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 429, Body: Too many requests, please try again later.
2022-05-25T13:53:36.576108316Z I0525 13:53:36.575912      16 controller.go:129] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
2022-05-25T13:54:36.585753602Z E0525 13:54:36.585663      16 controller.go:116] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 429, Body: Too many requests, please try again later.
2022-05-25T13:54:36.585753602Z I0525 13:54:36.585680      16 controller.go:129] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.

kube-controller-manager-master01
...
2022-05-28T03:54:34.326992167Z E0528 03:54:34.326928       1 resource_quota_controller.go:413] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server has received too many requests and has asked us to try again later
2022-05-28T03:54:35.175781100Z E0528 03:54:35.175731       1 memcache.go:101] couldn't get resource list for metrics.k8s.io/v1beta1: the server has received too many requests and has asked us to try again later


# Unable to access raw pod metrics API endpoint

+ oc get --v=9 --raw /apis/metrics.k8s.io/v1beta1/pods
I0530 11:53:35.984102   24912 loader.go:372] Config loaded from file:  .../kubeconfig
I0530 11:53:35.986361   24912 round_trippers.go:435] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: oc/4.8.0 (linux/amd64) kubernetes/a0c12be" 'https://api.ocp1-example.se:6443/apis/metrics.k8s.io/v1beta1/pods'
I0530 11:54:36.001605   24912 round_trippers.go:454] GET https://api.ocp1-example.se:6443/apis/metrics.k8s.io/v1beta1/pods  in 60015 milliseconds
I0530 11:54:36.001641   24912 round_trippers.go:460] Response Headers:
I0530 11:54:36.001773   24912 helpers.go:234] Connection error: Get https://api.ocp1-example.se:6443/apis/metrics.k8s.io/v1beta1/pods: stream error: stream ID 1; INTERNAL_ERROR
F0530 11:54:36.001818   24912 helpers.go:115] Unable to connect to the server: stream error: stream ID 1; INTERNAL_ERROR
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0xc000010001, 0xc000d92840, 0x79, 0x28c)
        /go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:1021 +0xb9
k8s.io/klog/v2.(*loggingT).output(0x5c17580, 0xc000000003, 0x0, 0x0, 0xc00077ad90, 0x4a41f6c, 0xa, 0x73, 0x11e0400)
        /go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:970 +0x191
k8s.io/klog/v2.(*loggingT).printDepth(0x5c17580, 0xc000000003, 0x0, 0x0, 0x0, 0x0, 0x2, 0xc000a4e290, 0x1, 0x1)
        /go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:733 +0x16f
k8s.io/klog/v2.FatalDepth(...)
        /go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:1495
k8s.io/kubectl/pkg/cmd/util.fatal(0xc000e1c8c0, 0x4a, 0x1)
        /go/src/github.com/openshift/oc/vendor/k8s.io/kubectl/pkg/cmd/util/helpers.go:93 +0x288
k8s.io/kubectl/pkg/cmd/util.checkErr(0x42bc240, 0xc000542a80, 0x3fb1878)
        /go/src/github.com/openshift/oc/vendor/k8s.io/kubectl/pkg/cmd/util/helpers.go:188 +0x935
k8s.io/kubectl/pkg/cmd/util.CheckErr(...)
        /go/src/github.com/openshift/oc/vendor/k8s.io/kubectl/pkg/cmd/util/helpers.go:115
k8s.io/kubectl/pkg/cmd/get.NewCmdGet.func1(0xc000bc9180, 0xc000d95950, 0x0, 0x3)
        /go/src/github.com/openshift/oc/vendor/k8s.io/kubectl/pkg/cmd/get/get.go:167 +0x159
github.com/spf13/cobra.(*Command).execute(0xc000bc9180, 0xc000d95920, 0x3, 0x3, 0xc000bc9180, 0xc000d95920)
        /go/src/github.com/openshift/oc/vendor/github.com/spf13/cobra/command.go:856 +0x2c2
github.com/spf13/cobra.(*Command).ExecuteC(0xc000724000, 0x2, 0xc000724000, 0x2)
        /go/src/github.com/openshift/oc/vendor/github.com/spf13/cobra/command.go:960 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
        /go/src/github.com/openshift/oc/vendor/github.com/spf13/cobra/command.go:897
main.main()
        /go/src/github.com/openshift/oc/cmd/oc/oc.go:93 +0x645

# Unable to get podmetrics from all namespaces

+ oc get podmetrics --all-namespaces
Unable to connect to the server: stream error: stream ID 1; INTERNAL_ERROR

# Accessing the node metrics API endpoint works OK


+ oc get --raw /apis/metrics.k8s.io/v1beta1/nodes
{"kind":"NodeMetricsList","apiVersion":"metrics.k8s.io/v1beta1","metadata":{},"items":[{"metadata":{"name":"master01-ocp.com","creationTimestamp":"2022-05-30T09:54:36Z","labels":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"master01-ocp.com","kubernetes.io/os":"linux","node-role.kubernetes.io/master":"","node-role.kubernetes.io/worker":"","node.openshift.io/os_id":"rhcos","pod":"controller","pod":"non-controller"}},"timestamp":"2022-05-30T09:54:36Z","window":"1m0s","usage":{"cpu":"9922m","memory":"67077808Ki"}},{"metadata":{"name":"master02-ocp.com","creationTimestamp":"2022-05-30T09:54:36Z","labels":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"master02-ocp.com","kubernetes.io/os":"linux","node-role.kubernetes.io/master":"","node-role.kubernetes.io/worker":"","node.openshift.io/os_id":"rhcos","pod":"controller","pod":"non-controller"}},"timestamp":"2022-05-30T09:54:36Z","window":"1m0s","usage":{"cpu":"7112m","memory":"44586456Ki"}},{"metadata":{"name":"master03-ocp.com","creationTimestamp":"2022-05-30T09:54:36Z","labels":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"master03-ocp.com","kubernetes.io/os":"linux","node-role.kubernetes.io/master":"","node-role.kubernetes.io/worker":"","node.openshift.io/os_id":"rhcos","pcc-mm-pod":"non-controller","pcc-sm-pod":"non-controller"}},"timestamp":"2022-05-30T09:54:36Z","window":"1m0s","usage":{"cpu":"9436m","memory":"51704216Ki"}},{"metadata":{"name":"worker01-ocp.com","creationTimestamp":"2022-05-30T09:54:36Z","labels":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"worker01-ocp.com","kubernetes.io/os":"linux","node-role.kubernetes.io/standard":"","node-role.kubernetes.io/worker":"","node.openshift.io/os_id":"rhcos","pod":"non-controller","pod":"non-controller"}},"timestamp":"2022-05-30T09:54:36Z","window":"1m0s","usage":{"cpu":"5223m","memory":"49409820Ki"}}....

After restarting the `prometheus-adapter` pods, project/namespace deletion works again; however, the raw pod metrics endpoint is still inaccessible.
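
(A sketch of the restart workaround, assuming the deployment name used by the cluster monitoring operator and the usual app.kubernetes.io/name label; verify the label before relying on it:)

# Restart the prometheus-adapter deployment
oc -n openshift-monitoring rollout restart deployment/prometheus-adapter

# Alternatively, delete the pods and let the deployment recreate them
# (label selector assumed; check with: oc -n openshift-monitoring get pods --show-labels)
oc -n openshift-monitoring delete pods -l app.kubernetes.io/name=prometheus-adapter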



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
Undefined

Actual results:


Expected results:

The metrics API does not impact the project/namespace creation/deletion process.


Additional info:

Comment 40 hongyan li 2022-06-22 01:14:19 UTC
Tested with payload 4.11.0-0.nightly-2022-06-21-151125

% oc delete ns lhytest 
namespace "lhytest" deleted
% oc -n openshift-monitoring get deployment prometheus-adapter -oyaml|grep prometheus-url
        - --prometheus-url=https://prometheus-k8s.openshift-monitoring.svc:9091
% oc -n openshift-monitoring get cm prometheus-adapter-prometheus-config -oyaml  
apiVersion: v1
data:
  prometheus-config.yaml: |
    apiVersion: v1
    clusters:
    - cluster:
        certificate-authority: /etc/ssl/certs/service-ca.crt
        server: https://prometheus-k8s.openshift-monitoring.svc:9091
      name: prometheus-k8s
    contexts:
    - context:
        cluster: prometheus-k8s
        user: prometheus-k8s
      name: prometheus-k8s
    current-context: prometheus-k8s
    kind: Config
    preferences: {}
    users:
    - name: prometheus-k8s
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
  creationTimestamp: "2022-06-21T23:23:21Z"
  name: prometheus-adapter-prometheus-config
  namespace: openshift-monitoring
  resourceVersion: "20964"
  uid: 813b7aff-b484-4dd5-b8d5-bdf722a7d3c5
% oc adm top node
NAME                                         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
ip-10-0-143-43.eu-west-1.compute.internal    126m         3%     2000Mi          13%       
ip-10-0-154-180.eu-west-1.compute.internal   441m         12%    6213Mi          43%       
ip-10-0-168-91.eu-west-1.compute.internal    204m         5%     3407Mi          23%       
ip-10-0-183-252.eu-west-1.compute.internal   635m         18%    8983Mi          61%       
ip-10-0-208-213.eu-west-1.compute.internal   203m         5%     3675Mi          25%       
ip-10-0-210-94.eu-west-1.compute.internal    585m         16%    7397Mi          51%       
% oc adm top pod -A
NAMESPACE                                          NAME                                                                        CPU(cores)   MEMORY(bytes)   
openshift-apiserver                                apiserver-86c6b9b4-6hlng                                                    31m          240Mi           
openshift-apiserver                                apiserver-86c6b9b4-97wfl                                                    25m          237Mi           
openshift-apiserver                                apiserver-86c6b9b4-mddjz                                                    25m          262Mi           
openshift-apiserver-operator                       openshift-apiserver-operator-7f8d964886-vpnjp                               3m           124Mi           
openshift-authentication                           oauth-openshift-6889cf667d-5zw8k                                            2m           38Mi            
openshift-authentication                           oauth-openshift-6889cf667d-8cp5n                                            4m           36Mi            
openshift-authentication                           oauth-openshift-6889cf667d-jsxc4                                            3m           42Mi

Comment 47 Simon Pasquier 2022-09-06 14:26:53 UTC
*** Bug 2091211 has been marked as a duplicate of this bug. ***

