Bug 1947829
| Summary: | When kube-apiserver rolls out, many other component containers hit many "Failed to watch ***: the server has received too many requests and has asked us to try again later" | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Xingxing Xia <xxia> |
| Component: | kube-apiserver | Assignee: | Abu Kashem <akashem> |
| Status: | CLOSED DUPLICATE | QA Contact: | Ke Wang <kewang> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | | |
| Version: | 4.8 | CC: | akashem, aos-bugs, juzhao, lszaszki, mfojtik, mgugino, xxia |
| Target Milestone: | --- | Flags: | mfojtik: needinfo? |
| Target Release: | 4.8.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-06-09 08:25:58 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Xingxing Xia
2021-04-09 10:59:43 UTC
This is expected and not an error, but rather due to priority and fairness now also covering watches (with 429 HTTP responses).

It only happens during the kas rollout. After the rollout finishes, there are no new occurrences. Should the rollout handle this better?

xxia, yes, it is expected. On the other hand, if you see any watch request being throttled for any objects in an `openshift-*` namespace, then it should be a bug.
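For reference, one way to see whether API Priority and Fairness is rejecting requests, and under which priority level, is to look at the kube-apiserver flow-control metrics and the configured flow schemas. This is only a minimal sketch, assuming cluster-admin access and the default APF setup (the exact flow schema names vary by release):

$ oc get --raw /metrics | grep '^apiserver_flowcontrol_rejected_requests_total'
$ oc get flowschemas
$ oc get prioritylevelconfigurations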
Comment 0 only listed 3 containers, but actually many containers had these logs. Below is a sorted count of "the server has received too many requests and has asked us to try again later" lines in the various containers:

336 lines: openshift-monitoring/prometheus-k8s-1/prometheus
331 lines: openshift-monitoring/prometheus-k8s-0/prometheus
59 lines: openshift-cluster-storage-operator/cluster-storage-operator-cb7d49d6b-xszgb/cluster-storage-operator
51 lines: openshift-apiserver-operator/openshift-apiserver-operator-67986b6969-hstjc/openshift-apiserver-operator
44 lines: openshift-service-ca-operator/service-ca-operator-5867c6548f-4w8nt/service-ca-operator
39 lines: openshift-kube-apiserver-operator/kube-apiserver-operator-c98cfbb94-g6vsl/kube-apiserver-operator
31 lines: openshift-etcd-operator/etcd-operator-67764474db-ftbz8/etcd-operator
29 lines: openshift-controller-manager-operator/openshift-controller-manager-operator-c6d7bd565-j4mjn/openshift-controller-manager-operator
26 lines: openshift-kube-controller-manager-operator/kube-controller-manager-operator-5985c9666f-gd9pv/kube-controller-manager-operator
21 lines: openshift-console-operator/console-operator-649cfdc674-cdbnx/console-operator
20 lines: openshift-image-registry/cluster-image-registry-operator-6594d4c4d6-tp7vf/cluster-image-registry-operator
13 lines: openshift-kube-storage-version-migrator-operator/kube-storage-version-migrator-operator-67f67f4c7b-bg4j5/kube-storage-version-migrator-operator
13 lines: openshift-config-operator/openshift-config-operator-dc8fbc9d5-nb55l/openshift-config-operator
11 lines: openshift-dns-operator/dns-operator-7c5676c5d7-zjkwk/dns-operator
9 lines: openshift-network-operator/network-operator-54687bc749-hmw4k/network-operator
9 lines: openshift-machine-api/machine-api-controllers-85b876d7cf-gk4dm/machineset-controller
9 lines: openshift-cluster-storage-operator/csi-snapshot-controller-operator-5776df6f9f-nbdkn/csi-snapshot-controller-operator
8 lines: openshift-cluster-node-tuning-operator/tuned-nw775/tuned
8 lines: openshift-apiserver/apiserver-6bbb8b6b4d-nqhnf/openshift-apiserver
7 lines: openshift-machine-api/machine-api-operator-c9b896d5-w4xmz/machine-api-operator
7 lines: openshift-machine-api/machine-api-controllers-85b876d7cf-gk4dm/machine-controller
7 lines: openshift-apiserver/apiserver-6bbb8b6b4d-blctx/openshift-apiserver
6 lines: openshift-kube-apiserver/kube-apiserver-ip-10-0-49-193.ap-northeast-1.compute.internal/kube-apiserver-check-endpoints
6 lines: openshift-cluster-samples-operator/cluster-samples-operator-664ddc5bfd-cjq5b/cluster-samples-operator
6 lines: openshift-cluster-node-tuning-operator/tuned-txrsz/tuned
6 lines: openshift-cluster-node-tuning-operator/tuned-px9f5/tuned
6 lines: openshift-apiserver/apiserver-6bbb8b6b4d-nqhnf/openshift-apiserver-check-endpoints
5 lines: openshift-monitoring/prometheus-operator-8547b964f6-gjdg5/prometheus-operator
5 lines: openshift-machine-api/machine-api-controllers-85b876d7cf-gk4dm/nodelink-controller
5 lines: openshift-kube-apiserver/kube-apiserver-ip-10-0-60-8.ap-northeast-1.compute.internal/kube-apiserver-check-endpoints
5 lines: openshift-authentication-operator/authentication-operator-df848bcb6-sqcmd/authentication-operator
4 lines: openshift-operator-lifecycle-manager/packageserver-56d8c4d89-ztks8/packageserver
4 lines: openshift-machine-config-operator/machine-config-controller-645f45745b-6lvsd/machine-config-controller
4 lines: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-49-193.ap-northeast-1.compute.internal/kube-scheduler-recovery-controller
4 lines: openshift-insights/insights-operator-64cc56b55-7sn5d/insights-operator
4 lines: openshift-authentication/oauth-openshift-688fbdbbb9-vk2kp/oauth-openshift
3 lines: openshift-oauth-apiserver/apiserver-ffd476c66-79bgn/oauth-apiserver
3 lines: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-79-135.ap-northeast-1.compute.internal/kube-scheduler
3 lines: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-60-8.ap-northeast-1.compute.internal/kube-scheduler
3 lines: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-49-193.ap-northeast-1.compute.internal/kube-scheduler-cert-syncer
3 lines: openshift-cluster-node-tuning-operator/cluster-node-tuning-operator-8c99b949f-mfjcq/cluster-node-tuning-operator
3 lines: openshift-cluster-csi-drivers/aws-ebs-csi-driver-operator-66454d5dd5-wdmgb/aws-ebs-csi-driver-operator
2 lines: openshift-oauth-apiserver/apiserver-ffd476c66-dkz24/oauth-apiserver
2 lines: openshift-network-diagnostics/network-check-source-5585685c7c-f5k58/check-endpoints
2 lines: openshift-machine-api/cluster-autoscaler-operator-695748744b-lwxmd/cluster-autoscaler-operator
2 lines: openshift-cluster-node-tuning-operator/tuned-2p6mb/tuned
2 lines: openshift-authentication/oauth-openshift-688fbdbbb9-nsvh6/oauth-openshift
2 lines: openshift-apiserver/apiserver-6bbb8b6b4d-blctx/openshift-apiserver-check-endpoints
1 lines: openshift-service-ca/service-ca-6df7b46945-46fxp/service-ca-controller
1 lines: openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-49-193.ap-northeast-1.compute.internal/kube-scheduler
1 lines: openshift-authentication/oauth-openshift-688fbdbbb9-xbg6d/oauth-openshift
1 lines: openshift-apiserver/apiserver-6bbb8b6b4d-8n4mb/openshift-apiserver-check-endpoints
1 lines: openshift-apiserver/apiserver-6bbb8b6b4d-8n4mb/openshift-apiserver

The full output of the comment 0 script is at http://file.rdu.redhat.com/~xxia/bug/1947829_c.log.
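A per-container tally like the one above can be reproduced with a single recursive grep, assuming the container logs have first been dumped to files (for example under a logs/<namespace>/<pod>/<container>.log layout; that layout is only an assumption for illustration):

$ grep -rc 'has received too many requests and has asked us to try again later' logs/ | grep -v ':0$' | sort -t: -k2 -nr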
Below are excerpts from the top ones:

331 lines: openshift-monitoring/prometheus-k8s-0/prometheus:
level=error ts=2021-04-09T09:38:08.609Z caller=klog.go:96 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:427: Failed to watch *v1.Endpoints: the server has received too many requests and has asked us to try again later (get endpoints)"

59 lines: openshift-cluster-storage-operator/cluster-storage-operator-cb7d49d6b-xszgb/cluster-storage-operator:
E0409 09:38:08.696989 1 reflector.go:138] k8s.io/client-go.0+incompatible/tools/cache/reflector.go:167: Failed to watch *v1.Secret: the server has received too many requests and has asked us to try again later (get secrets)

51 lines: openshift-apiserver-operator/openshift-apiserver-operator-67986b6969-hstjc/openshift-apiserver-operator:
E0409 09:38:08.651854 1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Secret: the server has received too many requests and has asked us to try again later (get secrets)

Below is a grep of the *.go callers named in the error messages, sorted by number of occurrences. I'm sure this is a bug somewhere, either in kas or in these clients:

$ grep -o "[^ ]*go:.*.go" t.log | sed 's/:[0-9]*/:***/' | sort | uniq -c | sort -nr
3408 caller=klog.go:*** component=k8s_client_runtime func=ErrorDepth msg="github.com/coreos/prometheus-operator/pkg/informers/informers.go
2186 caller=klog.go:*** component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go
446 reflector.go:***] k8s.io/client-go/informers/factory.go
404 reflector.go:***] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go
304 reflector.go:***] k8s.io/client-go.0-rc.0/tools/cache/reflector.go
176 reflector.go:***] k8s.io/client-go.0+incompatible/tools/cache/reflector.go
168 reflector.go:***] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go
90 reflector.go:***] k8s.io/client-go.0/tools/cache/reflector.go
85 reflector.go:***] k8s.io/apiserver/pkg/authentication/request/headerrequest/requestheader_controller.go
72 reflector.go:***] github.com/openshift/cluster-node-tuning-operator/pkg/generated/informers/externalversions/factory.go
69 reflector.go:***] k8s.io/client-go.4/tools/cache/reflector.go
69 reflector.go:***] k8s.io/client-go.1/tools/cache/reflector.go
66 reflector.go:***] k8s.io/client-go.5/tools/cache/reflector.go
30 reflector.go:***] github.com/openshift/cluster-monitoring-operator/pkg/operator/operator.go
21 reflector.go:***] k8s.io/client-go.2/tools/cache/reflector.go
20 reflector.go:***] github.com/openshift/client-go/route/informers/externalversions/factory.go
11 reflector.go:***] k8s.io/kube-state-metrics/internal/store/builder.go
5 reflector.go:***] github.com/openshift/machine-api-operator/pkg/generated/informers/externalversions/factory.go

This should be a regression in 4.8, since we did not see it in 4.7.

The nodelink controller (runs in the openshift-machine-api namespace, watches machines in openshift-machine-api) is hitting this: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_kubernetes/656/pull-ci-openshift-kubernetes-master-e2e-agnostic-cmd/1382280733116076032/artifacts/e2e-agnostic-cmd/gather-extra/artifacts/pods/openshift-machine-api_machine-api-controllers-c4f9b7478-q2nrb_nodelink-controller.log
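As a complement to the per-caller breakdown above, the same aggregate log could also be tallied by watched resource type, giving a quick view of which object kinds the reflectors fail to watch. A sketch, assuming the reflector-style "Failed to watch *v1.Foo:" message format seen in these logs:

$ grep -o 'Failed to watch \*[^:]*' t.log | sort | uniq -c | sort -nr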
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it; otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Keywords if you think this bug should never be marked as stale. Please consult with the bug assignee before you do that.

*** This bug has been marked as a duplicate of bug 1948311 ***