{client="cluster-autoscaler-operator/v0.0.0 (linux/amd64) kubernetes/$Format",resource="configmaps",scope="namespace",verb="GET"} 0.5
{client="cluster-autoscaler-operator/v0.0.0 (linux/amd64) kubernetes/$Format",resource="configmaps",scope="namespace",verb="PUT"} 0.5

This looks like an overly aggressive leader election (2 seconds?). Operators should be on 10-20s refresh intervals (with 90s-120s timeouts) because fast handoff is not important for them. Please correct your tuning before GA, because this drives write load to the cluster.
https://github.com/kubernetes/kubernetes/pull/77204
https://github.com/kubernetes-sigs/controller-runtime/pull/412

Currently controller-runtime runs leader election with hard-coded, very aggressive values. With the above PRs, the leader election durations will become configurable. We would then pass longer durations from cluster-autoscaler-operator using the options those PRs add.
Until the upstream controller-runtime PR merges, this is a stop-gap fix: https://github.com/openshift/cluster-autoscaler-operator/pull/96
Locally patched in vendor.
Query from clayton: topk(20, sum without (instance) (rate(apiserver_request_count[5m])))
Verified on 4.1.0-0.nightly-2019-05-03-093152. The request rate is lower by about 10x:

{client="cluster-autoscaler-operator/v0.0.0 (linux/amd64) kubernetes/$Format",code="200",contentType="application/json",endpoint="https",instance="172.31.128.217:6443",job="apiserver",namespace="default",resource="configmaps",scope="namespace",service="kubernetes",verb="GET"} 0.05185185185185186
{client="cluster-autoscaler-operator/v0.0.0 (linux/amd64) kubernetes/$Format",code="200",contentType="application/json",endpoint="https",instance="172.31.128.217:6443",job="apiserver",namespace="default",resource="configmaps",scope="namespace",service="kubernetes",verb="PUT"} 0.05185185185185186

Query used:
rate(apiserver_request_count{client=~"cluster-autoscaler-operator.*"}[5m])
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758