The metric etcd_request_duration_seconds_bucket in 4.7 has 25k series on an empty cluster. It exposes 41(!) buckets and includes every resource (150) and every verb (10). It should not have such extensive cardinality; it needs to be capped, probably at something closer to 1-3k series even on a heavily loaded cluster.
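For intuition, a back-of-envelope sketch of how these label dimensions multiply. The counts come from this report; the worst case assumes every combination appears, which the observed 25k shows it does not:

```go
package main

import "fmt"

func main() {
	// Figures from this report. Each bucket series exists once per label
	// combination, so the bucket count multiplies every (resource, verb)
	// pair that is ever exercised.
	buckets := 41    // bucket boundaries exposed in 4.7
	resources := 150 // distinct resource values
	verbs := 10      // distinct verb values

	worstCase := buckets * resources * verbs
	fmt.Printf("worst-case series: %d\n", worstCase) // 61500

	// The observed ~25k on an empty cluster means only a fraction of the
	// combinations actually appear, but the fix still has to come from the
	// bucket count, since it is the only dimension we fully control.
	fmt.Printf("observed/worst-case: %.0f%%\n", 100*25000.0/float64(worstCase))
}
```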
We opened a PR upstream to reduce the number of buckets for the 'etcd_request_duration_seconds' metric: https://github.com/kubernetes/kubernetes/pull/96754. This takes us back to roughly what we had before (around 11 buckets).

According to Clayton:
> so this takes us from 40k entries to 10k
> 10k is a lot
> (on an idle cluster, which was 20% of total series out of 200k)

An open question: the 'apiserver_request_duration_seconds' metric has around 37 buckets, and with all its labels it also looks like a cardinality explosion - https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/endpoints/metrics/metrics.go#L103-L104
Do we want to reduce the number of buckets for this metric as well?

Clayton's thoughts:
> in general these two metrics are 1/4 of all series on a normal cluster
> they probably should be more like 1/16 or lower
> if we don't do that upstream we should do that when we scrape those clusters
(getting David or Stefan to weigh in on which parts of the cardinality in those metrics we don't need)

I did some further investigation on the 'etcd_request_duration_seconds' metric. It has two labels, "operation" and "type":
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/storage/etcd3/metrics/metrics.go#L46
Here the "type" label represents the underlying etcd type and also etcd key names.

I checked how many values the "type" label has on my 4.7 dev cluster:
count(count by (type) (etcd_request_duration_seconds_bucket))
=> 257

So "type" does not correlate with the Kubernetes "resource" label, which gives:
count(count by (resource) (apiserver_request_duration_seconds_bucket))
=> 145

"type" appears to be unbounded, since it also includes the etcd key names for CRDs.
Suggestions/comments from Clayton:
> we could do the reduction on the scrape side if we had to
> or select only a subset of key resources to track
> pods, events, namespaces, a few representative crds
> operation type definitely is useful
> resource type seems more arbitrary, certainly crd vs not
> maybe all crds should be using the same resource type
> maybe it should be by apigroup
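Clayton's scrape-side option could look something like the following Prometheus metric_relabel_configs fragment. This is only a sketch: the le values listed are illustrative fine-grained boundaries to drop, not the actual 4.7 bucket layout.

```yaml
# Drop a subset of histogram buckets at scrape time. The _sum/_count series
# and the remaining buckets stay intact, so quantile estimates only get
# coarser rather than wrong.
metric_relabel_configs:
  - source_labels: [__name__, le]
    # __name__ and le are joined with the default ";" separator.
    regex: 'etcd_request_duration_seconds_bucket;(0\.002|0\.003|0\.004|0\.006|0\.008)'
    action: drop
```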
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Keywords if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.
upstream PR: https://github.com/kubernetes/kubernetes/pull/96754
The LifecycleStale keyword was removed because the bug got commented on recently. The bug assignee was notified.
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-01-19-095812   True        False         58m     Cluster version is 4.7.0-0.nightly-2021-01-19-095812

On OCP 4.6, running an etcd_request_duration_seconds_bucket query with a 1h time range in Prometheus gives:
Total time series: 25707

On OCP 4.7, the same query gives:
Result series: 9786

The series count is lower than on 4.6. Per the PR https://github.com/openshift/kubernetes/pull/515, the etcd_request_duration_seconds_bucket data should fall within
Buckets: []float64{0.005, 0.025, 0.1, 0.25, 0.5, 1.0, 2.0, 4.0, 15.0, 30.0, 60.0}
Querying etcd_request_duration_seconds_bucket from the web console, the values of the 'le' label are indeed only from this bucket slice, so moving this to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633