1955483 – [4.6] Drop high-cardinality metrics from kube-state-metrics which aren't used

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Monitoring
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.6.z
Assignee:	Arunprasad Rajkumar
QA Contact:	Junqi Zhao
Docs Contact:
URL:
Whiteboard:
Depends On:	1955482
Blocks:	1955452

Reported:	2021-04-30 08:24 UTC by Simon Pasquier
Modified:	2021-09-20 13:25 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1955482
Environment:
Last Closed:	2021-06-29 06:26:19 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift cluster-monitoring-operator pull 1221	None	open	Bug 1955483: [4.6] Drop high-cardinality metrics from kube-state-metrics which aren't used	2021-06-15 14:02:59 UTC
Red Hat Knowledge Base (Solution)	6273601	None	None	None	2021-08-20 15:43:25 UTC
Red Hat Product Errata	RHBA-2021:2498	None	None	None	2021-06-29 06:26:39 UTC

Internal Links: 1966104

Description Simon Pasquier 2021-04-30 08:24:20 UTC

+++ This bug was initially created as a clone of Bug #1955482 +++

+++ This bug was initially created as a clone of Bug #1955478 +++

Description of problem:
By default, kube-state-metrics collects metrics about all Kubernetes resources, but some of these metrics aren't used in any rule or dashboard.

Storing them in Prometheus increases memory usage for no good reason.

Version-Release number of selected component (if applicable):
4.6

How reproducible:
Always

Steps to Reproduce:

Run the following query in the Prometheus UI:
sort_desc(count by(__name__) ({job="kube-state-metrics"}))
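
The same query can also be run from a shell instead of the Prometheus UI. The following is only a sketch, assuming the prometheus-k8s service account token and the in-cluster service URL that appear in the verification commands in Comment 5; the function name ksm_series_by_metric is hypothetical and not part of any tool, and jq is needed only for the usage line.

```shell
# Hypothetical helper: run the reproduction query through the Prometheus
# HTTP API (/api/v1/query). Assumes `oc` is logged in to the cluster and
# may exec into pods in the openshift-monitoring namespace.
ksm_series_by_metric() {
  local token
  token=$(oc sa get-token prometheus-k8s -n openshift-monitoring) || return 1
  # -G sends the URL-encoded query as a GET parameter.
  oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- \
    curl -sk -G -H "Authorization: Bearer $token" \
      --data-urlencode 'query=sort_desc(count by(__name__) ({job="kube-state-metrics"}))' \
      'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/query'
}

# Usage (prints one "metric_name series_count" line per metric):
# ksm_series_by_metric | jq -r '.data.result[] | "\(.metric.__name__) \(.value[1])"'
```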

Actual results:
The query returns more than 200 metrics with a high count of series.

Expected results:
Metrics that aren't used in rules and dashboards aren't present.

Additional info:

Upstream kube-prometheus has a jsonnet addon [1] that configures a list of metrics that can safely be dropped.

[1] https://github.com/prometheus-operator/kube-prometheus/pull/1076

Comment 5 Junqi Zhao 2021-06-17 02:04:56 UTC

Tested with 4.6.0-0.nightly-2021-06-16-122936; the issue is fixed.
# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep kube | grep created
no result

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep kube | grep metadata_resource_version
no result

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -E "kube_secret_labels|kube_replicaset_metadata_generation|kube_replicaset_status_observed_generation|kube_pod_restart_policy|kube_pod_init_container_status_terminated|kube_pod_init_container_status_running|kube_pod_container_status_terminated|kube_pod_container_status_running|kube_pod_completion_time|kube_pod_status_scheduled"
no result
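
The three checks above could be folded into a single filter. This is a minimal sketch and not the procedure QA ran: the function name find_dropped_ksm_metrics is hypothetical, and the pattern is assembled only from the metric names and suffixes grepped for in this comment. It approximates the first two checks by requiring "kube" to appear before the suffix on the same line.

```shell
# Hypothetical helper: read metric names on stdin (one per line, e.g. the
# jq-formatted label values) and print any that should no longer exist.
# Exits non-zero when nothing matches, which corresponds to "no result".
find_dropped_ksm_metrics() {
  local listed='kube_secret_labels|kube_replicaset_metadata_generation|kube_replicaset_status_observed_generation|kube_pod_restart_policy|kube_pod_init_container_status_terminated|kube_pod_init_container_status_running|kube_pod_container_status_terminated|kube_pod_container_status_running|kube_pod_completion_time|kube_pod_status_scheduled'
  grep -E "kube.*(created|metadata_resource_version)|${listed}"
}

# Usage: pipe the output of the label-values curl command shown above:
# ... | jq | find_dropped_ksm_metrics || echo "no result"
```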

Comment 8 errata-xmlrpc 2021-06-29 06:26:19 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.36 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2498
