Bug 1955482 - [4.7] Drop high-cardinality metrics from kube-state-metrics which aren't used
Summary: [4.7] Drop high-cardinality metrics from kube-state-metrics which aren't used
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.7.z
Assignee: Simon Pasquier
QA Contact: hongyan li
URL:
Whiteboard:
Depends On: 1955478
Blocks: 1954016 1955483
Reported: 2021-04-30 08:21 UTC by Simon Pasquier
Modified: 2021-08-20 15:43 UTC
CC List: 9 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of: 1955478
Clones: 1955483
Environment:
Last Closed: 2021-06-15 09:27:08 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-monitoring-operator pull 1141 | 0 | None | closed | [4.7] Bug 1955482: Drop high-cardinality metrics from kube-state-metrics which aren't used | 2021-05-25 10:18:10 UTC
Github | openshift cluster-monitoring-operator pull 1163 | 0 | None | open | Bug 1955482: fix kube-state-metrics regexp patterns | 2021-05-25 10:18:11 UTC
Red Hat Knowledge Base (Solution) | 6273601 | 0 | None | None | None | 2021-08-20 15:43:15 UTC
Red Hat Product Errata | RHSA-2021:2286 | 0 | None | None | None | 2021-06-15 09:27:55 UTC

Internal Links: 1966104

Description Simon Pasquier 2021-04-30 08:21:26 UTC
+++ This bug was initially created as a clone of Bug #1955478 +++

Description of problem:
By default, kube-state-metrics collects metrics about all Kubernetes resources, but some of these metrics aren't used in any rule or dashboard.

Storing them in Prometheus increases memory usage for no good reason.

Version-Release number of selected component (if applicable):
4.6

How reproducible:
Always

Steps to Reproduce:

Run the following query in the Prometheus UI:
sort_desc(count by(__name__) ({job="kube-state-metrics"}))
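
For readers less familiar with PromQL, the aggregation performed by this query can be sketched in Python roughly as follows. This is only an illustration: the series list is made-up sample data, not output from a real cluster, and `count_series_by_name` is a hypothetical helper name.

```python
from collections import Counter


def count_series_by_name(series, job="kube-state-metrics"):
    """Mimic sort_desc(count by(__name__) ({job="..."})) on a list of label sets."""
    counts = Counter(s["__name__"] for s in series if s.get("job") == job)
    # Highest series count first, like sort_desc().
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Made-up label sets, for illustration only.
sample_series = [
    {"__name__": "kube_pod_created", "job": "kube-state-metrics", "pod": "a"},
    {"__name__": "kube_pod_created", "job": "kube-state-metrics", "pod": "b"},
    {"__name__": "kube_node_created", "job": "kube-state-metrics", "node": "n1"},
    {"__name__": "up", "job": "kubelet"},
]

for name, count in count_series_by_name(sample_series):
    print(name, count)
```

Each distinct label set is one series, so a metric that carries per-pod or per-container labels grows with the size of the cluster.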

Actual results:
The query returns more than 200 metrics with a high count of series.

Expected results:
Metrics that aren't used in rules and dashboards aren't present.

Additional info:

There's a jsonnet addon [1] in kube-prometheus upstream which configures a list of metrics that can safely be dropped. 

[1] https://github.com/prometheus-operator/kube-prometheus/pull/1076

Comment 1 hongyan li 2021-05-12 06:06:24 UTC
Tested with the PR.
# oc -n openshift-monitoring get deployment kube-state-metrics -oyaml | grep metric-blacklist
        - --metric-blacklist=kube_secret_labels,kube_*_created,kube_*_metadata_resource_version,kube_replicaset_metadata_generation,kube_replicaset_status_observed_generation,kube_pod_restart_policy,kube_pod_init_container_status_terminated,kube_pod_init_container_status_running,kube_pod_container_status_terminated,kube_pod_container_status_running,kube_pod_completion_time,kube_pod_status_scheduled

# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -e kube_secret_labels -e kube_*_created -e kube_*_metadata_resource_version -e kube_replicaset_metadata_generation -e kube_replicaset_status_observed_generation -e kube_pod_restart_policy -e kube_pod_init_container_status_terminated -e kube_pod_init_container_status_running -e kube_pod_container_status_terminated -e kube_pod_container_status_running -e kube_pod_completion_time -e kube_pod_status_schedule
 
no result

Comment 2 hongyan li 2021-05-13 03:51:14 UTC
Correction to comment 1: the 'kube_*_created' and 'kube_*_metadata_resource_version' entries in the blacklist don't take effect.

token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
[hongyli@hongyli-fed Downloads]$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -e kube_replicaset_metadata_generation -e kube_replicaset_status_observed_generation -e kube_pod_restart_policy -e kube_pod_init_container_status_terminated -e kube_pod_init_container_status_running -e kube_pod_container_status_terminated -e kube_pod_container_status_running -e kube_pod_completion_time -e kube_pod_status_scheduled
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 66089    0 66089    0     0  1698k      0 --:--:-- --:--:-- --:--:-- 1698k
[hongyli@hongyli-fed Downloads]$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -e kube_|grep created
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 66089    0 66089    0     0  2933k      0 --:--:-- --:--:-- --:--:-- 2933k
    "kube_certificatesigningrequest_created",
    "kube_configmap_created",
    "kube_cronjob_created",
    "kube_daemonset_created",
    "kube_deployment_created",
    "kube_endpoint_created",
    "kube_mutatingwebhookconfiguration_created",
    "kube_namespace_created",
    "kube_node_created",
    "kube_pod_created",
    "kube_poddisruptionbudget_created",
    "kube_replicaset_created",
    "kube_secret_created",
    "kube_service_created",
    "kube_statefulset_created",
    "kube_storageclass_created",
    "kube_validatingwebhookconfiguration_created",
[hongyli@hongyli-fed Downloads]$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -e kube_|grep _metadata_resource_version
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 66089    0 66089    0     0  3227k      0 --:--:-- --:--:-- --:--:-- 3227k
    "kube_configmap_metadata_resource_version",
    "kube_mutatingwebhookconfiguration_metadata_resource_version",
    "kube_secret_metadata_resource_version",
    "kube_validatingwebhookconfiguration_metadata_resource_version",
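
A likely explanation, assuming kube-state-metrics reads each `--metric-blacklist` entry as a regular expression rather than a shell-style glob: in a regular expression, `_*` means "zero or more underscores", so `kube_*_created` cannot match `kube_pod_created`. The same reading applies to `grep -e kube_*_created`, which would explain the empty result in comment 1. The sketch below uses Python's `re` module purely for illustration (kube-state-metrics itself is written in Go, and whether it anchors the match is not confirmed here). The name `kube__created` is hypothetical and included only to show what the glob-style pattern does match.

```python
import re

# Metric names copied from the output in this comment, plus one hypothetical
# name ("kube__created") that the glob-style pattern does match as a regexp.
names = [
    "kube_pod_created",
    "kube_secret_metadata_resource_version",
    "kube__created",
]


def matching(pattern, candidates):
    """Return the candidates in which the regular expression matches."""
    regexp = re.compile(pattern)
    return [name for name in candidates if regexp.search(name)]


# "_*" means "zero or more underscores", not "any characters".
print(matching(r"kube_*_created", names))
# ".+" means "one or more of any character", which is what was intended.
print(matching(r"kube_.+_created", names))
print(matching(r"kube_.+_metadata_resource_version", names))
```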

Comment 4 Junqi Zhao 2021-05-17 02:29:55 UTC
Checked with 4.7.0-0.nightly-2021-05-16-105214: the 'kube_*_created' and 'kube_*_metadata_resource_version' entries in the blacklist don't take effect; the other metrics are removed. Since the 4.8 bug 1955478 is not fixed, moving to ASSIGNED.
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep kube | grep created
    "kube_certificatesigningrequest_created",
    "kube_configmap_created",
    "kube_cronjob_created",
    "kube_daemonset_created",
    "kube_deployment_created",
    "kube_endpoint_created",
    "kube_job_created",
    "kube_limitrange_created",
    "kube_mutatingwebhookconfiguration_created",
    "kube_namespace_created",
    "kube_networkpolicy_created",
    "kube_node_created",
    "kube_pod_created",
    "kube_poddisruptionbudget_created",
    "kube_replicaset_created",
    "kube_replicationcontroller_created",
    "kube_resourcequota_created",
    "kube_secret_created",
    "kube_service_created",
    "kube_statefulset_created",
    "kube_storageclass_created",
    "kube_validatingwebhookconfiguration_created",
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep kube | grep metadata_resource_version
    "kube_configmap_metadata_resource_version",
    "kube_mutatingwebhookconfiguration_metadata_resource_version",
    "kube_secret_metadata_resource_version",
    "kube_validatingwebhookconfiguration_metadata_resource_version",

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -E "kube_replicaset_status_observed_generation|kube_pod_restart_policy|kube_pod_init_container_status_terminated|kube_pod_init_container_status_running|kube_pod_container_status_terminated|kube_pod_container_status_running|kube_pod_completion_time|kube_pod_status_scheduled"
no result

Comment 7 Junqi Zhao 2021-05-31 01:47:31 UTC
Tested with 4.7.0-0.nightly-2021-05-29-015423; the issue is fixed.
# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep kube | grep created
no result

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep kube | grep metadata_resource_version
no result

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -E "kube_replicaset_status_observed_generation|kube_pod_restart_policy|kube_pod_init_container_status_terminated|kube_pod_init_container_status_running|kube_pod_container_status_terminated|kube_pod_container_status_running|kube_pod_completion_time|kube_pod_status_scheduled"
no result

Comment 8 hongyan li 2021-05-31 02:56:23 UTC
Verified with payload 4.7.0-0.nightly-2021-05-29-015423.

The regexp patterns kube_.+_created and kube_.+_metadata_resource_version take effect.

$ token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -e kube_|grep created
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 64918    0 64918    0     0  1761k      0 --:--:-- --:--:-- --:--:-- 1761k
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -e kube_|grep _metadata_resource_version
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 64918    0 64918    0     0  1921k      0 --:--:-- --:--:-- --:--:-- 1921k
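
The `jq | grep` check above could also be scripted. The following is a minimal sketch, not part of the original verification: `leftover_metrics` and `DROPPED_PATTERNS` are hypothetical names, and the sample body is made up in the shape returned by the `/api/v1/label/__name__/values` endpoint. It matches whole metric names, whereas `grep` matches substrings.

```python
import json
import re

# Patterns reported as effective in this comment.
DROPPED_PATTERNS = [r"kube_.+_created", r"kube_.+_metadata_resource_version"]


def leftover_metrics(response_body, patterns=DROPPED_PATTERNS):
    """Return metric names in a label-values response that still match a pattern."""
    names = json.loads(response_body)["data"]
    regexps = [re.compile(p) for p in patterns]
    return [n for n in names if any(r.fullmatch(n) for r in regexps)]


# Made-up response body, for illustration only.
sample = json.dumps({
    "status": "success",
    "data": ["kube_pod_info", "kube_pod_status_phase", "up"],
})

# An empty list means none of the dropped metrics are still stored.
print(leftover_metrics(sample))
```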

Comment 10 Siddharth Sharma 2021-06-04 18:39:04 UTC
The fix for this bug will ship as part of the next z-stream release, 4.7.15, on June 14th, because 4.7.14 was dropped due to a regression: https://bugzilla.redhat.com/show_bug.cgi?id=1967614

Comment 14 errata-xmlrpc 2021-06-15 09:27:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.16 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2286