+++ This bug was initially created as a clone of Bug #1955478 +++

Description of problem:
By default, kube-state-metrics collects metrics about all Kubernetes resources, but some of these metrics aren't used in any rule or dashboard. Storing them in Prometheus increases memory usage for no good reason.

Version-Release number of selected component (if applicable):
4.6

How reproducible:
Always

Steps to Reproduce:
Run the following query in the Prometheus UI:

sort_desc(count by(__name__) ({job="kube-state-metrics"}))

Actual results:
It returns > 200 metrics with a high count of series.

Expected results:
Metrics that aren't used in rules and dashboards aren't present.

Additional info:
There's a jsonnet addon [1] in kube-prometheus upstream which configures a list of metrics that can safely be dropped.

[1] https://github.com/prometheus-operator/kube-prometheus/pull/1076
Test with PR

# oc -n openshift-monitoring get deployment kube-state-metrics -oyaml | grep metric-blacklist
        - --metric-blacklist=kube_secret_labels,kube_*_created,kube_*_metadata_resource_version,kube_replicaset_metadata_generation,kube_replicaset_status_observed_generation,kube_pod_restart_policy,kube_pod_init_container_status_terminated,kube_pod_init_container_status_running,kube_pod_container_status_terminated,kube_pod_container_status_running,kube_pod_completion_time,kube_pod_status_scheduled

# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -e kube_secret_labels -e kube_*_created -e kube_*_metadata_resource_version -e kube_replicaset_metadata_generation -e kube_replicaset_status_observed_generation -e kube_pod_restart_policy -e kube_pod_init_container_status_terminated -e kube_pod_init_container_status_running -e kube_pod_container_status_terminated -e kube_pod_container_status_running -e kube_pod_completion_time -e kube_pod_status_schedule
no result
Correction to #c1: the metrics 'kube_*_created' and 'kube_*_metadata_resource_version' in the blacklist don't take effect.

token=`oc sa get-token prometheus-k8s -n openshift-monitoring`

[hongyli@hongyli-fed Downloads]$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -e kube_replicaset_metadata_generation -e kube_replicaset_status_observed_generation -e kube_pod_restart_policy -e kube_pod_init_container_status_terminated -e kube_pod_init_container_status_running -e kube_pod_container_status_terminated -e kube_pod_container_status_running -e kube_pod_completion_time -e kube_pod_status_scheduled
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 66089    0 66089    0     0  1698k      0 --:--:-- --:--:-- --:--:-- 1698k

[hongyli@hongyli-fed Downloads]$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -e kube_|grep created
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 66089    0 66089    0     0  2933k      0 --:--:-- --:--:-- --:--:-- 2933k
    "kube_certificatesigningrequest_created",
    "kube_configmap_created",
    "kube_cronjob_created",
    "kube_daemonset_created",
    "kube_deployment_created",
    "kube_endpoint_created",
    "kube_mutatingwebhookconfiguration_created",
    "kube_namespace_created",
    "kube_node_created",
    "kube_pod_created",
    "kube_poddisruptionbudget_created",
    "kube_replicaset_created",
    "kube_secret_created",
    "kube_service_created",
    "kube_statefulset_created",
    "kube_storageclass_created",
    "kube_validatingwebhookconfiguration_created",

[hongyli@hongyli-fed Downloads]$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -e kube_|grep _metadata_resource_version
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 66089    0 66089    0     0  3227k      0 --:--:-- --:--:-- --:--:-- 3227k
    "kube_configmap_metadata_resource_version",
    "kube_mutatingwebhookconfiguration_metadata_resource_version",
    "kube_secret_metadata_resource_version",
    "kube_validatingwebhookconfiguration_metadata_resource_version",
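A likely reason the first check in #c1 printed "no result" even though the metrics were still present: grep reads each -e argument as a basic regular expression, where `_*` means "zero or more underscores", so `kube_*_created` cannot match a name such as kube_pod_created. The sketch below illustrates this on a few names copied from the output above; `sample_names` and `count_matches` are hypothetical helpers, and it assumes the shell passed the unquoted patterns to grep unchanged.

```shell
#!/bin/sh
# Sample metric names copied from the output quoted in this bug.
sample_names() {
  printf '%s\n' \
    kube_pod_created \
    kube_secret_created \
    kube_configmap_metadata_resource_version \
    kube_pod_restart_policy
}

# Count the sample names matched by the given grep options and pattern.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_matches() {
  sample_names | grep -c "$@" || true
}

# Glob-style entry read as a basic regex: "kube", any number of
# underscores, then "_created". No real metric name has that shape.
glob_hits=$(count_matches -e 'kube_*_created')

# Regex-style entry: ".+" spans the resource name between the underscores.
regex_hits=$(count_matches -E -e 'kube_.+_created')

echo "kube_*_created  matched: $glob_hits"
echo "kube_.+_created matched: $regex_hits"
```

Piping through a broad filter first and then a plain substring (`grep kube | grep created`), as the later checks do, avoids the problem because neither pattern contains a regex metacharacter.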
Checked with 4.7.0-0.nightly-2021-05-16-105214: the metrics 'kube_*_created' and 'kube_*_metadata_resource_version' in the blacklist don't take effect; the other metrics are removed. Since the 4.8 bug 1955478 is not fixed yet, moving to ASSIGNED.

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep kube | grep created
    "kube_certificatesigningrequest_created",
    "kube_configmap_created",
    "kube_cronjob_created",
    "kube_daemonset_created",
    "kube_deployment_created",
    "kube_endpoint_created",
    "kube_job_created",
    "kube_limitrange_created",
    "kube_mutatingwebhookconfiguration_created",
    "kube_namespace_created",
    "kube_networkpolicy_created",
    "kube_node_created",
    "kube_pod_created",
    "kube_poddisruptionbudget_created",
    "kube_replicaset_created",
    "kube_replicationcontroller_created",
    "kube_resourcequota_created",
    "kube_secret_created",
    "kube_service_created",
    "kube_statefulset_created",
    "kube_storageclass_created",
    "kube_validatingwebhookconfiguration_created",

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep kube | grep metadata_resource_version
    "kube_configmap_metadata_resource_version",
    "kube_mutatingwebhookconfiguration_metadata_resource_version",
    "kube_secret_metadata_resource_version",
    "kube_validatingwebhookconfiguration_metadata_resource_version",

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -E "kube_replicaset_status_observed_generation|kube_pod_restart_policy|kube_pod_init_container_status_terminated|kube_pod_init_container_status_running|kube_pod_container_status_terminated|kube_pod_container_status_running|kube_pod_completion_time|kube_pod_status_scheduled"
no result
Tested with 4.7.0-0.nightly-2021-05-29-015423, issue is fixed.

# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep kube | grep created
no result

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep kube | grep metadata_resource_version
no result

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -E "kube_replicaset_status_observed_generation|kube_pod_restart_policy|kube_pod_init_container_status_terminated|kube_pod_init_container_status_running|kube_pod_container_status_terminated|kube_pod_container_status_running|kube_pod_completion_time|kube_pod_status_scheduled"
no result
Verified with payload 4.7.0-0.nightly-2021-05-29-015423: the regexp patterns kube_.+_created and kube_.+_metadata_resource_version take effect, and neither grep below returns any metric name.

$ token=`oc sa get-token prometheus-k8s -n openshift-monitoring`

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -e kube_|grep created
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 64918    0 64918    0     0  1761k      0 --:--:-- --:--:-- --:--:-- 1761k

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -e kube_|grep _metadata_resource_version
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 64918    0 64918    0     0  1921k      0 --:--:-- --:--:-- --:--:-- 1921k
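The verification above could be wrapped in a single check that reuses the blacklist string directly instead of hand-written greps. The following is only a sketch: `denied_names` is a hypothetical helper, and it assumes each comma-separated entry is an extended regular expression that must match the whole metric name, which may not be exactly how kube-state-metrics interprets the flag.

```shell
#!/bin/sh
# Hypothetical helper: read metric names on stdin (one per line) and print
# those that match any entry of a comma-separated pattern list.
# Assumption: entries are extended regexes matched against the full name.
denied_names() {
  # Turn "a,b,c" into the alternation "(a|b|c)" for grep -E.
  alternation=$(printf '%s' "$1" | tr ',' '|')
  # -x requires the whole line to match; "|| true" keeps an empty
  # result from being reported as a failure.
  grep -E -x -e "($alternation)" || true
}

# Demo on names taken from this bug: two should be flagged as leftovers.
leftover=$(printf '%s\n' \
    kube_pod_created \
    kube_secret_metadata_resource_version \
    kube_pod_restart_policy \
    kube_secret_labels |
  denied_names 'kube_.+_created,kube_.+_metadata_resource_version')

if [ -n "$leftover" ]; then
  echo "still exposed:"
  echo "$leftover"
else
  echo "no denied metric found"
fi
```

Against a cluster, the input would come from the same curl call used above, with the names extracted one per line, for example by `jq -r '.data[]'` (the label values endpoint returns the names in its `data` array).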
The fix for this bug will ship as part of the next z-stream release, 4.7.15, on June 14th, because 4.7.14 was dropped due to a regression: https://bugzilla.redhat.com/show_bug.cgi?id=1967614
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.16 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2286