Bug 1955478
| Field | Value |
|---|---|
| Summary | Drop high-cardinality metrics from kube-state-metrics which aren't used |
| Product | OpenShift Container Platform |
| Component | Monitoring |
| Version | 4.6 |
| Target Release | 4.8.0 |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Type | Bug |
| Doc Type | No Doc Update |
| Hardware | Unspecified |
| OS | Unspecified |
| Reporter | Simon Pasquier <spasquie> |
| Assignee | Pawel Krupa <pkrupa> |
| QA Contact | hongyan li <hongyli> |
| CC | alegrand, anpicker, erooth, juzhao, kakkoyun, lcosic, oarribas, pkrupa, pmagotra |
| Clones | 1955482 |
| Bug Blocks | 1951052, 1955482 |
| Last Closed | 2021-07-27 23:05:13 UTC |
Tested with payload 4.8.0-0.nightly-2021-05-06-003426:
```
# oc get deployment kube-state-metrics -oyaml | grep -A15 metric-denylist
- --metric-denylist=kube_secret_labels
- |
  --metric-denylist=
  kube_*_created,
  kube_*_metadata_resource_version,
  kube_replicaset_metadata_generation,
  kube_replicaset_status_observed_generation,
  kube_pod_restart_policy,
  kube_pod_init_container_status_terminated,
  kube_pod_init_container_status_running,
  kube_pod_container_status_terminated,
  kube_pod_container_status_running,
  kube_pod_completion_time,
  kube_pod_status_scheduled
image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:399c5c73581f5792cddb909ebfdbc2c353d80e7e2535a4d7a7d2f633b7b60207
imagePullPolicy: IfNotPresent
name: kube-state-metrics
resources:
```
```
# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -e kube_replicaset_metadata_generation -e kube_replicaset_status_observed_generation -e kube_pod_restart_policy -e kube_pod_init_container_status_terminated -e kube_pod_init_container_status_running -e kube_pod_container_status_terminated -e kube_pod_container_status_running -e kube_pod_completion_time -e kube_pod_status_scheduled
```

No result: the explicitly named metrics are dropped.

```
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -e kube_ | grep created
  "kube_certificatesigningrequest_created",
  "kube_configmap_created",
  "kube_cronjob_created",
  "kube_daemonset_created",
  "kube_deployment_created",
  "kube_endpoint_created",
  "kube_job_created",
  "kube_mutatingwebhookconfiguration_created",
  "kube_namespace_created",
  "kube_node_created",
  "kube_pod_created",
  "kube_poddisruptionbudget_created",
  "kube_replicaset_created",
  "kube_replicationcontroller_created",
  "kube_resourcequota_created",
  "kube_secret_created",
  "kube_service_created",
  "kube_statefulset_created",
  "kube_storageclass_created",
  "kube_validatingwebhookconfiguration_created",

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -e kube_ | grep _metadata_resource_version
  "kube_configmap_metadata_resource_version",
  "kube_mutatingwebhookconfiguration_metadata_resource_version",
  "kube_secret_metadata_resource_version",
  "kube_validatingwebhookconfiguration_metadata_resource_version",
```

However, the denylist entries that contain `*` (`kube_*_created` and `kube_*_metadata_resource_version`) don't take effect: the metrics they were meant to drop are still present.
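This result is consistent with how kube-state-metrics interprets `--metric-denylist`: its flag help describes the value as a comma-separated list of exact metric names and/or regex patterns, so `*` is a regex quantifier rather than a shell-style wildcard. In `kube_*_created` it only repeats the preceding `_`, so the pattern can match a name like `kube__created` but never `kube_pod_created`. The sketch below illustrates this with Python's `re` module standing in for Go's `regexp` (the two behave the same on these simple patterns). Full-string matching and the `is_denied` helper are assumptions for illustration, not kube-state-metrics code.

```python
import re

# Denylist entries as written in the 2021-05-06 build (glob-style `*`)
# and as written in the fixed build (regex `.+`).
GLOB_STYLE = ["kube_*_created", "kube_*_metadata_resource_version"]
REGEX_STYLE = ["kube_.+_created", "kube_.+_metadata_resource_version"]

# Two of the metric names that were still reported by Prometheus in the test.
SAMPLES = ["kube_pod_created", "kube_configmap_metadata_resource_version"]


def is_denied(name, patterns):
    """Return True if any pattern matches the whole metric name.

    Assumption (hypothetical helper): each denylist entry is treated as a
    regular expression matched against the full metric name.
    """
    return any(re.fullmatch(p, name) for p in patterns)


for name in SAMPLES:
    # Columns: name, denied by glob-style list, denied by regex-style list.
    print(name, is_denied(name, GLOB_STYLE), is_denied(name, REGEX_STYLE))

# `_*` means "zero or more underscores", so the glob-style entry only
# matches oddities such as "kube__created".
print(is_denied("kube__created", GLOB_STYLE))
```

Whether or not the match is anchored, `kube_*_created` cannot match `kube_pod_created`, which is why replacing `*` with `.+` fixes the denylist.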
Tested with the PR; the issue is fixed (for details, see https://github.com/openshift/cluster-monitoring-operator/pull/1173#issuecomment-844980732). Tested with 4.8.0-0.nightly-2021-05-21-101954 and later builds: the metrics matching kube_.+_created and kube_.+_metadata_resource_version are removed.
```
# oc -n openshift-monitoring get deploy kube-state-metrics -oyaml
...
  --metric-denylist=
  kube_.+_created,
  kube_.+_metadata_resource_version,
...
```
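The grep-based verification can also be scripted against the same `/api/v1/label/__name__/values` endpoint: parse the JSON body returned by the `curl` command and list any metric names that still match a denylist entry. This is a minimal sketch, assuming the response body has already been fetched and that denylist entries are full-match regexes. The `leaked_metrics` helper is hypothetical, and `DENYLIST` assumes that only the two `*` entries changed between the builds, since the deployment output above elides the rest.

```python
import json
import re

# Denylist from the deployment, with the two `*` entries replaced by `.+`
# (assumption: the other entries are unchanged from the earlier build).
DENYLIST = [
    "kube_secret_labels",
    "kube_.+_created",
    "kube_.+_metadata_resource_version",
    "kube_replicaset_metadata_generation",
    "kube_replicaset_status_observed_generation",
    "kube_pod_restart_policy",
    "kube_pod_init_container_status_terminated",
    "kube_pod_init_container_status_running",
    "kube_pod_container_status_terminated",
    "kube_pod_container_status_running",
    "kube_pod_completion_time",
    "kube_pod_status_scheduled",
]


def leaked_metrics(body, denylist):
    """Return the sorted metric names from a label-values response that
    still match a denylist entry (full-match regex semantics assumed)."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError("Prometheus API returned status %r" % payload.get("status"))
    compiled = [re.compile(p) for p in denylist]
    return sorted(
        name for name in payload["data"]
        if any(rx.fullmatch(name) for rx in compiled)
    )


if __name__ == "__main__":
    # Hypothetical response body; in practice, feed it the curl output.
    sample = json.dumps({
        "status": "success",
        "data": ["kube_pod_info", "kube_pod_created", "kube_deployment_labels"],
    })
    print(leaked_metrics(sample, DENYLIST))  # ['kube_pod_created']
```

An empty list means every denylisted metric is gone, which is the expected outcome on the fixed builds.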
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438
Description of problem: By default, kube-state-metrics collects metrics about all Kubernetes resources, but some of these metrics aren't used in any rule or dashboard. Storing them in Prometheus increases memory usage for no good reason.

Version-Release number of selected component (if applicable): 4.6

How reproducible: Always

Steps to Reproduce: Run the following query in the Prometheus UI:

```
sort_desc(count by(__name__) ({job="kube-state-metrics"}))
```

Actual results: It returns more than 200 metrics with a high count of series.

Expected results: Metrics that aren't used in rules and dashboards aren't present.

Additional info: There's a jsonnet addon [1] in upstream kube-prometheus that configures a list of metrics that can safely be dropped.

[1] https://github.com/prometheus-operator/kube-prometheus/pull/1076
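The reproduction query can also be run outside the UI, for example through Prometheus' `/api/v1/query` HTTP endpoint with the same `oc exec ... curl` approach used in the test comments, which makes it easier to compare series counts before and after a denylist change. A minimal sketch that ranks metrics by series count from such a JSON response; the `top_metrics` helper and the sample body are hypothetical, and fetching the response (URL-encoding the query, authentication) is left out.

```python
import json


def top_metrics(body, limit=10):
    """Return (metric_name, series_count) pairs from an instant-vector
    /api/v1/query response, largest series count first."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError("query failed: %r" % payload.get("error"))
    data = payload["data"]
    if data.get("resultType") != "vector":
        raise ValueError("expected an instant vector, got %r" % data.get("resultType"))
    counts = []
    for sample in data["result"]:
        # `count by(__name__)` keeps the metric name as the only label.
        name = sample["metric"].get("__name__", "")
        # Sample values are [unix_timestamp, "value-as-string"] pairs.
        counts.append((name, int(float(sample["value"][1]))))
    counts.sort(key=lambda pair: pair[1], reverse=True)
    return counts[:limit]


if __name__ == "__main__":
    # Hypothetical response body with made-up counts, for illustration only.
    body = json.dumps({
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [
                {"metric": {"__name__": "kube_pod_created"}, "value": [1620000000, "120"]},
                {"metric": {"__name__": "kube_pod_status_phase"}, "value": [1620000000, "600"]},
            ],
        },
    })
    for name, count in top_metrics(body, limit=5):
        print(name, count)
```

`sort_desc` already orders the result server-side; sorting again locally keeps the helper correct if the query is changed.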