Bug 2050120 - Missing metrics in kube-state-metrics
Summary: Missing metrics in kube-state-metrics
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.11.0
Assignee: Haoyu Sun
QA Contact: Junqi Zhao
Docs Contact: Brian Burt
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-02-03 09:58 UTC by Florian Gleizes
Modified: 2022-11-21 06:32 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: kube-state-metrics started with argument `--metric-denylist=kube_secret_labels,kube_.*_annotations`, which filtered out the metrics:
- kube_pod_container_status_terminated_reason
- kube_pod_init_container_status_terminated_reason
- kube_pod_status_scheduled_time
Consequence: kube-state-metrics does not expose the metrics in question.
Fix: Change kube-state-metrics argument `--metric-denylist` to `^kube_secret_labels$,^kube_.+_annotations$`.
Result: The 3 missing metrics are exposed again by kube-state-metrics.
Clone Of:
Environment:
Last Closed: 2022-08-10 10:46:42 UTC
Target Upstream Version:
Embargoed:

Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-monitoring-operator pull 1556 | 0 | None | open | Update jsonnet dependencies and prometheus-operator version | 2022-02-15 10:51:51 UTC
Github | openshift cluster-monitoring-operator pull 1574 | 0 | None | open | Bug 2050120: Sanitize all regex allow/denylist used in KSM component | 2022-03-02 06:33:50 UTC
Github | prometheus-operator kube-prometheus pull 1613 | 0 | None | Merged | Sanitize regex denylist in ksm-lite addon | 2022-02-03 13:26:30 UTC
Red Hat Product Errata | RHSA-2022:5069 | 0 | None | None | None | 2022-08-10 10:47:06 UTC

Description Florian Gleizes 2022-02-03 09:58:43 UTC
Description of problem:

The following metrics are missing from kube-state-metrics:
- kube_pod_container_status_terminated_reason
- kube_pod_init_container_status_terminated_reason
- kube_pod_status_scheduled_time

The following PR comment explains that users should use kube_pod_container_status_terminated_reason instead of kube_pod_container_status_terminated:
https://github.com/prometheus-operator/kube-prometheus/pull/1076#issuecomment-814878652

Previously, some metrics were removed from kube-state-metrics by adding the following --metric-denylist argument to the kube-state-metrics container:

--metric-denylist=
kube_.+_created,
kube_.+_metadata_resource_version,
kube_replicaset_metadata_generation,
kube_replicaset_status_observed_generation,
kube_pod_restart_policy,
kube_pod_init_container_status_terminated,
kube_pod_init_container_status_running,
kube_pod_container_status_terminated,
kube_pod_container_status_running,
kube_pod_completion_time,
kube_pod_status_scheduled

--metric-denylist: Comma-separated list of metrics not to be enabled. This list comprises of exact metric names and/or regex patterns. The allowlist and denylist are mutually exclusive.

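However, every entry in the list is treated as a regular expression and matched without anchors, so "kube_pod_container_status_terminated" effectively denies .*kube_pod_container_status_terminated.*. That is why kube_pod_container_status_terminated_reason is missing. kube_pod_init_container_status_terminated_reason and kube_pod_status_scheduled_time are caught the same way by the kube_pod_init_container_status_terminated and kube_pod_status_scheduled entries.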

An easy way to fix it is to add a '$' at the end of the metric name:
--metric-denylist=
kube_.+_created,
kube_.+_metadata_resource_version,
kube_replicaset_metadata_generation,
kube_replicaset_status_observed_generation,
kube_pod_restart_policy,
kube_pod_init_container_status_terminated$,
kube_pod_init_container_status_running,
kube_pod_container_status_terminated$,
kube_pod_container_status_running,
kube_pod_completion_time,
kube_pod_status_scheduled$
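
The effect of the trailing '$' can be illustrated with a small Python sketch. It assumes, as described above, that a denylist entry denies a metric when the pattern matches anywhere in the metric name; Python's re.search stands in for the matching done by kube-state-metrics, and the helper name denied is made up for this illustration.

```python
import re

# One denylist entry as originally written, and the same entry with '$' added.
unanchored = "kube_pod_container_status_terminated"
with_dollar = "kube_pod_container_status_terminated$"

names = [
    "kube_pod_container_status_terminated",
    "kube_pod_container_status_terminated_reason",
]


def denied(pattern: str, metric: str) -> bool:
    # Assumption: an entry denies a metric if it matches anywhere in the name
    # (unanchored search), which is the behavior reported in this bug.
    return re.search(pattern, metric) is not None


for name in names:
    # Prints the name, then whether each variant of the entry denies it.
    print(name, denied(unanchored, name), denied(with_dollar, name))
```

With the unanchored entry both names are denied; with the trailing '$' only the exact metric is denied. A trailing '$' alone still allows a match that starts in the middle of a longer name, so anchoring both ends with '^' and '$' is stricter.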

Version-Release number of selected component (if applicable):
Openshift 4.9.17
$ /usr/bin/kube-state-metrics --version
version.Version{GitCommit:"e7c95f2", BuildDate:"2021-12-15T01:41:23Z", Release:"v2.0.0", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}


How reproducible: Easy


Steps to Reproduce:
1) Look for kube_pod_container_status_terminated metric
$ oc project openshift-monitoring
$ oc exec -ti $(oc get pod -l app.kubernetes.io/name=kube-state-metrics -o name) -c kube-state-metrics -- curl http://localhost:8081/metrics | grep kube_pod_container_status_terminated
No result

2) Edit the kube-state-metrics deployment and add '$' to the metric names 
--metric-denylist=
kube_.+_created,
kube_.+_metadata_resource_version,
kube_replicaset_metadata_generation,
kube_replicaset_status_observed_generation,
kube_pod_restart_policy,
kube_pod_init_container_status_terminated$, <====
kube_pod_init_container_status_running,
kube_pod_container_status_terminated$,      <====
kube_pod_container_status_running,
kube_pod_completion_time,
kube_pod_status_scheduled$                  <====

3) Wait for the pod to be re-created and look again for kube_pod_container_status_terminated
Metrics kube_pod_container_status_terminated_reason are displayed
Metrics kube_pod_container_status_terminated are still correctly filtered

4) After 10-20 seconds, the monitoring cluster operator reverts the modifications performed on the deployment
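
Because grep in step 1 matches substrings, it cannot tell kube_pod_container_status_terminated apart from kube_pod_container_status_terminated_reason. A rough Python sketch for checking exact metric names in the /metrics output follows. The sample text is invented for illustration (it is not output from a real cluster), and the function name metric_names is hypothetical.

```python
import re


def metric_names(exposition: str) -> set:
    """Return the distinct metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first '{' (labels) or whitespace (value).
        names.add(re.split(r"[{\s]", line, maxsplit=1)[0])
    return names


# Made-up sample in the text exposition format, for illustration only.
sample = """\
# TYPE kube_pod_container_status_terminated_reason gauge
kube_pod_container_status_terminated_reason{namespace="demo",pod="p",container="c",reason="Completed"} 1
kube_pod_info{namespace="demo",pod="p"} 1
"""

names = metric_names(sample)
print("kube_pod_container_status_terminated_reason" in names)
print("kube_pod_container_status_terminated" in names)
```

In practice the curl output from step 1 could be saved to a file and passed to metric_names in place of the sample.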


Actual results:
Metrics kube_pod_container_status_terminated and kube_pod_container_status_terminated_reason are filtered

Expected results:
Metrics kube_pod_container_status_terminated_reason should be available
Metrics kube_pod_container_status_terminated should be filtered

Additional info:
The same applies to kube_pod_init_container_status_terminated_reason and kube_pod_status_scheduled_time.

Comment 6 Junqi Zhao 2022-03-14 09:26:18 UTC
Tested with 4.11.0-0.nightly-2022-03-13-055724. The metrics "kube_pod_container_status_terminated_reason|kube_pod_init_container_status_terminated_reason|kube_pod_status_scheduled_time" are shown, and "kube_pod_container_status_terminated|kube_pod_init_container_status_terminated|kube_pod_status_scheduled" are denied, as expected.
# oc -n openshift-monitoring get deploy  kube-state-metrics -oyaml
....
        - |
          --metric-denylist=
          ^kube_secret_labels$,
          ^kube_.+_annotations$
        - --metric-labels-allowlist=pods=[*],nodes=[*],namespaces=[*],persistentvolumes=[*],persistentvolumeclaims=[*],poddisruptionbudgets=[*],poddisruptionbudget=[*]
        - |
          --metric-denylist=
          ^kube_.+_created$,
          ^kube_.+_metadata_resource_version$,
          ^kube_replicaset_metadata_generation$,
          ^kube_replicaset_status_observed_generation$,
          ^kube_pod_restart_policy$,
          ^kube_pod_init_container_status_terminated$,
          ^kube_pod_init_container_status_running$,
          ^kube_pod_container_status_terminated$,
          ^kube_pod_container_status_running$,
          ^kube_pod_completion_time$,
          ^kube_pod_status_scheduled$
# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -E "kube_pod_container_status_terminated_reason|kube_pod_init_container_status_terminated_reason|kube_pod_status_scheduled_time"
    "kube_pod_container_status_terminated_reason",
    "kube_pod_init_container_status_terminated_reason",
    "kube_pod_status_scheduled_time",

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep kube | grep -E "created|metadata_resource_version|annotations"
no result

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -E "kube_replicaset_metadata_generation|kube_replicaset_status_observed_generation|kube_pod_restart_policy|kube_pod_init_container_status_terminated|kube_pod_init_container_status_running|kube_pod_container_status_terminated|kube_pod_container_status_running|kube_pod_completion_time|kube_pod_status_scheduled"
    "kube_pod_container_status_terminated_reason",
    "kube_pod_init_container_status_terminated_reason",
    "kube_pod_status_scheduled_time",
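
The anchoring seen in the deployment above can be approximated with the following Python sketch. It is an illustration only, not the code from the linked pull requests: the helper names anchor and is_denied are hypothetical, the denylist is a subset of the entries from the original report, and re.search is assumed to behave like the unanchored matching described in this bug.

```python
import re


def anchor(entry: str) -> str:
    """Wrap a denylist entry in ^...$ unless that anchor is already present.

    Simplification: an entry ending in an escaped dollar sign is not handled.
    """
    if not entry.startswith("^"):
        entry = "^" + entry
    if not entry.endswith("$"):
        entry = entry + "$"
    return entry


def is_denied(denylist, metric: str) -> bool:
    # A metric is denied if any anchored entry matches the whole name.
    return any(re.search(anchor(e), metric) for e in denylist)


# Subset of the unanchored entries from the original report.
denylist = [
    "kube_.+_created",
    "kube_pod_init_container_status_terminated",
    "kube_pod_container_status_terminated",
    "kube_pod_status_scheduled",
]

for metric in [
    "kube_pod_container_status_terminated",
    "kube_pod_container_status_terminated_reason",
    "kube_pod_init_container_status_terminated_reason",
    "kube_pod_status_scheduled_time",
    "kube_pod_created",
]:
    print(metric, is_denied(denylist, metric))
```

Regex entries such as kube_.+_created keep working once anchored, while the three _reason and _time metrics are no longer caught by their shorter neighbours.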

Comment 11 errata-xmlrpc 2022-08-10 10:46:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

