Description of problem:

The following metrics are missing from kube-state-metrics:
- kube_pod_container_status_terminated_reason
- kube_pod_init_container_status_terminated_reason
- kube_pod_status_scheduled_time

The following PR comment explains that users should use kube_pod_container_status_terminated_reason instead of kube_pod_container_status_terminated:
https://github.com/prometheus-operator/kube-prometheus/pull/1076#issuecomment-814878652

Previously, some metrics were removed from kube-state-metrics by adding the following --metric-denylist argument to the kube-state-metrics container:

    --metric-denylist=
    kube_.+_created,
    kube_.+_metadata_resource_version,
    kube_replicaset_metadata_generation,
    kube_replicaset_status_observed_generation,
    kube_pod_restart_policy,
    kube_pod_init_container_status_terminated,
    kube_pod_init_container_status_running,
    kube_pod_container_status_terminated,
    kube_pod_container_status_running,
    kube_pod_completion_time,
    kube_pod_status_scheduled

The help text for the flag reads:

    --metric-denylist: Comma-separated list of metrics not to be enabled. This list comprises of exact metric names and/or regex patterns. The allowlist and denylist are mutually exclusive.

However, every entry in the list is handled as a regex, so "kube_pod_container_status_terminated" effectively denies .*kube_pod_container_status_terminated.*. That is why kube_pod_container_status_terminated_reason is missing; in the same way, kube_pod_init_container_status_terminated hides kube_pod_init_container_status_terminated_reason and kube_pod_status_scheduled hides kube_pod_status_scheduled_time.

An easy way to fix it is to add a '$' at the end of the metric name:

    --metric-denylist=
    kube_.+_created,
    kube_.+_metadata_resource_version,
    kube_replicaset_metadata_generation,
    kube_replicaset_status_observed_generation,
    kube_pod_restart_policy,
    kube_pod_init_container_status_terminated$,
    kube_pod_init_container_status_running,
    kube_pod_container_status_terminated$,
    kube_pod_container_status_running,
    kube_pod_completion_time,
    kube_pod_status_scheduled$

Version-Release number of selected component (if applicable):

OpenShift 4.9.17

    $ /usr/bin/kube-state-metrics --version
    version.Version{GitCommit:"e7c95f2", BuildDate:"2021-12-15T01:41:23Z", Release:"v2.0.0", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}

How reproducible:

Easy

Steps to Reproduce:

1) Look for the kube_pod_container_status_terminated metric

    $ oc project openshift-monitoring
    $ oc exec -ti $(oc get pod -l app.kubernetes.io/name=kube-state-metrics -o name) -c kube-state-metrics -- curl http://localhost:8081/metrics | grep kube_pod_container_status_terminated

   No result

2) Edit the kube-state-metrics deployment and add '$' to the metric names

    --metric-denylist=
    kube_.+_created,
    kube_.+_metadata_resource_version,
    kube_replicaset_metadata_generation,
    kube_replicaset_status_observed_generation,
    kube_pod_restart_policy,
    kube_pod_init_container_status_terminated$,     <====
    kube_pod_init_container_status_running,
    kube_pod_container_status_terminated$,          <====
    kube_pod_container_status_running,
    kube_pod_completion_time,
    kube_pod_status_scheduled$                      <====

3) Wait for the pod to be re-created and look again for kube_pod_container_status_terminated

   The kube_pod_container_status_terminated_reason metrics are displayed
   The kube_pod_container_status_terminated metrics are still correctly filtered

4) After 10-20 seconds, the monitoring cluster operator reverts the modifications made to the deployment

Actual results:

Both kube_pod_container_status_terminated and kube_pod_container_status_terminated_reason are filtered

Expected results:

kube_pod_container_status_terminated_reason should be available
kube_pod_container_status_terminated should be filtered

Additional info:

The same applies to kube_pod_init_container_status_terminated_reason and kube_pod_status_scheduled_time
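For illustration, the difference between the unanchored and the anchored denylist can be reproduced with a minimal sketch. It assumes the entries are applied as unanchored regex matches, as this report describes, and uses grep -E as a stand-in for the matcher inside kube-state-metrics. The surviving helper and the sample list are hypothetical and built only from metric names mentioned in this report.

```shell
#!/bin/sh
# Sample metric names taken from this report (not a full kube-state-metrics listing).
metrics='kube_pod_container_status_terminated
kube_pod_container_status_terminated_reason
kube_pod_init_container_status_terminated
kube_pod_init_container_status_terminated_reason
kube_pod_status_scheduled
kube_pod_status_scheduled_time'

# Hypothetical helper: print the sample metrics that survive a denylist.
# $1 is an extended regex. grep -E matches anywhere in the line unless the
# pattern is anchored, and -v keeps only the lines that do not match.
surviving() {
    printf '%s\n' "$metrics" | grep -Ev "$1"
}

# The three problematic entries, first as plain names, then anchored.
unanchored='kube_pod_container_status_terminated|kube_pod_init_container_status_terminated|kube_pod_status_scheduled'
anchored='^(kube_pod_container_status_terminated|kube_pod_init_container_status_terminated|kube_pod_status_scheduled)$'

echo "unanchored denylist keeps:"
surviving "$unanchored" || echo "  (nothing)"

echo "anchored denylist keeps:"
surviving "$anchored"
```

With the unanchored patterns every sample name is dropped; with the anchored patterns only the three exact names are dropped and the _reason and _time metrics remain.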
Tested with 4.11.0-0.nightly-2022-03-13-055724. The metrics "kube_pod_container_status_terminated_reason|kube_pod_init_container_status_terminated_reason|kube_pod_status_scheduled_time" are shown, and the metrics "kube_pod_container_status_terminated|kube_pod_init_container_status_terminated|kube_pod_status_scheduled" are denied. This is expected.

    # oc -n openshift-monitoring get deploy kube-state-metrics -oyaml
    ....
            - |
              --metric-denylist=
              ^kube_secret_labels$,
              ^kube_.+_annotations$
            - --metric-labels-allowlist=pods=[*],nodes=[*],namespaces=[*],persistentvolumes=[*],persistentvolumeclaims=[*],poddisruptionbudgets=[*],poddisruptionbudget=[*]
            - |
              --metric-denylist=
              ^kube_.+_created$,
              ^kube_.+_metadata_resource_version$,
              ^kube_replicaset_metadata_generation$,
              ^kube_replicaset_status_observed_generation$,
              ^kube_pod_restart_policy$,
              ^kube_pod_init_container_status_terminated$,
              ^kube_pod_init_container_status_running$,
              ^kube_pod_container_status_terminated$,
              ^kube_pod_container_status_running$,
              ^kube_pod_completion_time$,
              ^kube_pod_status_scheduled$

    # token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
    # oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -E "kube_pod_container_status_terminated_reason|kube_pod_init_container_status_terminated_reason|kube_pod_status_scheduled_time"
        "kube_pod_container_status_terminated_reason",
        "kube_pod_init_container_status_terminated_reason",
        "kube_pod_status_scheduled_time",

    # oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep kube | grep -E "created|metadata_resource_version|annotations"
    no result

    # oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -E "kube_replicaset_metadata_generation|kube_replicaset_status_observed_generation|kube_pod_restart_policy|kube_pod_init_container_status_terminated|kube_pod_init_container_status_running|kube_pod_container_status_terminated|kube_pod_container_status_running|kube_pod_completion_time|kube_pod_status_scheduled"
        "kube_pod_container_status_terminated_reason",
        "kube_pod_init_container_status_terminated_reason",
        "kube_pod_status_scheduled_time",
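The last grep above uses substring patterns, so it also prints the _reason and _time metrics, and the reader has to infer that the denied names are absent. A minimal sketch of an exact-name check follows. The has_metric helper is hypothetical, and the sample list is copied from the output above with quotes and commas stripped; it is not live cluster data.

```shell
#!/bin/sh
# Metric names as returned in the comment above, one per line (illustrative sample).
names='kube_pod_container_status_terminated_reason
kube_pod_init_container_status_terminated_reason
kube_pod_status_scheduled_time'

# Hypothetical helper: succeed only if $1 appears as a whole line in $names.
# -F treats the name as a fixed string, -x requires a whole-line match,
# -q suppresses output so only the exit status is used.
has_metric() {
    printf '%s\n' "$names" | grep -Fxq -- "$1"
}

for m in kube_pod_container_status_terminated \
         kube_pod_container_status_terminated_reason \
         kube_pod_status_scheduled \
         kube_pod_status_scheduled_time; do
    if has_metric "$m"; then
        echo "present: $m"
    else
        echo "absent:  $m"
    fi
done
```

Against a live cluster, the names could presumably be taken from the same label-values query by extracting the data array with jq -r '.data[]' instead of using the hard-coded sample.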
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069