Bug 2015386

Summary: Possibility to add labels to the built-in OCP alerts
Product: OpenShift Container Platform
Component: Monitoring
Version: 4.8
Target Release: 4.10.0
Reporter: Vedanti Jaypurkar <vjaypurk>
Assignee: Simon Pasquier <spasquie>
QA Contact: Junqi Zhao <juzhao>
CC: amuller, anpicker, aos-bugs, david.karlsen, erooth, pgough, spasquie
Severity: low
Priority: medium
Hardware: Unspecified
OS: Unspecified
Type: Bug
Status: CLOSED ERRATA
Last Closed: 2022-03-12 04:39:16 UTC

Comment 1 David J. M. Karlsen 2021-10-19 10:06:40 UTC
This will be particularly useful for PDBs. Today these only carry the namespace label, but if arbitrary labels could be picked up from the PDB, it would make route-based alerting in Alertmanager much easier.
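(For illustration only: assuming a hypothetical "team" label picked up from the PDB ended up on the alert, the kind of Alertmanager routing we have in mind would look roughly like this; the receiver names are made up.)

route:
  receiver: default
  routes:
  - match:
      team: payments        # hypothetical label propagated from the PDB
    receiver: payments-squad
receivers:
- name: default
- name: payments-squad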

Comment 2 Philip Gough 2021-10-20 13:14:30 UTC
Hi David, I've created a PR (linked) that exposes the PDB metrics via the allow-list.

However if you take a look upstream https://github.com/kubernetes/kube-state-metrics/blob/master/docs/poddisruptionbudget-metrics.md you can see that this particular resource has no additional labels available to bolt on.

We could treat this as an RFE: I can propose upstream that we add "kube_poddisruptionbudget_labels" and "kube_poddisruptionbudget_annotations" metrics, which I believe is the first step towards solving your requirement. These metrics are already available for many other resources, such as namespaces: https://github.com/kubernetes/kube-state-metrics/blob/master/docs/namespace-metrics.md
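(For reference, such series would presumably mirror the shape of the existing namespace metric; the my-pdb name and label_team label below are made-up examples, not real output.)

kube_namespace_labels{namespace="default", label_team="payments"} 1
kube_poddisruptionbudget_labels{namespace="default", poddisruptionbudget="my-pdb", label_team="payments"} 1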

Comment 4 David J. M. Karlsen 2021-10-27 18:57:28 UTC
OK, so what you are saying is that https://github.com/openshift/cluster-monitoring-operator/pull/1439/files#diff-b61f7d6e3529525eef15693c9529b4e065ac3e9d1af6308573e42e825fc1218bR37 won't expose the labels on the PDB, because KSM does not expose them: https://github.com/kubernetes/kube-state-metrics/blob/master/docs/poddisruptionbudget-metrics.md ?

Having kube_poddisruptionbudget_labels/annotations makes sense: one can then attach labels to the alerts and route on them in Alertmanager, which is very useful and is how we do our routing.

Another option would be to "join" this metric with the namespace labels, so that one could simply label the namespace to get the desired routing - but that's not how the other alerts are designed in OCP, so I guess we don't want to go down that route?
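(Sketch of what such a join could look like in PromQL, assuming a hypothetical label_team label on kube_namespace_labels; the result would carry label_team, and an alert built on this expression could then be routed on it.)

(kube_poddisruptionbudget_status_pod_disruptions_allowed == 0)
  * on (namespace) group_left (label_team)
    max by (namespace, label_team) (kube_namespace_labels)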

Comment 5 Philip Gough 2021-11-01 10:14:50 UTC
Hi David, yes, you are correct: KSM will expose no additional series or labels beyond what is documented at https://github.com/kubernetes/kube-state-metrics/blob/v2.2.3/docs/poddisruptionbudget-metrics.md.

Now, we have already merged https://github.com/kubernetes/kube-state-metrics/pull/1623 to move this RFE forward and expose those additional series. I'll also merge https://github.com/openshift/cluster-monitoring-operator/pull/1439.

As for the comment about the join: that would indeed work, but the majority of the alerts are pulled from upstream and I don't think that is a road we want to go down just to cover individual use cases. Hopefully that is understandable. I think the above changes, in combination with https://issues.redhat.com/browse/OBSDA-2, will allow you to tweak the alerts to your specific needs.
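(Purely as a sketch of what such a tweak could look like once the new series is available: a user-defined rule could join the PDB status metric with kube_poddisruptionbudget_labels to surface a PDB label, here a hypothetical label_team, on the alert for routing. The rule name, namespace, threshold and severity below are illustrative only.)

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pdb-label-routing-example      # illustrative name
  namespace: example-namespace         # where this lives depends on the monitoring setup
spec:
  groups:
  - name: pdb.example
    rules:
    - alert: PodDisruptionBudgetAtLimitWithTeam
      expr: |
        (kube_poddisruptionbudget_status_pod_disruptions_allowed == 0)
          * on (namespace, poddisruptionbudget) group_left (label_team)
            max by (namespace, poddisruptionbudget, label_team) (kube_poddisruptionbudget_labels)
      for: 15m
      labels:
        severity: warning              # label_team is attached by the join above and can drive routing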

Let me know if that satisfies this RFE and we can close it.

Thanks

Comment 6 David J. M. Karlsen 2021-11-01 10:27:35 UTC
I think this is as good as it can get at this stage, thanks!
This can be closed.

Comment 10 Philip Gough 2021-11-04 10:02:41 UTC
As mentioned, we need to wait for a release of KSM to be cut that includes https://github.com/kubernetes/kube-state-metrics/pull/1623 and pull it to our downstream fork before verifying this change.

Comment 11 Philip Gough 2021-11-18 14:37:50 UTC
Reassigning to @filip since the final piece of this ticket requires cutting a new release of KSM, which is scheduled for mid-December. That, in conjunction with the ability to override the default alerts (https://github.com/openshift/enhancements/pull/958) and https://github.com/openshift/cluster-monitoring-operator/pull/1439, should give the customer the ability to achieve what they want and allow us to close the RFE.

Comment 13 Junqi Zhao 2021-12-20 11:20:17 UTC
Tested with 4.10.0-0.nightly-2021-12-18-034942: kube_poddisruptionbudget_annotations and kube_poddisruptionbudget_labels are added, but we can only see the pdb labels in kube_poddisruptionbudget_labels; we can't see the pdb annotations in kube_poddisruptionbudget_annotations.
# oc -n openshift-monitoring get deploy kube-state-metrics -oyaml | grep metric-labels-allowlist
        - --metric-labels-allowlist=pods=[*],nodes=[*],namespaces=[*],persistentvolumes=[*],persistentvolumeclaims=[*],poddisruptionbudgets=[*],poddisruptionbudget=[*]


# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep poddisruptionbudget
    "kube_poddisruptionbudget_annotations",
    "kube_poddisruptionbudget_labels",
    "kube_poddisruptionbudget_status_current_healthy",
    "kube_poddisruptionbudget_status_desired_healthy",
    "kube_poddisruptionbudget_status_expected_pods",
    "kube_poddisruptionbudget_status_observed_generation",
    "kube_poddisruptionbudget_status_pod_disruptions_allowed",

pdb file
**********************
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
  annotations:
    imageregistry: "https://hub.docker.com/"
    contactor: help
  labels:
    app.kubernetes.io/component: zookeeper
    app.kubernetes.io/instance: main
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: zookeeper
**********************
# oc -n default get pdb zk-pdb -oyaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  annotations:
    contactor: help
    imageregistry: https://hub.docker.com/
  creationTimestamp: "2021-12-20T10:50:51Z"
  generation: 1
  labels:
    app.kubernetes.io/component: zookeeper
    app.kubernetes.io/instance: main
  name: zk-pdb
  namespace: default
  resourceVersion: "211532"
  uid: ef4b4060-314c-46de-85fb-592b098c8c93
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: zookeeper
status:
  conditions:
  - lastTransitionTime: "2021-12-20T10:50:51Z"
    message: ""
    observedGeneration: 1
    reason: InsufficientPods
    status: "False"
    type: DisruptionAllowed
  currentHealthy: 0
  desiredHealthy: 2
  disruptionsAllowed: 0
  expectedPods: 0
  observedGeneration: 1
**********************
We can see the pdb labels in kube_poddisruptionbudget_labels:
kube_poddisruptionbudget_labels{container="kube-rbac-proxy-main", endpoint="https-main", job="kube-state-metrics", label_app_kubernetes_io_component="zookeeper", label_app_kubernetes_io_instance="main", namespace="default", poddisruptionbudget="zk-pdb", service="kube-state-metrics"} 1

but we can't find the pdb annotations in kube_poddisruptionbudget_annotations:
kube_poddisruptionbudget_annotations{container="kube-rbac-proxy-main", endpoint="https-main", job="kube-state-metrics", namespace="default", poddisruptionbudget="zk-pdb", service="kube-state-metrics"}  1

We also found that we cannot get annotations from the other kube_*_annotations metrics either, for example kube_daemonset_annotations and kube_deployment_annotations:
# oc -n openshift-monitoring get ds node-exporter -o jsonpath="{.metadata.annotations}"
{"deprecated.daemonset.template.generation":"1"}

result from prometheus
kube_daemonset_annotations{container="kube-rbac-proxy-main", daemonset="node-exporter", endpoint="https-main", job="kube-state-metrics", namespace="openshift-monitoring", service="kube-state-metrics"} 1

# oc -n openshift-monitoring get deploy cluster-monitoring-operator -o jsonpath="{.metadata.annotations}"
{"deployment.kubernetes.io/revision":"1","include.release.openshift.io/self-managed-high-availability":"true","include.release.openshift.io/single-node-developer":"true"}

result from prometheus
kube_deployment_annotations{container="kube-rbac-proxy-main", deployment="cluster-monitoring-operator", endpoint="https-main", job="kube-state-metrics", namespace="openshift-monitoring", prometheus="openshift-monitoring/k8s", service="kube-state-metrics"}  1

Comment 15 Simon Pasquier 2021-12-22 09:56:02 UTC
@Junqi it is expected that the annotations aren't exposed via kube_poddisruptionbudget_annotations; we chose to expose only kube_poddisruptionbudget_labels, which should be enough for filtering.

Comment 16 Junqi Zhao 2021-12-22 11:03:15 UTC
based on Comment 14 and 15, set to VERIFIED

Comment 17 Junqi Zhao 2021-12-22 11:04:00 UTC
(In reply to Junqi Zhao from comment #16)
> based on Comment 14 and 15, set to VERIFIED

change to
based on Comment 13 and 15, set to VERIFIED

Comment 21 errata-xmlrpc 2022-03-12 04:39:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056