Bug 2073112
Summary: | Prometheus (uwm) externalLabels not showing always in alerts. | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | German Parente <gparente> |
Component: | Monitoring | Assignee: | Joao Marcal <jmarcal> |
Status: | CLOSED ERRATA | QA Contact: | hongyan li <hongyli> |
Severity: | low | Docs Contact: | Brian Burt <bburt> |
Priority: | medium | ||
Version: | 4.10 | CC: | amuller, anpicker, aos-bugs, bburt, clasohm, cruhm, gekis, hongyli, jfajersk, jmarcal, juzhao, spasquie |
Target Milestone: | --- | ||
Target Release: | 4.11.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Before this update, UWM users would sometimes not see certain external labels even though they had configured UWM Prometheus to add them. This happened because the configuration was not propagated to Thanos querier, so a user who queried a metric not provided by the UWM Prometheus instance would not see the external label. With this update, CMO properly propagates the external labels configured in UWM Prometheus to Thanos ruler, which resolves the issue.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2022-08-10 11:05:18 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 2118303 |
Description
German Parente
2022-04-07 16:19:19 UTC
After investigating, we have discovered that the customer can update their PrometheusRule resources so that the "by" aggregation includes the external label they want to see in the alert.

Change from this:

    sum by (endpoint,instance,job,namespace,pod,prometheus,service) (up{job="prometheus-example-app"}) == 1

To this:

    sum by (endpoint,instance,job,namespace,pod,prometheus,service,labelmy) (up{job="prometheus-example-app"}) == 1

The "by" aggregation is discarding the external label. Going by the documentation, this behavior is indeed misleading, as one would expect the external labels to always show if configured. This is a potential area of improvement for the monitoring stack.

Also good to know: external labels will only show on an alert if the alert uses metrics that come from a Prometheus instance that is configured to add the external label. For instance, if I configure UWM to add the label "labelmy: test", this label will only appear in alerts that query the UWM Prometheus instance, like "up{job="prometheus-example-app"} == 1". An alert with the expression "kube_deployment_status_replicas{job="prometheus-example-app"} == 1" will not show the external labels configured for UWM, since the data for this query is provided by the in-cluster Prometheus instance.

TL;DR: Update the rule expression to include the external label, since "by" takes it away.
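The label-dropping behavior described above can be illustrated with a small toy model. This is only a sketch of the grouping semantics of `sum by (...)`, not how Prometheus or Thanos implement it; the `sum_by` helper and the sample series are hypothetical, and it assumes the external label `labelmy` is already attached to the series when the aggregation runs.

```python
from collections import defaultdict


def sum_by(samples, by):
    """Toy model of PromQL `sum by (...)`.

    Samples are grouped on the listed labels only; every label that is
    not named in `by` is absent from the result.
    """
    groups = defaultdict(float)
    for labels, value in samples:
        # Build the group key from the labels named in `by`, nothing else.
        key = tuple((name, labels[name]) for name in by if name in labels)
        groups[key] += value
    return [(dict(key), total) for key, total in groups.items()]


# Hypothetical series for up{job="prometheus-example-app"}, assumed to
# already carry the external label "labelmy" from the UWM Prometheus.
samples = [
    ({"job": "prometheus-example-app", "pod": "app-1", "labelmy": "test"}, 1.0),
    ({"job": "prometheus-example-app", "pod": "app-2", "labelmy": "test"}, 1.0),
]

# "labelmy" is not listed, so it is dropped from every result series.
without_label = sum_by(samples, by=["job", "pod"])

# Listing "labelmy" in the grouping keeps it on the result series.
with_label = sum_by(samples, by=["job", "pod", "labelmy"])

for labels, value in without_label + with_label:
    print(labels, value)
```

The first two printed series have no `labelmy` key and the last two do, which mirrors the workaround of adding the external label to the "by" clause of the rule expression.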
Test with payload 4.11.0-0.nightly-2022-04-22-002610

Enable user workload monitoring
Deploy example app
Configure external label of user workload prometheus
Create alert rule with expression about data provided by in-cluster prometheus
Configuration yaml, see attachment
Query alert, can see external label

oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-main.openshift-monitoring.svc:9094/api/v1/alerts' | jq |grep -A10 KubeAlert
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  5701    0  5701    0     0   428k      0 --:--:-- --:--:-- --:--:--  428k
          "alertname": "KubeAlert",
          "container": "kube-rbac-proxy-main",
          "deployment": "prometheus-example-app",
          "endpoint": "https-main",
          "job": "kube-state-metrics",
          "namespace": "ns1",
          "prometheus": "openshift-monitoring/k8s",
          "service": "kube-state-metrics"
        },

Added test case OCP-50241 - Prometheus (uwm) externalLabels not showing always in alerts

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

Backport was merged today to 4.10: https://github.com/openshift/cluster-monitoring-operator/pull/1742