Description of problem:

Some clarification is needed in the following situation:

1) Define externalLabels at the UWM level:

oc get cm user-workload-monitoring-config -n openshift-user-workload-monitoring -o yaml

apiVersion: v1
data:
  config.yaml: |
    prometheus:
      externalLabels:
        labelmy: test
kind: ConfigMap

2) Define a PrometheusRule such as this one:

oc get PrometheusRules -n ns1 -o yaml

apiVersion: v1
items:
- apiVersion: monitoring.coreos.com/v1
  kind: PrometheusRule
  metadata:
    creationTimestamp: "2022-04-01T09:25:32Z"
    generation: 1
    name: example-alert
    namespace: ns1
    resourceVersion: "492473"
    uid: a8f58819-1131-40bb-995a-eafc62978cc5
  spec:
    groups:
    - name: oneexample
      rules:
      - alert: VersionAlert
        expr: version{job="prometheus-example-app"} == 1
        labels:
          mylabel: nada
          severity: critical

3) Once the alert is firing, check the alert labels:

oc exec alertmanager-main-0 -- amtool --alertmanager.url http://localhost:9093 alert query VersionAlert --output=json | jq

We can see the labels set at both the Prometheus level (labelmy) and the rule level (mylabel):

"labels": {
  "alertname": "VersionAlert",
  "endpoint": "web",
  "instance": "10.128.2.111:8080",
  "job": "prometheus-example-app",
  "labelmy": "test",
  "mylabel": "nada",
  "namespace": "ns1",
  "pod": "prometheus-example-app-7ffcdd457c-4b5hm",
  "prometheus": "openshift-user-workload-monitoring/user-workload",
  "service": "prometheus-example-app",
  "severity": "critical",
  "version": "v0.1.0"
}

4) Use an expression like this:

sum by (endpoint,instance,job,namespace,pod,prometheus,service) (up{job="prometheus-example-app"}) == 1

The resulting alert labels are:

"labels": {
  "alertname": "AlertTestTest",
  "endpoint": "web",
  "instance": "10.128.2.111:8080",
  "job": "prometheus-example-app",
  "mylabel": "nada",
  "namespace": "ns1",
  "pod": "prometheus-example-app-7ffcdd457c-4b5hm",
  "prometheus": "openshift-user-workload-monitoring/user-workload",
  "service": "prometheus-example-app",
  "severity": "critical"
}

So the externalLabels defined at the Prometheus level are not shown. There appears to be an upstream documentation bug reflecting this: https://github.com/openshift/openshift-docs/issues/44324

We need to clarify whether this is indeed a documentation bug, explain why it happens, and describe the cases in which it does not happen consistently.

Version-Release number of selected component (if applicable):
4.10
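For context, this is standard PromQL aggregation behavior rather than anything specific to external labels: "sum by (...)" keeps only the labels listed in the clause and drops every other label from the result vector. A minimal sketch (the series and values below are illustrative, not taken from the cluster above):

# input series:
#   up{job="prometheus-example-app", instance="a", labelmy="test"} 1
#   up{job="prometheus-example-app", instance="b", labelmy="test"} 1

sum by (job) (up)
# => {job="prometheus-example-app"} 2
#    both "instance" and "labelmy" are dropped

sum without (instance) (up)
# => {job="prometheus-example-app", labelmy="test"} 2
#    only "instance" is dropped

Note that "sum without (...)" removes only the named labels, so it would preserve an external label such as labelmy without having to list it explicitly in the expression.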
After investigating, we discovered that the customer can update their PrometheusRule resources to include, in the "by" aggregation, the external label they want to see on the alert.

Change from this:

sum by (endpoint,instance,job,namespace,pod,prometheus,service) (up{job="prometheus-example-app"}) == 1

To this:

sum by (endpoint,instance,job,namespace,pod,prometheus,service,labelmy) (up{job="prometheus-example-app"}) == 1

The "by" aggregation discards the external label. Reading the documentation, this behavior is indeed misleading, as one would expect external labels to always show up when configured. This is a potential area of improvement for the monitoring stack.

Also good to know: external labels only show up on an alert if the alert uses metrics that come from a Prometheus instance configured to add that external label. For instance, if I configure UWM to add the label "labelmy: test", this label will only appear in alerts that query the UWM Prometheus instance, like up{job="prometheus-example-app"} == 1. An alert with the expression kube_deployment_status_replicas{job="prometheus-example-app"} == 1 will not show the external labels configured for UWM, since the data for this query is provided by the in-cluster Prometheus instance.

TL;DR: update the rule expression to include the external label in the "by" clause, since the aggregation strips it away.
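For reference, a minimal PrometheusRule carrying the corrected expression could look like the sketch below. The alert name AlertTestTest and the namespace ns1 are taken from the report; the resource name example-alert-fixed is hypothetical.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-alert-fixed   # hypothetical name
  namespace: ns1
spec:
  groups:
  - name: oneexample
    rules:
    - alert: AlertTestTest
      # "labelmy" is listed in the "by" clause so the external label
      # survives the aggregation and shows up on the alert
      expr: sum by (endpoint,instance,job,namespace,pod,prometheus,service,labelmy) (up{job="prometheus-example-app"}) == 1
      labels:
        severity: critical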
Test with payload 4.11.0-0.nightly-2022-04-22-002610:

1) Enable user workload monitoring
2) Deploy the example app
3) Configure an external label on the user workload prometheus
4) Create an alert rule whose expression uses data provided by the in-cluster prometheus (for the configuration yaml, see the attachment)
5) Query the alert; the external label can be seen:

oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-main.openshift-monitoring.svc:9094/api/v1/alerts' | jq | grep -A10 KubeAlert

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  5701    0  5701    0     0   428k      0 --:--:-- --:--:-- --:--:--  428k

      "alertname": "KubeAlert",
      "container": "kube-rbac-proxy-main",
      "deployment": "prometheus-example-app",
      "endpoint": "https-main",
      "job": "kube-state-metrics",
      "namespace": "ns1",
      "prometheus": "openshift-monitoring/k8s",
      "service": "kube-state-metrics"
    },
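The $token variable in the command above is not defined in the test notes; one plausible way to obtain a token with enough RBAC to read the Alertmanager API (an assumption, not part of the original verification steps) is to use the prometheus-k8s service account:

# newer oc clients (TokenRequest API):
token=$(oc -n openshift-monitoring create token prometheus-k8s)

# older oc clients:
token=$(oc -n openshift-monitoring sa get-token prometheus-k8s)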
Added test case OCP-50241 - Prometheus (uwm) externalLabels not showing always in alerts
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069
The backport to 4.10 was merged today: https://github.com/openshift/cluster-monitoring-operator/pull/1742