Description of problem:

When using this expression in an alert rule:

    expr: count_over_time(openshift_apps_deploymentconfigs_last_failed_rollout_time{exported_namespace="ns1",name="prometheus-example-app",namespace="openshift-kube-controller-manager"}[1m]) > 0

to trigger when a deployment config has been unavailable, the rule is re-written to:

    expr: count_over_time(openshift_apps_deploymentconfigs_last_failed_rollout_time{exported_namespace="ns1",name="prometheus-example-app",namespace="ns1"}[1m]) > 0

After discussion with monitoring team, the issue is that the service monitor in openshift controller manager operator should have "honor_labels: true".

Version-Release number of selected component (if applicable): 4.8
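For illustration only, here is a minimal sketch of what a service monitor with label honoring enabled might look like. It is not the operator's actual manifest: the resource name, namespace, selector labels, and port name are all hypothetical. In the ServiceMonitor custom resource the field is spelled `honorLabels`; `honor_labels` is the name of the corresponding setting in the generated Prometheus scrape configuration. Without it, a scraped `namespace` label that conflicts with the target's own label is renamed to `exported_namespace`, which matches the labels seen in the expressions above.

```yaml
# Hypothetical ServiceMonitor; name, namespace, selector, and port are
# illustrative assumptions, not taken from the operator's real manifest.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-controller-manager
  namespace: example-namespace
spec:
  selector:
    matchLabels:
      app: example-controller-manager
  endpoints:
    - port: https
      # Keep labels exposed by the scraped metrics (e.g. namespace="ns1")
      # instead of renaming them to exported_<label> on conflict.
      honorLabels: true
```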
talking DC metrics ... transferring
- when testing this, make sure that the Prometheus resource does not set overrideHonorLabels: true
- the alert rule can be simplified to:

    expr: count_over_time(openshift_apps_deploymentconfigs_last_failed_rollout_time{name="prometheus-example-app",namespace="ns1"}[1m]) > 0
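As a rough sketch of how the simplified expression could be wrapped for testing, the fragment below places it in a PrometheusRule in the `ns1` namespace. The resource name, group name, and alert name are hypothetical and only serve the example; the expression itself is the one given in this comment.

```yaml
# Hypothetical PrometheusRule for testing; metadata name, group name, and
# alert name are illustrative assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-dc-rollout-alert
  namespace: ns1
spec:
  groups:
    - name: example-deploymentconfig
      rules:
        - alert: ExampleDeploymentConfigRolloutFailed
          # Simplified expression: with labels honored, the metric carries
          # namespace="ns1" directly, so exported_namespace is not needed.
          expr: count_over_time(openshift_apps_deploymentconfigs_last_failed_rollout_time{name="prometheus-example-app",namespace="ns1"}[1m]) > 0
```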
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056