2012770 – when using expression metric openshift_apps_deploymentconfigs_last_failed_rollout_time namespace label is re-written

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	openshift-controller-manager
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	4.10.0
Assignee:	Filip Krepinsky
QA Contact:	zhou ying
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-10-11 09:10 UTC by German Parente
Modified:	2022-03-10 16:19 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause & Consequence: The openshift_apps_deploymentconfigs_last_failed_rollout_time metric has the wrong namespace label and an extra exported_namespace label. Fix & Result: The openshift_apps_deploymentconfigs_last_failed_rollout_time metric has the correct namespace label and no longer has the exported_namespace label.
Clone Of:
Environment:
Last Closed:	2022-03-10 16:18:42 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-openshift-controller-manager-operator pull 230	0	None	open	Bug 2012770: honor labels in openshift-controller-manager metrics	2021-11-15 20:25:08 UTC
Red Hat Product Errata	RHSA-2022:0056	0	None	None	None	2022-03-10 16:19:10 UTC

Description German Parente 2021-10-11 09:10:05 UTC

Description of problem:

When the following expression is used in an alert rule, so that the alert fires when a deployment config has been unavailable:

expr: count_over_time(openshift_apps_deploymentconfigs_last_failed_rollout_time{exported_namespace="ns1",name="prometheus-example-app",namespace="openshift-kube-controller-manager"}[1m]) > 0

the namespace label in the rule is rewritten, and the rule becomes:

expr: count_over_time(openshift_apps_deploymentconfigs_last_failed_rollout_time{exported_namespace="ns1",name="prometheus-example-app",namespace="ns1"}[1m]) > 0


After discussion with the monitoring team, the issue is that the service monitor in the OpenShift controller manager operator should set "honor_labels: true".
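
To illustrate why this setting matters, here is a minimal Python sketch of how Prometheus resolves a conflict between a label exposed by the scraped target and a label attached by the scrape configuration. The function name apply_scrape_labels and the sample label sets are hypothetical and only modelled on the expressions in this report; the sketch ignores edge cases such as empty label values and an already existing exported_ label.

```python
def apply_scrape_labels(scraped, target, honor_labels):
    """Merge labels exposed by the target with labels added by the scrape config.

    Simplified model (an assumption, not Prometheus source code):
    - honor_labels=False: on conflict the scraped label is renamed to
      exported_<name> and the scrape-config label wins.
    - honor_labels=True: on conflict the scraped label is kept and the
      scrape-config label is ignored.
    """
    result = dict(scraped)
    for name, value in target.items():
        if name not in scraped:
            # No conflict: simply attach the scrape-config label.
            result[name] = value
        elif not honor_labels:
            # Conflict, labels not honored: keep the scraped value under a new name.
            result["exported_" + name] = scraped[name]
            result[name] = value
        # Conflict with honor_labels=True: the scraped value stays untouched.
    return result


# Labels as the metric might be exposed by the component (hypothetical sample).
scraped = {"name": "prometheus-example-app", "namespace": "ns1"}
# Label attached by the scrape configuration (hypothetical sample).
target = {"namespace": "openshift-kube-controller-manager"}

before_fix = apply_scrape_labels(scraped, target, honor_labels=False)
after_fix = apply_scrape_labels(scraped, target, honor_labels=True)
print(before_fix)
print(after_fix)
```

Under these assumptions, the first result carries both namespace="openshift-kube-controller-manager" and exported_namespace="ns1", which matches the labels in the original expression, while the second keeps namespace="ns1" and has no exported_namespace label.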


Version-Release number of selected component (if applicable): 4.8

Comment 1 Gabe Montero 2021-10-14 19:36:41 UTC

This concerns DeploymentConfig (DC) metrics; transferring.

Comment 2 Filip Krepinsky 2021-11-15 20:30:34 UTC

- When testing this, make sure that Prometheus is not configured with overrideHonorLabels: true.
- The alert rule can be simplified to:

expr: count_over_time(openshift_apps_deploymentconfigs_last_failed_rollout_time{name="prometheus-example-app",namespace="ns1"}[1m]) > 0
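
The testing note about overrideHonorLabels can be sketched as follows. To my understanding of the prometheus-operator behaviour, a Prometheus resource with overrideHonorLabels: true forces honor_labels to false for every service monitor, so the fix would not be observable in such a cluster. The function name effective_honor_labels is hypothetical and used only for illustration.

```python
def effective_honor_labels(endpoint_honor_labels: bool,
                           override_honor_labels: bool) -> bool:
    """Return the honor_labels value that ends up in the scrape config.

    Assumed model: the service monitor's setting applies unless the
    Prometheus-level override is enabled, in which case it is forced to False.
    """
    return endpoint_honor_labels and not override_honor_labels


# With the fix in place and no override, scraped labels are honored.
print(effective_honor_labels(True, False))
# With the override enabled, the fix has no visible effect.
print(effective_honor_labels(True, True))
```

If the second case applies, the metric would still show the exported_namespace label and the simplified rule would not match, even on a fixed version.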

Comment 8 errata-xmlrpc 2022-03-10 16:18:42 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056
