Bug 2074583 - Alerts that use kube* and node* metrics are not working
Summary: Alerts that use kube* and node* metrics are not working
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-managed-service
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Dhruv Bindra
QA Contact: Filip Balák
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-04-12 14:44 UTC by Dhruv Bindra
Modified: 2023-08-09 17:00 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-28 06:49:06 UTC
Embargoed:


Links
System: Github
ID: red-hat-storage ocs-osd-deployer pull 166
Private: 0
Priority: None
Status: open
Summary: Updated labelSelector for k8s-metrics-service-monitor
Last Updated: 2022-04-12 16:16:16 UTC

Description Dhruv Bindra 2022-04-12 14:44:27 UTC
Description of problem:
The alerts that use kube_* and node_* metrics are not working because the label that k8s-metrics-service-monitor used to select the Service in the openshift-monitoring namespace was removed.
The ServiceMonitor was selecting on the label "prometheus": "k8s".
This label was removed in a recent OCP upgrade, so the ServiceMonitor can no longer select the Service.
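The failure comes down to how a ServiceMonitor's label selector works: every key/value pair listed in matchLabels must be present on the Service, so removing one label from the Service is enough to drop it from selection. The following Python sketch models only that equality-based matching (matchExpressions are ignored). The replacement selector and the Service label sets are hypothetical placeholders, not the values used in ocs-osd-deployer PR 166 or the actual labels on the Service in openshift-monitoring.

```python
# Minimal model of equality-based label selection (Kubernetes matchLabels).
# All label values below are hypothetical placeholders for illustration.

def selector_matches(match_labels: dict, labels: dict) -> bool:
    """Return True if every key/value pair in match_labels is present in labels.

    Requirements are ANDed, as in a Kubernetes LabelSelector's matchLabels.
    matchExpressions are not modeled. An empty match_labels matches everything.
    """
    return all(labels.get(key) == value for key, value in match_labels.items())


# Selector from the bug report.
old_selector = {"prometheus": "k8s"}
# Hypothetical replacement selector; the real one is in ocs-osd-deployer PR 166.
new_selector = {"app.kubernetes.io/name": "prometheus"}

# Hypothetical Service labels before and after the OCP upgrade removed "prometheus".
labels_before = {"prometheus": "k8s", "app.kubernetes.io/name": "prometheus"}
labels_after = {"app.kubernetes.io/name": "prometheus"}

for stage, labels in [("before upgrade", labels_before), ("after upgrade", labels_after)]:
    print(
        stage,
        "| old selector matches:", selector_matches(old_selector, labels),
        "| new selector matches:", selector_matches(new_selector, labels),
    )
```

In this model the old selector stops matching as soon as the "prometheus" label disappears, while a selector built only from labels that survived the upgrade keeps matching. An empty matchLabels map matches every Service here, which mirrors how an empty Kubernetes label selector matches all objects.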

Version-Release number of selected component (if applicable):
OCP v4.10.z
ODF v4.10.0
ocs-osd-deployer v2.0.0

How reproducible:
Install the ODF addon and check the alerts that use kube_* and node_* metrics, such as PersistentVolumeUsageNearFull, PersistentVolumeUsageCritical and CephMgrIsMissingReplicas.

Steps to Reproduce:
1. Install the ODF addon.
2. Check the alerts that use kube_* and node_* metrics, such as PersistentVolumeUsageNearFull, PersistentVolumeUsageCritical and CephMgrIsMissingReplicas.
3. The alerts do not fire, or the kube_* and node_* metrics cannot be fetched in the Prometheus UI.
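Step 3 can also be checked without the Prometheus UI by sending an instant query to the Prometheus HTTP API (GET /api/v1/query) and testing whether any kube_* or node_* series come back. The sketch below assumes a reachable Prometheus base URL (for example through a port-forward) and an optional bearer token; both are assumptions, since the report does not say which Prometheus instance or endpoint was inspected. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def build_query_url(base_url: str, promql: str) -> str:
    """Build a Prometheus HTTP API instant-query URL (GET /api/v1/query)."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def has_series(response_body: str) -> bool:
    """Return True if an instant-query response is successful and non-empty."""
    payload = json.loads(response_body)
    # Check status first so an error response (which may lack "data") is not indexed.
    return payload.get("status") == "success" and len(payload["data"]["result"]) > 0


def metric_family_present(base_url: str, prefix: str, token: Optional[str] = None) -> bool:
    """Query Prometheus for any series whose metric name starts with prefix.

    count() over an empty selection returns an empty vector, so an empty
    result means no matching series were scraped. Base URL and token are
    assumptions supplied by the caller.
    """
    url = build_query_url(base_url, f'count({{__name__=~"{prefix}.+"}})')
    request = urllib.request.Request(url)
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return has_series(response.read().decode("utf-8"))
```

For example, `metric_family_present("http://localhost:9090", "kube_")` would be expected to return False while the bug is present, assuming the queried Prometheus is the one fed by k8s-metrics-service-monitor.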

Actual results:
The alerts do not fire, or the kube_* and node_* metrics cannot be fetched in the Prometheus UI.

Expected results:
The alerts should fire, and the kube_* and node_* metrics should be fetchable in the Prometheus UI.

Additional info:

Comment 5 Filip Balák 2022-04-19 17:11:26 UTC
The PersistentVolumeUsageNearFull and PersistentVolumeUsageCritical alerts are working. Based on comment 4, I am moving this BZ to VERIFIED.

Tested with:
ocs-operator.v4.10.0
OCP 4.10.8

