Bug 2074583

Summary: Alerts that use kube* and node* metrics are not working
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Dhruv Bindra <dbindra>
Component: odf-managed-service
Assignee: Dhruv Bindra <dbindra>
Status: CLOSED CURRENTRELEASE
QA Contact: Filip Balák <fbalak>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.10
CC: aeyal, mmuench, nberry, ocs-bugs, odf-bz-bot
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-09-28 06:49:06 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Dhruv Bindra 2022-04-12 14:44:27 UTC
Description of problem:
Alerts that use kube_* and node_* metrics are not working. The k8s-metrics-service-monitor ServiceMonitor selects its target service in the openshift-monitoring namespace by the label "prometheus": "k8s".
That label was removed in recent OCP upgrades, so the ServiceMonitor can no longer select the service and the metrics are not scraped.
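
To illustrate why the selection fails, here is a minimal Python sketch of equality-based label matching, where a selector matches only if every key/value pair it lists is present on the target. The selector value comes from this report; the function name and the two service label sets are hypothetical and are not copied from a real cluster.

```python
from typing import Mapping


def selector_matches(match_labels: Mapping[str, str],
                     service_labels: Mapping[str, str]) -> bool:
    """Return True when every selector key/value pair is present on the service."""
    return all(service_labels.get(key) == value
               for key, value in match_labels.items())


# Selector described in this report.
monitor_selector = {"prometheus": "k8s"}

# Hypothetical label sets, for illustration only.
labels_before_upgrade = {"prometheus": "k8s", "app.kubernetes.io/name": "prometheus"}
labels_after_upgrade = {"app.kubernetes.io/name": "prometheus"}

print(selector_matches(monitor_selector, labels_before_upgrade))  # True
print(selector_matches(monitor_selector, labels_after_upgrade))   # False
```

Once the label is gone, the selector matches no service, so no scrape target exists for the kube_* and node_* metrics.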

Version-Release number of selected component (if applicable):
OCP v4.10.z
ODF v4.10.0
ocs-osd-deployer v2.0.0

How reproducible:
Install the ODF addon and check the alerts that use kube_* and node_* metrics, such as PersistentVolumeUsageNearFull, PersistentVolumeUsageCritical and CephMgrIsMissingReplicas.

Steps to Reproduce:
1. Install the ODF addon.
2. Check the alerts that use kube_* and node_* metrics, such as PersistentVolumeUsageNearFull, PersistentVolumeUsageCritical and CephMgrIsMissingReplicas.
3. Observe that the alerts do not fire and that the kube_* and node_* metrics cannot be fetched in the Prometheus UI.
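
As an alternative to checking the Prometheus UI by hand, step 3 could be scripted against the Prometheus HTTP API (`/api/v1/query`). The sketch below only builds the query URL and inspects a response body. The helper names, the base URL, and the sample responses are assumptions for illustration and are not taken from this report.

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def has_series(response: dict) -> bool:
    """Return True when a query response succeeded and contains at least one series."""
    if response.get("status") != "success":
        return False
    return bool(response.get("data", {}).get("result"))


# PromQL that counts every series whose metric name starts with kube_.
KUBE_QUERY = 'count({__name__=~"kube_.+"})'

# Hypothetical base URL; replace it with the route of the Prometheus under test.
print(build_query_url("http://prometheus.example:9090", KUBE_QUERY))

# Hand-written sample response bodies, not captured from a cluster.
broken = {"status": "success", "data": {"resultType": "vector", "result": []}}
healthy = {"status": "success",
           "data": {"resultType": "vector",
                    "result": [{"metric": {}, "value": [1650000000, "1234"]}]}}

print(has_series(broken))   # False
print(has_series(healthy))  # True
```

An empty result for the kube_* query would be consistent with the behavior described in this report; the same check can be repeated with `node_.+`.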

Actual results:
The alerts do not fire, and the kube_* and node_* metrics cannot be fetched in the Prometheus UI.

Expected results:
The alerts should fire, and the kube_* and node_* metrics should be available in the Prometheus UI.

Additional info:

Comment 5 Filip Balák 2022-04-19 17:11:26 UTC
The PersistentVolumeUsageNearFull and PersistentVolumeUsageCritical alerts are working. Based on comment 4, I am moving this BZ to VERIFIED.

Tested with:
ocs-operator.v4.10.0
OCP 4.10.8