Description of problem:
The alerts that use kube_* and node_* metrics are not working because the label that k8s-metrics-service-monitor used to select the service in the openshift-monitoring namespace has been removed. The ServiceMonitor selected on the label "prometheus": "k8s", but this label was removed in recent OCP upgrades, so the ServiceMonitor can no longer select the service.

Version-Release number of selected component (if applicable):
OCP v4.10.z
ODF v4.10.0
ocs-osd-deployer v2.0.0

How reproducible:
Install the ODF addon and check the alerts that use kube_* and node_* metrics, such as PersistentVolumeUsageNearFull, PersistentVolumeUsageCritical and CephMgrIsMissingReplicas.

Steps to Reproduce:
1. Install the ODF addon.
2. Check the alerts that use kube_* and node_* metrics, such as PersistentVolumeUsageNearFull, PersistentVolumeUsageCritical and CephMgrIsMissingReplicas.
3. The alerts do not fire, or the kube_* and node_* metrics cannot be fetched in the Prometheus UI.

Actual results:
The alerts do not fire, or the kube_* and node_* metrics cannot be fetched in the Prometheus UI.

Expected results:
The alerts should fire, and the kube_* and node_* metrics should be available in the Prometheus UI.

Additional info:
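To illustrate the selection failure described above, here is a minimal Python sketch of how a matchLabels-style selector is evaluated: every key/value pair in the selector must be present on the target service. Only the "prometheus": "k8s" pair comes from this report. The helper name selector_matches and both service label sets are hypothetical and were not copied from a cluster.

```python
def selector_matches(match_labels, service_labels):
    """Return True when every matchLabels pair is present on the service."""
    return all(service_labels.get(key) == value
               for key, value in match_labels.items())


# Selector label named in this report.
match_labels = {"prometheus": "k8s"}

# Hypothetical label sets, for illustration only.
labels_before_upgrade = {"app.kubernetes.io/name": "prometheus", "prometheus": "k8s"}
labels_after_upgrade = {"app.kubernetes.io/name": "prometheus"}

print(selector_matches(match_labels, labels_before_upgrade))  # True
print(selector_matches(match_labels, labels_after_upgrade))   # False
```

Once the label is gone from the service, the selector matches nothing, so no scrape target is created and the kube_* and node_* series never reach Prometheus. That is consistent with the alerts not firing.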
The PersistentVolumeUsageNearFull and PersistentVolumeUsageCritical alerts are working. Based on comment 4, I am moving this BZ to VERIFIED.

Tested with:
ocs-operator.v4.10.0
OCP 4.10.8