Description of problem:
Alerts CephMgrIsAbsent and CephMgrIsMissingReplicas are triggered right after installation. According to monitoring, the MGR appears to be missing, but the deployment rook-ceph-mgr-a has a pod ready and the pod is not in an error state.

Version-Release number of selected component (if applicable):
OCS 4.7.2
OCP 4.8.5

How reproducible:
2/2

Steps to Reproduce:
1. Install OCP Managed Service
2. Install ODF Managed Service Addon
3. Navigate to Monitoring -> Alerting in the OCP Console
4. Uncheck Platform and Firing in the filters so that all triggered alerts are shown.

Actual results:
Alerts CephMgrIsAbsent and CephMgrIsMissingReplicas are triggered even though the MGR is up:

$ oc get deployments -n openshift-storage | grep mgr
rook-ceph-mgr-a   1/1     1            1           108m

Expected results:
MGR should work correctly and there should be no alerts claiming the MGR is missing.

Additional info:
@asachan Anmol, can you check this?
Kesavan, can you update your analysis on this bug?
These alerts (CephMgrIsAbsent and CephMgrIsMissingReplicas) are raised by the in-cluster monitoring stack even though the openshift-storage namespace is excluded from scraping (it has no openshift.io/cluster-monitoring: "true" label). I observed that the Prometheus rules present in openshift-storage are being evaluated and alerted on by the in-cluster Alertmanager. Ideally they should not be, as these Prometheus rules are meant for the dedicated ODF-MS monitoring stack that runs in the openshift-storage namespace.
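For context, alerts like these are defined in PrometheusRule objects inside openshift-storage, which is why any stack that evaluates rules in that namespace will fire them. A minimal sketch of what such a rule looks like (the resource name and the expression here are illustrative assumptions, not the exact rule shipped by the operator):

```yaml
# Illustrative sketch only -- the real rule is shipped by the ODF/OCS
# operator and its exact name and expression may differ.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ceph-mgr-rules-example   # hypothetical name
  namespace: openshift-storage
spec:
  groups:
  - name: ceph-mgr-status
    rules:
    - alert: CephMgrIsAbsent
      # Fires when no rook-ceph-mgr target is reporting as up.
      expr: absent(up{job="rook-ceph-mgr"} == 1)
      for: 5m
      labels:
        severity: critical
```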
The parent issue https://issues.redhat.com/browse/MON-1633 has been fixed. To opt out of being monitored by user workload monitoring, the addon manifest must be updated to add the label openshift.io/user-monitoring: "false" to the openshift-storage namespace.
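A minimal sketch of the relevant part of the namespace manifest after that change, trimmed to only the fields that matter here:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-storage
  labels:
    # Opts this namespace out of OpenShift user workload monitoring,
    # so only the dedicated ODF-MS monitoring stack handles its rules.
    openshift.io/user-monitoring: "false"
```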
I have updated the dev addon manifests with the label openshift.io/user-monitoring: "false" on the openshift-storage namespace, and the user workload monitoring (UWM) stack no longer monitors the openshift-storage namespace.
With a fresh deployment I confirmed the value of openshift.io/user-monitoring: "false":

======== Command o/p Below ========
$ oc get namespace openshift-storage -o yaml
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Namespace","metadata":{"annotations":{"openshift.io/node-selector":""},"labels":{"hive.openshift.io/managed":"true","managed.openshift.io/storage-pv-quota-exempt":"true","odf-managed-service":"true","openshift.io/user-monitoring":"false"},"name":"openshift-storage"}}
    openshift.io/node-selector: ""
    openshift.io/sa.scc.mcs: s0:c30,c15
    openshift.io/sa.scc.supplemental-groups: 1000900000/10000
    openshift.io/sa.scc.uid-range: 1000900000/10000
  creationTimestamp: "2021-12-31T10:09:20Z"
  labels:
    hive.openshift.io/managed: "true"
    kubernetes.io/metadata.name: openshift-storage
    managed.openshift.io/storage-pv-quota-exempt: "true"
    odf-managed-service: "true"
    olm.operatorgroup.uid/45b1eebb-1aa4-4d20-acb2-fa5345571157: ""
    openshift.io/user-monitoring: "false"
  name: openshift-storage
  resourceVersion: "99996"
  uid: dd25c058-d2be-41b9-8908-2208d63e2e06
spec:
  finalizers:
  - kubernetes
status:
  phase: Active
======== Command o/p End ========
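The verification above is a manual eyeball of the YAML. The same check can be scripted, e.g. against `oc get namespace openshift-storage -o json` output. A minimal sketch (the helper name `uwm_opted_out` is hypothetical, not part of any tooling mentioned in this bug):

```python
import json

def uwm_opted_out(namespace_manifest: dict) -> bool:
    """Return True if the namespace carries the opt-out label
    openshift.io/user-monitoring: "false" (helper name is hypothetical)."""
    labels = namespace_manifest.get("metadata", {}).get("labels", {})
    return labels.get("openshift.io/user-monitoring") == "false"

# Example input, shaped like `oc get namespace openshift-storage -o json`
# output trimmed to the relevant fields.
manifest = json.loads("""
{
  "apiVersion": "v1",
  "kind": "Namespace",
  "metadata": {
    "name": "openshift-storage",
    "labels": {
      "odf-managed-service": "true",
      "openshift.io/user-monitoring": "false"
    }
  }
}
""")
print(uwm_opted_out(manifest))  # -> True
```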
Based on comment 10 and comment 11, moving this BZ to VERIFIED status.