Bug 1998056

Summary: Alerts CephMgrIsAbsent and CephMgrIsMissingReplicas are triggered right after installation
Product: [Red Hat Storage] Red Hat OpenShift Container Storage
Reporter: Filip Balák <fbalak>
Component: odf-managed-service
Assignee: Dhruv Bindra <dbindra>
Status: CLOSED CURRENTRELEASE
QA Contact: suchita <sgatfane>
Severity: high
Priority: high
Version: 4.7
CC: aeyal, ebondare, ocs-bugs, omitrani, owasserm, rperiyas, sabose, sgatfane
Target Milestone: ---
Keywords: AutomationBackLog, Tracking
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2022-03-22 15:56:36 UTC
Type: Bug

Description Filip Balák 2021-08-26 11:17:08 UTC
Description of problem:
Alerts CephMgrIsAbsent and CephMgrIsMissingReplicas are triggered right after installation. According to monitoring, the MGR appears to be missing, but the rook-ceph-mgr-a deployment has a ready pod and the pod is not in an error state.

Version-Release number of selected component (if applicable):
OCS 4.7.2
OCP 4.8.5

How reproducible:
2/2

Steps to Reproduce:
1. Install OCP Managed Service
2. Install ODF Managed Service Addon
3. Navigate to Monitoring -> Alerting in the OCP Console
4. Uncheck the Platform and Firing filters so that all triggered alerts are shown.

Actual results:
Alerts CephMgrIsAbsent and CephMgrIsMissingReplicas are triggered, but the MGR is up:

$ oc get deployments -n openshift-storage|grep mgr
rook-ceph-mgr-a                                     1/1     1            1           108m


Expected results:
MGR should work correctly and there should be no alerts that MGR is missing.

Additional info:

Comment 1 Sahina Bose 2021-09-07 07:19:30 UTC
@asachan Anmol, can you check this?

Comment 2 Sahina Bose 2021-09-07 10:29:24 UTC
Kesavan, can you update your analysis on this bug?

Comment 3 Kesavan 2021-09-09 06:30:24 UTC
These alerts (CephMgrIsAbsent and CephMgrIsMissingReplicas) are raised by the in-cluster monitoring stack even though the openshift-storage namespace is excluded from scraping (it carries no openshift.io/cluster-monitoring: "true" label).
I observed that the Prometheus rules present in openshift-storage are being evaluated and alerted on by the in-cluster Alertmanager. Ideally they should not be, as these Prometheus rules are mapped to the dedicated ODF-MS monitoring stack that runs in the openshift-storage namespace.
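For context, a namespace normally opts in to scraping by the in-cluster (platform) monitoring stack via a single namespace label; a minimal illustrative fragment (not the actual addon manifest) of what that opt-in looks like:

```yaml
# Sketch: a namespace opts IN to platform-monitoring scraping with this label.
# openshift-storage deliberately lacks it, yet its PrometheusRules were still
# being evaluated by the in-cluster stack -- the behavior described above.
metadata:
  labels:
    openshift.io/cluster-monitoring: "true"
```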

Comment 8 Kesavan 2021-12-15 13:16:24 UTC
The parent issue https://issues.redhat.com/browse/MON-1633 has been fixed.
In order to opt out of being monitored by user workload monitoring, the addon manifest must be updated to include the label "openshift.io/user-monitoring: 'false'" on the openshift-storage namespace.
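The opt-out described here amounts to one label on the namespace in the addon manifest; a hedged sketch of the relevant fragment (the exact manifest layout is an assumption):

```yaml
# Sketch of the addon manifest change: the user-workload-monitoring
# opt-out label, added alongside the namespace's existing labels.
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-storage
  labels:
    openshift.io/user-monitoring: "false"
```

On a live cluster the same effect can be had with `oc label namespace openshift-storage openshift.io/user-monitoring=false --overwrite`.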

Comment 9 Dhruv Bindra 2021-12-21 14:31:02 UTC
I have updated the manifests for the dev addon with the label "openshift.io/user-monitoring: 'false'" on the openshift-storage namespace, and the UWM stack no longer monitors the openshift-storage namespace.

Comment 10 suchita 2021-12-31 12:25:49 UTC
With a fresh deployment I confirmed that the label "openshift.io/user-monitoring: "false"" is set on the namespace:


======== Command output below ========

 $ oc get namespace openshift-storage -o yaml
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Namespace","metadata":{"annotations":{"openshift.io/node-selector":""},"labels":{"hive.openshift.io/managed":"true","managed.openshift.io/storage-pv-quota-exempt":"true","odf-managed-service":"true","openshift.io/user-monitoring":"false"},"name":"openshift-storage"}}
    openshift.io/node-selector: ""
    openshift.io/sa.scc.mcs: s0:c30,c15
    openshift.io/sa.scc.supplemental-groups: 1000900000/10000
    openshift.io/sa.scc.uid-range: 1000900000/10000
  creationTimestamp: "2021-12-31T10:09:20Z"
  labels:
    hive.openshift.io/managed: "true"
    kubernetes.io/metadata.name: openshift-storage
    managed.openshift.io/storage-pv-quota-exempt: "true"
    odf-managed-service: "true"
    olm.operatorgroup.uid/45b1eebb-1aa4-4d20-acb2-fa5345571157: ""
    openshift.io/user-monitoring: "false"
  name: openshift-storage
  resourceVersion: "99996"
  uid: dd25c058-d2be-41b9-8908-2208d63e2e06
spec:
  finalizers:
  - kubernetes
status:
  phase: Active
======== Command output end ========

Comment 12 suchita 2021-12-31 18:14:41 UTC
Based on comment 10 and comment 11, moving this BZ to VERIFIED status.