Bug 1998056 - Alerts CephMgrIsAbsent and CephMgrIsMissingReplicas are triggered right after installation
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: odf-managed-service
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Dhruv Bindra
QA Contact: suchita
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-08-26 11:17 UTC by Filip Balák
Modified: 2022-05-26 08:04 UTC (History)
8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-22 15:56:36 UTC
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker MON-1633 0 None None None 2021-12-15 12:26:52 UTC

Internal Links: 2004478

Description Filip Balák 2021-08-26 11:17:08 UTC
Description of problem:
Alerts CephMgrIsAbsent and CephMgrIsMissingReplicas are triggered right after installation. According to monitoring, the MGR appears to be missing, but the rook-ceph-mgr-a deployment has a ready pod, and the pod is not in an error state.

Version-Release number of selected component (if applicable):
OCS 4.7.2
OCP 4.8.5

How reproducible:
2/2

Steps to Reproduce:
1. Install OCP Managed Service
2. Install ODF Managed Service Addon
3. Navigate to Monitoring -> Alerting in the OCP Console
4. Uncheck the Platform and Firing filters so that all triggered alerts are shown.

Actual results:
Alerts CephMgrIsAbsent and CephMgrIsMissingReplicas are triggered but MGR is up:

$ oc get deployments -n openshift-storage|grep mgr
rook-ceph-mgr-a                                     1/1     1            1           108m


Expected results:
MGR should work correctly and there should be no alerts that MGR is missing.

Additional info:

Comment 1 Sahina Bose 2021-09-07 07:19:30 UTC
@asachan Anmol, can you check this?

Comment 2 Sahina Bose 2021-09-07 10:29:24 UTC
Kesavan, can you update your analysis on this bug?

Comment 3 Kesavan 2021-09-09 06:30:24 UTC
These alerts (CephMgrIsAbsent and CephMgrIsMissingReplicas) are raised by the in-cluster monitoring stack even though the openshift-storage namespace is excluded from scraping (it has no openshift.io/cluster-monitoring: "true" label).
I observed that the Prometheus rules present in openshift-storage are being picked up by the in-cluster alerting stack. Ideally they should not be, as these Prometheus rules are mapped to the dedicated ODF-MS monitoring stack that runs in the openshift-storage namespace.
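For reference, the two alerts come from PrometheusRule objects in openshift-storage. A sketch of their shape is below; the object name and the exact expressions are approximations of the shipped rules, not copied verbatim:

```yaml
# Illustrative sketch only -- the metadata.name and the exact expr
# values in the shipped ODF PrometheusRule may differ.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: prometheus-ceph-rules        # assumed name
  namespace: openshift-storage
spec:
  groups:
  - name: mgr-status
    rules:
    - alert: CephMgrIsAbsent
      # Fires when no rook-ceph-mgr target is reporting up.
      expr: absent(up{job="rook-ceph-mgr"} == 1)
      for: 5m
      labels:
        severity: critical
    - alert: CephMgrIsMissingReplicas
      # Fires when fewer MGR replicas are up than expected.
      expr: sum(up{job="rook-ceph-mgr"}) < 1
      for: 5m
      labels:
        severity: warning
```

Because both monitoring stacks can evaluate rules in this namespace, the in-cluster stack fires these alerts even though only the dedicated ODF-MS stack scrapes the matching `rook-ceph-mgr` targets, which is why they appear to fire while the MGR pod is healthy.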

Comment 8 Kesavan 2021-12-15 13:16:24 UTC
The parent issue https://issues.redhat.com/browse/MON-1633 has been fixed.
To opt out of being monitored by user workload monitoring, the addon manifest must be updated to set the label openshift.io/user-monitoring: "false" on the openshift-storage namespace.
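The manifest change boils down to a single namespace label; a minimal sketch of the relevant fragment:

```yaml
# Sketch of the namespace fragment in the addon manifest; only the
# user-monitoring label is the fix, the rest is context.
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-storage
  labels:
    # Opts this namespace out of user workload monitoring (MON-1633).
    openshift.io/user-monitoring: "false"
```

On a live cluster, the same label can be applied directly with `oc label namespace openshift-storage openshift.io/user-monitoring=false`.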

Comment 9 Dhruv Bindra 2021-12-21 14:31:02 UTC
I have updated the dev addon manifests to set the label openshift.io/user-monitoring: "false" on the openshift-storage namespace, and user workload monitoring no longer monitors the openshift-storage namespace.

Comment 10 suchita 2021-12-31 12:25:49 UTC
With a fresh deployment I confirmed that the openshift-storage namespace carries the label openshift.io/user-monitoring: "false".


======== Command output below ========

 $ oc get namespace openshift-storage -o yaml
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Namespace","metadata":{"annotations":{"openshift.io/node-selector":""},"labels":{"hive.openshift.io/managed":"true","managed.openshift.io/storage-pv-quota-exempt":"true","odf-managed-service":"true","openshift.io/user-monitoring":"false"},"name":"openshift-storage"}}
    openshift.io/node-selector: ""
    openshift.io/sa.scc.mcs: s0:c30,c15
    openshift.io/sa.scc.supplemental-groups: 1000900000/10000
    openshift.io/sa.scc.uid-range: 1000900000/10000
  creationTimestamp: "2021-12-31T10:09:20Z"
  labels:
    hive.openshift.io/managed: "true"
    kubernetes.io/metadata.name: openshift-storage
    managed.openshift.io/storage-pv-quota-exempt: "true"
    odf-managed-service: "true"
    olm.operatorgroup.uid/45b1eebb-1aa4-4d20-acb2-fa5345571157: ""
    openshift.io/user-monitoring: "false"
  name: openshift-storage
  resourceVersion: "99996"
  uid: dd25c058-d2be-41b9-8908-2208d63e2e06
spec:
  finalizers:
  - kubernetes
status:
  phase: Active
======== Command output end ========
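The label can also be confirmed programmatically. A minimal sketch, using the JSON stored in the kubectl.kubernetes.io/last-applied-configuration annotation from the output above (the annotation value is copied from this bug, not fetched live):

```python
import json

# The last-applied-configuration annotation is plain JSON, copied
# verbatim from the namespace output captured in Comment 10.
last_applied = (
    '{"apiVersion":"v1","kind":"Namespace","metadata":{"annotations":'
    '{"openshift.io/node-selector":""},"labels":{"hive.openshift.io/managed":"true",'
    '"managed.openshift.io/storage-pv-quota-exempt":"true","odf-managed-service":"true",'
    '"openshift.io/user-monitoring":"false"},"name":"openshift-storage"}}'
)

ns = json.loads(last_applied)
labels = ns["metadata"]["labels"]

# The value that opts the namespace out of user workload monitoring.
print(labels.get("openshift.io/user-monitoring"))  # -> false
```

The same check against a live cluster would read the label from `oc get namespace openshift-storage -o yaml` instead of a pasted annotation.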

Comment 12 suchita 2021-12-31 18:14:41 UTC
Based on Comment 10 and Comment 11, moving this BZ to VERIFIED status.

