Bug 1940312 - [OCS 4.8]Persistent Storage Dashboard throws Alert - Ceph Manager has disappeared from Prometheus target discovery and Object Service dashboard has Unknown Data Resiliency
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: rook
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: OCS 4.8.0
Assignee: Travis Nielsen
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-03-18 07:05 UTC by Neha Berry
Modified: 2021-08-03 18:15 UTC
CC List: 6 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-03 18:15:14 UTC
Embargoed:


Attachments


Links
System ID                              Status  Summary                                     Last Updated
Github openshift rook pull 197         closed  Sync the latest master to release-4.8       2021-03-21 14:27:30 UTC
Github rook rook pull 7440             open    ceph: Set active mgr label on mgr services  2021-03-18 22:38:59 UTC
Red Hat Product Errata RHBA-2021:3003  None    None                                        2021-08-03 18:15:46 UTC

Description Neha Berry 2021-03-18 07:05:41 UTC
Description of problem:
==================================
Installed OCS 4.8 in Internal Attached Mode on VMware. Post deployment, the following alert is seen in the Persistent Storage dashboard:

>> Ceph Manager has disappeared from Prometheus target discovery.

The impact of this is unclear, as the Ceph MGR shows as up in ceph status.

POD
======
rook-ceph-mgr-a-6f79896dbf-qxpvx                                  2/2     Running     0          12m   10.131.0.33    compute-0   <none>           <none>




Version-Release number of selected component (if applicable):
================================================================
OCP = 4.8.0-0.nightly-2021-03-18-000857
OCS = ocs-operator.v4.8.0-303.ci and ocs-operator.v4.8.0-302.ci

"mgr": {
        "ceph version 14.2.11-133.el8cp (b35842cdf727a690afe60d0a32cdbca7da7171c8) nautilus (stable)": 1
    },


How reproducible:
====================
Always

Steps to Reproduce:
========================
1. Install OCP 4.8 nightly
2. Install the latest OCS 4.8 in Internal Attached mode (need to confirm whether the same issue is also seen in dynamic mode)
3. Once OCS is installed, check the Overview -> Persistent Storage dashboard

Actual results:
==================
The following alert is seen on the Status page:

Mar 18, 12:16 pm
Ceph Manager has disappeared from Prometheus target discovery.


Expected results:
=====================
No alert should be seen.



Additional info:
=======================
ceph status
---------------
Thu Mar 18 07:00:31 UTC 2021
  cluster:
    id:     bab7a0f7-41bb-4de0-8ff6-526f4ce8b58f
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum a,b,c (age 14m)
    mgr: a(active, since 14m)
    mds: ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-a=up:active} 1 up:standby-replay
    osd: 3 osds: 3 up (since 14m), 3 in (since 14m)
    rgw: 1 daemon active (ocs.storagecluster.cephobjectstore.a)
 


Namespace labels

  labels:
    olm.operatorgroup.uid/9466e021-e5c1-4452-b9f4-0a734f5bf99b: ""
    olm.operatorgroup.uid/79202b46-7af8-4645-a0df-36daf35ce36e: ""
    openshift.io/cluster-monitoring: "true"

Comment 6 Travis Nielsen 2021-03-18 14:40:39 UTC
This looks related to the Rook change to support multiple mgrs... If there is only a single mgr, for consistency the label with the mgr name also needs to be added.
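The mechanism behind the alert: Prometheus Operator discovers scrape targets by matching a ServiceMonitor's label selector against service labels, so a mgr service missing an expected label silently drops out of target discovery even while the daemon itself is healthy. A minimal sketch of that matching logic, with hypothetical label keys (`app`, `ceph_daemon_id`) standing in for whatever the actual ServiceMonitor selects on:

```go
package main

import "fmt"

// matchesSelector reports whether a service's labels satisfy every
// key/value pair a ServiceMonitor-style selector requires.
func matchesSelector(svcLabels, selector map[string]string) bool {
	for k, v := range selector {
		if svcLabels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical selector requiring both an app label and a mgr name label.
	selector := map[string]string{"app": "rook-ceph-mgr", "ceph_daemon_id": "a"}

	withoutName := map[string]string{"app": "rook-ceph-mgr"}
	withName := map[string]string{"app": "rook-ceph-mgr", "ceph_daemon_id": "a"}

	fmt.Println(matchesSelector(withoutName, selector)) // dropped from discovery
	fmt.Println(matchesSelector(withName, selector))    // discovered and scraped
}
```

Under this reading, the multi-mgr change left the single-mgr service without the mgr-name label the selector expects, which is exactly the gap the fix below closes.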

Comment 7 Travis Nielsen 2021-03-18 22:39:01 UTC
Now adding the active mgr name to the labels as required by the service monitor...
https://github.com/rook/rook/pull/7440
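The shape of the fix can be sketched as follows; this is not the PR's actual code, and the label keys `ceph_daemon_id` and `mgr_role` are illustrative assumptions, not necessarily the keys Rook uses:

```go
package main

import "fmt"

// setActiveMgrLabels adds the active-mgr role plus the daemon name to a
// service's label set, so a ServiceMonitor selecting on those labels keeps
// finding the mgr even with only a single mgr deployed.
// Label keys here are illustrative, not Rook's actual keys.
func setActiveMgrLabels(labels map[string]string, daemonID string) map[string]string {
	if labels == nil {
		labels = map[string]string{}
	}
	labels["ceph_daemon_id"] = daemonID // name of the active mgr, e.g. "a"
	labels["mgr_role"] = "active"       // marks this service as the active mgr
	return labels
}

func main() {
	svcLabels := map[string]string{"app": "rook-ceph-mgr"}
	svcLabels = setActiveMgrLabels(svcLabels, "a")
	fmt.Println(svcLabels["app"], svcLabels["mgr_role"], svcLabels["ceph_daemon_id"])
}
```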

Comment 10 Travis Nielsen 2021-03-21 14:27:33 UTC
This will be picked up with the next 4.8 build via the sync from rook master:
https://github.com/openshift/rook/pull/197

Comment 15 errata-xmlrpc 2021-08-03 18:15:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.8.0 container images bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3003

