1940312 – [OCS 4.8]Persistent Storage Dashboard throws Alert - Ceph Manager has disappeared from Prometheus target discovery and Object Service dashboard has Unknown Data Resiliency


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenShift Container Storage
Classification:	Red Hat Storage
Component:	rook
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	OCS 4.8.0
Assignee:	Travis Nielsen
QA Contact:	Neha Berry
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-03-18 07:05 UTC by Neha Berry
Modified:	2021-08-03 18:15 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-08-03 18:15:14 UTC
Embargoed:
Dependent Products:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift rook pull 197	None	closed	Sync the latest master to release-4.8	2021-03-21 14:27:30 UTC
Github	rook rook pull 7440	None	open	ceph: Set active mgr label on mgr services	2021-03-18 22:38:59 UTC
Red Hat Product Errata	RHBA-2021:3003	None	None	None	2021-08-03 18:15:46 UTC

Description Neha Berry 2021-03-18 07:05:41 UTC

Description of problem:
==================================
Installed OCS 4.8 in Internal Attached mode on VMware. After deployment, the following alert is seen in the Persistent Storage Dashboard:

>> Ceph Manager has disappeared from Prometheus target discovery.

The impact of this is unclear, since the Ceph MGR is reported as up in ceph status.

POD
======
rook-ceph-mgr-a-6f79896dbf-qxpvx                                  2/2     Running     0          12m   10.131.0.33    compute-0   <none>           <none>




Version-Release number of selected component (if applicable):
================================================================
OCP = 4.8.0-0.nightly-2021-03-18-000857
OCS = ocs-operator.v4.8.0-303.ci and ocs-operator.v4.8.0-302.ci

"mgr": {
        "ceph version 14.2.11-133.el8cp (b35842cdf727a690afe60d0a32cdbca7da7171c8) nautilus (stable)": 1
    },


How reproducible:
====================
Always

Steps to Reproduce:
========================
1. Install OCP 4.8 nightly
2. Install the latest OCS 4.8 in Internal Attached mode (still to be confirmed whether the same issue occurs in dynamic mode)
3. Once OCS is installed, check the Overview -> Persistent Storage Dashboard

Actual results:
==================
The following alert is seen on the Status page:

Mar 18, 12:16 pm
Ceph Manager has disappeared from Prometheus target discovery.
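
A rough sketch of how an alert like this can fire while the mgr pod is running: alerts of the "disappeared from target discovery" kind are typically driven by a PromQL absent()-style check on the `up` series of a scrape job, so they reflect whether Prometheus has a healthy target for the job, not whether the daemon is up. The job name `rook-ceph-mgr`, the `UpSample` type, and the sample data are assumptions for illustration only; they are not taken from this cluster's alert rules.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UpSample:
    """One sample of the Prometheus `up` metric (1 means the scrape succeeded)."""
    labels: Dict[str, str]
    value: int


def target_absent(samples: List[UpSample], job: str) -> bool:
    """Mimic `absent(up{job=<job>} == 1)`: True when no healthy target exists."""
    return not any(s.labels.get("job") == job and s.value == 1 for s in samples)


# Hypothetical state: Prometheus scrapes other jobs, but discovered no mgr target.
samples = [UpSample(labels={"job": "some-other-job"}, value=1)]
print(target_absent(samples, "rook-ceph-mgr"))
```

With no sample for the mgr job the check is true, which is consistent with the alert appearing even though ceph status shows an active mgr.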


Expected results:
=====================
No Alert should be seen.



Additional info:
=======================
ceph status
---------------

=====ceph status ====
Thu Mar 18 07:00:31 UTC 2021
  cluster:
    id:     bab7a0f7-41bb-4de0-8ff6-526f4ce8b58f
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum a,b,c (age 14m)
    mgr: a(active, since 14m)
    mds: ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-a=up:active} 1 up:standby-replay
    osd: 3 osds: 3 up (since 14m), 3 in (since 14m)
    rgw: 1 daemon active (ocs.storagecluster.cephobjectstore.a)
 


Namespace labelling

  labels:
    olm.operatorgroup.uid/9466e021-e5c1-4452-b9f4-0a734f5bf99b: ""
    olm.operatorgroup.uid/79202b46-7af8-4645-a0df-36daf35ce36e: ""
    openshift.io/cluster-monitoring: "true"

Comment 6 Travis Nielsen 2021-03-18 14:40:39 UTC

This looks related to the Rook change to support multiple mgrs... If there is only a single mgr, for consistency the label with the mgr name also needs to be added.
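
To illustrate the diagnosis above: a Kubernetes matchLabels selector only matches an object that carries every listed key with exactly the listed value, so a service monitor that selects on a mgr-name label finds nothing if the mgr service lacks that label. A minimal sketch follows; the label keys and values (`app`, `mgr_name`) are hypothetical stand-ins and not necessarily the labels Rook uses.

```python
from typing import Dict


def matches(match_labels: Dict[str, str], labels: Dict[str, str]) -> bool:
    """Return True if every selector key/value pair is present in labels."""
    return all(labels.get(key) == value for key, value in match_labels.items())


# Hypothetical selector a service monitor might use for the active mgr.
selector = {"app": "rook-ceph-mgr", "mgr_name": "a"}

# Service without the mgr-name label: not selected, so no scrape target.
service_before = {"app": "rook-ceph-mgr"}
# Service with the mgr-name label added: selected.
service_after = {"app": "rook-ceph-mgr", "mgr_name": "a"}

print(matches(selector, service_before))  # False
print(matches(selector, service_after))   # True
```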

Comment 7 Travis Nielsen 2021-03-18 22:39:01 UTC

Now adding the active mgr name to the labels as required by the service monitor...
https://github.com/rook/rook/pull/7440

Comment 10 Travis Nielsen 2021-03-21 14:27:33 UTC

This will be picked up in the next 4.8 build, following the sync from rook master:
https://github.com/openshift/rook/pull/197

Comment 15 errata-xmlrpc 2021-08-03 18:15:14 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.8.0 container images bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3003
