Description of problem:
==================================
Installed OCS in Internal Attached Mode in OCS 4.8 on VMware and post deployment, following Alert is seen in the Persistent Storage Dashboard

>> Ceph Manager has disappeared from Prometheus target discovery.

Not sure of the impact of this as the ceph MGR is up in ceph status.

POD
======
rook-ceph-mgr-a-6f79896dbf-qxpvx   2/2   Running   0   12m   10.131.0.33   compute-0   <none>   <none>

Version-Release number of selected component (if applicable):
================================================================
OCP = 4.8.0-0.nightly-2021-03-18-000857
OCS = ocs-operator.v4.8.0-303.ci and ocs-operator.v4.8.0-302.ci

"mgr": {
    "ceph version 14.2.11-133.el8cp (b35842cdf727a690afe60d0a32cdbca7da7171c8) nautilus (stable)": 1
},

How reproducible:
====================
Always

Steps to Reproduce:
========================
1. Install OCP 4.8 nightly
2. Install OCS 4.8 latest In Internal Attached mode (need to confirm if similar issue is seen in dynamic mode too)
3. Once OCS is installed, check the Overview-> Persistent Storage Dashboard

Actual results:
==================
Following Alert is seen in the Status page

Mar 18, 12:16 pm
Ceph Manager has disappeared from Prometheus target discovery.

Expected results:
=====================
No Alert should be seen.

Additional info:
=======================
ceph status
---------------
=====ceph status ====
Thu Mar 18 07:00:31 UTC 2021
  cluster:
    id:     bab7a0f7-41bb-4de0-8ff6-526f4ce8b58f
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 14m)
    mgr: a(active, since 14m)
    mds: ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-a=up:active} 1 up:standby-replay
    osd: 3 osds: 3 up (since 14m), 3 in (since 14m)
    rgw: 1 daemon active (ocs.storagecluster.cephobjectstore.a)

Namespace labelling

  labels:
    olm.operatorgroup.uid/9466e021-e5c1-4452-b9f4-0a734f5bf99b: ""
    olm.operatorgroup.uid/79202b46-7af8-4645-a0df-36daf35ce36e: ""
    openshift.io/cluster-monitoring: "true"
This looks related to the Rook change to support multiple mgrs. Even when there is only a single mgr, the label with the mgr name also needs to be added, for consistency.
Now adding the active mgr name to the labels as required by the service monitor... https://github.com/rook/rook/pull/7440
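For anyone following along, here is a minimal sketch of why a missing label can make the mgr drop out of target discovery while the daemon itself stays up. It only models the general Kubernetes rule that a `matchLabels` selector matches an object when every key/value pair in the selector is present on that object. The label key `mgr_name` and the values below are hypothetical placeholders, not the actual labels used by Rook or the service monitor.

```python
from typing import Dict


def selector_matches(match_labels: Dict[str, str], labels: Dict[str, str]) -> bool:
    """Return True when every key/value pair in the selector is present on the object."""
    return all(labels.get(key) == value for key, value in match_labels.items())


# Hypothetical selector: the monitor expects both the app label and the mgr name.
monitor_selector = {"app": "rook-ceph-mgr", "mgr_name": "a"}

# Hypothetical service labels before and after the mgr name label is added.
service_without_mgr_name = {"app": "rook-ceph-mgr"}
service_with_mgr_name = {"app": "rook-ceph-mgr", "mgr_name": "a"}

print(selector_matches(monitor_selector, service_without_mgr_name))  # False
print(selector_matches(monitor_selector, service_with_mgr_name))     # True
```

In this model the mgr pod is healthy in both cases; only the selector match changes, which is consistent with the alert firing while ceph status reports the mgr as active.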
The fix will be picked up in the next 4.8 build, following the sync from rook master: https://github.com/openshift/rook/pull/197
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Container Storage 4.8.0 container images bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3003