This bug was initially created as a copy of Bug #2101497

I am copying this bug because:

Description of problem (please be detailed as possible and provide log snippets):

ceph_mon_metadata metrics are not collected properly/correctly. This was noticed when the CephMonVersionMismatch alert did not fire after one of the mon images was changed.

Here 'ceph mon versions' shows that one mon is on a different version from the other two:

```
sh-4.4$ ceph mon versions
{
    "ceph version 16.2.7-112.el8cp (e18db2ff03ac60c64a18f3315c032b9d5a0a3b8f) pacific (stable)": 2,
    "ceph version 16.2.7-76.el8cp (f4d6ada772570ae8b05c62ad79e222fbd3f04188) pacific (stable)": 1
}
```

But the `ceph_mon_metadata` query below,

```
count by (ceph_daemon, namespace, ceph_version) (ceph_mon_metadata{job="rook-ceph-mgr", ceph_version != ""})
```

returns:

> mon.a  ceph version 16.2.7-112.el8cp (e18db2ff03ac60c64a18f3315c032b9d5a0a3b8f) pacific (stable)  openshift-storage  1
> mon.b  ceph version 16.2.7-112.el8cp (e18db2ff03ac60c64a18f3315c032b9d5a0a3b8f) pacific (stable)  openshift-storage  1
> mon.c  ceph version 16.2.7-112.el8cp (e18db2ff03ac60c64a18f3315c032b9d5a0a3b8f) pacific (stable)  openshift-storage  1

i.e. the query reports that all the mons are on the same ceph version.

Another misreporting was noticed when changing an OSD image: 'ceph_mon_metadata' then shows multiple mon versions, even though the mon images were not touched and 'ceph versions' clearly shows that all mon versions are the same. This is described in BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1786696

Version of all relevant components (if applicable):
OCP : 4.11.0-0.nightly-2022-06-15-222801
ODF : 4.10.4-2
Ceph: 16.2.7-112.el8cp (e18db2ff03ac60c64a18f3315c032b9d5a0a3b8f) pacific (stable)

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
Not sure

Steps to Reproduce:
1. Created an AWS OpenShift cluster, version 4.11 stable
2. Installed the ODF operator through the Operator Hub (default version available in the hub)
3. Created a StorageCluster with default configs
4. Through the command line, changed one of the mon images to an old one:

```
oc set -n openshift-storage image deployment/rook-ceph-mon-a mon=quay.io/rhceph-dev/rhceph@sha256:e909b345d88459d49b691b7d484f604653fcba53b37bbc00e86fb09b26ed5205
```

5. Once that completed, checked the `ceph_mon_metadata` query through the OCP console UI (Observe -> Metrics)

Actual results:
The ceph_mon_metadata query gives a false result, stating that all the mons are on the same version.

Expected results:
ceph_mon_metadata should provide exact version information, including that of the changed mon.

Additional info:
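For context on why the stale metadata masks the mismatch: a version-mismatch check of this kind can only fire when ceph_mon_metadata reports more than one distinct ceph_version. Below is a minimal sketch of such a check, built from the labels shown above; it is an illustrative expression, not necessarily the exact CephMonVersionMismatch rule shipped with ODF.

```
# Counts how many distinct ceph_version values the mgr exporter reports for the mons;
# the mismatch condition holds only when this count is greater than 1.
count(count by (ceph_version) (ceph_mon_metadata{job="rook-ceph-mgr", ceph_version != ""})) > 1
```

With the behaviour described above, this expression stays at 1 even while 'ceph versions' reports two different mon versions, which is why the alert never fired.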
BZ #2008524 has been fixed in the RHCS 5.3z1 (ceph-16.2.10-138) release (refer to errata https://access.redhat.com/errata/RHSA-2023:0980 for more details). Moving this BZ to the 4.12 release.
The QE effort here is regression testing only, and RHCS 5.3z1 was already shipped in 4.12.1, so closing the bug as CLOSED CURRENTRELEASE.