Bug 2181119 - ceph_mon_metadata metrics are not collected properly
Summary: ceph_mon_metadata metrics are not collected properly
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Neha Ojha
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-03-23 06:51 UTC by Sunil Kumar Acharya
Modified: 2023-08-09 16:37 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-04-13 11:11:30 UTC
Embargoed:



Description Sunil Kumar Acharya 2023-03-23 06:51:10 UTC
This bug was initially created as a copy of Bug #2101497

I am copying this bug because: 



Description of problem (please be as detailed as possible and provide log
snippets):
The ceph_mon_metadata metric is not collected correctly. This was noticed when the CephMonVersionMismatch alert did not fire after the image of one of the mons was changed.
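
As a rough illustration of why a stale metric would suppress the alert: this report does not quote the alert's rule expression, so the condition below is an assumption (the alert is taken to fire when the metric exposes more than one distinct, non-empty `ceph_version` label). The function name is hypothetical.

```python
# Assumed alert condition (not quoted in this report): a mismatch exists when
# more than one distinct, non-empty ceph_version label is reported for the mons.
def mon_version_mismatch(versions):
    """Return True if the given version labels contain more than one distinct value."""
    distinct = {v for v in versions if v}  # ignore empty labels, like ceph_version != ""
    return len(distinct) > 1


V112 = "ceph version 16.2.7-112.el8cp (e18db2ff03ac60c64a18f3315c032b9d5a0a3b8f) pacific (stable)"
V76 = "ceph version 16.2.7-76.el8cp (f4d6ada772570ae8b05c62ad79e222fbd3f04188) pacific (stable)"

# What the metric reported in this bug: three identical labels, so no mismatch is seen.
reported_by_metric = [V112, V112, V112]
# What the cluster actually ran according to `ceph mon versions`: two versions.
reported_by_cli = [V112, V112, V76]

print(mon_version_mismatch(reported_by_metric))
print(mon_version_mismatch(reported_by_cli))
```

Under that assumption, the alert can only be as accurate as the version labels the metric carries.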

Here, 'ceph mon versions' shows that one mon is on a different version from the other two.
```
sh-4.4$ ceph mon versions
{
    "ceph version 16.2.7-112.el8cp (e18db2ff03ac60c64a18f3315c032b9d5a0a3b8f) pacific (stable)": 2,
    "ceph version 16.2.7-76.el8cp (f4d6ada772570ae8b05c62ad79e222fbd3f04188) pacific (stable)": 1
}
```

But for the `ceph_mon_metadata` query below,

```
count by (ceph_daemon, namespace, ceph_version) (ceph_mon_metadata{job="rook-ceph-mgr", ceph_version != ""})
```

> mon.a	ceph version 16.2.7-112.el8cp (e18db2ff03ac60c64a18f3315c032b9d5a0a3b8f) pacific (stable)	openshift-storage	1
> mon.b	ceph version 16.2.7-112.el8cp (e18db2ff03ac60c64a18f3315c032b9d5a0a3b8f) pacific (stable)	openshift-storage	1
> mon.c	ceph version 16.2.7-112.el8cp (e18db2ff03ac60c64a18f3315c032b9d5a0a3b8f) pacific (stable)	openshift-storage	1

we can see that all the mons are reported on the same Ceph version (per the `ceph_mon_metadata` query).
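
The discrepancy between the two views can be checked mechanically. The following is a minimal sketch that uses only the outputs pasted in this report; the function names are hypothetical, and the metric rows are transcribed by hand rather than fetched from Prometheus.

```python
import json
from collections import Counter

# Output of `ceph mon versions`, copied from this report.
CLI_OUTPUT = """
{
    "ceph version 16.2.7-112.el8cp (e18db2ff03ac60c64a18f3315c032b9d5a0a3b8f) pacific (stable)": 2,
    "ceph version 16.2.7-76.el8cp (f4d6ada772570ae8b05c62ad79e222fbd3f04188) pacific (stable)": 1
}
"""

V112 = "ceph version 16.2.7-112.el8cp (e18db2ff03ac60c64a18f3315c032b9d5a0a3b8f) pacific (stable)"

# (ceph_daemon, ceph_version) pairs from the ceph_mon_metadata query result
# shown in this report, transcribed by hand.
METRIC_ROWS = [("mon.a", V112), ("mon.b", V112), ("mon.c", V112)]


def versions_from_cli(text):
    """Parse `ceph mon versions` JSON into a version -> daemon count mapping."""
    return Counter(json.loads(text))


def versions_from_metric(rows):
    """Count how many daemons the metric reports for each version label."""
    return Counter(version for _daemon, version in rows)


def views_agree(cli, metric):
    """True when both sources report the same version distribution."""
    return cli == metric


cli_view = versions_from_cli(CLI_OUTPUT)
metric_view = versions_from_metric(METRIC_ROWS)
print(views_agree(cli_view, metric_view))  # False for the data in this report
```

For the data above, the CLI reports two versions (2 + 1 mons) while the metric reports a single version for all three mons.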

A second misreport occurs when the image of an OSD is changed: 'ceph_mon_metadata' then shows multiple mon versions, even though the mon images were not touched and 'ceph versions' clearly shows that all the mons are on the same version. This is described in BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1786696

Version of all relevant components (if applicable):
OCP :     4.11.0-0.nightly-2022-06-15-222801
ODF :     4.10.4-2
Ceph:     16.2.7-112.el8cp (e18db2ff03ac60c64a18f3315c032b9d5a0a3b8f) pacific (stable)

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
yes

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue reproduce from the UI?
Yes

If this is a regression, please provide more details to justify this:
Not sure

Steps to Reproduce:
1. Created an AWS Openshift cluster version: 4.11 stable
2. Installed (through operator-hub) ODF operator (default which is in the hub)
3. Created a storagecluster with default configs
4. Through the command line, changed the image of one of the mons to an older one
```
oc set -n openshift-storage image deployment/rook-ceph-mon-a mon=quay.io/rhceph-dev/rhceph@sha256:e909b345d88459d49b691b7d484f604653fcba53b37bbc00e86fb09b26ed5205
```
5. Once that completed, opened OCP-Console-UI->Observe->Metrics and ran the `ceph_mon_metadata` query
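
The check in step 5 can also be scripted against the Prometheus HTTP API (`/api/v1/query`) instead of the console. The sketch below only builds the request URL and parses a response body; it makes no network call. The base URL is a placeholder, authentication is omitted, and the sample response is constructed by hand from the rows in this report (the timestamp `0` is a dummy value), so treat all of these as assumptions.

```python
import json
from urllib.parse import urlencode

# The query from this report.
PROMQL = (
    'count by (ceph_daemon, namespace, ceph_version) '
    '(ceph_mon_metadata{job="rook-ceph-mgr", ceph_version != ""})'
)


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})


def versions_by_daemon(response_text):
    """Map each ceph_daemon label to its ceph_version label in a query response."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    return {
        sample["metric"]["ceph_daemon"]: sample["metric"]["ceph_version"]
        for sample in body["data"]["result"]
    }


V112 = "ceph version 16.2.7-112.el8cp (e18db2ff03ac60c64a18f3315c032b9d5a0a3b8f) pacific (stable)"

# Hand-built response in the Prometheus instant-vector format, mirroring the
# rows shown in this report. Timestamps are dummy values.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"ceph_daemon": d, "namespace": "openshift-storage",
                        "ceph_version": V112},
             "value": [0, "1"]}
            for d in ("mon.a", "mon.b", "mon.c")
        ],
    },
})

# Placeholder host; substitute the cluster's own Prometheus route.
url = build_query_url("https://prometheus.example.invalid", PROMQL)
print(url)
print(versions_by_daemon(SAMPLE_RESPONSE))
```

Comparing the returned mapping with the output of `ceph mon versions` shows whether the metric has picked up the image change.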

Actual results:
ceph_mon_metadata query gives a false result, stating all the mons are in the same version

Expected results:
ceph_mon_metadata should report the exact version of every mon, including the mon whose image was changed

Additional info:

Comment 1 Prashant Dhange 2023-03-23 21:02:40 UTC
BZ #2008524 has been fixed in the RHCS 5.3z1 (ceph-16.2.10-138) release (see erratum https://access.redhat.com/errata/RHSA-2023:0980 for more details). Moving this BZ to the 4.12 release.

Comment 3 krishnaram Karthick 2023-04-13 11:11:30 UTC
The QE effort here is regression testing only, and RHCS 5.3z1 was already shipped in 4.12.1,
so closing the bug as CLOSED CURRENTRELEASE.

