Bug 2255491 - Add 'managedBy' label to rook-ceph-exporter metrics and alerts
Summary: Add 'managedBy' label to rook-ceph-exporter metrics and alerts
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph-monitoring
Version: 4.14
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ODF 4.15.0
Assignee: arun kumar mohan
QA Contact: akarsha
URL:
Whiteboard:
Depends On:
Blocks: 2246375
 
Reported: 2023-12-21 08:58 UTC by arun kumar mohan
Modified: 2024-03-19 15:26 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
.Add 'managedBy' label to rook-ceph-exporter metrics and alerts
Previously, the metrics generated by `rook-ceph-exporter` did not have the 'managedBy' label, so the OpenShift console user interface could not identify which StorageSystem a metric was generated from. With this fix, the `managedBy` label, whose value is the name of the StorageSystem, is added by the OCS operator to the storage cluster's `Monitoring` spec. The Rook operator reads this spec and relabels the ceph-exporter's ServiceMonitor endpoint labels. As a result, all the metrics generated by this exporter carry the new `managedBy` label.
Clone Of:
Environment:
Last Closed: 2024-03-19 15:26:38 UTC
Embargoed:




Links
Github red-hat-storage/ocs-operator pull 2436 (open): Bug 2255491: [release-4.15] Add 'managedBy' label to rook-ceph-exporter metrics (last updated 2024-02-06 05:09:09 UTC)
Red Hat Product Errata RHSA-2024:1383 (last updated 2024-03-19 15:26:45 UTC)

Description arun kumar mohan 2023-12-21 08:58:59 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
We need to add the 'managedBy' label to all the metrics generated from rook-ceph-exporter.
This label is needed for displaying results/alerts in the OpenShift console UI.


Version of all relevant components (if applicable):
ODF 4.14 and above

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Not a show-stopper, but it hinders the alert information provided to the customer in a multi-storagecluster scenario: the customer cannot tell which cluster an alert came from.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
yes

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:
No

Steps to Reproduce:
1.
2.
3.


Actual results:


Expected results:


Additional info:

Comment 3 Mudit Agarwal 2024-01-21 14:46:55 UTC
Is this a must for 4.15?

Comment 4 arun kumar mohan 2024-01-22 14:34:38 UTC
Yes Mudit, it is a blocker for 4.15

Comment 7 arun kumar mohan 2024-01-30 14:18:06 UTC
Added the PR: https://github.com/red-hat-storage/ocs-operator/pull/2433

Comment 12 arun kumar mohan 2024-02-14 05:04:53 UTC
The fix adds the `managedBy` label to all the metrics generated by the `rook-ceph-exporter` pod.

Confirm these TWO things:

a. ServiceMonitor endpoint check
The `rook-ceph-exporter` ServiceMonitor should have the following 'spec' entry:

spec -> endpoints -> relabeling -> {action: replace, replacement: ocs-storagecluster, targetLabel: managedBy}
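
A quick command-line check (a sketch; it assumes a default internal-mode deployment where the ServiceMonitor is named `rook-ceph-exporter` in the `openshift-storage` namespace, and uses the Prometheus Operator endpoint field `relabelings`):

    # Dump the ServiceMonitor and look for the relabeling rule that injects managedBy
    oc -n openshift-storage get servicemonitor rook-ceph-exporter -o yaml \
      | grep -B 2 'targetLabel: managedBy'

    # Expected output (the replacement value may differ per cluster):
    #   - action: replace
    #     replacement: ocs-storagecluster
    #     targetLabel: managedBy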


b. Check the metrics generated by `rook-ceph-exporter`
In the ODF console, under the 'Observe' tab (left side), click the 'Metrics' link.
One of the metrics generated by ceph-exporter is "ceph_mon_num_elections".
Add this metric name to the query text box and execute the query.
We should see the 'managedBy' label in the query result, with its value set to the current StorageSystem name.
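
If the label is present, a filtered query (a sketch; `ocs-storagecluster` is assumed as the default StorageSystem name) should return the same series:

    ceph_mon_num_elections{managedBy="ocs-storagecluster"}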

This should verify the fix.

Comment 14 arun kumar mohan 2024-02-20 11:49:07 UTC
Adding the doc text, please take a look.
Thanks

Comment 16 errata-xmlrpc 2024-03-19 15:26:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:1383

