Bug 2256637 - [Tracker: Bug #2256731] change priority of mds rss perf counter to useful
Summary: [Tracker: Bug #2256731] change priority of mds rss perf counter to useful
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph
Version: 4.15
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ODF 4.15.0
Assignee: Venky Shankar
QA Contact: Nagendra Reddy
URL:
Whiteboard:
Depends On: 2256731
Blocks:
 
Reported: 2024-01-03 14:58 UTC by Santosh Pillai
Modified: 2024-03-19 15:30 UTC
CC List: 5 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-03-19 15:30:33 UTC
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2024:1383 0 None None None 2024-03-19 15:30:34 UTC

Description Santosh Pillai 2024-01-03 14:58:25 UTC
This bug was initially created as a copy of Bug #2256560

I am copying this bug because: 



Description of problem:
The mds_cache_mem_rss perf counter is a useful metric for tracking MDS cache usage and detecting MDS cache oversize before it happens. Requesting that the priority of the mds_rss metric be changed from Debug to Useful.
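
For reference, one way to see what priority the counter currently carries is to dump the perf schema from an MDS daemon; on Ceph's perf-counter priority scale, debug-only counters are typically priority 0 and "useful" counters 5, which is also the exporter's default prio-limit. This is only a rough sketch: the namespace, pod name, daemon id and the mds_mem/rss schema path below are assumptions for a typical ODF install, not taken from this bug.

  # Dump the MDS perf schema and check the priority of the mds_mem/rss counter
  # (assumes the admin socket is reachable inside the MDS pod; adjust names as needed).
  oc -n openshift-storage exec deploy/rook-ceph-mds-ocs-storagecluster-cephfilesystem-a -- \
    ceph daemon mds.ocs-storagecluster-cephfilesystem-a perf schema | jq '.mds_mem.rss.priority'
  # Before the change this is expected to be below the exporter's default prio-limit (5);
  # after it, the counter should report a priority of at least 5 and be exported by default.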



Version-Release number of selected component (if applicable):
NA

How reproducible:
NA

Steps to Reproduce:
1.
2.
3.

Actual results:
NA

Expected results:
NA

Additional info:

Comment 2 Nagendra Reddy 2024-01-10 10:59:26 UTC
 

The development team has provided the following workaround:

1. Bring down the rook-ceph-operator.
2. Edit the Ceph exporter deployment YAML.
3. Reduce the prio-limit from 5 to 0.
4. The Ceph exporter pods restart with the updated value and the metrics are generated.
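
For illustration only, a command-level sketch of those steps on a typical ODF cluster; the namespace, deployment names and the --prio-limit flag are assumptions based on a standard Rook deployment, not taken from this bug:

  # Scale down the operator so it does not reconcile the manual edit away.
  oc -n openshift-storage scale deploy/rook-ceph-operator --replicas=0
  # Rook typically creates one ceph-exporter deployment per node; list them.
  oc -n openshift-storage get deploy | grep rook-ceph-exporter
  # Edit each exporter deployment and change the container argument
  # --prio-limit=5 to --prio-limit=0 (flag name assumed from the ceph-exporter daemon).
  oc -n openshift-storage edit deploy/<rook-ceph-exporter-deployment>
  # The rollout restarts the exporter pods with the lower limit, after which
  # lower-priority counters such as ceph_mds_mem_rss start being exported.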

The above workaround is not recommended in customer environments because it requires bringing down the rook-ceph-operator.

So far, we have used this workaround to test the alert feature on the happy path, but it is not recommended for further testing; negative scenarios and resiliency testing in particular would be impacted by it.

Comment 5 Nagendra Reddy 2024-01-25 15:22:36 UTC
Hi Santhosh/Mudit,

Tracker BZ-2256731 is fixed in "ceph-17.2.6-193.el9cp".

We verified with ceph version ceph-17.2.6-194.el9cp. The prio-limit in the Ceph exporter deployment YAML is still '5', and it should be '0'.

Please refer to the attached screenshots.

Please let us know whether this is fixed in the same build or a different one.

Comment 6 Santosh Pillai 2024-01-29 03:49:09 UTC
(In reply to Nagendra Reddy from comment #5)
> Hi Santhosh/Mudit,
> 
> Tracker BZ-2256731 is fixed in "ceph-17.2.6-193.el9cp".
> 
> We verified with ceph version ceph-17.2.6-194.el9cp. The prio-limit in the
> Ceph exporter deployment YAML is still '5', and it should be '0'.

This is not the expectation here. The prio-limit in the Rook Ceph exporter deployment will remain the same, that is, 5.
The change was made on the Ceph side to raise the priority of the ceph_mds_mem_rss metric so that it is emitted even with a prio-limit of 5.

So it should be tested that the `ceph_mds_mem_rss` metric is visible even with prio-limit 5.
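
For example, one hedged way to verify this without touching prio-limit is to scrape an exporter pod directly and look for the counter; the namespace, pod name and the 9926 port below are assumptions for a typical ODF setup, not taken from this bug:

  # Port-forward a ceph-exporter pod and grep its metrics output
  # (9926 is the ceph-exporter default port in recent releases; adjust if needed).
  oc -n openshift-storage port-forward pod/<rook-ceph-exporter-pod> 9926:9926 &
  curl -s http://localhost:9926/metrics | grep ceph_mds_mem_rss
  # With the fix, a ceph_mds_mem_rss line should appear even though the exporter
  # still runs with its default prio-limit of 5.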

> 
> Please refer to the attached screenshots.
> 
> Please let us know whether this is fixed in the same build or a different one.

Comment 9 Nagendra Reddy 2024-01-29 09:41:15 UTC
Verified with the builds below. The bug is fixed: I can see the ceph_mds_mem_rss metric in the dashboard with prio-limit 5. Please find the attached screenshot for reference.

ODF 4.15.0-126.stable

4.15.0-0.nightly-2024-01-25-051548

Comment 13 errata-xmlrpc 2024-03-19 15:30:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:1383

