Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
This project is now read-only. Starting Monday, February 2, please use https://ibm-ceph.atlassian.net/ for all bug tracking management.

Bug 2260300

Summary: [GSS][Doc] How to handle MDSs behind on trimming errors
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Akash Raj <akraj>
Component: Documentation    Assignee: Akash Raj <akraj>
Documentation sub component: File System Guide QA Contact: Hemanth Kumar <hyelloji>
Status: CLOSED CURRENTRELEASE Docs Contact: Disha Walvekar <dwalveka>
Severity: medium    
Priority: unspecified CC: akraj, amk, dwalveka, kjosy, rmandyam, rpollack, tpetr, vshankar
Version: 5.0    Keywords: NoDocsQEReview
Target Milestone: ---   
Target Release: Backlog   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 2133786 Environment:
Last Closed: 2024-01-25 12:29:15 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2133786    
Bug Blocks:    

Description Akash Raj 2024-01-25 06:01:46 UTC
+++ This bug was initially created as a clone of Bug #2133786 +++

* Describe the issue:

Recently, with the increased number of OCS/ODF users, we are seeing a high number of situations where the MDS complains that it is behind on trimming.
What is the best way to handle this situation?
We have a KCS article [1] (unverified) which suggests increasing the mds_log_max_segments value from 128 to 256. Is this a verified solution applicable to all clusters, including ODF, or should the value be determined per cluster?
What are the caveats, if any, that we should be aware of when increasing this value?



[1] https://access.redhat.com/solutions/6639511
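For context, the warning described above and the current trimming limit can be inspected with standard Ceph CLI commands. This is a sketch, assuming RHCS 4/5 (Nautilus or later, where the centralized `ceph config` store is available); the exact warning text varies by release:

```shell
# Show cluster health details; a trim issue appears under the
# MDS_HEALTH_TRIM code, e.g. "Behind on trimming (N/128)"
ceph health detail

# Check the current journal trimming limit (the default is 128 segments)
ceph config get mds mds_log_max_segments
```
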

* Document URL:
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/file_system_guide/health-messages-for-the-ceph-file-system_fs


Product Version:
RHCS 4 and 5

--- Additional comment from Red Hat Bugzilla on 2023-01-01 08:28:27 UTC ---

Account disabled by LDAP Audit

--- Additional comment from Red Hat Bugzilla on 2023-01-01 08:48:58 UTC ---

Account disabled by LDAP Audit

--- Additional comment from Tomas Petr on 2023-01-09 15:13:53 UTC ---

Hi,
Should this be looked at by CephFS engineering to provide input for the doc team?
We see a good number of cases for "MDSs behind on trimming", and I am not sure that increasing mds_log_max_segments from 128 to 256 is always the ultimate and only solution.
It would be great to have confirmation or a suggestion on this topic.

Adding Venky Shankar from the CephFS team; maybe he can advise who is the right person to ask.

--- Additional comment from Venky Shankar on 2023-01-11 09:30:49 UTC ---

(In reply to Tomas Petr from comment #3)
> Hi,
> Should this be looked at by CephFS engineering to provide input for the
> doc team?
> We see a good number of cases for "MDSs behind on trimming", and I am not
> sure that increasing mds_log_max_segments from 128 to 256 is always the
> ultimate and only solution.
> It would be great to have confirmation or a suggestion on this topic.
> 
> Adding Venky Shankar from the CephFS team; maybe he can advise who is the
> right person to ask.

Increasing mds_log_max_segments is the recommended solution for the trim warning. However, this config should be reset back to its default once cluster health recovers and the trim warning is no longer seen. As for what value to set it to: in most situations, setting mds_log_max_segments to 256 is preferable and should allow the MDS to catch up with trimming. If the warning shows up frequently, I think it's a good idea to get engineering involved for further debugging and suggestions.

Does this sound helpful?
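The recommendation above (raise the limit, then revert once health recovers) could be applied along these lines. This is a sketch, assuming RHCS 4/5 where the centralized `ceph config` store is available:

```shell
# Raise the trimming limit so the MDS can catch up (recommended value: 256)
ceph config set mds mds_log_max_segments 256

# Once HEALTH_OK returns and the trim warning is gone, remove the override
# so the MDS reverts to the default of 128 segments
ceph config rm mds mds_log_max_segments
```
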

--- Additional comment from Akash Raj on 2024-01-16 10:01:07 UTC ---

Hi Karun.

In the mentioned doc - https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/file_system_guide/health-messages-for-the-ceph-file-system_fs - I see different "CODES" listed with explanations under "Behind on trimming...".

Could you please confirm which CODE we are referring to in this BZ issue?

Thanks.

--- Additional comment from Karun Josy on 2024-01-24 08:19:04 UTC ---

Hello Akash,

Sorry for the delay.

There is only one CODE mentioned under "Behind on trimming"; the rest of the codes are for separate issues.

----------
"Behind on trimming…​"
Code: MDS_HEALTH_TRIM


"Client <name> failing to respond to capability release"
Code: MDS_HEALTH_CLIENT_LATE_RELEASE, MDS_HEALTH_CLIENT_LATE_RELEASE_MANY

"Client <name> failing to respond to cache pressure"
Code: MDS_HEALTH_CLIENT_RECALL, MDS_HEALTH_CLIENT_RECALL_MANY
----------


I hope this helps.

Comment 1 Akash Raj 2024-01-25 06:03:55 UTC
*** Bug 2133786 has been marked as a duplicate of this bug. ***