
Bug 2238665

Summary: mds: blocklist clients with "bloated" session metadata
Product: [Red Hat Storage] Red Hat Ceph Storage        Reporter: Venky Shankar <vshankar>
Component: CephFS        Assignee: Venky Shankar <vshankar>
Status: CLOSED ERRATA        QA Contact: Hemanth Kumar <hyelloji>
Severity: high        Docs Contact: Ranjini M N <rmandyam>
Priority: unspecified
Version: 5.3        CC: ceph-eng-bugs, cephqe-warriors, hyelloji, mcaldeir, rmandyam, tserlin, vdas, vereddy
Target Milestone: ---   
Target Release: 5.3z6   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: ceph-16.2.10-220.el8cp        Doc Type: Bug Fix
Doc Text:
.Blocklist and evict client for large session metadata
Previously, large client metadata buildup in the MDS would sometimes cause the MDS to switch to read-only mode. With this fix, the client that is causing the buildup is blocklisted and evicted, allowing the MDS to work as expected.
Story Points: ---
Clone Of: 2238663        Environment:
Last Closed: 2024-02-08 16:55:11 UTC        Type: ---
Regression: ---        Mount Type: ---
Documentation: ---        CRM:
Verified Versions:        Category: ---
oVirt Team: ---        RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---        Target Upstream Version:
Embargoed:
Bug Depends On: 2238663, 2238666
Bug Blocks: 2258797

Description Venky Shankar 2023-09-13 04:02:37 UTC
+++ This bug was initially created as a clone of Bug #2238663 +++

If a session's "completed_requests" vector grows too large, the encoded session can exceed the OSD's maximum message size: the OSD rejects sessionmap object updates with "Message size too long", and the MDS forces the file system read-only.

2023-07-10 13:53:30.529 7f8fed08b700  0 log_channel(cluster) log [WRN] : client.744507717 does not advance its oldest_client_tid (3221389957), 5905929 completed requests recorded in session
2023-07-10 13:53:30.529 7f8fed08b700  0 log_channel(cluster) log [WRN] : client.744507717 does not advance its oldest_client_tid (3221389957), 5905929 completed requests recorded in session
2023-07-10 13:53:30.530 7f8fed08b700  0 log_channel(cluster) log [WRN] : client.744507717 does not advance its oldest_client_tid (3221389957), 5905929 completed requests recorded in session
2023-07-10 13:53:30.534 7f8fed08b700  0 log_channel(cluster) log [WRN] : client.744507717 does not advance its oldest_client_tid (3221389957), 5905929 completed requests recorded in session
2023-07-10 13:53:30.534 7f8fed08b700  0 log_channel(cluster) log [WRN] : client.744507717 does not advance its oldest_client_tid (3221389957), 5905929 completed requests recorded in session
2023-07-10 13:53:30.534 7f8fed08b700  0 log_channel(cluster) log [WRN] : client.744507717 does not advance its oldest_client_tid (3221389957), 5905929 completed requests recorded in session
2023-07-10 13:53:35.635 7f8fe687e700 -1 mds.0.2679609 unhandled write error (90) Message too long, force readonly...
2023-07-10 13:53:35.635 7f8fe687e700  1 mds.0.cache force file system read-only
2023-07-10 13:53:35.635 7f8fe687e700  0 log_channel(cluster) log [WRN] : force file system read-only
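The warnings above fire because the MDS can only trim "completed_requests" entries once the client advances its reported oldest_client_tid; a client that never advances it pins the whole list in the session. A minimal sketch of that trimming rule (function and variable names are hypothetical, not the actual MDS code):

```python
# Illustrative model of MDS completed-request trimming. Entries with a
# tid below the client's reported oldest_client_tid are safe to drop;
# a client that never advances oldest_client_tid lets the list grow
# without bound, bloating the encoded session. (Names are hypothetical.)

def trim_completed_requests(completed_tids, oldest_client_tid):
    """Keep only tids the client has not yet acknowledged as complete."""
    return [tid for tid in completed_tids if tid >= oldest_client_tid]

# Well-behaved client: oldest_client_tid advances, list stays small.
session = [100, 101, 102, 103]
session = trim_completed_requests(session, 103)
assert session == [103]

# Misbehaving client: oldest_client_tid stays pinned, nothing trims.
session = list(range(100, 110))
session = trim_completed_requests(session, 100)
assert len(session) == 10
```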

If a session exceeds some configurable encoded size (maybe 16MB), then blocklist and evict the client.
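The proposed check is a simple size comparison: once the encoded session crosses a configurable threshold, the client is blocklisted and evicted instead of letting the MDS go read-only. A rough model, assuming the 16 MB default suggested above (illustrative only, not the SessionMap code):

```python
# Rough model of the threshold check proposed in this bug: evict a
# client whose encoded session metadata exceeds a configurable limit.
# The 16 MB default mirrors the value suggested in the report.

SESSION_METADATA_THRESHOLD = 16 * 1024 * 1024  # 16 MB, per the proposal

def should_evict(encoded_session_size, threshold=SESSION_METADATA_THRESHOLD):
    """Return True when the session has bloated past the limit."""
    return encoded_session_size > threshold

assert not should_evict(4 * 1024 * 1024)   # normal session: keep
assert should_evict(64 * 1024 * 1024)      # bloated session: blocklist + evict
```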

Note for QE: steps to reproduce can be followed by the test case here: https://github.com/ceph/ceph/pull/52944/commits/84df4b3d0c9e767a74cf5af80e8138239992df2c#diff-1da45c7534a9accb30d17e5abf05f55ca5cc0df3a7fe826049c0fe23154a7d63R225
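For reproduction, the threshold is tunable, so lowering it makes the eviction far easier to trigger. A hedged sketch, assuming the option name `mds_session_metadata_threshold` from the linked upstream PR (verify the name on the installed build):

```shell
# Lower the session-metadata threshold so a bloated session trips the
# eviction quickly (option name per the upstream PR; confirm on your build).
ceph config set mds mds_session_metadata_threshold 65536

# After running the reproducer, the offending client should be blocklisted
# and its session gone:
ceph osd blocklist ls
ceph tell mds.0 session ls
```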

Comment 1 RHEL Program Management 2023-09-13 04:02:48 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 9 errata-xmlrpc 2024-02-08 16:55:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 5.3 Security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:0745