Bug 2238665 - mds: blocklist clients with "bloated" session metadata
Summary: mds: blocklist clients with "bloated" session metadata
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: CephFS
Version: 5.3
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 5.3z6
Assignee: Venky Shankar
QA Contact: Hemanth Kumar
Docs Contact: Ranjini M N
URL:
Whiteboard:
Depends On: 2238663 2238666
Blocks: 2258797
Reported: 2023-09-13 04:02 UTC by Venky Shankar
Modified: 2024-02-16 19:29 UTC (History)

Fixed In Version: ceph-16.2.10-220.el8cp
Doc Type: Bug Fix
Doc Text:
.Blocklist and evict client for large session metadata
Previously, large client metadata buildup in the MDS would sometimes cause the MDS to switch to read-only mode. With this fix, the client that is causing the buildup is blocklisted and evicted, allowing the MDS to work as expected.
Clone Of: 2238663
Environment:
Last Closed: 2024-02-08 16:55:11 UTC
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 61947 0 None None None 2023-09-13 04:02:37 UTC
Red Hat Issue Tracker RHCEPH-7432 0 None None None 2023-09-13 04:03:48 UTC
Red Hat Knowledge Base (Solution) 7056548 0 None None None 2024-02-16 19:29:51 UTC
Red Hat Product Errata RHSA-2024:0745 0 None None None 2024-02-08 16:55:14 UTC

Description Venky Shankar 2023-09-13 04:02:37 UTC
+++ This bug was initially created as a clone of Bug #2238663 +++

If a session's "completed_requests" vector grows too large, the encoded session can reach a size at which the OSD rejects sessionmap object updates with "Message too long" (error 90, EMSGSIZE). The MDS treats this as an unhandled write error and forces the file system read-only.

2023-07-10 13:53:30.529 7f8fed08b700  0 log_channel(cluster) log [WRN] : client.744507717 does not advance its oldest_client_tid (3221389957), 5905929 completed requests recorded in session
2023-07-10 13:53:30.529 7f8fed08b700  0 log_channel(cluster) log [WRN] : client.744507717 does not advance its oldest_client_tid (3221389957), 5905929 completed requests recorded in session
2023-07-10 13:53:30.530 7f8fed08b700  0 log_channel(cluster) log [WRN] : client.744507717 does not advance its oldest_client_tid (3221389957), 5905929 completed requests recorded in session
2023-07-10 13:53:30.534 7f8fed08b700  0 log_channel(cluster) log [WRN] : client.744507717 does not advance its oldest_client_tid (3221389957), 5905929 completed requests recorded in session
2023-07-10 13:53:30.534 7f8fed08b700  0 log_channel(cluster) log [WRN] : client.744507717 does not advance its oldest_client_tid (3221389957), 5905929 completed requests recorded in session
2023-07-10 13:53:30.534 7f8fed08b700  0 log_channel(cluster) log [WRN] : client.744507717 does not advance its oldest_client_tid (3221389957), 5905929 completed requests recorded in session
2023-07-10 13:53:35.635 7f8fe687e700 -1 mds.0.2679609 unhandled write error (90) Message too long, force readonly...
2023-07-10 13:53:35.635 7f8fe687e700  1 mds.0.cache force file system read-only
2023-07-10 13:53:35.635 7f8fe687e700  0 log_channel(cluster) log [WRN] : force file system read-only
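The warnings above can be used to spot an affected client before the MDS goes read-only. The following is a minimal sketch, not part of the fix: it assumes the warning text has exactly the form quoted in this report, and the helper name `max_completed_requests` is hypothetical.

```python
import re

# Matches the MDS cluster-log warning quoted in this report.
# Assumption: the message format is exactly as shown in the log excerpt.
WARN_RE = re.compile(
    r"client\.(?P<client>\d+) does not advance its oldest_client_tid "
    r"\((?P<tid>\d+)\), (?P<count>\d+) completed requests recorded in session"
)


def max_completed_requests(lines):
    """Return {client_id: highest completed-request count seen in the log}."""
    worst = {}
    for line in lines:
        m = WARN_RE.search(line)
        if m:
            client = int(m.group("client"))
            count = int(m.group("count"))
            worst[client] = max(worst.get(client, 0), count)
    return worst


sample = [
    "2023-07-10 13:53:30.529 7f8fed08b700  0 log_channel(cluster) log [WRN] : "
    "client.744507717 does not advance its oldest_client_tid (3221389957), "
    "5905929 completed requests recorded in session",
    "2023-07-10 13:53:35.635 7f8fe687e700  1 mds.0.cache force file system read-only",
]
print(max_completed_requests(sample))  # {744507717: 5905929}
```

Lines that do not match the warning pattern, such as the read-only notice, are ignored.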

Proposed fix: if a session's encoded size exceeds a configurable threshold (possibly 16 MB), evict it.
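The size check described here could be sketched as follows. This is an illustration only, not the MDS implementation (which is C++): the `Session` class, `MAX_SESSION_BYTES`, and the per-entry cost `BYTES_PER_COMPLETED_REQUEST` are all hypothetical, and the real encoded size depends on the actual session encoding.

```python
from dataclasses import dataclass

# Hypothetical threshold; the report only suggests "maybe 16MB".
MAX_SESSION_BYTES = 16 * 1024 * 1024

# Assumed per-entry cost, for illustration only. The real encoding differs.
BYTES_PER_COMPLETED_REQUEST = 16


@dataclass
class Session:
    """Hypothetical stand-in for an MDS client session."""
    client_id: int
    completed_request_count: int

    def encoded_size(self) -> int:
        # Rough estimate: size grows linearly with completed requests.
        return self.completed_request_count * BYTES_PER_COMPLETED_REQUEST


def sessions_to_evict(sessions, limit=MAX_SESSION_BYTES):
    """Return the client IDs whose estimated encoded size exceeds the limit."""
    return [s.client_id for s in sessions if s.encoded_size() > limit]


healthy = Session(client_id=1, completed_request_count=10)
bloated = Session(client_id=744507717, completed_request_count=5905929)
print(sessions_to_evict([healthy, bloated]))  # [744507717]
```

Making the limit a parameter mirrors the "configurable" part of the proposal: an operator could raise or lower it without changing the check itself.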

Note for QE: the steps to reproduce can be taken from the test case here: https://github.com/ceph/ceph/pull/52944/commits/84df4b3d0c9e767a74cf5af80e8138239992df2c#diff-1da45c7534a9accb30d17e5abf05f55ca5cc0df3a7fe826049c0fe23154a7d63R225

--- Additional comment from RHEL Program Management on 2023-09-13 04:01:20 UTC ---

Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 1 RHEL Program Management 2023-09-13 04:02:48 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 9 errata-xmlrpc 2024-02-08 16:55:11 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 5.3 Security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:0745

