Bug 2238666 - mds: blocklist clients with "bloated" session metadata
Summary: mds: blocklist clients with "bloated" session metadata
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: CephFS
Version: 5.3
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 6.1z3
Assignee: Venky Shankar
QA Contact: sumr
Docs Contact: Disha Walvekar
URL:
Whiteboard:
Depends On: 2238663
Blocks: 2238665 2247624
 
Reported: 2023-09-13 04:04 UTC by Venky Shankar
Modified: 2024-02-16 19:30 UTC
CC: 8 users

Fixed In Version: ceph-17.2.6-155.el9cp
Doc Type: Bug Fix
Doc Text:
.Blocklist and evict client for large session metadata
Previously, large client metadata buildup in the MDS would sometimes cause the MDS to switch to read-only mode. With this fix, the client that is causing the buildup is blocklisted and evicted, allowing the MDS to work as expected.
Clone Of: 2238663
Environment:
Last Closed: 2023-12-12 13:55:50 UTC
Embargoed:
dwalveka: needinfo-


Attachments: none


Links
System                               ID              Last Updated
Ceph Project Bug Tracker             61947           2023-09-13 04:04:04 UTC
Red Hat Issue Tracker                RHCEPH-7433     2023-09-13 04:05:49 UTC
Red Hat Knowledge Base (Solution)    7056548         2024-02-16 19:30:05 UTC
Red Hat Product Errata               RHSA-2023:7740  2023-12-12 13:55:57 UTC

Description Venky Shankar 2023-09-13 04:04:04 UTC
+++ This bug was initially created as a clone of Bug #2238663 +++

If the session's "completed_requests" vector grows too large, the encoded session can reach a size where the MDS goes read-only, because the OSD rejects the sessionmap object update with "Message too long" (EMSGSIZE).

2023-07-10 13:53:30.529 7f8fed08b700  0 log_channel(cluster) log [WRN] : client.744507717 does not advance its oldest_client_tid (3221389957), 5905929 completed requests recorded in session
2023-07-10 13:53:30.529 7f8fed08b700  0 log_channel(cluster) log [WRN] : client.744507717 does not advance its oldest_client_tid (3221389957), 5905929 completed requests recorded in session
2023-07-10 13:53:30.530 7f8fed08b700  0 log_channel(cluster) log [WRN] : client.744507717 does not advance its oldest_client_tid (3221389957), 5905929 completed requests recorded in session
2023-07-10 13:53:30.534 7f8fed08b700  0 log_channel(cluster) log [WRN] : client.744507717 does not advance its oldest_client_tid (3221389957), 5905929 completed requests recorded in session
2023-07-10 13:53:30.534 7f8fed08b700  0 log_channel(cluster) log [WRN] : client.744507717 does not advance its oldest_client_tid (3221389957), 5905929 completed requests recorded in session
2023-07-10 13:53:30.534 7f8fed08b700  0 log_channel(cluster) log [WRN] : client.744507717 does not advance its oldest_client_tid (3221389957), 5905929 completed requests recorded in session
2023-07-10 13:53:35.635 7f8fe687e700 -1 mds.0.2679609 unhandled write error (90) Message too long, force readonly...
2023-07-10 13:53:35.635 7f8fe687e700  1 mds.0.cache force file system read-only
2023-07-10 13:53:35.635 7f8fe687e700  0 log_channel(cluster) log [WRN] : force file system read-only
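
When this shows up, the same condition is also visible in cluster health, and the per-session counters can be dumped from the active MDS. A minimal check, assuming rank 0 is the affected MDS (the exact name of the completed-request counter in the session dump varies by release):

# the stuck session also surfaces as a cluster health warning (MDS_CLIENT_OLDEST_TID)
ceph health detail
# dump sessions on the affected rank and look for the one whose completed-request count keeps growing
ceph tell mds.0 session ls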

If a session's encoded size exceeds some configurable threshold (16MB, say), then blocklist and evict it.
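
For reference, the upstream fix (tracker 61947 / PR 52944) adds such a threshold as an MDS config option; my understanding is that it is named mds_session_metadata_threshold and defaults to 16MB, but confirm the name and default on your build before relying on it. A hedged example of inspecting and tuning it:

# read the current threshold (option name per the upstream fix; verify on your release)
ceph config get mds mds_session_metadata_threshold
# example: raise it to 64MB (value in bytes) if a legitimate workload trips it
ceph config set mds mds_session_metadata_threshold 67108864
# clients evicted for crossing the threshold land on the OSD blocklist
ceph osd blocklist ls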

Note for QE: steps to reproduce can be derived from the test case here: https://github.com/ceph/ceph/pull/52944/commits/84df4b3d0c9e767a74cf5af80e8138239992df2c#diff-1da45c7534a9accb30d17e5abf05f55ca5cc0df3a7fe826049c0fe23154a7d63R225
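
For anyone reproducing this outside teuthology, a rough sketch of the same idea follows. All names here are assumptions: /mnt/cephfs stands in for the actual client mount point, and client_inject_fixed_oldest_tid is the test-only client option this class of tests relies on to keep oldest_client_tid from advancing; confirm it exists on your build.

# keep the client's oldest tid pinned (test-only option), then (re)mount the client
ceph config set client client_inject_fixed_oldest_tid true
# drive a large number of metadata requests so completed_requests piles up in the session
for i in $(seq 1 200000); do touch /mnt/cephfs/bloat_$i; done
# watch for the oldest_client_tid warning and, with the fix, for the client to be evicted/blocklisted
ceph health detail
ceph osd blocklist ls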

Comment 1 RHEL Program Management 2023-09-13 04:04:20 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 13 Venky Shankar 2023-11-28 10:21:59 UTC
Doc text updated. PTAL.

Comment 15 errata-xmlrpc 2023-12-12 13:55:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 6.1 security, enhancements, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:7740

