Bug 2269664

Summary: mds: disable defer_client_eviction_on_laggy_osds by default
Product: [Red Hat Storage] Red Hat Ceph Storage
Component: CephFS
Version: 7.0
Target Release: 7.1
Hardware: x86_64
OS: Linux
Severity: high
Priority: unspecified
Status: CLOSED ERRATA
Reporter: Venky Shankar <vshankar>
Assignee: Venky Shankar <vshankar>
QA Contact: Amarnath <amk>
Docs Contact: Akash Raj <akraj>
CC: akraj, ceph-eng-bugs, cephqe-warriors, hyelloji, mcaldeir, tserlin, vereddy
Fixed In Version: ceph-18.2.1-74.el9cp
Doc Type: No Doc Update
Clone Of: 2269663
Bug Blocks: 2267614, 2298578, 2298579
Last Closed: 2024-06-13 14:29:42 UTC

Description Venky Shankar 2024-03-15 07:31:49 UTC
+++ This bug was initially created as a clone of Bug #2269663 +++

This config can result in a single client holding up the MDS from servicing other clients: once a client's eviction is deferred due to a laggy OSD, a new client's cap acquire request can be blocked until the laggy client resumes operation, i.e., until the laggy OSD is no longer considered laggy.
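The fix changes the default, but on an already-deployed cluster the option can be inspected and disabled explicitly. A minimal sketch using the standard `ceph config` and `ceph tell` commands (the wildcard daemon target is an assumption about a typical deployment; adjust to your MDS names):

```shell
# Check the current value of the option as seen by the MDS
ceph config get mds defer_client_eviction_on_laggy_osds

# Disable it explicitly so laggy OSDs no longer defer client eviction
ceph config set mds defer_client_eviction_on_laggy_osds false

# Confirm the running MDS daemons picked up the change
ceph tell mds.* config get defer_client_eviction_on_laggy_osds
```

With the option disabled, clients that fail to respond are evicted on the usual session-timeout path regardless of OSD laggedness.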

Comment 5 Amarnath 2024-04-01 11:58:48 UTC
Hi All,

[ceph: root@mero017 /]#  ceph config get mds defer_client_eviction_on_laggy_osds
true
[ceph: root@mero017 /]# ceph versions
{
    "mon": {
        "ceph version 18.2.1-76.el9cp (2517f8a5ef5f5a6a22013b2fb11a591afd474668) reef (stable)": 3
    },
    "mgr": {
        "ceph version 18.2.1-76.el9cp (2517f8a5ef5f5a6a22013b2fb11a591afd474668) reef (stable)": 3
    },
    "osd": {
        "ceph version 18.2.1-76.el9cp (2517f8a5ef5f5a6a22013b2fb11a591afd474668) reef (stable)": 33
    },
    "mds": {
        "ceph version 18.2.1-76.el9cp (2517f8a5ef5f5a6a22013b2fb11a591afd474668) reef (stable)": 6
    },
    "overall": {
        "ceph version 18.2.1-76.el9cp (2517f8a5ef5f5a6a22013b2fb11a591afd474668) reef (stable)": 45
    }
}
[ceph: root@mero017 /]# 
We are also validating client eviction itself.
If clients are not evicted and the mount points on the client are still accessible, we fail the test case.

Ref: https://github.com/red-hat-storage/cephci/blob/e01ff9a132697422bf8d320385aceed5140db553/tests/cephfs/cephfs_bugs/test_defer_client_evict_on_laggy_osd.py#L172
Log : http://magna002.ceph.redhat.com/cephci-jenkins/test-runs/18.2.1-77/Regression/cephfs/84/tier-2_cephfs_test-clients/Client_eviction_deferred_if_OSD_is_laggy_0.log
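The eviction check described above can be sketched with standard Ceph commands (a hedged sketch; the MDS rank `0` is illustrative, and the exact session output format varies by release):

```shell
# List active client sessions on a given MDS rank; an evicted
# client should no longer appear in this list
ceph tell mds.0 client ls

# Evicted clients have their address added to the OSD blocklist;
# the client's address should show up here after eviction
ceph osd blocklist ls
```

If the client still appears in the session list and its mount point remains readable/writable after the eviction trigger, the test case is marked failed.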

Regards,
Amarnath

Comment 8 errata-xmlrpc 2024-06-13 14:29:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Critical: Red Hat Ceph Storage 7.1 security, enhancements, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:3925