
Bug 2269663

Summary: mds: disable defer_client_eviction_on_laggy_osds by default
Product: [Red Hat Storage] Red Hat Ceph Storage
Component: CephFS
Version: 6.1
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Target Milestone: ---
Target Release: 6.1z5
Reporter: Venky Shankar <vshankar>
Assignee: Venky Shankar <vshankar>
QA Contact: Amarnath <amk>
Docs Contact:
CC: akraj, amk, ceph-eng-bugs, cephqe-warriors, hyelloji, mcaldeir, sostapov, sumr, tserlin, vereddy
Whiteboard:
Fixed In Version: ceph-17.2.6-207.el9cp
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 2269664 (view as bug list)
Environment:
Last Closed: 2024-04-01 10:20:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2267617

Description Venky Shankar 2024-03-15 07:29:15 UTC
This config can result in a single client holding up the MDS from servicing other clients: once a client is deferred from eviction due to a laggy OSD, a new client's cap acquire request can be blocked until the laggy client resumes operation, i.e., until the laggy OSD is no longer considered laggy.
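
A quick way to check and, if needed, explicitly set the option on a running cluster, assuming the standard ceph config interface (the mds target and value shown are illustrative):

ceph config get mds defer_client_eviction_on_laggy_osds       # show the current default seen by the MDS daemons
ceph config set mds defer_client_eviction_on_laggy_osds false # explicitly disable deferred eviction if an older build left it enabled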

Comment 1 Scott Ostapovicz 2024-03-19 13:05:42 UTC
These BZs were targeted to z5 after the cutoff date; they should have been targeted at z6.

Comment 7 Amarnath 2024-03-21 08:30:49 UTC
Hi All,

On the latest builds we observe defer_client_eviction_on_laggy_osds as false by default:
[root@ceph-amk-bz-up-5jvwkm-node7 ~]# ceph versions
{
    "mon": {
        "ceph version 17.2.6-208.el9cp (d192afbf9e5a88ecb3f52639da19d9f46d8be60c) quincy (stable)": 3
    },
    "mgr": {
        "ceph version 17.2.6-208.el9cp (d192afbf9e5a88ecb3f52639da19d9f46d8be60c) quincy (stable)": 2
    },
    "osd": {
        "ceph version 17.2.6-208.el9cp (d192afbf9e5a88ecb3f52639da19d9f46d8be60c) quincy (stable)": 12
    },
    "mds": {
        "ceph version 17.2.6-208.el9cp (d192afbf9e5a88ecb3f52639da19d9f46d8be60c) quincy (stable)": 3
    },
    "overall": {
        "ceph version 17.2.6-208.el9cp (d192afbf9e5a88ecb3f52639da19d9f46d8be60c) quincy (stable)": 20
    }
}
[root@ceph-amk-bz-up-5jvwkm-node7 ~]# ceph config get mds defer_client_eviction_on_laggy_osds
false
[root@ceph-amk-bz-up-5jvwkm-node7 ~]# 


On the previous build, defer_client_eviction_on_laggy_osds was true by default:
[root@ceph-amk-up-95t6oh-node8 rhceph-qe-jenkins]# ceph versions
{
    "mon": {
        "ceph version 17.2.6-205.el9cp (d2906f0987908581de69deb71dabc40289bce7e9) quincy (stable)": 3
    },
    "mgr": {
        "ceph version 17.2.6-205.el9cp (d2906f0987908581de69deb71dabc40289bce7e9) quincy (stable)": 2
    },
    "osd": {
        "ceph version 17.2.6-205.el9cp (d2906f0987908581de69deb71dabc40289bce7e9) quincy (stable)": 16
    },
    "mds": {
        "ceph version 17.2.6-205.el9cp (d2906f0987908581de69deb71dabc40289bce7e9) quincy (stable)": 5
    },
    "overall": {
        "ceph version 17.2.6-205.el9cp (d2906f0987908581de69deb71dabc40289bce7e9) quincy (stable)": 26
    }
}
[root@ceph-amk-up-95t6oh-node8 rhceph-qe-jenkins]# ceph config get mds defer_client_eviction_on_laggy_osds
true



@Venky,
we already have tests covering the defer_client_eviction_on_laggy_osds setting as part of our regression runs.
Do we need anything else to be tested for this BZ?

Regards,
Amarnath

Comment 8 Venky Shankar 2024-03-21 09:34:38 UTC
(In reply to Amarnath from comment #7)
> @Venky,
> we already have tests covering the defer_client_eviction_on_laggy_osds
> setting as part of our regression runs.
> Do we need anything else to be tested for this BZ?

Do the tests ensure that clients are evicted (when this config is disabled) if they do not respond to cap revoke requests? That can happen in cases when some OSDs are laggy.
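
A minimal verification sketch for that scenario, assuming the standard health and MDS admin commands (the MDS rank used here is illustrative):

ceph config set mds defer_client_eviction_on_laggy_osds false  # ensure deferred eviction is off
ceph health detail          # expect a "clients failing to respond to capability release" warning while the client stalls
ceph tell mds.0 session ls  # the stalled client's session should eventually disappear once the MDS evicts it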

Comment 10 errata-xmlrpc 2024-04-01 10:20:24 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 6.1 Bug Fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:1580