Bug 1974882
Summary: | slow performance on parallel rm operations to the same PVC RWX based on CephFS | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | kelwhite
Component: | CephFS | Assignee: | Xiubo Li <xiubli>
Status: | CLOSED ERRATA | QA Contact: | Amarnath <amk>
Severity: | low | Docs Contact: | Ranjini M N <rmandyam>
Priority: | high | |
Version: | 5.0 | CC: | agunn, amk, bniver, ceph-eng-bugs, gfarnum, hchiramm, hyelloji, kdreyer, madam, mduasope, muagarwa, ocs-bugs, pdonnell, rmandyam, sostapov, tserlin, vereddy, xiubli, ykaul
Target Milestone: | --- | Keywords: | ABIAssurance, Performance
Target Release: | 5.1 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | ceph-16.2.7-14.el8cp | Doc Type: | Bug Fix
Story Points: | --- | |
Clone Of: | | Environment: |
Last Closed: | 2022-04-04 10:21:12 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 2031073 | |

Doc Text: |
.The global `mds_lock` is now switched to a fair mutex for a better user experience
Previously, the Ceph Metadata Server (MDS) daemon used std::mutex for the global `mds_lock`, which could leave lock waiters stuck for several seconds under heavy load. As a result, users experienced slow `rmdir` and `mkdir` operations.
With this update, the MDS daemon's global `mds_lock` is switched to a fair mutex, which guarantees that lock waiters are woken up and scheduled in FIFO order, improving client performance under heavy load.
|
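The Doc Text above describes replacing the unfair std::mutex used for the global `mds_lock` with a fair, FIFO-ordered mutex. The sketch below is a minimal, self-contained illustration of that idea using a ticket lock built from std::mutex and std::condition_variable. It is not the Ceph MDS implementation; the `FairMutex` name and the toy workload are assumptions made purely for demonstration.

```cpp
// Minimal sketch of a fair (FIFO) mutex, illustrating the ticket-lock idea
// behind the MDS fair mutex described in the Doc Text. NOT Ceph code.
#include <condition_variable>
#include <cstdint>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

class FairMutex {
public:
  void lock() {
    std::unique_lock<std::mutex> lk(m_);
    const uint64_t my_ticket = next_ticket_++;  // take a place in line
    cv_.wait(lk, [&] { return my_ticket == now_serving_; });
  }

  void unlock() {
    std::lock_guard<std::mutex> lk(m_);
    ++now_serving_;      // hand the lock to the next ticket in arrival order
    cv_.notify_all();    // wake waiters; only the matching ticket proceeds
  }

private:
  std::mutex m_;
  std::condition_variable cv_;
  uint64_t next_ticket_ = 0;
  uint64_t now_serving_ = 0;
};

int main() {
  FairMutex fm;
  std::vector<std::thread> workers;
  for (int t = 0; t < 4; ++t) {
    workers.emplace_back([&fm, t] {
      for (int i = 0; i < 3; ++i) {
        fm.lock();
        // Critical section stand-in for work done under mds_lock.
        std::printf("thread %d holds the lock\n", t);
        fm.unlock();
        std::this_thread::yield();
      }
    });
  }
  for (auto &w : workers) w.join();
  return 0;
}
```

With a plain std::mutex there is no ordering guarantee, so a thread that releases and immediately re-acquires the lock can win repeatedly and starve other waiters for long stretches under heavy load; the ticket counter forces hand-off in arrival order, which is the FIFO behavior the fix relies on to keep parallel clients responsive.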
Comment 2
Yaniv Kaul
2021-07-06 11:15:34 UTC
Not a 4.8 blocker, moving out. It looks like something went wrong with the assignment, as it is now assigned to an invalid user. Assigning to Humble so he can triage the performance degradation from OCS release to OCS release first.

Hi @Xiubo, I tried the below commands in 4.2 and 5.1 builds. I see better performance in the 4.2 builds. May I know if I am missing anything?

OS: RHEL 8.5

Commands:

mkdir /mnt/kcephfs.A/
mkdir /mnt/kcephfs.B
mkdir /mnt/kcephfs.C

ceph fs subvolume create cephfs subvol_A
ceph fs subvolume create cephfs subvol_B
ceph fs subvolume create cephfs subvol_C

ceph fs subvolume getpath cephfs subvol_A
ceph fs subvolume getpath cephfs subvol_B
ceph fs subvolume getpath cephfs subvol_C

mount -t ceph 10.0.209.255,10.0.210.106,10.0.211.95:/volumes/_nogroup/subvol_A/8c97cbcf-8806-451b-8796-afbd04d20b41 /mnt/kcephfs.A/ -o name=ceph-amk-bz-l93gok-node7,secretfile=/etc/ceph/ceph-amk-bz-l93gok-node7.secret
mount -t ceph 10.0.209.255,10.0.210.106,10.0.211.95:/volumes/_nogroup/subvol_B/e76cf51d-36f9-4f8a-beb7-f69b45bb74c8 /mnt/kcephfs.B/ -o name=ceph-amk-bz-l93gok-node7,secretfile=/etc/ceph/ceph-amk-bz-l93gok-node7.secret
mount -t ceph 10.0.209.255,10.0.210.106,10.0.211.95:/volumes/_nogroup/subvol_C/53e3ad08-1176-4d92-b5ef-917328a5b123 /mnt/kcephfs.C/ -o name=ceph-amk-bz-l93gok-node7,secretfile=/etc/ceph/ceph-amk-bz-l93gok-node7.secret

[root@ceph-upgrade-5-0-zcrq6x-node7 ~]# for d in A B C; do (cd /mnt/kcephfs.$d && for i in {0001..1000}; do mkdir -p removal$d.test$i; done ) & done; wait
[1] 74019
[2] 74020
[3] 74021
[1]   Done    ( cd /mnt/kcephfs.$d && for i in {0001..1000}; do mkdir -p removal$d.test$i; done )
[3]+  Done    ( cd /mnt/kcephfs.$d && for i in {0001..1000}; do mkdir -p removal$d.test$i; done )
[2]+  Done    ( cd /mnt/kcephfs.$d && for i in {0001..1000}; do mkdir -p removal$d.test$i; done )

[root@ceph-upgrade-5-0-zcrq6x-node7 ~]# for i in A B C; do (cd /mnt/kcephfs.$i && time strace -Tv -o ~/removal${i}.log -- rm -rf removal$i*) & done; wait
[1] 77022
[2] 77023
[3] 77024

real 0m1.512s
user 0m0.109s
sys 0m0.225s

real 0m1.516s
user 0m0.110s
sys 0m0.229s

[1]   Done    ( cd /mnt/kcephfs.$i && time strace -Tv -o ~/removal${i}.log -- rm -rf removal$i* )
[3]+  Done    ( cd /mnt/kcephfs.$i && time strace -Tv -o ~/removal${i}.log -- rm -rf removal$i* )

real 0m1.566s
user 0m0.116s
sys 0m0.205s

[2]+  Done    ( cd /mnt/kcephfs.$i && time strace -Tv -o ~/removal${i}.log -- rm -rf removal$i* )

[root@ceph-upgrade-5-0-zcrq6x-node7 ~]# ceph version
ceph version 16.2.7-69.el8cp (3eaf40c02886a02f9b172579ac6048bad587b63b) pacific (stable)

========================================================================

[root@ceph-amk-bz-l93gok-node7 ~]# for d in A B C; do (cd /mnt/kcephfs.$d && for i in {0001..1000}; do mkdir -p removal$d.test$i; done ) & done; wait
[1] 83007
[2] 83008
[3] 83009
[1]   Done    ( cd /mnt/kcephfs.$d && for i in {0001..1000}; do mkdir -p removal$d.test$i; done )
[2]-  Done    ( cd /mnt/kcephfs.$d && for i in {0001..1000}; do mkdir -p removal$d.test$i; done )
[3]+  Done    ( cd /mnt/kcephfs.$d && for i in {0001..1000}; do mkdir -p removal$d.test$i; done )

[root@ceph-amk-bz-l93gok-node7 ~]# for i in A B C; do (cd /mnt/kcephfs.$i && time strace -Tv -o ~/removal${i}.log -- rm -rf removal$i*) & done; wait
[1] 86010
[2] 86011
[3] 86012

real 0m1.661s
user 0m0.112s
sys 0m0.275s

real 0m1.796s
user 0m0.119s
sys 0m0.213s

real 0m1.816s
user 0m0.103s
sys 0m0.246s

[1]   Done    ( cd /mnt/kcephfs.$i && time strace -Tv -o ~/removal${i}.log -- rm -rf removal$i* )
[2]-  Done    ( cd /mnt/kcephfs.$i && time strace -Tv -o ~/removal${i}.log -- rm -rf removal$i* )
[3]+  Done    ( cd /mnt/kcephfs.$i && time strace -Tv -o ~/removal${i}.log -- rm -rf removal$i* )

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 5.1 Security, Enhancement, and Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1174