Bug 1974882
| Summary: | slow performance on parallel rm operations to the same PVC RWX based on CephFS | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | kelwhite |
| Component: | CephFS | Assignee: | Xiubo Li <xiubli> |
| Status: | CLOSED ERRATA | QA Contact: | Amarnath <amk> |
| Severity: | low | Docs Contact: | Ranjini M N <rmandyam> |
| Priority: | high | | |
| Version: | 5.0 | CC: | agunn, amk, bniver, ceph-eng-bugs, gfarnum, hchiramm, hyelloji, kdreyer, madam, mduasope, muagarwa, ocs-bugs, pdonnell, rmandyam, sostapov, tserlin, vereddy, xiubli, ykaul |
| Target Milestone: | --- | Keywords: | ABIAssurance, Performance |
| Target Release: | 5.1 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | ceph-16.2.7-14.el8cp | Doc Type: | Bug Fix |
| Doc Text: | The global `mds_lock` is now switched to a fair mutex for a better user experience<br>Previously, the Ceph Metadata Server (MDS) daemon used std::mutex for the global `mds_lock`, which could leave lock waiters stuck for several seconds under heavy load. This led to users experiencing slow `rmdir` or `mkdir` operations.<br>With this update, the MDS daemon's global `mds_lock` is switched to a fair mutex that guarantees lock waiters are woken up and scheduled in FIFO order, resulting in a better user experience and improved client performance under heavy load. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-04-04 10:21:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2031073 | | |
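The Doc Text above describes the core of the fix: replacing the plain std::mutex behind the MDS global `mds_lock` with a fair mutex, so waiters acquire the lock in the order they arrived instead of racing (and possibly starving) every time the lock is released. As an illustration only, here is a minimal ticket-based sketch of a FIFO-fair mutex built from std::mutex and std::condition_variable; the class name and structure are made up for this example and are not the actual Ceph implementation.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Illustrative ticket-based fair mutex: lock() hands out tickets in arrival
// order and unlock() serves them strictly FIFO. A plain std::mutex makes no
// ordering guarantee, so under heavy contention a waiter can be stuck for a
// long time -- the behavior reported against mds_lock in this bug.
class fair_mutex_sketch {
public:
  void lock() {
    std::unique_lock<std::mutex> lk(mtx_);
    const uint64_t my_ticket = next_ticket_++;                // join the queue
    cv_.wait(lk, [&] { return my_ticket == now_serving_; });  // wait for my turn
  }

  void unlock() {
    {
      std::lock_guard<std::mutex> lk(mtx_);
      ++now_serving_;                                         // hand off to the next ticket
    }
    cv_.notify_all();                                         // only the matching waiter proceeds
  }

private:
  std::mutex mtx_;                 // protects the counters and backs the wait
  std::condition_variable cv_;
  uint64_t next_ticket_ = 0;       // next ticket to hand out
  uint64_t now_serving_ = 0;       // ticket currently allowed to hold the lock
};
```

With FIFO hand-off, each waiter's delay is bounded by the critical sections of the waiters ahead of it rather than by which thread happens to win the race when the lock drops, which is why the parallel `rm` timings in the verification runs below stay close together.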
Comment 2
Yaniv Kaul
2021-07-06 11:15:34 UTC
Not a 4.8 blocker, moving out. It looks like something went wrong with the assignment, as it is now assigned to an invalid user. Assigning to Humble so he can triage the performance degradation from OCS release to OCS release first.
Hi @Xiubo,
I tried the commands below on the 4.2 and 5.1 builds.
I see better performance on the 4.2 builds.
May I know if I am missing anything?
OS: RHEL 8.5
Commands:
mkdir /mnt/kcephfs.A/
mkdir /mnt/kcephfs.B
mkdir /mnt/kcephfs.C
ceph fs subvolume create cephfs subvol_A
ceph fs subvolume create cephfs subvol_B
ceph fs subvolume create cephfs subvol_C
ceph fs subvolume getpath cephfs subvol_A
ceph fs subvolume getpath cephfs subvol_B
ceph fs subvolume getpath cephfs subvol_C
mount -t ceph 10.0.209.255,10.0.210.106,10.0.211.95:/volumes/_nogroup/subvol_A/8c97cbcf-8806-451b-8796-afbd04d20b41 /mnt/kcephfs.A/ -o name=ceph-amk-bz-l93gok-node7,secretfile=/etc/ceph/ceph-amk-bz-l93gok-node7.secret
mount -t ceph 10.0.209.255,10.0.210.106,10.0.211.95:/volumes/_nogroup/subvol_B/e76cf51d-36f9-4f8a-beb7-f69b45bb74c8 /mnt/kcephfs.B/ -o name=ceph-amk-bz-l93gok-node7,secretfile=/etc/ceph/ceph-amk-bz-l93gok-node7.secret
mount -t ceph 10.0.209.255,10.0.210.106,10.0.211.95:/volumes/_nogroup/subvol_C/53e3ad08-1176-4d92-b5ef-917328a5b123 /mnt/kcephfs.C/ -o name=ceph-amk-bz-l93gok-node7,secretfile=/etc/ceph/ceph-amk-bz-l93gok-node7.secret
[root@ceph-upgrade-5-0-zcrq6x-node7 ~]# for d in A B C; do (cd /mnt/kcephfs.$d && for i in {0001..1000}; do mkdir -p removal$d.test$i; done ) & done; wait
[1] 74019
[2] 74020
[3] 74021
[1] Done ( cd /mnt/kcephfs.$d && for i in {0001..1000};
do
mkdir -p removal$d.test$i;
done )
[3]+ Done ( cd /mnt/kcephfs.$d && for i in {0001..1000};
do
mkdir -p removal$d.test$i;
done )
[2]+ Done ( cd /mnt/kcephfs.$d && for i in {0001..1000};
do
mkdir -p removal$d.test$i;
done )
[root@ceph-upgrade-5-0-zcrq6x-node7 ~]# for i in A B C; do (cd /mnt/kcephfs.$i && time strace -Tv -o ~/removal${i}.log -- rm -rf removal$i*) & done; wait
[1] 77022
[2] 77023
[3] 77024
real 0m1.512s
user 0m0.109s
sys 0m0.225s
real 0m1.516s
user 0m0.110s
sys 0m0.229s
[1] Done ( cd /mnt/kcephfs.$i && time strace -Tv -o ~/removal${i}.log -- rm -rf removal$i* )
[3]+ Done ( cd /mnt/kcephfs.$i && time strace -Tv -o ~/removal${i}.log -- rm -rf removal$i* )
real 0m1.566s
user 0m0.116s
sys 0m0.205s
[2]+ Done ( cd /mnt/kcephfs.$i && time strace -Tv -o ~/removal${i}.log -- rm -rf removal$i* )
[root@ceph-upgrade-5-0-zcrq6x-node7 ~]# ceph version
ceph version 16.2.7-69.el8cp (3eaf40c02886a02f9b172579ac6048bad587b63b) pacific (stable)
========================================================================================================================================================================================================================
[root@ceph-amk-bz-l93gok-node7 ~]# for d in A B C; do (cd /mnt/kcephfs.$d && for i in {0001..1000}; do mkdir -p removal$d.test$i; done ) & done; wait
[1] 83007
[2] 83008
[3] 83009
[1] Done ( cd /mnt/kcephfs.$d && for i in {0001..1000};
do
mkdir -p removal$d.test$i;
done )
[2]- Done ( cd /mnt/kcephfs.$d && for i in {0001..1000};
do
mkdir -p removal$d.test$i;
done )
[3]+ Done ( cd /mnt/kcephfs.$d && for i in {0001..1000};
do
mkdir -p removal$d.test$i;
done )
[root@ceph-amk-bz-l93gok-node7 ~]# for i in A B C; do (cd /mnt/kcephfs.$i && time strace -Tv -o ~/removal${i}.log -- rm -rf removal$i*) & done; wait
[1] 86010
[2] 86011
[3] 86012
real 0m1.661s
user 0m0.112s
sys 0m0.275s
real 0m1.796s
user 0m0.119s
sys 0m0.213s
real 0m1.816s
user 0m0.103s
sys 0m0.246s
[1] Done ( cd /mnt/kcephfs.$i && time strace -Tv -o ~/removal${i}.log -- rm -rf removal$i* )
[2]- Done ( cd /mnt/kcephfs.$i && time strace -Tv -o ~/removal${i}.log -- rm -rf removal$i* )
[3]+ Done ( cd /mnt/kcephfs.$i && time strace -Tv -o ~/removal${i}.log -- rm -rf removal$i* )
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 5.1 Security, Enhancement, and Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:1174