Description of problem:
NFS cluster crashing while performing a read operation with QoS enabled.

1. Enabled max_client_combined_bw and max_export_combined_bw for PerShare_PerClient -> performed a write operation using the dd command; the bandwidth-control output was "write speed is 7.7 MB/s".

2025-04-29 13:01:15,509 - cephci - ceph:1630 - INFO - Execution of dd if=/dev/urandom of=/mnt/nfs/sample.txt bs=100M count=1 on 10.0.64.131 took 14.340555 seconds
2025-04-29 13:01:15,509 - cephci - test_nfs_qos_on_cluster_level_enablement:42 - INFO - File created successfully on ceph-harish-vm-49jsl6-node4
2025-04-29 13:01:15,510 - cephci - test_nfs_qos_on_cluster_level_enablement:43 - INFO - write speed is 7.7 MB/s

2. Performed a read operation on the same file using the dd command; the bandwidth-control output was "read speed is 211 MB/s".

2025-04-29 13:01:17,279 - cephci - ceph:1630 - INFO - Execution of dd if=/mnt/nfs/sample.txt of=/dev/urandom on 10.0.64.131 took 1.296683 seconds
2025-04-29 13:01:17,280 - cephci - test_nfs_qos_on_cluster_level_enablement:59 - INFO - read speed is 211 MB/s

3. Dropped the cache with: echo 3 > /proc/sys/vm/drop_caches

4. Repeated the read operation from step 2.
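The client-side part of the steps above can be condensed into the rough sketch below. Only the dd and drop_caches commands are taken from the test logs; everything else is assumed and not shown in the report (the export is already created with QoS enabled via max_client_combined_bw/max_export_combined_bw for PerShare_PerClient, and already mounted at /mnt/nfs).

#!/usr/bin/env bash
# Reproducer sketch; assumes the QoS-enabled export is already mounted at /mnt/nfs.

# Step 1: write a 100 MB file through the NFS mount (throttled, ~7.7 MB/s in the run above).
dd if=/dev/urandom of=/mnt/nfs/sample.txt bs=100M count=1

# Step 2: read the same file back (~211 MB/s in the run above, presumably served from cache).
dd if=/mnt/nfs/sample.txt of=/dev/urandom   # as in the test log; of=/dev/null would serve the same purpose

# Step 3: drop the page cache so the next read is served over NFS again.
echo 3 > /proc/sys/vm/drop_caches

# Step 4: repeat the read; at this point ganesha.nfsd crashed and dumped the cores listed below.
dd if=/mnt/nfs/sample.txt of=/dev/urandom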
Observation
===========
At step 4, the Ganesha cluster crashed and dumped multiple cores.

[root@cali016 coredump]# ls
'core.ganesha\x2enfsd.0.5abdc2e9cbaa4546825301ce4b3b9d46.2922972.1745914458000000.zst'
'core.ganesha\x2enfsd.0.5abdc2e9cbaa4546825301ce4b3b9d46.2925338.1745914479000000.zst'
'core.ganesha\x2enfsd.0.5abdc2e9cbaa4546825301ce4b3b9d46.2925629.1745914501000000.zst'
'core.ganesha\x2enfsd.0.5abdc2e9cbaa4546825301ce4b3b9d46.2926033.1745914523000000.zst'
'core.ganesha\x2enfsd.0.5abdc2e9cbaa4546825301ce4b3b9d46.2926312.1745914545000000.zst'
'core.ganesha\x2enfsd.0.5abdc2e9cbaa4546825301ce4b3b9d46.2927782.1745914813000000.zst'
'core.ganesha\x2enfsd.0.5abdc2e9cbaa4546825301ce4b3b9d46.2929508.1745914835000000.zst'
'core.ganesha\x2enfsd.0.5abdc2e9cbaa4546825301ce4b3b9d46.2929780.1745914857000000.zst'
'core.ganesha\x2enfsd.0.5abdc2e9cbaa4546825301ce4b3b9d46.2930689.1745914891000000.zst'
'core.ganesha\x2enfsd.0.5abdc2e9cbaa4546825301ce4b3b9d46.2932474.1745914913000000.zst'
'core.ganesha\x2enfsd.0.5abdc2e9cbaa4546825301ce4b3b9d46.2932801.1745914935000000.zst'
'core.ganesha\x2enfsd.0.5abdc2e9cbaa4546825301ce4b3b9d46.2933081.1745914957000000.zst'
'core.ganesha\x2enfsd.0.5abdc2e9cbaa4546825301ce4b3b9d46.2933477.1745914979000000.zst'

ceph orch ps
============
[ceph: root@cali013 /]# ceph orch ps | grep nfs
haproxy.nfs.TestClusterHA.cali016.chkhse     cali016  *:2049,9049  running (24m)  79s ago  24m  101M   -  2.4.22-f8e3218  6c223bddea69  e05e3e48fd0c
keepalived.nfs.TestClusterHA.cali016.sxqhdd  cali016               running (24m)  79s ago  24m  1555k  -  2.2.8           09859a486cb9  f159676f7347
nfs.TestClusterHA.0.0.cali016.tzahej         cali016  *:12049      error          79s ago  24m  -      -  <unknown>       <unknown>     <unknown>

ceph -s
=======
[ceph: root@cali013 /]# ceph -s
  cluster:
    id:     288c1062-18fb-11f0-a987-b49691cee574
    health: HEALTH_WARN
            1 failed cephadm daemon(s)

  services:
    mon: 5 daemons, quorum cali013,cali020,cali016,cali019,cali015 (age 93m)
    mgr: cali016.pslybk(active, since 95m), standbys: cali013.heutyr
    mds: 1/1 daemons up, 1 standby
    osd: 34 osds: 34 up (since 85m), 34 in (since 9d)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 1073 pgs
    objects: 56 objects, 101 MiB
    usage:   3.1 GiB used, 84 TiB / 84 TiB avail
    pgs:     1073 active+clean

Version-Release number of selected component (if applicable):

[ceph: root@cali013 /]# ceph --version
ceph version 19.2.1-154.el9cp (66ec30425949b52e06ca00d78ef0b1cc395e6a39) squid (stable)

[ceph: root@cali013 /]# rpm -qa | grep nfs
libnfsidmap-2.5.4-27.el9.x86_64
nfs-utils-2.5.4-27.el9.x86_64
nfs-ganesha-selinux-6.5-10.el9cp.noarch
nfs-ganesha-6.5-10.el9cp.x86_64
nfs-ganesha-ceph-6.5-10.el9cp.x86_64
nfs-ganesha-rados-grace-6.5-10.el9cp.x86_64
nfs-ganesha-rados-urls-6.5-10.el9cp.x86_64
nfs-ganesha-rgw-6.5-10.el9cp.x86_64

How reproducible:
Always

Steps to Reproduce:
Mentioned above

Actual results:
NFS cluster crashes while performing a read operation with QoS enabled.

Expected results:
NFS Ganesha shouldn't crash.

Additional info:

Back trace
==========
#0  0x00007f52bff0554c in __stpcpy_evex () from /lib64/libc.so.6
[Current thread is 1 (LWP 104)]
Missing separate debuginfos, use: dnf debuginfo-install abseil-cpp-20211102.0-4.el9.x86_64 c-ares-1.19.1-2.el9_4.x86_64 dbus-libs-1.12.20-8.el9.x86_64 glibc-2.34-125.el9_5.1.x86_64 grpc-1.46.7-10.el9.x86_64 grpc-cpp-1.46.7-10.el9.x86_64 gssproxy-0.8.4-7.el9.x86_64 keyutils-libs-1.6.3-1.el9.x86_64 krb5-libs-1.21.1-4.el9_5.x86_64 libacl-2.3.1-4.el9.x86_64 libattr-2.5.1-3.el9.x86_64 libblkid-2.37.4-20.el9.x86_64 libcom_err-1.46.5-5.el9.x86_64 libcurl-7.76.1-31.el9.x86_64 libgcc-11.5.0-5.el9_5.x86_64 libgpg-error-1.42-5.el9.x86_64 libibverbs-51.0-1.el9.x86_64 libicu-67.1-9.el9.x86_64 libnfsidmap-2.5.4-27.el9.x86_64 libnghttp2-1.43.0-6.el9.x86_64 libnl3-3.9.0-1.el9.x86_64 librdmacm-51.0-1.el9.x86_64 libselinux-3.6-1.el9.x86_64 libstdc++-11.5.0-5.el9_5.x86_64 libuuid-2.37.4-20.el9.x86_64 libzstd-1.5.1-2.el9.x86_64 lttng-ust-2.12.0-6.el9.x86_64 lz4-libs-1.9.3-5.el9.x86_64 numactl-libs-2.0.18-2.el9.x86_64 openssl-libs-3.2.2-6.el9_5.1.x86_64 pcre2-10.40-6.el9.x86_64 protobuf-3.14.0-13.el9.x86_64 sssd-client-2.9.5-4.el9_5.4.x86_64 userspace-rcu-0.12.1-6.el9.x86_64 xz-libs-5.2.5-8.el9_0.x86_64 zlib-1.2.11-40.el9.x86_64

(gdb) bt
#0  0x00007f52bff0554c in __stpcpy_evex () from /lib64/libc.so.6
#1  0x00007f52bd991e84 in ceph::buffer::v15_2_0::list::iterator_impl<false>::copy (this=0x7f51a2ffa8f0, len=<optimized out>, dest=0x0) at /usr/src/debug/ceph-19.2.1-159.el9cp.x86_64/src/common/buffer.cc:703
#2  0x00007f52b82e7ced in ceph_ll_read (cmount=<optimized out>, filehandle=<optimized out>, off=off@entry=0, len=<optimized out>, buf=0x0) at /usr/src/debug/ceph-19.2.1-159.el9cp.x86_64/src/include/buffer.h:1017
#3  0x00007f52bd2276e3 in ceph_fsal_read2 (obj_hdl=0x7f518c003a90, bypass=<optimized out>, done_cb=0x7f52c00f17e0 <mdc_read_cb>, read_arg=0x7f51852e47f8, caller_arg=0x7f51841d5b30) at /usr/src/debug/nfs-ganesha-6.5-10.el9cp.x86_64/src/FSAL/FSAL_CEPH/handle.c:2018
#4  0x00007f52c00f0f23 in mdcache_read2 (obj_hdl=0x7f518c01e648, bypass=<optimized out>, done_cb=<optimized out>, read_arg=0x7f51852e47f8, caller_arg=<optimized out>) at /usr/src/debug/nfs-ganesha-6.5-10.el9cp.x86_64/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_file.c:595
#5  0x00007f52c012cb43 in nfs4_read.constprop.0 (op=op@entry=0x7f518401cde0, data=data@entry=0x7f5184030e40, resp=resp@entry=0x7f51844d8aa0, info=info@entry=0x0, io=<optimized out>) at /usr/src/debug/nfs-ganesha-6.5-10.el9cp.x86_64/src/Protocols/NFS/nfs4_op_read.c:892
#6  0x00007f52c00bec32 in nfs4_op_read (op=0x7f518401cde0, data=0x7f5184030e40, resp=0x7f51844d8aa0) at /usr/src/debug/nfs-ganesha-6.5-10.el9cp.x86_64/src/Protocols/NFS/nfs4_op_read.c:969
#7  0x00007f52c00aa4de in process_one_op (data=data@entry=0x7f5184030e40, status=status@entry=0x7f51a2ffb54c) at /usr/src/debug/nfs-ganesha-6.5-10.el9cp.x86_64/src/Protocols/NFS/nfs4_Compound.c:912
#8  0x00007f52c00ac138 in nfs4_Compound (arg=<optimized out>, req=0x7f51843dd5e0, res=0x7f51843d97e0) at /usr/src/debug/nfs-ganesha-6.5-10.el9cp.x86_64/src/Protocols/NFS/nfs4_Compound.c:1413
#9  0x00007f52c0025485 in nfs_rpc_process_request (reqdata=<optimized out>, retry=<optimized out>) at /usr/src/debug/nfs-ganesha-6.5-10.el9cp.x86_64/src/MainNFSD/nfs_worker_thread.c:1479
#10 0x00007f52bfd745e7 in svc_request (xprt=0x7f516c00b940, xdrs=<optimized out>) at /usr/src/debug/libntirpc-6.3-2.el9cp.x86_64/src/svc_rqst.c:1229
#11 0x00007f52bfd78e5a in svc_rqst_xprt_task_recv (wpe=<optimized out>) at /usr/src/debug/libntirpc-6.3-2.el9cp.x86_64/src/svc_rqst.c:1210
#12 0x00007f52bfd7b91b in svc_rqst_epoll_loop (wpe=0x55f1e5f41e50) at /usr/src/debug/libntirpc-6.3-2.el9cp.x86_64/src/svc_rqst.c:1585
#13 0x00007f52bfd84cbc in work_pool_thread (arg=0x7f51d4078180) at /usr/src/debug/libntirpc-6.3-2.el9cp.x86_64/src/work_pool.c:187
#14 0x00007f52bfe247e2 in pthread_create.5 () from /lib64/libc.so.6
#15 0x0000000000000000 in ?? ()
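For reference, a trace like the one above can be regenerated from one of the dumped cores roughly as follows. This is a sketch, not the exact procedure used: the coredump directory and the location of the matching ganesha.nfsd binary and debuginfo are assumptions (on a cephadm deployment they normally have to come from the NFS container image).

# Decompress one of the zstd-compressed cores and print a backtrace.
cd /var/lib/systemd/coredump    # assumed location of the 'coredump' directory shown above
zstd -d 'core.ganesha\x2enfsd.0.5abdc2e9cbaa4546825301ce4b3b9d46.2922972.1745914458000000.zst'
gdb --batch -ex 'bt' /usr/bin/ganesha.nfsd \
    'core.ganesha\x2enfsd.0.5abdc2e9cbaa4546825301ce4b3b9d46.2922972.1745914458000000'
# gdb will suggest the 'dnf debuginfo-install ...' command quoted above for fuller symbol info.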