Description of problem:
=======================
On a scale cluster with 2000 NFS exports, high memory utilization was observed after setting the bandwidth_control and ops_control limits at both the cluster level and the export level.

Note: Only the limits were set; no I/O was performed on these exports.

# top -p 3279630
top - 17:01:08 up 3 days, 17:31,  1 user,  load average: 0.27, 0.31, 0.37
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.3 us,  0.0 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 127831.6 total,  79707.4 free,  44779.2 used,   5102.1 buff/cache
MiB Swap:   4096.0 total,   4096.0 free,      0.0 used.  83052.4 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
3279630 root      20   0   35.2g  31.1g  29568 S  24.7  24.9  26:12.31 ganesha.nfsd

# ps -p 3279630 -o pid,%mem,rss,vsz,cmd
    PID %MEM      RSS      VSZ CMD
3279630 24.9 32609328 36940032 /usr/bin/ganesha.nfsd -F -L STDERR -N NIV_EVENT

Version-Release number of selected component (if applicable):
=============================================================
# ceph --version
ceph version 19.2.0-108.el9cp (1762f710a9f63e0304d69ed81ad964841146c93d) squid (stable)

# rpm -qa | grep nfs
libnfsidmap-2.5.4-27.el9.x86_64
nfs-utils-2.5.4-27.el9.x86_64
nfs-ganesha-selinux-6.5-5.el9cp.noarch
nfs-ganesha-6.5-5.el9cp.x86_64
nfs-ganesha-ceph-6.5-5.el9cp.x86_64
nfs-ganesha-rados-grace-6.5-5.el9cp.x86_64
nfs-ganesha-rados-urls-6.5-5.el9cp.x86_64
nfs-ganesha-rgw-6.5-5.el9cp.x86_64
nfs-ganesha-utils-6.5-5.el9cp.x86_64

How reproducible:
=================
1/1

Steps to Reproduce:
===================
1. Create an NFS Ganesha cluster:

# ceph nfs cluster info nfsganesha
{
  "nfsganesha": {
    "backend": [
      {
        "hostname": "cali015",
        "ip": "10.8.130.15",
        "port": 12049
      }
    ],
    "monitor_port": 9049,
    "port": 2049,
    "virtual_ip": "10.8.130.200"
  }
}

2. Create 2000 NFS exports on 2000 subvolumes (see the sketch after step 3).

3. Set the cluster-level limits for bandwidth_control and ops_control:

Cluster level settings →
------------------------
# ceph nfs cluster qos enable bandwidth_control nfsganesha PerShare --max_export_write_bw 2GB --max_export_read_bw 2GB
# ceph nfs cluster qos enable ops_control nfsganesha PerShare --max_export_iops 10000
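The report does not record the commands used for step 2. A minimal sketch of such a creation loop, assuming the CephFS volume name cephfs, the subvolume group ganeshagroup, and subvolume names ganesha$i (all inferred from the export info shown under step 4), might look like this:

# ceph fs subvolumegroup create cephfs ganeshagroup
# for i in $(seq 1 2000); do ceph fs subvolume create cephfs ganesha$i --group_name ganeshagroup; done
# for i in $(seq 1 2000); do path=$(ceph fs subvolume getpath cephfs ganesha$i --group_name ganeshagroup); ceph nfs export create cephfs --cluster-id nfsganesha --pseudo-path /ganeshavol$i --fsname cephfs --path "$path"; done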
4. Set the limits for the 2000 exports as below.

--> Enable bandwidth_control for the 2000 exports as below
----------------------------------------------------------
1-500 exports →
# for i in $(seq 1 500); do ceph nfs export qos enable bandwidth_control nfsganesha /ganeshavol$i --max_export_write_bw 1GB --max_export_read_bw 1GB; done

501-1000 exports →
# for i in $(seq 501 1000); do ceph nfs export qos enable bandwidth_control nfsganesha /ganeshavol$i --max_export_write_bw 2GB --max_export_read_bw 2GB; done

1001-1500 exports →
# for i in $(seq 1001 1500); do ceph nfs export qos enable bandwidth_control nfsganesha /ganeshavol$i --max_export_write_bw 3GB --max_export_read_bw 3GB; done

1501-2000 exports →
# for i in $(seq 1501 2000); do ceph nfs export qos enable bandwidth_control nfsganesha /ganeshavol$i --max_export_write_bw 4GB --max_export_read_bw 4GB; done

--> Enable ops_control for the 2000 exports as below
-----------------------------------------------------
# for i in $(seq 1 500); do ceph nfs export qos enable ops_control nfsganesha /ganeshavol$i --max_export_iops 10000; done
# for i in $(seq 501 1000); do ceph nfs export qos enable ops_control nfsganesha /ganeshavol$i --max_export_iops 12000; done
# for i in $(seq 1001 1500); do ceph nfs export qos enable ops_control nfsganesha /ganeshavol$i --max_export_iops 14000; done
# for i in $(seq 1501 2000); do ceph nfs export qos enable ops_control nfsganesha /ganeshavol$i --max_export_iops 16000; done

# ceph nfs export info nfsganesha /ganeshavol1800
{
  "access_type": "RW",
  "clients": [],
  "cluster_id": "nfsganesha",
  "export_id": 1800,
  "fsal": {
    "cmount_path": "/",
    "fs_name": "cephfs",
    "name": "CEPH",
    "user_id": "nfs.nfsganesha.cephfs.2c1043d4"
  },
  "path": "/volumes/ganeshagroup/ganesha1800/fc57d302-43b7-44cb-8461-d69f46b0323a",
  "protocols": [3, 4],
  "pseudo": "/ganeshavol1800",
  "qos_block": {
    "combined_rw_bw_control": false,
    "enable_bw_control": true,
    "enable_iops_control": true,
    "enable_qos": true,
    "max_export_iops": 16000,
    "max_export_read_bw": "4.0GB",
    "max_export_write_bw": "4.0GB"
  },
  "security_label": true,
  "squash": "none",
  "transports": ["TCP"]
}

Actual results:
===============
The NFS Ganesha process was consuming 31.1 GB of resident memory. At the time, no exports were mounted on any client and no I/O operations were being executed.

Expected results:
=================
The NFS process should use significantly less memory, especially when no exports are mounted on clients and no I/O operations are running.

Additional info:
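One way to track the reported growth when re-running this scenario is to sample the resident set size of the ganesha.nfsd process at intervals. A minimal sketch (the 60-second interval is illustrative):

# pid=$(pgrep -x ganesha.nfsd)
# while sleep 60; do date; ps -p "$pid" -o pid,%mem,rss,vsz,cmd; done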
Please test the same scenario with QoS disabled. Also, please test with QoS enabled using the "apply" command instead of the "ceph mgr" commands, and report whether memory usage increases even with the "apply" commands.
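For reference, the "apply" path mentioned above sets the whole export definition, including its qos_block, through the ceph nfs export apply interface rather than the per-setting qos enable commands. A minimal sketch, reusing the JSON layout shown in the export info above:

# ceph nfs export info nfsganesha /ganeshavol1800 > export1800.json
(edit the qos_block values in export1800.json as required)
# ceph nfs export apply nfsganesha -i export1800.json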
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 8.0 security, bug fix, and enhancement updates), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2025:3635