Bug 2352781 - [NFS-Ganesha] Unexpectedly high memory utilization (31.1 GB) is observed when enabling bandwidth and OPS control limits at both the cluster and export levels in a scaled cluster environment (2000 exports). [NEEDINFO]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: NFS-Ganesha
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 8.0z3
Assignee: Naresh
QA Contact: Manisha Saini
Docs Contact: Rivka Pollack
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2025-03-16 17:06 UTC by Manisha Saini
Modified: 2025-04-07 15:27 UTC
CC List: 6 users

Fixed In Version: nfs-ganesha-6.5-8.el9cp
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2025-04-07 15:27:28 UTC
Embargoed:
rpollack: needinfo? (nchillar)


Attachments:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHCEPH-10867 0 None None None 2025-03-16 17:08:47 UTC
Red Hat Product Errata RHSA-2025:3635 0 None None None 2025-04-07 15:27:30 UTC

Description Manisha Saini 2025-03-16 17:06:51 UTC
Description of problem:
========================

On a scaled cluster with 2000 NFS exports, setting the bandwidth_control and ops_control limits at both the cluster level and the export level results in high memory utilization.

Note: Only the limits were set; no I/O was performed on these exports.

]# top -p 3279630
top - 17:01:08 up 3 days, 17:31,  1 user,  load average: 0.27, 0.31, 0.37
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.3 us,  0.0 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 127831.6 total,  79707.4 free,  44779.2 used,   5102.1 buff/cache
MiB Swap:   4096.0 total,   4096.0 free,      0.0 used.  83052.4 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
3279630 root      20   0   35.2g  31.1g  29568 S  24.7  24.9  26:12.31 ganesha.nfsd


# ps -p 3279630 -o pid,%mem,rss,vsz,cmd
    PID %MEM   RSS    VSZ CMD
3279630 24.9 32609328 36940032 /usr/bin/ganesha.nfsd -F -L STDERR -N NIV_EVENT
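
The RSS reported by ps (32609328 kB) matches the ~31.1 GiB shown by top. For an additional cross-check, the kernel's accounting for the process can be read directly from /proc (3279630 is the ganesha.nfsd PID from the output above):

# grep -E 'VmRSS|VmSwap' /proc/3279630/status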

Version-Release number of selected component (if applicable):
============================================================

# ceph --version
ceph version 19.2.0-108.el9cp (1762f710a9f63e0304d69ed81ad964841146c93d) squid (stable)

# rpm -qa | grep nfs
libnfsidmap-2.5.4-27.el9.x86_64
nfs-utils-2.5.4-27.el9.x86_64
nfs-ganesha-selinux-6.5-5.el9cp.noarch
nfs-ganesha-6.5-5.el9cp.x86_64
nfs-ganesha-ceph-6.5-5.el9cp.x86_64
nfs-ganesha-rados-grace-6.5-5.el9cp.x86_64
nfs-ganesha-rados-urls-6.5-5.el9cp.x86_64
nfs-ganesha-rgw-6.5-5.el9cp.x86_64
nfs-ganesha-utils-6.5-5.el9cp.x86_64


How reproducible:
================
1/1


Steps to Reproduce:
===================
1. Create NFS Ganesha cluster
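
For reference, a minimal sketch of a cluster create command that would produce the topology shown below (the placement on host cali015 and the 10.8.130.200 virtual IP are taken from the cluster info output; the exact flags used in this setup are an assumption):

# ceph nfs cluster create nfsganesha "cali015" --ingress --virtual_ip 10.8.130.200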

# ceph nfs cluster info nfsganesha
{
  "nfsganesha": {
    "backend": [
      {
        "hostname": "cali015",
        "ip": "10.8.130.15",
        "port": 12049
      }
    ],
    "monitor_port": 9049,
    "port": 2049,
    "virtual_ip": "10.8.130.200"
  }
}

2. Create 2000 NFS exports on 2000 subvolumes (see the sketch below)
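
A minimal sketch of how the subvolumes and exports can be created in a loop (the subvolume group "ganeshagroup", filesystem "cephfs", and pseudo paths /ganeshavol$i match the export info output further below; the flag-based export create syntax is assumed):

# for i in $(seq 1 2000); do ceph fs subvolume create cephfs ganesha$i --group_name ganeshagroup; ceph nfs export create cephfs --cluster-id nfsganesha --pseudo-path /ganeshavol$i --fsname cephfs --path $(ceph fs subvolume getpath cephfs ganesha$i --group_name ganeshagroup); done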

3. Set the cluster limits for bandwidth_control and ops_control

Cluster level settings →
----------------------
# ceph nfs cluster qos enable bandwidth_control nfsganesha PerShare --max_export_write_bw 2GB --max_export_read_bw 2GB

# ceph nfs cluster qos enable ops_control nfsganesha PerShare --max_export_iops 10000

4. Set the limits for 2000 exports as below

 --> Enable the bandwidth_control for 2000 exports as below
-----------------------------------------------------------

1-500 exports → 
# for i in $(seq 1 500);do ceph nfs export qos enable bandwidth_control nfsganesha /ganeshavol$i --max_export_write_bw 1GB --max_export_read_bw 1GB; done 

501-1000 exports →
# for i in $(seq 501 1000);do ceph nfs export qos enable bandwidth_control nfsganesha /ganeshavol$i --max_export_write_bw 2GB --max_export_read_bw 2GB; done

1001-1500 exports →
# for i in $(seq 1001 1500);do ceph nfs export qos enable bandwidth_control nfsganesha /ganeshavol$i --max_export_write_bw 3GB --max_export_read_bw 3GB; done

1501-2000 exports →
# for i in $(seq 1501 2000);do ceph nfs export qos enable bandwidth_control nfsganesha /ganeshavol$i --max_export_write_bw 4GB --max_export_read_bw 4GB; done

 --> Enable the ops_control for 2000 exports as below
-----------------------------------------------------
# for i in $(seq 1 500);do ceph nfs export qos enable ops_control nfsganesha /ganeshavol$i --max_export_iops 10000; done

# for i in $(seq 501 1000);do ceph nfs export qos enable ops_control nfsganesha /ganeshavol$i --max_export_iops 12000; done

# for i in $(seq 1001 1500);do ceph nfs export qos enable ops_control nfsganesha /ganeshavol$i --max_export_iops 14000; done

# for i in $(seq 1501 2000);do ceph nfs export qos enable ops_control nfsganesha /ganeshavol$i --max_export_iops 16000; done


# ceph nfs export info nfsganesha /ganeshavol1800
{
  "access_type": "RW",
  "clients": [],
  "cluster_id": "nfsganesha",
  "export_id": 1800,
  "fsal": {
    "cmount_path": "/",
    "fs_name": "cephfs",
    "name": "CEPH",
    "user_id": "nfs.nfsganesha.cephfs.2c1043d4"
  },
  "path": "/volumes/ganeshagroup/ganesha1800/fc57d302-43b7-44cb-8461-d69f46b0323a",
  "protocols": [
    3,
    4
  ],
  "pseudo": "/ganeshavol1800",
  "qos_block": {
    "combined_rw_bw_control": false,
    "enable_bw_control": true,
    "enable_iops_control": true,
    "enable_qos": true,
    "max_export_iops": 16000,
    "max_export_read_bw": "4.0GB",
    "max_export_write_bw": "4.0GB"
  },
  "security_label": true,
  "squash": "none",
  "transports": [
    "TCP"
  ]
}

Actual results:
===============
The NFS process was observed to be consuming 31.1 GB of memory. At the time, no exports were mounted on any clients, and no I/O operations were being executed.


Expected results:
=================
The NFS process should use significantly less memory, especially when no exports are mounted on clients and no I/O operations are running.


Additional info:

Comment 4 Naresh 2025-03-17 12:38:59 UTC
Please test the same scenario with QoS disabled.

Also, please test with QoS enabled, using the "apply" command instead of the "ceph mgr" commands.

Please let us know whether memory usage increases even with the "apply" commands.
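
For reference, a rough sketch of the apply-based workflow (the file name export1.json is hypothetical; the qos_block fields mirror the export info output above, and it is assumed that the applied spec may carry a qos_block):

# ceph nfs export info nfsganesha /ganeshavol1 > export1.json
# <edit export1.json, adjusting the qos_block fields such as max_export_write_bw, max_export_read_bw, max_export_iops>
# ceph nfs export apply nfsganesha -i export1.json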

Comment 12 errata-xmlrpc 2025-04-07 15:27:28 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 8.0 security, bug fix, and enhancement updates), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2025:3635

