
Bug 2236325

Summary: Ganesha showing high memory usage (100+ GiB) which is also not being released over time
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Paul Cuzner <pcuzner>
Component: NFS-Ganesha
Assignee: Frank Filz <ffilz>
Status: CLOSED ERRATA
QA Contact: Manisha Saini <msaini>
Severity: high
Docs Contact: Rivka Pollack <rpollack>
Priority: unspecified
Version: 7.0
CC: akraj, cephqe-warriors, ffilz, gdeschner, gouthamr, kkeithle, mbenjamin, mobisht, tserlin, vdas, vumrao
Target Milestone: ---
Keywords: Performance, Scale
Target Release: 7.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: nfs-ganesha-5.6-3.el9cp, rhceph-container-7-113
Doc Type: Enhancement
Doc Text:
.Add `cmount_path` option and generate unique user ID
With this enhancement, you can add the optional `cmount_path` option and generate a unique user ID for each Ceph File System. This allows CephFS clients to be shared across multiple Ganesha exports, reducing the memory usage of a single CephFS client.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-12-13 15:22:27 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2246077    
Bug Blocks:    
Attachments:
cephfs memory consumption (flags: none)

Comment 1 Paul Cuzner 2023-08-30 22:25:49 UTC
Created attachment 1986206 [details]
cephfs memory consumption

Comment 9 Frank Filz 2023-09-22 14:17:06 UTC
Copying comment over from:

https://bugzilla.redhat.com/show_bug.cgi?id=2239769
[RHCS 7.0] [NFS-Ganesha] RSS usage is not reduced even though all data has been deleted and all clients have unmounted.

Also note that the bulk of Ganesha's memory usage is the cache. The cache does not shrink below the high water mark size; older entries are re-used instead. Because of this, I would expect memory to grow as a fresh Ganesha instance is brought under load, and then stabilize. The only time memory would grow significantly beyond that and then shrink back (with the caveat Kaleb mentioned that the RSS size may not actually be reducible) is when a transient load demands more of the cache and pushes it above the high water mark. Ganesha releases prior to V5.x do have a problem of poor cache management that makes growth above the high water mark almost a sure thing. That may have raised an expectation that an idle Ganesha would reduce memory use.
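The high-water-mark behavior described above can be illustrated with a small sketch. This is hypothetical Python, not Ganesha's actual MDCACHE code: once the cache reaches its high water mark, a new insert recycles the least-recently-used entry rather than growing the cache, so the footprint plateaus and is never given back while the process runs.

```python
from collections import OrderedDict

class HighWaterCache:
    """Illustrative LRU-style cache with a high water mark (hiwat).
    Hypothetical sketch of the behavior described in the comment above,
    not Ganesha's implementation."""

    def __init__(self, hiwat):
        self.hiwat = hiwat
        self.entries = OrderedDict()  # key -> value, in LRU order

    def insert(self, key, value):
        if key in self.entries:
            # Refresh an existing entry and mark it most-recently-used.
            self.entries.move_to_end(key)
            self.entries[key] = value
            return
        if len(self.entries) >= self.hiwat:
            # At the high water mark: recycle the oldest entry instead of
            # growing. Memory use plateaus here and does not shrink later.
            self.entries.popitem(last=False)
        self.entries[key] = value

    def size(self):
        return len(self.entries)

cache = HighWaterCache(hiwat=4)
for i in range(10):
    cache.insert(i, f"entry-{i}")
# Size stays pinned at the high water mark; keys 0-5 were recycled.
```

Under load the cache fills to `hiwat` and stays there, which matches the expectation that a fresh instance's memory grows and then stabilizes rather than returning to its starting point when idle.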

And yes, the cephfs clients are going to stay present unless the EXPORT is removed. And even then, as discussed in 2239769, RSS may not shrink.

Comment 10 Paul Cuzner 2023-09-27 20:13:37 UTC
I repeated my 100-client test against a single export (i.e. a single cephfs client at the backend), with each client using a separate directory on the mount.

Although this shows significantly less memory consumption, memory usage continues to climb even when:
- the data is deleted
- the test run is repeated

Even with all data deleted and 12 hours of idle time, the RAM usage has not decreased. With continual growth, it seems problematic for Ganesha to be used in memory-constrained environments like OCP/ODF.

I've attached a screenshot mapping the capacity used in the cluster against the RSS memory consumed by the two Ganesha daemons.
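The per-daemon RSS figures discussed in this BZ can be sampled on Linux from the procfs `status` file. A minimal, Linux-only sketch (a hypothetical helper, not part of any test harness mentioned here):

```python
def rss_kib(pid="self"):
    """Return the resident set size (VmRSS) of a process in KiB by parsing
    /proc/<pid>/status. Linux-only; pass a numeric PID string (e.g. the PID
    of ganesha.nfsd) or "self" for the current process."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # Line looks like: "VmRSS:     12345 kB"
                return int(line.split()[1])
    return None  # kernel threads have no VmRSS line
```

Note that VmRSS is the kernel's view of resident pages; as discussed above, the allocator may not return freed memory to the OS, so RSS can stay flat even after Ganesha releases cache entries internally.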

Comment 16 Manisha Saini 2023-10-25 07:57:22 UTC
This fix is incomplete: as we discussed on Slack, corresponding changes on the cephadm side are required to validate it. We are moving this issue back to the ASSIGNED state until the necessary cephadm fixes are available for QA verification.

Comment 17 Manisha Saini 2023-10-25 09:47:15 UTC
I have raised a separate BZ for the cephadm-side changes - https://bugzilla.redhat.com/show_bug.cgi?id=2246077

Marking this BZ as blocked until the fix for the cephadm bug is available.

Comment 20 Manisha Saini 2023-11-14 07:17:56 UTC
Summary:

Observations with the latest build:

By using a single cephfs client, the Ganesha daemons:
• use considerably less memory (100 GB → 4 GB).
• do not trigger any health checks (MDS_TRIM was active 4 times in the multi-client tests).
• reduce the RAM usage of the MDS daemon.
• produce more consistent client performance from each Ganesha daemon.


Full report - https://ibm.ent.box.com/s/38ax1sekmm9wvp5er332sghx06c2xvh9

For memory leak, we have another BZ to track - https://bugzilla.redhat.com/show_bug.cgi?id=2239769

Moving this BZ to verified state.

Comment 22 Frank Filz 2023-11-21 18:58:23 UTC
Oh, this is the BZ for cmount_path. Yes, it is needed for 7.0.
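For context on what is being verified here: `cmount_path` lets multiple exports against the same Ceph File System share one CephFS client instead of each export creating its own. A hedged sketch of an NFS export spec using it (field names based on the Ceph NFS export JSON schema; the exact shape and defaults may vary by release, so treat the values below as illustrative):

```json
{
  "export_id": 1,
  "cluster_id": "mynfs",
  "pseudo": "/export1",
  "path": "/volumes/grp/vol1",
  "access_type": "RW",
  "protocols": [4],
  "fsal": {
    "name": "CEPH",
    "fs_name": "myfs",
    "cmount_path": "/"
  }
}
```

Exports whose FSAL blocks agree on the filesystem and `cmount_path` can then be served by a shared CephFS client, which is the memory reduction measured in comment 20.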

Comment 23 errata-xmlrpc 2023-12-13 15:22:27 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 7.0 Bug Fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:7780