Bug 2236325 - Ganesha showing high memory usage (100+ GiB) which is also not being released over time
Summary: Ganesha showing high memory usage (100+ GiB) which is also not being released...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: NFS-Ganesha
Version: 7.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 7.0
Assignee: Frank Filz
QA Contact: Manisha Saini
Docs Contact: Rivka Pollack
URL:
Whiteboard:
Depends On: 2246077
Blocks:
 
Reported: 2023-08-30 22:24 UTC by Paul Cuzner
Modified: 2024-01-19 11:23 UTC
CC List: 11 users

Fixed In Version: nfs-ganesha-5.6-3.el9cp, rhceph-container-7-113
Doc Type: Enhancement
Doc Text:
.Add `cmount_path` option and generate unique user ID
With this enhancement, you can add the optional `cmount_path` option and generate a unique user ID for each Ceph File System, which allows CephFS clients to be shared across multiple Ganesha exports and thereby reduces memory usage to that of a single CephFS client.
Clone Of:
Environment:
Last Closed: 2023-12-13 15:22:27 UTC
Embargoed:


Attachments
cephfs memory consumption (103.03 KB, image/png)
2023-08-30 22:25 UTC, Paul Cuzner


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 2239769 0 unspecified CLOSED [RHCS 7.0] [NFS-Ganesha] RSS usage is not reduced even though all data has been deleted and all clients have unmounted. 2024-03-29 04:25:53 UTC
Red Hat Issue Tracker RHCEPH-7293 0 None None None 2023-08-30 22:25:52 UTC
Red Hat Product Errata RHBA-2023:7780 0 None None None 2023-12-13 15:22:37 UTC

Internal Links: 2231151 2239769

Comment 1 Paul Cuzner 2023-08-30 22:25:49 UTC
Created attachment 1986206 [details]
cephfs memory consumption

Comment 9 Frank Filz 2023-09-22 14:17:06 UTC
Copying comment over from:

https://bugzilla.redhat.com/show_bug.cgi?id=2239769
[RHCS 7.0] [NFS-Ganesha] RSS usage is not reduced even though all data has been deleted and all clients have unmounted.

Also note that the bulk of Ganesha's memory usage is the cache. The cache does not shrink below the high water mark size; older entries are re-used instead. Because of this, I would expect memory to grow as a fresh Ganesha instance is brought under load, and then to stabilize. The only time memory would grow significantly beyond that and then shrink back (with the caveat Kaleb mentioned that the RSS size may not actually be reducible) is when a transient load demands more of the cache and pushes it above the high water mark. Ganesha releases prior to V5.x do have a problem of poor cache management that makes growth above the high water mark almost a sure thing. That may have raised an expectation that an idle Ganesha would reduce its memory use.

And yes, the cephfs clients are going to stay present unless the EXPORT is removed. And even then, as discussed in 2239769, RSS may not shrink.
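For illustration only, here is a minimal sketch of the high-water-mark behavior described above; this is not Ganesha's actual mdcache/LRU code, just an assumed simplification showing why a cache like this plateaus at its high water mark rather than shrinking when load drops.

#include <stdlib.h>
#include <string.h>

/* Illustrative only -- not Ganesha's actual mdcache/LRU implementation. */
struct entry {
    struct entry *lru_prev;   /* toward least-recently-used */
    struct entry *lru_next;   /* toward most-recently-used  */
    char key[64];
    /* ... cached attributes would live here ... */
};

struct cache {
    struct entry *lru_head;   /* least recently used */
    struct entry *lru_tail;   /* most recently used  */
    size_t nentries;
    size_t hiwat;             /* high water mark     */
};

/*
 * Acquire an entry for a new key.  Below the high water mark we
 * allocate; at or above it we recycle the least-recently-used entry.
 * Nothing here ever frees memory: once the cache has grown to the
 * high water mark it stays that size, so RSS plateaus rather than
 * shrinking when the load goes away.
 */
static struct entry *cache_get_entry(struct cache *c, const char *key)
{
    struct entry *e;

    if (c->nentries >= c->hiwat && c->lru_head != NULL) {
        /* Recycle the LRU entry in place instead of allocating. */
        e = c->lru_head;
        c->lru_head = e->lru_next;
        if (c->lru_head != NULL)
            c->lru_head->lru_prev = NULL;
        if (c->lru_tail == e)
            c->lru_tail = NULL;
    } else {
        e = calloc(1, sizeof(*e));
        if (e == NULL)
            return NULL;
        c->nentries++;
    }

    /* (Re)initialize the entry and append it at the MRU end. */
    strncpy(e->key, key, sizeof(e->key) - 1);
    e->key[sizeof(e->key) - 1] = '\0';
    e->lru_prev = c->lru_tail;
    e->lru_next = NULL;
    if (c->lru_tail != NULL)
        c->lru_tail->lru_next = e;
    c->lru_tail = e;
    if (c->lru_head == NULL)
        c->lru_head = e;

    return e;
}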

Comment 10 Paul Cuzner 2023-09-27 20:13:37 UTC
I repeated my 100-client test against a single export (i.e., a single cephfs client at the backend), with each client using a separate directory on the mount.

Although this shows significantly less memory consumption, memory usage continues to climb even when:
- the data is deleted
- the test run is repeated

Also, even with all data deleted and 12 hours of idle time, RAM usage has not decreased. With continual growth, it seems problematic for Ganesha to be used in memory-constrained environments like OCP/ODF.

I've attached a screenshot mapping the capacity used in the cluster against the RSS memory consumed by the two Ganesha daemons.
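For reference, one way to sample the RSS figures discussed here on the NFS host; the process name ganesha.nfsd and the use of podman for cephadm-deployed daemons are assumptions about the deployment, not details captured in this BZ:

# RSS (in KiB) and uptime of each running ganesha.nfsd process
ps -C ganesha.nfsd -o pid,rss,etime,args

# Or, for cephadm/podman-deployed daemons, per-container memory usage
podman stats --no-stream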

Comment 16 Manisha Saini 2023-10-25 07:57:22 UTC
This fix is not fully implemented; as discussed on Slack, corresponding changes are required on the cephadm side to validate it. We are moving this issue to the ASSIGNED state until the necessary cephadm fixes are made available for QA verification.

Comment 17 Manisha Saini 2023-10-25 09:47:15 UTC
I have raised a separate BZ for the cephadm-side changes: https://bugzilla.redhat.com/show_bug.cgi?id=2246077

Marking this BZ as blocked until the fix for the cephadm bug is available.

Comment 20 Manisha Saini 2023-11-14 07:17:56 UTC
Summary:

Observations with the latest build:

By using a single cephfs client, the Ganesha daemons:
• use considerably less memory (100 GB → 4 GB).
• do not trigger any healthchecks (MDS_TRIM was active 4 times in the multi-client tests).
• reduce the RAM usage of the MDS daemon.
• produce more consistent client performance from each Ganesha daemon.


Full report - https://ibm.ent.box.com/s/38ax1sekmm9wvp5er332sghx06c2xvh9

For the memory leak, we have another BZ to track: https://bugzilla.redhat.com/show_bug.cgi?id=2239769

Moving this BZ to the VERIFIED state.

Comment 22 Frank Filz 2023-11-21 18:58:23 UTC
Oh, this is the BZ for cmount_path. Yes, it is needed for 7.0.
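For anyone landing here later, a rough sketch of how `cmount_path` is expected to surface in the cephadm-generated Ganesha export configuration: two exports of the same file system share a cmount_path (and the generated per-file-system user ID), so they share one libcephfs client instead of each creating their own. The export IDs, paths, and the User_Id value below are made up for illustration; the exact parameter names and generated values come from the cephadm change tracked in BZ 2246077.

EXPORT {
    Export_Id = 1;
    Path = "/volumes/group1/subvol1";
    Pseudo = "/export1";
    FSAL {
        Name = "CEPH";
        Filesystem = "cephfs";
        User_Id = "nfs.mycluster.cephfs.1a2b3c";  # generated once per file system (illustrative value)
        cmount_path = "/";                        # shared mount root => shared libcephfs client
    }
}

EXPORT {
    Export_Id = 2;
    Path = "/volumes/group1/subvol2";
    Pseudo = "/export2";
    FSAL {
        Name = "CEPH";
        Filesystem = "cephfs";
        User_Id = "nfs.mycluster.cephfs.1a2b3c";  # same user ID => the CephFS client is reused
        cmount_path = "/";
    }
}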

Comment 23 errata-xmlrpc 2023-12-13 15:22:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 7.0 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:7780

