Created attachment 1986206: cephfs memory consumption
Copying comment over from https://bugzilla.redhat.com/show_bug.cgi?id=2239769 ([RHCS 7.0] [NFS-Ganesha] RSS usage is not reduced even though all data has been deleted and all clients have unmounted).

Also note that the bulk of Ganesha's memory usage is the cache. The cache does not shrink below the high water mark size; older entries are re-used instead. Because of this, I would expect memory to grow as a fresh Ganesha instance is brought under load and then stabilize. The only time memory would grow significantly beyond that and then shrink back (with the caveat Kaleb mentioned that the RSS size may not actually be reducible) is when a transient load demands more of the cache and pushes it above the high water mark.

Ganesha releases prior to V5.x do have a problem of poor cache management that makes growth above the high water mark almost a sure thing. That may have raised an expectation that an idle Ganesha would reduce memory use.

And yes, the cephfs clients are going to stay present unless the EXPORT is removed. And even then, as discussed in 2239769, RSS may not shrink.
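For reference, the cache ceiling described above is the mdcache entry high water mark in ganesha.conf. A minimal sketch of the relevant block is below; the value shown is only illustrative (assumed to be near the shipped default), not a tuning recommendation:

    MDCACHE {
        # The cache is allowed to grow to roughly this many entries before
        # reclaim kicks in; resident memory generally does not shrink below
        # this working set even when the daemon goes idle.
        Entries_HWMark = 100000;
    }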
I repeated my 100-client test against a single export (i.e. a single cephfs client at the backend), with each client using a separate directory on the mount. Although this shows significantly less memory consumption, memory usage continues to climb even when:
- the data is deleted
- the test run is repeated

Even with all data deleted and 12 hours of idle, the RAM usage has not reduced. With continual growth, it seems problematic for Ganesha to be used in memory-constrained environments like OCP/ODF.

I've attached a screenshot mapping the capacity used in the cluster against the RSS memory consumed by the Ganesha daemons (x2).
This fix is incomplete: as we discussed on Slack, we require corresponding changes on the cephadm side to validate it. We are moving this issue to the "assigned" state until the necessary cephadm fixes are made available for QA verification.
Have raised a separate BZ for the cephadm-side changes: https://bugzilla.redhat.com/show_bug.cgi?id=2246077. Marking this BZ as blocked until the fix for the cephadm bug is available.
Summary - observations with the latest build:

By using a single cephfs client, the Ganesha daemons:
• use considerably less memory (100GB → 4GB)
• do not trigger any healthchecks (MDS_TRIM was active 4 times in the multi-client tests)
• reduce the RAM usage of the MDS daemon
• produce more consistent client performance from each Ganesha daemon

Full report - https://ibm.ent.box.com/s/38ax1sekmm9wvp5er332sghx06c2xvh9

For the memory leak, we have another BZ to track: https://bugzilla.redhat.com/show_bug.cgi?id=2239769

Moving this BZ to the verified state.
Oh, this is the BZ for cmount_path. Yes, it is needed for 7.0.
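For anyone following along, here is a rough sketch of how cmount_path surfaces in an export definition. The export id, paths, filesystem name, and user id below are made-up placeholders, and the exact block layout is an assumption based on the Ceph FSAL documentation; in practice cephadm / the nfs mgr module generates this configuration:

    EXPORT {
        Export_Id = 100;
        Path = "/volumes/nfsgroup/vol1";
        Pseudo = "/vol1";
        FSAL {
            Name = "CEPH";
            Filesystem = "cephfs";
            User_Id = "nfs.ganesha-1";
            # Exports that share the same cmount_path (plus user and fs)
            # reuse a single cephfs client mount instead of creating one
            # cephfs client per export, which is what drops the memory use.
            cmount_path = "/";
        }
    }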
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 7.0 Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:7780