Bug 2280364 - [NFS-Ganesha][v3] On scale cluster, ganesha memory not getting freed up (~ 9.4 GB ) post completing test and cleanup
Summary: [NFS-Ganesha][v3] On scale cluster, ganesha memory not getting freed up (~ 9....
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: NFS-Ganesha
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 8.0
Assignee: Frank Filz
QA Contact: Manisha Saini
URL:
Whiteboard:
Depends On: 2265322 2320396
Blocks: 2317218
 
Reported: 2024-05-14 15:19 UTC by Kaleb KEITHLEY
Modified: 2024-11-25 09:01 UTC
CC: 11 users

Fixed In Version: nfs-ganesha-6.0-5.el9cp
Doc Type: Bug Fix
Doc Text:
.All memory consumed by the configuration reload process is now released

Previously, reloading exports did not release all the memory consumed by the configuration reload process, causing the memory footprint to increase. With this fix, all memory consumed by the configuration reload process is released, resulting in a reduced memory footprint.
Clone Of: 2265322
Environment:
Last Closed: 2024-11-25 09:01:26 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHCEPH-9012 0 None None None 2024-05-14 15:24:36 UTC
Red Hat Product Errata RHBA-2024:10216 0 None None None 2024-11-25 09:01:31 UTC

Description Kaleb KEITHLEY 2024-05-14 15:19:34 UTC
+++ This bug was initially created as a clone of Bug #2265322 +++

Description of problem:
=================

While running the scale test on 500 exports in parallel from 100 clients (v3), memory spiked up to 10.3 GB. After the tests completed, I performed rm -rf on all the mount points and deleted all the exports, but the memory consumption was still around 9.4 GB. The setup has been idle for the last 2 hours and this memory is not being released.

I performed the same test with v4 mounts and the maximum memory consumption was 3-4 GB. With v3, the memory consumption is more than double.
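For reference, the cleanup step described above amounts to roughly the following sketch. Mount-point paths are placeholders, jq is assumed to be available to parse the JSON list, and the exact ceph nfs export syntax can differ between releases (check ceph nfs export --help):

# On each client: delete the data and unmount every export.
for m in /mnt/export_*; do
    rm -rf "${m:?}"/*
    umount "$m"
done

# On the Ceph admin node: remove every export from the cluster.
for p in $(ceph nfs export ls cephfs-nfs | jq -r '.[]'); do
    ceph nfs export rm cephfs-nfs "$p"
done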


Scale test configuration
------------------------

Exports : 500
Clients : 100 (RHEL clients)
Exports per client : 5
Version : v3
IO tool : FIO
HA configured : yes (all exports were mounted using the same VIP)


Memory usage post running the test - 10.3GB
===========================================

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
263349 root 20 0 18.2g 10.3g 22784 S 18.7 8.3 356:17.47 ganesha.nfsd


Memory consumption post cleanup - 9.4GB
=======================================

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 263349 root      20   0   18.0g   9.4g  22864 S  19.3   7.5 399:17.32 ganesha.nfsd
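For reference, the RES figures above can be sampled one-shot on the node hosting the daemon; the process-name match is illustrative:

# Resident set size (RSS, in KiB) and virtual size of ganesha.nfsd
pid=$(pgrep -x ganesha.nfsd)
ps -o pid,rss,vsz,comm -p "$pid"

# The same figure straight from /proc
grep VmRSS /proc/"$pid"/status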



Version-Release number of selected component (if applicable):
===========================

# ceph --version
ceph version 18.2.1-33.el9cp (bb22b0dcc4808ae828c6c8266cb1e9bec86f3a8d) reef (stable)

# rpm -qa | grep nfs
libnfsidmap-2.5.4-20.el9.x86_64
nfs-utils-2.5.4-20.el9.x86_64
nfs-ganesha-selinux-5.7-1.el9cp.noarch
nfs-ganesha-5.7-1.el9cp.x86_64
nfs-ganesha-rgw-5.7-1.el9cp.x86_64
nfs-ganesha-ceph-5.7-1.el9cp.x86_64
nfs-ganesha-rados-grace-5.7-1.el9cp.x86_64
nfs-ganesha-rados-urls-5.7-1.el9cp.x86_64


How reproducible:
============
1/1


Steps to Reproduce:
==================

1. Configure a Ganesha cluster on the Ceph cluster.
2. Create 500 exports.
3. Mount all the exports on 100 RHEL clients via the v3 protocol.
4. Trigger FIO from all clients on the 500 exports (see the sketch below).
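A rough sketch of these steps, assuming a CephFS-backed NFS cluster named cephfs-nfs and the VIP shown in the cluster info below. Export paths, mount points, and FIO parameters are placeholders, and the exact ceph nfs export create syntax varies between releases:

# Admin node: create 500 CephFS exports.
for i in $(seq 1 500); do
    # (assumes /dir_${i} already exists in the CephFS volume)
    ceph nfs export create cephfs --cluster-id cephfs-nfs \
        --pseudo-path /export_${i} --fsname cephfs --path /dir_${i}
done

# Each client: mount its slice of exports over NFSv3 via the VIP, then drive FIO.
for i in 1 2 3 4 5; do
    mkdir -p /mnt/export_${i}
    mount -t nfs -o vers=3 10.8.130.236:/export_${i} /mnt/export_${i}
    fio --name=scale_${i} --directory=/mnt/export_${i} \
        --rw=write --bs=1M --size=1G --numjobs=4 &
done
wait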

Actual results:
==============
Memory consumption remains high after cleanup

Expected results:
===============
Memory consumption should drop after cleanup is performed


Additional info:
================

Test Logs:
--------
http://magna002.ceph.redhat.com/ceph-qe-logs/msaini/scale-500-exports-100-clients-fio-v3/

[ceph: root@cali013 /]# ceph nfs cluster ls
[
  "cephfs-nfs"
]

[ceph: root@cali013 /]# ceph nfs cluster info cephfs-nfs
{
  "cephfs-nfs": {
    "backend": [
      {
        "hostname": "cali015",
        "ip": "10.8.130.15",
        "port": 12049
      },
      {
        "hostname": "cali016",
        "ip": "10.8.130.16",
        "port": 12049
      }
    ],
    "monitor_port": 9049,
    "port": 2049,
    "virtual_ip": "10.8.130.236"
  }
}

[ceph: root@cali013 /]# ceph nfs export ls cephfs-nfs
[]
[ceph: root@cali013 /]#

--- Additional comment from Frank Filz on 2024-02-28 23:12:30 UTC ---

I just did some runs using FSAL_VFS and see the same behavior. It's actually pretty much the same with an NFSv4 mount.

I did confirm that the file descriptors used for the files get closed when the unexport happens (we expect them not to get closed when the NFSv3 unmount occurs, and in fact, if the files are deleted locally rather than via NFS, they don't get closed either).
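For anyone re-checking this, a quick external way to watch the daemon's open file descriptors before and after the unexport:

# Count the fds currently held by ganesha.nfsd
pid=$(pgrep -x ganesha.nfsd)
ls /proc/"$pid"/fd | wc -l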

I did see a small decrease in memory use after the unexport.

I also ran with the latest V6-dev.6 code plus a patch that fixes a DRC memory leak and saw no improvement.

I also added some logging and verified that the MDCACHE entries are released for each file (I ran with one client and one export, creating 10,000 files).

So more investigation is needed into what is occupying the memory.

--- Additional comment from Frank Filz on 2024-03-01 01:01:42 UTC ---

I have done some debugging using valgrind on my FSAL_VFS setup. I see no radical memory leaks (I actually DID find a couple of small memory leaks - fixes posted). Valgrind massif shows no significant memory in use (other than the 50 MB of hash tables we always have) once the exports are removed.

I think this is a situation where, due to the way memory is utilized, we simply cannot reduce the memory footprint even with malloc_trim.

I'm at a loss for what we could do.
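For anyone reproducing this kind of analysis against a local, non-containerized FSAL_VFS instance, the invocations are roughly as follows; the config path and output locations are illustrative:

# Leak check with memcheck against a foreground ganesha
valgrind --leak-check=full --show-leak-kinds=definite,indirect \
    /usr/bin/ganesha.nfsd -F -L /tmp/ganesha-memcheck.log -f /etc/ganesha/ganesha.conf

# Heap profiling with massif; inspect the result with ms_print
valgrind --tool=massif --massif-out-file=/tmp/massif.out \
    /usr/bin/ganesha.nfsd -F -f /etc/ganesha/ganesha.conf
ms_print /tmp/massif.out | less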

--- Additional comment from Manisha Saini on 2024-03-27 07:43:11 UTC ---

Hi Frank, do we have an RCA for the memory issue?

We've noticed that the memory usage is significantly higher on the v3 mount compared to the v4 mount.

Isn't the memory usage high on the idle setup (after completing the tests and deleting the files on the mounts) for v3?

--- Additional comment from Frank Filz on 2024-03-27 16:29:08 UTC ---

Please try with nfs-ganesha-5.7-2.el9cp

There is a fix:

a8d097edd210b7be14eb813e3eaf8fb503a6f708 FSAL's state_free function called by free_state doesn't actually free

That very likely is the (or at least a major) cause of memory growth.

There are some other fixes that may also have an impact.

But note also that while I was able to replicate the issue, valgrind memcheck showed no significant memory leaks.
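Before re-testing, the running build can be confirmed on the NFS nodes roughly as follows (whether the fix is called out in the changelog depends on how the package was built):

# Inside the NFS container / on the node carrying the package
rpm -q nfs-ganesha
rpm -q --changelog nfs-ganesha | head -n 20

# From the admin node, check which container image the nfs daemons are running
ceph orch ps --daemon-type nfs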

--- Additional comment from RHEL Program Management on 2024-03-28 07:25:28 UTC ---

This bug is not attached to an Errata Tool advisory, so it is reverted to MODIFIED. Please attach this bug to an advisory before moving this bug to ON_QA.

--- Additional comment from  on 2024-03-28 14:40:41 UTC ---

Attaching this bug to the 7.1 errata advisory; need to re-target from 7.2 to 7.1.

Thomas

--- Additional comment from errata-xmlrpc on 2024-03-28 14:42:21 UTC ---

Bug report changed to ON_QA status by Errata System.
A QE request has been submitted for advisory RHBA-2024:126567-01
https://errata.engineering.redhat.com/advisory/126567

--- Additional comment from errata-xmlrpc on 2024-03-28 14:42:28 UTC ---

This bug has been added to advisory RHBA-2024:126567 by Thomas Serlin (tserlin)

--- Additional comment from Manisha Saini on 2024-04-02 02:13:16 UTC ---

(In reply to Frank Filz from comment #4)
> Please try with nfs-ganesha-5.7-2.el9cp
> 
> There is a fix:
> 
> a8d097edd210b7be14eb813e3eaf8fb503a6f708 FSAL's state_free function called
> by free_state doesn't actually free
> 
> That very likely is the (or at least a major) cause of memory growth.
> 
> There are some other fixes that may impact also.
> 
> But note also that while I was able to replicate the issue, valgrind
> memcheck showed no significant memory leaks.

Hi Frank,

With the latest build, nfs-ganesha-5.7-2, I reran the test with 1000 exports and 100 clients using v3 mounts and FIO.

I am observing the same high memory usage of the NFS daemon after performing cleanup: 12.5 GB.

Disk usage after completing I/O: 43 TiB used, 25 TiB / 69 TiB avail
===========

# ceph -s
  cluster:
    id:     4e687a60-638e-11ee-8772-b49691cee574
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum cali013 (age 2d)
    mgr: cali013.qakwdk(active, since 2d), standbys: cali016.rhribl, cali015.hvvbwh
    mds: 1/1 daemons up, 1 standby
    osd: 35 osds: 28 up (since 2d), 28 in (since 4d)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   9 pools, 1233 pgs
    objects: 3.78M objects, 14 TiB
    usage:   43 TiB used, 25 TiB / 69 TiB avail
    pgs:     1233 active+clean

  io:
    client:   170 B/s rd, 62 MiB/s wr, 0 op/s rd, 92 op/s wr

Memory usage post completing IO's and before cleanup 
==========

Node 1:
—-------

MiB Swap:   4096.0 total,   4096.0 free,      0.0 used. 103858.8 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  92243 root      20   0   19.8g  13.0g  23996 S  17.0  10.4 416:26.67 ganesha.nfsd

Node 2:
—---

MiB Swap:   4096.0 total,   4096.0 free,      0.0 used.  74121.4 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  50503 root      20   0 2962844 218312  23724 S   0.0   0.2   8:50.37 ganesha.nfsd



[ceph: root@cali013 /]# rpm -qa | grep nfs
libnfsidmap-2.5.4-20.el9.x86_64
nfs-utils-2.5.4-20.el9.x86_64
nfs-ganesha-selinux-5.7-2.el9cp.noarch
nfs-ganesha-5.7-2.el9cp.x86_64
nfs-ganesha-rgw-5.7-2.el9cp.x86_64
nfs-ganesha-ceph-5.7-2.el9cp.x86_64
nfs-ganesha-rados-grace-5.7-2.el9cp.x86_64
nfs-ganesha-rados-urls-5.7-2.el9cp.x86_64


[ceph: root@cali013 /]# ceph --version
ceph version 18.2.1-89.el9cp (926619fe7135cbd6d305b46782ee7ecc7be199a3) reef (stable)


Post cleanup (deleting everything on the exports and deleting all 1000 exports)
==========================================================================

Memory usage 
-----------

Node 1:
----
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  92243 root      20   0   19.5g  12.5g  23996 S   0.0  10.0 596:45.88 ganesha.nfsd

Node 2:
-----

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  50503 root      20   0 2962844 219304  23736 S   0.0   0.2  10:18.34 ganesha.nfsd


Disk usage post deleting exports
------------------------

# ceph -s
  cluster:
    id:     4e687a60-638e-11ee-8772-b49691cee574
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum cali013 (age 3d)
    mgr: cali013.qakwdk(active, since 3d), standbys: cali016.rhribl, cali015.hvvbwh
    mds: 1/1 daemons up, 1 standby
    osd: 35 osds: 28 up (since 3d), 28 in (since 5d)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   9 pools, 1233 pgs
    objects: 316.89k objects, 1.2 TiB
    usage:   3.6 TiB used, 65 TiB / 69 TiB avail
    pgs:     1233 active+clean

  io:
    client:   170 B/s rd, 0 op/s rd, 0 op/s wr


[ceph: root@cali013 /]# ceph nfs cluster ls
[
  "cephfs-nfs"
]
[ceph: root@cali013 /]# ceph nfs export ls cephfs-nfs
[]


# ceph nfs cluster info
{
  "cephfs-nfs": {
    "backend": [
      {
        "hostname": "cali015",
        "ip": "10.8.130.15",
        "port": 12049
      },
      {
        "hostname": "cali016",
        "ip": "10.8.130.16",
        "port": 12049
      }
    ],
    "monitor_port": 9049,
    "port": 2049,
    "virtual_ip": "10.8.130.236"
  }
}


Logs -  
===============
FIO instances running on all 1000 exports in parallel - http://magna002.ceph.redhat.com/ceph-qe-logs/msaini/Automation/scale_linux_v3_1000exports_100clients/fio_logs/Test_nfs_scale_with_fio_0.log 

Cleanup on all exports and deleting 1000 exports post test completion - http://magna002.ceph.redhat.com/ceph-qe-logs/msaini/Automation/scale_linux_v3_1000exports_100clients/delete_exports_and_cleanup/Test_nfs_scale_with_fio_0.log

--- Additional comment from Frank Filz on 2024-04-03 14:37:28 UTC ---

Could you run the test several cycles and report on the memory use during and after each cycle?
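A minimal sketch of how the per-cycle memory could be recorded on each Ganesha node; run_io_phase and run_cleanup_phase are placeholders for the actual mount/FIO and rm -rf/unexport steps:

#!/bin/bash
log=/tmp/ganesha_mem.log

rss_gb() {
    pid=$(pgrep -x ganesha.nfsd)
    awk -v kb="$(ps -o rss= -p "$pid")" 'BEGIN { printf "%.1f", kb / 1024 / 1024 }'
}

for cycle in 1 2 3; do
    run_io_phase        # placeholder: mount exports and run FIO
    echo "cycle ${cycle} post-IO:      $(rss_gb) GiB" >> "$log"
    run_cleanup_phase   # placeholder: rm -rf data, unmount, delete exports
    echo "cycle ${cycle} post-cleanup: $(rss_gb) GiB" >> "$log"
done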

--- Additional comment from Manisha Saini on 2024-04-07 19:40:36 UTC ---

The test is in progress. I will update the results here once it is completed.

--- Additional comment from Manisha Saini on 2024-04-09 02:42:43 UTC ---

Hi Frank,


The test was executed 3 times consecutively, and with each iteration, the memory usage increased by approximately 1 GB. After completing the 3 iterations and returning to an idle state, the NFS daemon consumed 14.6 GB of memory, which is considered high.

Memory utilization details for each run, during I/O and after cleanup:



Iteration 1 
*************************

Node 1: After running IO's -->  13.0 GB
—-------

MiB Swap:   4096.0 total,   4096.0 free,      0.0 used. 103858.8 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  92243 root      20   0   19.8g  13.0g  23996 S  17.0  10.4 416:26.67 ganesha.nfsd


Node 1: After cleanup --> 12.5 GB
-------


    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  92243 root      20   0   19.5g  12.5g  23996 S   0.0  10.0 596:45.88 ganesha.nfsd






Iteration 2 
*************************


Node 1: After running IO's --> 15.7 GB
--------
MiB Swap:   4096.0 total,   4096.0 free,      0.0 used. 100842.1 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  92243 root      20   0   22.0g  15.7g  23996 S  25.0  12.6 935:58.53 ganesha.nfsd



Node 1: After cleanup --> 13.9 GB
-------

MiB Swap:   4096.0 total,   4096.0 free,      0.0 used. 102730.5 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  92243 root      20   0   20.7g  13.9g  23996 S   0.0  11.1   1095:12 ganesha.nfsd




Iteration 3 
*************************

Node 1: After running IO's --> 16.4 GB
------

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  92243 root      20   0   22.4g  16.4g  23996 S   0.0  13.1   1793:14 ganesha.nfsd

Node 1: After cleanup --> 14.6 GB
-------


    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  92243 root      20   0   21.4g  14.6g  23996 S   0.3  11.7   1793:46 ganesha.nfsd

--- Additional comment from Matt Benjamin (redhat) on 2024-04-10 15:31:09 UTC ---

Is this really a leak, though? Can either VIRT or RES be induced to rise to an arbitrary level (ranging up to 100 GB, for example, as we observed with the object cacher)?

thanks!

Matt

--- Additional comment from Frank Filz on 2024-04-10 17:26:57 UTC ---

I think we may leak some small bits for each export during each export reload.

The way exports are configured and removed, we reload the export configuration 500 times during export setup and 500 times during unload. Since each reload processes every export currently defined, that is N * (N+1)/2 export loads in each direction, so roughly 250k structures potentially lost each test cycle.

I did fix one set of leaks in V5.7-2 (which is being used per the bug details above), but valgrind showed some possibly-lost memory that I didn't get a chance to chase down. If there really is a leak there, 250k structures per cycle does add up to something, though a gigabyte of growth means about 4 KB of memory per such structure... hmm... that's a page... Heap fragmentation might be a culprit here...
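To make the arithmetic concrete (assuming every export add or remove triggers a reload of all exports currently defined):

N=500
echo $(( N * (N + 1) / 2 ))              # 125250 export loads during setup
echo $(( N * (N + 1) ))                  # 250500 loads for setup + teardown (~250k)
echo $(( 1024 * 1024 * 1024 / 250500 ))  # ~4286 bytes (~4 KiB, one page) per load to account for ~1 GiB of growth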

--- Additional comment from Manisha Saini on 2024-04-12 05:09:32 UTC ---

(In reply to Matt Benjamin (redhat) from comment #13)
> Is this really a leak, though?  can the either  virt or res be induced to
> rise to an arbitrary level (ranging up to 100GB, for example, as we observed
> with the object cacher)?
> 
> thanks!
> 
> Matt



Is it typical for memory consumption to increase with each run and for memory not to be released after cleanup? This behavior hasn't been observed with NFS v4.
The problem we encountered, where 100GB was consumed in 7.0, was related to the smallfile IO tool. However, we're unable to run the smallfile tool due to the existing bug - https://bugzilla.redhat.com/show_bug.cgi?id=2247762.

--- Additional comment from Manisha Saini on 2024-04-23 18:19:04 UTC ---

As suggested, I ran the same test for NFSv4.1 (2 iterations) with the 7.1 build and observed the same memory growth (as we saw for NFSv3) in the last run.


Below are the stats for 2 Iterations for NFSv4.1 --

First Iteration
===========
Post running IO’s -
Node 1: 10.8g      |   Node 2: 13.0g

Post cleanup- 
Node 1: 9.7g       |    Node 2: 11.1g


2nd Iteration
===========
Post running IO’s
Node 1: 13.1g      |   Node 2: 14.5g

Post cleanup:
Node 1: 11.7g.     |  Node 2: 13.4g



It appears that the problem is not limited to NFSv3. It is also observed with NFSv4.

--- Additional comment from Manisha Saini on 2024-05-02 12:24:48 UTC ---

I ran the test again on the scratch build provided by development, which includes additional memory fixes.

[ceph: root@cali013 /]# ceph --version
ceph version 18.2.1-150.el9cp (4a63dafcc8b87d799b599d01d34a419e85212ed1) reef (stable)


[ceph: root@cali013 /]# rpm -qa | grep nfs
libnfsidmap-2.5.4-20.el9.x86_64
nfs-utils-2.5.4-20.el9.x86_64
nfs-ganesha-selinux-5.7-3.0.TEST.ffilz20240422.el9cp.noarch
nfs-ganesha-5.7-3.0.TEST.ffilz20240422.el9cp.x86_64
nfs-ganesha-rgw-5.7-3.0.TEST.ffilz20240422.el9cp.x86_64
nfs-ganesha-ceph-5.7-3.0.TEST.ffilz20240422.el9cp.x86_64
nfs-ganesha-rados-grace-5.7-3.0.TEST.ffilz20240422.el9cp.x86_64
nfs-ganesha-rados-urls-5.7-3.0.TEST.ffilz20240422.el9cp.x86_64

Here are the results of 3 iterations:

Exports : 1000
Clients : 100
Mount version : v3


First run
----------
Memory consumption after IO operations: 13.3g
Memory consumption after cleanup: 12.1g


Second run
------------
Memory consumption after IO operations: 15.5g
Memory consumption after cleanup: 14.0g


Third run
-------------
Memory consumption after IO operations: 16.3g
Memory consumption after cleanup: 15.0g

--- Additional comment from Scott Ostapovicz on 2024-05-13 20:10:04 UTC ---

So as I read this, Frank has made real progress in fixing some memory leaks here, and has confirmed this with Valgrind.  Manisha has confirmed that the remaining leakage affects both NFS v3 and NFS v4.  Might I suggest we accept the forward progress we have made in 7.1, mark this specific BZ as fixed (since it addressed the original problems, but not ALL problems), note in the errata that it is better than ever but still a work in progress, and then clone this issue for 7.1 z1 so Frank can continue his work addressing memory leakage as we constantly move the quality forward?

Comment 9 errata-xmlrpc 2024-11-25 09:01:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 8.0 security, bug fix, and enhancement updates), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:10216

