Bug 2265322

Summary: [NFS-Ganesha][v3] On scale cluster, ganesha memory not getting freed up (~ 9.4 GB ) post completing test and cleanup
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Manisha Saini <msaini>
Component: NFS-Ganesha    Assignee: Frank Filz <ffilz>
Status: CLOSED ERRATA QA Contact: Manisha Saini <msaini>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 7.1CC: akraj, cephqe-warriors, ffilz, gouthamr, kkeithle, mbenjamin, sostapov, tserlin, vdas
Target Milestone: ---    Keywords: Automation
Target Release: 7.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: nfs-ganesha-5.7-4.el9cp Doc Type: Bug Fix
Doc Text:
.All memory consumed by the configuration reload process is now released
Previously, reloading exports would not release all the memory consumed by the configuration reload process, causing the memory footprint to increase. With this fix, all memory consumed by the configuration reload process is released, resulting in a reduced memory footprint.
Story Points: ---
Clone Of:
: 2280364    Environment:
Last Closed: 2024-06-13 14:27:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2280364    

Description Manisha Saini 2024-02-21 13:19:43 UTC
Description of problem:
=================

While running the scale test with 500 exports in parallel from 100 clients (v3), memory spiked up to 10.3 GB. After the tests completed, I performed rm -rf on all the mount points and deleted all the exports, but the memory consumption was still around 9.4 GB. The setup has been idle for the last 2 hours and this memory is not getting released.

Performed the same test with v4 mounts and the maximum memory consumption was 3-4 GB. In the case of v3, the memory consumption is more than double.


Scale test configuration
------------------------

Exports : 500
Clients : 100 (RHEL clients)
Exports per client : 5
Version : v3
IO Tool : FIO
HA configured : yes (all exports were mounted using the same VIP)


Memory usage post running the test - 10.3GB
===========================================

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
263349 root 20 0 18.2g 10.3g 22784 S 18.7 8.3 356:17.47 ganesha.nfsd


Memory consumption post cleanup - 9.4GB
=======================================

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 263349 root      20   0   18.0g   9.4g  22864 S  19.3   7.5 399:17.32 ganesha.nfsd



Version-Release number of selected component (if applicable):
===========================

# ceph --version
ceph version 18.2.1-33.el9cp (bb22b0dcc4808ae828c6c8266cb1e9bec86f3a8d) reef (stable)

# rpm -qa | grep nfs
libnfsidmap-2.5.4-20.el9.x86_64
nfs-utils-2.5.4-20.el9.x86_64
nfs-ganesha-selinux-5.7-1.el9cp.noarch
nfs-ganesha-5.7-1.el9cp.x86_64
nfs-ganesha-rgw-5.7-1.el9cp.x86_64
nfs-ganesha-ceph-5.7-1.el9cp.x86_64
nfs-ganesha-rados-grace-5.7-1.el9cp.x86_64
nfs-ganesha-rados-urls-5.7-1.el9cp.x86_64


How reproducible:
============
1/1


Steps to Reproduce:
==================

1. Configure a ganesha cluster on the ceph cluster
2. Create 500 exports
3. Mount all the exports on 100 RHEL clients via the v3 protocol (5 exports per client)
4. Trigger FIO from all clients on all 500 exports
5. After the runs complete, delete the data on the mounts, unmount, and delete all the exports (see the command sketch below)
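
A minimal command sketch for steps 2-5 (assuming the cephfs-nfs cluster and VIP shown in the additional info below; the export create/delete syntax follows the Reef-era keyword form, and the fsname, pseudo paths, mount points, and FIO parameters are illustrative only):

# On the ceph admin node: create the exports (step 2)
for i in $(seq 1 500); do
  ceph nfs export create cephfs --cluster-id cephfs-nfs \
      --pseudo-path /scale_export_${i} --fsname cephfs
done

# On each client: mount its share of exports via NFSv3 (step 3) and run FIO (step 4)
mount -t nfs -o vers=3 10.8.130.236:/scale_export_${i} /mnt/scale_export_${i}
fio --name=scale_write --directory=/mnt/scale_export_${i} \
    --rw=write --bs=1M --size=10G --numjobs=4 --group_reporting

# Cleanup (step 5): remove the data, unmount, then delete the exports
rm -rf /mnt/scale_export_${i}/*
umount /mnt/scale_export_${i}
ceph nfs export rm cephfs-nfs /scale_export_${i}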

Actual results:
==============
Memory consumption remains high after cleanup

Expected results:
===============
Memory consumption should drop after cleanup is performed


Additional info:
================

Test Logs:
--------
http://magna002.ceph.redhat.com/ceph-qe-logs/msaini/scale-500-exports-100-clients-fio-v3/

[ceph: root@cali013 /]# ceph nfs cluster ls
[
  "cephfs-nfs"
]

[ceph: root@cali013 /]# ceph nfs cluster info cephfs-nfs
{
  "cephfs-nfs": {
    "backend": [
      {
        "hostname": "cali015",
        "ip": "10.8.130.15",
        "port": 12049
      },
      {
        "hostname": "cali016",
        "ip": "10.8.130.16",
        "port": 12049
      }
    ],
    "monitor_port": 9049,
    "port": 2049,
    "virtual_ip": "10.8.130.236"
  }
}

[ceph: root@cali013 /]# ceph nfs export ls cephfs-nfs
[]
[ceph: root@cali013 /]#

Comment 1 Frank Filz 2024-02-28 23:12:30 UTC
I just did some runs using FSAL_VFS and noted the same behavior. It's actually pretty much the same with an NFSv4 mount.

I did confirm that the file descriptors used for the files get closed when the unexport happens (we expect them to not get closed when the NFSv3 unmount occurs, and in fact, if the files are deleted locally, not via NFS, they don't get closed either).
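
(For reference, a simple generic Linux check of this, not a Ganesha-specific tool, is to count the descriptors held by the process before and after the unexport:)

# Count open file descriptors held by ganesha.nfsd (run as root)
ls /proc/$(pidof ganesha.nfsd)/fd | wc -l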

I did see a small decrease in memory use after the unexport.

I also ran with the latest V6-dev.6 code plus a patch that fixes a DRC memory leak and saw no improvement.

I also added some logging and verified that the MDCACHE entries are released for each file (I ran with one client and one export, creating 10,000 files).

So more investigation is needed into what is occupying the memory.

Comment 2 Frank Filz 2024-03-01 01:01:42 UTC
I have done some debugging using valgrind on my FSAL_VFS setup. I see no radical memory leaks (I actually DID find a couple of small memory leaks - fixes posted). Valgrind massif shows no significant memory in use (other than the 50 MB of hash tables which we always have) once the exports are removed.
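
An illustrative sketch of such a massif run (a generic invocation, not the exact command used here; paths need adjusting, and running Ganesha under valgrind in a cephadm-managed container requires extra steps not shown):

# Start ganesha.nfsd in the foreground under massif (illustrative)
valgrind --tool=massif --pages-as-heap=yes \
    /usr/bin/ganesha.nfsd -F -L /var/log/ganesha/ganesha.log -f /etc/ganesha/ganesha.conf

# After the test cycle, inspect the heap snapshots
ms_print massif.out.<pid>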

I think this is a situation where, due to the way memory is utilized, we simply cannot reduce the memory footprint even with malloc_trim.

I'm at a loss for what we could do.

Comment 3 Manisha Saini 2024-03-27 07:43:11 UTC
Hi Frank, do we have an RCA for the memory issue?

We've noticed that the memory usage is significantly higher on the v3 mount compared to the v4 mount.

Isn't the memory usage still high on the idle setup (after completing the tests and deleting the files on the mounts) for v3?

Comment 4 Frank Filz 2024-03-27 16:29:08 UTC
Please try with nfs-ganesha-5.7-2.el9cp

There is a fix:

a8d097edd210b7be14eb813e3eaf8fb503a6f708 FSAL's state_free function called by free_state doesn't actually free

That very likely is the (or at least a major) cause of memory growth.

There are some other fixes that may impact also.

But note also that while I was able to replicate the issue, valgrind memcheck showed no significant memory leaks.

Comment 9 Manisha Saini 2024-04-02 02:13:16 UTC
(In reply to Frank Filz from comment #4)
> Please try with nfs-ganesha-5.7-2.el9cp
> 
> There is a fix:
> 
> a8d097edd210b7be14eb813e3eaf8fb503a6f708 FSAL's state_free function called
> by free_state doesn't actually free
> 
> That very likely is the (or at least a major) cause of memory growth.
> 
> There are some other fixes that may impact also.
> 
> But note also that while I was able to replicate the issue, valgrind
> memcheck showed no significant memory leaks.

Hi Frank,

With the latest build (nfs-ganesha-5.7-2), I reran the test with 1000 exports and 100 clients with v3 mounts using FIO.

Observing the same high memory usage of the NFS daemon after performing cleanup --> 12.5g

Disk usage post completing IO's -> 43 TiB used, 25 TiB / 69 TiB avail
===========

# ceph -s
  cluster:
    id:     4e687a60-638e-11ee-8772-b49691cee574
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum cali013 (age 2d)
    mgr: cali013.qakwdk(active, since 2d), standbys: cali016.rhribl, cali015.hvvbwh
    mds: 1/1 daemons up, 1 standby
    osd: 35 osds: 28 up (since 2d), 28 in (since 4d)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   9 pools, 1233 pgs
    objects: 3.78M objects, 14 TiB
    usage:   43 TiB used, 25 TiB / 69 TiB avail
    pgs:     1233 active+clean

  io:
    client:   170 B/s rd, 62 MiB/s wr, 0 op/s rd, 92 op/s wr

Memory usage post completing IO's and before cleanup 
==========

Node 1:
—-------

MiB Swap:   4096.0 total,   4096.0 free,      0.0 used. 103858.8 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  92243 root      20   0   19.8g  13.0g  23996 S  17.0  10.4 416:26.67 ganesha.nfsd

Node 2:
—---

MiB Swap:   4096.0 total,   4096.0 free,      0.0 used.  74121.4 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  50503 root      20   0 2962844 218312  23724 S   0.0   0.2   8:50.37 ganesha.nfsd



[ceph: root@cali013 /]# rpm -qa | grep nfs
libnfsidmap-2.5.4-20.el9.x86_64
nfs-utils-2.5.4-20.el9.x86_64
nfs-ganesha-selinux-5.7-2.el9cp.noarch
nfs-ganesha-5.7-2.el9cp.x86_64
nfs-ganesha-rgw-5.7-2.el9cp.x86_64
nfs-ganesha-ceph-5.7-2.el9cp.x86_64
nfs-ganesha-rados-grace-5.7-2.el9cp.x86_64
nfs-ganesha-rados-urls-5.7-2.el9cp.x86_64


[ceph: root@cali013 /]# ceph --version
ceph version 18.2.1-89.el9cp (926619fe7135cbd6d305b46782ee7ecc7be199a3) reef (stable)


Post cleanup (deleting everything on the exports and deleting all 1000 exports)
==========================================================================

Memory usage 
-----------

Node 1:
----
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  92243 root      20   0   19.5g  12.5g  23996 S   0.0  10.0 596:45.88 ganesha.nfsd

Node 2:
-----

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  50503 root      20   0 2962844 219304  23736 S   0.0   0.2  10:18.34 ganesha.nfsd


Disk usage post deleting exports
------------------------

# ceph -s
  cluster:
    id:     4e687a60-638e-11ee-8772-b49691cee574
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum cali013 (age 3d)
    mgr: cali013.qakwdk(active, since 3d), standbys: cali016.rhribl, cali015.hvvbwh
    mds: 1/1 daemons up, 1 standby
    osd: 35 osds: 28 up (since 3d), 28 in (since 5d)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   9 pools, 1233 pgs
    objects: 316.89k objects, 1.2 TiB
    usage:   3.6 TiB used, 65 TiB / 69 TiB avail
    pgs:     1233 active+clean

  io:
    client:   170 B/s rd, 0 op/s rd, 0 op/s wr


[ceph: root@cali013 /]# ceph nfs cluster ls
[
  "cephfs-nfs"
]
[ceph: root@cali013 /]# ceph nfs export ls cephfs-nfs
[]


# ceph nfs cluster info
{
  "cephfs-nfs": {
    "backend": [
      {
        "hostname": "cali015",
        "ip": "10.8.130.15",
        "port": 12049
      },
      {
        "hostname": "cali016",
        "ip": "10.8.130.16",
        "port": 12049
      }
    ],
    "monitor_port": 9049,
    "port": 2049,
    "virtual_ip": "10.8.130.236"
  }
}


Logs -  
===============
FIO instances running on all 1000 exports in parallel - http://magna002.ceph.redhat.com/ceph-qe-logs/msaini/Automation/scale_linux_v3_1000exports_100clients/fio_logs/Test_nfs_scale_with_fio_0.log 

Cleanup on all exports and deleting 1000 exports post test completion - http://magna002.ceph.redhat.com/ceph-qe-logs/msaini/Automation/scale_linux_v3_1000exports_100clients/delete_exports_and_cleanup/Test_nfs_scale_with_fio_0.log

Comment 10 Frank Filz 2024-04-03 14:37:28 UTC
Could you run the test several cycles and report on the memory use during and after each cycle?
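
For example, something like this generic loop could be left running on each ganesha node to sample the resident set size during and between cycles (the interval and output path are arbitrary):

# Sample ganesha.nfsd RSS (KiB) once a minute with a timestamp
while true; do
  echo "$(date -Is) $(ps -o rss= -p "$(pidof ganesha.nfsd)")"
  sleep 60
done >> /tmp/ganesha_rss.log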

Comment 11 Manisha Saini 2024-04-07 19:40:36 UTC
The test is in progress. I will update the results here once it is completed.

Comment 12 Manisha Saini 2024-04-09 02:42:43 UTC
Hi Frank,


The test was executed 3 times consecutively, and with each iteration, the memory usage increased by approximately 1 GB. After completing the 3 iterations and returning to an idle state, the NFS daemon consumed 14.6 GB of memory, which is considered high.

Memory utilization details for each run, during IO's and post cleanup:



Iteration 1 
*************************

Node 1: After running IO's -->  13.0 GB
—-------

MiB Swap:   4096.0 total,   4096.0 free,      0.0 used. 103858.8 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  92243 root      20   0   19.8g  13.0g  23996 S  17.0  10.4 416:26.67 ganesha.nfsd


Node 1: After cleanup --> 12.5 GB
-------


    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  92243 root      20   0   19.5g  12.5g  23996 S   0.0  10.0 596:45.88 ganesha.nfsd






Iteration 2 
*************************


Node 1: After running IO's --> 15.7 GB
--------
MiB Swap:   4096.0 total,   4096.0 free,      0.0 used. 100842.1 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  92243 root      20   0   22.0g  15.7g  23996 S  25.0  12.6 935:58.53 ganesha.nfsd



Node 1: After cleanup --> 13.9 GB
-------

MiB Swap:   4096.0 total,   4096.0 free,      0.0 used. 102730.5 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  92243 root      20   0   20.7g  13.9g  23996 S   0.0  11.1   1095:12 ganesha.nfsd




Iteration 3 
*************************

Node 1: After running IO's --> 16.4 GB
------

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  92243 root      20   0   22.4g  16.4g  23996 S   0.0  13.1   1793:14 ganesha.nfsd

Node 1: After cleanup --> 14.6 GB
-------


    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  92243 root      20   0   21.4g  14.6g  23996 S   0.3  11.7   1793:46 ganesha.nfsd

Comment 14 Frank Filz 2024-04-10 17:26:57 UTC
I think we may leak some small bits for each export during each export re-load.

The way exports are configured and removed, we reload the export configuration 500 times during export setup and 500 times during teardown. Each reload processes every export currently configured, so that is roughly N * (N+1) / 2 export loads on the way up and the same again on the way down; with N = 500 that is some 250k structures potentially lost each test cycle.

I did fix one set of leaks in V5.7-2 (which is being used per the bug details above), but valgrind showed some possibly-lost memory that I didn't get a chance to chase down. If there really is a leak there, 250k structures per cycle does add up to something, though a gigabyte spread over 250k structures works out to about 4 KB per structure... hmm... that's a page... Heap fragmentation might be a culprit here...
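
To make the arithmetic concrete (a back-of-the-envelope sketch, not a measurement):

# N exports added and removed one at a time; each add/remove reloads the
# whole current export list, so roughly N*(N+1)/2 loads each way.
n=500
loads=$(( n * (n + 1) ))                                 # ~250,500 export loads per cycle
echo "export loads per cycle: ${loads}"
echo "bytes per load for 1 GiB growth: $(( 1024**3 / loads ))"   # ~4 KiB, i.e. one page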

Comment 16 Manisha Saini 2024-04-23 18:19:04 UTC
As suggested, I ran the same test for NFSv4.1 (2 iterations) with the 7.1 build, and observed the same memory growth (as we saw for NFSv3) with the latest run.


Below are the stats for the 2 iterations with NFSv4.1:

First Iteration
===========
Post running IO’s -
Node 1: 10.8g      |   Node 2: 13.0g

Post cleanup- 
Node 1: 9.7g       |    Node 2: 11.1g


2nd Iteration
===========
Post running IO’s
Node 1: 13.1g      |   Node 2: 14.5g

Post cleanup:
Node 1: 11.7g.     |  Node 2: 13.4g



It appears that the problem is not limited to NFSv3. It is also observed with NFSv4.

Comment 18 Scott Ostapovicz 2024-05-13 20:10:04 UTC
So as I read this, Frank has made real progress in fixing some memory leaks here, and has confirmed this with Valgrind.  Manisha has confirmed that the remaining leakage affects both NFS v3 and NFS v4.  Might I suggest we accept the forward progress we have made in 7.1, mark this specific BZ as fixed (since it addressed the original problems, but not ALL problems), note in the errata that it is better than ever but still a work in progress, and then clone this issue for 7.1 z1 so Frank can continue his work addressing memory leakage as we constantly move the quality forward?

Comment 23 errata-xmlrpc 2024-06-13 14:27:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Critical: Red Hat Ceph Storage 7.1 security, enhancements, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:3925