This fix is incomplete: as discussed on Slack, corresponding changes are required on the cephadm side before the fix can be validated. Moving this issue back to the ASSIGNED state until the necessary cephadm fixes are available for QA verification.
Raised a separate BZ for the cephadm-side changes - https://bugzilla.redhat.com/show_bug.cgi?id=2246077. Marking this BZ as blocked until the fix for the cephadm bug is available.
Observation when tested with the fix
========================

The smallfile tool crashed when run in a loop across the 10 exports. Raised BZ - https://bugzilla.redhat.com/show_bug.cgi?id=2247762 for the crash.

After running smallfile for a single iteration on the 10 exports, memory usage peaked at 2 GB. Even after unmounting the NFS shares from the client, deleting the exports, and cleaning up the mount points, memory consumption remained unchanged at 2 GB. Ideally it should drop, since no exports are present at that point.

Tested with:

# rpm -qa | grep nfs
libnfsidmap-2.5.4-18.el9.x86_64
nfs-utils-2.5.4-18.el9.x86_64
nfs-ganesha-selinux-5.6-3.el9cp.noarch
nfs-ganesha-5.6-3.el9cp.x86_64
nfs-ganesha-rgw-5.6-3.el9cp.x86_64
nfs-ganesha-ceph-5.6-3.el9cp.x86_64
nfs-ganesha-rados-grace-5.6-3.el9cp.x86_64
nfs-ganesha-rados-urls-5.6-3.el9cp.x86_64

# ceph --version
ceph version 18.2.0-113.el9cp (32cbda69435c7145d09eeaf5b5016e5d46370a5d) reef (stable)

Logs
======

---> Memory consumption after running a single smallfile iteration on the 10 exports:

Node 1:
************
top - 20:36:42 up 19 days, 1:08, 2 users, load average: 5.30, 5.44, 5.40
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.3 us, 0.9 sy, 0.0 ni, 92.6 id, 4.7 wa, 0.1 hi, 0.4 si, 0.0 st
MiB Mem : 63758.3 total, 622.7 free, 14848.5 used, 50277.7 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 48909.8 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
3420347 root      20   0   15.1g   1.9g   8408 S  37.7   3.1 271:18.13 ganesha.nfsd

[root@argo016 ~]# ps -p 3420347 -o pid,%mem,rss,vsz,cmd
    PID %MEM     RSS      VSZ CMD
3420347  3.0 2000100 15809428 /usr/bin/ganesha.nfsd -F -L STDERR -N NIV_EVENT

Node 2:
*************
top - 20:40:16 up 19 days, 1:15, 1 user, load average: 3.11, 3.04, 2.95
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.0 us, 0.5 sy, 0.0 ni, 92.2 id, 6.1 wa, 0.1 hi, 0.2 si, 0.0 st
MiB Mem : 63770.3 total, 362.6 free, 12475.0 used, 52922.9 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 51295.4 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
2791842 root      20   0 7550652 762672   8288 S  11.7   1.2 109:57.75 ganesha.nfsd

[root@argo018 ~]# ps -p 2791842 -o pid,%mem,rss,vsz,cmd
    PID %MEM    RSS     VSZ CMD
2791842  1.1 762636 7550652 /usr/bin/ganesha.nfsd -F -L STDERR -N NIV_EVENT

==============

---> After deleting everything from the 10 mount points and unmounting the shares from the clients, total memory consumption:

Node 1:
***********
[root@argo016 ~]# ps -p 3420347 -o pid,%mem,rss,vsz,cmd
    PID %MEM     RSS      VSZ CMD
3420347  3.0 1998652 15651116 /usr/bin/ganesha.nfsd -F -L STDERR -N NIV_EVENT

top - 03:13:25 up 19 days, 7:45, 1 user, load average: 2.11, 2.09, 2.09
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.3 us, 0.0 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 63758.3 total, 1352.1 free, 13704.6 used, 50711.2 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 50053.8 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
3420347 root      20   0   14.9g   1.9g   9816 S   0.0   3.1 280:26.78 ganesha.nfsd

Node 2:
*************
top - 03:13:59 up 19 days, 7:48, 1 user, load average: 0.00, 0.02, 0.02
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.1 us, 0.1 sy, 0.0 ni, 99.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 63770.3 total, 1139.2 free, 11710.0 used, 52896.0 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 52060.3 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
2791842 root      20   0 7520840 762044   8080 S   0.0   1.2 111:32.48 ganesha.nfsd

[root@argo018 ~]# ps -p 2791842 -o pid,%mem,rss,vsz,cmd
    PID %MEM    RSS     VSZ CMD
2791842  1.1 762044 7520840 /usr/bin/ganesha.nfsd -F -L STDERR -N NIV_EVENT

===========

---> Deleted all the exports of the ganesha cluster:

# ceph nfs export ls nfsganesha
[
  "/ganesha1",
  "/ganesha2",
  "/ganesha3",
  "/ganesha4",
  "/ganesha5",
  "/ganesha6",
  "/ganesha7",
  "/ganesha8",
  "/ganesha9",
  "/ganesha10",
  "/ganesha11",
  "/ganesha12",
  "/ganesha13",
  "/ganesha14",
  "/ganesha15"
]

# for i in $(seq 1 15); do ceph nfs export rm nfsganesha /ganesha$i; done

# ceph nfs export ls nfsganesha
[]

# ceph nfs cluster ls
[
  "nfsganesha"
]

# ceph nfs cluster info nfsganesha
{
  "nfsganesha": {
    "backend": [
      {
        "hostname": "argo016",
        "ip": "10.8.128.216",
        "port": 2049
      },
      {
        "hostname": "argo018",
        "ip": "10.8.128.218",
        "port": 2049
      }
    ],
    "virtual_ip": null
  }
}

============

---> Total memory consumption after deleting the exports:

Node 1:
******************
[root@argo016 ~]# top -p 3420347
top - 03:20:37 up 19 days, 7:52, 2 users, load average: 2.08, 2.08, 2.09
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 0.1 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 63758.3 total, 1281.4 free, 13762.4 used, 50724.5 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 49995.9 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
3420347 root      20   0   14.9g   1.9g  10224 S   0.3   3.1 280:27.15 ganesha.nfsd

[root@argo016 ~]# ps -p 3420347 -o pid,%mem,rss,vsz,cmd
    PID %MEM     RSS      VSZ CMD
3420347  3.0 1999152 15651116 /usr/bin/ganesha.nfsd -F -L STDERR -N NIV_EVENT

Node 2:
********************
[root@argo018 ~]# ps -p 2791842 -o pid,%mem,rss,vsz,cmd
    PID %MEM    RSS     VSZ CMD
2791842  1.1 762164 7520840 /usr/bin/ganesha.nfsd -F -L STDERR -N NIV_EVENT

The memory usage remains constant and does not decrease as expected.
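To keep watching for a later drop, the RSS can be sampled periodically; a small sketch using the node 1 PID from above (adjust the PID and interval to suit):

# PID=3420347    # ganesha.nfsd PID on argo016
# while true; do date; ps -p "$PID" -o pid,%mem,rss,vsz,cmd; sleep 300; done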
(In reply to Manisha Saini from comment #14)
> The memory usage remains constant and does not decrease as expected.

The malloc_trim(3) call runs on a 30-minute timer that starts when ganesha starts. Depending on how long the test takes to run, you may need to wait up to 30 minutes to see any memory reduction.

You can also watch the ganesha log file (logging at DEBUG level) for a "malloc_trim() released some memory" or "malloc_trim() was not able to release memory" message.
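Those messages can be grepped for directly; a sketch (the log path is an assumption, and on a cephadm deployment the daemon's output may go to the container journal instead):

# grep -E 'malloc_trim\(\) (released some memory|was not able to release memory)' /var/log/ganesha/ganesha.log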
(In reply to Kaleb KEITHLEY from comment #15)
> The malloc_trim(3) call runs on a 30-minute timer that starts when ganesha
> starts. Depending on how long the test takes to run, you may need to wait up
> to 30 minutes to see any memory reduction.

The cluster has been in the same state since yesterday morning. The memory usage remains constant, i.e. 2 GB.

[root@argo016 ~]# date
Tue Nov 7 04:52:29 AM UTC 2023

[root@argo016 ~]# ps -p 3420347 -o pid,%mem,rss,vsz,cmd
    PID %MEM     RSS      VSZ CMD
3420347  3.0 1999836 15651116 /usr/bin/ganesha.nfsd -F -L STDERR -N NIV_EVENT

top - 04:52:56 up 20 days, 9:24, 1 user, load average: 2.09, 2.14, 2.13
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.1 us, 0.0 sy, 0.0 ni, 99.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 63758.3 total, 1626.0 free, 13457.2 used, 50679.6 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 50301.1 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
3420347 root      20   0   14.9g   1.9g  10240 S   0.0   3.1 281:26.46 ganesha.nfsd

> You can also watch the ganesha log file (logging at DEBUG level) for a
> "malloc_trim() released some memory" or "malloc_trim() was not able to
> release memory" message.
(In reply to Manisha Saini from comment #16)
> The cluster has been in the same state since yesterday morning. The memory
> usage remains constant, i.e. 2 GB.

It's never going to go to zero. That's a pretty good steady-state number, and there's probably nothing left to release. Check the logs for the debug-level messages I mentioned.
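One way to test the "nothing left to release" theory directly is to force a trim from outside the process and compare RSS before and after; a debugging sketch only, not a supported procedure (it briefly stops the daemon and needs gdb plus glibc symbols on the node). malloc_trim(3) returns 1 if it returned memory to the OS and 0 otherwise, so the printed result shows whether anything was reclaimable:

# gdb -p 3420347 -batch -ex 'call (int) malloc_trim(0)'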
(In reply to Kaleb KEITHLEY from comment #15)
> You can also watch the ganesha log file (logging at DEBUG level) for a
> "malloc_trim() released some memory" or "malloc_trim() was not able to
> release memory" message.

Unable to enable debug logging. Raised BZ - https://bugzilla.redhat.com/show_bug.cgi?id=2248812
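For reference, on a stock (non-cephadm) nfs-ganesha install the log level is normally raised via the LOG block in ganesha.conf, along the lines of the sketch below; with cephadm the configuration is managed for the containerized daemon (typically pushed via ceph nfs cluster config set), so the mechanics differ, which is what the new BZ is tracking:

LOG {
    # DEBUG (or FULL_DEBUG for more detail) enables the
    # debug-level malloc_trim() messages mentioned above.
    Default_Log_Level = DEBUG;
}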
I'm not sure what the minimum RSS for Ganesha actually is. We did just discover and fix a leak, so it's worth re-running with the latest build to see what effect that fix has on the minimum RSS size.
As noted in the comments above, this BZ describes a healthy process reaching a stable and sustainable plateau of allocated memory. I am closing this issue as NOTABUG.
Is this the right BZ to attach doc text about the cmount_path option?
No, this is the malloc trim BZ...
This one? https://bugzilla.redhat.com/show_bug.cgi?id=2248084
The needinfo request[s] on this closed bug have been removed, as they have been unresolved for 120 days.