Bug 2320396

Summary: [NFS-Ganesha] [V3] Memory leak is observed while mounting the export via v3, causing NFS process crash
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Manisha Saini <msaini>
Component: NFS-Ganesha
Assignee: Sachin Punadikar <spunadik>
Status: CLOSED ERRATA
QA Contact: Manisha Saini <msaini>
Severity: urgent
Priority: unspecified
Version: 8.0
Target Release: 8.0
CC: cephqe-warriors, kkeithle, spunadik, tserlin, vdas
Keywords: Regression
Hardware: Unspecified
OS: Unspecified
Fixed In Version: nfs-ganesha-6.0-8.el9cp; libntirpc-6.0-2.el9cp
Type: Bug
Last Closed: 2024-11-25 09:13:27 UTC
Bug Blocks: 2280364, 2321108

Description Manisha Saini 2024-10-21 16:27:59 UTC
Description of problem:
========================
On a scale cluster with 2000 NFS exports, mounting the first export from a client with vers=3 caused memory usage to climb unexpectedly, crashing the NFS service and failing the mount.

# ls -lart
total 16388
drwxr-xr-x. 7 root root    4096 Jul 17  2023  ..
-rw-r-----. 1 root root 7272427 Oct 21 15:39 'core.ganesha\x2enfsd.0.92b21226dd874ad1bcbf401f0dd547d2.37084.1729525023000000.zst'
-rw-r-----. 1 root root 9486612 Oct 21 15:56 'core.ganesha\x2enfsd.0.92b21226dd874ad1bcbf401f0dd547d2.126145.1729526076000000.zst'


------
warning: Error reading shared library list entry at 0x696e6e75723a5652

warning: Error reading shared library list entry at 0x69626f6e6e613a56
Failed to read a valid object file image from memory.
Core was generated by `/usr/bin/ganesha.nfsd -F -L STDERR -N NIV_EVENT'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007fafa282794c in ?? ()
[Current thread is 1 (LWP 67)]
(gdb) bt
#0  0x00007fafa282794c in ?? ()
Backtrace stopped: Cannot access memory at address 0x7faf33ffdf80
(gdb)
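The "Error reading shared library list entry" warnings point at addresses that appear to decode to ASCII text, which suggests the loader's in-memory link map was overwritten, consistent with runaway allocation corrupting the heap before the abort. A typical way to inspect these cores further (a sketch; it assumes the .zst files were written by systemd-coredump and that matching nfs-ganesha debuginfo packages are installed):

# zstd -d 'core.ganesha\x2enfsd.0.92b21226dd874ad1bcbf401f0dd547d2.37084.1729525023000000.zst'
# gdb /usr/bin/ganesha.nfsd 'core.ganesha\x2enfsd.0.92b21226dd874ad1bcbf401f0dd547d2.37084.1729525023000000'

Alternatively, coredumpctl can locate the dump and launch gdb directly:

# coredumpctl debug ganesha.nfsd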

============================


top output for the Ganesha process after the first failed mount shows memory usage continuing to climb even though the setup was otherwise idle.

[root@cali015 coredump]# top -p 136052
top - 16:21:25 up 34 days, 19:36,  1 user,  load average: 6.62, 2.00, 3.85
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.6 us,  1.7 sy,  0.0 ni, 97.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 127826.4 total,  32772.4 free,  94554.5 used,   3422.0 buff/cache
MiB Swap:   4096.0 total,   3083.7 free,   1012.2 used.  33271.9 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 136052 root      20   0 2034.5g  75.1g  23136 D 128.6  60.1   1:55.43 ganesha.nfsd
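To confirm the growth continues while the cluster is idle, the resident set can be sampled over time (a minimal sketch; 136052 is the PID from the output above):

# while true; do ps -o rss=,vsz= -p 136052; sleep 10; done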

=========================



# ceph nfs cluster info nfsganesha
{
  "nfsganesha": {
    "backend": [
      {
        "hostname": "cali015",
        "ip": "10.8.130.15",
        "port": 12049
      },
      {
        "hostname": "cali016",
        "ip": "10.8.130.16",
        "port": 12049
      }
    ],
    "monitor_port": 9049,
    "port": 2049,
    "virtual_ip": "10.8.130.236"
  }
}
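For reference, an ingress-backed cluster with a virtual IP like the one above is typically created along these lines (a sketch; the placement host list and the /24 prefix are assumptions, and exact flag spelling depends on the Ceph release):

# ceph nfs cluster create nfsganesha "cali015,cali016" --ingress --virtual_ip 10.8.130.236/24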



Version-Release number of selected component (if applicable):


How reproducible:
================
2/2


Steps to Reproduce:
===================
1. Create a Ganesha cluster
2. Create 2000 NFS exports
3. Mount the 1st export on one client via vers=3 (see the command sketch below)
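A concrete rendering of these steps (a sketch only: the pseudo-path scheme, the CephFS filesystem name "cephfs", and the mount point are illustrative, and the argument order of "ceph nfs export create" varies between Ceph releases):

# for i in $(seq 1 2000); do ceph nfs export create cephfs nfsganesha /export_$i cephfs; done

Then, from the client, mount the first export over NFSv3 through the virtual IP:

# mkdir -p /mnt/export_1
# mount -t nfs -o vers=3 10.8.130.236:/export_1 /mnt/export_1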

Actual results:
============
The v3 mount failed, and the memory footprint of ganesha.nfsd grew to ~75 GB (see the top output above).


Expected results:
=============
The v3 mount should succeed, with no memory leak.


Additional info:

Comment 12 errata-xmlrpc 2024-11-25 09:13:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 8.0 security, bug fix, and enhancement updates), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:10216