+++ This bug was initially created as a clone of Bug #2265322 +++

Description of problem:
=======================
While running the scale test on 500 exports in parallel from 100 clients (v3), memory spiked up to 10.3 GB. After the tests completed, we performed rm -rf on all the mount points and deleted all exports, but the memory consumption was still around 9.4 GB. The setup has been idle for the last 2 hours and this memory is not being released.

The same test with v4 mounts peaked at 3-4 GB of memory, so for v3 the memory consumption is more than double.

Scale test configuration
------------------------
Exports            : 500
Clients            : 100 (RHEL clients)
Exports per client : 5
Version            : v3
IO tool            : FIO
HA configured      : yes (all exports were mounted using the same VIP)

Memory usage post running the test - 10.3 GB
============================================
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 263349 root      20   0   18.2g  10.3g  22784 S  18.7   8.3 356:17.47 ganesha.nfsd

Memory consumption post cleanup - 9.4 GB
========================================
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 263349 root      20   0   18.0g   9.4g  22864 S  19.3   7.5 399:17.32 ganesha.nfsd

Version-Release number of selected component (if applicable):
=============================================================
# ceph --version
ceph version 18.2.1-33.el9cp (bb22b0dcc4808ae828c6c8266cb1e9bec86f3a8d) reef (stable)

# rpm -qa | grep nfs
libnfsidmap-2.5.4-20.el9.x86_64
nfs-utils-2.5.4-20.el9.x86_64
nfs-ganesha-selinux-5.7-1.el9cp.noarch
nfs-ganesha-5.7-1.el9cp.x86_64
nfs-ganesha-rgw-5.7-1.el9cp.x86_64
nfs-ganesha-ceph-5.7-1.el9cp.x86_64
nfs-ganesha-rados-grace-5.7-1.el9cp.x86_64
nfs-ganesha-rados-urls-5.7-1.el9cp.x86_64

How reproducible:
=================
1/1

Steps to Reproduce:
===================
1. Configure a ganesha cluster on the ceph cluster.
2. Create 500 exports.
3. Mount all the exports on 100 RHEL clients via the v3 protocol.
4. Trigger FIO from all clients on the 500 exports.
(A shell sketch of these steps follows the first comment below.)

Actual results:
===============
Memory consumption remains high after cleanup.

Expected results:
=================
Memory consumption should drop after cleanup.

Additional info:
================
Test logs:
----------
http://magna002.ceph.redhat.com/ceph-qe-logs/msaini/scale-500-exports-100-clients-fio-v3/

[ceph: root@cali013 /]# ceph nfs cluster ls
[
  "cephfs-nfs"
]

[ceph: root@cali013 /]# ceph nfs cluster info cephfs-nfs
{
  "cephfs-nfs": {
    "backend": [
      {
        "hostname": "cali015",
        "ip": "10.8.130.15",
        "port": 12049
      },
      {
        "hostname": "cali016",
        "ip": "10.8.130.16",
        "port": 12049
      }
    ],
    "monitor_port": 9049,
    "port": 2049,
    "virtual_ip": "10.8.130.236"
  }
}

[ceph: root@cali013 /]# ceph nfs export ls cephfs-nfs
[]
[ceph: root@cali013 /]#

--- Additional comment from Frank Filz on 2024-02-28 23:12:30 UTC ---

I just did some runs using FSAL_VFS and note the same behavior. It's actually pretty much the same with an NFSv4 mount.

I did confirm that the file descriptors used for the files get closed when the unexport happens (we expect them not to get closed when the NFSv3 unmount occurs, and in fact, if the files are deleted locally, not via NFS, they don't get closed either). I did see a small decrease in memory use after the unexport.

I also ran with the latest V6-dev.6 code plus a patch that fixes a DRC memory leak and saw no improvement.

I also added some logging and verified that the MDCACHE entries are released for each file (I ran with one client and one export, creating 10,000 files).

So more investigation is needed into what is occupying the memory.
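For reference, a minimal shell sketch of the reproduction flow described in the Steps to Reproduce above, assuming a cephadm-deployed cluster with an existing CephFS volume named cephfs. The placement spec, pseudo paths, mount points and fio job parameters are illustrative assumptions, and the exact ceph nfs export create syntax can differ between Ceph releases:

# 1. Configure a ganesha cluster on the ceph cluster (placement is an assumption)
ceph nfs cluster create cephfs-nfs "cali015,cali016"

# 2. Create 500 exports backed by the cephfs volume
for i in $(seq 1 500); do
    ceph nfs export create cephfs --cluster-id cephfs-nfs \
        --pseudo-path "/export_${i}" --fsname cephfs --path "/dir_${i}"
done

# 3. On each of the 100 RHEL clients, mount 5 of the exports over NFSv3
#    through the cluster virtual IP (10.8.130.236 in this setup)
mkdir -p /mnt/export_1
mount -t nfs -o vers=3 10.8.130.236:/export_1 /mnt/export_1

# 4. Trigger FIO from all clients against their mounted exports
fio --name=scale_write --directory=/mnt/export_1 --rw=write \
    --bs=1M --size=4G --numjobs=4 --group_reporting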
--- Additional comment from Frank Filz on 2024-03-01 01:01:42 UTC ---

I have done some debugging using valgrind on my FSAL_VFS setup. I see no radical memory leaks (I actually DID find a couple of small memory leaks - fixes posted). Valgrind massif shows no significant memory (other than the 50 MB of hash tables which we always have) once the exports are removed.

I think this is a situation where, due to the way memory is utilized, we simply cannot reduce the memory footprint even with malloc trim. I'm at a loss for what we could do.

--- Additional comment from Manisha Saini on 2024-03-27 07:43:11 UTC ---

Hi Frank,

Do we have an RCA for the memory issue? We've noticed that the memory usage is significantly higher on the v3 mount compared to the v4 mount. Isn't the memory usage high on the idle setup (after completing the tests and deleting the files on the mounts) for v3?

--- Additional comment from Frank Filz on 2024-03-27 16:29:08 UTC ---

Please try with nfs-ganesha-5.7-2.el9cp

There is a fix:

a8d097edd210b7be14eb813e3eaf8fb503a6f708 FSAL's state_free function called by free_state doesn't actually free

That very likely is the (or at least a major) cause of memory growth.

There are some other fixes that may impact also.

But note also that while I was able to replicate the issue, valgrind memcheck showed no significant memory leaks.

--- Additional comment from RHEL Program Management on 2024-03-28 07:25:28 UTC ---

This bug is not attached to an Errata Tool advisory, so it is reverted to MODIFIED. Please attach this bug to an advisory before moving this bug to ON_QA.

--- Additional comment from on 2024-03-28 14:40:41 UTC ---

Attaching this bug to the 7.1 errata advisory; need to re-target from 7.2 to 7.1.

Thomas

--- Additional comment from errata-xmlrpc on 2024-03-28 14:42:21 UTC ---

Bug report changed to ON_QA status by Errata System.
A QE request has been submitted for advisory RHBA-2024:126567-01
https://errata.engineering.redhat.com/advisory/126567

--- Additional comment from errata-xmlrpc on 2024-03-28 14:42:28 UTC ---

This bug has been added to advisory RHBA-2024:126567 by Thomas Serlin (tserlin)
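For anyone who wants to repeat the valgrind and massif analysis Frank describes in the comments above, a minimal sketch, assuming ganesha.nfsd can be started by hand on a test node (outside the cephadm container); the daemon flags, config path and output file names here are assumptions, not taken from this bug's logs:

# Run 1: leak check with memcheck (run the workload, remove the exports, then shut down cleanly)
valgrind --tool=memcheck --leak-check=full --show-leak-kinds=definite,possible \
    --log-file=/tmp/ganesha-memcheck.log \
    /usr/bin/ganesha.nfsd -F -f /etc/ganesha/ganesha.conf -L /tmp/ganesha.log

# Run 2: heap profiling with massif, to see what is still resident once the exports are removed
valgrind --tool=massif --massif-out-file=/tmp/ganesha.massif \
    /usr/bin/ganesha.nfsd -F -f /etc/ganesha/ganesha.conf -L /tmp/ganesha.log

ms_print /tmp/ganesha.massif | less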
--- Additional comment from Manisha Saini on 2024-04-02 02:13:16 UTC ---

(In reply to Frank Filz from comment #4)
> Please try with nfs-ganesha-5.7-2.el9cp
>
> There is a fix:
>
> a8d097edd210b7be14eb813e3eaf8fb503a6f708 FSAL's state_free function called
> by free_state doesn't actually free
>
> That very likely is the (or at least a major) cause of memory growth.
>
> There are some other fixes that may impact also.
>
> But note also that while I was able to replicate the issue, valgrind
> memcheck showed no significant memory leaks.

Hi Frank,

With the latest build - nfs-ganesha-5.7-2 - I reran the test with 1000 exports and 100 clients with v3 mounts using FIO.

I am observing the same high memory usage of the NFS daemon after performing cleanup --> 12.5g

Disk usage post completing IO's -> 43 TiB used, 25 TiB / 69 TiB avail
=====================================================================
# ceph -s
  cluster:
    id:     4e687a60-638e-11ee-8772-b49691cee574
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum cali013 (age 2d)
    mgr: cali013.qakwdk(active, since 2d), standbys: cali016.rhribl, cali015.hvvbwh
    mds: 1/1 daemons up, 1 standby
    osd: 35 osds: 28 up (since 2d), 28 in (since 4d)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   9 pools, 1233 pgs
    objects: 3.78M objects, 14 TiB
    usage:   43 TiB used, 25 TiB / 69 TiB avail
    pgs:     1233 active+clean

  io:
    client:   170 B/s rd, 62 MiB/s wr, 0 op/s rd, 92 op/s wr

Memory usage post completing IO's and before cleanup
====================================================
Node 1:
-------
MiB Swap:   4096.0 total,   4096.0 free,      0.0 used. 103858.8 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  92243 root      20   0   19.8g  13.0g  23996 S  17.0  10.4 416:26.67 ganesha.nfsd

Node 2:
-------
MiB Swap:   4096.0 total,   4096.0 free,      0.0 used.  74121.4 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  50503 root      20   0 2962844 218312  23724 S   0.0   0.2   8:50.37 ganesha.nfsd

[ceph: root@cali013 /]# rpm -qa | grep nfs
libnfsidmap-2.5.4-20.el9.x86_64
nfs-utils-2.5.4-20.el9.x86_64
nfs-ganesha-selinux-5.7-2.el9cp.noarch
nfs-ganesha-5.7-2.el9cp.x86_64
nfs-ganesha-rgw-5.7-2.el9cp.x86_64
nfs-ganesha-ceph-5.7-2.el9cp.x86_64
nfs-ganesha-rados-grace-5.7-2.el9cp.x86_64
nfs-ganesha-rados-urls-5.7-2.el9cp.x86_64

[ceph: root@cali013 /]# ceph --version
ceph version 18.2.1-89.el9cp (926619fe7135cbd6d305b46782ee7ecc7be199a3) reef (stable)

Post cleanup (deleting everything on the exports and deleting all 1000 exports)
===============================================================================
Memory usage
------------
Node 1:
-------
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  92243 root      20   0   19.5g  12.5g  23996 S   0.0  10.0 596:45.88 ganesha.nfsd

Node 2:
-------
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  50503 root      20   0 2962844 219304  23736 S   0.0   0.2  10:18.34 ganesha.nfsd

Disk usage post deleting exports
--------------------------------
# ceph -s
  cluster:
    id:     4e687a60-638e-11ee-8772-b49691cee574
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum cali013 (age 3d)
    mgr: cali013.qakwdk(active, since 3d), standbys: cali016.rhribl, cali015.hvvbwh
    mds: 1/1 daemons up, 1 standby
    osd: 35 osds: 28 up (since 3d), 28 in (since 5d)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   9 pools, 1233 pgs
    objects: 316.89k objects, 1.2 TiB
    usage:   3.6 TiB used, 65 TiB / 69 TiB avail
    pgs:     1233 active+clean

  io:
    client:   170 B/s rd, 0 op/s rd, 0 op/s wr

[ceph: root@cali013 /]# ceph nfs cluster ls
[
  "cephfs-nfs"
]

[ceph: root@cali013 /]# ceph nfs export ls cephfs-nfs
[]

# ceph nfs cluster info
{
  "cephfs-nfs": {
    "backend": [
      {
        "hostname": "cali015",
        "ip": "10.8.130.15",
        "port": 12049
      },
      {
        "hostname": "cali016",
        "ip": "10.8.130.16",
        "port": 12049
      }
    ],
    "monitor_port": 9049,
    "port": 2049,
    "virtual_ip": "10.8.130.236"
  }
}

Logs
====
FIO instances running on all 1000 exports in parallel -
http://magna002.ceph.redhat.com/ceph-qe-logs/msaini/Automation/scale_linux_v3_1000exports_100clients/fio_logs/Test_nfs_scale_with_fio_0.log

Cleanup on all exports and deleting 1000 exports post test completion -
http://magna002.ceph.redhat.com/ceph-qe-logs/msaini/Automation/scale_linux_v3_1000exports_100clients/delete_exports_and_cleanup/Test_nfs_scale_with_fio_0.log
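For completeness, a rough sketch of the cleanup and memory-check flow between the "post IO" and "post cleanup" readings above, assuming mount points of the form /mnt/export_N on the clients; the export removal syntax may vary slightly between Ceph releases:

# On every client: delete the data on each export, then unmount it
for m in /mnt/export_*; do
    rm -rf "${m:?}"/*
    umount "$m"
done

# On the ceph admin node: delete all 1000 exports
for i in $(seq 1 1000); do
    ceph nfs export rm cephfs-nfs "/export_${i}"
done

# On the ganesha backend node: check the daemon's resident memory after cleanup
ps -C ganesha.nfsd -o pid=,vsz=,rss=,comm=
grep VmRSS /proc/"$(pgrep -x ganesha.nfsd)"/status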
--- Additional comment from Frank Filz on 2024-04-03 14:37:28 UTC ---

Could you run the test for several cycles and report on the memory use during and after each cycle?

--- Additional comment from Manisha Saini on 2024-04-07 19:40:36 UTC ---

The test is in progress. Will update the results here once it completes.

--- Additional comment from Manisha Saini on 2024-04-09 02:42:43 UTC ---

Hi Frank,

The test was executed 3 times consecutively, and with each iteration the memory usage increased by approximately 1 GB. After completing the 3 iterations and returning to an idle state, the NFS daemon consumed 14.6 GB of memory, which is considered high.

Memory utilisation details for each run, with IO's and post cleanup:

Iteration 1
*************************
Node 1: After running IO's --> 13.0 GB
--------------------------------------
MiB Swap:   4096.0 total,   4096.0 free,      0.0 used. 103858.8 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  92243 root      20   0   19.8g  13.0g  23996 S  17.0  10.4 416:26.67 ganesha.nfsd

Node 1: After cleanup --> 12.5 GB
---------------------------------
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  92243 root      20   0   19.5g  12.5g  23996 S   0.0  10.0 596:45.88 ganesha.nfsd

Iteration 2
*************************
Node 1: After running IO's --> 15.7 GB
--------------------------------------
MiB Swap:   4096.0 total,   4096.0 free,      0.0 used. 100842.1 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  92243 root      20   0   22.0g  15.7g  23996 S  25.0  12.6 935:58.53 ganesha.nfsd

Node 1: After cleanup --> 13.9 GB
---------------------------------
MiB Swap:   4096.0 total,   4096.0 free,      0.0 used. 102730.5 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  92243 root      20   0   20.7g  13.9g  23996 S   0.0  11.1   1095:12 ganesha.nfsd

Iteration 3
*************************
Node 1: After running IO's --> 16.4 GB
--------------------------------------
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  92243 root      20   0   22.4g  16.4g  23996 S   0.0  13.1   1793:14 ganesha.nfsd

Node 1: After cleanup --> 14.6 GB
---------------------------------
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  92243 root      20   0   21.4g  14.6g  23996 S   0.3  11.7   1793:46 ganesha.nfsd

--- Additional comment from Matt Benjamin (redhat) on 2024-04-10 15:31:09 UTC ---

Is this really a leak, though? Can either virt or res be induced to rise to an arbitrary level (ranging up to 100 GB, for example, as we observed with the object cacher)?

thanks!

Matt

--- Additional comment from Frank Filz on 2024-04-10 17:26:57 UTC ---

I think we may leak some small bits for each export during each export re-load. The way exports are configured and removed, we reload exports 500 times during export setup and 500 times during unload. This means we see some 250k structures (from N * (N+1)/2, twice) lost in each test cycle.

I did fix one set of leaks in V5.7-2 (which is being used per the bug details above), but Valgrind showed some possibly lost memory that I didn't get a chance to chase down. If there really is a leak there, 250k structures per cycle does add up to something, though a gigabyte would mean 4k of memory per such structure... hmm... that's a page... Heap fragmentation might be a culprit here...
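A quick back-of-the-envelope check of those numbers as a shell calculation; the roughly-one-page (4 KiB) cost per leaked structure is Frank's guess above, not a measured value:

# Each export add (and later each removal) reloads the export list, so with N
# exports one test cycle performs roughly N*(N+1)/2 per-export (re)loads during
# setup plus the same again during teardown.
N=500
loads=$(( N * (N + 1) / 2 * 2 ))
echo "export (re)loads per cycle: ${loads}"                            # ~250,500

# If each of those leaked one ~4 KiB structure, that alone would be about 1 GiB:
echo "approx leak per cycle: $(( loads * 4096 / 1024 / 1024 )) MiB"    # ~978 MiB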
--- Additional comment from Manisha Saini on 2024-04-12 05:09:32 UTC ---

(In reply to Matt Benjamin (redhat) from comment #13)
> Is this really a leak, though? Can either virt or res be induced to
> rise to an arbitrary level (ranging up to 100 GB, for example, as we observed
> with the object cacher)?
>
> thanks!
>
> Matt

Is it typical for memory consumption to increase with each run and for memory not to be released after cleanup? This behavior hasn't been observed with NFS v4.

The problem we encountered in 7.0, where 100 GB was consumed, was related to the smallfile IO tool. However, we're unable to run the smallfile tool due to the existing bug - https://bugzilla.redhat.com/show_bug.cgi?id=2247762.

--- Additional comment from Manisha Saini on 2024-04-23 18:19:04 UTC ---

As suggested, I ran the same test for NFSv4.1 (2 iterations) with the 7.1 build and observed the same memory growth (as we saw for NFSv3) with the last run.

Below are the stats for the 2 NFSv4.1 iterations:

First iteration
===============
Post running IO's - Node 1: 10.8g | Node 2: 13.0g
Post cleanup      - Node 1: 9.7g  | Node 2: 11.1g

Second iteration
================
Post running IO's - Node 1: 13.1g | Node 2: 14.5g
Post cleanup      - Node 1: 11.7g | Node 2: 13.4g

It appears that the problem is not limited to NFSv3; it is also observed with NFSv4.

--- Additional comment from Manisha Saini on 2024-05-02 12:24:48 UTC ---

Ran the test again on the scratch build provided by dev, which contains additional memory fixes.

[ceph: root@cali013 /]# ceph --version
ceph version 18.2.1-150.el9cp (4a63dafcc8b87d799b599d01d34a419e85212ed1) reef (stable)

[ceph: root@cali013 /]# rpm -qa | grep nfs
libnfsidmap-2.5.4-20.el9.x86_64
nfs-utils-2.5.4-20.el9.x86_64
nfs-ganesha-selinux-5.7-3.0.TEST.ffilz20240422.el9cp.noarch
nfs-ganesha-5.7-3.0.TEST.ffilz20240422.el9cp.x86_64
nfs-ganesha-rgw-5.7-3.0.TEST.ffilz20240422.el9cp.x86_64
nfs-ganesha-ceph-5.7-3.0.TEST.ffilz20240422.el9cp.x86_64
nfs-ganesha-rados-grace-5.7-3.0.TEST.ffilz20240422.el9cp.x86_64
nfs-ganesha-rados-urls-5.7-3.0.TEST.ffilz20240422.el9cp.x86_64

Here are the results of 3 iterations:

Exports       : 1000
Clients       : 100
Mount version : v3

First run
---------
Memory consumption after IO operations: 13.3g
Memory consumption after cleanup:       12.1g

Second run
----------
Memory consumption after IO operations: 15.5g
Memory consumption after cleanup:       14.0g

Third run
---------
Memory consumption after IO operations: 16.3g
Memory consumption after cleanup:       15.0g

--- Additional comment from Scott Ostapovicz on 2024-05-13 20:10:04 UTC ---

As I read this, Frank has made real progress in fixing some memory leaks here and has confirmed this with Valgrind. Manisha has confirmed that the remaining leakage affects both NFS v3 and NFS v4.

Might I suggest that we accept the forward progress made in 7.1, mark this specific BZ as fixed (since it addressed the original problems, but not ALL problems), note in the errata that the behavior is improved but still a work in progress, and then clone this issue for 7.1 z1 so Frank can continue his work addressing memory leakage as we keep moving the quality forward?
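For future reruns of this scenario, a small sketch of how the per-cycle memory readings reported above could be captured automatically on the backend nodes; the log path is an assumption and the cycle labels are free-form:

# Append ganesha.nfsd resident memory (KiB) to a log, tagged with a phase label,
# so growth across iterations is easy to compare afterwards.
LOG=/var/tmp/ganesha-rss.log

log_rss() {    # usage: log_rss "cycle 2 post-cleanup"
    rss_kib=$(ps -C ganesha.nfsd -o rss= | awk '{s+=$1} END {print s}')
    printf '%s %s rss_kib=%s\n' "$(date -Is)" "$1" "${rss_kib}" >> "${LOG}"
}

# Call it after each phase of each cycle, for example:
log_rss "cycle 1 post-io"
log_rss "cycle 1 post-cleanup"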
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 8.0 security, bug fix, and enhancement updates), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:10216