Description of problem:
=======================
I enabled md-cache on a distrep volume. I mounted the volume on two different clients and created about a hundred thousand zero-byte files. When I did a lookup, the resident memory of the mount process (as reported by top) increased. I then deleted all the files and found that the resident memory did not decrease at all. Memory consumption went up to about 4%, as below:

PID   USER  PR  NI    VIRT    RES   SHR S  %CPU %MEM     TIME+ COMMAND
19001 root  20   0 1047712 293652  3684 S   0.0  3.7  10:09.42 gluster+

I even did a lookup after about 15 minutes to see whether the memory would be freed by then, but it was not freed at all. I expect that on removal of the files the cache would be invalidated, but it looks like the allotted memory is not being freed. The lookup itself no longer displays anything, since the files are gone.

Volume Name: distrep
Type: Distributed-Replicate
Volume ID: 69a1f685-5024-4b1f-a6bd-81a350f83da9
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.35.86:/rhs/brick1/distrep
Brick2: 10.70.35.9:/rhs/brick1/distrep
Brick3: 10.70.35.153:/rhs/brick1/distrep
Brick4: 10.70.35.79:/rhs/brick1/distrep
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
features.cache-invalidation: on
features.cache-invalidation-timeout: 60
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 60

BUILD: taken from what was mentioned in http://etherpad.corp.redhat.com/md-cache-3-2
nfs-ganesha-gluster-2.4.0-2.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-api-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-events-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-libs-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-cli-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-server-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-rdma-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
python-gluster-3.8.4-2.26.git0a405a4.el7rhgs.noarch
glusterfs-fuse-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
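For reference, a minimal sketch of the reproduction along the lines described above. The mount point, server address, file count, and file-name pattern are illustrative assumptions, not taken verbatim from this report:

MNT=/mnt/distrep
mount -t glusterfs 10.70.35.86:/distrep "$MNT"

# create ~100,000 zero-byte files from the client
for i in $(seq 1 100000); do
    touch "$MNT/file_$i"
done

# force a lookup of every entry, then record the mount process RSS
ls -lR "$MNT" > /dev/null
PID=$(pgrep -f "glusterfs.*$MNT" | head -1)
grep VmRSS /proc/$PID/status

# remove everything and check whether RSS drops back
rm -rf "$MNT"/*
grep VmRSS /proc/$PID/status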
In this way the entire memory can be consumed, eventually crashing the mount process. When I started the test the resident memory was at 51604 KB; on the first lookup (after creating 1 lakh, i.e. 100,000, files) it went to 190932 KB, and even after the delete it finally stood at 293652 KB. Also note that when I re-issue lookups after larger time gaps, say about 15 minutes, I see the resident memory shooting up again; this too could be a problem that needs addressing.
From my analysis, I could see the memory usage increase as the files get created, and when the files are removed, md-cache cleans up its cache. But the memory usage (as shown in top) does not reduce much. This is the case with md-cache enabled or disabled. There surely is some leak, but it is not in md-cache. We need to debug further to identify which component is consuming the remaining memory; from a first look I could not find it from the statedump.
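For anyone picking this up later, one way to narrow it down is a client-side statedump. A minimal sketch, assuming the standard SIGUSR1 trigger and the default dump directory; the process match pattern and file names are illustrative:

PID=$(pgrep -f "glusterfs.*distrep" | head -1)
kill -USR1 "$PID"                     # ask the fuse client to dump its state
sleep 2
DUMP=$(ls -t /var/run/gluster/glusterdump.${PID}.dump.* | head -1)
# per-xlator "memusage" sections show which translator still holds
# allocations after the files are removed
grep -B2 -A6 "memusage" "$DUMP" | less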
As mentioned in comment #3, I could reproduce the leak, but it is seen without md-cache as well. Could you please confirm?
I agree this can be seen even with md-cache. However, shouldn't we be clearing the cache at least with md-cache enabled when upcalls are triggered? Can't we leverage that intelligence?
(In reply to nchilaka from comment #5)
> I agree this can be seen even with md-cache.
> However, shouldn't we be clearing the cache at least with md-cache enabled
> when upcalls are triggered? Can't we leverage that intelligence?

md-cache already clears the cache it allocated, as part of unlink. We do not require an upcall to clear the cache on unlink in any component, since the unlink happens on the same mount. I guess this is a trivial leak; not sure in which component.
Changing the summary, as this may not be related to md-cache, based on the above comments.
Requires re-testing with the latest release, as a lot of memory leak fixes have gone in from 3.2 to now.
As mentioned in the previous comments, it is not related to md-cache, hence changing the component.
(In reply to Poornima G from comment #13)
> Requires re-testing with the latest release, as a lot of memory leak fixes
> have gone in from 3.2 to now.

Retested on the 3.4.2 (3.12.2-29) build; the problem still exists.

[root@dhcp35-64 ~]# cat test.log
below was taken while writes were going on
Fri Nov 23 20:22:35 IST 2018
13456 root      20   0  642756 182528   4140 S  0.0  4.7   5:52.45 glusterfs
Below was taken after doing a find * and ls -lRt
Sat Nov 24 21:29:52 IST 2018
13456 root      20   0  839364 390532   4156 S  0.0 10.1  11:05.12 glusterfs
now going to do rm -rf
Sat Nov 24 21:41:23 IST 2018
rm -rf complete and filesystem empty
Sat Nov 24 21:41:23 IST 2018
13456 root      20   0  810692 365248   4204 S  0.0  9.4  12:55.24 glusterfs
#### rechecking after about 15min
Sat Nov 24 21:58:16 IST 2018
13456 root      20   0  810692 365248   4204 S  0.0  9.4  12:55.28 glusterfs
#### rechecking after about 15min
Sat Nov 24 21:58:19 IST 2018
13456 root      20   0  810692 365248   4204 S  0.0  9.4  12:55.28 glusterfs
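For reference, output of the shape shown in test.log can be collected with something like the following. This is only a sketch, not the exact script used; the mount path, log path, and sleep interval are assumptions, and the pid is the one seen in the log above:

LOG=/root/test.log
PID=13456                                 # pid of the glusterfs fuse client
snap() { date >> "$LOG"; top -b -n1 -p "$PID" | grep glusterfs >> "$LOG"; }

snap                                      # while writes are going on
find /mnt/distrep > /dev/null; ls -lRt /mnt/distrep > /dev/null
snap                                      # after the lookups
rm -rf /mnt/distrep/*
snap                                      # after rm -rf completes
sleep 900; snap                           # recheck after ~15 minutes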
Needs a test with the 3.4.4 release / 3.5.0 builds, mainly because we now have the fuse inode garbage collection feature.
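If it helps the retest: a sketch of how the client could be remounted with a bounded inode table, assuming the lru-limit mount option exposed by the fuse inode garbage collection work is available in the build under test; the path, server, and limit value are illustrative:

# remount the client with a bounded inode LRU so unreferenced inodes
# are garbage collected (assumes lru-limit is supported by this build)
umount /mnt/distrep
mount -t glusterfs -o lru-limit=65536 10.70.35.86:/distrep /mnt/distrep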
sosreports and client statedumps @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1386658/reproducer-on-rhgs350-comment19/client/dhcp47-147.lab.eng.blr.redhat.com/
Hi Nag, Are we still seeing the issue in 3.5.1? Thanks, Mohit Agrawal
(In reply to Mohit Agrawal from comment #22)
> Hi Nag,
>
> Are we still seeing the issue in 3.5.1?
>
> Thanks,
> Mohit Agrawal

Hi Mohit, yes, I saw it in 3.5.1 too.
This issue is not reproducible with RHGS 3.5.4 on RHEL 7. Validation was also done on RHEL 8-based RHGS 3.5.4. Based on these facts, closing this bug.