Bug 1558379
| Field | Value |
|---|---|
| Summary | Huge memory usage of FUSE client |
| Product | [Community] GlusterFS |
| Component | fuse |
| Version | 3.13 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED NEXTRELEASE |
| Severity | medium |
| Priority | unspecified |
| Reporter | MA <marwan.akiki> |
| Assignee | bugs <bugs> |
| CC | bugs, geert.nijpels, glush, jahernan, jbyers, kjohnson, marwan.akiki, moagrawa, nbalacha, oleksandr, olim, onatalen, pkarampu, yannick.perret |
| Target Milestone | --- |
| Target Release | --- |
| Doc Type | If docs needed, set a value |
| Story Points | --- |
| Clone Of | 1369364 |
| Type | Bug |
| Mount Type | fuse |
| Regression | --- |
| Documentation | --- |
| Category | --- |
| oVirt Team | --- |
| Cloudforms Team | --- |
| Last Closed | 2018-03-22 08:47:08 UTC |
Description (MA, 2018-03-20 06:44:48 UTC)
By the time we found this, 3.13 had already reached EOL. https://review.gluster.org/19647 is the fix; it is also merged in the 4.0 release: https://review.gluster.org/19654. Could you try the fixed versions and let us know if you still see the issue? Pranith

I just tested the patch against 3.13.2, on Debian 8 (client side only). Without the patch, the FUSE 3.13.2 client reached more than 6 GB of virtual size (and stayed there) after extracting a kernel tar.gz into the FS in a loop about 20 times. With the patch, VSZ has grown from ~400 MB to ~600 MB and seems stable so far (6 loops performed). I am letting the tests run to be sure, but in any case it is far better than before. Thanks. -- Y.

So, more tests.

I performed 16 loops of extracting the same archive onto the same place (overwriting the content) in a FUSE mount point with the patched 3.13.2 client. The volume is a x2 replica (default configuration) across 2 servers with the same OS (fresh Debian 8 64-bit) and the same glusterfs version (not patched on the servers).

Without the patch, the memory size of the client process grows after *each* FS operation. With the patch, it grows up to a certain point and then stays stable. The archive is the latest Linux kernel tarball (~154 MB).

Here is the client process VSZ/RSS over time:

427796 10788 (initial memory, just after mounting)
427796 10788
493332 21020 (starting to extract the archive)
493332 27772
493332 45620
493332 63072
(…)
493332 88904
493332 104484
493332 128672
(…)
689940 223832
689940 228404

At this point the memory size is stable.

Later I started another extraction of the same archive into another target directory, while the main loop was still running. Memory increased again a little:

689940 232172
(…)
757612 363916
757612 373788
757612 383672
757612 394316
757612 404792
(…)
888684 455848

At this point the memory size is again stable.

So the memory leak triggered by every operation is clearly fixed, at least for my configuration and options (note: without the patch even listing content increased the memory size).

In my view there is still a question: why does the memory never shrink? All operations on the mount point have been finished for ~4 hours now and the memory size is still exactly the same. I also then deleted all content from the mount point, without any change.

Is it another kind of memory leak? Is it some kind of cached data? But if it is cached data it should have expired by now, especially after deleting all content (caching non-existing nodes does not seem useful).

I can of course perform more tests if it can help, please let me know. On my side I will run further copies into other target directories, to see whether the memory keeps growing (only a little, now) and then stays like this.

Thanks,
-- Y.
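For reference, the kind of loop used in the test above can be reproduced with a small script along these lines. This is only a minimal sketch: the mount point and tarball name are taken from later comments, and the pgrep pattern and loop count are assumptions to adapt to your setup.

```sh
#!/bin/bash
# Rough sketch of the test from comment #3: repeatedly extract a kernel
# tarball into the FUSE mount point and log the client's VSZ/RSS after
# each pass. Paths, pattern and loop count are assumptions.
MNT=/root/MNT
TARBALL=linux-4.16-rc5.tar.gz
PID=$(pgrep -f 'glusterfs.*--volfile-id=test-volume')

for i in $(seq 1 16); do
    tar -C "$MNT" -xzf "$TARBALL"    # overwrite the same target each time
    ps -o vsz=,rss= -p "$PID"        # VSZ/RSS of the FUSE client process
done
```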
The issue is that I cannot update the clients until I'm sure that the patch is stable. Unfortunately I'm on a live system and it was updated to 3.13 instead of 3.12. All help is highly appreciated.

(In reply to Yannick Perret from comment #3)

What you can give me is a statedump of the client before running the test, and a statedump of the client after running these tests and deleting all the content you created as part of the test. With these two files I can compare what grew, to see whether it is expected or whether something more needs to be fixed.

kill -USR1 <pid-of-client-process> generates a statedump at "/var/run/gluster" with the pattern glusterdump.<pid>.<timestamp>.

Upload these two files and we will have some data to analyse.

(In reply to MA from comment #4)

I'm afraid the clients will be OOM-killed after a point if you don't downgrade to 3.12.x. This bug is not present in 3.12.x.

Created attachment 1411062 [details]
Statedump after mount
Statedump just after mount
Created attachment 1411065 [details]
Statedump after archive extraction
Statedump after extracting an archive into the FS
Created attachment 1411074 [details]
Statedump after cleanup
Statedump after deleting all content
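For reference, statedumps such as the ones attached above can be produced the way Pranith describes: send SIGUSR1 to the client process and pick up the dump files from /var/run/gluster. A minimal sketch; the pgrep pattern is an assumption based on the ps output quoted below.

```sh
#!/bin/bash
# Generate statedumps of the FUSE client before and after a test, as
# suggested in the comments above. Adjust the pattern to your own
# --volfile-id / mount point.
PID=$(pgrep -f 'glusterfs.*--volfile-id=test-volume')

kill -USR1 "$PID"     # statedump "step1" (e.g. just after mount)
# ... run the test here (untar, rm -rf, ...) ...
kill -USR1 "$PID"     # statedump "step2" / "step3"

# One file per dump, suffixed with a timestamp, per the pattern above.
ls -l /var/run/gluster/glusterdump."$PID".*
```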
I performed the following steps:

1. Clean up the FS content, and unmount.
2. Mount the glusterfs volume (see comment #3 for details about the OS and volume configuration).
3. Create statedump "step1".
4. Extract linux-4.16-rc5.tar.gz into the FS.
5. Create statedump "step2".
6. rm -rf all FS content.
7. Create statedump "step3".

From the 'ps' command I monitored VSZ and RSS during these steps:
- just after mounting the FS: 427796/10488
- just after extracting the tarball: 757612/237544 (in between, memory grows slowly from 427796 to 757612)
- just after removing all content: 757612/241000

This last value is still the same ~15 minutes after the 'rm -rf' was performed, and it did not change after running 'sync ; echo 3 >/proc/sys/vm/drop_caches' (to be sure).

Here is the full 'ps aux' line:
root 12118 8.7 5.9 757612 241000 ? Ssl 10:02 3:17 /usr/local/sbin/glusterfs --process-name fuse --volfile-server=xx.yy.zz.ww --volfile-id=test-volume /root/MNT

As said previously, the initial bug that made memory grow on each syscall seems fixed. But I would expect memory to fall back to a value similar to the starting one after removing all files (the FS is currently empty). I would also expect memory to do the same after a long period of inactivity.

Please let me know if I can help.
Regards,
-- Y.

At this time, comparing the two statedumps, I can see 3 sections with "high" values:

[debug/io-stats.test-volume - usage-type gf_common_mt_strdup memusage]
size=17450
num_allocs=393
max_size=3467963
max_num_allocs=67310
total_allocs=134621

[debug/io-stats.test-volume - usage-type gf_io_stats_mt_ios_stat memusage]
size=198392
num_allocs=403
max_size=33924056
max_num_allocs=67319
total_allocs=67320

[debug/io-stats.test-volume - usage-type gf_io_stats_mt_ios_stat_list memusage]
size=12896
num_allocs=403
max_size=12928
max_num_allocs=404
total_allocs=69767

These 3 seem related to "debug": I did not activate any debug option, neither on the client nor on the servers.

Note: I found exactly the same sections again in the statedump file.
-- Y.

The malloc implementation in glibc (the one gluster uses) returns memory to the system only if the released memory belongs to the top of the heap. This means that a single allocation at the top of the heap prevents all other released memory from being returned to the system.

Anyway, the memory is still free from the point of view of the application, so it can be reused when more memory is requested.

One way to test whether this is really the case would be to repeat the same test you did in comment #3, but remove all previous data before untarring into another directory. This should release a lot of cached data that should then be reused by the next untar, keeping memory usage (almost) constant. For example:

mkdir /gluster/dir1
tar -C /gluster/dir1 -xf linux.tgz
# check memory usage
rm -rf /gluster/dir1
# or dropping caches
mkdir /gluster/dir2
tar -C /gluster/dir2 -xf linux.tgz
# check memory usage

There shouldn't be a significant increase in memory usage after the second untar.
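As a side note, the per-translator memusage comparison done a few comments above can be scripted. A rough sketch only: it assumes each "[... memusage]" section header in the dump is followed by the size=/num_allocs= lines quoted in that comment, and that the two dump files have been renamed to step1/step3.

```sh
#!/bin/bash
# Extract "[... memusage]" sections with their counters from two statedumps
# and diff them to see which allocation types grew. Section layout is an
# assumption based on the fields quoted in the comments above.
extract() {
    grep -A5 'memusage]' "$1" | grep -E 'memusage]|^size=|^num_allocs='
}

extract glusterdump.12118.step1 > /tmp/step1.mem
extract glusterdump.12118.step3 > /tmp/step3.mem
diff -u /tmp/step1.mem /tmp/step3.mem
```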
(In reply to Xavi Hernandez from comment #12)

I understand this. And, so far, it seems confirmed by the fact that I need to perform *new, additional* operations to make memory grow.

But when I remove all files from the FS (I still performed several 'rm -rf MNT/*'), the top of the heap (which should hold data about the last file operations) should be free, and that memory should return to the system. This is not the case so far: after a total cleanup of the FS content, the glusterfs process stays at the same memory value.

In a context of few operations on the FS this is clearly not a problem. Neither is it for temporary mounts (i.e. some of our backups that are automounted). But at my office some of the volumes are user HOMEs, with about 300+ people. They are mounted using an NFS export (so without real failover). Currently, just by dealing with ~20 different subdirectories today (with various archives), I reached a VSZ/RSS of 1216364/754568. How high would it grow with the activity of 300+ directories on a permanent mount?

Regards,
-- Y.

Hmmm… please don't mind the previous comment. I tried something: the glusterfs process was still using 1216364/754568 (VSZ/RSS) and I started a small program that performed 'malloc' in a loop (and used the allocated memory by writing into the allocated areas) until it failed. After that I checked the memory used by the glusterfs process (for the record: all content had been destroyed by a 'rm -rf MNT/*'):

root 12118 3.0 0.0 1216364 0 ? Ssl 10:02 20:59 /usr/local/sbin/glusterfs --process-name fuse --volfile-server=xx.yy.zz.ww --volfile-id=test-volume /root/MNT

Resident memory falls to 0. After a single 'ls -la' I got 1216364/2080. So VSZ did not change, but resident memory is apparently returned to the OS only when the OS needs fresh memory.

So at this point all is fine for me (VSZ is not really pertinent): if RSS accounts for free-but-not-released memory only until there is memory pressure, it is fine. In my view this bug is solved.

Thanks to you guys.
Regards,
-- Y.

(In reply to Yannick Perret from comment #13)

This is not always true. While Gluster is running, it uses memory for many internal operations that are not directly related to cached data. So even after all files have been deleted, Gluster will still be using some memory, and it can happen that one of these blocks of memory comes from the top of the heap if it was allocated (or reallocated) while data was being processed.
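For completeness, the memory-pressure check described two comments above can be approximated without writing a dedicated malloc program, for example by letting a throw-away pipeline buffer a few gigabytes. This is only a rough sketch: the 4 GB figure and the pgrep pattern are assumptions, and the amount of pressure should be sized to the machine's RAM.

```sh
#!/bin/bash
# Check whether the FUSE client's resident memory is reclaimed once another
# process needs the memory, as observed in the comments above.
PID=$(pgrep -f 'glusterfs.*--volfile-id=test-volume')

ps -o vsz=,rss= -p "$PID"              # VSZ/RSS before the pressure

# Create temporary memory pressure: tail buffers everything it reads from
# the pipe (no newlines in /dev/zero), so this allocates roughly 4 GB.
head -c 4G /dev/zero | tail > /dev/null

ps -o vsz=,rss= -p "$PID"              # compare: under enough pressure, RSS
                                       # drops while VSZ stays the same
```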