Description of problem:

I've noticed that, after a lot of file transfers (say rsync, for instance), the glusterfs process on the client machine (using the native FUSE client) will consistently consume about 41-42% of the memory on the system. It doesn't matter whether the client machine has 4GB of RAM or 12GB of RAM: after a certain amount of heavy use, it will consistently be holding this amount of memory. Even after the connection is idle (the transfer/rsync has completed), the amount of consumed memory remains large.

Here is an example of what I'm seeing, via top, on two different client machines:

##############################################################################
top - 10:36:01 up 2 days, 23:59,  1 user,  load average: 0.00, 0.00, 0.00
Tasks: 228 total,   1 running, 227 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.4%us,  0.3%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  12187164k total, 11906032k used,   281132k free,    63252k buffers
Swap:  2097144k total,    59004k used,  2038140k free,   259724k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3841 root      20   0 5186m 4.8g 2568 S  0.0 41.5 152:57.88 glusterfs

--- and ---

top - 10:36:25 up 1 day, 18:46,  3 users,  load average: 0.00, 0.00, 0.00
Tasks:  98 total,   1 running,  97 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.2%us,  1.3%sy,  0.0%ni, 97.3%id,  0.0%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:   3922928k total,  3153084k used,   769844k free,     9576k buffers
Swap:  1048568k total,      656k used,  1047912k free,   160448k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1187 root      20   0 1800m 1.5g 2808 S  5.3 40.7 129:29.83 glusterfs
##############################################################################

The second machine (4GB RAM) still has an active rsync, as can be seen from the CPU load. On the first machine (12GB RAM), rsync completed two days ago.

Version-Release number of selected component (if applicable):

client:
# rpm -qa | grep gluster
glusterfs-3.4.1-2.el6.x86_64
glusterfs-libs-3.4.1-2.el6.x86_64
glusterfs-fuse-3.4.1-2.el6.x86_64

server:
# rpm -qa | grep gluster
glusterfs-libs-3.4.1-2.el6.x86_64
glusterfs-fuse-3.4.1-2.el6.x86_64
glusterfs-server-3.4.1-2.el6.x86_64
glusterfs-3.4.1-2.el6.x86_64
glusterfs-cli-3.4.1-2.el6.x86_64

How reproducible: always

Steps to Reproduce:
1. mount a gluster volume via the native (FUSE) client
2. rsync several terabytes of data to it

Additional info:
- Servers are RHEL 6 (6.4)
- Clients are RHEL 6 (12GB box) and CentOS 6 (4GB box)
- Both clients and servers are fully updated, all running kernel 2.6.32-358.23.2.el6.x86_64

Configuration:

# gluster volume info

Volume Name: gv0
Type: Distributed-Replicate
Volume ID: a86fbffd-408d-41f9-b2ed-a3816f09d924
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gluster0:/export/brick0
Brick2: gluster1:/export/brick0
Brick3: gluster2:/export/brick0
Brick4: gluster3:/export/brick0

Will add a few attachments following this with additional info...
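For anyone who wants to reproduce this quickly, here is a minimal sketch of what I'm doing. The source path and mount point are placeholders; the server/volume names match the config above:

# mount the volume with the native FUSE client
mount -t glusterfs gluster0:/gv0 /mnt/gv0

# push a few TB through it, then check the client's resident size
rsync -a /srv/bigdata/ /mnt/gv0/bigdata/
ps -o pid,vsz,rss,pmem,cmd -C glusterfs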
Created attachment 815916 [details]
lsof output
Created attachment 815917 [details]
pmap output
Created attachment 815918 [details]
status dump

Messages like these:

TIME=2013-10-24 10:48:35.001570
message=[0] fuse_forget: 37155599: FORGET 777007404/1 gfid: (432f0f9b-ea58-4472-86e9-37a715f1aa26)

were triggered after issuing:

echo 3 > /proc/sys/vm/drop_caches

following a chat with JoeJulian on IRC. (Ultimately that only dropped the RAM consumption by less than 1%.)
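For reference, the status dump above can be regenerated by sending SIGUSR1 to the client process; a rough sketch follows. The pgrep pattern and the dump location are assumptions (the default directory varies by version, /var/run/gluster or /tmp being the usual places):

# ask the FUSE client for a statedump (memory pools, iobufs, inode table sizes)
kill -USR1 $(pgrep -o -f '/usr/sbin/glusterfs')
ls -lt /var/run/gluster/*.dump.* 2>/dev/null | head

# drop the kernel caches to force FORGETs, then compare RSS before/after
echo 3 > /proc/sys/vm/drop_caches
grep VmRSS /proc/$(pgrep -o -f '/usr/sbin/glusterfs')/status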
This may be kernel specific. I see much more memory allocation with EL6 kernels than I do with Fedora.
(In reply to Joe Julian from comment #4)
> This may be kernel specific. I see much more memory allocation with EL6
> kernels than I do with Fedora.

To test your theory, I spun up two Fedora VMs (KVM) with the same specs as the 4GB EL6 VM (also KVM) mentioned in the initial report. I'm actually seeing greater memory consumption on Fedora.

The F18 VM; rsync completed yesterday:

##############################################################################
top - 17:06:10 up 23:23,  2 users,  load average: 0.00, 0.01, 0.05
Tasks:  80 total,   1 running,  79 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   4049744 total,  3817220 used,   232524 free,     8044 buffers
KiB Swap:  1048572 total,   207528 used,   841044 free,    38584 cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  820 root      20   0 2299m 1.9g 1380 S  0.0 48.5  60:38.12 glusterfs
##############################################################################

--- and ---

The F19 VM; rsync still in progress, started 3 hours ago:

##############################################################################
top - 17:06:50 up 23:56,  2 users,  load average: 0.11, 0.19, 0.22
Tasks:  86 total,   2 running,  84 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.3 us,  2.8 sy,  0.0 ni, 91.8 id,  0.0 wa,  0.7 hi,  1.2 si,  0.2 st
KiB Mem:   4049872 total,  3911176 used,   138696 free,      236 buffers
KiB Swap:  1048572 total,    79080 used,   969492 free,    15680 cached

  PID USER      PR  NI    VIRT    RES  SHR S %CPU %MEM    TIME+  COMMAND
 1114 root      20   0 2272804 1.947g 2036 R 10.0 50.4  29:54.90 glusterfs
##############################################################################

Respective kernels:
F18 VM: 3.11.4-101.fc18.x86_64
F19 VM: 3.11.6-200.fc19.x86_64

Let me know if you want/need any more info.
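To make the growth easier to compare across kernels, this is the kind of sampling loop I'd use; a sketch only, and the pgrep pattern and output path are arbitrary:

# log the FUSE client's RSS (in kB) once a minute while rsync runs
pid=$(pgrep -o -f /usr/sbin/glusterfs)
while kill -0 "$pid" 2>/dev/null; do
    echo "$(date +%s) $(awk '/VmRSS/ {print $2}' /proc/$pid/status)"
    sleep 60
done >> /tmp/glusterfs-rss.log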
Also, official gluster 3.4.1 RPMs on both Fedora systems:

# rpm -qa | grep gluster
glusterfs-fuse-3.4.1-1.fc18.x86_64
glusterfs-3.4.1-1.fc18.x86_64
glusterfs-libs-3.4.1-1.fc18.x86_64

--- and ---

# rpm -qa | grep gluster
glusterfs-3.4.1-1.fc19.x86_64
glusterfs-libs-3.4.1-1.fc19.x86_64
glusterfs-fuse-3.4.1-1.fc19.x86_64
Two additional notes:

The earlier rsync (mentioned in Comment #5) on the F19 VM completed - it actually peaked at 53.5% of the system memory:

##############################################################################
top - 18:46:38 up 1 day,  1:36,  2 users,  load average: 0.41, 0.27, 0.24
Tasks:  86 total,   3 running,  83 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.6 us,  6.8 sy,  0.0 ni, 84.8 id,  0.0 wa,  1.0 hi,  1.7 si,  0.2 st
KiB Mem:   4049872 total,  3943232 used,   106640 free,      196 buffers
KiB Swap:  1048572 total,   190860 used,   857712 free,   243500 cached

  PID USER      PR  NI    VIRT    RES  SHR S %CPU %MEM    TIME+  COMMAND
 1114 root      20   0 2412556 2.066g 1576 R 21.0 53.5  49:33.36 glusterfs
##############################################################################

It dropped back down just before rsync completed and stayed at 48.7% after completion, which is comparable to the F18 VM listed above, but also significantly more than EL6:

##############################################################################
top - 20:42:03 up 1 day,  3:31,  3 users,  load average: 0.00, 0.01, 0.05
Tasks:  86 total,   1 running,  85 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.2 sy,  0.0 ni, 99.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   4049872 total,  3519144 used,   530728 free,     3596 buffers
KiB Swap:  1048572 total,   197364 used,   851208 free,    30664 cached

  PID USER      PR  NI    VIRT    RES  SHR S %CPU %MEM    TIME+  COMMAND
 1114 root      20   0 2353704 1.883g 1500 S  0.0 48.7  59:51.69 glusterfs
##############################################################################

Still, it looks like glusterfs is consuming a large amount of system memory regardless of the underlying Linux OS.
Created attachment 843861 [details]
Gluster profiling enabled
I noticed similar glusterfs behaviour in version 3.3.1. The gluster node running 3.3.1 was on CentOS 6.4, so I decided to upgrade to:

glusterfs-3.4.1-3.el6.x86_64
glusterfs-geo-replication-3.4.1-3.el6.x86_64
glusterfs-libs-3.4.1-3.el6.x86_64
glusterfs-cli-3.4.1-3.el6.x86_64
glusterfs-server-3.4.1-3.el6.x86_64
glusterfs-fuse-3.4.1-3.el6.x86_64

OS: CentOS release 6.5 (Final)

After the upgrade, however, the issue has not gone away. The system is stable, I can't complain, and failover with CARP works like a charm. However, it bugs me why glusterfs keeps committing more memory all the time.

I am running 4 to 5 volumes, all of type Replicate. Two volumes are almost idle most of the time and two are active once a day. Attached is a profiling-enabled log for the two active volumes.

The Munin graphs show the memory commit over time very clearly. Meanwhile, I noticed that restarting glusterd helps but does not stop the memory from being committed. Adding more RAM has not helped either.

http://i.imgur.com/ZbGmnfs.png
http://i.imgur.com/rwEbm33.png
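For completeness, the attached profiling log was presumably generated with something like the standard volume profile commands (volume name is a placeholder):

gluster volume profile <VOLNAME> start
gluster volume profile <VOLNAME> info    # per-brick FOP counts and latencies
gluster volume profile <VOLNAME> stop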
Updating version to 3.4.2.

I updated server and client packages to 3.4.2 earlier today. After a couple of rsyncs, glusterfs is consuming 58.4% of the system memory on one of our clients. So it is fair to say that the problem remains in 3.4.2.
I'm seeing the same thing with 3.5.0 on CentOS 6. We're running the Robinhood Policy Engine (https://sourceforge.net/projects/robinhood/) to enforce a directory cleanup policy. It periodically scans a ~275TB, 40-brick Gluster filesystem with ~250M files. After it's scanned only ~50M files, the FUSE client has an RSS of ~4GB:

[jwm@robinhood:pts/2 ~> psg gluster
root  4842  4842  9.7 51.7 4856572 4168700 ?  Ssl  ep_poll       Jun25  /usr/sbin/glusterfs --acl --gid-timeout=2 --volfile-server=holyscratch --volfile-max-fetch-attempts=10 --volfile-server-transport=tcp --volfile-id=/holyscratch /n/holyscratch
root  4842  4844  0.0 51.7 4856572 4168700 ?  Ssl  rt_sigtimedw  Jun25  /usr/sbin/glusterfs --acl --gid-timeout=2 --volfile-server=holyscratch --volfile-max-fetch-attempts=10 --volfile-server-transport=tcp --volfile-id=/holyscratch /n/holyscratch
root  4842  4845  0.0 51.7 4856572 4168700 ?  Ssl  futex_wait_q  Jun25  /usr/sbin/glusterfs --acl --gid-timeout=2 --volfile-server=holyscratch --volfile-max-fetch-attempts=10 --volfile-server-transport=tcp --volfile-id=/holyscratch /n/holyscratch
root  4842  4846  0.0 51.7 4856572 4168700 ?  Ssl  futex_wait_q  Jun25  /usr/sbin/glusterfs --acl --gid-timeout=2 --volfile-server=holyscratch --volfile-max-fetch-attempts=10 --volfile-server-transport=tcp --volfile-id=/holyscratch /n/holyscratch
root  4842  4847  0.0 51.7 4856572 4168700 ?  Ssl  hrtimer_nano  Jun25  /usr/sbin/glusterfs --acl --gid-timeout=2 --volfile-server=holyscratch --volfile-max-fetch-attempts=10 --volfile-server-transport=tcp --volfile-id=/holyscratch /n/holyscratch
root  4842  4850  4.2 51.7 4856572 4168700 ?  Ssl  request_wait  Jun25  /usr/sbin/glusterfs --acl --gid-timeout=2 --volfile-server=holyscratch --volfile-max-fetch-attempts=10 --volfile-server-transport=tcp --volfile-id=/holyscratch /n/holyscratch
root  4842  4851  0.0 51.7 4856572 4168700 ?  Ssl  pipe_wait     Jun25  /usr/sbin/glusterfs --acl --gid-timeout=2 --volfile-server=holyscratch --volfile-max-fetch-attempts=10 --volfile-server-transport=tcp --volfile-id=/holyscratch /n/holyscratch

Note that this is RSS of the process itself, not kernel page/entry/inode/dentry cache.
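A quick sanity check to confirm the memory is held by the client process rather than the kernel caches; the PID is taken from the ps output above:

grep VmRSS /proc/4842/status            # resident size of the glusterfs client
grep -E 'Cached|Slab' /proc/meminfo     # kernel page cache and slab (dentry/inode) usage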
*** This bug has been marked as a duplicate of bug 1127140 ***