Bug 1023191
Summary: glusterfs consuming a large amount of system memory

Product: [Community] GlusterFS
Component: core
Version: 3.4.2
Status: CLOSED DUPLICATE
Reporter: Chad Feller <cfeller>
Assignee: GlusterFS Bugs list <gluster-bugs>
CC: bugs, gluster-bugs, joe, jwm, khoi.mai2008, kkeithle, redhat.bugs, thefiguras
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: ---
Doc Type: Bug Fix
Story Points: ---
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Last Closed: 2014-10-16 15:47:01 UTC
Description (Chad Feller, 2013-10-24 20:27:29 UTC)
Created attachment 815916 [details]
lsof output
Created attachment 815917 [details]
pmap output
Created attachment 815918 [details]
status dump
Messages like these:

TIME=2013-10-24 10:48:35.001570
message=[0] fuse_forget: 37155599: FORGET 777007404/1 gfid: (432f0f9b-ea58-4472-86e9-37a715f1aa26)

were triggered after issuing:

echo 3 > /proc/sys/vm/drop_caches

following a chat with JoeJulian on IRC. (Ultimately that dropped the RAM consumption by less than 1%.)
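For anyone reproducing this step, the before/after effect of dropping kernel caches on the client's resident set can be captured in one short script rather than eyeballed in top. This is a minimal sketch, not part of the original report: it assumes a Linux /proc layout and a single glusterfs client process, the pgrep pattern is my own, and the fallback to the current shell is only there so the helper can be demonstrated on a box without gluster. Dropping caches requires root.

```shell
#!/bin/sh
# Sketch: compare a process's VmRSS before and after dropping kernel caches.
# Assumptions: Linux /proc layout; run as root for drop_caches to take effect.

rss_kb() { awk '/^VmRSS:/ {print $2}' "/proc/$1/status"; }

# Target the FUSE client if one is running; fall back to this shell for a demo.
pid=$(pgrep -o -x glusterfs 2>/dev/null) || pid=$$

before=$(rss_kb "$pid")
sync
# 3 = drop page cache + dentries + inodes; silently skipped when unprivileged.
echo 3 > /proc/sys/vm/drop_caches 2>/dev/null || true
after=$(rss_kb "$pid")

echo "pid=$pid rss_before=${before}kB rss_after=${after}kB"
```

A large RSS that barely moves after the drop (as reported above, under 1%) points at memory held by the glusterfs process itself rather than by kernel caches.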
Comment 4 (Joe Julian)

This may be kernel specific. I see much more memory allocation with EL6 kernels than I do with Fedora.

Comment 5 (Chad Feller)

(In reply to Joe Julian from comment #4)
> This may be kernel specific. I see much more memory allocation with EL6
> kernels than I do with Fedora.

To test your theory, I spun up two Fedora VMs (KVM) with the same specs as the 4GB EL6 VM (also KVM) mentioned in the initial report. I'm actually seeing greater memory consumption in Fedora.

The F18 VM, rsync completed yesterday:

##############################################################################
top - 17:06:10 up 23:23, 2 users, load average: 0.00, 0.01, 0.05
Tasks: 80 total, 1 running, 79 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 4049744 total, 3817220 used, 232524 free, 8044 buffers
KiB Swap: 1048572 total, 207528 used, 841044 free, 38584 cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
820 root 20 0 2299m 1.9g 1380 S 0.0 48.5 60:38.12 glusterfs
##############################################################################

--- and ---

The F19 VM, rsync still in progress, started 3 hours ago:

##############################################################################
top - 17:06:50 up 23:56, 2 users, load average: 0.11, 0.19, 0.22
Tasks: 86 total, 2 running, 84 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.3 us, 2.8 sy, 0.0 ni, 91.8 id, 0.0 wa, 0.7 hi, 1.2 si, 0.2 st
KiB Mem: 4049872 total, 3911176 used, 138696 free, 236 buffers
KiB Swap: 1048572 total, 79080 used, 969492 free, 15680 cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1114 root 20 0 2272804 1.947g 2036 R 10.0 50.4 29:54.90 glusterfs
##############################################################################

Respective kernels:

F18 VM: 3.11.4-101.fc18.x86_64
F19 VM: 3.11.6-200.fc19.x86_64

Let me know if you want/need any more info.
Also, official gluster 3.4.1 RPMs on both Fedora systems:

# rpm -qa | grep gluster
glusterfs-fuse-3.4.1-1.fc18.x86_64
glusterfs-3.4.1-1.fc18.x86_64
glusterfs-libs-3.4.1-1.fc18.x86_64

--- and ---

# rpm -qa | grep gluster
glusterfs-3.4.1-1.fc19.x86_64
glusterfs-libs-3.4.1-1.fc19.x86_64
glusterfs-fuse-3.4.1-1.fc19.x86_64

Two additional notes. The earlier rsync (mentioned in Comment #5) using the F19 VM completed; it actually peaked at 53.5% of the system memory:

##############################################################################
top - 18:46:38 up 1 day, 1:36, 2 users, load average: 0.41, 0.27, 0.24
Tasks: 86 total, 3 running, 83 sleeping, 0 stopped, 0 zombie
%Cpu(s): 5.6 us, 6.8 sy, 0.0 ni, 84.8 id, 0.0 wa, 1.0 hi, 1.7 si, 0.2 st
KiB Mem: 4049872 total, 3943232 used, 106640 free, 196 buffers
KiB Swap: 1048572 total, 190860 used, 857712 free, 243500 cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1114 root 20 0 2412556 2.066g 1576 R 21.0 53.5 49:33.36 glusterfs
##############################################################################

It dropped back down just before rsync completed and stayed at 48.7% after completion, which is comparable to the F18 VM listed above, but also significantly more than EL6.

##############################################################################
top - 20:42:03 up 1 day, 3:31, 3 users, load average: 0.00, 0.01, 0.05
Tasks: 86 total, 1 running, 85 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.2 sy, 0.0 ni, 99.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 4049872 total, 3519144 used, 530728 free, 3596 buffers
KiB Swap: 1048572 total, 197364 used, 851208 free, 30664 cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1114 root 20 0 2353704 1.883g 1500 S 0.0 48.7 59:51.69 glusterfs
##############################################################################

Still, it looks like glusterfs is consuming a large amount of system memory regardless of the underlying Linux OS.
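The per-distro numbers above were read off top by hand at a few points during each rsync. A small sampler makes the comparison repeatable and plottable; this is a sketch of my own, assuming a Linux system with procps `ps`, and the log path is an arbitrary placeholder.

```shell
#!/bin/sh
# Sketch: append one timestamped RSS/%MEM sample for every glusterfs process
# to a log. Run it from cron or a shell loop while the rsync is in flight.
LOG=${LOG:-/tmp/glusterfs-rss.log}

sample_rss() {
    # pid=,rss=,pmem=,comm= : headerless columns from procps ps;
    # rss is reported in kB, pmem is percent of physical memory.
    ps -C glusterfs -o pid=,rss=,pmem=,comm= |
    while read -r pid rss pmem comm; do
        echo "$(date '+%F %T') pid=$pid rss_kb=$rss pmem=$pmem $comm"
    done >> "$LOG"
}

sample_rss
```

For example, `while :; do sample_rss; sleep 60; done` alongside the rsync gives a minute-by-minute rss_kb series for each client, which is easier to compare across kernels than periodic top screenshots.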
Created attachment 843861 [details]
Gluster profiling enabled
I noticed similar glusterfs behaviour in version 3.3.1. The Gluster node running 3.3.1 was under CentOS 6.4, so I decided to upgrade to:

glusterfs-3.4.1-3.el6.x86_64
glusterfs-geo-replication-3.4.1-3.el6.x86_64
glusterfs-libs-3.4.1-3.el6.x86_64
glusterfs-cli-3.4.1-3.el6.x86_64
glusterfs-server-3.4.1-3.el6.x86_64
glusterfs-fuse-3.4.1-3.el6.x86_64

OS: CentOS release 6.5 (Final)

After the upgrade, however, the issue has not gone away. The system is stable and I can't complain; failover with CARP works like a charm. But it bugs me why glusterfs commits memory all the time. I am running 4 to 5 volumes, all of type Replicate. Two volumes are almost idle most of the time and two are active once a day. Attached is a profiling-enabled log for the two active volumes. The Munin graphs show the memory commit over time very clearly. Meanwhile, I noticed that restarting glusterd helps but does not prevent the memory from being committed. Adding more RAM has not helped either.

http://i.imgur.com/ZbGmnfs.png
http://i.imgur.com/rwEbm33.png

---

Updating version to 3.4.2: I updated server and client packages to 3.4.2 earlier today. After a couple of rsyncs, glusterfs is consuming 58.4% of the system memory on one of our clients. So it is fair to say that the problem remains in 3.4.2.

---

I'm seeing the same thing with 3.5.0 on CentOS 6. We're running the Robinhood Policy Engine (https://sourceforge.net/projects/robinhood/) to enforce a directory cleanup policy. It periodically scans a ~275TB, 40-brick Gluster filesystem with ~250M files. After it has scanned only ~50M files, the FUSE client has an RSS of ~4GB:

[jwm@robinhood:pts/2 ~> psg gluster
root 4842 4842 9.7 51.7 4856572 4168700 ? Ssl ep_poll Jun25 /usr/sbin/glusterfs --acl --gid-timeout=2 --volfile-server=holyscratch --volfile-max-fetch-attempts=10 --volfile-server-transport=tcp --volfile-id=/holyscratch /n/holyscratch
root 4842 4844 0.0 51.7 4856572 4168700 ? Ssl rt_sigtimedw Jun25 /usr/sbin/glusterfs --acl --gid-timeout=2 --volfile-server=holyscratch --volfile-max-fetch-attempts=10 --volfile-server-transport=tcp --volfile-id=/holyscratch /n/holyscratch
root 4842 4845 0.0 51.7 4856572 4168700 ? Ssl futex_wait_q Jun25 /usr/sbin/glusterfs --acl --gid-timeout=2 --volfile-server=holyscratch --volfile-max-fetch-attempts=10 --volfile-server-transport=tcp --volfile-id=/holyscratch /n/holyscratch
root 4842 4846 0.0 51.7 4856572 4168700 ? Ssl futex_wait_q Jun25 /usr/sbin/glusterfs --acl --gid-timeout=2 --volfile-server=holyscratch --volfile-max-fetch-attempts=10 --volfile-server-transport=tcp --volfile-id=/holyscratch /n/holyscratch
root 4842 4847 0.0 51.7 4856572 4168700 ? Ssl hrtimer_nano Jun25 /usr/sbin/glusterfs --acl --gid-timeout=2 --volfile-server=holyscratch --volfile-max-fetch-attempts=10 --volfile-server-transport=tcp --volfile-id=/holyscratch /n/holyscratch
root 4842 4850 4.2 51.7 4856572 4168700 ? Ssl request_wait Jun25 /usr/sbin/glusterfs --acl --gid-timeout=2 --volfile-server=holyscratch --volfile-max-fetch-attempts=10 --volfile-server-transport=tcp --volfile-id=/holyscratch /n/holyscratch
root 4842 4851 0.0 51.7 4856572 4168700 ? Ssl pipe_wait Jun25 /usr/sbin/glusterfs --acl --gid-timeout=2 --volfile-server=holyscratch --volfile-max-fetch-attempts=10 --volfile-server-transport=tcp --volfile-id=/holyscratch /n/holyscratch

Note that this is the RSS of the process itself, not kernel page/inode/dentry cache.

*** This bug has been marked as a duplicate of bug 1127140 ***
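For reference, the "status dump" attached to the original report is a gluster statedump, which is how per-translator and mem-pool accounting inside the client can be inspected. A minimal sketch of capturing one; the output directory and dump file naming vary by version and packaging (/var/run/gluster and glusterdump.<pid>.dump.<timestamp> are the usual defaults), and "myvol" is a placeholder volume name.

```shell
#!/bin/sh
# Sketch: ask the FUSE client for a statedump, then look for the dump file.

# SIGUSR1 tells a glusterfs process to write a statedump.
if pid=$(pgrep -o -x glusterfs); then
    kill -USR1 "$pid"
fi

# The newest file here should be the dump, with mem-pool usage per translator.
ls -lt /var/run/gluster 2>/dev/null | head

# Brick (server-side) processes can be dumped volume-wide from the CLI.
if command -v gluster >/dev/null; then
    gluster volume statedump myvol
fi
```

Comparing two dumps taken before and after an rsync run shows which pools grow, which is more actionable in a bug report than RSS alone.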