Description of problem:
I am running glusterfs 3.4.2 on Linux kernel version 2.6.34.12 on two x86_64 boards with 16 GB of RAM each. I have several gluster file systems (close to 10) in twin-replicated mode containing around 4 GB of data in aggregate. Sometimes, following a reboot of the boards, I observe that the glustershd memory % in top output rises above 50% (over 8 GB), causing problems when trying to run other key processes.

Version-Release number of selected component (if applicable):
glusterfs 3.4.2
linux kernel 2.6.34.12

How reproducible:
Intermittent. Our systems reboot very frequently, and during testing we often format our disks to clean out the bricks and then add them back. So there is quite a lot of 'uncontrolled' self-heal going on on our systems.

Steps to Reproduce:
1. Remove all the bricks on one of the servers from all replicated volumes.
2. Erase the logical volumes that comprise these bricks.
3. Re-create the bricks and add them back to the replicated volumes, causing a massive heal of data.

Actual results:
Sometimes, maybe around once in 20-30 attempts, glustershd memory usage exceeds 50% (8 GB), causing other applications to fail to spawn or to terminate abruptly. The workaround is to kill glustershd and then restart glusterd via /etc/init.d/glusterd so that glustershd is spawned again.

Expected results:
We would expect the memory usage to stay within a reasonable ceiling, say, 20%?

Additional info:
Please note that this bug is specifically about high memory consumption by the glusterfs self-heal daemon. I am aware that several other bugs exist in bugzilla covering generic high memory consumption by glusterfs daemons, or specific ones such as those pertaining to glusterfs NFS.
I took a statedump and found that the process is leaking the 'path' strings held in the circular buffers it uses to remember the last 1024 entries that were healed, failed to heal, or are in split-brain. http://review.gluster.org/4790 has the fix, which lets the circular buffer take a cleanup function for freeing the data stored in each entry.
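To illustrate the kind of fix described above, here is a minimal sketch of a fixed-size circular buffer that takes a cleanup callback and calls it whenever an entry is overwritten or the buffer is torn down. All names here (`ring_new`, `ring_add`, `ring_free`, `heal_event`, `events_freed`) are hypothetical and chosen for illustration; this is not the libglusterfs circular buffer API or the actual patch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this is not the libglusterfs API. */
typedef void (*entry_destroy_fn)(void *data);

struct ring {
    void           **slots;    /* one owned pointer per slot, NULL if empty */
    size_t           capacity; /* e.g. 1024 in the self-heal daemon case */
    size_t           next;     /* index of the slot to write next */
    entry_destroy_fn destroy;  /* called on every entry that is dropped */
};

struct ring *ring_new(size_t capacity, entry_destroy_fn destroy)
{
    struct ring *r;

    if (capacity == 0)
        return NULL;
    r = calloc(1, sizeof(*r));
    if (!r)
        return NULL;
    r->slots = calloc(capacity, sizeof(*r->slots));
    if (!r->slots) {
        free(r);
        return NULL;
    }
    r->capacity = capacity;
    r->destroy = destroy;
    return r;
}

/* Store data in the next slot. Without the destroy call, the entry
 * being overwritten (and anything it points to) would be leaked. */
void ring_add(struct ring *r, void *data)
{
    void *old = r->slots[r->next];

    if (old && r->destroy)
        r->destroy(old);
    r->slots[r->next] = data;
    r->next = (r->next + 1) % r->capacity;
}

/* Release every remaining entry, then the buffer itself. */
void ring_free(struct ring *r)
{
    size_t i;

    if (!r)
        return;
    for (i = 0; i < r->capacity; i++)
        if (r->slots[i] && r->destroy)
            r->destroy(r->slots[i]);
    free(r->slots);
    free(r);
}

/* A stand-in for a self-heal event that owns a heap-allocated path. */
struct heal_event {
    char *path;
};

int events_freed; /* counter used only to observe the cleanup calls */

void heal_event_destroy(void *data)
{
    struct heal_event *ev = data;

    free(ev->path); /* the member that would otherwise leak */
    free(ev);
    events_freed++;
}

struct heal_event *heal_event_new(const char *path)
{
    size_t len = strlen(path) + 1;
    struct heal_event *ev = malloc(sizeof(*ev));

    if (!ev)
        return NULL;
    ev->path = malloc(len);
    if (!ev->path) {
        free(ev);
        return NULL;
    }
    memcpy(ev->path, path, len);
    return ev;
}
```

With a buffer created as `ring_new(1024, heal_event_destroy)`, each `ring_add` past the first 1024 entries frees the oldest event together with its path, so memory use stays bounded by the buffer capacity however many files are healed.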
Found one more 'dict' leak in metadata self-heal. This leak is present even in 3.5.x. Will be cloning this bug. Thanks a lot Anirban for raising the issue.
It seems the 'dict' leak I mentioned above exists only in 3.5.x, so the only leak in 3.4.2 is the one mentioned in comment-1.
REVIEW: http://review.gluster.org/8541 (cluster/afr: Fix memory leak of file-path in self-heal-daemon) posted (#1) for review on release-3.4 by Pranith Kumar Karampuri (pkarampu)
REVIEW: http://review.gluster.org/8541 (cluster/afr: Fix memory leak of file-path in self-heal-daemon) posted (#2) for review on release-3.4 by Pranith Kumar Karampuri (pkarampu)
COMMIT: http://review.gluster.org/8541 committed in release-3.4 by Kaleb KEITHLEY (kkeithle)
------
commit f0ddba7e0913db505f1295e9b3b7d35ead9c4407
Author: Pranith Kumar K <pkarampu>
Date:   Tue Aug 26 12:59:47 2014 +0530

    cluster/afr: Fix memory leak of file-path in self-heal-daemon

    Backport of http://review.gluster.org/4790
    Note: Only the part which fixes the memory leak is backported

    shd event has path which needs to be freed as part of circular
    buffer cleanup. This patch introduces the functionality so that
    self-heal-daemon can use it.

    Change-Id: I3f3823d5587eda2fcb278f0fdb89123a31c9d786
    BUG: 1119894
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/8541
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Ravishankar N <ravishankar>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>