+++ This bug was initially created as a clone of Bug #1119894 +++
Description of problem:
I am running glusterfs 3.4.2 on linux kernel version 126.96.36.199 on two x86_64 boards with 16 GB of RAM each. I have several gluster file-systems (close to 10) in twin-replicated mode, containing around 4 GB of data in aggregate.
Sometimes, after the boards reboot, I observe in top output that glustershd memory usage rises above 50% (over 8 GB), causing problems when other key processes try to run.
Version-Release number of selected component (if applicable):
linux kernel 188.8.131.52
How reproducible:
Intermittent. Our systems reboot very frequently, and during testing we often format our disks to clean out the bricks and then add them back, so a lot of 'uncontrolled' self-heal runs on our systems.
Steps to Reproduce:
1. Remove all the bricks on one of the servers from all replicated volumes.
2. Erase the logical volumes that comprise these bricks.
3. Re-create the bricks and add them back to the replicated volumes, triggering a massive heal of data.
Actual results:
Sometimes, roughly once in 20-30 attempts, glustershd memory usage exceeds 50% (8 GB), causing other applications to fail to spawn or to terminate abruptly. The workaround is to kill glustershd and then restart glusterd via /etc/init.d/glusterd so that glustershd is respawned.
Expected results:
We would expect memory usage to stay under a reasonable ceiling, say 20%.
Additional info:
Please note that this bug is specifically about high memory consumption by the glusterfs self-heal daemon. I am aware that several other bugs in Bugzilla cover high memory consumption by glusterfs daemons in general, or by specific daemons such as the gluster NFS server.
--- Additional comment from Pranith Kumar K on 2014-07-16 03:11:16 EDT ---
I took a statedump and found that the process is leaking the 'path' strings stored in the circular buffers it uses to remember the last 1024 entries that were healed, failed to heal, or were in split-brain.
http://review.gluster.org/4790 has the fix, which lets the circular buffer be given a cleanup function for freeing the data stored in its entries.
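The leak in comment 1 follows a common pattern: a fixed-size ring buffer stores heap-allocated entries (here, healed paths), and when the oldest slot is overwritten or the buffer is destroyed, nothing frees the pointer that slot held. Below is a minimal C sketch of the idea behind the fix, assuming a buffer that accepts a destroy callback at creation time. The names circ_buf_t, circ_buf_new, circ_buf_push, circ_buf_free and path_dup are hypothetical and are not the actual libglusterfs API from the review above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type: frees whatever a stored entry owns. */
typedef void (*entry_destroy_fn)(void *data);

/* Hypothetical fixed-size ring buffer of pointers. */
typedef struct {
    void **slots;             /* stored entries, NULL when empty */
    size_t capacity;          /* e.g. 1024 in the self-heal daemon */
    size_t next;              /* index of the slot written next */
    size_t used;              /* number of occupied slots */
    entry_destroy_fn destroy; /* NULL means entries are never freed */
} circ_buf_t;

static circ_buf_t *circ_buf_new(size_t capacity, entry_destroy_fn destroy) {
    assert(capacity > 0); /* avoids modulo by zero in circ_buf_push */
    circ_buf_t *cb = malloc(sizeof(*cb));
    if (!cb)
        return NULL;
    cb->slots = calloc(capacity, sizeof(void *));
    if (!cb->slots) {
        free(cb);
        return NULL;
    }
    cb->capacity = capacity;
    cb->next = 0;
    cb->used = 0;
    cb->destroy = destroy;
    return cb;
}

/* Stores data; when the buffer is full, the oldest entry is overwritten.
 * Without a destroy callback, the overwritten pointer is simply lost,
 * which is the kind of leak described in comment 1. */
static void circ_buf_push(circ_buf_t *cb, void *data) {
    void *old = cb->slots[cb->next];
    if (old && cb->destroy)
        cb->destroy(old);
    cb->slots[cb->next] = data;
    cb->next = (cb->next + 1) % cb->capacity;
    if (cb->used < cb->capacity)
        cb->used++;
}

/* Releases every remaining entry through the callback, then the buffer. */
static void circ_buf_free(circ_buf_t *cb) {
    if (!cb)
        return;
    for (size_t i = 0; i < cb->capacity; i++)
        if (cb->slots[i] && cb->destroy)
            cb->destroy(cb->slots[i]);
    free(cb->slots);
    free(cb);
}

/* Hypothetical helper: heap copy of a path, as a healed-entry record. */
static char *path_dup(const char *path) {
    size_t len = strlen(path) + 1;
    char *copy = malloc(len);
    if (copy)
        memcpy(copy, path, len);
    return copy;
}
```

Passing NULL as the destroy callback reproduces the leaking behavior, since every overwrite silently drops the previous path. Passing free (whose signature matches entry_destroy_fn) keeps the memory held by the buffer bounded at capacity entries, no matter how many heals are recorded.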
--- Additional comment from Pranith Kumar K on 2014-07-16 05:55:16 EDT ---
Found one more 'dict' leak in metadata self-heal. This leak is present even in 3.5.x. I will be cloning this bug. Thanks a lot, Anirban, for raising the issue.
--- Additional comment from Pranith Kumar K on 2014-07-16 06:34:57 EDT ---
It seems the 'dict' leak I mentioned above exists only in 3.5.x, so the only leak in 3.4.2 is the one mentioned in comment 1.
REVIEW: http://review.gluster.org/8316 (cluster/afr: Fix leaks in self-heal code path) posted (#1) for review on release-3.5 by Pranith Kumar Karampuri (pkarampu)
COMMIT: http://review.gluster.org/8316 committed in release-3.5 by Niels de Vos (ndevos)
Author: Pranith Kumar K <pkarampu>
Date: Wed Jul 16 15:03:19 2014 +0530
cluster/afr: Fix leaks in self-heal code path
Signed-off-by: Pranith Kumar K <pkarampu>
Tested-by: Gluster Build System <jenkins.com>
Reviewed-by: Krishnan Parthasarathi <kparthas>
Reviewed-by: Ravishankar N <ravishankar>
Reviewed-by: Niels de Vos <ndevos>
The first (and last?) Beta for GlusterFS 3.5.2 has been released. Please verify if the release solves this bug report for you. In case the glusterfs-3.5.2beta1 release does not have a resolution for this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.
Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list and the update (possibly an "updates-testing" repository) infrastructure for your distribution.
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.5.2, please reopen this bug report.
glusterfs-3.5.2 has been announced on the Gluster Users mailing list; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list and the update infrastructure for your distribution.