Bug 1119894

Summary: Glustershd memory usage too high
Product: [Community] GlusterFS
Reporter: Anirban Ghoshal <a.ghoshal>
Component: replicate
Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.4.2
CC: bugs, gluster-bugs, kkeithle, pkarampu
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1120151
Environment:
Last Closed: 2015-04-13 07:11:10 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1120151, 1120245, 1125245, 1139586

Description Anirban Ghoshal 2014-07-15 18:58:15 UTC
Description of problem:

I am running glusterfs 3.4.2 on linux kernel version 2.6.34.12 on two x86_64 boards with 16 GB of RAM each. I have several gluster file-systems (close to 10) in twin-replicated mode containing around 4 GB of data in aggregate.

Sometimes, following a reboot of the boards, I observe that the glustershd memory % in top output rises above 50% (over 8 GB), causing problems when trying to run other key processes.

Version-Release number of selected component (if applicable):
glusterfs 3.4.2
linux kernel 2.6.34.12

How reproducible:
Intermittent. Our systems reboot very frequently and during testing we often format our disks to clean out the bricks and then add them back. So, there is quite a lot of 'uncontrolled' self heal going on on our systems.

Steps to Reproduce:
1. Remove all the bricks on one of the servers from all replicated volumes.
2. Erase the logical volumes that comprise these bricks.
3. Re-create the bricks and add them back to the replicated volumes, causing a massive heal of data.


Actual results:
Sometimes, maybe around once in 20-30 times, glustershd memory usage exceeds 50% (8 GB), causing other applications to fail to spawn or to terminate abruptly. The workaround is to kill glustershd and then restart /etc/init.d/glusterd to get the former to spawn back.

Expected results:

We would expect the memory usage to fall within a reasonable ceiling, say, 20%?

Additional info:

Please note that this bug is specifically about high memory consumption by the glusterfs self-heal daemon. I am aware that several other bugs exist in Bugzilla covering generic high memory consumption by glusterfs daemons, or specific ones such as those pertaining to gfs nfs.

Comment 1 Pranith Kumar K 2014-07-16 07:11:16 UTC
I took the statedump and found that the process is leaking 'path' from the circular buffers it uses to remember the last 1024 entries that were healed, failed to heal, or were in split-brain.
http://review.gluster.org/4790 has the fix, which lets the circular buffer take a cleanup function that frees the data stored in each entry.
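
For readers unfamiliar with this kind of leak, here is a minimal, self-contained sketch of the idea in C. It is not the GlusterFS code: the names ring_t, ring_new, ring_add, ring_free, heal_event_t and events_freed are all hypothetical and chosen only for illustration. It shows a fixed-size circular buffer that accepts a destroy callback, so that an entry's heap-allocated path is freed when the entry is overwritten or when the buffer itself is torn down. Without such a callback, each overwritten entry would leak its path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical event record: owns a heap-allocated path string. */
typedef struct {
    char *path;
} heal_event_t;

/* Cleanup callback supplied by the user of the buffer. */
typedef void (*destroy_fn_t)(void *data);

/* Hypothetical circular buffer holding opaque pointers. */
typedef struct {
    void       **slots;
    size_t       capacity;
    size_t       next;     /* slot that the next ring_add() writes to */
    destroy_fn_t destroy;  /* called on entries that are evicted or torn down */
} ring_t;

ring_t *ring_new(size_t capacity, destroy_fn_t destroy)
{
    assert(capacity > 0);
    ring_t *r = malloc(sizeof(*r));
    if (!r)
        return NULL;
    r->slots = calloc(capacity, sizeof(*r->slots));
    if (!r->slots) {
        free(r);
        return NULL;
    }
    r->capacity = capacity;
    r->next = 0;
    r->destroy = destroy;
    return r;
}

/* Store data in the next slot, destroying whatever it overwrites. */
void ring_add(ring_t *r, void *data)
{
    void *old = r->slots[r->next];
    if (old && r->destroy)
        r->destroy(old);
    r->slots[r->next] = data;
    r->next = (r->next + 1) % r->capacity;
}

/* Destroy every remaining entry, then the buffer itself. */
void ring_free(ring_t *r)
{
    if (!r)
        return;
    for (size_t i = 0; i < r->capacity; i++) {
        if (r->slots[i] && r->destroy)
            r->destroy(r->slots[i]);
    }
    free(r->slots);
    free(r);
}

/* Demonstration counter: number of events released so far. */
size_t events_freed = 0;

heal_event_t *heal_event_new(const char *path)
{
    heal_event_t *ev = malloc(sizeof(*ev));
    if (!ev)
        return NULL;
    ev->path = malloc(strlen(path) + 1);
    if (!ev->path) {
        free(ev);
        return NULL;
    }
    strcpy(ev->path, path);
    return ev;
}

/* Frees the path as well as the event; omitting free(ev->path) is the leak. */
void heal_event_destroy(void *data)
{
    heal_event_t *ev = data;
    free(ev->path);
    free(ev);
    events_freed++;
}
```

A caller would create the buffer with ring_new(1024, heal_event_destroy) and push one heal_event_t per processed file with ring_add(). Once the buffer wraps around, each new entry releases the oldest one, so memory stays bounded by the buffer's capacity.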

Comment 2 Pranith Kumar K 2014-07-16 09:55:16 UTC
Found one more 'dict' leak in metadata self-heal. This leak is present even in 3.5.x. Will be cloning this bug. Thanks a lot Anirban for raising the issue.

Comment 3 Pranith Kumar K 2014-07-16 10:34:57 UTC
It seems the 'dict' leak I mentioned above only exists in 3.5.x. So the only leak in 3.4.2 is the one mentioned in comment 1.

Comment 4 Anand Avati 2014-08-26 08:20:42 UTC
REVIEW: http://review.gluster.org/8541 (cluster/afr: Fix memory leak of file-path in self-heal-daemon) posted (#1) for review on release-3.4 by Pranith Kumar Karampuri (pkarampu)

Comment 5 Anand Avati 2014-08-26 08:26:23 UTC
REVIEW: http://review.gluster.org/8541 (cluster/afr: Fix memory leak of file-path in self-heal-daemon) posted (#2) for review on release-3.4 by Pranith Kumar Karampuri (pkarampu)

Comment 6 Anand Avati 2014-08-27 13:46:41 UTC
COMMIT: http://review.gluster.org/8541 committed in release-3.4 by Kaleb KEITHLEY (kkeithle) 
------
commit f0ddba7e0913db505f1295e9b3b7d35ead9c4407
Author: Pranith Kumar K <pkarampu>
Date:   Tue Aug 26 12:59:47 2014 +0530

    cluster/afr: Fix memory leak of file-path in self-heal-daemon
    
    Backport of http://review.gluster.org/4790
    
    Note: Only the part which fixes the memory leak is backported
    
    shd event has path which needs to be freed as part of circular buffer cleanup.
    This patch introduces the functionality so that self-heal-daemon can use it.
    
    Change-Id: I3f3823d5587eda2fcb278f0fdb89123a31c9d786
    BUG: 1119894
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/8541
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Ravishankar N <ravishankar>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>