1119894 – Glustershd memory usage too high

Summary: Glustershd memory usage too high

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	replicate
Sub Component:
Version:	3.4.2
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Pranith Kumar K
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1120151 1120245 glusterfs-3.4.6 1139586

Reported:	2014-07-15 18:58 UTC by Anirban Ghoshal
Modified:	2015-12-01 16:45 UTC
CC List:	4 users
Fixed In Version:
Clone Of:
Clones:	1120151
Environment:
Last Closed:	2015-04-13 07:11:10 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Description Anirban Ghoshal 2014-07-15 18:58:15 UTC

Description of problem:

I am running glusterfs 3.4.2 on Linux kernel version 2.6.34.12 on two x86_64 boards with 16 GB of RAM each. I have several gluster file systems (close to 10) in twin-replicated mode, containing around 4 GB of data in aggregate.

Sometimes, following a reboot of the boards, I observe that the glustershd memory % in top output rises above 50% (over 8 GB), causing problems when trying to run other key processes.

Version-Release number of selected component (if applicable):
glusterfs 3.4.2
linux kernel 2.6.34.12

How reproducible:
Intermittent. Our systems reboot very frequently, and during testing we often format our disks to clean out the bricks and then add them back. So there is quite a lot of 'uncontrolled' self-heal taking place on our systems.

Steps to Reproduce:
1. Remove all the bricks on one of the servers from all replicated volumes.
2. Erase the logical volumes that comprise these bricks.
3. Re-create the bricks and add them back to the replicated volumes, causing a massive heal of data.

Actual results:
Sometimes, maybe around once in 20-30 times, glustershd memory usage exceeds 50% (8 GB), causing other applications to fail to spawn or to terminate abruptly. The workaround is to kill glustershd and then restart /etc/init.d/glusterd to get the former to spawn back.

Expected results:

We would expect the memory usage to fall within a reasonable ceiling, say, 20%?

Additional info:

Please note that this bug is specifically for high memory consumption by the glusterfs self-heal daemon. I am aware that several other bugs exist in bugzilla catering to generic high memory consumption by glusterfs daemons, or maybe specific ones such as those pertaining to gfs nfs.

Comment 1 Pranith Kumar K 2014-07-16 07:11:16 UTC

I took a statedump and found that the process is leaking the 'path' stored in the circular buffers it uses to remember the last 1024 entries that were healed, failed, or in split-brain.
http://review.gluster.org/4790 has the fix, which lets the circular buffer take a cleanup function for freeing the stored data.
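
To illustrate the kind of leak and fix described in this comment, here is a minimal sketch in C. It is not the GlusterFS code: circ_buf_t, cb_new, cb_add, cb_free, shd_event_t and the freed_events counter are hypothetical names made up for this example. The idea shown is only that a fixed-size circular buffer which overwrites old slots must call a caller-supplied cleanup function on the evicted entry, otherwise the heap-allocated path inside each event is never freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch; names do not match the real GlusterFS sources. */
typedef void (*cb_destroy_fn)(void *data);

typedef struct {
    void **slots;          /* fixed number of entry pointers */
    size_t size;           /* capacity, e.g. 1024 in the daemon */
    size_t next;           /* index of the slot to overwrite next */
    cb_destroy_fn destroy; /* cleanup function supplied by the user */
} circ_buf_t;

circ_buf_t *cb_new(size_t size, cb_destroy_fn destroy)
{
    assert(size > 0);
    circ_buf_t *cb = calloc(1, sizeof(*cb));
    if (!cb)
        return NULL;
    cb->slots = calloc(size, sizeof(*cb->slots));
    if (!cb->slots) {
        free(cb);
        return NULL;
    }
    cb->size = size;
    cb->destroy = destroy;
    return cb;
}

/* Store a new entry; the entry being overwritten is destroyed first.
 * Without the destroy call, the old entry and its path would leak. */
void cb_add(circ_buf_t *cb, void *data)
{
    void *old = cb->slots[cb->next];
    if (old && cb->destroy)
        cb->destroy(old);
    cb->slots[cb->next] = data;
    cb->next = (cb->next + 1) % cb->size;
}

/* Destroy every remaining entry, then the buffer itself. */
void cb_free(circ_buf_t *cb)
{
    for (size_t i = 0; i < cb->size; i++) {
        if (cb->slots[i] && cb->destroy)
            cb->destroy(cb->slots[i]);
    }
    free(cb->slots);
    free(cb);
}

/* Hypothetical event type: it owns a heap-allocated copy of the path. */
typedef struct {
    char *path;
} shd_event_t;

int freed_events = 0; /* counter used only to observe the cleanup */

shd_event_t *shd_event_new(const char *path)
{
    shd_event_t *ev = malloc(sizeof(*ev));
    if (!ev)
        return NULL;
    size_t len = strlen(path) + 1;
    ev->path = malloc(len);
    if (!ev->path) {
        free(ev);
        return NULL;
    }
    memcpy(ev->path, path, len);
    return ev;
}

/* Cleanup function: frees the path as well as the event itself. */
void shd_event_destroy(void *data)
{
    shd_event_t *ev = data;
    free(ev->path);
    free(ev);
    freed_events++;
}
```

A caller would create the buffer with cb_new(1024, shd_event_destroy) and push events with cb_add; once the buffer wraps around, each new event releases the one it replaces, so memory use stays bounded by the buffer size.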

Comment 2 Pranith Kumar K 2014-07-16 09:55:16 UTC

Found one more 'dict' leak in metadata self-heal. This leak is present even in 3.5.x. Will be cloning this bug. Thanks a lot Anirban for raising the issue.

Comment 3 Pranith Kumar K 2014-07-16 10:34:57 UTC

The 'dict' leak I mentioned above exists only in 3.5.x, it seems. So the only leak in 3.4.2 is the one mentioned in comment 1.

Comment 4 Anand Avati 2014-08-26 08:20:42 UTC

REVIEW: http://review.gluster.org/8541 (cluster/afr: Fix memory leak of file-path in self-heal-daemon) posted (#1) for review on release-3.4 by Pranith Kumar Karampuri (pkarampu)

Comment 5 Anand Avati 2014-08-26 08:26:23 UTC

REVIEW: http://review.gluster.org/8541 (cluster/afr: Fix memory leak of file-path in self-heal-daemon) posted (#2) for review on release-3.4 by Pranith Kumar Karampuri (pkarampu)

Comment 6 Anand Avati 2014-08-27 13:46:41 UTC

COMMIT: http://review.gluster.org/8541 committed in release-3.4 by Kaleb KEITHLEY (kkeithle) 
------
commit f0ddba7e0913db505f1295e9b3b7d35ead9c4407
Author: Pranith Kumar K <pkarampu>
Date:   Tue Aug 26 12:59:47 2014 +0530

    cluster/afr: Fix memory leak of file-path in self-heal-daemon
    
            Backport of http://review.gluster.org/4790
    
    Note: Only the part which fixes the memory leak is backported
    
    shd event has path which needs to be freed as part of circular buffer cleanup.
    This patch introduces the functionality so that self-heal-daemon can use it.
    
    Change-Id: I3f3823d5587eda2fcb278f0fdb89123a31c9d786
    BUG: 1119894
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/8541
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Ravishankar N <ravishankar>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
