1120151 – Glustershd memory usage too high

Summary: Glustershd memory usage too high

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	replicate
Sub Component:
Version:	3.5.1
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	GlusterFS Bugs list
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:	1119894
Blocks:	glusterfs-3.5.2 1120245 1139586

Reported:	2014-07-16 10:37 UTC by Pranith Kumar K
Modified:	2014-09-09 09:12 UTC
CC List:	4 users
Fixed In Version:	glusterfs-3.5.2beta1
Clone Of:	1119894
Clones:	1120245
Environment:
Last Closed:	2014-07-31 11:43:51 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Description Pranith Kumar K 2014-07-16 10:37:17 UTC

+++ This bug was initially created as a clone of Bug #1119894 +++

Description of problem:

I am running glusterfs 3.4.2 on Linux kernel version 2.6.34.12 on two x86_64 boards with 16 GB of RAM each. I have several gluster file systems (close to 10) in twin-replicated mode containing around 4 GB of data in aggregate.

Sometimes, after the boards reboot, I observe that the glustershd memory percentage in top output rises above 50% (over 8 GB), causing problems when trying to run other key processes.

Version-Release number of selected component (if applicable):
glusterfs 3.4.2
linux kernel 2.6.34.12

How reproducible:
Intermittent. Our systems reboot very frequently, and during testing we often format our disks to clean out the bricks and then add them back. So there is quite a lot of 'uncontrolled' self-heal going on on our systems.

Steps to Reproduce:
1. Remove all the bricks on one of the servers from all replicated volumes.
2. Erase the logical volumes that comprise these bricks.
3. Re-create the bricks and add them back to the replicated volumes, causing a massive heal of data.


Actual results:
Sometimes, maybe around once in 20-30 times, glustershd memory usage exceeds 50% (8 GB), causing other applications to fail to spawn or to terminate abruptly. The workaround is to kill glustershd and then restart /etc/init.d/glusterd to get the former to spawn back.

Expected results:

We would expect the memory usage to fall within a reasonable ceiling, say, 20%?

Additional info:

Please note that this bug is specifically about high memory consumption by the glusterfs self-heal daemon. I am aware that several other bugs exist in bugzilla covering generic high memory consumption by glusterfs daemons, or specific ones such as those pertaining to gfs nfs.

--- Additional comment from Pranith Kumar K on 2014-07-16 03:11:16 EDT ---

I took a statedump and found that the process is leaking the 'path' stored in the circular buffers it uses to remember the last 1024 entries that were healed, failed to heal, or were in split-brain.
http://review.gluster.org/4790 has the fix, which lets the circular buffer take a cleanup function for freeing the data it stores.
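
The idea behind that fix can be sketched as follows. This is a minimal illustration and not the GlusterFS circular-buffer code: every name here (`struct ring`, `ring_add`, `heal_entry`, and so on) is hypothetical, and the capacity is kept tiny for the demo. The point is that the buffer is told how to destroy an entry, so an entry that owns a heap-allocated path is released when its slot is overwritten or when the buffer is torn down.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this is NOT the GlusterFS circ-buff API. */
typedef void (*entry_destroy_fn)(void *data);

struct ring {
    void **slots;             /* fixed-size array of entry pointers */
    size_t capacity;          /* maximum number of remembered entries */
    size_t next;              /* index of the slot to write next */
    entry_destroy_fn destroy; /* called on an entry before it is dropped */
};

static struct ring *ring_new(size_t capacity, entry_destroy_fn destroy)
{
    if (capacity == 0)
        return NULL;
    struct ring *r = calloc(1, sizeof(*r));
    if (!r)
        return NULL;
    r->slots = calloc(capacity, sizeof(*r->slots));
    if (!r->slots) {
        free(r);
        return NULL;
    }
    r->capacity = capacity;
    r->destroy = destroy;
    return r;
}

/* Store data, releasing whatever entry previously occupied the slot.
 * Without the destroy call, each overwrite would leak the old entry. */
static void ring_add(struct ring *r, void *data)
{
    void *old = r->slots[r->next];
    if (old && r->destroy)
        r->destroy(old);
    r->slots[r->next] = data;
    r->next = (r->next + 1) % r->capacity;
}

/* Release every remaining entry, then the buffer itself. */
static void ring_free(struct ring *r)
{
    for (size_t i = 0; i < r->capacity; i++)
        if (r->slots[i] && r->destroy)
            r->destroy(r->slots[i]);
    free(r->slots);
    free(r);
}

/* Example entry that owns a heap-allocated path. */
struct heal_entry {
    char *path;
};

static int destroyed_count = 0; /* demo-only counter */

static void heal_entry_destroy(void *data)
{
    struct heal_entry *e = data;
    free(e->path);
    free(e);
    destroyed_count++;
}

static struct heal_entry *heal_entry_new(const char *path)
{
    struct heal_entry *e = malloc(sizeof(*e));
    if (!e)
        return NULL;
    size_t len = strlen(path) + 1;
    e->path = malloc(len);
    if (!e->path) {
        free(e);
        return NULL;
    }
    memcpy(e->path, path, len);
    return e;
}

/* Adds three entries to a two-slot buffer and returns how many entries
 * were destroyed in total (on overwrite plus on teardown), or -1 on
 * allocation failure. */
static int ring_demo(void)
{
    const char *paths[] = { "/a", "/b", "/c" };
    destroyed_count = 0;
    struct ring *r = ring_new(2, heal_entry_destroy);
    if (!r)
        return -1;
    for (size_t i = 0; i < 3; i++) {
        struct heal_entry *e = heal_entry_new(paths[i]);
        if (!e) {
            ring_free(r);
            return -1;
        }
        ring_add(r, e);
    }
    ring_free(r);
    return destroyed_count;
}
```

With a buffer that only frees its slot array and never calls a per-entry destroy function, every path added here would stay allocated, which matches the growth pattern described in this report.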

--- Additional comment from Pranith Kumar K on 2014-07-16 05:55:16 EDT ---

Found one more 'dict' leak in metadata self-heal. This leak is present even in 3.5.x. Will be cloning this bug. Thanks a lot Anirban for raising the issue.

--- Additional comment from Pranith Kumar K on 2014-07-16 06:34:57 EDT ---

The 'dict' leak I mentioned above seems to exist only in 3.5.x. So the only leak in 3.4.2 is the one mentioned in comment-1.
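
The comments here do not say how the 'dict' was leaked. As an assumption for illustration only, one common way a reference-counted object leaks is an early return that skips the final unref. The sketch below uses a hypothetical `struct obj` in place of a real dict and a hypothetical `do_heal_step` function; it shows the single-exit pattern that keeps the unref on every path.

```c
#include <stdlib.h>

/* Hypothetical refcounted object; it stands in for a dict and is not the
 * real GlusterFS dict_t. */
struct obj {
    int refcount;
};

static int live_objects = 0; /* demo-only counter of unreleased objects */

static struct obj *obj_new(void)
{
    struct obj *o = calloc(1, sizeof(*o));
    if (!o)
        return NULL;
    o->refcount = 1;
    live_objects++;
    return o;
}

/* Drop one reference; free the object when the last one goes away. */
static void obj_unref(struct obj *o)
{
    if (!o)
        return;
    if (--o->refcount == 0) {
        free(o);
        live_objects--;
    }
}

/* Hypothetical heal step. Every exit goes through `out`, so the reference
 * taken by obj_new() is always dropped, including on the failure path.
 * Returning directly from the fail_early branch would leak the object. */
static int do_heal_step(int fail_early)
{
    int ret = -1;
    struct obj *xattrs = obj_new();

    if (!xattrs)
        goto out;
    if (fail_early)
        goto out;
    ret = 0;
out:
    obj_unref(xattrs);
    return ret;
}
```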

Comment 1 Anand Avati 2014-07-16 10:45:54 UTC

REVIEW: http://review.gluster.org/8316 (cluster/afr: Fix leaks in self-heal code path) posted (#1) for review on release-3.5 by Pranith Kumar Karampuri (pkarampu)

Comment 2 Anand Avati 2014-07-18 10:28:28 UTC

COMMIT: http://review.gluster.org/8316 committed in release-3.5 by Niels de Vos (ndevos) 
------
commit c7fbb78ec198968069821cb0769071d17df1c58b
Author: Pranith Kumar K <pkarampu>
Date:   Wed Jul 16 15:03:19 2014 +0530

    cluster/afr: Fix leaks in self-heal code path
    
    Change-Id: I5301ec9ebac27afe52e85cad75e6395d7f891355
    BUG: 1120151
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/8316
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Reviewed-by: Ravishankar N <ravishankar>
    Reviewed-by: Niels de Vos <ndevos>

Comment 3 Niels de Vos 2014-07-21 15:42:12 UTC

The first (and last?) Beta for GlusterFS 3.5.2 has been released [1]. Please verify if the release solves this bug report for you. In case the glusterfs-3.5.2beta1 release does not have a resolution for this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure (possibly an "updates-testing" repository) for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-devel/2014-July/041636.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/

Comment 4 Niels de Vos 2014-07-31 11:43:51 UTC

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.5.2, please reopen this bug report.

glusterfs-3.5.2 has been announced on the Gluster Users mailing list [1]. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-July/041217.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
