Red Hat Bugzilla – Bug 1266877
Possible memory leak during rebalance with large quantity of files
Last modified: 2016-06-16 09:39:10 EDT
+++ This bug was initially created as a clone of Bug #1261234 +++
Description of problem:
Gluster distributed volume with 4 bricks fails to rebalance due to memory exhaustion.
I have a gluster distributed volume with 4 bricks on one physical server (this seems strange, but there are reasons for it). The bricks are formatted with ext4. The volume spans 57T of storage space and currently contains ~2.5T in 30M files, mostly located on brick 1. Rebalance fix-layout completed successfully, but the main rebalance fails to complete because the server runs out of memory.
I've tried running

  echo 2 > /proc/sys/vm/drop_caches

to free the kernel's dentry and inode caches. After approximately 24 hrs the server starts thrashing.
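For reference, a minimal sketch of that check (run as root; the ps line helps tell whether it is the gluster processes or the kernel cache that is growing):

  # Drop reclaimable dentry/inode caches, then compare memory usage.
  sync
  echo 2 > /proc/sys/vm/drop_caches
  # System-wide view (used vs. buffers/cache):
  free -m
  # Resident memory of the gluster daemons themselves:
  ps -o pid,rss,cmd -C glusterfs,glusterfsd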
Version-Release number of selected component (if applicable):
glusterfs 3.7.3 built on Jul 28 2015 14:28:57
Steps to Reproduce:
1. Start rebalance (see the example commands after this list)
2. Wait ~24hrs
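For reference, starting and monitoring the rebalance with the standard CLI looks like this (the volume name MYVOL is a placeholder):

  # Kick off the data rebalance and poll its progress and file counts.
  gluster volume rebalance MYVOL start
  gluster volume rebalance MYVOL status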
Actual results:
The server starts thrashing due to memory exhaustion. Memory occupied by gluster remains relatively constant.
--- Additional comment from Susant Kumar Palai on 2015-09-16 14:26:12 MVT ---
Can you share the rebalance logs? What was the memory usage of the rebalance process when it was OOM-killed?
--- Additional comment from Max Gashkov on 2015-09-16 14:31:52 MVT ---
The rebalance log is rather large (about 600M); I can grep for specific strings if needed, or share the whole file privately (please indicate a method for contacting you directly).
OOM didn't kill the process, I did. It was around 2G RES at the time, and together with the other glusterfsd processes it started swapping to the point where the system became unstable.
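A minimal sketch for capturing that figure over time, assuming the rebalance daemon shows up with "rebalance" in its command line (the pgrep pattern is a guess; adjust it to match your system):

  # Sample the rebalance process's resident size once a minute
  # until the process exits.
  PID=$(pgrep -f 'glusterfs.*rebalance' | head -n 1)
  while kill -0 "$PID" 2>/dev/null; do
      ps -o pid,rss,vsz,cmd -p "$PID"
      sleep 60
  done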
--- Additional comment from Susant Kumar Palai on 2015-09-16 14:35:15 MVT ---
(In reply to Max Gashkov from comment #2)
> Rebalance log is rather large (about 600M), I can grep for specific strings
> if needed or share whole file privately (please indicate method for
> contacting you directly).
Can you grep for error messages in the rebalance log and update?
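For example, something like the following should pull the error-level lines (the path below is the usual default location for the rebalance log; the volume name is a placeholder):

  # GlusterFS log lines carry a one-letter severity; " E " marks errors.
  grep ' E ' /var/log/glusterfs/MYVOL-rebalance.log | tail -n 100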
For contact: #gluster on IRC, nick: spalai
> OOM didn't kill the process, I did. It was around 2G RES at the time and
> with the other glusterfsd processes it started swapping to the point when
> system became unstable.
Patch available at:
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.
glusterfs-3.8.0 has been announced on the Gluster mailing lists, and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list and the update infrastructure for your distribution.
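Once the updated packages are installed, the running version can be confirmed with:

  # Both should report 3.8.0 after the upgrade.
  gluster --version
  glusterfs --version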