Red Hat Bugzilla – Bug 1266877
Possible memory leak during rebalance with large quantity of files
Last modified: 2016-06-16 09:39:10 EDT
+++ This bug was initially created as a clone of Bug #1261234 +++
Description of problem:
Gluster distributed volume with 4 bricks fails to rebalance due to memory exhaustion.
I have a gluster distributed volume with 4 bricks on one physical server (this seems strange, but there are reasons for it). The bricks are formatted with ext4. The volume spans 57T of storage space and currently contains ~2.5T in 30M files, mostly located on brick 1. Rebalance fix-layout completed successfully, but the main rebalance fails to complete because the server runs out of memory.
I've tried running

  echo 2 > /proc/sys/vm/drop_caches

to free the kernel's dentry and inode caches. After approximately 24 hrs the server starts thrashing.
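For reference, a minimal sketch of that check (run as root; the ps line helps tell whether it is the gluster processes or the kernel cache that is growing):

  # Drop reclaimable dentry/inode caches, then compare memory usage.
  sync
  echo 2 > /proc/sys/vm/drop_caches
  # System-wide view (used vs. buffers/cache):
  free -m
  # Resident memory of the gluster daemons themselves:
  ps -o pid,rss,cmd -C glusterfs,glusterfsd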
Version-Release number of selected component (if applicable):
glusterfs 3.7.3 built on Jul 28 2015 14:28:57
Steps to Reproduce:
1. Start rebalance (see the example commands after this list)
2. Wait ~24hrs
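For reference, starting and monitoring the rebalance with the standard CLI looks like this (the volume name MYVOL is a placeholder):

  # Kick off the data rebalance and poll its progress and file counts.
  gluster volume rebalance MYVOL start
  gluster volume rebalance MYVOL status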
Actual results:
The server starts thrashing due to memory exhaustion. Memory occupied by gluster remains relatively constant.
--- Additional comment from Susant Kumar Palai on 2015-09-16 14:26:12 MVT ---
Can you share the rebalance logs? What was the memory usage of the rebalance process when it was OOM-killed?
--- Additional comment from Max Gashkov on 2015-09-16 14:31:52 MVT ---
The rebalance log is rather large (about 600M); I can grep for specific strings if needed, or share the whole file privately (please indicate a method for contacting you directly).
OOM didn't kill the process, I did. It was around 2G RES at the time, and together with the other glusterfsd processes it started swapping to the point where the system became unstable.
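A minimal sketch for capturing that figure over time, assuming the rebalance daemon shows up with "rebalance" in its command line (the pgrep pattern is a guess; adjust it to match your system):

  # Sample the rebalance process's resident size once a minute
  # until the process exits.
  PID=$(pgrep -f 'glusterfs.*rebalance' | head -n 1)
  while kill -0 "$PID" 2>/dev/null; do
      ps -o pid,rss,vsz,cmd -p "$PID"
      sleep 60
  done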
--- Additional comment from Susant Kumar Palai on 2015-09-16 14:35:15 MVT ---
(In reply to Max Gashkov from comment #2)
> Rebalance log is rather large (about 600M), I can grep for specific strings
> if needed or share whole file privately (please indicate method for
> contacting you directly).
Can you grep for error messages in the rebalance log and update?
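For example, something like the following should pull the error-level lines (the path below is the usual default location for the rebalance log; the volume name is a placeholder):

  # GlusterFS log lines carry a one-letter severity; " E " marks errors.
  grep ' E ' /var/log/glusterfs/MYVOL-rebalance.log | tail -n 100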
For contact: #gluster on IRC, nick: spalai
> OOM didn't kill the process, I did. It was around 2G RES at the time and
> with the other glusterfsd processes it started swapping to the point when
> system became unstable.
Patch available at:
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.
glusterfs-3.8.0 has been announced on the Gluster mailing lists, and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list and the update infrastructure for your distribution.
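Once the updated packages are installed, the running version can be confirmed with:

  # Both should report 3.8.0 after the upgrade.
  gluster --version
  glusterfs --version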