Bug 1147427 - High memory usage by rebalance process
Summary: High memory usage by rebalance process
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: rhgs-3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.0.3
Assignee: Krutika Dhananjay
QA Contact: Amit Chaurasia
URL:
Whiteboard:
Depends On:
Blocks: 1162694
 
Reported: 2014-09-29 09:06 UTC by Krutika Dhananjay
Modified: 2015-10-28 00:10 UTC
CC: 8 users

Fixed In Version: glusterfs-3.6.0.31-1
Doc Type: Bug Fix
Doc Text:
Clone Of: 1144413
Environment:
Last Closed: 2015-01-15 13:40:33 UTC
Embargoed:




Links
Red Hat Product Errata RHBA-2015:0038 (normal priority, SHIPPED_LIVE): Red Hat Storage 3.0 enhancement and bug fix update #3, last updated 2015-01-15 18:35:28 UTC

Description Krutika Dhananjay 2014-09-29 09:06:14 UTC
+++ This bug was initially created as a clone of Bug #1144413 +++

Description of problem:

There are two dict_t memory leaks in the rebalance process's codepath for every file that is migrated successfully.

In other words, the amount of memory leaked is at least 2 * sizeof(dict_t) * (number of files successfully migrated), since each leaked dict also pins whatever it holds.

A community user reported an OOM kill of the rebalance process while migrating data on the order of a few TB. The bug report can be found at
https://bugzilla.redhat.com/show_bug.cgi?id=1142052.
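
The shape of such a leak, as a minimal self-contained sketch: dict_t, dict_new(), dict_unref() and migrate_one_file_*() below are simplified stand-ins, not the actual GlusterFS structures or call sites.

    /* Sketch of the leak pattern: two refcounted dicts created per
     * migrated file and never released. Illustrative names only. */
    #include <stdlib.h>

    typedef struct dict { int refcount; } dict_t;

    static dict_t *dict_new(void) {
        dict_t *d = calloc(1, sizeof(*d));
        d->refcount = 1;
        return d;
    }

    static void dict_unref(dict_t *d) {
        if (d && --d->refcount == 0)
            free(d);
    }

    /* Buggy version: two dicts allocated per file, never unref'd,
     * so memory grows linearly with the number of migrated files. */
    static int migrate_one_file_leaky(const char *path) {
        dict_t *xattr_req = dict_new();    /* e.g. for a lookup */
        dict_t *migrate_data = dict_new(); /* e.g. to flag the migration */
        (void)path; (void)xattr_req; (void)migrate_data;
        return 0;                          /* leak: no dict_unref() here */
    }

    /* Fixed version: every dict_new() paired with a dict_unref(). */
    static int migrate_one_file_fixed(const char *path) {
        dict_t *xattr_req = dict_new();
        dict_t *migrate_data = dict_new();
        (void)path;
        dict_unref(xattr_req);
        dict_unref(migrate_data);
        return 0;
    }

    int main(void) {
        migrate_one_file_leaky("/bricks/file-1");
        migrate_one_file_fixed("/bricks/file-2");
        return 0;
    }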

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. On a volume holding a large amount of data to be migrated, add a brick and start rebalance.

Actual results:


Expected results:


Additional info:

--- Additional comment from Krutika Dhananjay on 2014-09-19 06:39:21 EDT ---

I performed this test locally on about 5GB of data on the mount, which contained 10 Linux kernel untars, and took a statedump of the rebalance daemons once every 30 seconds. At the end of migration, about 3 lakh (300,000) dict_t objects had been allocated and not freed.
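
For reference, a statedump of a GlusterFS process is triggered by sending it SIGUSR1 (the documented statedump mechanism); the dump is written to the process's statedump directory, typically /var/run/gluster. A minimal sketch of the periodic collection described above, assuming the rebalance daemon's PID is passed on the command line:

    /* Trigger a statedump of a glusterfs process every 30 seconds by
     * sending SIGUSR1. Usage: ./dumper <pid-of-rebalance-process> */
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(int argc, char **argv) {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <pid>\n", argv[0]);
            return 1;
        }
        pid_t pid = (pid_t)atoi(argv[1]);
        for (;;) {
            if (kill(pid, SIGUSR1) != 0) { /* process gone: stop */
                perror("kill");
                return 1;
            }
            sleep(30);                     /* one dump every 30 seconds */
        }
    }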

Comment 2 Krutika Dhananjay 2014-10-31 07:24:16 UTC
Patch merged.

Comment 4 Amit Chaurasia 2014-11-26 08:17:22 UTC
Verified the bug using multiple scenarios:

1. First created nearly 50k files at the root of the mount point and performed a rebalance.
2. Moved the files into a sub-folder and performed the rebalance again.
3. Then created a deep directory structure, 25 sub-folders deep, with nearly 17 lakh (1.7 million) files scattered across those folders (a sketch of generating such a dataset follows this list).
4. Performed a rebalance after adding bricks.
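
A hedged sketch of generating a dataset like the one in step 3; the comment above does not specify the exact tooling, so the program, paths and per-level file count here are illustrative only.

    /* Illustrative dataset generator: a 25-level directory chain under
     * a given root, with empty files scattered at each level.
     * ~68,000 files per level * 25 levels ~= 1.7 million files. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    int main(int argc, char **argv) {
        const char *root = (argc > 1) ? argv[1] : "/mnt/glustervol/dataset";
        char path[4096];
        snprintf(path, sizeof(path), "%s", root);
        mkdir(path, 0755);  /* ignore failure if root already exists */

        for (int depth = 0; depth < 25; depth++) {
            size_t len = strlen(path);
            snprintf(path + len, sizeof(path) - len, "/d%02d", depth);
            if (mkdir(path, 0755) != 0) { perror("mkdir"); return 1; }

            for (int i = 0; i < 68000; i++) {
                char fpath[4352];
                snprintf(fpath, sizeof(fpath), "%s/f%06d", path, i);
                FILE *f = fopen(fpath, "w");
                if (!f) { perror("fopen"); return 1; }
                fclose(f);
            }
        }
        return 0;
    }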

Each time, I recorded a statedump of the rebalance process and monitored memory usage using top and vmstat.

In the statedump, the hot-count for dict_t hovered between 20 and 30, while the cold-count stayed between 4060 and 4080.

The memory consumption of the whole glusterd process never exceeded 4%.

There appears to be no memory leak. Marking the bug verified.
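
For context on those figures: in a GlusterFS mem-pool statedump, hot-count is the number of objects currently handed out from the pool and cold-count the number sitting free in it. A simplified model of that accounting (illustrative only, not the actual GlusterFS mem-pool code):

    /* Simplified object pool with hot/cold accounting. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct mem_pool {
        void **free_list;  /* preallocated, currently unused objects */
        int    cold_count; /* objects sitting free in the pool */
        int    hot_count;  /* objects handed out and in use */
        int    capacity;
    } mem_pool_t;

    static mem_pool_t *pool_new(int capacity, size_t obj_size) {
        mem_pool_t *p = calloc(1, sizeof(*p));
        p->free_list = calloc(capacity, sizeof(void *));
        p->capacity = capacity;
        for (int i = 0; i < capacity; i++)
            p->free_list[i] = malloc(obj_size);
        p->cold_count = capacity;
        return p;
    }

    static void *pool_get(mem_pool_t *p) {
        if (p->cold_count == 0)
            return NULL;          /* real pools fall back to malloc */
        p->hot_count++;
        return p->free_list[--p->cold_count];
    }

    static void pool_put(mem_pool_t *p, void *obj) {
        p->hot_count--;
        p->free_list[p->cold_count++] = obj;
    }

    int main(void) {
        mem_pool_t *p = pool_new(4096, 64);
        void *a = pool_get(p), *b = pool_get(p);
        printf("hot=%d cold=%d\n", p->hot_count, p->cold_count); /* hot=2 cold=4094 */
        pool_put(p, a);
        pool_put(p, b);
        printf("hot=%d cold=%d\n", p->hot_count, p->cold_count); /* hot=0 cold=4096 */
        return 0;
    }

A steady hot-count of 20 to 30 against a cold-count of 4060 to 4080 (the two summing to roughly the pool size) is what a leak-free run looks like; under the original bug, the number of in-use dict_t objects climbed without bound instead.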

Comment 7 errata-xmlrpc 2015-01-15 13:40:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0038.html

