+++ This bug was initially created as a clone of Bug #1198968 +++

Description of problem:
-----------------------
While repeating the volume set operation in a loop, glusterd gets OOM killed.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
Mainline

How reproducible:
-----------------
Consistent

Steps to Reproduce:
-------------------
1. Create a 2 node 'Trusted Storage Pool' (cluster)
2. Create 2 distribute volumes with 2 bricks (1 brick per node) and start them
3. From NODE1, in one loop, keep repeating 'volume set' on one volume (say vol1)
4. From NODE2, in another loop, keep repeating 'volume set' on the other volume (say vol2)

Actual results:
---------------
After an hour, glusterd got OOM killed.

Expected results:
-----------------
glusterd should not get OOM killed.

--- Additional comment from SATHEESARAN on 2015-03-05 04:10:11 EST ---

glusterd statedump file taken while the memory consumption was high. This was taken 10 minutes before glusterd got OOM killed.

--- Additional comment from SATHEESARAN on 2015-03-05 04:12:52 EST ---

sosreport taken from NODE1, where glusterd got OOM killed.

--- Additional comment from SATHEESARAN on 2015-03-05 04:16:54 EST ---

Reproducible test case 1:
-------------------------
0. Create a 2 node cluster
1. Create two distribute volumes and start them.
2. Create a shell script as follows:
   while true; do gluster volume set <vol-name1> read-ahead on; done
3. Run the above script in the background.
4. From the RHSS command line execute the following:
   while true; do gluster volume set <vol-name2> write-behind on; done
5. Monitor the memory consumed by glusterd.
REVIEW: http://review.gluster.org/9862 (core : free up mem_acct.rec in xlator_members_free) posted (#1) for review on master by Atin Mukherjee (amukherj)
REVIEW: http://review.gluster.org/9862 (core : free up mem_acct.rec in xlator_members_free) posted (#2) for review on master by Atin Mukherjee (amukherj)
REVIEW: http://review.gluster.org/9862 (core : free up mem_acct.rec in xlator_destroy) posted (#3) for review on master by Atin Mukherjee (amukherj)
COMMIT: http://review.gluster.org/9862 committed in master by Kaleb KEITHLEY (kkeithle)
------
commit 4481b03ae2e8ebbd091b0436b96e97707b4ec41f
Author: Atin Mukherjee <amukherj>
Date:   Thu Mar 12 15:43:56 2015 +0530

    core : free up mem_acct.rec in xlator_destroy

    Problem:
    We have observed that glusterd was OOM killed after some minutes when
    the volume set command was run in a loop.

    Analysis:
    The initial suspicion fell on the glusterd code, but a deep dive into
    the codebase revealed that while validating all the options as part of
    graph reconfiguration, the xlator object is freed without releasing one
    of its members, mem_acct, which causes the memory leak.

    Solution:
    Free up the xlator's mem_acct.rec in xlator_destroy ()

    Change-Id: Ie9e7267e1ac4ab7b8af6e4d7c6660dfe99b4d641
    BUG: 1201203
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/9862
    Reviewed-by: Niels de Vos <ndevos>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Reviewed-by: Raghavendra Bhat <raghavendra>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
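For readers who want a feel for the leak pattern described in the commit message, here is a minimal, self-contained C sketch. The type and function names (my_mem_acct_rec, my_xlator_t, my_xlator_new, my_xlator_destroy) are hypothetical stand-ins, not the real GlusterFS definitions; only the idea -- releasing the dynamically allocated accounting record array inside the destroy path -- mirrors the committed change.

/* Minimal sketch of the leak pattern; hypothetical names, not GlusterFS code. */
#include <stdlib.h>

typedef struct {
        size_t size;        /* bytes currently accounted for this type */
        size_t num_allocs;  /* number of live allocations of this type */
} my_mem_acct_rec;

typedef struct {
        my_mem_acct_rec *rec;        /* per-type accounting records,
                                        allocated when the xlator is set up */
        size_t           num_types;
} my_xlator_t;

static my_xlator_t *
my_xlator_new (size_t num_types)
{
        my_xlator_t *xl = calloc (1, sizeof (*xl));
        if (!xl)
                return NULL;
        xl->rec       = calloc (num_types, sizeof (*xl->rec));
        xl->num_types = num_types;
        return xl;
}

static void
my_xlator_destroy (my_xlator_t *xl)
{
        if (!xl)
                return;
        free (xl->rec);  /* the analogue of the fix: without this line, every
                            xlator torn down during option validation leaks
                            its accounting array */
        free (xl);
}

int
main (void)
{
        /* Each 'volume set' run validates options by building and tearing
         * down xlator objects; looping the command turns a small
         * per-iteration leak into an eventual OOM kill. */
        for (int i = 0; i < 1000000; i++)
                my_xlator_destroy (my_xlator_new (128));

        return 0;
}

The sketch only illustrates why the leak scales with the number of 'volume set' invocations rather than with data volume: the per-iteration loss is small, but the reproduction loops run it indefinitely.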
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user