+++ This bug was initially created as a clone of Bug #1198968 +++

Description of problem:
-----------------------
While repeating the volume set operation in a loop, glusterd gets OOM killed.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
Mainline

How reproducible:
-----------------
Consistent

Steps to Reproduce:
-------------------
1. Create a 2 node 'Trusted Storage Pool' (cluster)
2. Create 2 distribute volumes with 2 bricks (1 brick per node) and start them
3. From NODE1, in one loop, keep repeating 'volume set' on one volume (say vol1)
4. From NODE2, in another loop, keep repeating 'volume set' on the other volume (say vol2)

Actual results:
---------------
After an hour, glusterd got OOM killed.

Expected results:
-----------------
glusterd should not get OOM killed.

--- Additional comment from SATHEESARAN on 2015-03-05 04:10:11 EST ---

glusterd statedump file taken while the memory consumption was high. This was taken 10 minutes before glusterd got OOM killed.

--- Additional comment from SATHEESARAN on 2015-03-05 04:12:52 EST ---

sosreport taken from NODE1, where glusterd got OOM killed.

--- Additional comment from SATHEESARAN on 2015-03-05 04:16:54 EST ---

Reproducible test case 1:
-------------------------
0. Create a 2 node cluster
1. Create two distribute volumes and start them.
2. Create a shell script as follows:
   while true; do gluster volume set <vol-name1> read-ahead on; done
3. Run the above script in the background.
4. From the RHSS command line execute the following:
   while true; do gluster volume set <vol-name2> write-behind on; done
5. Monitor the memory consumed by glusterd.
REVIEW: http://review.gluster.org/9862 (core : free up mem_acct.rec in xlator_members_free) posted (#1) for review on master by Atin Mukherjee (amukherj)
REVIEW: http://review.gluster.org/9862 (core : free up mem_acct.rec in xlator_members_free) posted (#2) for review on master by Atin Mukherjee (amukherj)
REVIEW: http://review.gluster.org/9862 (core : free up mem_acct.rec in xlator_destroy) posted (#3) for review on master by Atin Mukherjee (amukherj)
COMMIT: http://review.gluster.org/9862 committed in master by Kaleb KEITHLEY (kkeithle)
------
commit 4481b03ae2e8ebbd091b0436b96e97707b4ec41f
Author: Atin Mukherjee <amukherj>
Date:   Thu Mar 12 15:43:56 2015 +0530

    core : free up mem_acct.rec in xlator_destroy

    Problem:
    We have observed that glusterd was OOM killed after some minutes when
    the volume set command was run in a loop.

    Analysis:
    The initial suspicion fell on the glusterd code, but a deep dive into
    the codebase revealed that while validating all the options as part of
    graph reconfiguration, the xlator object is freed without releasing one
    of its members, mem_acct, which causes the memory leak.

    Solution:
    Free up the xlator's mem_acct.rec in xlator_destroy ()

    Change-Id: Ie9e7267e1ac4ab7b8af6e4d7c6660dfe99b4d641
    BUG: 1201203
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/9862
    Reviewed-by: Niels de Vos <ndevos>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Reviewed-by: Raghavendra Bhat <raghavendra>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
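For readers who want a feel for the leak pattern described in the commit message, here is a minimal, self-contained C sketch. The type and function names (my_mem_acct_rec, my_xlator_t, my_xlator_new, my_xlator_destroy) are hypothetical stand-ins, not the real GlusterFS definitions; only the idea -- releasing the dynamically allocated accounting record array inside the destroy path -- mirrors the committed change.

/* Minimal sketch of the leak pattern; hypothetical names, not GlusterFS code. */
#include <stdlib.h>

typedef struct {
        size_t size;        /* bytes currently accounted for this type */
        size_t num_allocs;  /* number of live allocations of this type */
} my_mem_acct_rec;

typedef struct {
        my_mem_acct_rec *rec;        /* per-type accounting records,
                                        allocated when the xlator is set up */
        size_t           num_types;
} my_xlator_t;

static my_xlator_t *
my_xlator_new (size_t num_types)
{
        my_xlator_t *xl = calloc (1, sizeof (*xl));
        if (!xl)
                return NULL;
        xl->rec       = calloc (num_types, sizeof (*xl->rec));
        xl->num_types = num_types;
        return xl;
}

static void
my_xlator_destroy (my_xlator_t *xl)
{
        if (!xl)
                return;
        free (xl->rec);  /* the analogue of the fix: without this line, every
                            xlator torn down during option validation leaks
                            its accounting array */
        free (xl);
}

int
main (void)
{
        /* Each 'volume set' run validates options by building and tearing
         * down xlator objects; looping the command turns a small
         * per-iteration leak into an eventual OOM kill. */
        for (int i = 0; i < 1000000; i++)
                my_xlator_destroy (my_xlator_new (128));

        return 0;
}

The sketch only illustrates why the leak scales with the number of 'volume set' invocations rather than with data volume: the per-iteration loss is small, but the reproduction loops run it indefinitely.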
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user