Bug 1198968 - glusterd OOM killed, when repeating volume set operation in a loop
Summary: glusterd OOM killed, when repeating volume set operation in a loop
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: rhgs-3.0
Hardware: x86_64
OS: Linux
Target Milestone: ---
Assignee: Atin Mukherjee
Whiteboard: glusterd
Depends On: 1201203
Blocks: 1202842
Reported: 2015-03-05 09:07 UTC by SATHEESARAN
Modified: 2016-02-11 17:24 UTC (History)
3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1201203 (view as bug list)
Last Closed: 2016-02-11 17:24:57 UTC
Target Upstream Version:

Attachments
glusterd statedump file (14.43 KB, text/plain)
2015-03-05 09:10 UTC, SATHEESARAN
sosreport from the machine (13.05 MB, application/x-xz)
2015-03-05 09:12 UTC, SATHEESARAN

Description SATHEESARAN 2015-03-05 09:07:49 UTC
Description of problem:
While repeating the volume set operation in a loop, glusterd gets OOM killed

Version-Release number of selected component (if applicable):
RHS 3.0.4 Nightly build ( glusterfs- )

How reproducible:

Steps to Reproduce:
1. Create a 2 node 'Trusted Storage Pool' ( cluster )
2. Create 2 distribute volumes with 2 bricks each ( 1 brick per node ) and start them
3. From NODE1, in one loop, keep repeating 'volume set' on one volume ( say vol1 )
4. From NODE2, in another loop, keep repeating 'volume set' on the other volume ( say vol2 )
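
The setup in steps 1-2 can be sketched with the following commands; the hostnames (NODE1, NODE2) and brick paths are hypothetical placeholders, not taken from the original report:

```shell
# Sketch of steps 1-2, run from NODE1. Hostnames and brick paths are
# placeholders; adjust to the actual environment.
gluster peer probe NODE2

gluster volume create vol1 NODE1:/bricks/vol1/b1 NODE2:/bricks/vol1/b2
gluster volume start vol1

gluster volume create vol2 NODE1:/bricks/vol2/b1 NODE2:/bricks/vol2/b2
gluster volume start vol2
```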

Actual results:
After an hour, glusterd got OOM killed.

Expected results:
glusterd should not get OOM killed; its memory consumption should remain stable.

Comment 1 SATHEESARAN 2015-03-05 09:10:11 UTC
Created attachment 998243 [details]
glusterd statedump file

glusterd statedump file, taken while memory consumption was high.
This was captured 10 minutes before glusterd got OOM killed.
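
For reference, a glusterd statedump like the attached one can be triggered by sending SIGUSR1 to the running glusterd process; the dump is typically written under /var/run/gluster. A minimal sketch (assumes Linux with glusterd running):

```shell
# Trigger a statedump of the running glusterd; the dump file usually
# lands under /var/run/gluster with a timestamped name.
kill -USR1 "$(pidof glusterd)"
ls -lt /var/run/gluster | head
```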

Comment 2 SATHEESARAN 2015-03-05 09:12:52 UTC
Created attachment 998246 [details]
sosreport from the machine

sosreport as taken from NODE1, where the glusterd got OOM killed

Comment 3 SATHEESARAN 2015-03-05 09:16:54 UTC
Reproducible test case 1:
0. Create a 2 node cluster
1. Create a distribute volume and start it.
2. Create a shell script as follows :
while true; do gluster volume set <vol-name1> read-ahead on; done
3. Run the above script in the background.
4. From RHSS command line execute the following,
while true; do gluster volume set <vol-name2> write-behind on; done
5. Monitor the memory consumed by glusterd
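
Step 5 can be sketched with a small helper that samples glusterd's resident memory from /proc (Linux only; the 60-second sampling interval is an arbitrary choice, not from the report):

```shell
# Print a process's resident memory (VmRSS, in kB) by reading /proc.
rss_kb() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Usage sketch against the running glusterd (assumes pidof is available):
#   pid=$(pidof glusterd)
#   while true; do printf '%s %s kB\n' "$(date +%T)" "$(rss_kb "$pid")"; sleep 60; done
```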

Comment 4 SATHEESARAN 2015-03-05 09:18:45 UTC
Setting a volume option repeatedly in a loop is not a viable use case, and no customer would do that for such a prolonged time. Based on that, I am not raising this bug as a BLOCKER.

Comment 6 Atin Mukherjee 2015-06-22 05:15:57 UTC
Moving back to Post state as this bug doesn't have all the acks to get to the ON_QA state.

Comment 8 Atin Mukherjee 2015-06-22 06:19:54 UTC
As this bug has got all the acks now, moving it to ON_QA

Comment 9 SATHEESARAN 2015-07-01 10:22:09 UTC
Tested with RHGS 3.1 Nightly build ( glusterfs-3.7.1-6.el6rhs )

The memory consumption of glusterd still keeps growing when performing the operations mentioned in comment3.

I have captured the memory consumed by glusterd at various points in time:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9181 root      20   0  648m  51m 3984 S 20.6  0.6   0:10.72 glusterd
 9181 root      20   0 2696m 2.1g 3984 S 53.8 27.0  16:53.95 glusterd
 9181 root      20   0 3230m 2.6g 4344 S 43.8 33.4  23:39.32 glusterd
 9181 root      20   0 3592m 3.0g 3984 S 50.5 38.0  29:07.59 glusterd
 9181 root      20   0 4040m 3.4g 3984 S 54.8 43.6  36:59.16 glusterd
 9181 root      20   0 4616m 4.0g 3984 S 47.8 51.2  49:08.89 glusterd

Marking this bug as FailedQA, as the memory usage of glusterd is still growing with volume set operations in a loop.

Comment 15 Atin Mukherjee 2016-02-11 17:24:57 UTC
We are not planning to fix this soon, since this is a use case that would not be tried in a production setup. Considering that, I am closing this bug. Please feel free to reopen if you think otherwise.
