Bug 1198968

Summary: glusterd OOM killed, when repeating volume set operation in a loop
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: SATHEESARAN <sasundar>
Component: glusterd
Assignee: Atin Mukherjee <amukherj>
Status: CLOSED WONTFIX
QA Contact: SATHEESARAN <sasundar>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.0
CC: amukherj, nlevinki, vbellur
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard: glusterd
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1201203 (view as bug list)
Environment:
Last Closed: 2016-02-11 17:24:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1201203
Bug Blocks: 1202842
Attachments (Description / Flags):
  glusterd statedump file / none
  sosreport from the machine / none

Description SATHEESARAN 2015-03-05 09:07:49 UTC
Description of problem:
-----------------------
While repeating the volume set operation in a loop, glusterd gets OOM killed

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHS 3.0.4 Nightly build ( glusterfs-3.6.0.48-1.el6rhs )

How reproducible:
-----------------
Consistent

Steps to Reproduce:
-------------------
1. Create a 2-node 'Trusted Storage Pool' (cluster)
2. Create 2 distribute volumes with 2 bricks each (1 brick per node) and start them
3. From NODE1, in one loop, keep repeating 'volume set' on one volume (say vol1)
4. From NODE2, in another loop, keep repeating 'volume set' on the other volume (say vol2)
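Steps 1-2 above could be scripted roughly as follows. The hostnames (node1/node2), brick paths, and volume names are illustrative assumptions, not taken from this report; gluster creates a plain distribute volume by default when no replica/stripe count is given:

```shell
# Run on NODE1; assumes node2 has already been peer-probed or is reachable.
gluster peer probe node2                          # form the 2-node trusted pool

# Two 2-brick distribute volumes, one brick per node each (paths are assumptions)
gluster volume create vol1 node1:/bricks/b1 node2:/bricks/b1
gluster volume create vol2 node1:/bricks/b2 node2:/bricks/b2
gluster volume start vol1
gluster volume start vol2
```

Steps 3-4 are then the per-node loops, e.g. on NODE1: `while true; do gluster volume set vol1 read-ahead on; done` (and the same against vol2 on NODE2).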

Actual results:
---------------
After an hour, glusterd got OOM killed.

Expected results:
-----------------
glusterd should not get OOM killed; its memory usage should stay bounded while repeating volume set operations.

Comment 1 SATHEESARAN 2015-03-05 09:10:11 UTC
Created attachment 998243 [details]
glusterd statedump file

glusterd statedump file while the memory consumption was high enough.
This was taken 10 minutes before glusterd got OOM Killed

Comment 2 SATHEESARAN 2015-03-05 09:12:52 UTC
Created attachment 998246 [details]
sosreport from the machine

sosreport as taken from NODE1, where the glusterd got OOM killed

Comment 3 SATHEESARAN 2015-03-05 09:16:54 UTC
Reproducible test case 1:
--------------------------
0. Create a 2 node cluster
1. Create a distribute volume and start it.
2. Create a shell script as follows:
while true; do gluster volume set <vol-name1> read-ahead on; done
3. Run the above script in the background.
4. From the RHSS command line, execute the following:
while true; do gluster volume set <vol-name2> write-behind on; done
5. Monitor the memory consumed by glusterd
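For step 5, glusterd's resident memory can be sampled from /proc while the set loops run; a minimal sketch (substituting `$(pidof glusterd)` for the pid is an assumption about the environment):

```shell
# Print the resident set size (VmRSS, in kB) of a process, read from /proc.
# Usage: rss_kb <pid>   e.g.  rss_kb "$(pidof glusterd)"
rss_kb() {
  awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Demonstrate on this shell's own pid; for the bug, sample glusterd's pid
# in a loop (e.g. every 30s) alongside the volume-set loops.
rss_kb $$
```

Logging the output with a timestamp (`echo "$(date +%s) $(rss_kb "$PID")"`) gives a growth curve comparable to the top samples in comment 9.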

Comment 4 SATHEESARAN 2015-03-05 09:18:45 UTC
Setting a volume option repeatedly in a loop is not a realistic use case, and no customer would run such a loop for that prolonged a time. Based on that, I am not raising this bug as a BLOCKER.

Comment 6 Atin Mukherjee 2015-06-22 05:15:57 UTC
Moving back to Post state as this bug doesn't have all the acks to get to the ON_QA state.

Comment 8 Atin Mukherjee 2015-06-22 06:19:54 UTC
As this bug has got all the acks now, moving it to ON_QA

Comment 9 SATHEESARAN 2015-07-01 10:22:09 UTC
Tested with RHGS 3.1 Nightly build ( glusterfs-3.7.1-6.el6rhs )

The memory consumption of glusterd still grows when performing the operations mentioned in comment 3.

I have captured the memory consumed by glusterd at various points in time:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9181 root      20   0  648m  51m 3984 S 20.6  0.6   0:10.72 glusterd
 9181 root      20   0 2696m 2.1g 3984 S 53.8 27.0  16:53.95 glusterd
 9181 root      20   0 3230m 2.6g 4344 S 43.8 33.4  23:39.32 glusterd
 9181 root      20   0 3592m 3.0g 3984 S 50.5 38.0  29:07.59 glusterd
 9181 root      20   0 4040m 3.4g 3984 S 54.8 43.6  36:59.16 glusterd
 9181 root      20   0 4616m 4.0g 3984 S 47.8 51.2  49:08.89 glusterd

Marking this bug as FailedQA, as the memory usage of glusterd is still growing with volume set operations in a loop.

Comment 15 Atin Mukherjee 2016-02-11 17:24:57 UTC
We are not planning to fix this soon, since this is a use case that would not be tried out in a production setup. Considering that, I am closing this bug. Please feel free to reopen if you think otherwise.