Bug 1198968

Summary: glusterd OOM killed, when repeating volume set operation in a loop
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: SATHEESARAN <sasundar>
Component: glusterd
Assignee: Atin Mukherjee <amukherj>
Status: CLOSED WONTFIX
QA Contact: SATHEESARAN <sasundar>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.0
CC: amukherj, nlevinki, vbellur
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard: glusterd
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1201203 (view as bug list)
Environment:
Last Closed: 2016-02-11 17:24:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1201203
Bug Blocks: 1202842
Attachments (Description / Flags):
  glusterd statedump file / none
  sosreport from the machine / none

Description SATHEESARAN 2015-03-05 09:07:49 UTC
Description of problem:
-----------------------
While repeating the volume set operation in a loop, glusterd gets OOM killed

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHS 3.0.4 Nightly build ( glusterfs-3.6.0.48-1.el6rhs )

How reproducible:
-----------------
Consistent

Steps to Reproduce:
-------------------
1. Create a 2-node 'Trusted Storage Pool' (cluster)
2. Create 2 distribute volumes with 2 bricks each (1 brick per node) and start them
3. From NODE1, in one loop, keep repeating 'volume set' on one volume (say vol1)
4. From NODE2, in another loop, keep repeating 'volume set' on the other volume (say vol2)
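Steps 1-2 above could be scripted roughly as follows. The hostnames (node1/node2), brick paths, and volume names are illustrative assumptions, not taken from this report; gluster creates a plain distribute volume by default when no replica/stripe count is given:

```shell
# Run on NODE1; assumes node2 has already been peer-probed or is reachable.
gluster peer probe node2                          # form the 2-node trusted pool

# Two 2-brick distribute volumes, one brick per node each (paths are assumptions)
gluster volume create vol1 node1:/bricks/b1 node2:/bricks/b1
gluster volume create vol2 node1:/bricks/b2 node2:/bricks/b2
gluster volume start vol1
gluster volume start vol2
```

Steps 3-4 are then the per-node loops, e.g. on NODE1: `while true; do gluster volume set vol1 read-ahead on; done` (and the same against vol2 on NODE2).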

Actual results:
---------------
After an hour, glusterd got OOM killed.

Expected results:
-----------------
glusterd should not get OOM killed; its memory usage should stay bounded while repeating volume set operations.

Comment 1 SATHEESARAN 2015-03-05 09:10:11 UTC
Created attachment 998243 [details]
glusterd statedump file

glusterd statedump file while the memory consumption was high enough.
This was taken 10 minutes before glusterd got OOM Killed

Comment 2 SATHEESARAN 2015-03-05 09:12:52 UTC
Created attachment 998246 [details]
sosreport from the machine

sosreport as taken from NODE1, where the glusterd got OOM killed

Comment 3 SATHEESARAN 2015-03-05 09:16:54 UTC
Reproducible test case 1:
--------------------------
0. Create a 2 node cluster
1. Create a distribute volume and start it.
2. Create a shell script as follows:
while true; do gluster volume set <vol-name1> read-ahead on; done
3. Run the above script in the background.
4. From the RHSS command line, execute the following:
while true; do gluster volume set <vol-name2> write-behind on; done
5. Monitor the memory consumed by glusterd
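For step 5, glusterd's resident memory can be sampled from /proc while the set loops run; a minimal sketch (substituting `$(pidof glusterd)` for the pid is an assumption about the environment):

```shell
# Print the resident set size (VmRSS, in kB) of a process, read from /proc.
# Usage: rss_kb <pid>   e.g.  rss_kb "$(pidof glusterd)"
rss_kb() {
  awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Demonstrate on this shell's own pid; for the bug, sample glusterd's pid
# in a loop (e.g. every 30s) alongside the volume-set loops.
rss_kb $$
```

Logging the output with a timestamp (`echo "$(date +%s) $(rss_kb "$PID")"`) gives a growth curve comparable to the top samples in comment 9.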

Comment 4 SATHEESARAN 2015-03-05 09:18:45 UTC
Setting a volume option repeatedly in a loop is not a realistic use case, and no customer would run such a loop for that prolonged a time. Based on that, I am not raising this bug as a BLOCKER.

Comment 6 Atin Mukherjee 2015-06-22 05:15:57 UTC
Moving back to Post state as this bug doesn't have all the acks to get to the ON_QA state.

Comment 8 Atin Mukherjee 2015-06-22 06:19:54 UTC
As this bug has got all the acks now, moving it to ON_QA

Comment 9 SATHEESARAN 2015-07-01 10:22:09 UTC
Tested with RHGS 3.1 Nightly build ( glusterfs-3.7.1-6.el6rhs )

The memory consumption of glusterd still grows when performing the operations mentioned in comment 3.

I have captured the memory consumed by glusterd at various points in time:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9181 root      20   0  648m  51m 3984 S 20.6  0.6   0:10.72 glusterd
 9181 root      20   0 2696m 2.1g 3984 S 53.8 27.0  16:53.95 glusterd
 9181 root      20   0 3230m 2.6g 4344 S 43.8 33.4  23:39.32 glusterd
 9181 root      20   0 3592m 3.0g 3984 S 50.5 38.0  29:07.59 glusterd
 9181 root      20   0 4040m 3.4g 3984 S 54.8 43.6  36:59.16 glusterd
 9181 root      20   0 4616m 4.0g 3984 S 47.8 51.2  49:08.89 glusterd

Marking this bug as FailedQA, as the memory usage of glusterd is still growing with volume set operations in a loop.

Comment 15 Atin Mukherjee 2016-02-11 17:24:57 UTC
We are not planning to fix this soon, since this is a use case that would not be tried out in a production setup. Considering that, I am closing this bug. Please feel free to reopen if you think otherwise.