Bug 1529501

Summary: [shd]: shd occupies ~7.8G in memory while toggling cluster.self-heal-daemon in a loop, possibly leaky.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Ambarish <asoman>
Component: core
Assignee: Mohit Agrawal <moagrawa>
Status: CLOSED ERRATA
QA Contact: Nag Pavan Chilakam <nchilaka>
Severity: high
Priority: high
Version: rhgs-3.3
CC: amukherj, bkunal, moagrawa, nchilaka, ravishankar, rhinduja, rhs-bugs, sheggodu, storage-qa-internal, vdas
Target Milestone: ---
Keywords: Rebase, Reopened
Target Release: RHGS 3.5.0
Hardware: x86_64
OS: Linux
Fixed In Version: glusterfs-6.0-1
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2019-10-30 12:19:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Bug Depends On: 1647277, 1725024, 1727329
Bug Blocks: 1696807

Description Ambarish 2017-12-28 12:31:54 UTC
Description of problem:
-------------------------

As part of verifying https://bugzilla.redhat.com/show_bug.cgi?id=1526363 , created 300 distributed-replicate volumes.

Bricks are multiplexed.

Then proceeded to run volume set operations in a loop.

<snip>

for i in {1..300}; do
    gluster volume create butcher$i replica 2 \
        gqas013.sbu.lab.eng.bos.redhat.com:/bricks1/brickA$i \
        gqas016.sbu.lab.eng.bos.redhat.com:/bricks1/brickA$i \
        gqas006.sbu.lab.eng.bos.redhat.com:/bricks1/brickA$i \
        gqas008.sbu.lab.eng.bos.redhat.com:/bricks1/brickA$i \
        gqas003.sbu.lab.eng.bos.redhat.com:/bricks1/brickA$i \
        gqas007.sbu.lab.eng.bos.redhat.com:/bricks1/brickA$i
    gluster v start butcher$i
    sleep 2
done

followed by 

for i in {1..300}; do
    gluster v set butcher$i cluster.self-heal-daemon off
    sleep 3
    gluster v set butcher$i group metadata-cache
    sleep 3
    gluster v set butcher$i cluster.lookup-optimize on
    sleep 3
done


<snip>



The self-heal daemon occupies almost 4.6 GB of resident memory after all the volume set operations.


**BEFORE VOL SET ** : 

[root@gqas008 /]# ps aux|grep glus
root      8078 12.4  2.6 28807468 1315220 ?    Ssl  05:13   0:28 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/e32d8903c5b60efed5cc4e725235c143.socket --xlator-option *replicate*.node-uuid=cedc8e7d-d3a0-47f2-a50e-ebe12fe964bc


**AFTER VOL SET ** :


[root@gqas008 /]# ps aux|grep glustershd
root      8078  3.0  9.4 31756588 4677648 ?    Ssl  05:13   3:56 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/e32d8903c5b60efed5cc4e725235c143.socket --xlator-option *replicate*.node-uuid=cedc8e7d-d3a0-47f2-a50e-ebe12fe964bc



Resident memory consumption increased from 1.3 GB to 4.6 GB.

Since the delta is massive, raising with high priority.
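
For anyone re-running this, a minimal monitoring sketch (assuming the glustershd pid file path shown in the ps output above) that samples the daemon's resident memory while the loops run:

# Sample glustershd RSS (in KB) every 30 seconds until the daemon exits.
# The pid file path is the one from the ps output above.
SHD_PID=$(cat /var/run/gluster/glustershd/glustershd.pid)
while kill -0 "$SHD_PID" 2>/dev/null; do
    echo "$(date +%T) $(ps -o rss= -p "$SHD_PID") KB"
    sleep 30
done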

Version-Release number of selected component (if applicable):
--------------------------------------------------------------

[root@gqas008 /]# rpm -qa|grep glus
glusterfs-libs-3.8.4-52.3.el7rhgs.x86_64
glusterfs-server-3.8.4-52.3.el7rhgs.x86_64


How reproducible:
------------------

2/2

Steps to Reproduce:
-------------------

1. Create a large number of distributed-replicate volumes (say, 300)

2. Disable cluster.self-heal-daemon on each volume in a loop (see the sketch below)
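
A minimal sketch of step 2, using the toggle variant from the summary (assumes the butcher$i volumes created in step 1; the on/off toggle and sleep intervals mirror the loop in the description):

for i in {1..300}; do
    # Toggle self-heal off and back on for each volume.
    gluster volume set butcher$i cluster.self-heal-daemon off
    sleep 3
    gluster volume set butcher$i cluster.self-heal-daemon on
    sleep 3
done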



Actual results:
----------------

Drastic increase in memory consumption by shd after the volume set operations.

Expected results:
------------------

Controlled memory consumption by shd.

Comment 18 RHEL Program Management 2018-05-18 07:12:46 UTC
Development Management has reviewed and declined this request.
You may appeal this decision by reopening this request.

Comment 24 Nag Pavan Chilakam 2019-07-18 05:48:21 UTC
I have rerun the test mentioned in the description.
Along with that, I kept toggling shd off/on for about ~520 volumes (including disperse/arbiter/replicate types).
Even after 5 iterations, the resident memory was more or less stable at about 1.8-2 GB.
Hence moving the bug to Verified, as I don't observe the substantial leak mentioned in the description.

Version: 6.0.8, test build issued after reverting the shd-mux feature.
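
A rough sketch of the verification loop described above (the use of `gluster volume list` to enumerate the ~520 volumes and the per-iteration RSS check are assumptions, not the exact commands used):

for iter in {1..5}; do
    for vol in $(gluster volume list); do
        gluster volume set "$vol" cluster.self-heal-daemon off
        sleep 1
        gluster volume set "$vol" cluster.self-heal-daemon on
        sleep 1
    done
    # Record glustershd RSS (KB) after each iteration to check for growth.
    ps -o rss= -p "$(cat /var/run/gluster/glustershd/glustershd.pid)"
done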

Comment 28 errata-xmlrpc 2019-10-30 12:19:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3249