Bug 1529501 - [shd] : shd occupies ~7.8G in memory while toggling cluster.self-heal-daemon in a loop, possibly leaky.
Summary: [shd] : shd occupies ~7.8G in memory while toggling cluster.self-heal-daemon ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: core
Version: rhgs-3.3
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.5.0
Assignee: Mohit Agrawal
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard:
Depends On: RHGS34MemoryLeak 1725024 1727329
Blocks: 1696807
 
Reported: 2017-12-28 12:31 UTC by Ambarish
Modified: 2019-10-30 12:20 UTC
CC: 10 users

Fixed In Version: glusterfs-6.0-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-30 12:19:37 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3319451 0 None None None 2018-01-16 08:24:38 UTC
Red Hat Product Errata RHEA-2019:3249 0 None None None 2019-10-30 12:20:11 UTC

Description Ambarish 2017-12-28 12:31:54 UTC
Description of problem:
-------------------------

As part of verification of https://bugzilla.redhat.com/show_bug.cgi?id=1526363, created 300 distributed-replicate volumes.

Bricks are multiplexed.

Then proceeded to run volume set operations in a loop.

<snip>

for i in {1..300};do gluster volume create butcher$i replica 2 gqas013.sbu.lab.eng.bos.redhat.com:/bricks1/brickA$i gqas016.sbu.lab.eng.bos.redhat.com:/bricks1/brickA$i gqas006.sbu.lab.eng.bos.redhat.com:/bricks1/brickA$i gqas008.sbu.lab.eng.bos.redhat.com:/bricks1/brickA$i gqas003.sbu.lab.eng.bos.redhat.com:/bricks1/brickA$i gqas007.sbu.lab.eng.bos.redhat.com:/bricks1/brickA$i;gluster v start butcher$i;sleep 2;done ;

followed by 

for i in {1..300};do gluster v set butcher$i cluster.self-heal-daemon off;sleep 3 ;gluster v set butcher$i group metadata-cache;sleep 3;gluster v set butcher$i cluster.lookup-optimize on;sleep 3 ;done


<snip>
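To track how the resident set grows per iteration, the volume set loop above can be instrumented to log shd RSS after every operation. This is a minimal sketch, not part of the original reproducer; it assumes the glustershd pid file path shown in the ps output below.

<snip>

# Hedged sketch: log shd resident memory after each volume set operation.
# The pid file path is the one visible in the ps output later in this report.
PIDFILE=/var/run/gluster/glustershd/glustershd.pid
for i in {1..300}; do
    gluster v set butcher$i cluster.self-heal-daemon off
    sleep 3
    # ps reports RSS in KB; convert to MB for readability
    ps -o rss= -p "$(cat "$PIDFILE")" | awk -v vol="butcher$i" '{printf "%s: shd RSS %d MB\n", vol, $1/1024}'
done

<snip>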



The self-heal daemon occupies almost 4.6 GB of resident memory after all the volume set operations.


**BEFORE VOL SET ** : 

[root@gqas008 /]# ps aux|grep glus
root      8078 12.4  2.6 28807468 1315220 ?    Ssl  05:13   0:28 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/e32d8903c5b60efed5cc4e725235c143.socket --xlator-option *replicate*.node-uuid=cedc8e7d-d3a0-47f2-a50e-ebe12fe964bc


**AFTER VOL SET ** :


[root@gqas008 /]# ps aux|grep glustershd
root      8078  3.0  9.4 31756588 4677648 ?    Ssl  05:13   3:56 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/e32d8903c5b60efed5cc4e725235c143.socket --xlator-option *replicate*.node-uuid=cedc8e7d-d3a0-47f2-a50e-ebe12fe964bc



Memory consumption increased from 1.3 GB to 4.6 GB.

Since the delta is massive, raising with high priority.
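For reference, the RSS figures above come straight from the ps output (sixth column, in KB). A quick, hedged way to read the same number directly from /proc, assuming the standard glustershd pid file path from the ps lines above:

<snip>

# Hedged sketch: report shd resident memory in GB (VmRSS in /proc is in kB).
PID=$(cat /var/run/gluster/glustershd/glustershd.pid)
awk '/VmRSS/ {printf "glustershd RSS: %.2f GB\n", $2 / 1024 / 1024}' /proc/"$PID"/status

<snip>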

Version-Release number of selected component (if applicable):
--------------------------------------------------------------

[root@gqas008 /]# rpm -qa|grep glus
glusterfs-libs-3.8.4-52.3.el7rhgs.x86_64
glusterfs-server-3.8.4-52.3.el7rhgs.x86_64


How reproducible:
------------------

2/2

Steps to Reproduce:
-------------------

1. Create lots of volumes of type dist-rep (say, 300).

2. Disable self-heal in a loop (a hedged sketch of this step follows below).
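
A hedged sketch of step 2, not verbatim from this report: toggle the self-heal daemon off and on across all volumes, matching the "toggling cluster.self-heal-daemon in loop" scenario from the title. Volume names follow the butcher$i convention used in the description.

<snip>

# Hedged sketch of the toggle loop; adjust the volume count/names to match step 1.
for i in {1..300}; do
    gluster volume set butcher$i cluster.self-heal-daemon off
    sleep 3
    gluster volume set butcher$i cluster.self-heal-daemon on
    sleep 3
done

<snip>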



Actual results:
----------------

Drastic increase in memory consumption by shd after the volume set operations.

Expected results:
------------------

Controlled memory consumption by shd.

Comment 18 RHEL Program Management 2018-05-18 07:12:46 UTC
Development Management has reviewed and declined this request.
You may appeal this decision by reopening this request.

Comment 24 Nag Pavan Chilakam 2019-07-18 05:48:21 UTC
I have rerun the test mentioned in the description.
Along with that, I kept toggling shd off/on for about 520 volumes (including disperse/arbiter/replicate types).
I saw that even after 5 iterations the resident memory was more or less stable at about 1.8-2 GB.
Hence moving the bug to verified, as I don't observe any substantial leak as mentioned in the description.

Version: 6.0.8, a test build issued after reverting the shd-mux feature.

Comment 28 errata-xmlrpc 2019-10-30 12:19:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3249

