Bug 1529501

Summary: [shd]: shd occupies ~7.8G in memory while toggling cluster.self-heal-daemon in a loop, possibly leaky.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Ambarish <asoman>
Component: core
Assignee: Mohit Agrawal <moagrawa>
Status: CLOSED ERRATA
QA Contact: Nag Pavan Chilakam <nchilaka>
Severity: high
Priority: high
Version: rhgs-3.3
CC: amukherj, bkunal, moagrawa, nchilaka, ravishankar, rhinduja, rhs-bugs, sheggodu, storage-qa-internal, vdas
Target Milestone: ---
Keywords: Rebase, Reopened
Target Release: RHGS 3.5.0
Hardware: x86_64
OS: Linux
Fixed In Version: glusterfs-6.0-1
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2019-10-30 12:19:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Bug Depends On: 1647277, 1725024, 1727329
Bug Blocks: 1696807

Description Ambarish 2017-12-28 12:31:54 UTC
Description of problem:
-------------------------

As part of verifying https://bugzilla.redhat.com/show_bug.cgi?id=1526363 , created 300 distributed-replicate volumes.

Bricks are multiplexed.

Then proceeded to run volume set operations in a loop.

<snip>

for i in {1..300}; do
    gluster volume create butcher$i replica 2 \
        gqas013.sbu.lab.eng.bos.redhat.com:/bricks1/brickA$i \
        gqas016.sbu.lab.eng.bos.redhat.com:/bricks1/brickA$i \
        gqas006.sbu.lab.eng.bos.redhat.com:/bricks1/brickA$i \
        gqas008.sbu.lab.eng.bos.redhat.com:/bricks1/brickA$i \
        gqas003.sbu.lab.eng.bos.redhat.com:/bricks1/brickA$i \
        gqas007.sbu.lab.eng.bos.redhat.com:/bricks1/brickA$i
    gluster v start butcher$i
    sleep 2
done

followed by 

for i in {1..300}; do
    gluster v set butcher$i cluster.self-heal-daemon off
    sleep 3
    gluster v set butcher$i group metadata-cache
    sleep 3
    gluster v set butcher$i cluster.lookup-optimize on
    sleep 3
done


<snip>



The self-heal daemon occupies almost 4.6 GB of resident memory after all the volume set operations.


**BEFORE VOL SET ** : 

[root@gqas008 /]# ps aux|grep glus
root      8078 12.4  2.6 28807468 1315220 ?    Ssl  05:13   0:28 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/e32d8903c5b60efed5cc4e725235c143.socket --xlator-option *replicate*.node-uuid=cedc8e7d-d3a0-47f2-a50e-ebe12fe964bc


**AFTER VOL SET ** :


[root@gqas008 /]# ps aux|grep glustershd
root      8078  3.0  9.4 31756588 4677648 ?    Ssl  05:13   3:56 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/e32d8903c5b60efed5cc4e725235c143.socket --xlator-option *replicate*.node-uuid=cedc8e7d-d3a0-47f2-a50e-ebe12fe964bc



Resident memory consumption increased from 1.3 GB to 4.6 GB.

Since the delta is massive, raising with high priority.
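
For anyone re-running this, a minimal monitoring sketch (assuming the glustershd pid file path shown in the ps output above) that samples the daemon's resident memory while the loops run:

# Sample glustershd RSS (in KB) every 30 seconds until the daemon exits.
# The pid file path is the one from the ps output above.
SHD_PID=$(cat /var/run/gluster/glustershd/glustershd.pid)
while kill -0 "$SHD_PID" 2>/dev/null; do
    echo "$(date +%T) $(ps -o rss= -p "$SHD_PID") KB"
    sleep 30
done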

Version-Release number of selected component (if applicable):
--------------------------------------------------------------

[root@gqas008 /]# rpm -qa|grep glus
glusterfs-libs-3.8.4-52.3.el7rhgs.x86_64
glusterfs-server-3.8.4-52.3.el7rhgs.x86_64


How reproducible:
------------------

2/2

Steps to Reproduce:
-------------------

1. Create a large number of distributed-replicate volumes (say, 300)

2. Disable cluster.self-heal-daemon on each volume in a loop (see the sketch below)
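
A minimal sketch of step 2, using the toggle variant from the summary (assumes the butcher$i volumes created in step 1; the on/off toggle and sleep intervals mirror the loop in the description):

for i in {1..300}; do
    # Toggle self-heal off and back on for each volume.
    gluster volume set butcher$i cluster.self-heal-daemon off
    sleep 3
    gluster volume set butcher$i cluster.self-heal-daemon on
    sleep 3
done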



Actual results:
----------------

Drastic increase in memory consumption by shd after the volume set operations.

Expected results:
------------------

Controlled memory consumption by shd.

Comment 18 RHEL Program Management 2018-05-18 07:12:46 UTC
Development Management has reviewed and declined this request.
You may appeal this decision by reopening this request.

Comment 24 Nag Pavan Chilakam 2019-07-18 05:48:21 UTC
I have rerun the test mentioned in the description.
Along with that, I kept toggling shd off/on for about ~520 volumes (including disperse/arbiter/replicate types).
Even after 5 iterations, the resident memory was more or less stable at about 1.8-2 GB.
Hence moving the bug to Verified, as I don't observe the substantial leak mentioned in the description.

Version: 6.0.8, test build issued after reverting the shd-mux feature.
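
A rough sketch of the verification loop described above (the use of `gluster volume list` to enumerate the ~520 volumes and the per-iteration RSS check are assumptions, not the exact commands used):

for iter in {1..5}; do
    for vol in $(gluster volume list); do
        gluster volume set "$vol" cluster.self-heal-daemon off
        sleep 1
        gluster volume set "$vol" cluster.self-heal-daemon on
        sleep 1
    done
    # Record glustershd RSS (KB) after each iteration to check for growth.
    ps -o rss= -p "$(cat /var/run/gluster/glustershd/glustershd.pid)"
done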

Comment 28 errata-xmlrpc 2019-10-30 12:19:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3249