Bug 1550339

Summary: glusterd leaks memory when vol status is issued
Product: [Community] GlusterFS
Reporter: Gaurav Yadav <gyadav>
Component: glusterd
Assignee: bugs <bugs>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Version: mainline
CC: amukherj, bmekala, bugs, nchilaka, rhinduja, rhs-bugs, storage-qa-internal, vbellur
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-v4.1.0
Clone Of: 1529451
Last Closed: 2018-06-20 18:01:20 UTC
Type: Bug

Description Gaurav Yadav 2018-03-01 04:02:19 UTC
Description of problem:
====================
glusterd appears to leak memory when vol status is issued, and the
memory is never released.

GlusterD Observations:
glusterd memory consumption increased by about 200 MB (i.e. from 125.4 MB before vol status to 327.4 MB at the end of the vol status loop), and the memory is not released afterwards.
On average, the resident memory grows by about 30 KB-400 KB per single run of vol status across the 300 volumes.



Version-Release number of selected component (if applicable):
glusterfs-3.8.4-52.3.el7rhgs.x86_64

How reproducible:
============
consistent

Steps to Reproduce:
1. Created a 6-node cluster
2. Started monitoring resources
3. Created 300 volumes, all 1x(4+2) EC volumes
   Note: there are 3 LVs on each node, and each LV hosts about 20 bricks, on different sub-directories
4. Started all volumes in sequence
5. Issued volume status in a loop of 2000 iterations, as below:
for l in {1..2000}; do gluster v status; done

 Bala Konda Reddy M 2018-01-03 05:31:44 EST

glusterd memory keeps increasing when vol status is issued concurrently on all the nodes of a 3-node trusted storage pool.


1. Created 30 1x3 replica volumes on a 3-node cluster
2. Performed vol status on all three nodes in a loop for 1 hour
3. glusterd memory shot up to 500 MB within that hour

Before starting vol status in loop
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
20499 root      20   0  672228  27032   4392 S   0.0  0.3   0:32.34 glusterd

After 1 hour

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
20499 root      20   0 1139176 528012   4392 S  64.7  6.6  63:21.12 glusterd

Statedumps before and after the vol status
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/bmekala/bug.1529451/

As per my debugging, I found that allocations accounted under the gf_common_mt_strdup type are consuming the extra memory; its num_alloc keeps growing with every status call for a volume.

In simple mathematical terms, if we have 10 volumes and volume status is executed 10 times, then num_alloc for gf_common_mt_strdup increases by 10 * 10 = 100.

A timer was introduced in the locking phase to take care of the stale lock issue, and that timer is what caused this leak.
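
To make the pattern concrete, below is a minimal, self-contained C sketch of the leak described above; the type and function names (lock_timer_t, acquire_lock, release_lock_buggy) are illustrative and are not glusterd's actual symbols.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *key;     /* strdup'd lock key, accounted under gf_common_mt_strdup */
    void *timer;   /* handle of the stale-lock cleanup timer */
} lock_timer_t;

/* Each lock request duplicates the key and allocates timer context. */
static lock_timer_t *acquire_lock(const char *volname)
{
    lock_timer_t *lt = calloc(1, sizeof(*lt));
    lt->key = strdup(volname);
    /* ... arm the timer that would force-release a stale lock ... */
    return lt;
}

/* Buggy unlock: the timer context and its strdup'd key are simply
 * dropped, never freed, so every status transaction leaks one
 * allocation per volume. */
static void release_lock_buggy(lock_timer_t *lt)
{
    (void)lt;
}

int main(void)
{
    /* 10 volumes x 10 status calls -> 100 leaked strdup allocations,
     * matching the num_alloc growth described above. */
    for (int call = 0; call < 10; call++) {
        for (int vol = 0; vol < 10; vol++) {
            char name[32];
            snprintf(name, sizeof(name), "vol%d", vol);
            release_lock_buggy(acquire_lock(name));
        }
    }
    return 0;
}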

Comment 1 Worker Ant 2018-03-01 09:32:18 UTC
REVIEW: https://review.gluster.org/19651 (glusterd : memory leak in mgmt_v3 lock functionality) posted (#1) for review on master by Gaurav Yadav

Comment 2 Worker Ant 2018-03-06 08:05:53 UTC
COMMIT: https://review.gluster.org/19651 committed in master by "Atin Mukherjee" <amukherj> with a commit message- glusterd : memory leak in mgmt_v3 lock functionality

In order to take care of the stale lock issue, a timer was introduced
in the mgmt_v3 lock path. This timer was not freeing its memory, which
is how the leak got introduced.

With this fix, memory cleanup in the locking path is handled properly.

Change-Id: I2e1ce3ebba3520f7660321f3d97554080e4e22f4
BUG: 1550339
Signed-off-by: Gaurav Yadav <gyadav>
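
As a rough illustration of the cleanup this commit message describes (a sketch only, reusing the illustrative lock_timer_t shape from the description above, not the literal glusterd patch):

#include <stdlib.h>
#include <string.h>

typedef struct {
    char *key;     /* strdup'd lock key */
    void *timer;   /* stale-lock timer handle */
} lock_timer_t;

/* Corrected cleanup: when the lock is released (or the stale-lock
 * timer fires), the timer is cancelled and both the duplicated key
 * and the context are freed exactly once. */
static void lock_timer_destroy(lock_timer_t *lt)
{
    if (!lt)
        return;
    /* ... cancel the timer before freeing its payload ... */
    free(lt->key);
    lt->key = NULL;
    free(lt);
}

int main(void)
{
    lock_timer_t *lt = calloc(1, sizeof(*lt));
    lt->key = strdup("vol1");
    lock_timer_destroy(lt);   /* nothing leaks */
    return 0;
}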

Comment 3 Worker Ant 2018-03-15 05:02:37 UTC
REVIEW: https://review.gluster.org/19723 (glusterd: glusterd crash in gd_mgmt_v3_unlock_timer_cbk) posted (#1) for review on master by Gaurav Yadav

Comment 4 Worker Ant 2018-03-19 03:08:19 UTC
COMMIT: https://review.gluster.org/19723 committed in master by "Gaurav Yadav" <gyadav> with a commit message- glusterd: glusterd crash in gd_mgmt_v3_unlock_timer_cbk

Freeing the same pointer twice inside gd_mgmt_v3_unlock_timer_cbk was
causing glusterd to crash.

Change-Id: I9147241d995780619474047b1010317a89b9965a
BUG: 1550339
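
A minimal sketch of the double-free this follow-up patch removes; the callback and type names are illustrative and not the literal gd_mgmt_v3_unlock_timer_cbk code.

#include <stdlib.h>
#include <string.h>

typedef struct {
    char *key;    /* strdup'd key handed to the unlock timer callback */
} unlock_timer_data_t;

/* The crash came from freeing data->key once in the normal path and
 * again in a shared cleanup path. The fix frees each pointer exactly
 * once and NULLs it, so any later free() is a harmless no-op. */
static void unlock_timer_cbk_fixed(unlock_timer_data_t *data)
{
    if (!data)
        return;
    free(data->key);
    data->key = NULL;   /* a later free(NULL) would be a no-op */
    free(data);
}

int main(void)
{
    unlock_timer_data_t *d = calloc(1, sizeof(*d));
    d->key = strdup("vol1");
    unlock_timer_cbk_fixed(d);   /* freed once, no crash */
    return 0;
}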

Comment 6 Worker Ant 2018-04-01 04:46:22 UTC
REVIEW: https://review.gluster.org/19801 (glusterd: fix txn_opinfo memory leak) posted (#1) for review on master by Atin Mukherjee

Comment 7 Worker Ant 2018-04-04 02:34:13 UTC
COMMIT: https://review.gluster.org/19801 committed in master by "Atin Mukherjee" <amukherj> with a commit message- glusterd: fix txn_opinfo memory leak

For transactions where no volname is involved (e.g. gluster v status),
the originator node starts directly with the staging phase, which means
no unlock event is ever triggered in the op-sm, and that resulted in a
txn_opinfo dictionary leak.

Credits : cynthia.zhou

Change-Id: I92fffbc2e8e1b010f489060f461be78aa2b86615
Fixes: bz#1550339
Signed-off-by: Atin Mukherjee <amukherj>
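
A self-contained sketch of the transaction-opinfo lifecycle the commit message describes; glusterd's txn_opinfo dictionary is replaced here by a simple list and all names are illustrative.

#include <stdlib.h>
#include <string.h>

/* Per-transaction op info, keyed by transaction id. */
typedef struct txn_opinfo {
    char                txn_id[64];
    struct txn_opinfo  *next;
} txn_opinfo_t;

static txn_opinfo_t *txn_table;   /* stand-in for the opinfo dictionary */

static void txn_opinfo_set(const char *txn_id)
{
    txn_opinfo_t *op = calloc(1, sizeof(*op));
    strncpy(op->txn_id, txn_id, sizeof(op->txn_id) - 1);
    op->next = txn_table;
    txn_table = op;
}

static void txn_opinfo_clear(const char *txn_id)
{
    for (txn_opinfo_t **pp = &txn_table; *pp; pp = &(*pp)->next) {
        if (strcmp((*pp)->txn_id, txn_id) == 0) {
            txn_opinfo_t *dead = *pp;
            *pp = dead->next;
            free(dead);
            return;
        }
    }
}

/* For a normal volume transaction the unlock event clears the entry.
 * For a volname-less transaction such as 'gluster v status' no unlock
 * event fires, so the entry has to be cleared when staging completes;
 * without that call the table grows on every status command. */
static void staging_done(const char *txn_id, int has_volname)
{
    if (!has_volname)
        txn_opinfo_clear(txn_id);
}

int main(void)
{
    txn_opinfo_set("txn-1");
    staging_done("txn-1", 0);   /* entry removed, no leak */
    return 0;
}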

Comment 8 Shyamsundar 2018-06-20 18:01:20 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-v4.1.0, please open a new bug report.

glusterfs-v4.1.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-June/000102.html
[2] https://www.gluster.org/pipermail/gluster-users/