1529451 – glusterd leaks memory when vol status is issued

Bug 1529451 - glusterd leaks memory when vol status is issued

Summary: glusterd leaks memory when vol status is issued

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	glusterd
Sub Component:
Version:	rhgs-3.3
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	RHGS 3.4.0
Assignee:	Atin Mukherjee
QA Contact:	Bala Konda Reddy M
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1503137
TreeView+	depends on / blocked

Reported:	2017-12-28 07:28 UTC by Nag Pavan Chilakam
Modified:	2019-01-09 14:58 UTC (History)
CC List:	6 users (show)
Fixed In Version:	glusterfs-3.12.2-7
Doc Type:	Bug Fix
Doc Text:	Previously, executing gluster volume status command multiple times caused memory consumption of glusterd process to go up. With this fix, this problem has been addressed.
Clone Of:
Clones:	1550339 (view as bug list)
Environment:
Last Closed:	2018-09-04 06:40:20 UTC
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2018:2607	0	None	None	None	2018-09-04 06:42:11 UTC

Description Nag Pavan Chilakam 2017-12-28 07:28:01 UTC

Description of problem:
====================
glusterd seems to leak memory when vol status is issued
memory is not being released

GlusterD Observations:
Glusterd memory consumption has increased by about 200MB (ie 125.4MB before vol status -> 327.4 MB at end of vol status loop)
Also the memory is not being released
On a average the resident memory increases by about 30KB-400KB , if we issue vol status for 300volumes (one time)



Version-Release number of selected component (if applicable):
glusterfs-3.8.4-52.3.el7rhgs.x86_64

How reproducible:
============
consistent

Steps to Reproduce:
1.Created a 6 node cluster
2.Started monitoring resources
3.Created 300 volumes all 1x(4+2) ec volumes
Note: there are 3 lvs in each node, and each lv hosts about 20 bricks , but on different sub-dirs
4.Started all volumes in sequence
5.Now issued volume status in a loop of 2000 times as below
for l in {1..2000};do gluster v status;done

Comment 2 Nag Pavan Chilakam 2017-12-28 07:35:19 UTC

sosreport along with 2 statedumps(with difference of one iteration of vol status)
@http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1529451/




#############vols are now STARTED ;no brickmux; 300vols ###########
Wed Dec 27 21:06:55 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        4.2G        2.2G         15M        1.2G        3.0G
Swap:          7.9G          0B        7.9G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
26560 root      20   0  604876 125432   4236 S  35.0  1.6  11:10.54 glusterd
############# loop 1 ###########
Wed Dec 27 21:06:55 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        4.2G        2.2G         15M        1.2G        3.0G
Swap:          7.9G          0B        7.9G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
26560 root      20   0  604876 125432   4236 S  29.4  1.6  11:10.59 glusterd
############# loop 2 ###########
Wed Dec 27 21:21:55 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        4.3G        1.9G         15M        1.5G        2.9G
Swap:          7.9G          0B        7.9G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
26560 root      20   0  604876 156960   4236 S  50.0  2.0  21:57.55 glusterd
############# loop 3 ###########
Wed Dec 27 21:36:56 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        4.4G        1.5G         15M        1.7G        2.8G
Swap:          7.9G          0B        7.9G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
26560 root      20   0  670412 187476   4236 S  62.5  2.3  32:39.39 glusterd
############# loop 4 ###########
Wed Dec 27 21:51:56 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        4.5G        1.2G         15M        1.9G        2.6G
Swap:          7.9G          0B        7.9G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
26560 root      20   0  670412 225880   4236 S  73.7  2.8  43:15.54 glusterd
############# loop 5 ###########
Wed Dec 27 22:06:56 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        4.6G        862M         15M        2.2G        2.5G
Swap:          7.9G          0B        7.9G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
26560 root      20   0  735948 264340   4236 S  65.2  3.3  53:49.32 glusterd
############# loop 6 ###########
Wed Dec 27 22:21:56 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        4.7G        521M         15M        2.4G        2.4G
Swap:          7.9G          0B        7.9G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
26560 root      20   0  735948 301480   4236 S  71.4  3.8  64:18.65 glusterd
############# loop 7 ###########
Wed Dec 27 22:36:57 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        4.8G        294M         15M        2.5G        2.4G
Swap:          7.9G          0B        7.9G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
26560 root      20   0  801484 327488   4236 S   0.0  4.1  71:44.56 glusterd
############# loop 8 ###########
Wed Dec 27 22:51:57 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        4.8G        286M         15M        2.5G        2.4G
Swap:          7.9G          0B        7.9G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
26560 root      20   0  801484 327488   4236 S   0.0  4.1  71:44.60 glusterd
############# loop 59 ###########
Thu Dec 28 11:37:11 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        5.2G        227M         13M        2.2G        1.9G
Swap:          7.9G        2.1M        7.9G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
26560 root      20   0  801484 327488   4236 S   0.0  4.1  71:47.20 glusterd

Comment 3 Atin Mukherjee 2017-12-28 10:29:13 UTC

Statedump comparison in RHGS-3.4.0 bits:

Statedump data set 2 vs set 1 where set 2 was collected a minute later than set 1.

[mgmt/glusterd.management - usage-type gf_common_mt_gf_timer_t memusage]
size=192
num_allocs=3
max_size=384
max_num_allocs=6
total_allocs=167077

vs

[mgmt/glusterd.management - usage-type gf_common_mt_gf_timer_t memusage]
size=256
num_allocs=4
max_size=384
max_num_allocs=6
total_allocs=431416


[mgmt/glusterd.management - usage-type gf_common_mt_asprintf memusage]
size=2000
num_allocs=106
max_size=4614
max_num_allocs=231
total_allocs=5059477

vs

[mgmt/glusterd.management - usage-type gf_common_mt_asprintf memusage]
size=2003
num_allocs=107
max_size=4614
max_num_allocs=231
total_allocs=12971322


[mgmt/glusterd.management - usage-type gf_common_mt_strdup memusage]
size=2468525
num_allocs=167048
max_size=2469156
max_num_allocs=167057
total_allocs=2616401

vs

[mgmt/glusterd.management - usage-type gf_common_mt_strdup memusage]
size=6333253
num_allocs=429924
max_size=6333379
max_num_allocs=429929
total_allocs=6686295


[mgmt/glusterd.management - usage-type gf_common_mt_socket_private_t memusage]
size=70040
num_allocs=103
max_size=80240
max_num_allocs=118
total_allocs=7080


vs

[mgmt/glusterd.management - usage-type gf_common_mt_socket_private_t memusage]
size=70040
num_allocs=103
max_size=80240
max_num_allocs=118
total_allocs=15842


[mgmt/glusterd.management - usage-type gf_common_mt_char memusage]
size=12570
num_allocs=543
max_size=40750
max_num_allocs=1807
total_allocs=24288723


vs

[mgmt/glusterd.management - usage-type gf_common_mt_char memusage]
size=12830
num_allocs=561
max_size=40750
max_num_allocs=1807
total_allocs=62658480


[mgmt/glusterd.management - usage-type gf_common_mt_rpcsvc_wrapper_t memusage]
size=64
num_allocs=2
max_size=96
max_num_allocs=3
total_allocs=6979

vs

[mgmt/glusterd.management - usage-type gf_common_mt_rpcsvc_wrapper_t memusage]
size=64
num_allocs=2
max_size=96
max_num_allocs=3
total_allocs=15741


[mgmt/glusterd.management - usage-type gf_common_mt_rpc_trans_t memusage]
size=131016
num_allocs=103
max_size=150096
max_num_allocs=118
total_allocs=7081


vs

[mgmt/glusterd.management - usage-type gf_common_mt_rpc_trans_t memusage]
size=131016
num_allocs=103
max_size=150096
max_num_allocs=118
total_allocs=15843





The below data structures are not there in 1st set of statedump report which makes me to believe that there are certain different code path which is hit post the 1st statedump is collected.

[mgmt/glusterd.management - usage-type gf_common_mt_iobref memusage]
size=64
num_allocs=1
max_size=192
max_num_allocs=3
total_allocs=7872201

[mgmt/glusterd.management - usage-type gf_common_mt_memdup memusage]
size=24
num_allocs=3
max_size=700
max_num_allocs=71
total_allocs=45676311


[mgmt/glusterd.management - usage-type gf_common_mt_iobrefs memusage]
size=128
num_allocs=1
max_size=384
max_num_allocs=3
total_allocs=7872201

[mgmt/glusterd.management - usage-type gf_common_mt_uuid_t memusage]
size=64
num_allocs=4
max_size=64
max_num_allocs=4
total_allocs=4376562

[mgmt/glusterd.management - usage-type gf_common_mt_mgmt_v3_lock_obj_t memusage]
size=16
num_allocs=1
max_size=16
max_num_allocs=1
total_allocs=429073

[mgmt/glusterd.management - usage-type gf_common_mt_txn_opinfo_obj_t memusage]
size=112
num_allocs=1
max_size=112
max_num_allocs=1
total_allocs=443344

[mgmt/glusterd.management - usage-type gf_common_mt_synctask memusage]
size=5264
num_allocs=1
max_size=10528
max_num_allocs=2
total_allocs=443379

[mgmt/glusterd.management - usage-type gf_common_mt_syncstack memusage]
size=2097152
num_allocs=1
max_size=4194304
max_num_allocs=2
total_allocs=443379

[mgmt/glusterd.management - usage-type gf_common_mt_mgmt_v3_lock_timer_t memusage]
size=16
num_allocs=1
max_size=16
max_num_allocs=1
total_allocs=429073

Comment 4 Bala Konda Reddy M 2018-01-03 10:31:44 UTC

Glusterd memory is increasing, when issued vol status concurrently on all the nodes in the trusted storage pool of 3 nodes. 


1. Created 30 1*3 replica volumes on a 3 node cluster
2. Performed vol status on all three nodes in a loop for 1 hour
3. glusterd memory shooted upto 500 MB in 1 hour

Before starting vol status in loop
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
20499 root      20   0  672228  27032   4392 S   0.0  0.3   0:32.34 glusterd

After 1 hour

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
20499 root      20   0 1139176 528012   4392 S  64.7  6.6  63:21.12 glusterd

Statedumps before and after the vol status
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/bmekala/bug.1529451/

Comment 5 Gaurav Yadav 2018-01-11 18:37:09 UTC

As mer my debugging I found that, gf_common_mt_strdup structure is consuming more moemory. Num_alloc are keep on growing for every status call for a volume.

In simple mathematical terms if we have 10 volumes and volume status is being executed 10 times, then num_alloc for gf_common_mt_strdup is gettinng increased by 10 * 10 = 100.

In locking phase a timer is being introduced, to take care of stale lock issue, which has caused this leak.

Debugging for other structure is under progress.

Comment 6 Gaurav Yadav 2018-01-11 19:11:44 UTC

(In reply to Gaurav Yadav from comment #5)
> As mer my debugging I found that, gf_common_mt_strdup structure is consuming
> more moemory. Num_alloc are keep on growing for every status call for a
> volume.
> 

> In simple mathematical terms if we have 10 volumes and volume status is
> being executed 10 times, then num_alloc for gf_common_mt_strdup is gettinng
> increased by 10 * 10 = 100.
> 
> In locking phase a timer is being introduced, to take care of stale lock
> issue, which has caused this leak.
As this leak is in locking phase hence it will be there for other commands as well like create vol, volume set etc.

> Debugging for other structure is under progress.

Comment 14 Bala Konda Reddy M 2018-05-09 04:28:37 UTC

Build: 3.12.2-8

On a three node setup with brick-mux enabled, with 30 volumes 1 X 3. Executed volume status more than 30,000 iterations the glusterd memory increased from 20MB to 140 MB approx.

###############0 iteration  #############
              total        used        free      shared  buff/cache   available
Mem:           7.6G        495M        6.9G        9.0M        227M        6.9G
Swap:          2.0G          0B        2.0G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1634 root      20   0  670584  20232   4336 S   0.0  0.3   0:03.03 glusterd
 3352 root      20   0 5430220 194448   5256 S   0.0  2.4   0:06.22 glusterfsd

###############32008 iteration  #############
              total        used        free      shared  buff/cache   available
Mem:           7.6G        798M        4.8G        9.0M        2.1G        6.5G
Swap:          2.0G          0B        2.0G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1634 root      20   0  736120 142612   4388 S   0.0  1.8  69:54.29 glusterd
 3352 root      20   0 6217036 365096   5256 S   0.0  4.6   2:36.96 glusterfsd

As there isn't much increase of glusterd memory and this is acceptable.

Hence marking it as verified.

Comment 17 errata-xmlrpc 2018-09-04 06:40:20 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607

Note You need to log in before you can comment on or make changes to this bug.