Bug 1580352
| Summary: | Glusterd memory leaking in gf_gld_mt_linebuf | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Sanju <srakonde> |
| Component: | glusterd | Assignee: | Sanju <srakonde> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | mainline | CC: | amukherj, bmekala, bugs, khiremat, nravinas, rhs-bugs, sankarshan, sheggodu, srakonde, storage-qa-internal, vbellur, vdas |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-5.0 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1575539 | | |
| | 1611110 (view as bug list) | Environment: | |
| Last Closed: | 2018-10-23 15:09:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1575539 | | |
| Bug Blocks: | 1611110 | | |
Comment 1
Worker Ant
2018-05-21 10:53:09 UTC
Description of problem:

Four-node cluster. On all four hosts the glusterd process is leaking memory. From the ps output, the resident set size of the process reaches 1.6 GB on the QA nodes, where it consumes nearly 14% of memory:

- sosreport glusteredc1fs2uq.owfg.com

    USER  PID   %CPU %MEM VSZ     RSS    TTY STAT START TIME  COMMAND
    root  20590 0.5  6.2  1681632 753284 ?   Ssl  Apr27 41:38 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

- sosreport glusterldc1fs1up.owfg.com

    USER  PID   %CPU %MEM VSZ     RSS     TTY STAT START TIME  COMMAND
    root  18199 1.8  7.5  1999648 1230012 ?   Ssl  May01 20:14 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/9920bccf2a4c92d44d9f991404c5765d.socket

- sosreport glusterldc1fs2up.owfg.com

    USER  PID   %CPU %MEM VSZ     RSS     TTY STAT START TIME    COMMAND
    root  9102  3.5  36.7 8990320 5975468 ?   Ssl  Feb14 3927:15 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

==> On this gluster node, glusterd is taking 36% of memory and consuming nearly 6 GB.

Node glusteredc1fs1uq.owfg.com, glusterdump.1573.dump.1525458209. Looking at the highest memory usage:

    [mgmt/glusterd.management - usage-type gf_gld_mt_linebuf memusage]
    size=909495296          --> 909 MB leaked here
    num_allocs=888179
    max_size=909495296
    max_num_allocs=888179
    total_allocs=888179

    [mgmt/glusterd.management - usage-type gf_common_mt_mem_pool memusage]
    size=170826728          --> 170 MB leaked here
    num_allocs=1607174
    max_size=170839680
    max_num_allocs=1607398
    total_allocs=80039329

Same thing for the second iteration - the same structures keep growing. glusterdump.1573.dump.1525466693:

    [mgmt/glusterd.management - usage-type gf_gld_mt_linebuf memusage]
    size=919127040          --> on this second iteration we have 919 MB
    num_allocs=897585
    max_size=919127040
    max_num_allocs=897585
    total_allocs=897585

    [mgmt/glusterd.management - usage-type gf_common_mt_mem_pool memusage]
    size=172089816
    num_allocs=1619086
    max_size=172099544
    max_num_allocs=1619240
    total_allocs=80707302

Identical results for node glusteredc1fs2uq.owfg.com. glusterdump.20590.dump.1525458352:

    [mgmt/glusterd.management - usage-type gf_gld_mt_linebuf memusage]
    size=476495872          --> 476 MB leaked here
    num_allocs=465328
    max_size=476495872
    max_num_allocs=465328
    total_allocs=465328

    [mgmt/glusterd.management - usage-type gf_common_mt_mem_pool memusage]
    size=70239188           --> 70 MB here
    num_allocs=627665
    max_size=70284104
    max_num_allocs=628212
    total_allocs=86062168

glusterdump.20590.dump.1525466708:

    [mgmt/glusterd.management - usage-type gf_gld_mt_linebuf memusage]
    size=485989376          --> on the second iteration the memory has increased to 485 MB
    num_allocs=474599
    max_size=485989376
    max_num_allocs=474599
    total_allocs=474599

    [mgmt/glusterd.management - usage-type gf_common_mt_mem_pool memusage]
    size=71332824
    num_allocs=637632
    max_size=71335796
    max_num_allocs=637669
    total_allocs=87385904

The only place where I can find such an allocation is in the geo-replication code:
https://github.com/gluster/glusterfs/blob/master/xlators/mgmt/glusterd/src/glusterd-geo-rep.c

specifically in glusterd_urltransform:

    ...
    for (;;) {
            size_t len;

            line = GF_MALLOC (1024, gf_gld_mt_linebuf);
            if (!line) {
                    error = _gf_true;
                    goto out;
            }
    ...

I believe this is caused by geo-replication. Further assistance from engineering is required to understand the source of this memory leak.
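For illustration, below is a minimal, self-contained sketch of this allocation pattern and the cleanup it requires. It is not the actual glusterd code or the eventual fix: collect_lines is a hypothetical stand-in for glusterd_urltransform, and plain malloc/free stand in for GF_MALLOC/GF_FREE. It only shows how a per-iteration line buffer that the caller never releases accumulates the same way the gf_gld_mt_linebuf counters above keep growing.

```c
/*
 * Sketch of the leak pattern (hypothetical names, not glusterd code):
 * a fresh 1024-byte line buffer is allocated on every loop iteration and
 * the array of lines is handed back to the caller; if a caller never frees
 * those lines, every call leaks one buffer per line of output.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LINE_BUF_SZ 1024

/* Collect lines from `in` into a malloc'd array; returns the line count. */
static int
collect_lines(FILE *in, char ***linearr_out)
{
        char **linearr = NULL;
        int    count   = 0;

        for (;;) {
                char *line = malloc(LINE_BUF_SZ);    /* gf_gld_mt_linebuf-style allocation */
                if (!line)
                        break;
                if (!fgets(line, LINE_BUF_SZ, in)) {
                        free(line);                   /* EOF: drop the spare buffer */
                        break;
                }
                line[strcspn(line, "\n")] = '\0';

                char **tmp = realloc(linearr, (count + 1) * sizeof(*linearr));
                if (!tmp) {
                        free(line);
                        break;
                }
                linearr = tmp;
                linearr[count++] = line;
        }

        *linearr_out = linearr;
        return count;
}

int
main(void)
{
        char **lines = NULL;
        int    n     = collect_lines(stdin, &lines);

        for (int i = 0; i < n; i++)
                printf("line %d: %s\n", i, lines[i]);

        /*
         * The cleanup the leaking path is missing: every buffer handed out
         * by collect_lines() must be released, plus the array itself.
         * Without this loop, each invocation leaves `n` 1024-byte buffers
         * behind -- the steadily growing num_allocs seen in the statedumps.
         */
        for (int i = 0; i < n; i++)
                free(lines[i]);
        free(lines);

        return 0;
}
```

Because GF_MALLOC tags each buffer with the gf_gld_mt_linebuf accounting type, a missing free in any caller of glusterd_urltransform shows up directly as the ever-increasing num_allocs under that usage-type in the statedump.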
COMMIT: https://review.gluster.org/20046 committed in master by "Amar Tumballi" <amarts> with a commit message:

    glusterd: memory leak in geo-rep status

    Fixes: bz#1580352
    Change-Id: I9648e73090f5a2edbac663a6fb49acdb702cdc49
    Signed-off-by: Sanju Rakonde <srakonde>

This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-5.0, please open a new bug report.

glusterfs-5.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-October/000115.html
[2] https://www.gluster.org/pipermail/gluster-users/