Bug 1529249

Summary: Brick Multiplexing: glusterfsd (brick) process leaks memory on an idle brick multiplex setup
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Nag Pavan Chilakam <nchilaka>
Component: core
Assignee: Mohit Agrawal <moagrawa>
Status: CLOSED ERRATA
QA Contact: Nag Pavan Chilakam <nchilaka>
Severity: high
Priority: high
Docs Contact:
Version: rhgs-3.3
CC: moagrawa, nchilaka, pasik, pprakash, puebele, rhs-bugs, rkothiya, sheggodu, storage-qa-internal, ubansal
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 3.5.z Batch Update 3
Flags: moagrawa: needinfo-
Hardware: Unspecified
OS: Unspecified
Whiteboard: brick-multiplexing
Fixed In Version: glusterfs-6.0-38
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-12-17 04:50:16 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  331-ga mem difference in 1hr for glusterfsd (flags: none)
  331-async mem difference in 1hr for glusterfsd (flags: none)

Description Nag Pavan Chilakam 2017-12-27 10:07:27 UTC
Description of problem:
=====================
On an idle brick-mux setup, the glusterfsd process is leaking memory at about 6 MB per hour.

This leak exists even on the 3.3.1-GA build.

In about 16 hrs the findings were as follows:

3.3.1-GA leaked about 170 MB, while the 3.3.1-async build leaked about 140 MB
(note that this delta varies across the 6 nodes)

Version-Release number of selected component (if applicable):
==================
glusterfs-client-xlators-3.8.4-52.3.el7rhgs.x86_64
glusterfs-rdma-3.8.4-52.3.el7rhgs.x86_64
python-gluster-3.8.4-52.3.el7rhgs.noarch
glusterfs-libs-3.8.4-52.3.el7rhgs.x86_64
glusterfs-fuse-3.8.4-52.3.el7rhgs.x86_64
glusterfs-events-3.8.4-52.3.el7rhgs.x86_64
glusterfs-3.8.4-52.3.el7rhgs.x86_64
glusterfs-cli-3.8.4-52.3.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-52.3.el7rhgs.x86_64
glusterfs-server-3.8.4-52.3.el7rhgs.x86_64
glusterfs-api-3.8.4-52.3.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-52.3.el7rhgs.x86_64


How reproducible:
==================
Observed for close to a day; the memory footprint only keeps increasing.



Steps to Reproduce:
=====================
1. Created a 6-node cluster
2. Enabled brick multiplexing
3. Started monitoring resources
4. Created 60 volumes, all 1x(4+2) EC volumes
   Note: there are 3 LVs on each node, and each LV hosts about 20 bricks, each under a different sub-directory
5. Started all volumes in sequence (a setup sketch for steps 2, 4 and 5 follows this list)
6. Kept monitoring resources
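
A minimal sketch of steps 2, 4 and 5, assuming hostnames n1..n6 and brick paths under /bricks/lv{1..3} (all names hypothetical):

# Enable brick multiplexing cluster-wide (step 2)
gluster volume set all cluster.brick-multiplex on

# Create and start 60 1x(4+2) dispersed volumes (steps 4-5); each LV
# hosts bricks for several volumes, each under its own sub-directory
for i in $(seq -w 1 60); do
    lv=$(( (10#$i - 1) % 3 + 1 ))   # spread volumes across the 3 LVs
    gluster volume create vol$i disperse-data 4 redundancy 2 \
        n{1..6}:/bricks/lv$lv/vol$i force
    gluster volume start vol$i
done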

Comment 2 Nag Pavan Chilakam 2017-12-27 10:09:06 UTC
Over a period of 1hr:




3.3.1-async:
n1: ~6 MB
[root@dhcp35-205 ~]# date;free -h;top -n 1 -b|egrep "RES|glusterfsd"
Wed Dec 27 13:00:26 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.3G        5.9G         11M        401M        6.0G
Swap:          7.9G          0B        7.9G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
24834 root      20   0 9640864 872484   6080 S   0.0 10.9  34:33.02 glusterfsd

[root@dhcp35-205 ~]# sleep 10;date;free -h;top -n 1 -b|egrep "RES|glusterfsd"
Wed Dec 27 14:00:11 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.3G        5.9G         11M        401M        6.0G
Swap:          7.9G          0B        7.9G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
24834 root      20   0 9714444 878944   6080 S   0.0 11.0  36:11.54 glusterfsd
[root@dhcp35-205 ~]# 

n2:
[root@dhcp35-57 ~]# date;free -h;top -n 1 -b|egrep "RES|glusterfsd"
Wed Dec 27 13:00:26 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.1G        6.1G         11M        393M        6.2G
Swap:          7.9G          0B        7.9G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
19719 root      20   0 8155744 730816   6092 S   0.0  9.1  35:17.07 glusterfsd

[root@dhcp35-57 ~]# sleep 10;date;free -h;top -n 1 -b|egrep "RES|glusterfsd"
Wed Dec 27 14:00:11 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.1G        6.1G         11M        393M        6.2G
Swap:          7.9G          0B        7.9G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
19719 root      20   0 8168224 723392   6092 S   0.0  9.0  36:55.94 glusterfsd
[root@dhcp35-57 ~]# ^C

n3:
[root@dhcp35-169 ~]# sleep 1800;date;free -h;top -n 1 -b|egrep "RES|glusterfsd"
Wed Dec 27 13:40:55 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.1G        6.1G         11M        395M        6.2G
Swap:          7.9G          0B        7.9G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
21637 root      20   0 8030728 722664   6092 S   0.0  9.0  28:57.93 glusterfsd
[root@dhcp35-169 ~]# 
[root@dhcp35-169 ~]# 
[root@dhcp35-169 ~]# 
[root@dhcp35-169 ~]# sleep 1800;date;free -h;top -n 1 -b|egrep "RES|glusterfsd"
^C
[root@dhcp35-169 ~]# ^C
[root@dhcp35-169 ~]# date
Wed Dec 27 13:59:43 IST 2017
[root@dhcp35-169 ~]# sleep 10;date;free -h;top -n 1 -b|egrep "RES|glusterfsd"
Wed Dec 27 14:00:11 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.1G        6.1G         11M        395M        6.2G
Swap:          7.9G          0B        7.9G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
21637 root      20   0 8032288 728540   6092 S   0.0  9.1  29:22.78 glusterfsd

n4: ~2 MB
[root@dhcp35-145 ~]# date;free -h;top -n 1 -b|egrep "RES|glusterfsd"
Wed Dec 27 13:00:26 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.3G        5.8G         11M        540M        5.9G
Swap:          7.9G          0B        7.9G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 3659 root      20   0 8207900 738180   6092 S   0.0  9.2  33:21.81 glusterfsd

[root@dhcp35-145 ~]# sleep 10;date;free -h;top -n 1 -b|egrep "RES|glusterfsd"
Wed Dec 27 14:00:11 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.3G        5.8G         11M        540M        5.9G
Swap:          7.9G          0B        7.9G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 3659 root      20   0 8221420 739768   6092 S   0.0  9.2  34:56.23 glusterfsd

n5: ~1.5 MB
[root@dhcp35-177 ~]# date;free -h;top -n 1 -b|egrep "RES|glusterfsd"
Wed Dec 27 13:00:26 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.1G        6.1G         11M        396M        6.2G
Swap:          7.9G          0B        7.9G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
21642 root      20   0 8137948 730032   6068 S   0.0  9.1  27:49.56 glusterfsd

[root@dhcp35-177 ~]# sleep 10;date;free -h;top -n 1 -b|egrep "RES|glusterfsd"
Wed Dec 27 14:00:11 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.1G        6.1G         11M        396M        6.2G
Swap:          7.9G          0B        7.9G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
21642 root      20   0 8145488 731984   6068 S   6.2  9.1  29:04.44 glusterfsd



n6: ~6 MB

[root@dhcp35-237 ~]# 
[root@dhcp35-237 ~]# date;free -h;top -n 1 -b|egrep "RES|glusterfsd"
Wed Dec 27 13:00:26 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.2G        6.1G         11M        394M        6.2G
Swap:          7.9G          0B        7.9G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
19603 root      20   0 8385376 757876   6092 S   0.0  9.5  36:42.35 glusterfsd
[root@dhcp35-237 ~]# sleep 10;date;free -h;top -n 1 -b|egrep "RES|glusterfsd"
Wed Dec 27 14:00:11 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.2G        6.1G         11M        394M        6.2G
Swap:          7.9G          0B        7.9G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
19603 root      20   0 8409296 763800   6092 S   0.0  9.5  38:24.84 glusterfsd


##############################################################################
3.3.1
=======
n1: ~12 MB

Wed Dec 27 13:00:20 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.4G        5.4G         48M        891M        5.9G
Swap:          4.0G          0B        4.0G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
16614 root      20   0 9023600 816204   6024 S   0.0 10.2  39:51.15 glusterfsd
[root@dhcp42-44 ~]# date;free -h;top -n 1 -b|egrep "RES|glusterfsd"
Wed Dec 27 14:00:52 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.4G        5.4G         48M        892M        5.9G
Swap:          4.0G          0B        4.0G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
16614 root      20   0 9159320 828104   6024 S   0.0 10.3  41:56.49 glusterfsd
[root@dhcp42-44 ~]# 


n2: ~20 MB

[root@dhcp41-169 ~]# date;free -h;top -n 1 -b|egrep "RES|glusterfsd"
Wed Dec 27 13:00:20 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.3G        5.5G         48M        872M        6.0G
Swap:          4.0G          0B        4.0G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
15283 root      20   0 9577948 844664   6032 S   0.0 10.5  38:13.33 glusterfsd
[root@dhcp41-169 ~]# date;free -h;top -n 1 -b|egrep "RES|glusterfsd"
Wed Dec 27 14:00:52 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.3G        5.5G         48M        873M        6.0G
Swap:          4.0G          0B        4.0G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
15283 root      20   0 9747988 864216   6032 S   0.0 10.8  40:15.04 glusterfsd
[root@dhcp41-169 ~]# 

n3: ~20 MB

[root@dhcp42-174 ~]# date;free -h;top -n 1 -b|egrep "RES|glusterfsd"
Wed Dec 27 13:00:20 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.3G        5.5G         48M        910M        6.0G
Swap:          4.0G          0B        4.0G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 5832 root      20   0 8948936 805980   6032 S   0.0 10.1  38:58.78 glusterfsd
[root@dhcp42-174 ~]# 

[root@dhcp42-174 ~]# date;free -h;top -n 1 -b|egrep "RES|glusterfsd"
Wed Dec 27 14:00:52 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.3G        5.5G         48M        911M        6.0G
Swap:          4.0G          0B        4.0G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 5832 root      20   0 9101036 826264   6032 S   0.0 10.3  41:02.06 glusterfsd
[root@dhcp42-174 ~]# 


n4: ~16 MB
[root@dhcp41-216 ~]# date;free -h;top -n 1 -b|egrep "RES|glusterfsd"
Wed Dec 27 13:00:20 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.4G        5.4G         48M        874M        5.9G
Swap:          4.0G          0B        4.0G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
16063 root      20   0 9982356 892360   6028 S   0.0 11.1  36:47.00 glusterfsd

[root@dhcp41-216 ~]# date;free -h;top -n 1 -b|egrep "RES|glusterfsd"
Wed Dec 27 14:00:52 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.4G        5.4G         48M        875M        5.9G
Swap:          4.0G          0B        4.0G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
16063 root      20   0 9924.9m 908072   6028 S   0.0 11.3  38:44.79 glusterfsd
[root@dhcp41-216 ~]# 



n5: ~7 MB


[root@dhcp41-230 ~]# date;free -h;top -n 1 -b|egrep "RES|glusterfsd"
Wed Dec 27 13:00:20 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.2G        5.6G         46M        868M        6.1G
Swap:          4.0G          0B        4.0G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
15024 root      20   0 8344840 760992   6024 S   0.0  9.5  39:03.90 glusterfsd

[root@dhcp41-230 ~]# date;free -h;top -n 1 -b|egrep "RES|glusterfsd"
Wed Dec 27 14:00:52 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.2G        5.6G         46M        868M        6.1G
Swap:          4.0G          0B        4.0G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
15024 root      20   0 8426740 767732   6024 S   5.9  9.6  41:07.52 glusterfsd


n6: ~4 MB

[root@dhcp42-150 ~]# date;free -h;top -n 1 -b|egrep "RES|glusterfsd"
Wed Dec 27 13:00:20 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.2G        5.6G         46M        867M        6.1G
Swap:          4.0G          0B        4.0G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
15091 root      20   0 8180088 741924   6024 S   0.0  9.3  37:49.72 glusterfsd
[root@dhcp42-150 ~]# date;free -h;top -n 1 -b|egrep "RES|glusterfsd"
Wed Dec 27 14:00:52 IST 2017
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.2G        5.6G         46M        868M        6.1G
Swap:          4.0G          0B        4.0G
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
15091 root      20   0 8261724 744476   6024 S   0.0  9.3  39:50.88 glusterfsd
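
The samples above were taken by hand; a loop along the following lines (illustrative only, log path assumed) collects the same readings at a fixed interval for later diffing:

# Append a timestamped memory sample for glusterfsd every 10 minutes
while true; do
    { date; free -h; top -n 1 -b | egrep "RES|glusterfsd"; } >> /tmp/glusterfsd-mem.log
    sleep 600
done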

Comment 3 Atin Mukherjee 2017-12-27 10:24:14 UTC
From the periodic statedumps of the brick process:

[global.glusterfs - usage-type gf_common_mt_gf_timer_t memusage]
size=128
num_allocs=2
max_size=192
max_num_allocs=3
total_allocs=222


Versus

[global.glusterfs - usage-type gf_common_mt_gf_timer_t memusage]
size=128
num_allocs=2
max_size=192
max_num_allocs=3
total_allocs=346


total_allocs has grown from 222 to 346 between the two dumps, so there's a suspected leak here.
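
For reference, a statedump of the brick process can be triggered with SIGUSR1 and the memusage accounting compared across two points in time; a minimal sketch (the default dump directory /var/run/gluster is assumed):

# Take a statedump now and again after an hour, then compare the
# gf_common_mt_gf_timer_t accounting across the two dumps
kill -USR1 $(pidof glusterfsd)
sleep 3600
kill -USR1 $(pidof glusterfsd)
grep -A5 "gf_common_mt_gf_timer_t memusage" /var/run/gluster/*.dump.*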

Comment 4 Nag Pavan Chilakam 2017-12-27 12:43:44 UTC
Created attachment 1372786 [details]
331-ga mem difference in 1hr for glusterfsd

Comment 5 Nag Pavan Chilakam 2017-12-27 12:44:10 UTC
Created attachment 1372787 [details]
331-async mem difference in 1hr for glusterfsd

Comment 6 Nag Pavan Chilakam 2017-12-27 13:07:47 UTC
sosreports and 2 statedumps taken at a 1 hr interval are available @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1529249/

Comment 7 Atin Mukherjee 2017-12-29 05:36:54 UTC
*** Bug 1529563 has been marked as a duplicate of this bug. ***

Comment 12 Nag Pavan Chilakam 2018-11-14 12:56:28 UTC
So here is the data on 3.12.2-24:

[root@dhcp35-140 ~]# date; top -n 1 -b|grep gluster
Wed Nov 14 14:57:51 IST 2018
12460 root      20   0   13.7g 403728   3688 S   5.0  5.0   1:03.91 glusterfsd
10673 root      20   0  607108  37372   4368 S   0.0  0.5   0:16.74 glusterd
18984 root      20   0   11.1g 269664   7472 S   0.0  3.4   0:12.48 glusterfs
[root@dhcp35-140 ~]# sleep 300;date; top -n 1 -b|grep gluster
^C
[root@dhcp35-140 ~]# sleep 300;date; top -n 1 -b|grep gluster; echo "##################";sleep 3600;date; top -n 1 -b|grep gluster; echo "##################"
Wed Nov 14 15:03:22 IST 2018
12460 root      20   0   13.8g 413728   3688 S   5.0  5.2   1:22.38 glusterfsd
10673 root      20   0  607108  37372   4368 S   0.0  0.5   0:17.12 glusterd
18984 root      20   0   11.1g 268428   7472 S   0.0  3.4   0:17.71 glusterfs
##################
Wed Nov 14 16:03:22 IST 2018
10673 root      20   0  607108  37372   4368 S   4.8  0.5   0:21.99 glusterd
12460 root      20   0   13.8g 419908   3688 S   0.0  5.2   4:27.84 glusterfsd
18984 root      20   0   11.1g 268792   7472 S   0.0  3.4   1:10.48 glusterfs
##################
[root@dhcp35-140 ~]# date; top -n 1 -b|grep gluster; echo "##################";sleep 3600;date; top -n 1 -b|grep gluster; echo "##################"
Wed Nov 14 18:23:03 IST 2018
10673 root      20   0  607108  37372   4368 S   0.0  0.5   0:33.89 glusterd
12460 root      20   0   13.8g 421536   3688 S   0.0  5.3  11:42.12 glusterfsd
18984 root      20   0   11.1g 268792   7472 S   0.0  3.4   3:13.39 glusterfs
##################


So there is still a leak of anywhere between 1-5 MB/hr.
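
The rate is just the RES delta between two top samples (reported in KiB) divided by the elapsed hours; a hypothetical helper, fed the 15:03 and 16:03 samples above:

# (419908 - 413728) KiB over 1 hr  ->  ~6.0 MB/hr for that interval
awk -v r1=413728 -v r2=419908 -v hrs=1 \
    'BEGIN { printf "leak rate: %.1f MB/hr\n", (r2 - r1) / 1024 / hrs }'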

Comment 13 Nag Pavan Chilakam 2018-11-14 16:25:58 UTC
Updating resource details after 3.5 hrs; another ~4 MB leaked.

[root@dhcp35-140 ~]#  top -n 1 -b|grep gluster
10673 root      20   0  607108  37372   4368 S   0.0  0.5   0:51.77 glusterd
12460 root      20   0   13.8g 425616   3688 S   0.0  5.3  22:44.26 glusterfsd
18984 root      20   0   11.1g 268792   7472 S   0.0  3.4   6:21.55 glusterfs
[root@dhcp35-140 ~]# date
Wed Nov 14 21:53:24 IST 2018

Comment 14 Nag Pavan Chilakam 2018-11-15 05:42:12 UTC
Seems like the leak subsides over time.
(Note: this is on 3.12.2-27, but the above observations on 3.12.2-24 were similar for 3.12.2-27.)

Checked over a 12 hr difference; below is the o/p.

At 11:08 PM Nov 14:
21511 root      20   0   13.9g 343336   3812 S   4.8  4.3   7:39.76 glusterfsd
21311 root      20   0  838484  52120   4400 S   0.0  0.7   0:26.25 glusterd
21501 root      20   0   20.7g 288988   7636 S   0.0  3.6   5:49.55 glusterfs

At 11:08 AM Nov 15:
[root@dhcp35-140 ~]#  top -n 1 -b|grep gluster
21311 root      20   0  838484  52120   4400 S   0.0  0.7   1:36.10 glusterd
21501 root      20   0   20.7g 288992   7636 S   0.0  3.6  55:21.63 glusterfs
21511 root      20   0   13.9g 347336   3812 S   0.0  4.3 113:30.57 glusterfsd

An increase of ~4 MB over the 12 hrs (RES: 343336 KiB -> 347336 KiB), i.e. about 0.33 MB/hr.

Comment 22 Mohit Agrawal 2019-11-19 03:47:06 UTC
In 3.5.0 we fixed the io-thread detach leak issue.
Can we try to reproduce this on the latest 3.5.0 release?

Thanks,
Mohit Agrawal

Comment 24 Nag Pavan Chilakam 2019-12-02 06:41:54 UTC
(In reply to Mohit Agrawal from comment #22)
> In 3.5.0 we fixed the io-thread detach leak issue.
> Can we try to reproduce this on the latest 3.5.0 release?
> 
> Thanks,
> Mohit Agrawal

Removing the needinfo, as QE had requested dev to move the bug to ON_QA if it is fixed, since we would need to test it formally.

Comment 25 Nag Pavan Chilakam 2019-12-02 06:42:30 UTC
Refer to the comments above.

Comment 42 errata-xmlrpc 2020-12-17 04:50:16 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (glusterfs bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5603