Bug 1751014

Summary: Memory leak when 'gluster volume status' is run frequently
Product: [Community] GlusterFS
Reporter: padner <padner2002>
Component: glusterd
Assignee: Sanju <srakonde>
Status: CLOSED NOTABUG
Severity: unspecified
Priority: unspecified
Version: 6
CC: amukherj, bugs, nchilaka, srakonde
Hardware: x86_64
OS: Linux
Last Closed: 2019-09-16 03:31:21 UTC
Type: Bug
Attachments:
  first statedump
  second statedump of glusterd process
  gluster volume info

Description padner 2019-09-11 03:32:27 UTC
Description of problem:
A few days after installing a new three-node cluster with glusterfs 4.1.8, the glusterd process on one node was killed by the OOM killer. We restarted it and then saw high memory consumption by the glusterd process on the other two nodes. We updated the version, 4.1.8 -> 4.1.9 -> 5.9 -> 6.5, but the situation did not change. After comparing this setup with another one, it turned out that the difference was in monitoring: on this new cluster, monitoring runs the 'gluster volume status' command every minute on each node. After disabling monitoring on the first node, glusterd RSS growth slowed down on the other two nodes (2, 3). After also stopping monitoring on the second node, RSS growth stopped only on the third node, while on nodes 1 and 2 it slowed down. After disabling monitoring on all nodes, RSS growth stopped on all nodes.
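For reference, a minimal sketch of how the RSS growth can be observed (assuming a single glusterd process per node):

  # Log glusterd RSS (in KiB) once a minute; the leak shows up as a
  # steadily increasing column in the log.
  while true; do
      echo "$(date -u +%FT%TZ) $(ps -o rss= -p "$(pidof glusterd)")"
      sleep 60
  done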

Version-Release number of selected component (if applicable):
glusterfs-libs-6.5-1.el7.x86_64
glusterfs-api-6.5-1.el7.x86_64
glusterfs-6.5-1.el7.x86_64
glusterfs-fuse-6.5-1.el7.x86_64
glusterfs-cli-6.5-1.el7.x86_64
glusterfs-client-xlators-6.5-1.el7.x86_64
glusterfs-server-6.5-1.el7.x86_64


How reproducible:


Steps to Reproduce:
1. Set up a 3-node cluster with one replica volume
2. Run 'gluster volume status' every minute on one node (see the sketch below)
3. Watch glusterd memory consumption grow on the other two nodes
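
A minimal sketch of step 2 as a shell loop (run on one node only):

  # Mimic the monitoring: query volume status once a minute.
  while true; do
      gluster volume status > /dev/null
      sleep 60
  done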

Actual results:
Continuous growth of glusterd RSS on the other nodes; eventually glusterd is killed by the OOM killer.

Expected results:
No increase in the memory consumption of the glusterd process.

Additional info:

Comment 1 padner 2019-09-11 03:34:07 UTC
Created attachment 1613884 [details]
first statedump

Comment 2 padner 2019-09-11 03:34:58 UTC
Created attachment 1613885 [details]
second statedump of glusterd process
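
For anyone following along, a sketch of how such glusterd statedumps can be generated (SIGUSR1 asks a gluster process to dump its state; the default output directory is assumed to be /var/run/gluster):

  # Trigger a statedump of the running glusterd process.
  kill -SIGUSR1 "$(pidof glusterd)"
  # The dump appears as glusterdump.<pid>.dump.<timestamp>.
  ls -lt /var/run/gluster | head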

Comment 3 padner 2019-09-11 03:35:22 UTC
Created attachment 1613886 [details]
gluster volume info

Comment 4 Sanju 2019-09-12 08:39:38 UTC
Hi,

I see an increase in memory usage of the gf_common_mt_txn_opinfo_obj_t structure; that leak has been fixed in release-6. Can you please check whether your cluster is running at the appropriate op-version? With the above-mentioned fix, the memory leak is greatly reduced.

[mgmt/glusterd.management - usage-type gf_common_mt_txn_opinfo_obj_t memusage]

                    first statedump    second statedump
  size                     610288             1427664
  num_allocs                 5449               12747
  max_size                 610400             1427776
  max_num_allocs             5450               12748
  total_allocs              22184               51138

Also, I tried running gluster volume status in a loop 1000 times and I don't see a leak in gf_common_mt_txn_opinfo_obj_t. Please get back with the output of "gluster v get all cluster.op-version".

Thanks,
Sanju
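
A sketch of the two checks suggested above, combined:

  # 1. Report the cluster-wide op-version.
  gluster volume get all cluster.op-version

  # 2. Exercise 'gluster volume status' 1000 times while watching
  #    glusterd RSS on the peers.
  for i in $(seq 1000); do
      gluster volume status > /dev/null
  done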

Comment 5 padner 2019-09-16 01:30:02 UTC
Hi, Sanju!

Thanks for the answer. Yes, you're right about the op-version. After setting it to 60000, the RSS memory growth stopped.

gluster v get all cluster.op-version
Option                                  Value                                   
------                                  -----                                   
cluster.op-version                      50400
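
For completeness, a sketch of the op-version bump that stopped the growth (assuming every peer already runs glusterfs 6.x, since the op-version cannot exceed what the oldest peer supports):

  # Raise the cluster-wide op-version so the release-6 txn_opinfo fix
  # is actually used.
  gluster volume set all cluster.op-version 60000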