Bug 1751014 - Memory leak when 'gluster volume status' is run frequently
Summary: Memory leak when 'gluster volume status' is run frequently
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 6
Hardware: x86_64
OS: Linux
Target Milestone: ---
Assignee: Sanju
QA Contact:
Depends On:
Reported: 2019-09-11 03:32 UTC by padner
Modified: 2019-09-16 03:31 UTC
4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2019-09-16 03:31:21 UTC
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:

Attachments (Terms of Use)
first statedump (14.80 KB, text/plain), 2019-09-11 03:34 UTC, padner
second statedump of glusterd process (14.81 KB, text/plain), 2019-09-11 03:34 UTC, padner
gluster volume info (579 bytes, text/plain), 2019-09-11 03:35 UTC, padner

Description padner 2019-09-11 03:32:27 UTC
Description of problem:
A few days after installing a new three-node cluster with glusterfs 4.1.8, the glusterd process on one node was killed by the OOM killer. We restarted it and then noticed high memory consumption by the glusterd process on the other two nodes. We upgraded through 4.1.8 -> 4.1.9 -> 5.9 -> 6.5, but the situation did not change. Comparing this setup with another one, the difference turned out to be monitoring: on this new cluster, monitoring runs the 'gluster volume status' command every minute on each node. After disabling monitoring on the first node, we saw that glusterd RSS growth slowed down on the other two nodes. After also disabling monitoring on the second node, RSS growth stopped on the third node and slowed further on the first two. With monitoring disabled on all nodes, RSS growth stopped on all nodes.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Set up a 3-node cluster with one replicated volume
2. Run 'gluster volume status' every minute on one node
3. Observe glusterd memory consumption growing on the other two nodes
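The steps above can be sketched as a simple polling loop (a sketch of the reported monitoring behaviour; the one-minute interval comes from the description, and the RSS check is one common way to watch the process, not something specified in the report):

```shell
#!/bin/sh
# Poll volume status once a minute from one node, mimicking the
# monitoring that triggers the leak described in this bug.
while true; do
    gluster volume status
    # On the other nodes, glusterd resident memory can be tracked with e.g.:
    #   ps -o rss= -C glusterd
    sleep 60
done
```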

Actual results:

Expected results:
no increase in memory consumption of glusterd process

Additional info:

Comment 1 padner 2019-09-11 03:34:07 UTC
Created attachment 1613884 [details]
first statedump

Comment 2 padner 2019-09-11 03:34:58 UTC
Created attachment 1613885 [details]
second statedump of glusterd process

Comment 3 padner 2019-09-11 03:35:22 UTC
Created attachment 1613886 [details]
gluster volume info

Comment 4 Sanju 2019-09-12 08:39:38 UTC

I see an increase in memory usage of the gf_common_mt_txn_opinfo_obj_t structure, a leak which has been fixed in release-6. Can you please check whether your cluster is running at the appropriate op-version? With the above-mentioned fix, the memory leak is greatly reduced.
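For reference, the statedumps being compared here can be generated by signalling the running glusterd and then inspecting the dump file (SIGUSR1 and the /var/run/gluster output directory are the usual gluster defaults; verify them for your build):

```shell
# Trigger a statedump of the running glusterd process; by default the
# dump file is written under /var/run/gluster.
kill -SIGUSR1 "$(pidof glusterd)"
sleep 1

# Pull out the txn_opinfo allocation counters discussed in this comment.
grep -A 5 'gf_common_mt_txn_opinfo_obj_t' /var/run/gluster/glusterdump.*
```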

[mgmt/glusterd.management - usage-type gf_common_mt_txn_opinfo_obj_t memusage]
                  first statedump | second statedump
size:                      610288 | 1427664
num_allocs:                  5449 | 12747
max_size:                  610400 | 1427776
max_num_allocs:              5450 | 12748
total_allocs:               22184 | 51138

Also, I tried running 'gluster volume status' in a loop 1000 times and I don't see a leak in gf_common_mt_txn_opinfo_obj_t. Please get back with the output of "gluster v get all cluster.op-version".


Comment 5 padner 2019-09-16 01:30:02 UTC
Hi, Sanju!

Thanks for the answer. Yes, you were right about the op-version. After setting it to 60000, RSS memory growth stopped.

gluster v get all cluster.op-version
Option                                  Value                                   
------                                  -----                                   
cluster.op-version                      50400
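For anyone hitting the same symptom: the op-version can be checked and, once all nodes run the new release, raised with the standard gluster commands (60000 is the value used in this report; confirm the correct op-version for your installed release before setting it):

```shell
# Check the current cluster op-version
gluster volume get all cluster.op-version

# Raise it so the release-6 fixes (including the txn_opinfo leak fix)
# actually take effect cluster-wide
gluster volume set all cluster.op-version 60000
```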
