Description of problem:
A few days after installing a new three-node cluster with glusterfs 4.1.8, the glusterd process on one node was killed by the OOM killer. We restarted it and then saw high memory consumption by the glusterd process on the other two nodes. We updated the version: 4.1.8 -> 4.1.9 -> 5.9 -> 6.5, but the situation did not change after the updates. After comparing this setup with another one, it turned out that the difference was in monitoring: on this new cluster, monitoring runs the 'gluster volume status' command every minute on each node. After disabling monitoring on the first node, we saw that glusterd RSS memory growth slowed down on the other nodes (2, 3). After also stopping monitoring on the second node, RSS growth stopped only on the third node, but slowed down on nodes 1 and 2. After disabling monitoring on all nodes, RSS growth stopped on all nodes.

Version-Release number of selected component (if applicable):
glusterfs-libs-6.5-1.el7.x86_64
glusterfs-api-6.5-1.el7.x86_64
glusterfs-6.5-1.el7.x86_64
glusterfs-fuse-6.5-1.el7.x86_64
glusterfs-cli-6.5-1.el7.x86_64
glusterfs-client-xlators-6.5-1.el7.x86_64
glusterfs-server-6.5-1.el7.x86_64

How reproducible:

Steps to Reproduce:
1. Set up a 3-node cluster with 1 replicate volume
2. Run 'gluster volume status' every minute on one node
3. Watch glusterd memory consumption grow on the other 2 nodes

Actual results:
glusterd memory consumption keeps growing on the other nodes

Expected results:
No increase in memory consumption of the glusterd process

Additional info:
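For reference, a minimal sketch of the reproduction described above; the one-minute interval comes from the report, while the sampling loop on the remote node is an assumption added for illustration:

  # on node 1: emulate the monitoring probe (one 'gluster volume status' per minute)
  while true; do
      gluster volume status > /dev/null
      sleep 60
  done

  # on node 2 or 3: sample glusterd RSS (in KiB) once a minute to observe the growth
  while true; do
      date; ps -o rss= -C glusterd
      sleep 60
  done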
Created attachment 1613884 [details] first statedump
Created attachment 1613885 [details] second statedump of glusterd process
Created attachment 1613886 [details] gluster volume info
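For anyone following along: statedumps like the ones attached can be captured by sending SIGUSR1 to the glusterd process, which writes a dump file under /var/run/gluster by default (the exact path and file name pattern may vary by build):

  # trigger a glusterd statedump, then locate the newest dump file
  kill -SIGUSR1 $(pidof glusterd)
  ls -lt /var/run/gluster/glusterdump.*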
Hi,

I see an increase in memory for the gf_common_mt_txn_opinfo_obj_t structure, a leak which has been fixed in release-6. Can you please check whether your cluster is running at the appropriate op-version? With the above-mentioned fix, the memory leak is greatly reduced.

[mgmt/glusterd.management - usage-type gf_common_mt_txn_opinfo_obj_t memusage]

                  first statedump | second statedump
size                       610288 | 1427664
num_allocs                   5449 | 12747
max_size                   610400 | 1427776
max_num_allocs               5450 | 12748
total_allocs                22184 | 51138

Also, I tried running 'gluster volume status' in a loop 1000 times and I don't see a leak in gf_common_mt_txn_opinfo_obj_t. Please get back with the output of "gluster v get all cluster.op-version".

Thanks,
Sanju
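A sketch of how the memusage counters above can be pulled out of two statedumps for comparison; the dump file names are placeholders:

  # print the txn_opinfo allocation counters from each statedump (file names are placeholders)
  grep -A5 'gf_common_mt_txn_opinfo_obj_t memusage' first.dump
  grep -A5 'gf_common_mt_txn_opinfo_obj_t memusage' second.dump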
Hi, Sanju! Thanks for the answer. Yes, you're right about the op-version. After setting it to 60000, the RSS memory growth stopped.

gluster v get all cluster.op-version

Option                                  Value
------                                  -----
cluster.op-version                      50400
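For completeness, the commands involved in checking and raising the cluster op-version (60000 is the value the reporter set, matching the 6.x release line; bump it only after all nodes have been upgraded):

  # check the current cluster op-version
  gluster volume get all cluster.op-version

  # raise it so the release-6 fix for gf_common_mt_txn_opinfo_obj_t takes effect
  gluster volume set all cluster.op-version 60000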