+++ This bug was initially created as a clone of Bug #1395687 +++
+++ This bug was initially created as a clone of Bug #1393709 +++

Description of problem:
=======================
On my systemic setup I am seeing a lot of iobuf leaks in the statedump, even without much load on the client.

I have set up a systemic testbed with a 4x2 volume spanning 4 nodes. I have enabled the features below; see the vol info:

Volume Name: drvol
Type: Distributed-Replicate
Volume ID: 2f0e5510-fe47-4ce8-906e-6ddc7f9334ca
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: 10.70.35.191:/rhs/brick1/drvol
Brick2: 10.70.37.108:/rhs/brick1/drvol
Brick3: 10.70.35.3:/rhs/brick1/drvol
Brick4: 10.70.37.66:/rhs/brick1/drvol
Brick5: 10.70.35.191:/rhs/brick2/drvol
Brick6: 10.70.37.108:/rhs/brick2/drvol
Brick7: 10.70.35.3:/rhs/brick2/drvol
Brick8: 10.70.37.66:/rhs/brick2/drvol
Options Reconfigured:
cluster.use-compound-fops: on
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
features.uss: enable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
features.barrier: disable
cluster.shd-max-threads: 16
performance.md-cache-timeout: 600
performance.cache-invalidation: true
features.cache-invalidation-timeout: 300
features.cache-invalidation: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

I then mounted the volume on 10 different clients and ran the following IO:

From all clients: started taking a statedump of the fuse mount process every 5 minutes and moving the dumps into a dedicated directory for each host on the mount point (so into the gluster volume).

From all clients: collected top and CPU usage every 2 minutes and appended the output to a per-host file on the mount point (so into the gluster volume).

Even clients with 16 GB of RAM consumed almost all of their memory by just doing the above two actions for 1.5 days.

Version-Release number of selected component (if applicable):
[root@rhs-client23 gluster]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.2 (Maipo)
[root@rhs-client23 gluster]# rpm -qa|grep gluster
glusterfs-libs-3.8.4-3.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-3.el7rhgs.x86_64
glusterfs-api-3.8.4-3.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-3.el7rhgs.x86_64
glusterfs-3.8.4-3.el7rhgs.x86_64
glusterfs-fuse-3.8.4-3.el7rhgs.x86_64
glusterfs-cli-3.8.4-3.el7rhgs.x86_64
[root@rhs-client23 gluster]#

Statedumps attached
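The original client-side scripts are not attached; a minimal sketch of the workload described above might look like the following (the mount point, directory layout, PID lookup, and the exact statedump cadence are assumptions, not the scripts actually used):

    #!/bin/bash
    # Rough sketch of the per-client workload: append top/free output to a
    # per-host file on the gluster mount every 2 minutes, and periodically
    # trigger a client statedump and archive it on the mount.
    MNT=/mnt/drvol
    HOSTDIR=$MNT/$(hostname)
    mkdir -p "$HOSTDIR"
    PID=$(pgrep -f "glusterfs.*$MNT" | head -1)

    i=0
    while true; do
        # append resource usage into the volume (this append pattern is
        # what later turned out to grow client memory)
        { date; top -b -n 1 | head -20; free -h; } >> "$HOSTDIR/usage.log"

        if [ $((i % 3)) -eq 0 ]; then
            kill -USR1 "$PID"    # ask the fuse client to write a statedump
            sleep 2              # give it a moment to finish writing
            mv /var/run/gluster/glusterdump.$PID.dump.* "$HOSTDIR/" 2>/dev/null
        fi
        i=$((i + 1))
        sleep 120                # 2-minute sampling interval
    done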
--- Additional comment from nchilaka on 2016-11-10 05:22:17 EST ---

From further tests it seems, in all likelihood, that this is due to appends. I just appended the output of top and free -h to a file on the mount every 2 minutes. With every loop the memory consumption increased by about 5 MB.

I took statedumps at the intervals below:

[root@rhs-client45 gluster]# date;free -h;date ;kill -USR1 16078;date
Thu Nov 10 15:39:39 IST 2016
              total        used        free      shared  buff/cache   available
Mem:            15G        504M         11G         24M        3.2G         14G
Swap:          7.9G         51M        7.8G
Thu Nov 10 15:39:39 IST 2016
Thu Nov 10 15:39:39 IST 2016
[root@rhs-client45 gluster]# ls
glusterdump.16078.dump.1478772579
[root@rhs-client45 gluster]# date;free -h;date ;kill -USR1 16078;date
Thu Nov 10 15:41:05 IST 2016
              total        used        free      shared  buff/cache   available
Mem:            15G        509M         11G         24M        3.2G         14G
Swap:          7.9G         51M        7.8G
Thu Nov 10 15:41:05 IST 2016
Thu Nov 10 15:41:05 IST 2016
[root@rhs-client45 gluster]# ll
total 596
-rw-------. 1 root root 301703 Nov 10 15:39 glusterdump.16078.dump.1478772579
-rw-------. 1 root root 305566 Nov 10 15:41 glusterdump.16078.dump.1478772665
[root@rhs-client45 gluster]# date;free -h;date ;kill -USR1 16078;date
Thu Nov 10 15:44:20 IST 2016
              total        used        free      shared  buff/cache   available
Mem:            15G        518M         11G         24M        3.2G         14G
Swap:          7.9G         51M        7.8G
Thu Nov 10 15:44:20 IST 2016
Thu Nov 10 15:44:20 IST 2016
[root@rhs-client45 gluster]#

Attached are new statedumps.

--- Additional comment from nchilaka on 2016-11-10 05:23 EST ---

--- Additional comment from nchilaka on 2016-11-10 05:33:56 EST ---

Just a note: compound FOPs is enabled :)

--- Additional comment from Krutika Dhananjay on 2016-11-11 05:14:31 EST ---

So there is indeed an iobuf leak as per the attached statedump.

To confirm that it is compound fops that is contributing to the leak (and not any of the several other options enabled on the volume), I ran a dd on a plain replicate volume without compound-fops and captured a statedump before and after the dd: there was no increase in active iobufs. Once I enabled compound fops and ran dd again, the number of active iobufs increased from 1 to 6, and after another dd from 6 to 13 (a rough script for this kind of before/after check is sketched below).

I also checked the AFR changes for compound fops and didn't find any memory leaks there. So the only remaining possibility is a leak at the protocol/client layer, which I am investigating through code reading at the moment.

--- Additional comment from Krutika Dhananjay on 2016-11-16 07:54:40 EST ---

Found the leaks. Nice catch, Nag! :)

--- Additional comment from Worker Ant on 2016-11-16 07:59:51 EST ---

REVIEW: http://review.gluster.org/15860 (protocol/client: Fix iobref and iobuf leaks in COMPOUND fop) posted (#1) for review on master by Krutika Dhananjay (kdhananj)
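The before/after statedump comparison described in Krutika's comment can be scripted roughly as follows. This is a sketch under assumptions: the mount path and PID lookup are illustrative, and the "active_cnt" key summed here matches the iobuf arena entries seen in statedumps of this era, but field names can vary between glusterfs versions:

    #!/bin/bash
    # Hypothetical helper: count active iobufs in the newest statedump of a
    # fuse client, before and after a dd, to spot a leak.
    MNT=/mnt/drvol
    PID=$(pgrep -f "glusterfs.*$MNT" | head -1)

    count_active() {
        kill -USR1 "$PID"        # ask the client to write a statedump
        sleep 2                  # give it a moment to finish writing
        dump=$(ls -t /var/run/gluster/glusterdump.$PID.dump.* | head -1)
        # sum the active_cnt values of the iobuf arenas in the dump
        grep 'active_cnt' "$dump" | awk -F= '{s += $2} END {print s+0}'
    }

    echo "active iobufs before: $(count_active)"
    dd if=/dev/zero of=$MNT/testfile bs=1M count=100 conv=fsync
    echo "active iobufs after:  $(count_active)"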
REVIEW: http://review.gluster.org/15861 (protocol/client: Fix iobref and iobuf leaks in COMPOUND fop) posted (#1) for review on release-3.9 by Krutika Dhananjay (kdhananj)
COMMIT: http://review.gluster.org/15861 committed in release-3.9 by Pranith Kumar Karampuri (pkarampu)
------
commit 65f785bb9b484e42eac39af0e468a0a9f46c8e99
Author: Krutika Dhananjay <kdhananj>
Date:   Wed Nov 16 18:26:52 2016 +0530

    protocol/client: Fix iobref and iobuf leaks in COMPOUND fop

    Backport of: http://review.gluster.org/15860

    Change-Id: I5873bd0ae9a2df91876b1c9c9d8afdec9c5151f9
    BUG: 1395694
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: http://review.gluster.org/15861
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
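For readers without access to the review link, the shape of the fix can be sketched. In libglusterfs, iobref_add() takes its own reference on the iobuf, so a caller that obtained the buffer with iobuf_get() must drop its local reference after adding it, and the iobref itself must be unref'd once the request has been submitted. The compound-fop path in protocol/client was missing these unrefs. The following C fragment illustrates the pattern; the iobuf/iobref calls are the real libglusterfs APIs, but the surrounding code is a simplified sketch, not the literal patch:

    /* allocate a buffer for the compound request payload */
    struct iobuf  *iobuf  = iobuf_get (this->ctx->iobuf_pool);
    struct iobref *iobref = iobref_new ();

    iobref_add (iobref, iobuf);   /* iobref takes its own ref on iobuf   */
    iobuf_unref (iobuf);          /* FIX: drop the local ref; without it
                                   * every compound fop leaked one iobuf */

    /* ... fill the buffer and submit the compound request ... */

    iobref_unref (iobref);        /* FIX: release the iobref (and with it
                                   * the iobuf) once the request is done */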
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.9.1, please open a new bug report.

glusterfs-3.9.1 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-January/029725.html
[2] https://www.gluster.org/pipermail/gluster-users/