Bug 1393709 - [Compound FOPs] Client-side iobuf leaks at a high pace consume complete client memory, making the gluster volume inaccessible
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: core
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Krutika Dhananjay
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard: compund-fop
Depends On:
Blocks: 1351528 1395687 1395694
 
Reported: 2016-11-10 08:19 UTC by Nag Pavan Chilakam
Modified: 2017-03-23 06:18 UTC (History)
5 users

Fixed In Version: glusterfs-3.8.4-6
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1395687 (view as bug list)
Environment:
Last Closed: 2017-03-23 06:18:07 UTC
Embargoed:


Attachments
statedumps (3.56 MB, application/x-gzip)
2016-11-10 08:22 UTC, Nag Pavan Chilakam
no flags
new statedumps with append only (77.49 KB, application/x-gzip)
2016-11-10 10:23 UTC, Nag Pavan Chilakam
no flags


Links
System ID: Red Hat Product Errata RHSA-2017:0486
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update
Last Updated: 2017-03-23 09:18:45 UTC

Description Nag Pavan Chilakam 2016-11-10 08:19:56 UTC
Description of problem:
=======================
On my systemic setup I am seeing a lot of iobuf leaks in the statedump, even without much load on the client.

I have setup a systemic testbed, where I have a 4x2 volume spanning 4 nodes.
I have enabled the features below; see the vol info:
Volume Name: drvol
Type: Distributed-Replicate
Volume ID: 2f0e5510-fe47-4ce8-906e-6ddc7f9334ca
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: 10.70.35.191:/rhs/brick1/drvol
Brick2: 10.70.37.108:/rhs/brick1/drvol
Brick3: 10.70.35.3:/rhs/brick1/drvol
Brick4: 10.70.37.66:/rhs/brick1/drvol
Brick5: 10.70.35.191:/rhs/brick2/drvol
Brick6: 10.70.37.108:/rhs/brick2/drvol
Brick7: 10.70.35.3:/rhs/brick2/drvol
Brick8: 10.70.37.66:/rhs/brick2/drvol
Options Reconfigured:
cluster.use-compound-fops: on
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
features.uss: enable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
features.barrier: disable
cluster.shd-max-threads: 16
performance.md-cache-timeout: 600
performance.cache-invalidation: true
features.cache-invalidation-timeout: 300
features.cache-invalidation: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
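
For reference, the compound-FOPs option can be toggled with the standard volume-set command; a minimal example using the volume name above (illustrative, not a command taken from this setup's history):

gluster volume set drvol cluster.use-compound-fops on
gluster volume set drvol cluster.use-compound-fops off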


I then mounted the volume on 10 different clients and ran the following IO:
From all clients: started taking a statedump of the fuse mount process every 5 minutes and moving the dumps to a dedicated per-host directory on the mount point (i.e. into the gluster volume).
From all clients: collected top and CPU usage every 2 minutes and appended the output to a per-host file on the mount point (i.e. into the gluster volume).

I see that even clients with 16 GB of RAM have consumed almost all of their memory by just doing the above two actions for 1.5 days.
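
Roughly, the two client-side loops looked like the sketch below (the mount path, dump directory, intervals and the pidof lookup are illustrative assumptions, not the exact scripts used):

MOUNT=/mnt/drvol
HOSTDIR="$MOUNT/sysinfo/$(hostname)"
mkdir -p "$HOSTDIR"

# Loop 1: statedump of the fuse mount process every 5 minutes, moved onto the volume
while true; do
    kill -USR1 "$(pidof glusterfs)"                    # trigger a client statedump
    sleep 10                                           # give the dump time to be written
    mv /var/run/gluster/glusterdump.*.dump.* "$HOSTDIR"/ 2>/dev/null
    sleep 290
done &

# Loop 2: append top output and CPU/memory usage to a per-host file every 2 minutes
while true; do
    { date; top -b -n 1 | head -n 20; free -h; } >> "$HOSTDIR/monitor.log"
    sleep 120
done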

Version-Release number of selected component (if applicable):
[root@rhs-client23 gluster]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.2 (Maipo)
[root@rhs-client23 gluster]# rpm -qa|grep gluster
glusterfs-libs-3.8.4-3.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-3.el7rhgs.x86_64
glusterfs-api-3.8.4-3.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-3.el7rhgs.x86_64
glusterfs-3.8.4-3.el7rhgs.x86_64
glusterfs-fuse-3.8.4-3.el7rhgs.x86_64
glusterfs-cli-3.8.4-3.el7rhgs.x86_64
[root@rhs-client23 gluster]# 


Statedumps attached

Comment 2 Nag Pavan Chilakam 2016-11-10 08:22:50 UTC
Created attachment 1219242 [details]
statedumps

Comment 3 Nag Pavan Chilakam 2016-11-10 10:22:17 UTC
From further tests, it seems in all likelihood that it is due to appends.

I just appended the output of the top and free -h commands to a file on the mount every 2 minutes.
With every loop the memory consumption seemed to increase by about 5 MB.

I have taken statedumps in below intervals
[root@rhs-client45 gluster]# date;free -h;date ;kill -USR1 16078;date
Thu Nov 10 15:39:39 IST 2016
              total        used        free      shared  buff/cache   available
Mem:            15G        504M         11G         24M        3.2G         14G
Swap:          7.9G         51M        7.8G
Thu Nov 10 15:39:39 IST 2016
Thu Nov 10 15:39:39 IST 2016
[root@rhs-client45 gluster]# ls
glusterdump.16078.dump.1478772579
[root@rhs-client45 gluster]# date;free -h;date ;kill -USR1 16078;date
Thu Nov 10 15:41:05 IST 2016
              total        used        free      shared  buff/cache   available
Mem:            15G        509M         11G         24M        3.2G         14G
Swap:          7.9G         51M        7.8G
Thu Nov 10 15:41:05 IST 2016
Thu Nov 10 15:41:05 IST 2016
[root@rhs-client45 gluster]# ll
total 596
-rw-------. 1 root root 301703 Nov 10 15:39 glusterdump.16078.dump.1478772579
-rw-------. 1 root root 305566 Nov 10 15:41 glusterdump.16078.dump.1478772665
[root@rhs-client45 gluster]# ll
total 596
-rw-------. 1 root root 301703 Nov 10 15:39 glusterdump.16078.dump.1478772579
-rw-------. 1 root root 305566 Nov 10 15:41 glusterdump.16078.dump.1478772665
[root@rhs-client45 gluster]# 
[root@rhs-client45 gluster]# ll
total 596
-rw-------. 1 root root 301703 Nov 10 15:39 glusterdump.16078.dump.1478772579
-rw-------. 1 root root 305566 Nov 10 15:41 glusterdump.16078.dump.1478772665
[root@rhs-client45 gluster]# date;free -h;date ;kill -USR1 16078;date
Thu Nov 10 15:44:20 IST 2016
              total        used        free      shared  buff/cache   available
Mem:            15G        518M         11G         24M        3.2G         14G
Swap:          7.9G         51M        7.8G
Thu Nov 10 15:44:20 IST 2016
Thu Nov 10 15:44:20 IST 2016
[root@rhs-client45 gluster]# 


Attached are new statedumps

Comment 4 Nag Pavan Chilakam 2016-11-10 10:23:00 UTC
Created attachment 1219301 [details]
new statedumps with append only

Comment 5 Nag Pavan Chilakam 2016-11-10 10:33:56 UTC
Just a Note: Compound FOPs is enabled :)

Comment 6 Krutika Dhananjay 2016-11-11 10:14:31 UTC
So there is indeed an iobuf leak, as per the attached statedump.

Just to confirm that it is indeed compound fops that is contributing to the leak (and not any of the several other options enabled on the volume), I did a dd on a plain replicate volume without compound-fops and captured a statedump before and after the dd: there was no increase in active iobufs. Once I enabled compound fops and did a dd again, the number of active iobufs increased from 1 to 6, and after another dd, from 6 to 13.
I also checked the AFR changes for compound fops and didn't find any memory leaks there. So the only possibility is leaks at the protocol/client layer, which I am investigating through code reading at the moment.
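
For reference, the before/after comparison can be reproduced roughly as below (the mount path, pid lookup, dump directory and the 'active_iobuf' key pattern are illustrative assumptions about the client statedump layout, not the exact commands used here):

PID=$(pidof glusterfs)                # fuse client process; assumes a single mount on the box
kill -USR1 "$PID"; sleep 2            # statedump before the dd
dd if=/dev/zero of=/mnt/testvol/ddfile bs=1M count=100 conv=fsync
kill -USR1 "$PID"; sleep 2            # statedump after the dd
# count active iobuf entries in each dump (dump directory assumed to be /var/run/gluster)
for d in /var/run/gluster/glusterdump."$PID".dump.*; do
    echo "$d: $(grep -c active_iobuf "$d") active iobuf entries"
done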

Comment 7 Krutika Dhananjay 2016-11-16 12:54:40 UTC
Found the leaks. Nice catch, Nag! :)

Comment 8 Krutika Dhananjay 2016-11-16 13:01:45 UTC
The patch on upstream master is out for review - http://review.gluster.org/#/c/15860/
Moving this bug to POST state.

Comment 9 Krutika Dhananjay 2016-11-17 06:49:35 UTC
https://code.engineering.redhat.com/gerrit/#/c/90421/ <--- patch posted

Waiting on QE and PM ack before it can be merged downstream.

Comment 14 Nag Pavan Chilakam 2016-12-05 09:47:23 UTC
QA validation:
Raised https://bugzilla.redhat.com/show_bug.cgi?id=1401380 during ON_QA verification.

Comment 15 Nag Pavan Chilakam 2016-12-16 10:17:12 UTC
I have run the same case in the same environment.
I did see fuse memory leaks, but they have nothing to do with compound FOPs, as they were seen even without compound FOPs enabled.
Also, I showed Krutika the statedumps and they didn't have any pointers to compound-FOP buffers; hence moving to verified.


Tested on 3.8.4-8.

Comment 17 errata-xmlrpc 2017-03-23 06:18:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

