Created attachment 1337554 [details]
Gluster State Dump
Description of problem:
The glusterfs process on clients that use the FUSE mount consumes as much system memory and swap as it can over time, eventually leading to the process being killed by the OOM killer and the mount dropping.
This occurs after a large amount of data has been transferred over the mount point (both in total size and in file count; I have not been able to rule out one over the other, as this machine regularly does both).
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Mount the gluster volume via the FUSE client.
2. Transfer a lot of data.
3. Watch the memory usage of the glusterfs process increase over time (a sample monitoring loop is sketched below).

Actual results:
Memory usage increases over time, eventually leading to the glusterfs process being killed by the OOM killer and the mount dropping.

Expected results:
The glusterfs process should release the memory it is consuming, avoiding OOM issues.
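As a rough illustration of step 3, a loop along these lines can be used to sample the client's resident memory over time (the mount point and log file are only examples):

#!/bin/bash
# Sample the RSS of the glusterfs FUSE client process every 5 minutes.
# The mount point and log file below are examples; adjust for the local setup.
MOUNT=/mnt/gvAA01
PID=$(pgrep -f "glusterfs.*${MOUNT}" | head -n1)

while kill -0 "$PID" 2>/dev/null; do
    # ps reports RSS in kilobytes.
    echo "$(date +%FT%T) rss_kb=$(ps -o rss= -p "$PID")"
    sleep 300
done >> /var/log/glusterfs-client-rss.log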
Additional info:
The Gluster volume version is 3.10.3.
I have one client on 3.10.3 and one client on 3.11.3; both experience the same issue.
This only occurs on clients which pass a large amount of traffic consistently (100s of GB daily).
These mounts also handle a large number of concurrent connections (up to 50 at a time), which may be playing some part in the issue.

Forgot to mention: this is on Ubuntu 16.04.2 and 16.04.3.
More additional info, based on the guidelines in the Gluster docs:
GlusterFS Cluster Information:
Number of volumes: 1
Volume Names: gvAA01
Volume on which the particular issue is seen [ if applicable ]: gvAA01
Type of volumes: Distributed Replicated
Volume options if available:
Output of gluster volume info:

Volume Name: gvAA01
Type: Distributed-Replicate
Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118
Snapshot Count: 0
Number of Bricks: 5 x (2 + 1) = 15
Bricks:
Brick1: PB-WA-AA-01-B:/brick1/gvAA01/brick
Brick2: PB-WA-AA-02-B:/brick1/gvAA01/brick
Brick3: PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick1 (arbiter)
Brick4: PB-WA-AA-01-B:/brick2/gvAA01/brick
Brick5: PB-WA-AA-02-B:/brick2/gvAA01/brick
Brick6: PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick2 (arbiter)
Brick7: PB-WA-AA-01-B:/brick3/gvAA01/brick
Brick8: PB-WA-AA-02-B:/brick3/gvAA01/brick
Brick9: PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick3 (arbiter)
Brick10: PB-WA-AA-01-B:/brick4/gvAA01/brick
Brick11: PB-WA-AA-02-B:/brick4/gvAA01/brick
Brick12: PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick4 (arbiter)
Brick13: PB-WA-AA-01-B:/brick5/gvAA01/brick
Brick14: PB-WA-AA-02-B:/brick5/gvAA01/brick
Brick15: PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick5 (arbiter)
Output of gluster volume status
root@PB-WA-AA-00-A:/# gluster volume status
Status of volume: gvAA01
Gluster process                                 TCP Port  RDMA Port  Online  Pid
---------------------------------------------------------------------------------
Brick PB-WA-AA-01-B:/brick1/gvAA01/brick        49152     0          Y       10547
Brick PB-WA-AA-02-B:/brick1/gvAA01/brick        49152     0          Y       10380
Brick PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick1  49152     0          Y       16770
Brick PB-WA-AA-01-B:/brick2/gvAA01/brick        49153     0          Y       10554
Brick PB-WA-AA-02-B:/brick2/gvAA01/brick        49153     0          Y       10388
Brick PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick2  49153     0          Y       16789
Brick PB-WA-AA-01-B:/brick3/gvAA01/brick        49154     0          Y       10565
Brick PB-WA-AA-02-B:/brick3/gvAA01/brick        49154     0          Y       10396
Brick PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick3  49154     0          Y       20685
Brick PB-WA-AA-01-B:/brick4/gvAA01/brick        49155     0          Y       10571
Brick PB-WA-AA-02-B:/brick4/gvAA01/brick        49155     0          Y       10404
Brick PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick4  49155     0          Y       14312
Brick PB-WA-AA-01-B:/brick5/gvAA01/brick        49156     0          Y       990
Brick PB-WA-AA-02-B:/brick5/gvAA01/brick        49156     0          Y       14869
Brick PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick5  49156     0          Y       19462
NFS Server on localhost                         2049      0          Y       2950
Self-heal Daemon on localhost                   N/A       N/A        Y       2959
NFS Server on PB-WA-AA-01-B                     2049      0          Y       23815
Self-heal Daemon on PB-WA-AA-01-B               N/A       N/A        Y       23824
NFS Server on PB-WA-AA-02-B                     2049      0          Y       14889
Self-heal Daemon on PB-WA-AA-02-B               N/A       N/A        Y       14898

Task Status of Volume gvAA01
Task : Rebalance
ID : 5930cdcd-bb76-4d32-aeca-c41aea8f832d
Status : in progress
OS Type: Ubuntu Linux
Mount type: gluster FUSE client
OS Version: 16.04.3
Created attachment 1337560 [details]
Mount log file
The only large allocations I see in the statedump are:
[mount/fuse.fuse - usage-type gf_common_mt_circular_buffer_t memusage]
[mount/fuse.fuse - usage-type gf_common_mt_char memusage]
Do you have any more statedumps taken at intervals?
I did have statedumps being collected at regular intervals; however, these appear to have been cleared.
At present, the issue appears to have ceased.
We have also moved some workload from this machine to another, which may have resolved the issue.
The new machine is currently displaying the same behaviour, where it gradually consumes additional memory without releasing it.
I'll begin taking statedumps on this machine (roughly along the lines of the loop sketched below); however, previous attempts at this have not been successful.
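Assuming SIGUSR1 still triggers a statedump (as described in the Gluster docs) and that the dumps land in the default /var/run/gluster directory, something along these lines should collect one per hour; the mount point is only an example:

#!/bin/bash
# Ask the glusterfs FUSE client for a statedump every hour by sending SIGUSR1.
# Dumps are written under /var/run/gluster by default, typically named
# glusterdump.<pid>.dump.<timestamp>; the mount point below is an example.
MOUNT=/mnt/gvAA01
PID=$(pgrep -f "glusterfs.*${MOUNT}" | head -n1)

while kill -0 "$PID" 2>/dev/null; do
    kill -USR1 "$PID"
    sleep 3600
done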
Would you like me to raise a new bug report for the new machine?
Or dump all the info into this bug report?
You can add them to this BZ.
There is a known issue where the FUSE mount process doesn't release inodes, so as you process more files, the size of the inode table grows. However, I would like to rule out other memory leaks.
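One quick way to check whether the growth is just the inode table is to ask the kernel on the client to drop its dentry/inode caches and watch whether the glusterfs RSS changes. This is only a diagnostic sketch (the volume name in the pgrep pattern is an example), and note that even when inodes are freed the allocator may not hand memory back to the OS, so RSS may not fall much:

# Run as root on the client.
PID=$(pgrep -f "glusterfs.*gvAA01" | head -n1)
echo "RSS before (kB): $(ps -o rss= -p "$PID")"
sync
echo 2 > /proc/sys/vm/drop_caches   # 2 = reclaim dentries and inodes (sends FORGETs to the FUSE mount)
echo "RSS after (kB):  $(ps -o rss= -p "$PID")"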
Created attachment 1340591 [details]
glusterfs process statedump
Created attachment 1340592 [details]
glusterfs process statedump
Created attachment 1340593 [details]
glusterfs process statedump
I've added 3 new statedumps for this one.
I do have another one; however, it's 6 GB in size, and I'm pretty certain it's not complete.
It was filling my /run/ partition.
I can truncate it, but I'd like to know whether the most useful info would be at the start or the end of the file.
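In case it helps while that's being answered, one way to keep the upload small without guessing which end matters is to capture both ends of the file (the file name and sizes here are only examples):

# Keep the first and last 50 MB of an oversized statedump.
head -c 50M glusterdump.12345.dump.1510000000 > statedump.head
tail -c 50M glusterdump.12345.dump.1510000000 > statedump.tail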
We are experiencing the same problem. Our cluster is made up of 3 nodes. We created around 160K small files (4K each), then removed them. Our fuse client is still using around half a GB (after a day).
I've got a couple of new statedumps for this one, however they're too large to upload to the bug report (45MB).
Do you guys have somewhere I can send these?
Created attachment 1343753 [details]
statedump [Gluster Client - High Memory Usage]
Gluster Client - High Memory Usage
We created around 160K files using:
`./smallfile/smallfile_cli.py --top /usr/local/gfs/data/mirrored-data/test --threads 16 --file-size 16 --files 10000 --response-times Y`
After deleting them, used memory barely went down.
OS: CentOS 7
Gluster Versions: 3.10.5, 3.12.1, 3.12.2
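For anyone who wants to reproduce the comparison, here is a rough sketch of the create/delete cycle with the client's RSS checked before and after (the paths, the pgrep pattern, and the smallfile checkout location are only examples):

#!/bin/bash
# Compare glusterfs client RSS before and after a smallfile create/delete cycle.
# Paths below are examples; adjust for the local mount and smallfile checkout.
TOP=/usr/local/gfs/data/mirrored-data/test
PID=$(pgrep -f "glusterfs.*mirrored-data" | head -n1)

echo "RSS before (kB): $(ps -o rss= -p "$PID")"

./smallfile/smallfile_cli.py --operation create --top "$TOP" \
    --threads 16 --file-size 16 --files 10000
./smallfile/smallfile_cli.py --operation delete --top "$TOP" \
    --threads 16 --files 10000

echo "RSS after (kB):  $(ps -o rss= -p "$PID")"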
(In reply to Nithya Balachandran from comment #6)
> You can add them to this BZ.
> There is a known issue where the Fuse mount process doesn't release inodes
> so as you process more files, the size of the inode table grows. However, I
> would like to rule out other memory leaks.
The statedumps attached don't show a large number of inodes (either active or inactive) in the itable. The maximum count of inodes in the active and lru lists on the client is less than 50. Hence it is not a case of memory consumption due to the kernel not forgetting inodes.
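For anyone checking their own dumps, these counters sit in the fuse itable section of the client statedump and can be pulled out with a quick grep (the file name is an example, and the exact key names may vary slightly between versions):

# Inode-table counters from a client statedump (file name is an example).
grep -E 'itable\.(active_size|lru_size)' glusterdump.12345.dump.1510000000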
I tried running the smallfile test on various types of EC2 servers (m4.large, m4.xlarge & m4.2xlarge). The total amount of memory on these servers is 8GB, 16GB, and 32GB, respectively. The amount of memory used after writing and reading 1 million files was ~1GB, ~2GB, and ~3GB, respectively.
Then I checked the statedump files for the m4.large and m4.2xlarge. There was one noticeably large difference: under "xlator.mount.fuse.priv", the "iobuf" value for the m4.large was approximately half that of the m4.xlarge, which was using about twice as much memory for the fuse mount.
I'm guessing there is some correlation between the amount of memory used and the total amount of memory on the box. Does anyone know if there is a way to place a limit on this "iobuf"?
Josh Coyle's statedumps also show very large numbers for the "iobuf" value under the "xlator.mount.fuse.priv" section.
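If others want to compare, the values being discussed can be pulled straight out of a client statedump; the file name below is an example, and the key names are simply whatever appears in the dump:

# Show the fuse priv section, then any iobuf-related lines, from a client statedump.
grep -A 25 'xlator.mount.fuse.priv' glusterdump.12345.dump.1510000000
grep -i 'iobuf' glusterdump.12345.dump.1510000000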
This bug is reported against a version of Gluster that is no longer maintained (or has been EOL'd). See https://www.gluster.org/release-schedule/ for the versions currently maintained.
As a result, this bug is being closed.
If the bug persists on a maintained version of Gluster or against the mainline Gluster repository, please request that it be reopened and mark the Version field appropriately.