Created attachment 1337554 [details]
Gluster State Dump
Description of problem:
The glusterfs process on clients that use the FUSE mount consumes as much system memory and swap as it can over time, eventually leading to the process being killed by the OOM killer and the mount dropping.
This occurs after a large amount of data has been transferred over the mount point (both in total size and in file count; I have not been able to rule out one over the other, as this machine regularly does both).
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Mount the gluster volume via the FUSE client.
2. Transfer a lot of data.
3. Watch the memory usage of the glusterfs process increase over time (a sample monitoring loop is sketched below).

Actual results:
Memory usage increases over time, eventually leading to the glusterfs process being killed by the OOM killer and the mount dropping.

Expected results:
The glusterfs process should release the memory it is consuming, avoiding OOM issues.
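As a rough illustration of step 3, a loop along these lines can be used to sample the client's resident memory over time (the mount point and log file are only examples):

#!/bin/bash
# Sample the RSS of the glusterfs FUSE client process every 5 minutes.
# The mount point and log file below are examples; adjust for the local setup.
MOUNT=/mnt/gvAA01
PID=$(pgrep -f "glusterfs.*${MOUNT}" | head -n1)

while kill -0 "$PID" 2>/dev/null; do
    # ps reports RSS in kilobytes.
    echo "$(date +%FT%T) rss_kb=$(ps -o rss= -p "$PID")"
    sleep 300
done >> /var/log/glusterfs-client-rss.log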
Additional info:
The Gluster volume version is 3.10.3.
I have one client on 3.10.3 and one client on 3.11.3; both experience the same issue.
This only occurs on clients which pass a large amount of traffic consistently (100s of GB daily).
These mounts also handle a large number of concurrent connections (up to 50 at a time), which may be playing some part in the issue.

Forgot to mention: this is on Ubuntu 16.04.2 and 16.04.3.
More additional info, based on the guidelines in the Gluster docs:
GlusterFS Cluster Information:
Number of volumes: 1
Volume Names: gvAA01
Volume on which the particular issue is seen [ if applicable ]: gvAA01
Type of volumes: Distributed Replicated
Volume options if available:
Output of gluster volume info:

Volume Name: gvAA01
Type: Distributed-Replicate
Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118
Snapshot Count: 0
Number of Bricks: 5 x (2 + 1) = 15
Bricks:
Brick1: PB-WA-AA-01-B:/brick1/gvAA01/brick
Brick2: PB-WA-AA-02-B:/brick1/gvAA01/brick
Brick3: PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick1 (arbiter)
Brick4: PB-WA-AA-01-B:/brick2/gvAA01/brick
Brick5: PB-WA-AA-02-B:/brick2/gvAA01/brick
Brick6: PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick2 (arbiter)
Brick7: PB-WA-AA-01-B:/brick3/gvAA01/brick
Brick8: PB-WA-AA-02-B:/brick3/gvAA01/brick
Brick9: PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick3 (arbiter)
Brick10: PB-WA-AA-01-B:/brick4/gvAA01/brick
Brick11: PB-WA-AA-02-B:/brick4/gvAA01/brick
Brick12: PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick4 (arbiter)
Brick13: PB-WA-AA-01-B:/brick5/gvAA01/brick
Brick14: PB-WA-AA-02-B:/brick5/gvAA01/brick
Brick15: PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick5 (arbiter)
Output of gluster volume status
root@PB-WA-AA-00-A:/# gluster volume status
Status of volume: gvAA01
Gluster process                                 TCP Port  RDMA Port  Online  Pid
---------------------------------------------------------------------------------
Brick PB-WA-AA-01-B:/brick1/gvAA01/brick        49152     0          Y       10547
Brick PB-WA-AA-02-B:/brick1/gvAA01/brick        49152     0          Y       10380
Brick PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick1  49152     0          Y       16770
Brick PB-WA-AA-01-B:/brick2/gvAA01/brick        49153     0          Y       10554
Brick PB-WA-AA-02-B:/brick2/gvAA01/brick        49153     0          Y       10388
Brick PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick2  49153     0          Y       16789
Brick PB-WA-AA-01-B:/brick3/gvAA01/brick        49154     0          Y       10565
Brick PB-WA-AA-02-B:/brick3/gvAA01/brick        49154     0          Y       10396
Brick PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick3  49154     0          Y       20685
Brick PB-WA-AA-01-B:/brick4/gvAA01/brick        49155     0          Y       10571
Brick PB-WA-AA-02-B:/brick4/gvAA01/brick        49155     0          Y       10404
Brick PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick4  49155     0          Y       14312
Brick PB-WA-AA-01-B:/brick5/gvAA01/brick        49156     0          Y       990
Brick PB-WA-AA-02-B:/brick5/gvAA01/brick        49156     0          Y       14869
Brick PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick5  49156     0          Y       19462
NFS Server on localhost                         2049      0          Y       2950
Self-heal Daemon on localhost                   N/A       N/A        Y       2959
NFS Server on PB-WA-AA-01-B                     2049      0          Y       23815
Self-heal Daemon on PB-WA-AA-01-B               N/A       N/A        Y       23824
NFS Server on PB-WA-AA-02-B                     2049      0          Y       14889
Self-heal Daemon on PB-WA-AA-02-B               N/A       N/A        Y       14898

Task Status of Volume gvAA01
Task : Rebalance
ID : 5930cdcd-bb76-4d32-aeca-c41aea8f832d
Status : in progress
OS Type: Ubuntu Linux
Mount type: gluster FUSE client
OS Version: 16.04.3
Created attachment 1337560 [details]
Mount log file
The only large allocations I see in the statedump are:
[mount/fuse.fuse - usage-type gf_common_mt_circular_buffer_t memusage]
[mount/fuse.fuse - usage-type gf_common_mt_char memusage]
Do you have any more statedumps taken at intervals?
I did have statedumps being collected at regular intervals; however, these appear to have been cleared.
At present, the issue appears to have ceased.
We have also moved some workload from this machine to another, which may have resolved the issue.
The new machine is currently displaying the same behaviour, where it gradually consumes additional memory without releasing it.
I'll begin taking statedumps on this machine (roughly along the lines of the loop sketched below); however, previous attempts at this have not been successful.
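Assuming SIGUSR1 still triggers a statedump (as described in the Gluster docs) and that the dumps land in the default /var/run/gluster directory, something along these lines should collect one per hour; the mount point is only an example:

#!/bin/bash
# Ask the glusterfs FUSE client for a statedump every hour by sending SIGUSR1.
# Dumps are written under /var/run/gluster by default, typically named
# glusterdump.<pid>.dump.<timestamp>; the mount point below is an example.
MOUNT=/mnt/gvAA01
PID=$(pgrep -f "glusterfs.*${MOUNT}" | head -n1)

while kill -0 "$PID" 2>/dev/null; do
    kill -USR1 "$PID"
    sleep 3600
done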
Would you like me to raise a new bug report for the new machine?
Or dump all the info into this bug report?
You can add them to this BZ.
There is a known issue where the FUSE mount process doesn't release inodes, so as you process more files, the size of the inode table grows. However, I would like to rule out other memory leaks.
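One quick way to check whether the growth is just the inode table is to ask the kernel on the client to drop its dentry/inode caches and watch whether the glusterfs RSS changes. This is only a diagnostic sketch (the volume name in the pgrep pattern is an example), and note that even when inodes are freed the allocator may not hand memory back to the OS, so RSS may not fall much:

# Run as root on the client.
PID=$(pgrep -f "glusterfs.*gvAA01" | head -n1)
echo "RSS before (kB): $(ps -o rss= -p "$PID")"
sync
echo 2 > /proc/sys/vm/drop_caches   # 2 = reclaim dentries and inodes (sends FORGETs to the FUSE mount)
echo "RSS after (kB):  $(ps -o rss= -p "$PID")"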
Created attachment 1340591 [details]
glusterfs process statedump
Created attachment 1340592 [details]
glusterfs process statedump
Created attachment 1340593 [details]
glusterfs process statedump
I've added 3 new statedumps for this one.
I do have another one; however, it's 6 GB in size, and I'm pretty certain it's not complete.
It was filling my /run/ partition.
I can truncate it, but I'd like to know whether the most useful info would be at the start or the end of the file.
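In case it helps while that's being answered, one way to keep the upload small without guessing which end matters is to capture both ends of the file (the file name and sizes here are only examples):

# Keep the first and last 50 MB of an oversized statedump.
head -c 50M glusterdump.12345.dump.1510000000 > statedump.head
tail -c 50M glusterdump.12345.dump.1510000000 > statedump.tail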
We are experiencing the same problem. Our cluster is made up of 3 nodes. We created around 160K small files (4K each), then removed them. Our fuse client is still using around half a GB (after a day).
I've got a couple of new statedumps for this one, however they're too large to upload to the bug report (45MB).
Do you guys have somewhere I can send these?
Created attachment 1343753 [details]
statedump [Gluster Client - High Memory Usage]
Gluster Client - High Memory Usage
We created around 160K files using:
`./smallfile/smallfile_cli.py --top /usr/local/gfs/data/mirrored-data/test --threads 16 --file-size 16 --files 10000 --response-times Y`
After deleting them, used memory barely went down.
OS: CentOS 7
Gluster Versions: 3.10.5, 3.12.1, 3.12.2
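For anyone who wants to reproduce the comparison, here is a rough sketch of the create/delete cycle with the client's RSS checked before and after (the paths, the pgrep pattern, and the smallfile checkout location are only examples):

#!/bin/bash
# Compare glusterfs client RSS before and after a smallfile create/delete cycle.
# Paths below are examples; adjust for the local mount and smallfile checkout.
TOP=/usr/local/gfs/data/mirrored-data/test
PID=$(pgrep -f "glusterfs.*mirrored-data" | head -n1)

echo "RSS before (kB): $(ps -o rss= -p "$PID")"

./smallfile/smallfile_cli.py --operation create --top "$TOP" \
    --threads 16 --file-size 16 --files 10000
./smallfile/smallfile_cli.py --operation delete --top "$TOP" \
    --threads 16 --files 10000

echo "RSS after (kB):  $(ps -o rss= -p "$PID")"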
(In reply to Nithya Balachandran from comment #6)
> You can add them to this BZ.
> There is a known issue where the Fuse mount process doesn't release inodes
> so as you process more files, the size of the inode table grows. However, I
> would like to rule out other memory leaks.
The statedumps attached don't show a large number of inodes (either active or inactive) in the itable. The maximum count of inodes in the active and lru lists on the client is less than 50. Hence it is not a case of memory consumption due to the kernel not forgetting inodes.
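For anyone checking their own dumps, these counters sit in the fuse itable section of the client statedump and can be pulled out with a quick grep (the file name is an example, and the exact key names may vary slightly between versions):

# Inode-table counters from a client statedump (file name is an example).
grep -E 'itable\.(active_size|lru_size)' glusterdump.12345.dump.1510000000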
I tried running the smallfile test on various types of EC2 servers (m4.large, m4.xlarge & m4.2xlarge). The total amount of memory on these servers is 8GB, 16GB, and 32GB, respectively. The amount of memory used after writing and reading 1 million files was ~1GB, ~2GB, and ~3GB, respectively.
Then I checked the statedump files for the m4.large and m4.2xlarge. There was one noticeably large difference: under "xlator.mount.fuse.priv", the "iobuf" value for the m4.large was approximately half that of the m4.xlarge, which was using about twice as much memory for the fuse mount.
I'm guessing there is some correlation between the amount of memory used and the total amount of memory on the box. Does anyone know if there is a way to place a limit on this "iobuf"?
Josh Coyle's statedumps also show very large numbers for the "iobuf" value under the "xlator.mount.fuse.priv" section.
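If others want to compare, the values being discussed can be pulled straight out of a client statedump; the file name below is an example, and the key names are simply whatever appears in the dump:

# Show the fuse priv section, then any iobuf-related lines, from a client statedump.
grep -A 25 'xlator.mount.fuse.priv' glusterdump.12345.dump.1510000000
grep -i 'iobuf' glusterdump.12345.dump.1510000000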
This bug is reported against a version of Gluster that is no longer maintained (or has been EOL'd). See https://www.gluster.org/release-schedule/ for the versions currently maintained.
As a result, this bug is being closed.
If the bug persists on a maintained version of Gluster or against the mainline Gluster repository, please request that it be reopened and mark the Version field appropriately.