Created attachment 1337554 [details]
Gluster State Dump

Description of problem:
The glusterfs process on clients that use the FUSE mount consumes as much system memory and swap as it can over time, eventually leading to the process being killed due to OOM and the mount dropping. This occurs after a large amount of data (both in size and file count, although I've not been able to rule out one over the other, as this machine does both regularly) has been transferred over the mount point.

Version-Release number of selected component (if applicable):
glusterfs 3.10.3

How reproducible:
Highly consistent

Steps to Reproduce:
1. Mount the gluster volume via the FUSE client
2. Transfer a lot of data
3. Watch memory usage of the glusterfs process increase over time

Actual results:
Memory usage increases over time, eventually leading to the glusterfs process being killed by OOM and the mount dropping.

Expected results:
The glusterfs process should release the memory it is consuming to avoid OOM issues.

Additional info:
Gluster volume version is 3.10.3. I have one client on 3.10.3 and one client on 3.11.3; both experience the same issue. This only occurs on clients that pass a large amount of traffic consistently (100s of GB daily). These mounts also process a large number of concurrent connections (up to 50 at a time), which may be playing some part in the issue.
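For reference, this is roughly how the volume is mounted and how I'm watching the client process; a minimal sketch, with the hostname, volume name, mount point, and log path specific to this setup:

    # Mount the volume via the FUSE client
    mount -t glusterfs PB-WA-AA-01-B:/gvAA01 /mnt/gvAA01

    # Log the client process RSS every 5 minutes to see the growth over time
    while true; do
        date >> /var/log/glusterfs-rss.log
        ps -o pid,rss,vsz,cmd -C glusterfs >> /var/log/glusterfs-rss.log
        sleep 300
    done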
Forgot to mention, this is on Ubuntu 16.04.2 and 16.04.3
More additional info based on guidelines from the gluster docs.

GlusterFS Cluster Information:
Number of volumes: 1
Volume Names: gvAA01
Volume on which the particular issue is seen [ if applicable ]: gvAA01
Type of volumes: Distributed Replicated
Volume options if available:
Options Reconfigured:
cluster.data-self-heal: off
cluster.lookup-unhashed: auto
cluster.lookup-optimize: on
cluster.self-heal-daemon: enable
client.bind-insecure: on
server.allow-insecure: on
nfs.disable: off
transport.address-family: inet
cluster.favorite-child-policy: size

Output of gluster volume info:

Volume Name: gvAA01
Type: Distributed-Replicate
Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118
Status: Started
Snapshot Count: 0
Number of Bricks: 5 x (2 + 1) = 15
Transport-type: tcp
Bricks:
Brick1: PB-WA-AA-01-B:/brick1/gvAA01/brick
Brick2: PB-WA-AA-02-B:/brick1/gvAA01/brick
Brick3: PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick1 (arbiter)
Brick4: PB-WA-AA-01-B:/brick2/gvAA01/brick
Brick5: PB-WA-AA-02-B:/brick2/gvAA01/brick
Brick6: PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick2 (arbiter)
Brick7: PB-WA-AA-01-B:/brick3/gvAA01/brick
Brick8: PB-WA-AA-02-B:/brick3/gvAA01/brick
Brick9: PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick3 (arbiter)
Brick10: PB-WA-AA-01-B:/brick4/gvAA01/brick
Brick11: PB-WA-AA-02-B:/brick4/gvAA01/brick
Brick12: PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick4 (arbiter)
Brick13: PB-WA-AA-01-B:/brick5/gvAA01/brick
Brick14: PB-WA-AA-02-B:/brick5/gvAA01/brick
Brick15: PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick5 (arbiter)
Options Reconfigured:
cluster.data-self-heal: off
cluster.lookup-unhashed: auto
cluster.lookup-optimize: on
cluster.self-heal-daemon: enable
client.bind-insecure: on
server.allow-insecure: on
nfs.disable: off
transport.address-family: inet
cluster.favorite-child-policy: size

Output of gluster volume status:

root@PB-WA-AA-00-A:/# gluster volume status
Status of volume: gvAA01
Gluster process                                    TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick PB-WA-AA-01-B:/brick1/gvAA01/brick           49152     0          Y       10547
Brick PB-WA-AA-02-B:/brick1/gvAA01/brick           49152     0          Y       10380
Brick PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick1     49152     0          Y       16770
Brick PB-WA-AA-01-B:/brick2/gvAA01/brick           49153     0          Y       10554
Brick PB-WA-AA-02-B:/brick2/gvAA01/brick           49153     0          Y       10388
Brick PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick2     49153     0          Y       16789
Brick PB-WA-AA-01-B:/brick3/gvAA01/brick           49154     0          Y       10565
Brick PB-WA-AA-02-B:/brick3/gvAA01/brick           49154     0          Y       10396
Brick PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick3     49154     0          Y       20685
Brick PB-WA-AA-01-B:/brick4/gvAA01/brick           49155     0          Y       10571
Brick PB-WA-AA-02-B:/brick4/gvAA01/brick           49155     0          Y       10404
Brick PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick4     49155     0          Y       14312
Brick PB-WA-AA-01-B:/brick5/gvAA01/brick           49156     0          Y       990
Brick PB-WA-AA-02-B:/brick5/gvAA01/brick           49156     0          Y       14869
Brick PB-WA-AA-00-A:/arbiterAA01/gvAA01/brick5     49156     0          Y       19462
NFS Server on localhost                            2049      0          Y       2950
Self-heal Daemon on localhost                      N/A       N/A        Y       2959
NFS Server on PB-WA-AA-01-B                        2049      0          Y       23815
Self-heal Daemon on PB-WA-AA-01-B                  N/A       N/A        Y       23824
NFS Server on PB-WA-AA-02-B                        2049      0          Y       14889
Self-heal Daemon on PB-WA-AA-02-B                  N/A       N/A        Y       14898

Task Status of Volume gvAA01
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 5930cdcd-bb76-4d32-aeca-c41aea8f832d
Status               : in progress

Client Information:
OS Type: Ubuntu Linux
Mount type: gluster FUSE client
OS Version: 16.04.3
Created attachment 1337560 [details] Mount log file
The only large allocations I see in the statedump are:

[mount/fuse.fuse - usage-type gf_common_mt_circular_buffer_t memusage]
size=32768
num_allocs=1025
max_size=32768
max_num_allocs=1025
total_allocs=1197200

[mount/fuse.fuse - usage-type gf_common_mt_char memusage]
size=128063
num_allocs=1024
max_size=152481
max_num_allocs=1028
total_allocs=1268262

Do you have any more statedumps taken at intervals?
I did have statedumps being collected at regular intervals, however these appear to have been cleared. At present the issue appears to have ceased. We have also moved some of the workload from this machine to another, which may be why.

The new machine is currently displaying the same behaviour, gradually consuming additional memory without releasing it. I'll begin taking statedumps on this machine, however previous attempts at this have not been successful.

Would you like me to raise a new bug report for the new machine, or dump all the info into this bug report?
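For the record, this is roughly how I plan to collect them; a rough sketch, with the process match and interval arbitrary, and assuming the dumps land in the default statedump directory (typically /var/run/gluster):

    # PID of the FUSE client for this mount
    PID=$(pgrep -f 'glusterfs.*gvAA01')

    # SIGUSR1 asks the client to write a statedump (glusterdump.<pid>.dump.<timestamp>)
    while true; do
        kill -USR1 "$PID"
        sleep 3600
    done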
You can add them to this BZ.

There is a known issue where the FUSE mount process doesn't release inodes, so as you process more files, the size of the inode table grows. However, I would like to rule out other memory leaks.
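To check that quickly on your side, something like this against a client statedump should show the inode table counts; a sketch, with the field names as they appear in typical 3.x client statedumps (adjust if yours differ):

    # Active and lru inode counts held by the fuse client
    grep -E 'itable.active_size|itable.lru_size' glusterdump.*.dump.*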
Created attachment 1340591 [details] glusterfs process statedump
Created attachment 1340592 [details] glusterfs process statedump
Created attachment 1340593 [details] glusterfs process statedump
I've added 3 new statedumps for this one. I do have another one, however it's 6GB in size and I'm fairly certain it's not complete, as it was filling my /run/ partition. I can truncate it, but I'd like to know: would the most useful info be at the start or the end of the file?
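If both ends turn out to be useful, I'll grab them with something like this rather than uploading the whole 6GB; the sizes here are arbitrary:

    # Keep the first and last ~50MB of the oversized dump
    head -c 50M /run/huge-statedump.dump > statedump.head
    tail -c 50M /run/huge-statedump.dump > statedump.tail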
We are experiencing the same problem. Our cluster is made up of 3 nodes. We created around 160K small files (4K each), then removed them. Our fuse client is still using around half a GB (after a day).
I've got a couple of new statedumps for this one, however they're too large to upload to the bug report (45MB). Do you guys have somewhere I can send these? Thanks.
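In the meantime I'll try compressing them; statedumps are plain text and compress very well, so something like this should get them under the attachment limit (the filename is just an example):

    xz -9 glusterdump.12345.dump.1509000000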
Created attachment 1343753 [details]
statedump

[Gluster Client - High Memory Usage]

We created around 160K files using:

./smallfile/smallfile_cli.py --top /usr/local/gfs/data/mirrored-data/test --threads 16 --file-size 16 --files 10000 --response-times Y

After deleting them, used memory barely went down.

OS: CentOS 7
Gluster versions: 3.10.5, 3.12.1, 3.12.2
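To rule out the kernel simply holding on to the inodes, one check we could run (a sketch, not something we've done yet; drop_caches=2 makes the kernel evict dentries/inodes and send FORGETs to the fuse client):

    # Client RSS before
    ps -o rss= -C glusterfs

    # Evict dentries and inodes so the fuse inode table can shrink
    echo 2 > /proc/sys/vm/drop_caches

    # Client RSS after; if it barely moves, the memory is held elsewhere
    ps -o rss= -C glusterfs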
(In reply to Nithya Balachandran from comment #6)
> You can add them to this BZ.
>
> There is a known issue where the FUSE mount process doesn't release inodes,
> so as you process more files, the size of the inode table grows. However, I
> would like to rule out other memory leaks.

The statedumps attached don't show a large number of inodes (either active or inactive) in the itable. The maximum count of inodes in the active and lru lists on the client is less than 50, so this is not a case of memory consumption due to the kernel not forgetting inodes.
I tried running the smallfile test on various types of EC2 servers (m4.large, m4.xlarge and m4.2xlarge). The total amount of memory on these servers is 8GB, 16GB, and 32GB respectively. The amount of memory used after writing and reading 1 million files was ~1GB, ~2GB, and ~3GB respectively.

Then I compared the statedump files for the m4.large and the m4.2xlarge. There was one noticeably large difference: under "xlator.mount.fuse.priv", the "iobuf" value for the m4.large was approximately half that of the m4.2xlarge, whose fuse mount was using about twice as much memory. I'm guessing there is some correlation between the amount of memory used by the fuse mount and the total amount of memory on the box.

Does anyone know if there is a way to place a limit on this "iobuf"?
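For reference, this is roughly how I pulled the iobuf numbers out of the two dumps for comparison; the filenames are just what I named them locally:

    # iobuf-related counters under the fuse xlator's priv section
    grep -A 20 'xlator.mount.fuse.priv' glusterdump.m4large.dump   | grep -i iobuf
    grep -A 20 'xlator.mount.fuse.priv' glusterdump.m42xlarge.dump | grep -i iobuf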
Josh Coyle's statedumps also show very large numbers for the "iobuf" value under the "xlator.mount.fuse.priv" section.
This bug is reported against a version of Gluster that is no longer maintained (or has been EOL'd). See https://www.gluster.org/release-schedule/ for the versions currently maintained.

As a result, this bug is being closed.

If the bug persists on a maintained version of Gluster or against the mainline Gluster repository, please request that it be reopened and mark the Version field appropriately.