Bug 1647277 (RHGS34MemoryLeak)

Summary: [Tracker]: Memory leak bugs
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Amar Tumballi <atumball>
Component: core
Assignee: Sunny Kumar <sunkumar>
Status: CLOSED NOTABUG
QA Contact: Rahul Hinduja <rhinduja>
Severity: high
Docs Contact:
Priority: high
Version: rhgs-3.4
CC: abhishku, amarts, aspandey, bkunal, jthottan, khiremat, pasik, ravishankar, rhs-bugs, sheggodu, storage-qa-internal, sunkumar
Target Milestone: ---
Keywords: Tracking
Target Release: ---
Flags: khiremat: needinfo-
       sunkumar: needinfo-
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-02-06 07:23:08 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1400067, 1408104, 1511779, 1530133, 1540403, 1576193, 1579151, 1626574, 1632465, 1637254, 1637574, 1642868, 1644934, 1648893    
Bug Blocks: 1386658, 1529501, 1653205, 1655352, 1658979, 1677145    

Description Amar Tumballi 2018-11-07 02:55:33 UTC
Description of problem:
There are many bugs which talk about memory consumption of glusterfs. This is a tracker bug for memory leak issues!


Additional info:
This bug is to make sure we consolidate the efforts related to memory leak fixes, and do a good job of addressing all of them together.

Comment 2 Raghavendra G 2018-11-12 04:08:05 UTC
Following is a summary of my observations so far on high memory usage driven by a large number of inodes (the list does not cover all leaks observed so far; a toy sketch of the itable lru mechanism follows the list):

* A large number of inodes looked up by the kernel drives client memory usage high. These inodes sit in the lru list of the itable; this is a well-known problem and solutions are WIP - bz 1511779.
* A large number of "active" inodes (refcount > 0) that are not looked up by the kernel. These are likely leaks, or they could be cached by readdir-ahead (especially if rda-cache-limit is set higher) - bz 1644934.
* A large number of inodes in the lru list on bricks. This is due to the high network.inode-lru-limit (usually 50000) set by the "group metadata-cache" tuning - https://bugzilla.redhat.com/show_bug.cgi?id=1637393#c78.
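
Below is a toy sketch (not GlusterFS source; all names and numbers are made up for illustration) of the itable lru behaviour the first and third bullets refer to: an inode whose last reference is dropped is not destroyed while the kernel still remembers it, it is parked on the table's lru list, and pruning only kicks in once lru_size exceeds lru_limit. A high limit such as the 50000 set by "group metadata-cache" therefore keeps up to that many unreferenced inodes (plus whatever per-xlator caches hang off them) resident.

/* Toy sketch only, not GlusterFS code: models how an itable's lru
 * list grows up to lru_limit before anything is pruned. */
#include <stdio.h>

typedef struct toy_inode {
    int refcount;            /* > 0 means the inode is "active"        */
    int nlookup;             /* kernel lookups not yet forgotten       */
    struct toy_inode *next;  /* lru list linkage                       */
} toy_inode_t;

typedef struct toy_itable {
    unsigned lru_limit;      /* e.g. network.inode-lru-limit           */
    unsigned lru_size;
    unsigned active_size;
    toy_inode_t *lru_head;
} toy_itable_t;

/* last reference dropped: the inode is not freed, it is parked on the
 * lru list as long as the kernel still remembers it */
static void toy_inode_unref_last(toy_itable_t *t, toy_inode_t *i)
{
    i->refcount = 0;
    t->active_size--;
    i->next = t->lru_head;
    t->lru_head = i;
    t->lru_size++;
}

/* pruning only happens once lru_size exceeds lru_limit, so the
 * steady-state memory footprint scales with the configured limit */
static void toy_table_prune(toy_itable_t *t)
{
    while (t->lru_size > t->lru_limit && t->lru_head) {
        toy_inode_t *victim = t->lru_head;
        t->lru_head = victim->next;
        t->lru_size--;
        (void)victim;        /* real code would forget/free it here    */
    }
}

int main(void)
{
    toy_itable_t table = { .lru_limit = 2 };   /* tiny limit for the demo */
    toy_inode_t inodes[4] = {{ 0 }};
    int n;

    for (n = 0; n < 4; n++) {
        inodes[n].refcount = 1;
        inodes[n].nlookup = 1;
        table.active_size++;
        toy_inode_unref_last(&table, &inodes[n]);  /* last ref dropped */
        toy_table_prune(&table);
    }
    printf("lru_size=%u (capped at lru_limit=%u), active_size=%u\n",
           table.lru_size, table.lru_limit, table.active_size);
    return 0;
}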

Comment 3 Raghavendra G 2018-11-12 11:59:01 UTC
[Bug 1648893] statedump doesn't contain information about newer mempools

Comment 4 Raghavendra G 2018-11-19 14:24:04 UTC
Amar, Sunny and others,

Another area of memory accumulation is graph switches. Note that caches/inode-ctxs of inodes in older/unused graphs are not freed up. Do you want to work on that? I ask because a graph switch is not a common operation; on the other hand, I do see customers/GSS experimenting with turning various translators on and off, which results in graph switches.

If there is a consensus that graph switches are a fairly common operation, we need to clean up old graphs, as they can amount to significant memory consumption; we should file a bug and track that.
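
For illustration, here is a toy model (not the GlusterFS implementation; the types and names below are made up) of why memory accumulates across graph switches: inode contexts and caches are keyed by the xlator object that set them, and every graph switch instantiates fresh xlator objects, so anything attached by the old graph's xlators stays pinned until the old graph and its inode table are actually torn down.

/* Toy model only, not GlusterFS code: per-xlator inode contexts pile up
 * across graph switches because each graph builds new xlator objects. */
#include <stdio.h>

#define MAX_XLATORS 8

typedef struct toy_xlator {
    char name[32];
    int  graph_id;                         /* which graph owns this object */
} toy_xlator_t;

typedef struct toy_inode {
    /* one context slot per xlator object that ever touched the inode */
    struct { const toy_xlator_t *xl; const char *ctx; } slots[MAX_XLATORS];
    int nslots;
} toy_inode_t;

/* attach per-xlator context, in the spirit of inode_ctx_put() */
static void toy_inode_ctx_put(toy_inode_t *inode, const toy_xlator_t *xl,
                              const char *ctx)
{
    if (inode->nslots < MAX_XLATORS) {
        inode->slots[inode->nslots].xl  = xl;
        inode->slots[inode->nslots].ctx = ctx;
        inode->nslots++;
    }
}

int main(void)
{
    toy_inode_t inode = { .nslots = 0 };

    /* graph 1: the md-cache instance caches an iatt on the inode */
    toy_xlator_t mdc_graph1 = { "md-cache", 1 };
    toy_inode_ctx_put(&inode, &mdc_graph1, "iatt cached by graph 1");

    /* an option change triggers a graph switch: graph 2 builds a NEW
     * md-cache object; the context set by the graph-1 object is never
     * consulted again, yet it stays attached to the inode until the
     * old graph and its inode table are torn down */
    toy_xlator_t mdc_graph2 = { "md-cache", 2 };
    toy_inode_ctx_put(&inode, &mdc_graph2, "iatt cached by graph 2");

    printf("inode carries %d per-xlator contexts after one graph switch\n",
           inode.nslots);
    return 0;
}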

Leaving needinfo on Bipin, Amar and Sunny to drive the discussion on this point.

regards,
Raghavendra

Comment 5 Amar Tumballi 2018-12-03 07:23:58 UTC
> If there is a consensus that graph switches are a fairly common operation, we need to clean up old graphs, as they can amount to significant memory consumption; we should file a bug and track that.

Yes, this is very important, and we should focus on that. I see that Mohit is already working a lot on server-side graph cleanups.

Comment 6 Raghavendra G 2018-12-03 07:32:41 UTC
(In reply to Amar Tumballi from comment #5)
> > If there is a consensus that graph switches are a fairly common operation, we need to clean up old graphs, as they can amount to significant memory consumption; we should file a bug and track that.
> 
> Yes, this is very important, and we should focus on that. I see that Mohit
> is already working a lot on server-side graph cleanups.

This problem is present on the clients too, especially fuse mounts (gfapi had a cleanup drive a while back).

Comment 7 Amar Tumballi 2018-12-03 07:36:17 UTC
> This problem is present on the clients too, especially fuse mounts (gfapi had a cleanup drive a while back).

For the client, my thinking is to make profiles stronger, so that a user/admin doesn't keep changing the volume setup frequently - or, ideally, doesn't change the volume settings at all.

Comment 8 Raghavendra G 2018-12-11 05:57:03 UTC
While working on bz 1657405, I found that many xlators like afr, EC, bit-rot and trash create their own inode tables. However, it might be the case that the contents of all these itables are not dumped to statedumps. If not, it would be good to dump these itables (a sketch of what such a dump could look like follows the grep output below).

[rgowdapp@rgowdapp rhs-glusterfs]$ git grep inode_table_new
api/src/glfs-master.c:                  itable = inode_table_new (131072, new_subvol);
doc/developer-guide/datastructure-inode.md:inode_table_new (size_t lru_limit, xlator_t *xl)
libglusterfs/src/inode.c:inode_table_new (size_t lru_limit, xlator_t *xl)
libglusterfs/src/inode.h:inode_table_new (size_t lru_limit, xlator_t *xl);
xlators/cluster/afr/src/afr-self-heald.c:       this->itable = inode_table_new (SHD_INODE_LRU_LIMIT, this);
xlators/cluster/dht/src/dht-rebalance.c:        itable = inode_table_new (0, this);
xlators/cluster/ec/src/ec.c:    this->itable = inode_table_new (EC_SHD_INODE_LRU_LIMIT, this);
xlators/features/bit-rot/src/bitd/bit-rot.c:                                child->table = inode_table_new (4096, subvol);
xlators/features/quota/src/quotad-helpers.c:                active_subvol->itable = inode_table_new (4096, active_subvol);
xlators/features/trash/src/trash.c:        priv->trash_itable = inode_table_new (0, this);
xlators/mount/fuse/src/fuse-bridge.c:                itable = inode_table_new (0, graph->top);
xlators/nfs/server/src/nfs.c:        xl->itable = inode_table_new (lrusize, xl);
xlators/protocol/server/src/server-handshake.c:                                inode_table_new (conf->inode_lru_limit,
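
As a rough illustration, here is a hedged sketch (not GlusterFS source) of what dumping such a private itable into a statedump could look like. The section/key naming below loosely mimics how itable sections typically appear in statedumps; treat the exact format, field names and values as assumptions.

/* Hedged sketch only, not GlusterFS code: formats one statedump-style
 * section for an xlator's private inode table. */
#include <stdio.h>

typedef struct toy_itable {
    unsigned active_size;   /* inodes with refcount > 0                */
    unsigned lru_size;      /* unreferenced but still remembered inodes */
    unsigned lru_limit;
} toy_itable_t;

/* write one statedump-style section for an xlator's private itable */
static void toy_itable_dump(const char *xl_name, const toy_itable_t *t,
                            FILE *out)
{
    fprintf(out, "[xlator.%s.itable]\n", xl_name);
    fprintf(out, "xlator.%s.itable.lru_limit=%u\n", xl_name, t->lru_limit);
    fprintf(out, "xlator.%s.itable.active_size=%u\n", xl_name,
            t->active_size);
    fprintf(out, "xlator.%s.itable.lru_size=%u\n", xl_name, t->lru_size);
}

int main(void)
{
    /* e.g. the trash xlator's private table; the numbers are made up */
    toy_itable_t trash_itable = { .active_size = 12, .lru_size = 340,
                                  .lru_limit = 0 };
    toy_itable_dump("features.trash", &trash_itable, stdout);
    return 0;
}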

Comment 12 Bipin Kunal 2019-01-18 08:35:01 UTC
Amar/Sunil,

  What is the target release for this tracker bug?

-Bipin Kunal