Description of problem:
On a 3 node containerized gluster cluster with brick multiplexing enabled, with 500 volumes created, started and mounted, memory consumption on the gluster node seems to be slowly rising without any IO operation being run on any of the volumes. The glusterfsd process seems to be consuming 60% of memory, i.e., 28 GB of the 48 GB of available memory. Although it is not clear whether there is actually a leak, I am filing this bug so dev can check if there is one. I've collected statedumps for one of the volumes with a gap of 2 days. I'll be attaching them shortly.

How reproducible:
Yet to try

Steps to Reproduce:
1. create a 3 node containerized gluster cluster
2. enable brick multiplexing - cluster.brick-multiplex on
3. create 500 volumes and monitor memory consumption of the glusterfsd process
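For reference, a minimal shell sketch of the steps above; the hostnames (node1..node3), brick paths, and the replica-3 layout are placeholder assumptions, not the exact setup used here:

  # enable brick multiplexing cluster-wide
  gluster volume set all cluster.brick-multiplex on

  # create and start 500 volumes
  for i in $(seq 1 500); do
      gluster volume create vol$i replica 3 \
          node1:/bricks/vol$i node2:/bricks/vol$i node3:/bricks/vol$i force
      gluster volume start vol$i
  done

  # watch resident memory of the multiplexed brick process
  ps -C glusterfsd -o pid,rss,vsz,cmd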
The memleak issue seems to be a legitimate one. When IO was started and ran for a while, memory consumption increased and stayed at the same level even after IO was stopped.
I've taken a statedump for one of the volumes once again after running IOs and attached it.
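For anyone reproducing this, a brick statedump can be triggered from the gluster CLI; by default the dump files land under /var/run/gluster/ (the filename pattern shown is approximate and may differ by version):

  gluster volume statedump vol1
  # e.g. /var/run/gluster/bricks-vol1.<pid>.dump.<timestamp>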
Looking at the differences between the statedumps, these two stand out:

  protocol/server.vol1-server gf_common_mt_inode_ctx: 4000 -> 54000
  protocol/server.vol1-server gf_common_mt_strdup:   16007 -> 66007

So, exactly 50K of each, both from protocol/server. This seems consistent with a memory leak when clients reconnect, if they do so many times, which raises two questions.

(1) Where *exactly* is the leak (or possibly two leaks)?

(2) Why do clients keep reconnecting?

The answer to the second question, unfortunately, might be that our network layer simply isn't capable of handling that many connections, creating queue effects that cause clients to time out. Can you check for that in the client logs? Or maybe for a consistent interval between disconnect/reconnect cycles?

Also, have you checked whether this happens *without* multiplexing, given the same rate of reconnections? I have a strong suspicion that it would, and that the leak has been latent for a long time until multiplexing made it visible.
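A quick sketch of how to pull those counters out of two statedumps and to look for reconnect cycles in a client log; the dump filenames and mount-log path are placeholders, and the exact log strings may vary by version:

  # compare allocation counts for the suspect memory types between two dumps
  grep -A5 'usage-type gf_common_mt_inode_ctx' dump.before dump.after | grep num_allocs
  grep -A5 'usage-type gf_common_mt_strdup'    dump.before dump.after | grep num_allocs

  # look for repeated disconnect/reconnect cycles and their timestamps
  grep -Ei 'disconnected|connected to' /var/log/glusterfs/<mount-log>.log | tail -n 40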
Hi Jeff,

Do you think one way to mitigate problem (2) mentioned in comment 5 could be implementing https://github.com/gluster/glusterfs/issues/151 ?
This bug is reported against a version of Gluster that is no longer maintained (or has been EOL'd). See https://www.gluster.org/release-schedule/ for the versions currently maintained. As a result, this bug is being closed. If the bug persists on a maintained version of gluster or against the mainline gluster repository, please request that it be reopened and the Version field be marked appropriately.
Clearing stale needinfos.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.