1329335 – GlusterFS - Memory Leak - High Memory Utilization

Bug 1329335 - GlusterFS - Memory Leak - High Memory Utilization

Summary: GlusterFS - Memory Leak - High Memory Utilization

Keywords:
Status:	CLOSED EOL
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	glusterd
Sub Component:
Version:	3.7.11
Hardware:	x86_64
OS:	Linux
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Assignee:	Mohit Agrawal
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1352854
TreeView+	depends on / blocked

Reported:	2016-04-21 16:06 UTC by Nagendran
Modified:	2017-03-08 10:57 UTC (History)
CC List:	6 users (show)
Fixed In Version:
Clone Of:
Clones:	1352854 (view as bug list)
Environment:
Last Closed:	2017-03-08 10:57:24 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
GlusterFS dump file (725.39 KB, text/plain) 2016-04-21 16:06 UTC, Nagendran	no flags	Details
First Dump File (17.70 KB, text/plain) 2016-04-25 12:46 UTC, Nagendran	no flags	Details
second dump file (17.72 KB, text/plain) 2016-04-25 12:46 UTC, Nagendran	no flags	Details
Command History for the first node. (1.49 MB, text/plain) 2016-04-26 09:48 UTC, Nagendran	no flags	Details
command history log for node2 (1.81 MB, text/plain) 2016-04-26 09:48 UTC, Nagendran	no flags	Details
Show Obsolete (1) View All

Description Nagendran 2016-04-21 16:06:30 UTC

Created attachment 1149509 [details]
GlusterFS dump file

We are using GlusterFS 3.7.11 (upgraded from 3.7.6 last week) on RHEL 7.x in AWS EC2.


We continue to see memory utilization going up once every 2 days. The memory utilization of the server demon(glusterd) in  NFS server is keep on increasing. In about 30+ hours the Memory utilization of glusterd service alone will reach 70% of memory available. Since we have alarms for this threshold, we get notified and only way to stop it so far is to restart the glusterd. 

This happens even where there’s not a lot of load in GlusterFS.

The GlusterFS is configured in the two server nodes with two mount location.

$ df -i
Filesystem        Inodes  IUsed     IFree IUse% Mounted on

/dev/xvdf      125829120 120186 125708934    1% /nfs_app1
/dev/xvdg      125829120 142937 125686183    1% /nfs_app2

As part of debugging, we tried the following:

1.	From the client side, in the mount location, we tried to read and write around 1000 files (each of 4MB size). There was no marked spike in memory utilization during this time. 
2.	We were using GlusterFS 3.7.6 and moved to 3.7.11 and despite that the problem persists.
3.	We created a dump of the volume in question. The dump file is attached. Some of memory allocations such as gf_common_mt_asptinlf_memoryusage has huge total_allocs. Specifically 3 of them that are listed below.


[global.glusterfs - usage-type gf_common_mt_asprintf memusage]
size=260
num_allocs=12
max_size=2464
max_num_allocs=294
total_allocs=927964

[global.glusterfs - usage-type gf_common_mt_char memusage]
size=6388
num_allocs=164
max_size=30134
max_num_allocs=645
total_allocs=1424017

[protocol/server.xyz-server - usage-type gf_common_mt_strdup memusage]
size=26055
num_allocs=2795
max_size=27198
max_num_allocs=2828
total_allocs=135503

4.	We also noticed that the mempool has nr_files as a negative number. Not sure if this is also a cause of the problem.

[mempool]
[storage/posix.xyz-posix]
base_path=/nfs_xyz/abc
base_path_length=25
max_read=44215866
max_write=104925485
nr_files=-418


This is happening in Prod and as expected generates a lot of problems. 
Has anybody seen this before? Any insights into what we can try would be greatly appreciated.

Comment 1 Kaushal 2016-04-25 07:37:03 UTC

Hi Nagendra,

Could you provide the statedumps of the GlusterD process? The dumps you've provided are of the brick processes (ie. glusterfsd).

You can get the statedump of the the GlusterD process by sending is a SIGUSR1 signal.
`kill -SIGUSR1 <pid of glusterd>`.
The statedump files will be created in /var/run/gluster.

It would be nice if you can provide statedumps from two different times, so that we can compare what changed.

Comment 2 Nagendran 2016-04-25 12:46:00 UTC

Created attachment 1150457 [details]
First Dump File

Comment 3 Nagendran 2016-04-25 12:46:36 UTC

Created attachment 1150458 [details]
second dump file

Comment 4 Nagendran 2016-04-25 12:48:29 UTC

Hi Kaushal,

    Thanks for your response.

    As you suggested, We have taken statedumps in different time interval, when we taken first dump, the memory utilization of glusterd is 5.6% , after one hour its increased by 7.6%. So both these dumps have been attached for your reference.

Note: The time between the dumps is 1hr.

     For any more information on this, please let us know.

Thanks,
Nagendran N

Comment 5 Atin Mukherjee 2016-04-25 15:24:51 UTC

What all commands have been run? Could you also attach the cmd_history.log file from all the nodes?

Comment 6 Nagendran 2016-04-26 09:48:02 UTC

Created attachment 1150789 [details]
Command History for the first node.

Command History for the first node.

Comment 7 Nagendran 2016-04-26 09:48:49 UTC

Created attachment 1150790 [details]
command history log for node2

Comment 8 Nagendran 2016-04-26 09:51:01 UTC

Hi Kaushal,
I have attached the command history for both the nodes as part of this as requested.

Please let us know if you have any thoughts on what's going on. 
Unfortunately, this is creating a serious problem (memory alarms in Prod getting triggered almost every alternate day) and we are ending up restarting during loads. 

Appreciate any pointers on the problem.

Comment 9 Mohammed Rafi KC 2016-04-26 12:51:45 UTC

From your cmd_history, I see your running lot of profile commands. So are you observing memory leak when you run profile commands continuously?. I'm asking this because that will help to figure out the memory leak path easily.

Comment 10 Nagendran 2016-04-26 13:42:38 UTC

We are using a custom NewRelic plugin to get some metrics about the GlusterFS peer for monitoring - basically a cron job that runs this command:

gluster volume profile ${1} info | grep -n "Brick" | sed -n 2p | cut -d":" -f 1

We have temporarily stopped the agent now and will check if this helps. Is this what you are suspecting as the root cause?

The reason why we added the agent was to monitor the following:
I/O operation per second.
Directories activities per second
Files activities per second
Files Information activities per second
Files Latency
Directory Latency

Are there any other ways of doing it without doing the volume profile?

Comment 11 Kaushal 2016-04-27 11:53:10 UTC

As Rafi has mentioned already, I too think it's the volume profile polling causing issues.

From the statedumps, I see that memory allocations for dict_t, data_t, data_pair_t, gf_common_mt_memdup, gf_common_mt_asprintf and gf_common_mt_strdup have increased quite a lot. These are the memory types generally associated with the GlusterFS dictionary data type and operations on it (including dict_serialize and unserialize).

Information in GlusterFS is passed between processes (brick to glusterd, glusterd to glusterd and glusterd to cli) using dictionaries as containers. But certain operations generate a large amount of data, like 'volume profile', which make dictionaries huge. Collecting information from multiple sources in volume profile involves a lot of data duplication happening, which uses a lot of memory.

While in most cases, memory allocated to dictionaries should be freed upon the dictionary being destroyed, it appears that there is a quite significant leak in volume profile path. We'll try to identify the leak as soon as we can.

In the meantime, I do hope that stopping the agent has helped. GlusterFS doesn't have any other way, apart from volume profile, to gather volume stats on server. A client side tool, glusterfsiostat[1], was implemented a couple of years back as a GSOC project. You could try it out.

If that doesn't work out, and you really need to monitor the stats, I suggest that you reduce the polling interval. From the logs, I see that the interval is 1 minute right now, which can be changed to say 5 minutes. Also, the polling is being done from both servers. This effectively make the polling period 30 seconds. You can just use one of the servers to get the stats, as volume profile gathers stats of a volume from the whole cluster.

Comment 12 Nagendran 2016-04-27 14:02:20 UTC

Hi Kaushal,
Stopping the agent helped. We have not seen the leak since then. So the theory is right.

Thanks much for your pointer. It helped us to zero in on the plugin.

Like you said if we decrease the frequency, wouldn't we still have this problem (just a little later)? We can try that in a lower environment - but I suspect it may not help much.

The client side is also a problem because there are many clients and they change as well. 

The best option seem to be identifying the way to clear the allocated memory right after the information is read. Is there a way to do that? If so, we can add it to the plugin so every time we read we reset the allocation.

Will that work?

Comment 13 Vijay Bellur 2016-07-05 09:12:56 UTC

REVIEW: http://review.gluster.org/14862 (Memory leak in gluster volume profile command) posted (#1) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 14 Mohit Agrawal 2016-08-10 07:39:03 UTC

http://review.gluster.org/#/c/15081/

Comment 15 Kaushal 2017-03-08 10:57:24 UTC

This bug is getting closed because GlusteFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.

Note You need to log in before you can comment on or make changes to this bug.