Created attachment 1378455 [details]
volumes configuration

Description of problem:
We have a production system with GlusterFS replicating data between two servers. The replicated directory is 2 GB in total and contains roughly 20-30 files. We noticed that memory consumption of the glusterfsd process grows permanently on both servers, at about 30 MB/day. The increase is irregular: memory can stay constant for 3 hours and then abruptly jump by 16 MB. We are running the 3.8.8-1 release, but we have verified that the issue is still reproducible on the latest GlusterFS 3.13 release. Our operating system is Debian Jessie. I can provide any log or debug info you need to fix this issue; just tell me what to provide.

Version-Release number of selected component (if applicable): 3.13

How reproducible:

Steps to Reproduce:
1. Configure two volumes on two different servers in replication mode (configuration in attachment).
2. Modify the content of the files periodically (for instance, once per minute).
3. Observe that memory usage of glusterfsd permanently increases.

Actual results:
Memory consumption of the glusterfsd process grows and is never released.

Expected results:
Memory consumption of glusterfsd should not grow permanently while the size of the mounted directory is not increasing.

Additional info:
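For anyone trying to follow the steps above, the setup can be sketched roughly as below. The hostnames (nodeA/nodeB), brick paths and mount point are placeholders, not the actual production values; the real volume options are in the attached configuration.

```shell
# Sketch of the reproduction setup (run the peer/volume commands on nodeA).
# Hostnames, paths and the 60 s interval are illustrative placeholders.
gluster peer probe nodeB
gluster volume create home replica 2 nodeA:/data/brick/home nodeB:/data/brick/home
gluster volume start home

# On a client: mount the volume and modify a file periodically,
# then watch glusterfsd's resident size over time.
mount -t glusterfs nodeA:/home /mnt/home
while true; do
    date > /mnt/home/heartbeat.txt   # small periodic modification
    sleep 60
done
```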
What do the modifications you make to the files consist of?

If you increase the frequency of the changes, does the memory usage grow faster?

How many days have you tracked this memory increase? Has it been growing at the same speed all the time, or has the rate of growth decreased after some days?

It would also be interesting to provide statedumps of one of the glusterfsd processes at two points in time when memory utilization is significantly different. You can find information on how to generate statedumps here: http://docs.gluster.org/en/latest/Troubleshooting/statedump/
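For quick reference, the statedump procedure from the page linked above boils down to either of the following (assuming the volume is named "home", as in the attached configuration); the dump files appear under /var/run/gluster by default:

```shell
# Trigger a statedump of the brick (glusterfsd) processes of volume "home";
# output files are named glusterdump.<pid>.dump.<timestamp>.
gluster volume statedump home

# Alternatively, signal a single process directly:
kill -USR1 "$(pidof glusterfsd)"
```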
If you increase the frequency of the changes, does the memory usage grow faster?
Yes.

How many days have you tracked this memory increase?
On our system I have seen permanent memory growth for 3 weeks.

Has it been growing at the same speed all the time, or has the rate of growth decreased after some days?
The same rate (see the rate.txt file in the attachment).

To collect debug statedumps, two different test cases were performed.

Test case 1: the system was left idle overnight. No new files were created. Every 5 seconds "sudo gluster volume heal home info" was called to check that replication is OK.

In the evening:

Node A:
~$ date
Tue Jan 9 16:28:43 UTC 2018
~$ sudo ps -ely | grep gluster
S 0  6060 1 0 80 0 38224 246871 - ? 00:00:00 glusterfsd
S 0  6085 1 0 80 0 22772 155022 - ? 00:00:00 glusterfs
S 0 39482 1 0 80 0 23552 117579 - ? 00:00:01 glusterd
S 0 40111 1 0 80 0 50496 150073 - ? 00:00:01 glusterfs

Node B:
~$ date
Tue Jan 9 16:27:33 UTC 2018
~$ sudo ps -ely | grep gluster
S 0 158949 1 0 80 0 20340 117579 - ? 00:00:00 glusterd
S 0 160258 1 0 80 0 38308 246871 - ? 00:00:00 glusterfsd
S 0 160278 1 0 80 0 22256 119670 - ? 00:00:00 glusterfs
S 0 160379 1 0 80 0 39068 161520 - ? 00:00:00 glusterfs

Attachment (statedumps): NodeA_test_case_1_evening.6060.dump, NodeB_test_case1_evening.160258.dump

In the morning:

Node A:
~$ date
Wed Jan 10 07:46:11 UTC 2018
~$ sudo ps -ely | grep gluster
S 0  6060 1 0 80 0 39812 263255 - ? 00:00:21 glusterfsd
S 0  6085 1 0 80 0 22772 155022 - ? 00:00:01 glusterfs
S 0 39482 1 0 80 0 26092 117579 - ? 00:00:04 glusterd
S 0 40111 1 0 80 0 50496 150073 - ? 00:00:01 glusterfs

Node B:
~$ date
Wed Jan 10 07:51:50 UTC 2018
~$ sudo ps -ely | grep gluster
S 0 158949 1 0 80 0 20384 117579 - ? 00:00:01 glusterd
S 0 160258 1 0 80 0 39892 263255 - ? 00:00:18 glusterfsd
S 0 160278 1 0 80 0 22256 119670 - ? 00:00:01 glusterfs
S 0 160379 1 0 80 0 39068 161520 - ? 00:00:00 glusterfs

Attachment (statedumps): NodeA_test_case_1_morning.6060.dump, NodeB_test_case_1_morning.160258.dump

Test case 2: the system was left for an hour with intensive file generation and removal in the mounted GlusterFS directory. Every 5 seconds "sudo gluster volume heal home info" was called to check that replication is OK.

Script for file generation:
----------------------
#!/bin/bash
while dd if=/dev/urandom of=tmp_file bs=64M count=16 iflag=fullblock; sleep 1; rm -f tmp_file; do
    sleep 1
done
----------------------

At the start:

Node A:
~$ sudo ps -ely | grep gluster
S 0 10859 1 0 80 0 21528 117579 - ? 00:00:00 glusterd
S 0 11900 1 0 80 0 37624 263255 - ? 00:00:00 glusterfsd
S 0 11920 1 0 80 0 22116 119671 - ? 00:00:00 glusterfs
S 0 12019 1 0 80 0 41440 145137 - ? 00:00:00 glusterfs

Node B:
~$ sudo ps -ely | grep gluster
S 0 12076 1 0 80 0 22320 117579 - ? 00:00:00 glusterd
S 0 12657 1 0 80 0 50268 150074 - ? 00:00:00 glusterfs
S 0 32953 1 0 80 0 37664 263255 - ? 00:00:00 glusterfsd
S 0 32973 1 0 80 0 22408 155023 - ? 00:00:00 glusterfs

Attachment (statedumps): NodeA_test_case_2_start.11900.dump, NodeB_test_case_2_start.32953.dump

After one hour:

Node A:
~$ sudo ps -ely | grep gluster
S 0 10859 1 0 80 0 21528 117579 - ? 00:00:00 glusterd
S 0 11900 1 3 80 0 38396 296280 - ? 00:03:37 glusterfsd
S 0 11920 1 0 80 0 22844 136589 - ? 00:00:00 glusterfs
S 0 12019 1 0 80 0 41628 145137 - ? 00:00:00 glusterfs

Node B:
~$ sudo ps -ely | grep gluster
S 0 12076 1 0 80 0 22404 117579 - ? 00:00:00 glusterd
S 0 12657 1 3 80 0 53500 150074 - ? 00:04:07 glusterfs
S 0 32953 1 1 80 0 38584 312921 - ? 00:02:09 glusterfsd
S 0 32973 1 0 80 0 22524 155023 - ? 00:00:00 glusterfs

Attachment (statedumps): NodeA_test_case_2_end.11900.dump, NodeB_test_case_2_start.32953.dump
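To make the attached statedump pairs easier to compare, here is a small sketch that reports which allocation types grew between two dumps. It assumes the "usage-type ... memusage" record layout described in the statedump documentation (a bracketed section header followed by size=/num_allocs= lines); the file names in the usage example are the attachments from test case 1.

```shell
#!/bin/bash
# statedump_growth OLD NEW
# Print every memory-accounting type whose "size" counter is larger in
# NEW than in OLD, as "old -> new  [section header]".  A sketch based on
# the statedump memusage record format; adjust if your dumps differ.
statedump_growth() {
    awk -F= '
        FNR == 1     { file++ }                         # 1 = old dump, 2 = new dump
        /usage-type/ { type = $0 }                      # remember current section header
        /^size=/     { size[file, type] = $2 + 0; seen[type] = 1 }
        END {
            for (t in seen)
                if (size[2, t] > size[1, t])
                    printf "%d -> %d  %s\n", size[1, t], size[2, t], t
        }
    ' "$1" "$2"
}
```

Usage, e.g.: statedump_growth NodeA_test_case_1_evening.6060.dump NodeA_test_case_1_morning.6060.dump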
Created attachment 1379547 [details] statedump for test case 1
Created attachment 1379548 [details] statedump for test case 1 (after 1 day)
Created attachment 1379549 [details] statedump for test case 1 (node B)
Created attachment 1379550 [details] statedump for test case 1 (Node B) (after 1 day)
Created attachment 1379551 [details] statedump for test case 2 (Node A) (start)
Created attachment 1379553 [details] statedump for test case 2 (Node A) (end)
Created attachment 1379554 [details] statedump for test case 2 (Node B) (start)
Created attachment 1379556 [details] statedump for test case 2 (Node B) (end)
Created attachment 1379557 [details] rate of memory growth
Thanks for the info. I'll take a look.
I've tried to reproduce the problem but haven't been able to. I've tested with 3.8.8, the latest 3.13 and the master branch. In all cases, after an initial increase while caches and some other internal data are being populated, I haven't observed a steady memory usage increase.

After looking at the code, I've identified a possible memory leak, but only if something fails. Can you attach the logs, so we can see if there are any issues that could help determine what's causing the memory increase?

I've also seen that you have network.ping-timeout set to 3. This is considered too small a value.
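For reference, the timeout can be put back to the stock default (42 seconds) with a volume set command; the volume name "home" is taken from the configuration in this report.

```shell
# Restore network.ping-timeout to the GlusterFS default of 42 seconds
gluster volume set home network.ping-timeout 42

# Verify the effective value
gluster volume get home network.ping-timeout
```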
This bug is reported against a version of Gluster that is no longer maintained (or has been EOL'd). See https://www.gluster.org/release-schedule/ for the versions currently maintained.

As a result, this bug is being closed.

If the bug persists on a maintained version of Gluster or against the mainline Gluster repository, please request that it be reopened and mark the Version field appropriately.