Bug 1718734 - Memory leak in glusterfsd process
Summary: Memory leak in glusterfsd process
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: 5
Hardware: mips64
OS: Linux
high
urgent
Target Milestone: ---
Assignee: hari gowtham
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1729107 1729108 1759366 1768726 1768742
TreeView+ depends on / blocked
 
Reported: 2019-06-10 06:17 UTC by Abhishek
Modified: 2019-11-06 11:03 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1729107 1768726 (view as bug list)
Environment:
Last Closed: 2019-11-06 11:03:48 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)
Script to see the memory leak (18.03 KB, text/x-python)
2019-06-10 06:17 UTC, Abhishek
no flags Details
logs (132.00 KB, text/plain)
2019-06-10 06:19 UTC, Abhishek
no flags Details
logs (132.00 KB, text/plain)
2019-06-10 06:19 UTC, Abhishek
no flags Details
Script to write and delete file in gluster mount point (661 bytes, application/x-shellscript)
2019-06-10 06:20 UTC, Abhishek
no flags Details
Statedumps (136.51 KB, application/zip)
2019-06-10 06:23 UTC, Abhishek
no flags Details
Graph for the memory increase (15.56 KB, image/png)
2019-06-10 06:23 UTC, Abhishek
no flags Details
statedumps log for 6.1 (13.95 KB, application/zip)
2019-06-12 14:53 UTC, Abhishek
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Gluster.org Gerrit 23657 0 None Merged Detach iot_worker to release its resources 2019-11-06 11:03:47 UTC

Description Abhishek 2019-06-10 06:17:59 UTC
Created attachment 1578935 [details]
Script to see the memory leak

Description of problem:

We are seeing the memory leak in glusterfsd process when writing and deleting the specific file in some interval

Version-Release number of selected component (if applicable): Glusterfs 5.4


How reproducible:

Here is the Setup details and test which we are doing as below:


One client, two gluster Server.
The client is writing and deleting one file each 15 minutes by script test_v4.15.sh.

IP
Server side:
128.224.98.157 /gluster/gv0/
128.224.98.159 /gluster/gv0/

Client side:
128.224.98.160 /gluster_mount/

Server side:
gluster volume create gv0 replica 2 128.224.98.157:/gluster/gv0/ 128.224.98.159:/gluster/gv0/ force
gluster volume start gv0

root@128:/tmp/brick/gv0# gluster volume info

Volume Name: gv0
Type: Replicate
Volume ID: 7105a475-5929-4d60-ba23-be57445d97b5
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 128.224.98.157:/gluster/gv0
Brick2: 128.224.98.159:/gluster/gv0
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

exec script: ./ps_mem.py -p 605 -w 61 > log
root@128:/# ./ps_mem.py -p 605
Private + Shared = RAM used Program
23668.0 KiB + 1188.0 KiB = 24856.0 KiB glusterfsd
---------------------------------
24856.0 KiB
=================================


Client side:
mount -t glusterfs -o acl -o resolve-gids 128.224.98.157:gv0 /gluster_mount


We are using the below script write and delete the file.

test_v4.15.sh

Also the below script to see the memory increase whihle the script is above script is running in background.

ps_mem.py

I am attaching the script files as well as the result got after testing the scenario.


Actual results: Memory leak is present 


Expected results: Leak should not be there


Additional info: Please see the attached file for more details also attaching the statedumps

Comment 1 Abhishek 2019-06-10 06:19:20 UTC
Created attachment 1578937 [details]
logs

Comment 2 Abhishek 2019-06-10 06:19:41 UTC
Created attachment 1578938 [details]
logs

Comment 3 Abhishek 2019-06-10 06:20:16 UTC
Created attachment 1578939 [details]
Script to write and delete file in gluster mount point

Comment 4 Abhishek 2019-06-10 06:23:06 UTC
Created attachment 1578940 [details]
Statedumps

Comment 5 Abhishek 2019-06-10 06:23:49 UTC
Created attachment 1578941 [details]
Graph for the memory increase

Comment 6 Abhishek 2019-06-10 06:24:50 UTC
Attached are some statedumps taken on the gluster fs server. An initial dump, then one from an hour or so later, then one from another 3 hours or so. I believe we have looked before at the statedumps though and not seen any evidence there what is going wrong, but please double-check this.

I have a system running with gluster 5.4 setup in replicate mode. So an Active and a Passive server, and one client that has mounted the gluster volume, and simply writes and deletes a file every 15 minutes to the gluster volume. That is all that is going on.

What we see is that the memory usage for glusterfsd process is increasing slowly in a liner fashion. I am running a python script every minute to log the memory usage, and then plot the result on a graph. I attach the graph showing glusterfsd private, shared and total memory usage over time (some 78 days running). I also attach two screenshots from 'top' taken at various stages.

This is the graph during the one file every 15 minutes write test:

Please see the attachment for image.png



And BTW, if we 'hammer' the gluster volume with file writes/deletes in a much faster fashion, ie many files written/deleted every second, or even every minute, we see that the glusterfsd memory usage increases only for a very short period, then it levels off and stays level forever at around 35MB total. So there is clearly something different happening in the 'slow' file access case where the total is at nearly 200MB and still increasing. 

If we run Valgrind, we see that memory allocates are freed up when the process is ended, but the problem we have is that this will be on a system where gluster is up and running all the time. So there seems to be a problem that memory is dynamically allocated each time there is a write/read on the gluster volume, but it is not dynamically freed in runtime. The worry is at some point glusterfsd will completely use up all the memory on the system - might take a long time but this is not acceptable.

My steps are here:

root@board0:/tmp# gluster --version
glusterfs 5.4

root@board0:/tmp# gluster volume status gv0
Status of volume: gv0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick board0:/export/sdb1/brick 49152 0 Y 1702
Brick board1:/export/sdb1/brick 49152 0 Y 1652
Self-heal Daemon on localhost N/A N/A Y 1725
Self-heal Daemon on board1 N/A N/A Y 1675
Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks

root@board0:/tmp# jobs
[1]+ Running ./ps_mem.py -w 61 > /tmp/ps_mem.log & (wd: ~)

root@board0:/tmp# ps -ef | grep gluster
root 1608 1 0 May08 ? 00:00:04 /usr/sbin/glusterd -p /var/run/glusterd.pid
root 1702 1 0 May08 ? 00:00:14 /usr/sbin/glusterfsd -s board0 --volfile-id gv0.board0.export-sdb1-brick -p /var/run/gluster/vols/gv0/board0-export-sdb1-brick.pid -S /var/run/gluster/6c09da8ec6e017c8.socket --brick-name /export/sdb1/brick -l /var/log/glusterfs/bricks/export-sdb1-brick.log --xlator-option *-posix.glusterd-uuid=336dc4a8-1371-4366-b2f9-003c35e12ca1 --process-name brick --brick-port 49152 --xlator-option gv0-server.listen-port=49152
root 1725 1 0 May08 ? 00:00:03 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7ae70daf2745f7d4.socket --xlator-option replicate.node-uuid=336dc4a8-1371-4366-b2f9-003c35e12ca1 --process-name glustershd
root 3115 1241 0 03:00 ttyS0 00:00:00 grep gluster 

This is the cmd used to create the gluster volume:

gluster volume create gv0 replica 2 board0:/export/sdb1/brick board1:/export/sdb1/brick

And on client I do like:

mount -t glusterfs board0:gv0 /mnt

and then just run the one file each 15 min test

./test_v4.sh

To get the data, I run like this after some time:

grep glusterfsd ps_mem.log | awk '{ print $1 "," $4 "," $7 }' > gluster54-glusterfsd.csv

Then plot the points in excel

Comment 7 Abhishek 2019-06-12 14:53:12 UTC
Created attachment 1579854 [details]
statedumps log for 6.1

Comment 8 Abhishek 2019-06-12 14:54:43 UTC
Hi,

As requested we have checked tested the same on Glusterfs 6.1 and memory leak is present here as well.

Please check the "statedumps log for 6.1" in attachment for more details.

Comment 9 Abhishek 2019-06-14 10:15:53 UTC
Hi Team,

is there any update on this?

Regards,
Abhishek

Comment 10 hari gowtham 2019-06-14 11:17:32 UTC
(In reply to Abhishek from comment #9)
> Hi Team,
> 
> is there any update on this?
> 
> Regards,
> Abhishek

Hi Abhishek,

I'll take a look into it. 
This will take some time. 
Will update once I have made some progress.

Regards,
Hari.

Comment 11 ryan 2019-06-19 16:02:37 UTC
We are also seeing this issue, with memory slowly increasing over time until all system resources (SWAP and RAM) are exhausted.

Comment 12 Worker Ant 2019-06-21 05:25:29 UTC
REVIEW: https://review.gluster.org/22918 (Detach iot_worker to release its resources) posted (#1) for review on master by Liguang Li

Comment 13 Worker Ant 2019-07-11 12:25:13 UTC
REVISION POSTED: https://review.gluster.org/22918 (Detach iot_worker to release its resources) posted (#2) for review on master by N Balachandran

Comment 14 Worker Ant 2019-11-05 08:28:10 UTC
REVIEW: https://review.gluster.org/23657 (Detach iot_worker to release its resources) posted (#1) for review on release-5 by N Balachandran

Comment 15 Worker Ant 2019-11-06 11:03:48 UTC
REVIEW: https://review.gluster.org/23657 (Detach iot_worker to release its resources) merged (#1) on release-5 by N Balachandran


Note You need to log in before you can comment on or make changes to this bug.