Bug 1120570 - glustershd high memory usage on FreeBSD
Summary: glustershd high memory usage on FreeBSD
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: x86_64
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
Depends On:
Reported: 2014-07-17 07:56 UTC by Harshavardhana
Modified: 2017-08-30 10:37 UTC (History)
5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2017-08-30 10:37:51 UTC
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:

Attachments
Statedump memory-leak (6.45 KB, application/gzip)
2014-07-17 07:57 UTC, Harshavardhana
FreeBSD memory usage sampled in 1sec (17.00 KB, image/png)
2014-07-17 07:57 UTC, Harshavardhana
FreeBSD memory usage sampled in 10sec (17.08 KB, image/png)
2014-07-17 07:57 UTC, Harshavardhana
Valgrind output (beware, it's 3.2 GB!) (12.46 MB, application/gzip)
2014-07-17 08:02 UTC, Harshavardhana
Gluster Management Daemon logs (13.71 KB, application/gzip)
2014-07-18 05:25 UTC, Harshavardhana

Description Harshavardhana 2014-07-17 07:56:14 UTC
Description of problem:
Memory leak in the self-heal daemon, brick, and NFS server processes on FreeBSD, even with no data on the volume.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Start glusterd
2. Start/Create a replicated volume
3. The self-heal daemon uses a lot of memory and the FreeBSD memory manager kills the process
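The steps above can be sketched as shell commands. This is only an illustrative reproduction recipe: the volume name, node names, and brick paths are made up, the FreeBSD `service` invocation is one of several ways to start glusterd, and the sketch bails out gracefully where the gluster CLI is not installed.

```shell
#!/bin/sh
# Hypothetical reproduction sketch; volume name, node names, and
# brick paths below are illustrative, not from the bug report.
repro() {
    # Bail out gracefully if the gluster CLI is not installed.
    command -v gluster >/dev/null 2>&1 || {
        echo "gluster CLI not found; skipping"
        return 0
    }
    service glusterd onestart                  # 1. start glusterd (FreeBSD rc)
    gluster volume create repvol replica 2 \
        node1:/data/brick1 node2:/data/brick2  # 2. create a replicated volume
    gluster volume start repvol
    # 3. watch the self-heal daemon's memory (RSS, in KB) climb:
    ps -ax -o rss,command | grep '[g]lustershd'
}
repro
```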

Actual results:
Memory leak

Expected results:
No memory leak

Comment 1 Harshavardhana 2014-07-17 07:57:09 UTC
Created attachment 918627 [details]
Statedump memory-leak

Comment 2 Harshavardhana 2014-07-17 07:57:33 UTC
Created attachment 918628 [details]
FreeBSD memory usage sampled in 1sec

Comment 3 Harshavardhana 2014-07-17 07:57:57 UTC
Created attachment 918630 [details]
FreeBSD memory usage sampled in 10sec

Comment 4 Harshavardhana 2014-07-17 08:02:16 UTC
Created attachment 918632 [details]
Valgrind output (beware, it's 3.2 GB!)

Comment 5 krishnan parthasarathi 2014-07-17 10:41:00 UTC
I looked through the statedumps for any memory-allocation-related oddities, checking the following:

1) grep "pool-misses" *dump*
This tells us whether there were any objects whose allocated mem-pool wasn't sufficient for the load it was working under.
I see that the pool-misses are all zero, which means the mem-pools we allocated are holding up well.

2) grep "hot-count" *dump*
This tells us the number of objects of each kind that were 'active' in the process when the statedump was taken. This should allow us to judge whether the numbers we see are explicable.
I see that the maximum hot-count across the process statedumps is 50, which isn't alarming and doesn't point to any obvious memory leak.

The above observations indicate that either some object that is not mem-pool-allocated is being leaked, or the statedump was taken 'prematurely', i.e. the memory leak had not yet reached observably significant levels.
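The two checks above can be run directly against the statedump files. The snippet below demonstrates them on a small synthetic statedump fragment; the field names (`pool-misses`, `hot-count`) match real glusterfs statedumps, but the file contents here are made up for illustration.

```shell
#!/bin/sh
# Synthetic statedump fragment; the field names follow real glusterfs
# statedumps, but the values are invented for this demo.
dump=$(mktemp)
cat > "$dump" <<'EOF'
[mempool]
pool-name=fuse:fd_t
hot-count=50
cold-count=974
pool-misses=0
EOF

# 1) A non-zero pool-misses would mean an undersized mem-pool.
grep "pool-misses" "$dump"

# 2) hot-count = objects of this kind 'active' when the dump was taken.
hot=$(sed -n 's/^hot-count=//p' "$dump")
echo "max hot-count: $hot"

rm -f "$dump"
```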

Comment 6 Harshavardhana 2014-07-18 05:25:50 UTC
Created attachment 918968 [details]
Gluster Management Daemon logs


I had some debugging logs of my own from investigating the issue of the 'glustershd' pid showing as 'N/A'. You can observe RPC_CLNT_CONNECT without a matching RPC_CLNT_DISCONNECT; by the way, I do not always see RPC_CLNT_CONNECT in a loop.

But even without any logs, memory does increase. In fact, the strange part is that sometimes there is no RPC_CLNT_CONNECT notification from the self-heal daemon at all. If I kill the self-heal daemon, the brick memory utilization that was climbing stops increasing; enable the self-heal daemon again, and brick memory usage climbs back up.

So we in fact identified two issues here:

- 'N/A' shown for a running glustershd process in volume status: glustershd doesn't register itself with the Gluster management daemon on volume start, but a restart of the Gluster management daemon fixes it.
- glustershd also going OOM on FreeBSD. In fact, the issue fixed in the previous run is back again; in this case I do see an RPC_CLNT_DISCONNECT. Now restarting the volume doesn't fix the 'N/A' issue in volume status; you have to restart the Gluster management daemon again.

And this cycle continues!
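The climbing memory described above can be tracked with a trivial per-second sampler, in the spirit of the FreeBSD memory-usage graphs attached earlier. The pid and sample count below are placeholders; the script samples its own shell as a stand-in, whereas for the real case you would point it at the glustershd or brick pid.

```shell
#!/bin/sh
# Sample a process's resident set size (RSS, in KB) once per second.
# $$ (this shell) is a stand-in pid; substitute the glustershd pid
# to reproduce the attached memory graphs.
pid=$$
samples=3
i=0
while [ "$i" -lt "$samples" ]; do
    rss=$(ps -o rss= -p "$pid" | tr -d ' ')
    echo "$(date +%T) rss_kb=$rss"
    i=$((i + 1))
    sleep 1
done
```

`ps -o rss=` works on both FreeBSD and Linux, so the same sketch serves for cross-checking the platforms.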

Comment 8 Ravishankar N 2016-02-09 13:06:11 UTC
Assigning it to the maintainer.

Comment 9 Karthik U S 2017-08-30 10:37:51 UTC
Closing this bug as it was filed against a version which is now EOL. We have new releases available with fixes for the memory leaks. If you are able to hit this issue on the latest maintained release, please feel free to reopen the bug against that version.
