Bug 1120570

Summary: glustershd high memory usage on FreeBSD
Product: [Community] GlusterFS
Reporter: Harshavardhana <fharshav>
Component: replicate
Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED WORKSFORME
Severity: medium
Priority: medium
Version: mainline
CC: amukherj, bugs, cww, ksubrahm, smohan
Keywords: Triaged
Hardware: x86_64
OS: FreeBSD
Doc Type: Bug Fix
Type: Bug
Last Closed: 2017-08-30 10:37:51 UTC
Attachments:
- Statedump memory-leak
- FreeBSD memory usage sampled in 1sec
- FreeBSD memory usage sampled in 10sec
- Valgrind output (beware, it's 3.2 GB!)
- Gluster Management Daemon logs

Description Harshavardhana 2014-07-17 07:56:14 UTC
Description of problem:
Memory leak in the self-heal daemon, brick, and NFS server processes on FreeBSD, with no data on the volume

Version-Release number of selected component (if applicable):
mainline

How reproducible:
Always

Steps to Reproduce:
1. Start glusterd.
2. Create and start a replicated volume.
3. The self-heal daemon's memory usage grows until the FreeBSD memory manager kills the process (see the command sketch below).

Actual results:
Memory leak

Expected results:
No memory leak
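
A minimal sketch of the reproduction and of how the attached memory samples could have been gathered, assuming a two-brick replica volume; the volume name, brick paths, and glustershd pidfile path are hypothetical and may differ on FreeBSD:

    # Create and start a two-brick replica volume (names and paths are examples only)
    gluster volume create testvol replica 2 node1:/data/brick1 node2:/data/brick2
    gluster volume start testvol

    # Sample the self-heal daemon's resident memory once per second
    SHD_PID=$(cat /var/lib/glusterd/glustershd/run/glustershd.pid)
    while true; do
        ps -o rss=,vsz= -p "$SHD_PID"
        sleep 1
    done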

Comment 1 Harshavardhana 2014-07-17 07:57:09 UTC
Created attachment 918627 [details]
Statedump memory-leak

Comment 2 Harshavardhana 2014-07-17 07:57:33 UTC
Created attachment 918628 [details]
FreeBSD memory usage sampled in 1sec

Comment 3 Harshavardhana 2014-07-17 07:57:57 UTC
Created attachment 918630 [details]
FreeBSD memory usage sampled in 10sec

Comment 4 Harshavardhana 2014-07-17 08:02:16 UTC
Created attachment 918632 [details]
Valgrind output (beware, it's 3.2 GB!)

Comment 5 krishnan parthasarathi 2014-07-17 10:41:00 UTC
I looked for the following things in the statedump for any memory allocation
related oddities.

1) grep "pool-misses" *dump*
This tells us if there were any objects whose allocated mem-pool wasn't sufficient for the load it was working under.
I see that the pool-misses are zero, which means the mem-pools we allocated are holding up well.

2) grep "hot-count" *dump*
This tells us the number of objects of each kind that were 'active' in the process when the statedump was taken, which lets us check whether the numbers are explicable.
I see that the maximum hot-count across the statedumps is 50, which is neither alarming nor indicative of an obvious memory leak.

The above observations indicate that either some object that is not mem-pool allocated is being leaked, or the statedump was taken 'prematurely', i.e. before the leak had reached an observably significant level.
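
For anyone retracing this, a rough sketch of how the statedumps and the two checks above can be run; the volume name, the pgrep pattern, and the dump directory (/var/run/gluster is the usual default) are assumptions:

    # Trigger statedumps: bricks via the CLI, the self-heal daemon via SIGUSR1
    gluster volume statedump testvol
    kill -USR1 "$(pgrep -f glustershd)"

    # Inspect the mem-pool counters in the generated dumps
    cd /var/run/gluster
    grep "pool-misses" *dump*    # non-zero would mean a mem-pool was undersized for its load
    grep "hot-count" *dump*      # objects of each type still active when the dump was taken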

Comment 6 Harshavardhana 2014-07-18 05:25:50 UTC
Created attachment 918968 [details]
Gluster Management Daemon logs

KP, 

I had added some debugging logs of my own while investigating the issue of the 'glustershd' pid showing as 'N/A'. You can observe RPC_CLNT_CONNECT without a matching RPC_CLNT_DISCONNECT; by the way, I do not always see RPC_CLNT_CONNECT firing in a loop.

Even without any logging, memory does increase. In fact, the strange part is that sometimes there is no RPC_CLNT_CONNECT notification from the self-heal daemon at all. If I kill the self-heal daemon, the brick memory utilization that was climbing stops growing; enable the self-heal daemon again and brick memory usage climbs back up.

So we have in fact identified two issues here:

- 'gluster volume status' reports N/A for a running glustershd process: glustershd does not register itself with the Gluster management daemon on volume start, but a restart of the Gluster management daemon fixes it.
- glustershd also goes OOM on FreeBSD. In fact, the issue that appeared fixed in the previous run is back again; in this case I do see an RPC_CLNT_DISCONNECT. Now restarting the volume does not fix the N/A issue in volume status; you have to restart the Gluster management daemon again.

And this cycle continues!
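
A sketch of the toggling experiment described above, with a hypothetical volume name and pgrep pattern; cluster.self-heal-daemon is the usual volume option for stopping and starting glustershd without touching the bricks:

    # Stop the self-heal daemon and watch the brick's resident memory stall
    gluster volume set testvol cluster.self-heal-daemon off
    ps -o rss= -p "$(pgrep -f /data/brick1)"

    # Re-enable it and watch the brick's memory start climbing again
    gluster volume set testvol cluster.self-heal-daemon on
    ps -o rss= -p "$(pgrep -f /data/brick1)"

    # Check whether glustershd still shows up as N/A in volume status
    gluster volume status testvol shd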

Comment 8 Ravishankar N 2016-02-09 13:06:11 UTC
Assigning it to the maintainer.

Comment 9 Karthik U S 2017-08-30 10:37:51 UTC
Closing this bug as it was filed against a version which is now EOL. Newer releases are available with fixes for memory leaks. If you are able to hit this issue on the latest maintained release, please feel free to reopen the bug against that version.