Red Hat Bugzilla – Bug 848341
glusterfs process is taking some 70% memory usage after some stress testing.
Last modified: 2013-09-23 18:36:19 EDT
+++ This bug was initially created as a clone of Bug #809063 +++
Created attachment 574469 [details]
glusterfs process statedump.
Description of problem:
statedump output -
--- Additional comment from firstname.lastname@example.org on 2012-04-11 07:20:19 EDT ---
Mostly it looks like some fragmentation, judging by uordblks and ordblks in the statedump.
--- Additional comment from email@example.com on 2012-04-19 03:39:39 EDT ---
The next time you run this set of tests, please run them through Valgrind, so we can capture the leaks well.
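That suggestion can be sketched as a command line; the server host, volume name, and mount point below are placeholders, not taken from this report:

```shell
# Run the glusterfs client in the foreground under valgrind.
# -N/--no-daemon keeps the process from forking away from valgrind.
valgrind --leak-check=full --show-reachable=yes \
    --log-file=/tmp/glusterfs-valgrind.%p.log \
    glusterfs -N --volfile-server=server1 --volfile-id=testvol /mnt/testvol
```

At unmount/exit, valgrind writes a per-PID log with "definitely lost" and "still reachable" summaries, which distinguishes true leaks from cached allocations.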
One possibility is quick-read's dictionary getting cached in md-cache, which can lead to a huge leak (Thanks to Brian Foster/Avati on the md-cache/quick-read causing memory consumption)
--- Additional comment from firstname.lastname@example.org on 2012-04-19 03:42:24 EDT ---
> One of the possibility is quick-read's dictionary getting cached in md-cache,
> which can lead to a huge leak (Thanks to Brian Foster/Avati on the
> md-cache/quick-read causing memory consumption)
Thanks to Brian Foster and Avati for *finding* the memory consumption issue when md-cache and quick-read are used together.
--- Additional comment from email@example.com on 2012-05-04 03:01:42 EDT ---
Taking this out of Beta Blocker, considering the multiple patches that have gone in to fix the obvious memory leaks. The only serious pending task is to handle the md-cache/quick-read memory consumption behavior, for which Brian Foster has already sent a patch.
This bug is not seen in the current master branch (which will be branched as RHS 2.1.0 soon). To consider it for fixing, we want to make sure this bug still exists on RHS servers. If it cannot be reproduced, I would like to close this.
After a couple of days of stress testing, including:
* multiple kernel compiles.
* large directory tree copying
* metadata intensive workloads
* rsync huge directory trees
* rename tests in a loop
I'm not able to hit the issue; I will continue the tests for a couple more days before I mark this as fixed.
After multiple stress tests, the memory consumption doesn't go up drastically; it stays at a few megabytes.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.