Description of problem:
=====================
Possible memory leak in the tiered daemon.

Version-Release number of selected component (if applicable):
================
glusterfs-server-3.7.5-7

How reproducible:

Steps to Reproduce:
============
1. Create a 1x2 volume and attach 1x2 hot bricks.
2. Mount the volume on a client over FUSE, create a directory, and use the following script to modify files continuously:

for i in {1..100}
do
  for j in {1..1000000}
  do
    echo $i $j >> /mnt/afr2x2_tier/data3/file$j
  done
done

3. The tier daemon keeps increasing its memory consumption and ends up consuming 11.2% of total memory.

Actual results:

Expected results:

Additional info:
===========
[root@rhs-client19 afr2x2_tier]# gluster vol info afr2x2_tier

Volume Name: afr2x2_tier
Type: Tier
Volume ID: e8d8466d-4883-465c-868d-fd4330e6049e
Status: Started
Number of Bricks: 4
Transport-type: tcp
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick1: rhs-client19.lab.eng.blr.redhat.com:/rhs/brick6/tier1
Brick2: rhs-client18.lab.eng.blr.redhat.com:/rhs/brick6/tier1
Cold Tier:
Cold Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick3: rhs-client18.lab.eng.blr.redhat.com:/rhs/brick7/afr2x2_tier
Brick4: rhs-client19.lab.eng.blr.redhat.com:/rhs/brick7/afr2x2_tier
Options Reconfigured:
cluster.tier-mode: test
cluster.metadata-self-heal: off
cluster.data-self-heal: off
cluster.self-heal-daemon: off
features.ctr-enabled: on
performance.readdir-ahead: on
cluster.entry-self-heal: off

The memory consumed by the tier daemon was collected continuously using the following script:

for i in {1..10000}
do
  top -p 32118,32127,32237 >> /root/scripts/afr2x2_tier/mem.txt
  sleep 500
done
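As an alternative to the top-based collection above, the following is a minimal sketch for sampling the tier daemon's resident memory directly; the volume name and the pid-file location (which follows the layout seen later in this report) are assumptions about this particular setup:

#!/bin/bash
# Sketch: sample the tier daemon's RSS every 5 minutes.
# Assumes the tier daemon pid file lives under
# /var/lib/glusterd/vols/<volname>/tier/ as on this setup.
VOL=afr2x2_tier
PIDFILE=$(ls /var/lib/glusterd/vols/$VOL/tier/*.pid 2>/dev/null | head -1)
PID=$(cat "$PIDFILE")

for i in {1..1000}
do
    # Timestamp plus resident set size in kB.
    echo "$(date +%s) $(ps -o rss= -p "$PID")" >> /root/scripts/$VOL/rss.txt
    sleep 300
done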
The volume is mounted on the client using FUSE. CPU and memory usage of the brick processes and the tier daemon process was collected using top; the output is available at /home/repo/sosreports/bug.1285998 on rhsqe-repo.lab.eng.blr.redhat.com.
I see a memory increase during migration of 1000 files.

From:
  PID USER  PR NI   VIRT    RES   SHR S %CPU %MEM   TIME+   COMMAND
23419 root  20  0 531032 188712 10692 S  0.3  4.7 12:06.24 memcheck-amd64-

to:
  PID USER  PR NI   VIRT    RES   SHR S %CPU %MEM   TIME+   COMMAND
23419 root  20  0 545576 203796 10692 S 91.7  5.0 17:23.31 memcheck-amd64-

In addition, the tier migrator was run under valgrind:

valgrind --leak-check=full --log-file=/root/tier-valgrind.log /usr/local/sbin/glusterfs -s localhost --volfile-id rebalance/test --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on --xlator-option *tier-dht.xattr-name=trusted.tier.tier-dht --xlator-option *dht.rebalance-cmd=6 --xlator-option *dht.node-uuid=6174cfd6-8978-45f8-b2eb-6ffebc50fc29 --xlator-option *dht.commit-hash=3001687234 --socket-file /var/run/gluster/gluster-tier-b05bfe8d-939a-4111-b221-ca01e91753bc.sock --pid-file /var/lib/glusterd/vols/test/tier/6174cfd6-8978-45f8-b2eb-6ffebc50fc29.pid -l /var/log/glusterfs/test-tier.log

Continuous errors were seen in the valgrind log "/root/tier-valgrind.log":

==23419== Invalid write of size 8
==23419==    at 0x4E87224: __gf_mem_invalidate (mem-pool.c:277)
==23419==    by 0x4E874EA: __gf_free (mem-pool.c:333)
==23419==    by 0x4E9A966: synctask_destroy (syncop.c:391)
==23419==    by 0x4E9A9EA: synctask_done (syncop.c:409)
==23419==    by 0x4E9B323: synctask_switchto (syncop.c:668)
==23419==    by 0x4E9B3DB: syncenv_processor (syncop.c:699)
==23419==    by 0x608A529: start_thread (in /usr/lib64/libpthread-2.20.so)
==23419==    by 0x678C22C: clone (in /usr/lib64/libc-2.20.so)
==23419==  Address 0x1a5741a9 is 2,072,937 bytes inside a block of size 2,097,224 alloc'd
==23419==    at 0x4C2B9C6: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==23419==    by 0x4E86CAE: __gf_calloc (mem-pool.c:117)
==23419==    by 0x4E9ADE1: synctask_create (syncop.c:497)
==23419==    by 0x4E9B020: synctask_new1 (syncop.c:571)
==23419==    by 0x4E9B090: synctask_new (syncop.c:586)
==23419==    by 0x1284CB5A: dht_rebalance_complete_check (dht-helper.c:1116)
==23419==    by 0x1287FE0B: dht_file_setxattr_cbk (dht-common.c:3455)
==23419==    by 0x125ED3F2: afr_fsetxattr_unwind (afr-inode-write.c:1575)
==23419==    by 0x125E65FA: __afr_inode_write_cbk (afr-inode-write.c:171)
==23419==    by 0x125ED480: afr_fsetxattr_wind_cbk (afr-inode-write.c:1585)
==23419==    by 0x123A4530: client3_3_fsetxattr_cbk (client-rpc-fops.c:1979)
==23419==    by 0x51271AE: rpc_clnt_handle_reply (rpc-clnt.c:759)
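As a quick way to pull the leak data out of the log referenced above (a sketch; the path is simply the one passed to --log-file):

# Leak summary printed at the end of the run:
grep -A8 "LEAK SUMMARY" /root/tier-valgrind.log
# Individual "definitely lost" loss records with their allocation stacks:
grep -A15 "definitely lost in loss record" /root/tier-valgrind.log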
After analyzing the statedumps, I found that the leak is in the inode context variable of the tier xlator.

This is the setup in short:

Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4

Created 5 files in the hot tier. This is the statedump output for inode context variable usage:

[cluster/tier.test-tier-dht - usage-type gf_dht_mt_inode_ctx_t memusage]
size=32
num_allocs=1   ----------------------> current allocations
max_size=96
max_num_allocs=3
total_allocs=3

Demoted the 5 files, and this is the output of the statedump:

[cluster/tier.test-tier-dht - usage-type gf_dht_mt_inode_ctx_t memusage]
size=192
num_allocs=6   -----------------------------> current allocations
max_size=224
max_num_allocs=7
total_allocs=28

Observe that it is bumped up by 5.

Promoted the files again:

[cluster/tier.test-tier-dht - usage-type gf_dht_mt_inode_ctx_t memusage]
size=352
num_allocs=11   -----------------------------> current allocations
max_size=384
max_num_allocs=12
total_allocs=53

Observe that it is bumped up by 5 again.

Inference: the leak is in the tier xlator of the tier migration process, and the leaked objects are inode ctx variables.

Solution: had a discussion with Nithya, and she suggested resetting the inode context after the migration of the file in the tier migrator.
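For reference, a minimal sketch of how such a statedump can be captured from the tier migration process and the relevant accounting extracted; the pid-file path, the default dump directory /var/run/gluster, and the dump file naming are assumptions about this setup and version:

# Sketch: dump the tier daemon's state and grep the tier-dht inode ctx usage.
TIER_PID=$(cat /var/lib/glusterd/vols/test/tier/*.pid)

# A glusterfs process writes a statedump when it receives SIGUSR1.
kill -USR1 "$TIER_PID"
sleep 2

# Pick the newest dump for that pid (filename format may differ between versions).
DUMP=$(ls -t /var/run/gluster/*."$TIER_PID".dump.* 2>/dev/null | head -1)
grep -A5 "gf_dht_mt_inode_ctx_t" "$DUMP"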
Found another memory leak: the response dict from the parent inode lookup is not freed.
As far as the inode_ctx variable is concerned, I see allocation through this path:

(gdb) bt
#0  dht_inode_ctx_layout_set (inode=0x7f9c4000136c, this=0x7f9c7001b630, layout_int=0x7f9c400037b0) at dht-helper.c:1359
#1  0x00007f9c75e7d326 in dht_layout_set (this=0x7f9c7001b630, inode=0x7f9c4000136c, layout=0x7f9c400037b0) at dht-layout.c:104
#2  0x00007f9c75ea4882 in dht_discover_complete (this=0x7f9c7001b630, discover_frame=0x7f9c40003fdc) at dht-common.c:297
#3  0x00007f9c75ea58d0 in dht_discover_cbk (frame=0x7f9c40003fdc, cookie=0x7f9c4000a68c, this=0x7f9c7001b630, op_ret=0, op_errno=117, inode=0x7f9c4000136c, stbuf=0x7f9c4000f724, xattr=0x7f9c700016dc, postparent=0x7f9c4000f954) at dht-common.c:507
#4  0x00007f9c7613386d in dht_discover_complete (this=0x7f9c7001a7a0, discover_frame=0x7f9c4000a78c) at dht-common.c:366
#5  0x00007f9c76134190 in dht_discover_cbk (frame=0x7f9c4000a78c, cookie=0x7f9c4000ab7c, this=0x7f9c7001a7a0, op_ret=0, op_errno=117, inode=0x7f9c4000136c, stbuf=0x7f9c40013ac0, xattr=0x7f9c700639bc, postparent=0x7f9c40013b30) at dht-common.c:507
#6  0x00007f9c763dcc98 in afr_discover_done (frame=0x7f9c4000ab7c, this=0x7f9c70019920) at afr-common.c:2162
#7  0x00007f9c763dcfdd in afr_discover_cbk (frame=0x7f9c4000ab7c, cookie=0x1, this=0x7f9c70019920, op_ret=0, op_errno=22, inode=0x7f9c4000136c, buf=0x7f9c67ffe970, xdata=0x7f9c6003cc3c, postparent=0x7f9c67ffe900) at afr-common.c:2207
#8  0x00007f9c766222e6 in client3_3_lookup_cbk (req=0x7f9c40015a9c, iov=0x7f9c40015adc, count=1, myframe=0x7f9c400149bc) at client-rpc-fops.c:2976
#9  0x00007f9c834aa1af in rpc_clnt_handle_reply (clnt=0x7f9c70056990, pollin=0x7f9c60006cc0) at rpc-clnt.c:759
#10 0x00007f9c834aa625 in rpc_clnt_notify (trans=0x7f9c70056e20, mydata=0x7f9c700569c0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f9c60006cc0) at rpc-clnt.c:900
#11 0x00007f9c834a6a1b in rpc_transport_notify (this=0x7f9c70056e20, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f9c60006cc0) at rpc-transport.c:541
#12 0x00007f9c78afbb3c in socket_event_poll_in (this=0x7f9c70056e20) at socket.c:2231
#13 0x00007f9c78afc06a in socket_event_handler (fd=12, idx=9, data=0x7f9c70056e20, poll_in=1, poll_out=0, poll_err=0) at socket.c:2344
#14 0x00007f9c8373e79c in event_dispatch_epoll_handler (event_pool=0x1329d70, event=0x7f9c67ffeea0) at event-epoll.c:571
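For completeness, a sketch of how a live backtrace like the one above can be captured from the running tier migration process; it assumes gdb with glusterfs debuginfo installed and the pid-file path used on this setup:

# Attach to the tier daemon and break on the inode-ctx layout setter.
TIER_PID=$(cat /var/lib/glusterd/vols/test/tier/*.pid)

gdb -p "$TIER_PID" \
    -ex 'break dht_inode_ctx_layout_set' \
    -ex 'continue' \
    -ex 'bt' \
    -ex 'detach' \
    -ex 'quit'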
The fix for the response dict leak is posted upstream: http://review.gluster.org/12867
Tested the new fix with 30000 files demoted and promoted. No effect on memory: it stays steady at 0.5% to 0.6% before and after the migration.
https://code.engineering.redhat.com/gerrit/#/c/63125/
Created 50k files and performed continuous IO to them for two days while observing memory and CPU consumption. There was no spike in memory, so marking this bug as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0193.html