Bug 1285998 - Possible memory leak in the tiered daemon
Possible memory leak in the tiered daemon
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: tier (Show other bugs)
Unspecified Unspecified
unspecified Severity unspecified
: ---
: RHGS 3.1.2
Assigned To: Joseph Elwin Fernandes
: ZStream
Depends On:
Blocks: 1260783 1288019 1288992
  Show dependency treegraph
Reported: 2015-11-27 02:56 EST by RajeshReddy
Modified: 2016-09-17 11:38 EDT (History)
10 users (show)

See Also:
Fixed In Version: glusterfs-3.7.5-10
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1288019 (view as bug list)
Last Closed: 2016-03-01 01:03:20 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)

  None (edit)
Description RajeshReddy 2015-11-27 02:56:41 EST
Description of problem:
Possible memory leak in the tiered daemon 

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create 1x2 volume and attach 1x2 hot bricks 
2. Mount it on client on fuse and create a directory and using fallowing script modify files continuously

for i in {1..100}
for j in {1..1000000}
echo $i $j >> /mnt/afr2x2_tier/data3/file$j

3. Tiered daemon keep on increasing the consumed memory and end up consumed 11.2 % of total memory 

Actual results:

Expected results:

Additional info:
[root@rhs-client19 afr2x2_tier]# gluster vol info afr2x2_tier
Volume Name: afr2x2_tier
Type: Tier
Volume ID: e8d8466d-4883-465c-868d-fd4330e6049e
Status: Started
Number of Bricks: 4
Transport-type: tcp
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick1: rhs-client19.lab.eng.blr.redhat.com:/rhs/brick6/tier1
Brick2: rhs-client18.lab.eng.blr.redhat.com:/rhs/brick6/tier1
Cold Tier:
Cold Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick3: rhs-client18.lab.eng.blr.redhat.com:/rhs/brick7/afr2x2_tier
Brick4: rhs-client19.lab.eng.blr.redhat.com:/rhs/brick7/afr2x2_tier
Options Reconfigured:
cluster.tier-mode: test
cluster.metadata-self-heal: off
cluster.data-self-heal: off
cluster.self-heal-daemon: off
features.ctr-enabled: on
performance.readdir-ahead: on
cluster.entry-self-heal: off

Using fallowing script i keep on collected the memory consumed by tiered daemon 

for i in {1..10000}
top -p 32118,32127,32237 >>  /root/scripts/afr2x2_tier/mem.txt
sleep 500
Comment 5 RajeshReddy 2015-12-01 02:25:46 EST
Volume is mounted on client using fuse and collected volume brick process and tier daemon process CPU and Memory usage using top and out put of the same is available @/home/repo/sosreports/bug.1285998 on rhsqe-repo.lab.eng.blr.redhat.com
Comment 6 Joseph Elwin Fernandes 2015-12-01 03:45:56 EST
I see a memory increase for migration of 1000 files 


  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND        
23419 root      20   0  531032 188712  10692 S   0.3  4.7  12:06.24 memcheck-amd64-


  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ 
23419 root      20   0  545576 203796  10692 S  91.7  5.0  17:23.31 memcheck-amd64-  

Plus while the tier migrator was run with valgrind

valgrind --leak-check=full --log-file=/root/tier-valgrind.log /usr/local/sbin/glusterfs -s localhost --volfile-id rebalance/test --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on --xlator-option *tier-dht.xattr-name=trusted.tier.tier-dht --xlator-option *dht.rebalance-cmd=6 --xlator-option *dht.node-uuid=6174cfd6-8978-45f8-b2eb-6ffebc50fc29 --xlator-option *dht.commit-hash=3001687234 --socket-file /var/run/gluster/gluster-tier-b05bfe8d-939a-4111-b221-ca01e91753bc.sock --pid-file /var/lib/glusterd/vols/test/tier/6174cfd6-8978-45f8-b2eb-6ffebc50fc29.pid -l /var/log/glusterfs/test-tier.log

Saw continues errors in valgrind log "/root/tier-valgrind.log"

==23419== Invalid write of size 8
==23419==    at 0x4E87224: __gf_mem_invalidate (mem-pool.c:277)
==23419==    by 0x4E874EA: __gf_free (mem-pool.c:333)
==23419==    by 0x4E9A966: synctask_destroy (syncop.c:391)
==23419==    by 0x4E9A9EA: synctask_done (syncop.c:409)
==23419==    by 0x4E9B323: synctask_switchto (syncop.c:668)
==23419==    by 0x4E9B3DB: syncenv_processor (syncop.c:699)
==23419==    by 0x608A529: start_thread (in /usr/lib64/libpthread-2.20.so)
==23419==    by 0x678C22C: clone (in /usr/lib64/libc-2.20.so)
==23419==  Address 0x1a5741a9 is 2,072,937 bytes inside a block of size 2,097,224 alloc'd
==23419==    at 0x4C2B9C6: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==23419==    by 0x4E86CAE: __gf_calloc (mem-pool.c:117)
==23419==    by 0x4E9ADE1: synctask_create (syncop.c:497)
==23419==    by 0x4E9B020: synctask_new1 (syncop.c:571)
==23419==    by 0x4E9B090: synctask_new (syncop.c:586)
==23419==    by 0x1284CB5A: dht_rebalance_complete_check (dht-helper.c:1116)
==23419==    by 0x1287FE0B: dht_file_setxattr_cbk (dht-common.c:3455)
==23419==    by 0x125ED3F2: afr_fsetxattr_unwind (afr-inode-write.c:1575)
==23419==    by 0x125E65FA: __afr_inode_write_cbk (afr-inode-write.c:171)
==23419==    by 0x125ED480: afr_fsetxattr_wind_cbk (afr-inode-write.c:1585)
==23419==    by 0x123A4530: client3_3_fsetxattr_cbk (client-rpc-fops.c:1979)
==23419==    by 0x51271AE: rpc_clnt_handle_reply (rpc-clnt.c:759)
Comment 7 Joseph Elwin Fernandes 2015-12-02 09:41:03 EST
After analyzing the statedumps found that the leak is in inode context variable of tier xlator.

This is the setup in short 
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4

Create 5 files in hot tier

This is the statedump output for inode context variable usage 

[cluster/tier.test-tier-dht - usage-type gf_dht_mt_inode_ctx_t memusage]
num_allocs=1  ----------------------> current allocations 

Demoted the 5 file, and this is the output of statedump

[cluster/tier.test-tier-dht - usage-type gf_dht_mt_inode_ctx_t memusage]
num_allocs=6  -----------------------------> current allocations

Observe its bumped up by 5 

Promoted the files again 

[cluster/tier.test-tier-dht - usage-type gf_dht_mt_inode_ctx_t memusage]
num_allocs=11  -----------------------------> current allocations

Observe its bumped up by 5 again.

Inference: The Leak is in the tier xlator of the tier migration process and the leak is of inode ctx variables.

Had a discussion with Nithya and she suggested to reset inode context after the migration of the file in the tier-migration.
Comment 8 Joseph Elwin Fernandes 2015-12-03 05:23:32 EST
Found another mem-leak. This happens because a response dict from parent inode lookup is not freed.
Comment 9 Joseph Elwin Fernandes 2015-12-03 05:28:40 EST
As far as inode_ctx variable I see allocation through this path

(gdb) bt
#0  dht_inode_ctx_layout_set (inode=0x7f9c4000136c, this=0x7f9c7001b630, layout_int=0x7f9c400037b0)
    at dht-helper.c:1359
#1  0x00007f9c75e7d326 in dht_layout_set (this=0x7f9c7001b630, inode=0x7f9c4000136c, layout=0x7f9c400037b0)
    at dht-layout.c:104
#2  0x00007f9c75ea4882 in dht_discover_complete (this=0x7f9c7001b630, discover_frame=0x7f9c40003fdc)
    at dht-common.c:297
#3  0x00007f9c75ea58d0 in dht_discover_cbk (frame=0x7f9c40003fdc, cookie=0x7f9c4000a68c, this=0x7f9c7001b630, 
    op_ret=0, op_errno=117, inode=0x7f9c4000136c, stbuf=0x7f9c4000f724, xattr=0x7f9c700016dc, 
    postparent=0x7f9c4000f954) at dht-common.c:507
#4  0x00007f9c7613386d in dht_discover_complete (this=0x7f9c7001a7a0, discover_frame=0x7f9c4000a78c)
    at dht-common.c:366
#5  0x00007f9c76134190 in dht_discover_cbk (frame=0x7f9c4000a78c, cookie=0x7f9c4000ab7c, this=0x7f9c7001a7a0, 
    op_ret=0, op_errno=117, inode=0x7f9c4000136c, stbuf=0x7f9c40013ac0, xattr=0x7f9c700639bc, 
    postparent=0x7f9c40013b30) at dht-common.c:507
#6  0x00007f9c763dcc98 in afr_discover_done (frame=0x7f9c4000ab7c, this=0x7f9c70019920) at afr-common.c:2162
#7  0x00007f9c763dcfdd in afr_discover_cbk (frame=0x7f9c4000ab7c, cookie=0x1, this=0x7f9c70019920, op_ret=0, 
    op_errno=22, inode=0x7f9c4000136c, buf=0x7f9c67ffe970, xdata=0x7f9c6003cc3c, postparent=0x7f9c67ffe900)
    at afr-common.c:2207
#8  0x00007f9c766222e6 in client3_3_lookup_cbk (req=0x7f9c40015a9c, iov=0x7f9c40015adc, count=1, 
    myframe=0x7f9c400149bc) at client-rpc-fops.c:2976
#9  0x00007f9c834aa1af in rpc_clnt_handle_reply (clnt=0x7f9c70056990, pollin=0x7f9c60006cc0) at rpc-clnt.c:759
#10 0x00007f9c834aa625 in rpc_clnt_notify (trans=0x7f9c70056e20, mydata=0x7f9c700569c0, 
    event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f9c60006cc0) at rpc-clnt.c:900
#11 0x00007f9c834a6a1b in rpc_transport_notify (this=0x7f9c70056e20, event=RPC_TRANSPORT_MSG_RECEIVED, 
    data=0x7f9c60006cc0) at rpc-transport.c:541
#12 0x00007f9c78afbb3c in socket_event_poll_in (this=0x7f9c70056e20) at socket.c:2231
#13 0x00007f9c78afc06a in socket_event_handler (fd=12, idx=9, data=0x7f9c70056e20, poll_in=1, poll_out=0, 
    poll_err=0) at socket.c:2344
#14 0x00007f9c8373e79c in event_dispatch_epoll_handler (event_pool=0x1329d70, event=0x7f9c67ffeea0)
    at event-epoll.c:571
---Type <return> to continue, or q <return> to quit---
Comment 10 Joseph Elwin Fernandes 2015-12-03 05:48:08 EST
Fix for response dict is posted upstream

Comment 11 Joseph Elwin Fernandes 2015-12-03 06:56:15 EST
Tested the new fix with 30000 files demoted and promoted. No effect on the memory stays steady 0.5% tp 0.6% percent before and after the migration.
Comment 12 Joseph Elwin Fernandes 2015-12-07 03:06:18 EST
Comment 13 RajeshReddy 2015-12-14 01:30:50 EST
Created 50k files and did continuous IO to the files for two days and observed Memory and CPU consumption and  there is no spike in memory so marking this bug as verified
Comment 15 errata-xmlrpc 2016-03-01 01:03:20 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.


Note You need to log in before you can comment on or make changes to this bug.