Description of problem:
=====================
Possible memory leak in the tiered daemon.

Version-Release number of selected component (if applicable):
================
glusterfs-server-3.7.5-7

How reproducible:

Steps to Reproduce:
============
1. Create a 1x2 volume and attach 1x2 hot bricks.
2. Mount the volume on a client over FUSE, create a directory, and use the following script to modify files continuously:

for i in {1..100}
do
  for j in {1..1000000}
  do
    echo $i $j >> /mnt/afr2x2_tier/data3/file$j
  done
done

3. The tier daemon keeps increasing its memory consumption and ends up consuming 11.2% of total memory.

Actual results:

Expected results:

Additional info:
===========
[root@rhs-client19 afr2x2_tier]# gluster vol info afr2x2_tier

Volume Name: afr2x2_tier
Type: Tier
Volume ID: e8d8466d-4883-465c-868d-fd4330e6049e
Status: Started
Number of Bricks: 4
Transport-type: tcp
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick1: rhs-client19.lab.eng.blr.redhat.com:/rhs/brick6/tier1
Brick2: rhs-client18.lab.eng.blr.redhat.com:/rhs/brick6/tier1
Cold Tier:
Cold Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick3: rhs-client18.lab.eng.blr.redhat.com:/rhs/brick7/afr2x2_tier
Brick4: rhs-client19.lab.eng.blr.redhat.com:/rhs/brick7/afr2x2_tier
Options Reconfigured:
cluster.tier-mode: test
cluster.metadata-self-heal: off
cluster.data-self-heal: off
cluster.self-heal-daemon: off
features.ctr-enabled: on
performance.readdir-ahead: on
cluster.entry-self-heal: off

The memory consumed by the tier daemon was collected continuously using the following script:

for i in {1..10000}
do
  top -p 32118,32127,32237 >> /root/scripts/afr2x2_tier/mem.txt
  sleep 500
done
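As an alternative to the top-based collection above, the following is a minimal sketch for sampling the tier daemon's resident memory directly; the volume name and the pid-file location (which follows the layout seen later in this report) are assumptions about this particular setup:

#!/bin/bash
# Sketch: sample the tier daemon's RSS every 5 minutes.
# Assumes the tier daemon pid file lives under
# /var/lib/glusterd/vols/<volname>/tier/ as on this setup.
VOL=afr2x2_tier
PIDFILE=$(ls /var/lib/glusterd/vols/$VOL/tier/*.pid 2>/dev/null | head -1)
PID=$(cat "$PIDFILE")

for i in {1..1000}
do
    # Timestamp plus resident set size in kB.
    echo "$(date +%s) $(ps -o rss= -p "$PID")" >> /root/scripts/$VOL/rss.txt
    sleep 300
done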
The volume is mounted on the client using FUSE. CPU and memory usage of the brick processes and the tier daemon process was collected using top; the output is available at /home/repo/sosreports/bug.1285998 on rhsqe-repo.lab.eng.blr.redhat.com.
I see a memory increase during migration of 1000 files.

From:
  PID USER  PR NI   VIRT    RES   SHR S %CPU %MEM   TIME+   COMMAND
23419 root  20  0 531032 188712 10692 S  0.3  4.7 12:06.24 memcheck-amd64-

to:
  PID USER  PR NI   VIRT    RES   SHR S %CPU %MEM   TIME+   COMMAND
23419 root  20  0 545576 203796 10692 S 91.7  5.0 17:23.31 memcheck-amd64-

In addition, the tier migrator was run under valgrind:

valgrind --leak-check=full --log-file=/root/tier-valgrind.log /usr/local/sbin/glusterfs -s localhost --volfile-id rebalance/test --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on --xlator-option *tier-dht.xattr-name=trusted.tier.tier-dht --xlator-option *dht.rebalance-cmd=6 --xlator-option *dht.node-uuid=6174cfd6-8978-45f8-b2eb-6ffebc50fc29 --xlator-option *dht.commit-hash=3001687234 --socket-file /var/run/gluster/gluster-tier-b05bfe8d-939a-4111-b221-ca01e91753bc.sock --pid-file /var/lib/glusterd/vols/test/tier/6174cfd6-8978-45f8-b2eb-6ffebc50fc29.pid -l /var/log/glusterfs/test-tier.log

Continuous errors were seen in the valgrind log "/root/tier-valgrind.log":

==23419== Invalid write of size 8
==23419==    at 0x4E87224: __gf_mem_invalidate (mem-pool.c:277)
==23419==    by 0x4E874EA: __gf_free (mem-pool.c:333)
==23419==    by 0x4E9A966: synctask_destroy (syncop.c:391)
==23419==    by 0x4E9A9EA: synctask_done (syncop.c:409)
==23419==    by 0x4E9B323: synctask_switchto (syncop.c:668)
==23419==    by 0x4E9B3DB: syncenv_processor (syncop.c:699)
==23419==    by 0x608A529: start_thread (in /usr/lib64/libpthread-2.20.so)
==23419==    by 0x678C22C: clone (in /usr/lib64/libc-2.20.so)
==23419==  Address 0x1a5741a9 is 2,072,937 bytes inside a block of size 2,097,224 alloc'd
==23419==    at 0x4C2B9C6: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==23419==    by 0x4E86CAE: __gf_calloc (mem-pool.c:117)
==23419==    by 0x4E9ADE1: synctask_create (syncop.c:497)
==23419==    by 0x4E9B020: synctask_new1 (syncop.c:571)
==23419==    by 0x4E9B090: synctask_new (syncop.c:586)
==23419==    by 0x1284CB5A: dht_rebalance_complete_check (dht-helper.c:1116)
==23419==    by 0x1287FE0B: dht_file_setxattr_cbk (dht-common.c:3455)
==23419==    by 0x125ED3F2: afr_fsetxattr_unwind (afr-inode-write.c:1575)
==23419==    by 0x125E65FA: __afr_inode_write_cbk (afr-inode-write.c:171)
==23419==    by 0x125ED480: afr_fsetxattr_wind_cbk (afr-inode-write.c:1585)
==23419==    by 0x123A4530: client3_3_fsetxattr_cbk (client-rpc-fops.c:1979)
==23419==    by 0x51271AE: rpc_clnt_handle_reply (rpc-clnt.c:759)
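As a quick way to pull the leak data out of the log referenced above (a sketch; the path is simply the one passed to --log-file):

# Leak summary printed at the end of the run:
grep -A8 "LEAK SUMMARY" /root/tier-valgrind.log
# Individual "definitely lost" loss records with their allocation stacks:
grep -A15 "definitely lost in loss record" /root/tier-valgrind.log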
After analyzing the statedumps, I found that the leak is in the inode context variable of the tier xlator.

This is the setup in short:

Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4

Created 5 files in the hot tier. This is the statedump output for inode context variable usage:

[cluster/tier.test-tier-dht - usage-type gf_dht_mt_inode_ctx_t memusage]
size=32
num_allocs=1   ----------------------> current allocations
max_size=96
max_num_allocs=3
total_allocs=3

Demoted the 5 files, and this is the output of the statedump:

[cluster/tier.test-tier-dht - usage-type gf_dht_mt_inode_ctx_t memusage]
size=192
num_allocs=6   -----------------------------> current allocations
max_size=224
max_num_allocs=7
total_allocs=28

Observe that it is bumped up by 5.

Promoted the files again:

[cluster/tier.test-tier-dht - usage-type gf_dht_mt_inode_ctx_t memusage]
size=352
num_allocs=11   -----------------------------> current allocations
max_size=384
max_num_allocs=12
total_allocs=53

Observe that it is bumped up by 5 again.

Inference: the leak is in the tier xlator of the tier migration process, and the leaked objects are inode ctx variables.

Solution: had a discussion with Nithya, and she suggested resetting the inode context after the migration of the file in the tier migrator.
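For reference, a minimal sketch of how such a statedump can be captured from the tier migration process and the relevant accounting extracted; the pid-file path, the default dump directory /var/run/gluster, and the dump file naming are assumptions about this setup and version:

# Sketch: dump the tier daemon's state and grep the tier-dht inode ctx usage.
TIER_PID=$(cat /var/lib/glusterd/vols/test/tier/*.pid)

# A glusterfs process writes a statedump when it receives SIGUSR1.
kill -USR1 "$TIER_PID"
sleep 2

# Pick the newest dump for that pid (filename format may differ between versions).
DUMP=$(ls -t /var/run/gluster/*."$TIER_PID".dump.* 2>/dev/null | head -1)
grep -A5 "gf_dht_mt_inode_ctx_t" "$DUMP"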
Found another memory leak: the response dict from the parent inode lookup is not freed.
As far as the inode_ctx variable is concerned, I see allocation through this path:

(gdb) bt
#0  dht_inode_ctx_layout_set (inode=0x7f9c4000136c, this=0x7f9c7001b630, layout_int=0x7f9c400037b0) at dht-helper.c:1359
#1  0x00007f9c75e7d326 in dht_layout_set (this=0x7f9c7001b630, inode=0x7f9c4000136c, layout=0x7f9c400037b0) at dht-layout.c:104
#2  0x00007f9c75ea4882 in dht_discover_complete (this=0x7f9c7001b630, discover_frame=0x7f9c40003fdc) at dht-common.c:297
#3  0x00007f9c75ea58d0 in dht_discover_cbk (frame=0x7f9c40003fdc, cookie=0x7f9c4000a68c, this=0x7f9c7001b630, op_ret=0, op_errno=117, inode=0x7f9c4000136c, stbuf=0x7f9c4000f724, xattr=0x7f9c700016dc, postparent=0x7f9c4000f954) at dht-common.c:507
#4  0x00007f9c7613386d in dht_discover_complete (this=0x7f9c7001a7a0, discover_frame=0x7f9c4000a78c) at dht-common.c:366
#5  0x00007f9c76134190 in dht_discover_cbk (frame=0x7f9c4000a78c, cookie=0x7f9c4000ab7c, this=0x7f9c7001a7a0, op_ret=0, op_errno=117, inode=0x7f9c4000136c, stbuf=0x7f9c40013ac0, xattr=0x7f9c700639bc, postparent=0x7f9c40013b30) at dht-common.c:507
#6  0x00007f9c763dcc98 in afr_discover_done (frame=0x7f9c4000ab7c, this=0x7f9c70019920) at afr-common.c:2162
#7  0x00007f9c763dcfdd in afr_discover_cbk (frame=0x7f9c4000ab7c, cookie=0x1, this=0x7f9c70019920, op_ret=0, op_errno=22, inode=0x7f9c4000136c, buf=0x7f9c67ffe970, xdata=0x7f9c6003cc3c, postparent=0x7f9c67ffe900) at afr-common.c:2207
#8  0x00007f9c766222e6 in client3_3_lookup_cbk (req=0x7f9c40015a9c, iov=0x7f9c40015adc, count=1, myframe=0x7f9c400149bc) at client-rpc-fops.c:2976
#9  0x00007f9c834aa1af in rpc_clnt_handle_reply (clnt=0x7f9c70056990, pollin=0x7f9c60006cc0) at rpc-clnt.c:759
#10 0x00007f9c834aa625 in rpc_clnt_notify (trans=0x7f9c70056e20, mydata=0x7f9c700569c0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f9c60006cc0) at rpc-clnt.c:900
#11 0x00007f9c834a6a1b in rpc_transport_notify (this=0x7f9c70056e20, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f9c60006cc0) at rpc-transport.c:541
#12 0x00007f9c78afbb3c in socket_event_poll_in (this=0x7f9c70056e20) at socket.c:2231
#13 0x00007f9c78afc06a in socket_event_handler (fd=12, idx=9, data=0x7f9c70056e20, poll_in=1, poll_out=0, poll_err=0) at socket.c:2344
#14 0x00007f9c8373e79c in event_dispatch_epoll_handler (event_pool=0x1329d70, event=0x7f9c67ffeea0) at event-epoll.c:571
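For completeness, a sketch of how a live backtrace like the one above can be captured from the running tier migration process; it assumes gdb with glusterfs debuginfo installed and the pid-file path used on this setup:

# Attach to the tier daemon and break on the inode-ctx layout setter.
TIER_PID=$(cat /var/lib/glusterd/vols/test/tier/*.pid)

gdb -p "$TIER_PID" \
    -ex 'break dht_inode_ctx_layout_set' \
    -ex 'continue' \
    -ex 'bt' \
    -ex 'detach' \
    -ex 'quit'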
The fix for the response dict leak is posted upstream: http://review.gluster.org/12867
Tested the new fix with 30000 files demoted and promoted. No effect on memory: it stays steady at 0.5% to 0.6% before and after the migration.
https://code.engineering.redhat.com/gerrit/#/c/63125/
Created 50k files and performed continuous IO to them for two days while observing memory and CPU consumption. There was no spike in memory, so marking this bug as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0193.html