Bug 768330

Summary: Memory leak in brick process [Release-3.3.qa15]
Product: [Community] GlusterFS
Component: core
Version: mainline
Hardware: x86_64
OS: Linux
Status: CLOSED WORKSFORME
Severity: medium
Priority: medium
Reporter: Vijaykumar Koppad <vkoppad>
Assignee: Raghavendra Bhat <rabhat>
QA Contact: Vijaykumar Koppad <vkoppad>
CC: bbandari, gluster-bugs
Fixed In Version: glusterfs-3.4.0qa4
Doc Type: Bug Fix
Last Closed: 2012-12-04 10:17:39 UTC
Attachments:
- valgrind log for brick process
- valgrind log for brick process of glusterfs-3.2.6qa1
- valgrind log of mount-point for glusterfs-3.2.6qa1
- Valgrind logs of the brick process of the master
- Valgrind logs of the second brick process of the master
- Definitely lost valgrind logs from all the bricks

Description Vijaykumar Koppad 2011-12-16 11:21:43 UTC
Description of problem: Some bytes were reported as "definitely lost" when I ran the brick processes of a distribute-replicate volume (mounted via FUSE) under valgrind.


Version-Release number of selected component (if applicable): 3.3qa15


Steps to Reproduce:
1. Create a distribute-replicate volume and run all the brick processes under valgrind. On the mount point, untar a Linux kernel tarball.
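
The reproduction step above can be sketched as a shell fragment. The volfile path, log path, kernel tarball name, and mount point below are illustrative assumptions, not taken from the report:

```shell
# Hypothetical sketch: run one brick's glusterfsd under valgrind with
# full leak checking. All paths here are assumptions for illustration.
VOLFILE=/etc/glusterd/vols/testvol/testvol.server1.brick1.vol
LOG=/tmp/valgrind-brick1.log

# -N keeps glusterfsd in the foreground so valgrind can supervise it.
CMD="valgrind --leak-check=full --log-file=$LOG glusterfsd -f $VOLFILE -N"
echo "$CMD"

# Load generation on the client side (mount point is also an assumption):
LOAD="tar -xf linux-3.0.tar.gz -C /mnt/glusterfs"
echo "$LOAD"
```

The commands are printed rather than executed, since running a brick requires a configured volume; repeat the valgrind invocation once per brick.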
  

Additional info:

These are the leaks found in the valgrind log of the brick process.


==32510== 11 bytes in 1 blocks are definitely lost in loss record 20 of 278
==32510==    at 0x4A04A28: calloc (vg_replace_malloc.c:467)
==32510==    by 0x4C52071: __gf_calloc (mem-pool.h:84)
==32510==    by 0x4C4134A: __inode_link (mem-pool.h:130)
==32510==    by 0x4C419E9: inode_link (inode.c:814)
==32510==    by 0x99DF961: server_symlink_cbk (server3_1-fops.c:994)
==32510==    by 0x97BC373: io_stats_symlink_cbk (io-stats.c:1481)
==32510==    by 0x9596ABF: marker_symlink_cbk (marker.c:1610)
==32510==    by 0x9380529: iot_symlink_cbk (io-threads.c:687)
==32510==    by 0x4C3A4E9: default_symlink_cbk (defaults.c:147)
==32510==    by 0x8F5D342: posix_acl_symlink_cbk (posix-acl.c:1197)
==32510==    by 0x8D50FF7: posix_symlink (posix.c:1324)
==32510==    by 0x8F5F62D: posix_acl_symlink (posix-acl.c:1212)

LEAK SUMMARY:
==32510==    definitely lost: 149,635 bytes in 1,681 blocks
==32510==    indirectly lost: 302,122 bytes in 3,841 blocks
==32510==      possibly lost: 47,860 bytes in 405 blocks
==32510==    still reachable: 14,367,091 bytes in 4,973 blocks
==32510==         suppressed: 0 bytes in 0 blocks
==32510==
==32510== For counts of detected and suppressed errors, rerun with: -v
==32510== ERROR SUMMARY: 33 errors from 33 contexts (suppressed: 30 from 9)

Comment 1 Vijaykumar Koppad 2011-12-16 13:16:22 UTC
Created attachment 547798 [details]
valgrind log for brick process

Attaching valgrind log

Comment 2 Amar Tumballi 2011-12-19 06:33:33 UTC
I have a suspicion that this may be the result of capturing the valgrind log before actually doing a cleanup. Can you make sure to remove every file from the mount point and then check the result again?

(Also run 'echo 3 > /proc/sys/vm/drop_caches' on the machine.)
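
The suggested cleanup can be sketched as follows; the mount point is an assumption, and the commands are printed rather than executed since dropping caches requires root:

```shell
# Sketch of the cleanup suggested above (mount point is hypothetical).
MNT=/mnt/glusterfs
# Note the correct procfs path is drop_caches (plural); a sync first
# flushes dirty pages so the drop is effective.
CLEANUP="rm -rf $MNT/* && sync && echo 3 > /proc/sys/vm/drop_caches"
echo "$CLEANUP"   # run manually as root on the client machine
```

Writing 3 to drop_caches frees the pagecache plus dentries and inodes, which is what lets the FUSE client release cached inode references before the valgrind summary is read.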

Comment 3 Vijaykumar Koppad 2012-01-12 12:17:33 UTC
Created attachment 552393 [details]
valgrind log for brick process of glusterfs-3.2.6qa1

Comment 4 Vijaykumar Koppad 2012-01-12 12:21:57 UTC
These leaks appear in both the mount logs and the brick logs.

I got similar logs even in glusterfs-3.2.6qa1.

These are the valgrind logs of the mount point:
###############################################################


==23005== 57 bytes in 1 blocks are definitely lost in loss record 28 of 220
==23005==    at 0x4A05FDE: malloc (vg_replace_malloc.c:236)
==23005==    by 0x4C5C3FF: __gf_malloc (mem-pool.c:167)
==23005==    by 0x62E3445: init (fuse-bridge.c:3643)
==23005==    by 0x4C2AB80: __xlator_init (xlator.c:1418)
==23005==    by 0x4C2ACAA: xlator_init (xlator.c:1441)
==23005==    by 0x403FE0: create_fuse_mount (glusterfsd.c:329)
==23005==    by 0x406FC5: main (glusterfsd.c:1497)

512,173 (230,912 direct, 281,261 indirect) bytes in 656 blocks are definitely lost in loss record 209 of 220
==23005==    at 0x4A04A28: calloc (vg_replace_malloc.c:467)
==23005==    by 0x4C5C312: __gf_calloc (mem-pool.c:142)
==23005==    by 0x4C43A72: __inode_create (inode.c:544)
==23005==    by 0x4C43B86: inode_new (inode.c:576)
==23005==    by 0x62D933E: fuse_create_resume (fuse-bridge.c:1601)
==23005==    by 0x62CF961: fuse_resolve_and_resume (fuse-resolve.c:763)
==23005==    by 0x62D9A4D: fuse_create (fuse-bridge.c:1658)
==23005==    by 0x62E2224: fuse_thread_proc (fuse-bridge.c:3223)
==23005==    by 0x3BF80077E0: start_thread (in /lib64/libpthread-2.12.so)
==23005==    by 0xB6A76FF: ???


LEAK SUMMARY:
==23005==    definitely lost: 280,732 bytes in 1,423 blocks
==23005==    indirectly lost: 302,650 bytes in 1,970 blocks
==23005==      possibly lost: 33,948,692 bytes in 6,267 blocks
==23005==    still reachable: 38,287 bytes in 68 blocks
==23005==         suppressed: 0 bytes in 0 blocks

Comment 5 Vijaykumar Koppad 2012-01-12 12:24:12 UTC
Created attachment 552398 [details]
valgrind log of mount-point  for glusterfs-3.2.6qa1

Comment 6 Vijaykumar Koppad 2012-01-12 12:24:27 UTC
Attaching logs.

Comment 7 Amar Tumballi 2012-02-22 02:23:53 UTC
Vijay, can you confirm this behavior still exists with qa23?

Comment 8 Amar Tumballi 2012-03-27 13:08:57 UTC
VijayKumar,

Can you please confirm the behavior with latest build?

Comment 9 Vijaykumar Koppad 2012-04-23 12:43:54 UTC
I have tested with the new build, i.e. 3.3.0qa37.
I still see some leaks; I am attaching those logs.

Comment 10 Vijaykumar Koppad 2012-04-23 12:44:35 UTC
Created attachment 579522 [details]
Valgrind logs of the brick process of the master.

Comment 11 Vijaykumar Koppad 2012-04-23 12:45:44 UTC
Created attachment 579524 [details]
Valgrind logs of the second brick process of the master.

Comment 12 Anand Avati 2012-05-03 20:04:08 UTC
CHANGE: http://review.gluster.com/3244 (protocol: fix memory leak of lk-owner buffer in *lk() calls) merged in master by Anand Avati (avati)

Comment 13 Amar Tumballi 2012-05-04 06:59:24 UTC
Moving it to ON_QA considering the multiple fixes that handle brick-side leaks. The only pending item is the leaks reported by posix-acl; that fix is not going into 3.3.0, maybe 3.3.1 or so, as it needs more work on root-cause analysis (RCA).

VijayKumar, please verify the behavior with the latest master.

Comment 14 Vijaykumar Koppad 2012-05-23 09:46:41 UTC
Created attachment 586283 [details]
Definitely lost valgrind logs from all the bricks.

Comment 15 Vijaykumar Koppad 2012-05-23 09:48:53 UTC
I still see some "definitely lost" records in the valgrind logs. It would be good if all of the "definitely lost" reports were eliminated.
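
Filtering the "definitely lost" records out of a set of valgrind logs can be sketched like this; the log file layout is an assumption, and a tiny sample log is created so the filter can run end to end:

```shell
# Create a sample log in a temp directory so the filter is demonstrable;
# in practice LOGDIR would point at the real brick log directory.
LOGDIR=$(mktemp -d)
printf '%s\n' \
  '==100== 11 bytes in 1 blocks are definitely lost in loss record 20 of 278' \
  '==100==    still reachable: 14,367,091 bytes in 4,973 blocks' \
  > "$LOGDIR/valgrind-brick1.log"

# Keep only the definitely-lost record headers from every brick log.
MATCHES=$(grep -h 'definitely lost' "$LOGDIR"/valgrind-brick*.log)
echo "$MATCHES"
```

Only the "definitely lost" records matter for triage here; "still reachable" blocks are typically pools and caches freed at exit.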

Comment 16 Amar Tumballi 2012-05-23 09:57:05 UTC
I went through the "definitely lost" records, and none of them is in a common code path. Hence I am removing it from the 3.3.0beta blocker list. The remaining ones need posix-acl fixes.

Comment 17 Raghavendra Bhat 2012-12-04 10:17:39 UTC
In our recent longevity runs we didn't hit any such leaks while running for more than two weeks.

Marking the fixed-in version as 3.4.0qa4, as that is the latest master release on which we ran these tests (the longevity run started a few commits earlier).