Bug 768330 - Memory leakage in brick process [Release-3.3.qa15]
Summary: Memory leakage in brick process [Release-3.3.qa15]
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: mainline
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Raghavendra Bhat
QA Contact: Vijaykumar Koppad
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-12-16 11:21 UTC by Vijaykumar Koppad
Modified: 2014-08-25 00:49 UTC
CC List: 2 users

Fixed In Version: glusterfs-3.4.0qa4
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-12-04 10:17:39 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments
valgrind log for brick process (199.16 KB, text/x-log)
2011-12-16 13:16 UTC, Vijaykumar Koppad
valgrind log for brick process of glusterfs-3.2.6qa1 (184.58 KB, text/x-log)
2012-01-12 12:17 UTC, Vijaykumar Koppad
valgrind log of mount-point for glusterfs-3.2.6qa1 (204.65 KB, text/x-log)
2012-01-12 12:24 UTC, Vijaykumar Koppad
Valgrind logs of the brick process of the master. (309.31 KB, text/x-log)
2012-04-23 12:44 UTC, Vijaykumar Koppad
Valgrind logs of the second brick process of the master. (304.84 KB, text/x-log)
2012-04-23 12:45 UTC, Vijaykumar Koppad
Definitely lost valgrind logs from all the bricks. (157.40 KB, text/x-log)
2012-05-23 09:46 UTC, Vijaykumar Koppad

Description Vijaykumar Koppad 2011-12-16 11:21:43 UTC
Description of problem: Valgrind reported some bytes as definitely lost when I ran the brick processes of a distribute-replicate volume mounted on FUSE.


Version-Release number of selected component (if applicable): 3.3qa15


Steps to Reproduce:
1. Create a distribute-replicate volume and run all the brick processes under valgrind (see the sketch below).
2. On the mount point, untar a Linux kernel tarball.
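For reference, a minimal sketch of how a brick process might be run under valgrind; the volume name, volfile id, and log path below are illustrative, not taken from this report. One common approach is to copy the brick command line that glusterd launched and re-run it in the foreground under valgrind:

# find the brick command line that glusterd started
ps ax | grep '[g]lusterfsd'
# re-run the same command in the foreground under valgrind (arguments are examples)
valgrind --leak-check=full --log-file=/tmp/brick-valgrind.log \
    /usr/sbin/glusterfsd -N -s localhost --volfile-id testvol.host.export-brick1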
  

Additional info:

These are the relevant entries I found in the valgrind log of the brick process:


==32510== 11 bytes in 1 blocks are definitely lost in loss record 20 of 278
==32510==    at 0x4A04A28: calloc (vg_replace_malloc.c:467)
==32510==    by 0x4C52071: __gf_calloc (mem-pool.h:84)
==32510==    by 0x4C4134A: __inode_link (mem-pool.h:130)
==32510==    by 0x4C419E9: inode_link (inode.c:814)
==32510==    by 0x99DF961: server_symlink_cbk (server3_1-fops.c:994)
==32510==    by 0x97BC373: io_stats_symlink_cbk (io-stats.c:1481)
==32510==    by 0x9596ABF: marker_symlink_cbk (marker.c:1610)
==32510==    by 0x9380529: iot_symlink_cbk (io-threads.c:687)
==32510==    by 0x4C3A4E9: default_symlink_cbk (defaults.c:147)
==32510==    by 0x8F5D342: posix_acl_symlink_cbk (posix-acl.c:1197)
==32510==    by 0x8D50FF7: posix_symlink (posix.c:1324)
==32510==    by 0x8F5F62D: posix_acl_symlink (posix-acl.c:1212)

==32510== LEAK SUMMARY:
==32510==    definitely lost: 149,635 bytes in 1,681 blocks
==32510==    indirectly lost: 302,122 bytes in 3,841 blocks
==32510==      possibly lost: 47,860 bytes in 405 blocks
==32510==    still reachable: 14,367,091 bytes in 4,973 blocks
==32510==         suppressed: 0 bytes in 0 blocks
==32510==
==32510== For counts of detected and suppressed errors, rerun with: -v
==32510== ERROR SUMMARY: 33 errors from 33 contexts (suppressed: 30 from 9)

Comment 1 Vijaykumar Koppad 2011-12-16 13:16:22 UTC
Created attachment 547798 [details]
valgrind log for brick process

Attaching valgrind log

Comment 2 Amar Tumballi 2011-12-19 06:33:33 UTC
I have a suspicion that this may be the result of capturing the valgrind log before actually doing a cleanup. Can you make sure to remove every file from the mount point and then check the result again?

(Also run 'echo 3 > /proc/sys/vm/drop_caches' on the machine; a rough sketch of the sequence follows.)
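A minimal sketch of that cleanup sequence, assuming the volume is mounted at /mnt/glusterfs (the path is an example):

# remove every file created during the test from the mount point
rm -rf /mnt/glusterfs/*
# drop the kernel page, dentry, and inode caches
echo 3 > /proc/sys/vm/drop_caches
# then stop the brick process cleanly so valgrind emits its final leak summary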

Comment 3 Vijaykumar Koppad 2012-01-12 12:17:33 UTC
Created attachment 552393 [details]
valgrind log for brick process of glusterfs-3.2.6qa1

Comment 4 Vijaykumar Koppad 2012-01-12 12:21:57 UTC
I get these entries in both the mount logs and the brick logs.

I got similar logs even in glusterfs-3.2.6qa1.

These are the valgrind logs of the mount point:
###############################################################


==23005== 57 bytes in 1 blocks are definitely lost in loss record 28 of 220
==23005==    at 0x4A05FDE: malloc (vg_replace_malloc.c:236)
==23005==    by 0x4C5C3FF: __gf_malloc (mem-pool.c:167)
==23005==    by 0x62E3445: init (fuse-bridge.c:3643)
==23005==    by 0x4C2AB80: __xlator_init (xlator.c:1418)
==23005==    by 0x4C2ACAA: xlator_init (xlator.c:1441)
==23005==    by 0x403FE0: create_fuse_mount (glusterfsd.c:329)
==23005==    by 0x406FC5: main (glusterfsd.c:1497)

512,173 (230,912 direct, 281,261 indirect) bytes in 656 blocks are definitely lost in loss record 209 of 220
==23005==    at 0x4A04A28: calloc (vg_replace_malloc.c:467)
==23005==    by 0x4C5C312: __gf_calloc (mem-pool.c:142)
==23005==    by 0x4C43A72: __inode_create (inode.c:544)
==23005==    by 0x4C43B86: inode_new (inode.c:576)
==23005==    by 0x62D933E: fuse_create_resume (fuse-bridge.c:1601)
==23005==    by 0x62CF961: fuse_resolve_and_resume (fuse-resolve.c:763)
==23005==    by 0x62D9A4D: fuse_create (fuse-bridge.c:1658)
==23005==    by 0x62E2224: fuse_thread_proc (fuse-bridge.c:3223)
==23005==    by 0x3BF80077E0: start_thread (in /lib64/libpthread-2.12.so)
==23005==    by 0xB6A76FF: ???


==23005== LEAK SUMMARY:
==23005==    definitely lost: 280,732 bytes in 1,423 blocks
==23005==    indirectly lost: 302,650 bytes in 1,970 blocks
==23005==      possibly lost: 33,948,692 bytes in 6,267 blocks
==23005==    still reachable: 38,287 bytes in 68 blocks
==23005==         suppressed: 0 bytes in 0 blocks
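For completeness, a sketch of how the mount-point (FUSE client) process itself can be run under valgrind; the server, volume name, and mount path are illustrative. The client has to stay in the foreground (-N) so valgrind can report when it exits:

valgrind --leak-check=full --log-file=/tmp/mount-valgrind.log \
    glusterfs -N --volfile-server=localhost --volfile-id=testvol /mnt/glusterfs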

Comment 5 Vijaykumar Koppad 2012-01-12 12:24:12 UTC
Created attachment 552398 [details]
valgrind log of mount-point  for glusterfs-3.2.6qa1

Comment 6 Vijaykumar Koppad 2012-01-12 12:24:27 UTC
Attaching logs.

Comment 7 Amar Tumballi 2012-02-22 02:23:53 UTC
Vijay, can you confirm whether this behavior still exists with qa23?

Comment 8 Amar Tumballi 2012-03-27 13:08:57 UTC
VijayKumar,

Can you please confirm the behavior with the latest build?

Comment 9 Vijaykumar Koppad 2012-04-23 12:43:54 UTC
I have tested with the new build, i.e. 3.3.0qa37.
I got some leaks; I am attaching those logs.

Comment 10 Vijaykumar Koppad 2012-04-23 12:44:35 UTC
Created attachment 579522 [details]
Valgrind logs of the brick process of the master.

Comment 11 Vijaykumar Koppad 2012-04-23 12:45:44 UTC
Created attachment 579524 [details]
Valgrind logs of the second brick process of the master.

Comment 12 Anand Avati 2012-05-03 20:04:08 UTC
CHANGE: http://review.gluster.com/3244 (protocol: fix memory leak of lk-owner buffer in *lk() calls) merged in master by Anand Avati (avati)

Comment 13 Amar Tumballi 2012-05-04 06:59:24 UTC
Moving it to ON_QA considering the multiple fixes that handle brick-side leaks. The only pending item is the leaks reported by posix-acl, which is not going into 3.3.0 (maybe 3.3.1 or so), as it needs more work on root-cause analysis (RCA).

VijayKumar, please verify the behavior with the latest master.

Comment 14 Vijaykumar Koppad 2012-05-23 09:46:41 UTC
Created attachment 586283 [details]
Definitely lost valgrind logs from all the bricks.

Comment 15 Vijaykumar Koppad 2012-05-23 09:48:53 UTC
I still see some "definitely lost" records in the valgrind logs. It would be good if all of the definitely-lost entries were eliminated.
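A quick way to pull just those records out of the logs (the log path is an example):

# each definitely-lost record with its backtrace
grep -A12 'definitely lost in loss record' /tmp/brick-valgrind-*.log
# or only the per-process summary lines
grep 'definitely lost:' /tmp/brick-valgrind-*.log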

Comment 16 Amar Tumballi 2012-05-23 09:57:05 UTC
I went through the definitely-lost records, and none of them is in a common code path. Hence I am removing this from the 3.3.0beta blocker list. The remaining ones need posix-acl fixes.

Comment 17 Raghavendra Bhat 2012-12-04 10:17:39 UTC
In our recent longevity runs we didn't hit any such leaks while running for more than 2 weeks.

Marking the fixed-in version as 3.4.0qa4, as that is the latest master release on which we ran some tests (the longevity run started a few commits earlier).

