Bug 1470329

Summary: glusterfs process leaking memory when error occurs
Product: [Community] GlusterFS Reporter: Danny Couture <couture.danny>
Component: fuse Assignee: bugs <bugs>
Status: CLOSED DUPLICATE QA Contact:
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 3.10 CC: bugs, ndevos
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1470220 Environment:
Last Closed: 2017-07-14 09:47:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Danny Couture 2017-07-12 18:12:51 UTC
+++ This bug was initially created as a clone of Bug #1470220 +++

Description of problem:

When an error occurs on the nodes, an error-unwinding function may be called instead of the fuse_release function. In that case, the current code leaks an 88-byte ctx structure, the one allocated in __fuse_fd_ctx_check_n_create (see the Valgrind trace under "Actual results").
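
The leak pattern described above can be sketched in isolation. This is a minimal illustration, not the fuse-bridge.c code: every name here (demo_fd_t, demo_fd_ctx_t, demo_open, the two unwind variants, the live-context counter) is hypothetical, and the fields of the context structure are invented. It only shows how a context allocated at open time is lost when the error path bypasses the release function, and one possible remedy (freeing the context on the error path as well). Whether the patch under review takes exactly this approach is not stated in this report.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-fd context; fields are invented. */
typedef struct demo_fd_ctx {
    int open_flags;
    int migration_failed;
} demo_fd_ctx_t;

/* Hypothetical stand-in for an fd object that owns one context. */
typedef struct demo_fd {
    demo_fd_ctx_t *ctx;
} demo_fd_t;

/* Number of contexts currently allocated and not yet freed. */
static int live_ctx_count = 0;

int demo_live_contexts(void)
{
    return live_ctx_count;
}

/* Allocate the context on first use, like a "check and create" helper. */
demo_fd_ctx_t *demo_ctx_check_n_create(demo_fd_t *fd)
{
    if (fd->ctx == NULL) {
        fd->ctx = calloc(1, sizeof(*fd->ctx));
        if (fd->ctx != NULL)
            live_ctx_count++;
    }
    return fd->ctx;
}

/* Free the context if one is attached. */
void demo_ctx_destroy(demo_fd_t *fd)
{
    if (fd->ctx != NULL) {
        free(fd->ctx);
        fd->ctx = NULL;
        live_ctx_count--;
    }
}

/* Normal path: the release function frees the context. */
void demo_release(demo_fd_t *fd)
{
    demo_ctx_destroy(fd);
}

/* Leaky error path: the context is never touched, so it is lost
 * once the caller drops the fd. */
void demo_error_unwind_leaky(demo_fd_t *fd)
{
    (void)fd;
}

/* One possible remedy: the error path frees the context too. */
void demo_error_unwind_fixed(demo_fd_t *fd)
{
    demo_ctx_destroy(fd);
}

/* Simulated open: the context is created first, then the backend
 * either succeeds (0) or fails (-1) and the error path runs. */
int demo_open(demo_fd_t *fd, int backend_fails, int use_fixed_unwind)
{
    if (demo_ctx_check_n_create(fd) == NULL)
        return -1;
    if (backend_fails) {
        if (use_fixed_unwind)
            demo_error_unwind_fixed(fd);
        else
            demo_error_unwind_leaky(fd);
        return -1;
    }
    return 0;
}
```

Calling demo_open(&fd, 1, 0) on a zero-initialised demo_fd_t leaves demo_live_contexts() at 1 even though the open failed, which is the situation Valgrind reports as "definitely lost" once the fd itself is gone. With the third argument set to 1, the count returns to 0.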

Version-Release number of selected component (if applicable):
3.10.3

How reproducible:
100%

Steps to Reproduce:
1. Set up a 3-node replica set (2 nodes might be enough).
2. Write the same file over and over again on one node.
3. Try to read that same file over and over again on a second node.
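
Steps 2 and 3 could be driven by two small loops like the following. This is only a sketch under assumptions: the function names rewrite_file and reread_file, the payload, and the iteration counts are made up for illustration, and the path is expected to point at a file on the FUSE mount of the volume (the mount point is not given in this report). One function would run on the writing node and the other on the reading node.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Step 2: truncate and rewrite the same file `iterations` times.
 * Returns the number of rewrites that completed without error. */
int rewrite_file(const char *path, int iterations)
{
    static const char payload[] = "glusterfs leak reproduction payload\n";
    const ssize_t len = (ssize_t)(sizeof(payload) - 1);
    int ok = 0;

    for (int i = 0; i < iterations; i++) {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            continue;
        ssize_t n = write(fd, payload, (size_t)len);
        if (close(fd) == 0 && n == len)
            ok++;
    }
    return ok;
}

/* Step 3: open, read to end of file, and close the same file
 * `iterations` times. Returns the number of successful opens.
 * Opens that fail are simply skipped and not counted. */
int reread_file(const char *path, int iterations)
{
    char buf[4096];
    int ok = 0;

    for (int i = 0; i < iterations; i++) {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            continue;
        while (read(fd, buf, sizeof(buf)) > 0)
            ;
        close(fd);
        ok++;
    }
    return ok;
}
```

A gap between the requested iterations and the value returned by reread_file gives a rough count of failed opens on the reading node. Running the glusterfs client process under Valgrind, as in the trace below, is still needed to observe the leak itself.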

Actual results:
==19736== Thread 1:
==19736== 88 bytes in 1 blocks are definitely lost in loss record 257 of 591
==19736==    at 0x4C277BB: calloc (vg_replace_malloc.c:593)
==19736==    by 0x4E90C31: __gf_calloc (mem-pool.c:117)
==19736==    by 0xD56373D: __fuse_fd_ctx_check_n_create (fuse-bridge.c:79)
==19736==    by 0xD56381E: fuse_fd_ctx_check_n_create (fuse-bridge.c:108)
==19736==    by 0xD56F473: fuse_open_resume (fuse-bridge.c:2148)
==19736==    by 0xD564CE9: fuse_fop_resume (fuse-bridge.c:556)
==19736==    by 0xD562803: fuse_resolve_done (fuse-resolve.c:663)
==19736==    by 0xD5628D9: fuse_resolve_all (fuse-resolve.c:690)
==19736==    by 0xD5627E4: fuse_resolve (fuse-resolve.c:654)
==19736==    by 0xD5628B0: fuse_resolve_all (fuse-resolve.c:686)
==19736==    by 0xD562937: fuse_resolve_continue (fuse-resolve.c:706)
==19736==    by 0xD561CDE: fuse_resolve_inode (fuse-resolve.c:364)
==19736==    by 0xD5627D6: fuse_resolve (fuse-resolve.c:651)
==19736==    by 0xD56285B: fuse_resolve_all (fuse-resolve.c:679)
==19736==    by 0xD562975: fuse_resolve_and_resume (fuse-resolve.c:718)
==19736==    by 0xD56FAF8: fuse_open (fuse-bridge.c:2185)
==19736==    by 0xD57DA2A: fuse_thread_proc (fuse-bridge.c:5068)
==19736==    by 0x5C32AA0: start_thread (pthread_create.c:301)
==19736==    by 0x1633C6FF: ???

Expected results:
no leaks

Additional info:
In our production environment, this happens often enough that we must restart the gluster process every 2-3 months to avoid running out of memory (OOM).

--- Additional comment from Worker Ant on 2017-07-12 12:07:23 EDT ---

REVIEW: https://review.gluster.org/17759 (memory leak fixes) posted (#1) for review on master by Danny Couture (couture.danny)

--- Additional comment from Danny Couture on 2017-07-12 13:07:23 EDT ---

I just confirmed the bug for mainline @ a4a417e29c5b2d63e6bf5efae4f0ccf30a39647f

Comment 1 Niels de Vos 2017-07-14 09:47:33 UTC
Sorry, I missed that this bug was filed earlier. I've created another one (bz1471028) for 3.10 and updated the patch to reference that.

*** This bug has been marked as a duplicate of bug 1471028 ***