Bug 762744 (GLUSTER-1012)

Summary: Memory leak in server_connection_cleanup
Product: [Community] GlusterFS
Reporter: zls <zls0424>
Component: protocol
Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: low
Docs Contact:
Priority: low
Version: 3.0.4
CC: gluster-bugs, vijay
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: ---
Regression: RTP Mount Type: ---
Attachments:
Description Flags
a possible patch none

Description zls 2010-06-19 05:31:11 UTC
valgrind reported:

==17821== 129,024 bytes in 63 blocks are indirectly lost in loss record 74 of 77
==17821==    at 0x4C277CC: calloc (vg_replace_malloc.c:467)
==17821==    by 0x4E5A2D1: __gf_fd_fdtable_get_all_fds (fd.c:153)
==17821==    by 0x4E5A9AC: gf_fd_fdtable_get_all_fds (fd.c:168)
==17821==    by 0x6A60160: server_connection_cleanup (server-helpers.c:670)
==17821==    by 0x6A4D839: notify (server-protocol.c:6762)
==17821==    by 0x4E41DC2: xlator_notify (xlator.c:923)
==17821==    by 0x746C17E: socket_event_poll_err (socket.c:435)
==17821==    by 0x746E0E7: socket_event_handler (socket.c:833)
==17821==    by 0x4E5C31C: event_dispatch_epoll (event.c:804)
==17821==    by 0x4044F1: main (glusterfsd.c:1413)
==17821==
==17821== 129,024 bytes in 63 blocks are definitely lost in loss record 75 of 77
==17821==    at 0x4C277CC: calloc (vg_replace_malloc.c:467)
==17821==    by 0x4E5AA28: gf_fd_fdtable_expand (fd.c:102)
==17821==    by 0x4E5ADD4: gf_fd_fdtable_alloc (fd.c:136)
==17821==    by 0x6A5E4DA: server_connection_get (server-helpers.c:874)
==17821==    by 0x6A565D2: mop_setvolume (server-protocol.c:5701)
==17821==    by 0x6A4D769: protocol_server_pollin (server-protocol.c:6687)
==17821==    by 0x6A4D7F2: notify (server-protocol.c:6743)
==17821==    by 0x4E41DC2: xlator_notify (xlator.c:923)
==17821==    by 0x746E099: socket_event_handler (socket.c:829)
==17821==    by 0x4E5C31C: event_dispatch_epoll (event.c:804)
==17821==    by 0x4044F1: main (glusterfsd.c:1413)
==17821==
==17821== 133,056 (4,032 direct, 129,024 indirect) bytes in 63 blocks are definitely lost in loss record 76 of 77
==17821==    at 0x4C277CC: calloc (vg_replace_malloc.c:467)
==17821==    by 0x4E5ADAC: gf_fd_fdtable_alloc (fd.c:128)
==17821==    by 0x6A5E4DA: server_connection_get (server-helpers.c:874)
==17821==    by 0x6A565D2: mop_setvolume (server-protocol.c:5701)
==17821==    by 0x6A4D769: protocol_server_pollin (server-protocol.c:6687)
==17821==    by 0x6A4D7F2: notify (server-protocol.c:6743)
==17821==    by 0x4E41DC2: xlator_notify (xlator.c:923)
==17821==    by 0x746E099: socket_event_handler (socket.c:829)
==17821==    by 0x4E5C31C: event_dispatch_epoll (event.c:804)
==17821==    by 0x4044F1: main (glusterfsd.c:1413)

Comment 1 zls 2010-06-19 07:10:02 UTC
We have a volume that accepts connections only from specific IP addresses, set with "option auth.addr.brick00.allow 10.10.10.10 10.10.10.11". Many other clients that are not on the list keep trying to connect to the volume. As a result, the glusterfsd process grows to as much as 7G of memory and we have to kill it.

The problem is easy to reproduce: set up the option and try connecting from clients that are not allowed. The more clients, the better.

/*
In server_connection_cleanup, some memory is freed only if conn->bound_xl is set. In mop_setvolume, however, conn->bound_xl is assigned only when gf_authenticate returns AUTH_ACCEPT. For a rejected client, the server logs "Cannot authenticate client from %s" and leaves conn->bound_xl NULL. server_connection_cleanup then never calls do_connection_cleanup, so fdentries is never freed.
 */
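
The guard described above can be illustrated with a minimal, self-contained C sketch. All names in it (struct connection, connection_get, cleanup_guarded, cleanup_unconditional, simulate_rejected) are hypothetical stand-ins, not the GlusterFS definitions, and the unconditional variant only shows the general shape of a possible fix, not the patch that was actually applied.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for illustration only;
 * these are NOT the real GlusterFS structures or functions. */
struct fdtable {
    int *fdentries;
};

struct connection {
    void *bound_xl;          /* stays NULL when authentication is rejected */
    struct fdtable *fdtable; /* allocated for every connection, accepted or not */
};

static long outstanding_tables; /* fd tables allocated but not yet freed */

/* Allocate a connection and its fd table; bound_xl is NULL for a rejected client. */
static struct connection *connection_get(void *bound_xl)
{
    struct connection *conn = calloc(1, sizeof(*conn));
    assert(conn != NULL);
    conn->fdtable = calloc(1, sizeof(*conn->fdtable));
    assert(conn->fdtable != NULL);
    conn->fdtable->fdentries = calloc(64, sizeof(int));
    assert(conn->fdtable->fdentries != NULL);
    conn->bound_xl = bound_xl;
    outstanding_tables++;
    return conn;
}

static void fdtable_destroy(struct connection *conn)
{
    if (conn->fdtable == NULL)
        return;
    free(conn->fdtable->fdentries);
    free(conn->fdtable);
    conn->fdtable = NULL;
    outstanding_tables--;
}

/* Leaky pattern from the report: the fd table is freed only if bound_xl is set. */
static void cleanup_guarded(struct connection *conn)
{
    if (conn->bound_xl != NULL)
        fdtable_destroy(conn);
    free(conn); /* for a rejected client the fd table is now unreachable */
}

/* One possible shape of a fix: free connection-owned memory unconditionally. */
static void cleanup_unconditional(struct connection *conn)
{
    fdtable_destroy(conn);
    free(conn);
}

/* Connect and disconnect n rejected clients; return how many fd tables leaked. */
static long simulate_rejected(int n, void (*cleanup)(struct connection *))
{
    long before = outstanding_tables;
    for (int i = 0; i < n; i++)
        cleanup(connection_get(NULL));
    return outstanding_tables - before;
}
```

With these definitions, simulate_rejected(63, cleanup_guarded) returns 63 (one leaked table per rejected client), while simulate_rejected(63, cleanup_unconditional) returns 0.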

Comment 2 zls 2010-06-28 10:37:54 UTC
Also, in server_connection_destroy, ltable and fdentries are not freed if bound_xl is NULL :)

Comment 3 zls 2010-07-01 07:53:54 UTC
Created attachment 246
patch to fix typo and add s390 architecture

Comment 4 Anand Avati 2010-07-14 02:12:08 UTC
PATCH: http://patches.gluster.com/patch/3573 in release-3.0 (protocol/server: Fix memory leak when server authentication fails.)