Bug 829104 - Double-free corruption in glusterd
Summary: Double-free corruption in glusterd
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: rpc
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Assignee: Jeff Darcy
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-06-06 01:59 UTC by Jeff Darcy
Modified: 2013-07-24 17:51 UTC (History)

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:51:23 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Jeff Darcy 2012-06-06 01:59:09 UTC
After updating to HEAD on master, I started seeing problems, first with mounts and then with operations involving glusterd generally.  Sometimes operations would seem to succeed, at least partially, but take a *very* long time.  More often, glusterd would abort with malloc reporting a double free, with a backtrace something like the following (just an example; this code is not actually the culprit):

(gdb) bt
#0  0x0000003a00232885 in raise () from /lib64/libc.so.6
#1  0x0000003a00234065 in abort () from /lib64/libc.so.6
#2  0x0000003a0026f7a7 in __libc_message ()
   from /lib64/libc.so.6
#3  0x0000003a002750c6 in malloc_printerr ()
   from /lib64/libc.so.6
#4  0x0000003a002ccf68 in freeaddrinfo () from /lib64/libc.so.6
#5  0x00007ffff7d8bb1c in gf_resolve_ip6 (
    hostname=0x6672a0 "gfs2", port=24007, family=2, 
    dnscache=0x667818, addr_info=0x7ffff3f1ac30)
    at common-utils.c:155
#6  0x00007ffff42c66f9 in af_inet_client_get_remote_sockaddr (
    this=0x6677a0, sockaddr=0x7ffff3f1ad00, 
    sockaddr_len=0x7ffff3f1ad84) at name.c:239
#7  0x00007ffff42c726d in socket_client_get_remote_sockaddr (
    this=0x6677a0, sockaddr=0x7ffff3f1ad00, 
    sockaddr_len=0x7ffff3f1ad84, sa_family=0x7ffff3f1ad82)
    at name.c:497
#8  0x00007ffff42c37eb in socket_connect (this=0x6677a0, port=0)
    at socket.c:2064
#9  0x00007ffff7b50ae7 in rpc_transport_connect (this=0x6677a0, 
    port=0) at rpc-transport.c:389
#10 0x00007ffff7b53f49 in rpc_clnt_reconnect (
    trans_ptr=0x6677a0) at rpc-clnt.c:430
#11 0x00007ffff7d90790 in gf_timer_proc (ctx=0x634010)
    at timer.c:168
#12 0x0000003a006077f1 in start_thread ()
   from /lib64/libpthread.so.0
#13 0x0000003a002e570d in clone () from /lib64/libc.so.6

By doing a "manual bisect", first of the three commits that had landed since my last refresh and then of the files within the offending commit, I tracked the problem down to an invalid GF_FREE in rpc_transport_load, which was corrupting memory.  A patch will be forthcoming as soon as I get the bug ID.
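
For illustration only: the reason an invalid free can surface later in unrelated code (such as the freeaddrinfo call in the backtrace above) is that many allocators keep bookkeeping data next to each block, so freeing a pointer the allocator never handed out damages state that is only consulted on some later call. The following is a minimal sketch under stated assumptions, not the GlusterFS code and not the content of the patch. The names tracked_alloc, tracked_free and demo_foreign_free_rejected are hypothetical, and the header layout is invented for this example. It shows a header-prefixed allocator that checks a magic value before releasing memory, so a foreign pointer is rejected instead of being passed on to free().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical marker written in front of every block we hand out. */
#define TRACKED_MAGIC 0xCAFEBABEu

/* Hypothetical bookkeeping header. The union with max_align_t keeps the
 * pointer returned to the caller suitably aligned for any object type. */
typedef union {
    struct {
        uint32_t magic;
        size_t size;
    } info;
    max_align_t align;
} tracked_header;

/* Allocate size bytes with a header in front; returns NULL on failure. */
void *tracked_alloc(size_t size)
{
    tracked_header *hdr = malloc(sizeof(*hdr) + size);
    if (hdr == NULL)
        return NULL;
    hdr->info.magic = TRACKED_MAGIC;
    hdr->info.size = size;
    return hdr + 1; /* the caller sees only the memory after the header */
}

/* Release a block obtained from tracked_alloc.
 * Returns 0 on success and -1 if the pointer does not carry our marker.
 * Assumes the bytes in front of ptr are readable, which holds in this
 * demo but not for arbitrary pointers in real code. */
int tracked_free(void *ptr)
{
    tracked_header hdr;
    unsigned char *base;

    if (ptr == NULL)
        return 0;
    base = (unsigned char *)ptr - sizeof(hdr);
    memcpy(&hdr, base, sizeof(hdr)); /* copy out, avoids alignment issues */
    if (hdr.info.magic != TRACKED_MAGIC)
        return -1; /* not ours: calling free() here could corrupt the heap */
    free(base);
    return 0;
}

/* Hand tracked_free a pointer that tracked_alloc never returned. The
 * pointer sits in the middle of a zeroed static buffer so that reading
 * the bytes in front of it is well defined. */
int demo_foreign_free_rejected(void)
{
    static unsigned char arena[4 * sizeof(tracked_header)];
    void *foreign;

    memset(arena, 0, sizeof(arena));
    foreign = arena + 2 * sizeof(tracked_header);
    return tracked_free(foreign);
}
```

In this sketch, a block from tracked_alloc is released normally, while the foreign pointer is refused with -1. Without such a check, the wrapper would compute a bogus base address and hand it to free(), and the damage might only be noticed on a later, unrelated allocator call.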

Comment 1 Amar Tumballi 2012-06-06 05:58:35 UTC
Patch sent (by jdarcy) and merged (by avati): http://review.gluster.com/3528

