Bug 829104 - Double-free corruption in glusterd
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: rpc
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Assigned To: Jeff Darcy
Depends On:
Blocks:
Reported: 2012-06-05 21:59 EDT by Jeff Darcy
Modified: 2013-07-24 13:51 EDT
CC List: 1 user

See Also:
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-07-24 13:51:23 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Jeff Darcy 2012-06-05 21:59:09 EDT
After updating to HEAD on master, I started seeing problems, first with mounts and then with glusterd operations in general. Sometimes operations would appear to succeed, at least partially, but take a *very* long time. More often, glusterd would terminate after malloc reported a double free, with a backtrace something like the following (just an example; this code is not actually the culprit):

(gdb) bt
#0  0x0000003a00232885 in raise () from /lib64/libc.so.6
#1  0x0000003a00234065 in abort () from /lib64/libc.so.6
#2  0x0000003a0026f7a7 in __libc_message ()
   from /lib64/libc.so.6
#3  0x0000003a002750c6 in malloc_printerr ()
   from /lib64/libc.so.6
#4  0x0000003a002ccf68 in freeaddrinfo () from /lib64/libc.so.6
#5  0x00007ffff7d8bb1c in gf_resolve_ip6 (
    hostname=0x6672a0 "gfs2", port=24007, family=2, 
    dnscache=0x667818, addr_info=0x7ffff3f1ac30)
    at common-utils.c:155
#6  0x00007ffff42c66f9 in af_inet_client_get_remote_sockaddr (
    this=0x6677a0, sockaddr=0x7ffff3f1ad00, 
    sockaddr_len=0x7ffff3f1ad84) at name.c:239
#7  0x00007ffff42c726d in socket_client_get_remote_sockaddr (
    this=0x6677a0, sockaddr=0x7ffff3f1ad00, 
    sockaddr_len=0x7ffff3f1ad84, sa_family=0x7ffff3f1ad82)
    at name.c:497
#8  0x00007ffff42c37eb in socket_connect (this=0x6677a0, port=0)
    at socket.c:2064
#9  0x00007ffff7b50ae7 in rpc_transport_connect (this=0x6677a0, 
    port=0) at rpc-transport.c:389
#10 0x00007ffff7b53f49 in rpc_clnt_reconnect (
    trans_ptr=0x6677a0) at rpc-clnt.c:430
#11 0x00007ffff7d90790 in gf_timer_proc (ctx=0x634010)
    at timer.c:168
#12 0x0000003a006077f1 in start_thread ()
   from /lib64/libpthread.so.0
#13 0x0000003a002e570d in clone () from /lib64/libc.so.6
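The backtrace illustrates a general property of heap corruption: glibc typically notices the damage only when a later, unrelated call (here freeaddrinfo, via malloc_printerr) trips over the corrupted allocator state, so the frame that aborts is not the frame that caused the bug. As a minimal sketch, assuming a hypothetical pair of debug wrappers dbg_malloc/dbg_free (these are not GlusterFS's GF_MALLOC/GF_FREE), tracking live allocations lets an invalid free be reported at the call site that makes it, before it ever reaches free():

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical debug allocator, NOT GlusterFS's GF_MALLOC/GF_FREE. */
#define MAX_LIVE 64

/* Pointers currently handed out by dbg_malloc(); NULL marks a free slot. */
static void *live[MAX_LIVE];

static void *dbg_malloc(size_t size)
{
    void *p = malloc(size);

    if (p == NULL)
        return NULL;
    for (int i = 0; i < MAX_LIVE; i++) {
        if (live[i] == NULL) {
            live[i] = p;
            return p;
        }
    }
    free(p);          /* table full: fail rather than lose track of p */
    return NULL;
}

/* Frees ptr only if it is a live allocation. Returns 0 on success, or -1
 * (after logging the caller's location) for a double or foreign free,
 * which is never forwarded to free(). */
static int dbg_free(void *ptr, const char *file, int line)
{
    if (ptr == NULL)
        return 0;
    for (int i = 0; i < MAX_LIVE; i++) {
        if (live[i] == ptr) {
            live[i] = NULL;
            free(ptr);
            return 0;
        }
    }
    fprintf(stderr, "%s:%d: invalid free of %p\n", file, line, ptr);
    return -1;
}

#define DBG_FREE(p) dbg_free((p), __FILE__, __LINE__)

/* Replays both failure modes and returns how many invalid frees were caught. */
static int demo_invalid_frees(void)
{
    char stack_buf[16];
    char *name = dbg_malloc(16);
    int caught = 0;
    int rc;

    assert(name != NULL);
    rc = DBG_FREE(name);            /* first free: valid */
    assert(rc == 0);
    if (DBG_FREE(name) != 0)        /* same pointer again: double free */
        caught++;
    if (DBG_FREE(stack_buf) != 0)   /* never came from dbg_malloc */
        caught++;
    return caught;
}
```

This table does not cope with an address being reused by a later allocation between the two frees; tools such as Valgrind's Memcheck or AddressSanitizer handle that case and are the more practical choice for a real hunt.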

By doing a "manual bisect", first across the three commits that had landed since my last refresh and then across the files within the offending commit, I tracked the problem down to an invalid GF_FREE in rpc_transport_load that was corrupting memory. A patch will follow as soon as I have the bug ID.
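The sketch below does not reproduce rpc_transport_load or the actual patch. Assuming a hypothetical struct transport and helpers transport_set_type, transport_destroy and transport_load_sketch, it only shows a common C convention for avoiding this class of bug: when ownership of a buffer moves into an object, the caller's pointer is cleared, so a shared cleanup path that frees "whatever is left" calls free(NULL) instead of freeing the same buffer a second time.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a transport object; not rpc_transport_t. */
struct transport {
    char *type;   /* owned: released only by transport_destroy() */
};

/* Moves ownership of *typep into t and clears the caller's pointer, so a
 * later cleanup in the caller cannot free the same buffer again. */
static int transport_set_type(struct transport *t, char **typep)
{
    if (t == NULL || typep == NULL || *typep == NULL)
        return -1;
    free(t->type);      /* release any value t already owned */
    t->type = *typep;   /* t is now the sole owner */
    *typep = NULL;      /* caller gives up its reference */
    return 0;
}

static void transport_destroy(struct transport *t)
{
    if (t == NULL)
        return;
    free(t->type);
    free(t);
}

/* Hypothetical loader with a single cleanup label. After a successful
 * transfer, the trailing free(type) is free(NULL), a no-op, instead of a
 * second free of the buffer now owned by t. */
static struct transport *transport_load_sketch(const char *type_name)
{
    struct transport *t;
    char *type = NULL;
    size_t len;

    assert(type_name != NULL);
    t = calloc(1, sizeof(*t));
    if (t == NULL)
        goto out;
    len = strlen(type_name) + 1;
    type = malloc(len);
    if (type == NULL)
        goto err;
    memcpy(type, type_name, len);
    if (transport_set_type(t, &type) != 0)
        goto err;
    goto out;
err:
    transport_destroy(t);
    t = NULL;
out:
    free(type);         /* non-NULL only if ownership was never transferred */
    return t;
}
```

The same rule holds whether the release is spelled free() or a wrapper such as GF_FREE: each buffer has exactly one owner that releases it, and any other pointer to it is cleared once it has been handed off.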
Comment 1 Amar Tumballi 2012-06-06 01:58:35 EDT
Patch sent (by jdarcy) and merged (by avati): http://review.gluster.com/3528

Note You need to log in before you can comment on or make changes to this bug.