829104 – Double-free corruption in glusterd

Summary: Double-free corruption in glusterd

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	rpc
Sub Component:
Version:	mainline
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	urgent
Target Milestone:	---
Assignee:	Jeff Darcy
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-06-06 01:59 UTC by Jeff Darcy
Modified:	2013-07-24 17:51 UTC
CC List:	1 user
Fixed In Version:	glusterfs-3.4.0
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2013-07-24 17:51:23 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Description Jeff Darcy 2012-06-06 01:59:09 UTC

After updating to HEAD on master, I started seeing problems, first with mounts and then with operations involving glusterd generally.  Sometimes operations would seem to succeed, at least partially, but take a *very* long time.  More often, glusterd would abort with malloc reporting a double free, with a backtrace something like the following (just an example; this code is not actually the culprit):

(gdb) bt
#0  0x0000003a00232885 in raise () from /lib64/libc.so.6
#1  0x0000003a00234065 in abort () from /lib64/libc.so.6
#2  0x0000003a0026f7a7 in __libc_message ()
   from /lib64/libc.so.6
#3  0x0000003a002750c6 in malloc_printerr ()
   from /lib64/libc.so.6
#4  0x0000003a002ccf68 in freeaddrinfo () from /lib64/libc.so.6
#5  0x00007ffff7d8bb1c in gf_resolve_ip6 (
    hostname=0x6672a0 "gfs2", port=24007, family=2, 
    dnscache=0x667818, addr_info=0x7ffff3f1ac30)
    at common-utils.c:155
#6  0x00007ffff42c66f9 in af_inet_client_get_remote_sockaddr (
    this=0x6677a0, sockaddr=0x7ffff3f1ad00, 
    sockaddr_len=0x7ffff3f1ad84) at name.c:239
#7  0x00007ffff42c726d in socket_client_get_remote_sockaddr (
    this=0x6677a0, sockaddr=0x7ffff3f1ad00, 
    sockaddr_len=0x7ffff3f1ad84, sa_family=0x7ffff3f1ad82)
    at name.c:497
#8  0x00007ffff42c37eb in socket_connect (this=0x6677a0, port=0)
    at socket.c:2064
#9  0x00007ffff7b50ae7 in rpc_transport_connect (this=0x6677a0, 
    port=0) at rpc-transport.c:389
#10 0x00007ffff7b53f49 in rpc_clnt_reconnect (
    trans_ptr=0x6677a0) at rpc-clnt.c:430
#11 0x00007ffff7d90790 in gf_timer_proc (ctx=0x634010)
    at timer.c:168
#12 0x0000003a006077f1 in start_thread ()
   from /lib64/libpthread.so.0
#13 0x0000003a002e570d in clone () from /lib64/libc.so.6

By doing a "manual bisect", first of the three commits that had landed since my last refresh and then of the files within the offending commit, I tracked the problem down to an invalid GF_FREE in rpc_transport_load, which was corrupting memory.  A patch will be forthcoming as soon as I get the bug ID.
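
For illustration only, here is a minimal sketch of why an invalid free tends to show up far from its cause (as in the freeaddrinfo frame above), and one way to catch it at the call site instead.  It does not reproduce the actual code in rpc_transport_load; tracked_alloc and tracked_free are hypothetical helpers, not GlusterFS APIs, and the fixed-size table is an assumption made to keep the example short.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_LIVE 64

/* Hypothetical tracker: pointers handed out and not yet freed. */
static void *live[MAX_LIVE];

/* Allocate a block and record it as live.
 * Returns NULL if malloc fails or the table is full. */
static void *tracked_alloc(size_t size)
{
    for (int i = 0; i < MAX_LIVE; i++) {
        if (live[i] == NULL) {
            void *p = malloc(size);
            live[i] = p;
            return p;
        }
    }
    return NULL;
}

/* Free a block only if it is still recorded as live.
 * Returns 0 on success (or for NULL), -1 for a pointer that was
 * never allocated here or has already been freed. */
static int tracked_free(void *p)
{
    if (p == NULL)
        return 0;
    for (int i = 0; i < MAX_LIVE; i++) {
        if (live[i] == p) {
            live[i] = NULL;
            free(p);
            return 0;
        }
    }
    /* With plain free() this would silently corrupt the heap and
     * only be noticed by some later, unrelated allocation or free. */
    fprintf(stderr, "invalid free of %p\n", p);
    return -1;
}
```

With these helpers, a second tracked_free on the same pointer, or a tracked_free on memory that was never allocated through tracked_alloc, returns -1 right away rather than aborting later in unrelated code.  The sketch has an obvious limit: if the allocator reuses an address for a new block, a stale free of the old pointer would look valid.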

Comment 1 Amar Tumballi 2012-06-06 05:58:35 UTC

Patch sent (by jdarcy) and merged (by avati): http://review.gluster.com/3528
