After updating to the HEAD on master, I started seeing problems first with mounts and then with operations involving glusterd generally. Sometimes operations would seem to succeed, at least partially, but take a *very* long time. More often, glusterd would terminate with complaints about malloc detecting a double free something like the following (just an example - this code is not actually the culprit):

(gdb) bt
#0  0x0000003a00232885 in raise () from /lib64/libc.so.6
#1  0x0000003a00234065 in abort () from /lib64/libc.so.6
#2  0x0000003a0026f7a7 in __libc_message () from /lib64/libc.so.6
#3  0x0000003a002750c6 in malloc_printerr () from /lib64/libc.so.6
#4  0x0000003a002ccf68 in freeaddrinfo () from /lib64/libc.so.6
#5  0x00007ffff7d8bb1c in gf_resolve_ip6 (hostname=0x6672a0 "gfs2", port=24007, family=2, dnscache=0x667818, addr_info=0x7ffff3f1ac30) at common-utils.c:155
#6  0x00007ffff42c66f9 in af_inet_client_get_remote_sockaddr (this=0x6677a0, sockaddr=0x7ffff3f1ad00, sockaddr_len=0x7ffff3f1ad84) at name.c:239
#7  0x00007ffff42c726d in socket_client_get_remote_sockaddr (this=0x6677a0, sockaddr=0x7ffff3f1ad00, sockaddr_len=0x7ffff3f1ad84, sa_family=0x7ffff3f1ad82) at name.c:497
#8  0x00007ffff42c37eb in socket_connect (this=0x6677a0, port=0) at socket.c:2064
#9  0x00007ffff7b50ae7 in rpc_transport_connect (this=0x6677a0, port=0) at rpc-transport.c:389
#10 0x00007ffff7b53f49 in rpc_clnt_reconnect (trans_ptr=0x6677a0) at rpc-clnt.c:430
#11 0x00007ffff7d90790 in gf_timer_proc (ctx=0x634010) at timer.c:168
#12 0x0000003a006077f1 in start_thread () from /lib64/libpthread.so.0
#13 0x0000003a002e570d in clone () from /lib64/libc.so.6

By doing a "manual bisect" first of the three commits that had happened since my last refresh and then of files within the offending commit, I tracked the problem down to an invalid GF_FREE in rpc_transport_load, which was corrupting memory. A patch will be forthcoming as soon as I get the bug ID.
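For illustration only: the report describes an invalid free that corrupts the heap and then surfaces as a crash in unrelated code. Below is a minimal sketch of one general way to catch that class of bug at the point of the bad free rather than later. It is not the GlusterFS GF_FREE implementation and not the actual patch. The names checked_alloc, checked_free and CHECKED_MAGIC are hypothetical, and the check assumes there is readable memory just before any pointer passed in.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical magic value marking blocks handed out by checked_alloc(). */
#define CHECKED_MAGIC 0xC0FFEE42u

/* Header stored immediately before each user pointer. The union with
 * max_align_t keeps the user data suitably aligned. */
typedef union {
    struct {
        uint32_t magic;
        size_t size;
    } info;
    max_align_t align;
} checked_hdr;

/* Allocate size bytes preceded by a tagged header. Returns NULL on failure. */
void *checked_alloc(size_t size)
{
    checked_hdr *h = malloc(sizeof(*h) + size);
    if (h == NULL)
        return NULL;
    h->info.magic = CHECKED_MAGIC;
    h->info.size = size;
    return h + 1; /* user data starts right after the header */
}

/* Free a block from checked_alloc(). Returns 0 on success (or for NULL),
 * -1 if the header tag is missing, in which case nothing is freed.
 * Assumption: memory just before ptr is readable. */
int checked_free(void *ptr)
{
    if (ptr == NULL)
        return 0;
    checked_hdr *h = (checked_hdr *)ptr - 1;
    if (h->info.magic != CHECKED_MAGIC)
        return -1; /* not one of ours: refuse instead of corrupting the heap */
    h->info.magic = 0; /* clear the tag before releasing the block */
    free(h);
    return 0;
}
```

A caller would treat a -1 return as a bug at that call site, which points at the offending free directly instead of at a later victim such as the freeaddrinfo() frame in the backtrace above. This sketch does not reliably detect double frees, since reading the header of an already released block is itself undefined.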
Patch sent (by jdarcy) and merged (by avati): http://review.gluster.com/3528