Description of problem:

#0  __pthread_mutex_lock (mutex=0x0) at pthread_mutex_lock.c:51
        __PRETTY_FUNCTION__ = "__pthread_mutex_lock"
        type = 28397472
#1  0x00007fbf02fa860c in server_connection_cleanup (this=0x7fbf03f2d660, conn=0x7fbf01b14f80) at server-helpers.c:549
        ltable = 0x0
        fdentries = 0x0
        fd_count = 0
        ret = 0
        __FUNCTION__ = "server_connection_cleanup"
#2  0x00007fbf02fa0f73 in server_submit_reply (frame=0x7fbf067fa77c, req=0x7fbf027760f0, arg=0x7fbf02671a50, payload=0x0, payloadcount=0, iobref=0x7fbf01a91f70, xdrproc=0x7fbf07a8fcf3 <xdr_gf_common_rsp>) at server.c:164
        iob = 0x7fbf07787eb0
        ret = -1
        rsp = {iov_base = 0x7fbf07763c00, iov_len = 12}
        state = 0x7fbf01b0c810
        new_iobref = 1 '\001'
        conn = 0x7fbf01b14f80
        __FUNCTION__ = "server_submit_reply"
#3  0x00007fbf02fae158 in server_finodelk_cbk (frame=0x7fbf067fa77c, cookie=0x7fbf06a052bc, this=0x7fbf03f2d660, op_ret=-1, op_errno=11) at server3_1-fops.c:280
        rsp = {op_ret = -1, op_errno = 11, xdata = {xdata_len = 0, xdata_val = 0x0}}
        state = 0x7fbf01b0c810
        conn = 0x7fbf01b14f80
        req = 0x7fbf027760f0
        __FUNCTION__ = "server_finodelk_cbk"
#4  0x00007fbf031dbd3b in io_stats_finodelk_cbk (frame=0x7fbf06a052bc, cookie=0x7fbf06a05770, this=0x7fbf03f03660, op_ret=-1, op_errno=11) at io-stats.c:1800
        fn = 0x7fbf02fadf03 <server_finodelk_cbk>
        _parent = 0x7fbf067fa77c
        old_THIS = 0x7fbf03f03660
        __FUNCTION__ = "io_stats_finodelk_cbk"
#5  0x00007fbf07ed9a96 in default_finodelk_cbk (frame=0x7fbf06a05770, cookie=0x7fbf06a0581c, this=0x7fbf03ec8660, op_ret=-1, op_errno=11) at defaults.c:360
        fn = 0x7fbf031dbb05 <io_stats_finodelk_cbk>
        _parent = 0x7fbf06a052bc
        old_THIS = 0x7fbf03ec8660
        __FUNCTION__ = "default_finodelk_cbk"
#6  0x00007fbf07ed9a96 in default_finodelk_cbk (frame=0x7fbf06a0581c, cookie=0x7fbf06a058c8, this=0x7fbf03ea6660, op_ret=-1, op_errno=11) at defaults.c:360
        fn = 0x7fbf07ed994d <default_finodelk_cbk>
        _parent = 0x7fbf06a05770
        old_THIS = 0x7fbf03ea6660
        __FUNCTION__ = "default_finodelk_cbk"
#7  0x00007fbf0382ad09 in iot_finodelk_cbk (frame=0x7fbf06a058c8, cookie=0x7fbf06a05368, this=0x7fbf03e8c660, op_ret=-1, op_errno=11) at io-threads.c:2023
        fn = 0x7fbf07ed994d <default_finodelk_cbk>
        _parent = 0x7fbf06a0581c
        old_THIS = 0x7fbf03e8c660
        __FUNCTION__ = "iot_finodelk_cbk"

[2012-02-23 13:35:01.513347] I [server-helpers.c:763:server_connection_destroy] 0-repl2-server: destroyed connection of pranithk-laptop-15694-2012/02/23-13:33:50:194603-repl2-client-0
[2012-02-23 13:35:01.513578] I [socket.c:2379:socket_submit_reply] 0-tcp.repl2-server: not connected (priv->connected = 255)
[2012-02-23 13:35:01.513795] E [rpcsvc.c:1078:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1682x, Program: GlusterFS 3git, ProgVers: 330, Proc: 30) to rpc-transport (tcp.repl2-server)
[2012-02-23 13:35:01.577018] E [server.c:162:server_submit_reply] (-->/usr/local/lib/libglusterfs.so.0(default_finodelk_cbk+0x149) [0x7fbf07ed9a96] (-->/usr/local/lib/glusterfs/3git/xlator/debug/io-stats.so(io_stats_finodelk_cbk+0x236) [0x7fbf031dbd3b] (-->/usr/local/lib/glusterfs/3git/xlator/protocol/server.so(server_finodelk_cbk+0x255) [0x7fbf02fae158]))) 0-: Reply submission failed

pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-02-23 13:35:01
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3git
/lib64/libc.so.6[0x3fca436300]
/lib64/libpthread.so.0(pthread_mutex_lock+0x4)[0x3fca809db4]
Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
OK, I see that the lock pointer is null, and that server_connection_cleanup doesn't protect against concurrent calls. Is the race that there might be multiple threads calling server_connection_cleanup from server_submit_reply?
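For illustration, here is a minimal sketch of the suspected pattern (hypothetical types and names, not the actual server-helpers.c code): if the disconnect path tears the connection down while a failed reply submission is still driving server_connection_cleanup(), the cleanup thread locks a mutex through a pointer that the destroy path has already freed and set to NULL, which is consistent with frames #0 and #1 above (mutex=0x0, ltable = 0x0).

#include <pthread.h>
#include <stdlib.h>

struct lock_table {
        pthread_mutex_t lock;        /* first member, so &ltable->lock is 0x0 when ltable is NULL */
        /* ... lock entries ... */
};

struct conn {
        struct lock_table *ltable;   /* destroy path frees this and NULLs it */
};

/* Reply/error path: nothing guarantees conn is still alive here. */
static void
connection_cleanup (struct conn *conn)
{
        struct lock_table *ltable = conn->ltable;

        /* If connection_destroy() already ran on another thread, ltable
         * is NULL and this is effectively pthread_mutex_lock(0x0). */
        pthread_mutex_lock (&ltable->lock);
        /* ... drain held locks and open fds ... */
        pthread_mutex_unlock (&ltable->lock);
}

/* Disconnect path: frees everything unconditionally. */
static void
connection_destroy (struct conn *conn)
{
        free (conn->ltable);
        conn->ltable = NULL;         /* a reply still in transit may race in here */
        free (conn);
}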
The problem is that the conn structure is destroyed even before the requests still in transit have been replied to, which leads to the crash. So the fix is to take a ref on the conn when a request is received and unref it when the reply is submitted. By the time conn's refcount drops to 0 there should be no more requests in transit, so the connection destroy should just free all the memory. (This last part is where I made a mistake in my previous fix for this issue.)
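A minimal sketch of that ref-counting scheme, with hypothetical names (the real change is the review linked in the next comment): every request received takes a ref, every reply submitted drops one, and only the final unref tears the connection down, so by construction no request can still be in transit when the memory is freed.

#include <pthread.h>
#include <stdlib.h>

struct conn {
        pthread_mutex_t lock;
        int             ref;
};

static struct conn *
conn_ref (struct conn *conn)
{
        pthread_mutex_lock (&conn->lock);
        conn->ref++;
        pthread_mutex_unlock (&conn->lock);
        return conn;
}

static void
conn_unref (struct conn *conn)
{
        int ref;

        pthread_mutex_lock (&conn->lock);
        ref = --conn->ref;
        pthread_mutex_unlock (&conn->lock);

        if (ref == 0) {
                /* No request holds the conn any more, so freeing is safe. */
                pthread_mutex_destroy (&conn->lock);
                free (conn);
        }
}

/* Receive path: one ref per request in transit ... */
static void
on_request (struct conn *conn)
{
        conn_ref (conn);
        /* ... wind the fop down the xlator graph ... */
}

/* ... and the reply path drops it once the reply is submitted. */
static void
on_reply (struct conn *conn)
{
        /* ... submit the reply (success or failure) ... */
        conn_unref (conn);
}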
CHANGE: http://review.gluster.com/2806 (protocol/server: Make conn object ref-counted) merged in master by Vijay Bellur (vijay)
Verified with release-3.3