Bug 796581 - server crashes on disconnect because of race in connection destroy
Summary: server crashes on disconnect because of race in connection destroy
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: protocol
Version: pre-release
Hardware: Unspecified
OS: Unspecified
high
unspecified
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact: Anush Shetty
URL:
Whiteboard:
Depends On:
Blocks: 817967
 
Reported: 2012-02-23 09:03 UTC by Pranith Kumar K
Modified: 2013-07-24 17:16 UTC
CC List: 2 users

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:16:01 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Pranith Kumar K 2012-02-23 09:03:29 UTC
Description of problem:
#0  __pthread_mutex_lock (mutex=0x0) at pthread_mutex_lock.c:51
        __PRETTY_FUNCTION__ = "__pthread_mutex_lock"
        type = 28397472
#1  0x00007fbf02fa860c in server_connection_cleanup (this=0x7fbf03f2d660, conn=0x7fbf01b14f80) at server-helpers.c:549
        ltable = 0x0
        fdentries = 0x0
        fd_count = 0
        ret = 0
        __FUNCTION__ = "server_connection_cleanup"
#2  0x00007fbf02fa0f73 in server_submit_reply (frame=0x7fbf067fa77c, req=0x7fbf027760f0, arg=0x7fbf02671a50, payload=0x0, payloadcount=0, iobref=0x7fbf01a91f70, 
    xdrproc=0x7fbf07a8fcf3 <xdr_gf_common_rsp>) at server.c:164
        iob = 0x7fbf07787eb0
        ret = -1
        rsp = {iov_base = 0x7fbf07763c00, iov_len = 12}
        state = 0x7fbf01b0c810
        new_iobref = 1 '\001'
        conn = 0x7fbf01b14f80
        __FUNCTION__ = "server_submit_reply"
#3  0x00007fbf02fae158 in server_finodelk_cbk (frame=0x7fbf067fa77c, cookie=0x7fbf06a052bc, this=0x7fbf03f2d660, op_ret=-1, op_errno=11) at server3_1-fops.c:280
        rsp = {op_ret = -1, op_errno = 11, xdata = {xdata_len = 0, xdata_val = 0x0}}
        state = 0x7fbf01b0c810
        conn = 0x7fbf01b14f80
        req = 0x7fbf027760f0
        __FUNCTION__ = "server_finodelk_cbk"
#4  0x00007fbf031dbd3b in io_stats_finodelk_cbk (frame=0x7fbf06a052bc, cookie=0x7fbf06a05770, this=0x7fbf03f03660, op_ret=-1, op_errno=11) at io-stats.c:1800
        fn = 0x7fbf02fadf03 <server_finodelk_cbk>
        _parent = 0x7fbf067fa77c
        old_THIS = 0x7fbf03f03660
        __FUNCTION__ = "io_stats_finodelk_cbk"
#5  0x00007fbf07ed9a96 in default_finodelk_cbk (frame=0x7fbf06a05770, cookie=0x7fbf06a0581c, this=0x7fbf03ec8660, op_ret=-1, op_errno=11) at defaults.c:360
        fn = 0x7fbf031dbb05 <io_stats_finodelk_cbk>
        _parent = 0x7fbf06a052bc
        old_THIS = 0x7fbf03ec8660
        __FUNCTION__ = "default_finodelk_cbk"
#6  0x00007fbf07ed9a96 in default_finodelk_cbk (frame=0x7fbf06a0581c, cookie=0x7fbf06a058c8, this=0x7fbf03ea6660, op_ret=-1, op_errno=11) at defaults.c:360
        fn = 0x7fbf07ed994d <default_finodelk_cbk>
        _parent = 0x7fbf06a05770
        old_THIS = 0x7fbf03ea6660
        __FUNCTION__ = "default_finodelk_cbk"
#7  0x00007fbf0382ad09 in iot_finodelk_cbk (frame=0x7fbf06a058c8, cookie=0x7fbf06a05368, this=0x7fbf03e8c660, op_ret=-1, op_errno=11) at io-threads.c:2023
        fn = 0x7fbf07ed994d <default_finodelk_cbk>
        _parent = 0x7fbf06a0581c
        old_THIS = 0x7fbf03e8c660
        __FUNCTION__ = "iot_finodelk_cbk"


[2012-02-23 13:35:01.513347] I [server-helpers.c:763:server_connection_destroy] 0-repl2-server: destroyed connection of pranithk-laptop-15694-2012/02/23-13:33:50:194603-repl2-client-0
[2012-02-23 13:35:01.513578] I [socket.c:2379:socket_submit_reply] 0-tcp.repl2-server: not connected (priv->connected = 255)
[2012-02-23 13:35:01.513795] E [rpcsvc.c:1078:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1682x, Program: GlusterFS 3git, ProgVers: 330, Proc: 30) to rpc-transport (tcp.repl2-server)
[2012-02-23 13:35:01.577018] E [server.c:162:server_submit_reply] (-->/usr/local/lib/libglusterfs.so.0(default_finodelk_cbk+0x149) [0x7fbf07ed9a96] (-->/usr/local/lib/glusterfs/3git/xlator/debug/io-stats.so(io_stats_finodelk_cbk+0x236) [0x7fbf031dbd3b] (-->/usr/local/lib/glusterfs/3git/xlator/protocol/server.so(server_finodelk_cbk+0x255) [0x7fbf02fae158]))) 0-: Reply submission failed
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-02-23 13:35:01
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3git
/lib64/libc.so.6[0x3fca436300]
/lib64/libpthread.so.0(pthread_mutex_lock+0x4)[0x3fca809db4]



Comment 1 Jeff Darcy 2012-02-23 15:26:32 UTC
OK, I see that the lock pointer is null, and that server_connection_cleanup doesn't protect against concurrent calls.  Is the race that there might be multiple threads calling server_connection_cleanup from server_submit_reply?

Comment 2 Pranith Kumar K 2012-03-01 17:11:07 UTC
The problem is that the conn structure is destroyed even before the requests that are still in transit have been replied to, which leads to a crash. So the fix is to take a ref on the conn when a request is received and to unref it when that request is replied to. By the time the conn's ref count drops to 0, there should not be any more requests in transit, so connection destroy should just free all the memory. (This last part is where I made a mistake in my previous fix for this issue.)
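A minimal sketch of the ref/unref scheme described above, assuming a simplified connection object. The names `conn_t`, `conn_new`, `conn_ref`, `conn_unref`, `on_request_received`, `on_reply_submitted` and `on_disconnect` are hypothetical and are not the code merged in review 2806; the sketch also uses C11 atomics for the counter, whereas the real server may use its own locking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical, simplified connection object (not GlusterFS's real type).
 * ref = 1 for the transport + 1 per request still in transit. */
typedef struct conn {
    atomic_int ref;
} conn_t;

/* Illustration only: counts how many connections have been freed. */
static atomic_int conn_destroyed = 0;

static conn_t *conn_new(void)
{
    conn_t *c = malloc(sizeof(*c));
    if (c == NULL)
        return NULL;
    atomic_init(&c->ref, 1);       /* reference held by the transport */
    return c;
}

/* Called only when ref has reached 0: no request can still be in
 * transit, so destroy just frees memory and takes no locks. */
static void conn_destroy(conn_t *c)
{
    free(c);
    atomic_fetch_add(&conn_destroyed, 1);
}

static conn_t *conn_ref(conn_t *c)
{
    atomic_fetch_add(&c->ref, 1);
    return c;
}

static void conn_unref(conn_t *c)
{
    /* atomic_fetch_sub returns the value *before* the decrement. */
    int prev = atomic_fetch_sub(&c->ref, 1);
    assert(prev > 0);              /* catches an unbalanced unref */
    if (prev == 1)
        conn_destroy(c);
}

/* Hypothetical request lifecycle hooks. */
static void on_request_received(conn_t *c) { conn_ref(c); }
static void on_reply_submitted(conn_t *c)  { conn_unref(c); }
static void on_disconnect(conn_t *c)       { conn_unref(c); }
```

In this sketch, a disconnect that arrives while a reply (such as the finodelk reply in the backtrace) is still pending only drops the transport's reference; the object is freed when the last in-flight reply is submitted, so the reply path never touches a destroyed conn.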

Comment 3 Anand Avati 2012-03-01 17:12:16 UTC
CHANGE: http://review.gluster.com/2806 (protocol/server: Make conn object ref-counted) merged in master by Vijay Bellur (vijay)

Comment 4 Anush Shetty 2012-05-30 13:34:11 UTC
Verified with release-3.3

