Bug 1391086

Summary: gfapi clients crash while using async calls due to double fd_unref
Product: [Community] GlusterFS Reporter: Raghavendra Talur <rtalur>
Component: libgfapiAssignee: Raghavendra Talur <rtalur>
Status: CLOSED CURRENTRELEASE QA Contact: Sudhir D <sdharane>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: mainlineCC: bugs
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.10.0 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1392286 1392288 1392289 (view as bug list) Environment:
Last Closed: 2017-03-06 17:32:36 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1391093, 1392286, 1392288, 1392289    

Description Raghavendra Talur 2016-11-02 14:20:36 UTC
Description of problem:
Clients crash with backtrace similar to one shown below.

(gdb) bt
#0  0x00007f120b0755f7 in raise () from /lib64/libc.so.6
#1  0x00007f120b076ce8 in abort () from /lib64/libc.so.6
#2  0x00007f120c9d6b9b in dump_core () at ../source3/lib/dumpcore.c:322
#3  0x00007f120c9c9f97 in smb_panic_s3 (why=<optimized out>) at ../source3/lib/util.c:814
#4  0x00007f120eebc57f in smb_panic (why=why@entry=0x7f120ef0354a "internal error") at ../lib/util/fault.c:166
#5  0x00007f120eebc796 in fault_report (sig=<optimized out>) at ../lib/util/fault.c:83
#6  sig_fault (sig=<optimized out>) at ../lib/util/fault.c:94
#7  <signal handler called>
#8  0x00007f120f11a210 in pthread_spin_lock () from /lib64/libpthread.so.0
#9  0x00007f11f3f78255 in fd_unref () from /lib64/libglusterfs.so.0
#10 0x00007f11f4661a81 in glfs_io_async_cbk () from /lib64/libgfapi.so.0
#11 0x00007f11f4661eef in glfs_preadv_async_cbk () from /lib64/libgfapi.so.0
#12 0x00007f11df5b002d in io_stats_readv_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/debug/io-stats.so
#13 0x00007f11f3fd0c52 in default_readv_cbk () from /lib64/libglusterfs.so.0
#14 0x00007f11f3fd0c52 in default_readv_cbk () from /lib64/libglusterfs.so.0
#15 0x00007f11f3fd0c52 in default_readv_cbk () from /lib64/libglusterfs.so.0
#16 0x00007f11dfdf7787 in ioc_frame_return () from /usr/lib64/glusterfs/3.8.4/xlator/performance/io-cache.so
#17 0x00007f11dfdf7b2f in ioc_waitq_return () from /usr/lib64/glusterfs/3.8.4/xlator/performance/io-cache.so
#18 0x00007f11dfdf81dd in ioc_fault_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/performance/io-cache.so
#19 0x00007f11ec413332 in ra_readv_disabled_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/performance/read-ahead.so
#20 0x00007f11f3fd0c52 in default_readv_cbk () from /lib64/libglusterfs.so.0
#21 0x00007f11ec890fe3 in dht_readv_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/cluster/distribute.so
#22 0x00007f11ecac7649 in afr_readv_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so
#23 0x00007f11ecd59773 in client3_3_readv_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so
#24 0x00007f11f4449680 in rpc_clnt_handle_reply () from /lib64/libgfrpc.so.0
#25 0x00007f11f444995f in rpc_clnt_notify () from /lib64/libgfrpc.so.0
#26 0x00007f11f4445883 in rpc_transport_notify () from /lib64/libgfrpc.so.0
#27 0x00007f11ed78ceb4 in socket_event_poll_in () from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#28 0x00007f11ed78f365 in socket_event_handler () from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#29 0x00007f11f3fae340 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#30 0x00007f120f115dc5 in start_thread () from /lib64/libpthread.so.0
#31 0x00007f120b136ced in clone () from /lib64/libc.so.6


Version-Release number of selected component (if applicable):
mainline


How reproducible:
Always when using async api


Steps to Reproduce:
Will attach a reproducer

Comment 1 Worker Ant 2016-11-02 15:08:50 UTC
REVIEW: http://review.gluster.org/15768 (gfapi: async fops should unref in callbacks) posted (#1) for review on master by Raghavendra Talur (rtalur)

Comment 2 Worker Ant 2016-11-04 07:32:56 UTC
REVIEW: http://review.gluster.org/15768 (gfapi: async fops should unref in callbacks) posted (#2) for review on master by Raghavendra Talur (rtalur)

Comment 3 Worker Ant 2016-11-05 20:32:43 UTC
COMMIT: http://review.gluster.org/15768 committed in master by Shyamsundar Ranganathan (srangana) 
------
commit e65738818dd22462ec00dda021566654d1c702b1
Author: Raghavendra Talur <rtalur>
Date:   Wed Nov 2 19:51:26 2016 +0530

    gfapi: async fops should unref in callbacks
    
    If fd is unref'd at the end of async call then the unref in cbks would
    lead to double unref and possible crash. Removing duplicate unrefs.
    
    Added unref only in failure cases.
    
    A simple test case has been added to test async write case. Need to
    extend the same for other async APIs too.
    
    Details:
    All glfd based calls in libgfapi, except for glfs_open and glfs_close,
    behave in the same way. At the start of the operation, they take a ref
    on glfd and fd. At the end of the operation, they unref it. Async calls
    are a little different as they unref in the cbk function. A successfull
    open call does not unref either the glfd or fd, thereby functioning as a
    reference for a OPEN file object. glfs_close makes a syncop_flush call
    sandwiched between a fd ref and unref(this can be removed, more on this
    below), followed by a call to glfs_mark_glfd_for_deletion which unrefs
    glfd and also calls glfs_fd_destroy as a release function thereby doing
    a unref on fd too.
    
    Functionally, there is no problem with how everything works when as
    described above. However, it is a little non-intuitive that we need to
    perform a fd_unref as a consequence of a implicit fd_ref that happens
    within glfs_resolve_fd. As we perform a GF_REF_GET(glfd) at the start of
    every operation, it would be worthwhile to remove the fd_ref that
    glfs_resovle_fd takes and do away with explicit fd_unref()s at the end
    of every operation. This is the same reason why we don't need the fd_ref
    in glfs_close. This is however not in the scope of this patch.
    
    Change-Id: I86b1d3b2ad846b16ea527d541dc82b5e90b0ba85
    BUG: 1391086
    Signed-off-by: Raghavendra Talur <rtalur>
    Reviewed-on: http://review.gluster.org/15768
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Rajesh Joseph <rjoseph>
    Reviewed-by: Xavier Hernandez <xhernandez>
    Reviewed-by: soumya k <skoduri>
    Reviewed-by: Prasanna Kumar Kalever <pkalever>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana>

Comment 4 Worker Ant 2016-11-07 06:35:49 UTC
REVIEW: http://review.gluster.org/15778 (gfapi: async fops should unref in callbacks) posted (#1) for review on release-3.9 by Raghavendra Talur (rtalur)

Comment 5 Shyamsundar 2017-03-06 17:32:36 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/