+++ This bug was initially created as a clone of Bug #1391086 +++

Description of problem:
Clients crash with a backtrace similar to the one shown below.

(gdb) bt
#0  0x00007f120b0755f7 in raise () from /lib64/libc.so.6
#1  0x00007f120b076ce8 in abort () from /lib64/libc.so.6
#2  0x00007f120c9d6b9b in dump_core () at ../source3/lib/dumpcore.c:322
#3  0x00007f120c9c9f97 in smb_panic_s3 (why=<optimized out>) at ../source3/lib/util.c:814
#4  0x00007f120eebc57f in smb_panic (why=why@entry=0x7f120ef0354a "internal error") at ../lib/util/fault.c:166
#5  0x00007f120eebc796 in fault_report (sig=<optimized out>) at ../lib/util/fault.c:83
#6  sig_fault (sig=<optimized out>) at ../lib/util/fault.c:94
#7  <signal handler called>
#8  0x00007f120f11a210 in pthread_spin_lock () from /lib64/libpthread.so.0
#9  0x00007f11f3f78255 in fd_unref () from /lib64/libglusterfs.so.0
#10 0x00007f11f4661a81 in glfs_io_async_cbk () from /lib64/libgfapi.so.0
#11 0x00007f11f4661eef in glfs_preadv_async_cbk () from /lib64/libgfapi.so.0
#12 0x00007f11df5b002d in io_stats_readv_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/debug/io-stats.so
#13 0x00007f11f3fd0c52 in default_readv_cbk () from /lib64/libglusterfs.so.0
#14 0x00007f11f3fd0c52 in default_readv_cbk () from /lib64/libglusterfs.so.0
#15 0x00007f11f3fd0c52 in default_readv_cbk () from /lib64/libglusterfs.so.0
#16 0x00007f11dfdf7787 in ioc_frame_return () from /usr/lib64/glusterfs/3.8.4/xlator/performance/io-cache.so
#17 0x00007f11dfdf7b2f in ioc_waitq_return () from /usr/lib64/glusterfs/3.8.4/xlator/performance/io-cache.so
#18 0x00007f11dfdf81dd in ioc_fault_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/performance/io-cache.so
#19 0x00007f11ec413332 in ra_readv_disabled_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/performance/read-ahead.so
#20 0x00007f11f3fd0c52 in default_readv_cbk () from /lib64/libglusterfs.so.0
#21 0x00007f11ec890fe3 in dht_readv_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/cluster/distribute.so
#22 0x00007f11ecac7649 in afr_readv_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so
#23 0x00007f11ecd59773 in client3_3_readv_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so
#24 0x00007f11f4449680 in rpc_clnt_handle_reply () from /lib64/libgfrpc.so.0
#25 0x00007f11f444995f in rpc_clnt_notify () from /lib64/libgfrpc.so.0
#26 0x00007f11f4445883 in rpc_transport_notify () from /lib64/libgfrpc.so.0
#27 0x00007f11ed78ceb4 in socket_event_poll_in () from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#28 0x00007f11ed78f365 in socket_event_handler () from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#29 0x00007f11f3fae340 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#30 0x00007f120f115dc5 in start_thread () from /lib64/libpthread.so.0
#31 0x00007f120b136ced in clone () from /lib64/libc.so.6

Version-Release number of selected component (if applicable):
mainline

How reproducible:
Always, when using the async API.

Steps to Reproduce:
Will attach a reproducer.

--- Additional comment from Worker Ant on 2016-11-02 20:38:50 IST ---

REVIEW: http://review.gluster.org/15768 (gfapi: async fops should unref in callbacks) posted (#1) for review on master by Raghavendra Talur (rtalur)

--- Additional comment from Worker Ant on 2016-11-04 13:02:56 IST ---

REVIEW: http://review.gluster.org/15768 (gfapi: async fops should unref in callbacks) posted (#2) for review on master by Raghavendra Talur (rtalur)

--- Additional comment from Worker Ant on 2016-11-06 02:02:43 IST ---

COMMIT: http://review.gluster.org/15768 committed in master by Shyamsundar Ranganathan (srangana)
------
commit e65738818dd22462ec00dda021566654d1c702b1
Author: Raghavendra Talur <rtalur>
Date: Wed Nov 2 19:51:26 2016 +0530

gfapi: async fops should unref in callbacks

If the fd is unref'd at the end of an async call, the unref in the cbks leads to a double unref and a possible crash. Removing the duplicate unrefs. Added unrefs only in failure cases.

A simple test case has been added to test the async write case.
Need to extend the same for other async APIs too.

Details:
All glfd-based calls in libgfapi, except for glfs_open and glfs_close, behave in the same way. At the start of the operation, they take a ref on the glfd and the fd. At the end of the operation, they unref them. Async calls are a little different, as they unref in the cbk function.

A successful open call does not unref either the glfd or the fd, thereby functioning as a reference for an open file object.

glfs_close makes a syncop_flush call sandwiched between an fd ref and unref (this can be removed, more on this below), followed by a call to glfs_mark_glfd_for_deletion, which unrefs the glfd and also calls glfs_fd_destroy as a release function, thereby doing an unref on the fd too.

Functionally, there is no problem with how everything works as described above. However, it is a little non-intuitive that we need to perform an fd_unref as a consequence of an implicit fd_ref that happens within glfs_resolve_fd. As we perform a GF_REF_GET(glfd) at the start of every operation, it would be worthwhile to remove the fd_ref that glfs_resolve_fd takes and do away with the explicit fd_unref()s at the end of every operation. This is the same reason why we don't need the fd_ref in glfs_close. This is, however, not in the scope of this patch.

Change-Id: I86b1d3b2ad846b16ea527d541dc82b5e90b0ba85
BUG: 1391086
Signed-off-by: Raghavendra Talur <rtalur>
Reviewed-on: http://review.gluster.org/15768
Smoke: Gluster Build System <jenkins.org>
NetBSD-regression: NetBSD Build System <jenkins.org>
Reviewed-by: Rajesh Joseph <rjoseph>
Reviewed-by: Xavier Hernandez <xhernandez>
Reviewed-by: soumya k <skoduri>
Reviewed-by: Prasanna Kumar Kalever <pkalever>
CentOS-regression: Gluster Build System <jenkins.org>
Reviewed-by: Shyamsundar Ranganathan <srangana>
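The ownership rule in the commit message above can be modeled with a small single-threaded simulation. Every name here (`sketch_fd`, `sketch_read_async_fixed`, `sketch_event_loop`, and so on) is hypothetical and is not the libgfapi or libglusterfs API. A real fd is freed when its refcount reaches zero; this sketch instead counts any later unref as an underflow so the bug stays observable. With two reads in flight, the buggy variant lets the first callback drop the reference held by the open file, and the second callback's unref is the one that would touch freed memory. That is consistent with, though not proof of, frames #9 and #10 of the backtrace, where fd_unref is called from glfs_io_async_cbk.

```c
#include <assert.h>

/* Hypothetical stand-in for a reference-counted fd; not glusterfs fd_t. */
typedef struct {
    int refcount;
    int underflows; /* unrefs that, in real code, would hit freed memory */
} sketch_fd;

static void sketch_fd_ref(sketch_fd *fd) { fd->refcount++; }

static void sketch_fd_unref(sketch_fd *fd) {
    if (fd->refcount == 0) {
        /* The real object would already be destroyed: use-after-free. */
        fd->underflows++;
        return;
    }
    fd->refcount--;
}

/* One in-flight async request; its callback runs later. */
typedef struct {
    sketch_fd *fd;
    int pending;
} sketch_req;

/* Completion callback: drops the reference taken at submission. */
static void sketch_read_cbk(sketch_req *req) {
    req->pending = 0;
    sketch_fd_unref(req->fd);
}

/* Fixed pattern: the callback owns the ref. Unref here only if submission
 * fails, because the callback will then never run. */
static int sketch_read_async_fixed(sketch_req *req, sketch_fd *fd,
                                   int submit_ok) {
    sketch_fd_ref(fd);
    if (!submit_ok) {
        sketch_fd_unref(fd);
        return -1;
    }
    req->fd = fd;
    req->pending = 1;
    return 0;
}

/* Buggy pattern: also unrefs on the way out, as a sync fop would. */
static int sketch_read_async_buggy(sketch_req *req, sketch_fd *fd) {
    sketch_fd_ref(fd);
    req->fd = fd;
    req->pending = 1;
    sketch_fd_unref(fd); /* duplicate: the callback unrefs again */
    return 0;
}

/* Stand-in for the epoll worker delivering a reply. */
static void sketch_event_loop(sketch_req *req) {
    if (req->pending)
        sketch_read_cbk(req);
}

/* open -> two async reads -> one failed submission -> callbacks -> close */
int scenario_fixed(void) {
    sketch_fd fd = {1, 0}; /* the reference held by the open file */
    sketch_req a = {0}, b = {0}, c = {0};
    sketch_read_async_fixed(&a, &fd, 1);
    sketch_read_async_fixed(&b, &fd, 1);
    sketch_read_async_fixed(&c, &fd, 0); /* never queued */
    sketch_event_loop(&a);
    sketch_event_loop(&b);
    sketch_event_loop(&c); /* no-op: c is not pending */
    sketch_fd_unref(&fd);  /* close drops the open file's reference */
    assert(fd.refcount == 0);
    return fd.underflows;
}

/* open -> two async reads with the duplicate unref -> callbacks */
int scenario_buggy(void) {
    sketch_fd fd = {1, 0};
    sketch_req a = {0}, b = {0};
    sketch_read_async_buggy(&a, &fd);
    sketch_read_async_buggy(&b, &fd);
    sketch_event_loop(&a); /* drops the open file's reference: fd "freed" */
    sketch_event_loop(&b); /* unref on a destroyed fd */
    assert(fd.refcount == 0);
    return fd.underflows;
}
```

`scenario_fixed()` returns 0 underflows and `scenario_buggy()` returns 1. The fixed variant mirrors "Added unrefs only in failure cases": the caller drops its reference only when the request was never submitted, because otherwise the callback owns it.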
REVIEW: http://review.gluster.org/15778 (gfapi: async fops should unref in callbacks) posted (#2) for review on release-3.9 by Raghavendra Talur (rtalur)
REVIEW: http://review.gluster.org/15778 (gfapi: async fops should unref in callbacks) posted (#3) for review on release-3.9 by Raghavendra Talur (rtalur)
COMMIT: http://review.gluster.org/15778 committed in release-3.9 by Raghavendra Talur (rtalur)
------
commit a3bade0976d9f976855baf0de479205aa50e2abd
Author: Raghavendra Talur <rtalur>
Date: Wed Nov 2 19:51:26 2016 +0530

gfapi: async fops should unref in callbacks

If the fd is unref'd at the end of an async call, the unref in the cbks leads to a double unref and a possible crash. Removing the duplicate unrefs. Added unrefs only in failure cases.

A simple test case has been added to test the async write case.
Need to extend the same for other async APIs too.

Details:
All glfd-based calls in libgfapi, except for glfs_open and glfs_close, behave in the same way. At the start of the operation, they take a ref on the glfd and the fd. At the end of the operation, they unref them. Async calls are a little different, as they unref in the cbk function.

A successful open call does not unref either the glfd or the fd, thereby functioning as a reference for an open file object.

glfs_close makes a syncop_flush call sandwiched between an fd ref and unref (this can be removed, more on this below), followed by a call to glfs_mark_glfd_for_deletion, which unrefs the glfd and also calls glfs_fd_destroy as a release function, thereby doing an unref on the fd too.

Functionally, there is no problem with how everything works as described above. However, it is a little non-intuitive that we need to perform an fd_unref as a consequence of an implicit fd_ref that happens within glfs_resolve_fd. As we perform a GF_REF_GET(glfd) at the start of every operation, it would be worthwhile to remove the fd_ref that glfs_resolve_fd takes and do away with the explicit fd_unref()s at the end of every operation. This is the same reason why we don't need the fd_ref in glfs_close. This is, however, not in the scope of this patch.

Change-Id: I86b1d3b2ad846b16ea527d541dc82b5e90b0ba85
BUG: 1392286
Signed-off-by: Raghavendra Talur <rtalur>
Reviewed-on: http://review.gluster.org/15768
Smoke: Gluster Build System <jenkins.org>
NetBSD-regression: NetBSD Build System <jenkins.org>
Reviewed-by: Rajesh Joseph <rjoseph>
Reviewed-by: Xavier Hernandez <xhernandez>
Reviewed-by: soumya k <skoduri>
Reviewed-by: Prasanna Kumar Kalever <pkalever>
CentOS-regression: Gluster Build System <jenkins.org>
Reviewed-by: Shyamsundar Ranganathan <srangana>
(cherry picked from commit e65738818dd22462ec00dda021566654d1c702b1)
Reviewed-on: http://review.gluster.org/15778
REVIEW: http://review.gluster.org/15829 (gfapi: remote extra fd_unref) posted (#1) for review on release-3.9 by Raghavendra Talur (rtalur)
REVIEW: http://review.gluster.org/15829 (gfapi: remote extra fd_unref) posted (#2) for review on release-3.9 by Raghavendra Talur (rtalur)
REVIEW: http://review.gluster.org/15829 (gfapi: remove extra fd_unref) posted (#3) for review on release-3.9 by Raghavendra Talur (rtalur)
COMMIT: http://review.gluster.org/15829 committed in release-3.9 by Raghavendra Talur (rtalur)
------
commit 53ea680ac8a2acf0ecc48d07cb555093e1f0876f
Author: Raghavendra Talur <rtalur>
Date: Fri Nov 11 16:35:12 2016 +0530

gfapi: remove extra fd_unref

Online Gerrit rebase does not seem to be working. Commit a3bade0976d9f976855baf0de479205aa50e2abd is not a complete backport of http://review.gluster.org/#/c/15768.

Change-Id: I5229d87c8c403106f20c50bc8a7f38dad9a2303d
BUG: 1392286
Signed-off-by: Raghavendra Talur <rtalur>
Reviewed-on: http://review.gluster.org/15829
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
Smoke: Gluster Build System <jenkins.org>
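An extra unref such as the one removed here tends to surface far from its cause, for example as the crash in pthread_spin_lock inside fd_unref in the original backtrace. A general debugging technique, not something these patches add, is a checked unref that poisons the object at its last release instead of freeing it, so a subsequent unref is reported at the faulty call site. The sketch below assumes C11 `<stdatomic.h>`; `checked_obj`, `checked_unref` and the magic values are hypothetical and unrelated to the GF_REF_GET helper mentioned in the commit message.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical marker values; not taken from glusterfs. */
#define SKETCH_MAGIC_LIVE 0x46444c56u
#define SKETCH_MAGIC_DEAD 0xdeadfd00u

typedef struct {
    atomic_uint magic;   /* LIVE while usable, DEAD after the last unref */
    atomic_int refcount;
} checked_obj;

typedef enum {
    UNREF_OK,        /* other references remain */
    UNREF_LAST,      /* this was the last reference */
    UNREF_ON_DEAD,   /* object already released: a double unref */
    UNREF_UNDERFLOW  /* raced with the last unref: also a double unref */
} unref_result;

void checked_init(checked_obj *o) {
    atomic_init(&o->magic, SKETCH_MAGIC_LIVE);
    atomic_init(&o->refcount, 1); /* the creator's reference */
}

void checked_ref(checked_obj *o) {
    /* Taking a ref on a released object is a bug in its own right. */
    assert(atomic_load(&o->magic) == SKETCH_MAGIC_LIVE);
    atomic_fetch_add(&o->refcount, 1);
}

unref_result checked_unref(checked_obj *o) {
    if (atomic_load(&o->magic) != SKETCH_MAGIC_LIVE)
        return UNREF_ON_DEAD;
    int prev = atomic_fetch_sub(&o->refcount, 1);
    if (prev <= 0)
        return UNREF_UNDERFLOW;
    if (prev == 1) {
        /* Debug build: poison instead of free(), so that a later unref
           is reported here instead of touching reused memory. */
        atomic_store(&o->magic, SKETCH_MAGIC_DEAD);
        return UNREF_LAST;
    }
    return UNREF_OK;
}
```

Because the object is never freed, this only makes sense in a debug build; a release build would free at `UNREF_LAST`. The `UNREF_UNDERFLOW` case catches an unref that passes the magic check but races with the last release on another thread.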
This bug is getting closed because GlusterFS-3.9 has reached its end-of-life [1].

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS. If this bug still exists in newer GlusterFS releases, please open a new bug against the newer release.

[1]: https://www.gluster.org/community/release-schedule/