Description of problem:
The client saw disconnects, and after some saved frames were unwound it crashed. I was running dbench with some extra options from a single client.

Core was generated by `/usr/local/sbin/glusterfs --volfile-id=vol --volfile-server=dagobah mount/'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f28ee9d6f50 in afr_changelog_post_op_cbk (frame=0x7f28f0fd0264, cookie=0x7f28f124b778, this=0x7f28dc78aeb0, op_ret=0, op_errno=22, xattr=0x7f28dc3f58b0) at ../../../../../xlators/cluster/afr/src/afr-transaction.c:347
347 call_count = --local->call_count;
(gdb) bt
#0 0x00007f28ee9d6f50 in afr_changelog_post_op_cbk (frame=0x7f28f0fd0264, cookie=0x7f28f124b778, this=0x7f28dc78aeb0, op_ret=0, op_errno=22, xattr=0x7f28dc3f58b0) at ../../../../../xlators/cluster/afr/src/afr-transaction.c:347
#1 0x00007f28eec425e0 in client3_1_xattrop_cbk (req=0x7f28bbb599cc, iov=0x7f28bbb59a0c, count=1, myframe=0x7f28f124b778) at ../../../../../xlators/protocol/client/src/client3_1-fops.c:1425
#2 0x00007f28f2a329c6 in rpc_clnt_handle_reply (clnt=0x7f28df162de0, pollin=0x7f28dc0e0f70) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:789
#3 0x00007f28f2a32d28 in rpc_clnt_notify (trans=0x7f28dd07e8c0, mydata=0x7f28df162e10, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f28dc0e0f70) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:908
#4 0x00007f28f2a2ee3d in rpc_transport_notify (this=0x7f28dd07e8c0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f28dc0e0f70) at ../../../../rpc/rpc-lib/src/rpc-transport.c:498
#5 0x00007f28ef8a4359 in socket_event_poll_in (this=0x7f28dd07e8c0) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1675
#6 0x00007f28ef8a48cd in socket_event_handler (fd=168, idx=18, data=0x7f28dd07e8c0, poll_in=1, poll_out=0, poll_err=0) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1790
#7 0x00007f28f2c834b9 in event_dispatch_epoll_handler (event_pool=0x231d2d0, events=0x2322a60, i=0) at ../../../libglusterfs/src/event.c:794
#8 0x00007f28f2c836d3 in event_dispatch_epoll (event_pool=0x231d2d0) at ../../../libglusterfs/src/event.c:856
#9 0x00007f28f2c83a45 in event_dispatch (event_pool=0x231d2d0) at ../../../libglusterfs/src/event.c:956
#10 0x0000000000407d83 in main (argc=4, argv=0x7ffffe95ca58) at ../../../glusterfsd/src/glusterfsd.c:1601
(gdb) f 1
#1 0x00007f28eec425e0 in client3_1_xattrop_cbk (req=0x7f28bbb599cc, iov=0x7f28bbb59a0c, count=1, myframe=0x7f28f124b778) at ../../../../../xlators/protocol/client/src/client3_1-fops.c:1425
1425 STACK_UNWIND_STRICT (xattrop, frame, op_ret,
(gdb) p *this
$3 = {name = 0x7f28dc515770 "vol-client-4", type = 0x7f28dc14a980 "protocol/client", next = 0x7f28dc78b820, prev = 0x7f28dc78cb00, parents = 0x7f28dea2b4e0, children = 0x0, options = 0x7f28dc5b1e80, dlhandle = 0x2329420, fops = 0x7f28eee61460, cbks = 0x7f28eee61440, dumpops = 0x7f28eee61700, volume_options = {next = 0x7f28dc14aa30, prev = 0x7f28de928e50}, fini = 0x7f28eec3a78e <fini>, init = 0x7f28eec3a5c5 <init>, reconfigure = 0x7f28eec3a418 <reconfigure>, mem_acct_init = 0x7f28eec39fd3 <mem_acct_init>, notify = 0x7f28eec39c68 <notify>, loglevel = GF_LOG_NONE, latencies = {{min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0} <repeats 45 times>}, ctx = 0x2305010, graph = 0x7f28de816d70, itable = 0x0, init_succeeded = 1 '\001', private = 0x7f28df10c880, mem_acct = {num_types = 91, rec = 0x7f28de3cfea0}}

Logs:
[2011-12-12 15:51:38.051586] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x124) [0x7f28f2a32bad] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x110) [0x7f28f2a3211f] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7f28f2a31be9]))) 17-vol-client-5: forced unwinding frame type(GlusterFS 3.1) op(RENAME(8)) called at 2011-12-12 15:51:37.657225
[2011-12-12 15:51:38.051605] W [client3_1-fops.c:2015:client3_1_rename_cbk] 17-vol-client-5: remote operation failed: Transport endpoint is not connected
[2011-12-12 15:51:38.051799] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x124) [0x7f28f2a32bad] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x110) [0x7f28f2a3211f] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7f28f2a31be9]))) 17-vol-client-5: forced unwinding frame type(GlusterFS 3.1) op(RELEASEDIR(42)) called at 2011-12-12 15:51:37.694882
[2011-12-12 15:51:38.051838] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x124) [0x7f28f2a32bad] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x110) [0x7f28f2a3211f] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7f28f2a31be9]))) 17-vol-client-5: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2011-12-12 15:51:37.695624
[2011-12-12 15:51:38.051865] W [client3_1-fops.c:2249:client3_1_lookup_cbk] 17-vol-client-5: remote operation failed: Transport endpoint is not connected. Path: /clients/client2/~dmtmp/COREL/GRAPH2.BAK
[2011-12-12 15:51:38.051959] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x124) [0x7f28f2a32bad] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x110) [0x7f28f2a3211f] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7f28f2a31be9]))) 17-vol-client-5: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2011-12-12 15:51:37.697138
[2011-12-12 15:51:38.051977] W [client3_1-fops.c:2249:client3_1_lookup_cbk] 17-vol-client-5: remote operation failed: Transport endpoint is not connected. Path: /clients/client8

Version-Release number of selected component (if applicable):

How reproducible:
Hit it only once. Trying to reproduce.

Steps to Reproduce:
These are not exact steps, but this is how I got the crash.
1. Created a 3x2 distributed-replicate volume.
2. Enabled geo-rep and quota.
3. Made numerous graph changes by turning stat-prefetch on and off.
4. stat-prefetch was off when the crash happened.
5. Ran dbench -s -F -S --stat-check 10

Actual results:
The client crashed after a brick had crashed. Raising a different bug for it.

Expected results:
dbench should have completed successfully.

Additional info:
Created attachment 545693: logs of all bricks and client.
CHANGE: http://review.gluster.com/783 (cluster/afr: Double the call count if transaction is for rename) merged in master by Vijay Bellur (vijay)
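For readers unfamiliar with the call-count pattern named in the change title, here is a minimal sketch of why an undercounted call count can lead to a crash like the one at afr-transaction.c:347. It is not the actual patch. The struct, the function names, and the premise that a rename produces two replies per child are all assumptions made for illustration; only the `--local->call_count` decrement and the "double the call count for rename" idea come from this report.

```c
#include <assert.h>

/* Hypothetical stand-in for per-transaction state; NOT the real afr_local_t. */
struct txn_local {
    int call_count; /* replies still expected */
    int freed;      /* flag used instead of free() so the sketch stays safe to run */
};

/* Assumption: a rename yields two replies per child, so the count is doubled. */
static int expected_callbacks(int child_count, int is_rename)
{
    return is_rename ? 2 * child_count : child_count;
}

/*
 * Called once per reply.
 * Returns 1 if this was the last expected reply (state is torn down),
 * 0 if more replies are expected, and -1 if the reply arrived after
 * teardown, which in real code would be a use-after-free.
 */
static int post_op_cbk(struct txn_local *local)
{
    if (local->freed)
        return -1;

    int remaining = --local->call_count;
    if (remaining == 0) {
        local->freed = 1; /* real code would destroy the frame and its local here */
        return 1;
    }
    return 0;
}

/* Deliver `replies` callbacks and count how many arrive after teardown. */
static int late_replies(int initial_count, int replies)
{
    assert(initial_count > 0);

    struct txn_local local = { initial_count, 0 };
    int late = 0;

    for (int i = 0; i < replies; i++) {
        if (post_op_cbk(&local) == -1)
            late++;
    }
    return late;
}
```

With two children and four replies, `late_replies(2, 4)` reports two replies arriving after teardown, while `late_replies(expected_callbacks(2, 1), 4)` reports none. The real callback runs from the RPC reply path, so production code would also need to protect the decrement with a lock, which this single-threaded sketch leaves out.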
*** Bug 765435 has been marked as a duplicate of this bug. ***
The crash is no longer seen when repeating the same steps, including bringing bricks up and down. Tested against the latest git head.