I observed this crash only in a 4 subvolume setup of AFR; consistently the 4th server crashed. It works for 2 and 3 subvolume setups.

Backtrace:

(gdb) bt
#0  0x000000000188b9e8 in ?? ()
#1  0x00007f235be7097f in rpc_transport_submit_reply (this=0x189a528, reply=0x7f235c447900) at rpc-transport.c:1111
#2  0x00007f235be6bd82 in rpcsvc_conn_submit (conn=0x188c278, hdrvec=0x7f235c4479c0, hdrcount=1, proghdr=0x7f235c447a50, proghdrcount=1, progpayload=0x0, progpayloadcount=0, iobref=0x7f2354000958, priv=0x0) at rpcsvc.c:1369
#3  0x00007f235be6c3b8 in rpcsvc_submit_generic (req=0x7f2358f44898, proghdr=0x7f235c447a50, hdrcount=1, payload=0x0, payloadcount=0, iobref=0x7f2354000958) at rpcsvc.c:1530
#4  0x00007f2359beaaec in server_submit_reply (frame=0x18b75b0, req=0x7f2358f44898, arg=0x7f235c447b00, payload=0x0, payloadcount=0, iobref=0x7f2354000958, sfunc=0x7f235bc56eca <xdr_serialize_writev_rsp>) at server.c:123
#5  0x00007f2359bf69ac in server_writev_cbk (frame=0x18b75b0, cookie=0x18ae108, this=0x1893fd8, op_ret=131072, op_errno=0, prebuf=0x7f235c447db0, postbuf=0x7f235c447d40) at server3_1-fops.c:1235
#6  0x00007f2359e189cd in iot_writev_cbk (frame=0x18ae108, cookie=0x7f2354001608, this=0x1892d88, op_ret=131072, op_errno=0, prebuf=0x7f235c447db0, postbuf=0x7f235c447d40) at io-threads.c:945
#7  0x00007f235a02c5a4 in pl_writev_cbk (frame=0x7f2354001608, cookie=0x7f23540145e8, this=0x18919f8, op_ret=131072, op_errno=0, prebuf=0x7f235c447db0, postbuf=0x7f235c447d40) at posix.c:497
#8  0x00007f235a248b08 in posix_writev (frame=0x7f23540145e8, this=0x1890548, fd=0x189c6a8, vector=0x189c978, count=1, offset=65536, iobref=0x18ae098) at posix.c:2548
#9  0x00007f235a02d96f in pl_writev (frame=0x7f2354001608, this=0x18919f8, fd=0x189c6a8, vector=0x189c978, count=1, offset=65536, iobref=0x18ae098) at posix.c:736
#10 0x00007f2359e18bf6 in iot_writev_wrapper (frame=0x18ae108, this=0x1892d88, fd=0x189c6a8, vector=0x189c978, count=1, offset=65536, iobref=0x18ae098) at io-threads.c:955
#11 0x00007f235c0ae5cd in call_resume_wind (stub=0x18b71d8) at call-stub.c:2233
#12 0x00007f235c0b5015 in call_resume (stub=0x18b71d8) at call-stub.c:3852
#13 0x00007f2359e13332 in iot_worker (data=0x1898078) at io-threads.c:118
#14 0x00007f235b830a04 in start_thread (arg=<value optimized out>) at pthread_create.c:300
#15 0x00007f235b599d4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#16 0x0000000000000000 in ?? ()

Client log:

[2010-07-24 19:01:28.226805] T [rpcsvc.c:940:rpcsvc_program_actor] rpc-service: Actor found: GlusterFS-3.1.0 - WRITE
[2010-07-24 19:01:28.226846] T [socket.c:185:__socket_rwv] RPC: EOF from peer 127.0.0.1:1018
[2010-07-24 19:01:28.226867] T [socket.c:1120:__socket_read_frag] RPC: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:1018)
[2010-07-24 19:01:28.226887] T [socket.c:1446:socket_event_handler] transport: disconnecting now
[2010-07-24 19:01:28.226935] T [socket.c:2245:fini] RPC: transport 0x189a528 destroyed
[2010-07-24 19:01:28.238810] T [rpcsvc.c:1508:rpcsvc_submit_generic] rpc-service: Tx message: 200
[2010-07-24 19:01:28.238835] T [rpcsvc.c:1314:rpcsvc_record_build_header] rpc-service: Reply fraglen 224, payload: 200, rpc hdr: 24

pending frames:

patchset: git://git.sv.gnu.org/gluster.git
signal received: 11
time of crash: 2010-07-24 19:01:28
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.1.0git
/lib/libc.so.6[0x7f235b4ed530]
[0x188b9e8]
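Note the ordering in the log above: the transport 0x189a528 is destroyed on EOF at 19:01:28.226935, and frame #1 of the backtrace shows rpc_transport_submit_reply() being called with this=0x189a528 about 12 ms later. That pattern is consistent with a reply racing against connection teardown. Below is a minimal, self-contained sketch of that bug class. It is not GlusterFS code; transport_t, submit_reply(), poller() and io_thread() are made-up names for illustration only.

/* race.c - minimal sketch of the suspected bug class, NOT GlusterFS code.
 * transport_t, submit_reply, poller and io_thread are all made-up names.
 * Build: gcc -pthread -fsanitize=address race.c -o race
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

typedef struct transport {
        int (*submit_reply) (struct transport *t, const char *msg);
} transport_t;

static int
real_submit_reply (transport_t *t, const char *msg)
{
        (void) t;
        printf ("reply sent: %s\n", msg);
        return 0;
}

/* shared raw pointer with no refcount or lock: this is the bug */
static transport_t *trans;

/* poller thread: the peer sent EOF, so tear the transport down at once,
 * like "RPC: transport 0x189a528 destroyed" in the client log */
static void *
poller (void *arg)
{
        (void) arg;
        free (trans);           /* fini() analogue */
        return NULL;
}

/* io-thread: the disk write finishes a moment later and the reply is
 * submitted through the stale pointer, like frames #5..#1 of the bt */
static void *
io_thread (void *arg)
{
        (void) arg;
        usleep (10000);         /* the write() latency window */
        /* use-after-free: the indirect call below may jump to garbage,
         * matching frame #0 (0x188b9e8 in ??) */
        trans->submit_reply (trans, "WRITE ok");
        return NULL;
}

int
main (void)
{
        pthread_t p, io;

        trans = calloc (1, sizeof (*trans));
        trans->submit_reply = real_submit_reply;

        pthread_create (&io, NULL, io_thread, NULL);
        pthread_create (&p, NULL, poller, NULL);
        pthread_join (io, NULL);
        pthread_join (p, NULL);
        return 0;
}

Built with -fsanitize=address, this sketch should report a heap-use-after-free at the indirect call, the same failure mode as the jump through a stale pointer in frame #0.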
OK, the crash happened at rpc-transport.c:1111, which means the code took the peer-transport shortcut path. That can happen only if the 'conn' pointer is corrupted. It needs more debugging, but for now I will mark this as a duplicate of 1223.

*** This bug has been marked as a duplicate of bug 1223 ***
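For reference, the usual guard against this class of crash (whatever the root cause of the corrupted 'conn' turns out to be) is to reference-count the connection so that every in-flight request pins it and teardown runs only after the last reply is out. Below is a generic sketch of that pattern with hypothetical names throughout (conn_t, conn_ref(), conn_unref()); it is not the GlusterFS API.

/* refcount-guard.c - sketch of the usual fix for this bug class: every
 * in-flight request holds a reference on the connection, so teardown on
 * EOF is deferred until the last reply is out. All names are invented.
 * Build: gcc -pthread refcount-guard.c -o refcount-guard
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct conn {
        pthread_mutex_t lock;
        int             refcount;
        int             disconnected;
        /* sockets, ops table, ... */
} conn_t;

static conn_t *
conn_new (void)
{
        conn_t *c = calloc (1, sizeof (*c));
        pthread_mutex_init (&c->lock, NULL);
        c->refcount = 1;        /* the poller's own reference */
        return c;
}

static void
conn_destroy (conn_t *c)        /* the one and only fini() */
{
        printf ("conn destroyed\n");
        pthread_mutex_destroy (&c->lock);
        free (c);
}

static void
conn_ref (conn_t *c)
{
        pthread_mutex_lock (&c->lock);
        c->refcount++;
        pthread_mutex_unlock (&c->lock);
}

static void
conn_unref (conn_t *c)
{
        int last;

        pthread_mutex_lock (&c->lock);
        last = (--c->refcount == 0);
        pthread_mutex_unlock (&c->lock);
        if (last)
                conn_destroy (c);       /* only the final holder frees */
}

/* request path: take a ref when the request is decoded ... */
static void
request_begin (conn_t *c)
{
        conn_ref (c);
}

/* ... and drop it only after the reply hit the socket, so an EOF seen
 * by the poller can never free the transport under a submit_reply() */
static void
request_end (conn_t *c)
{
        conn_unref (c);
}

/* poller path: on EOF just mark the conn dead and drop the poller ref;
 * actual destruction happens wherever the last ref is released */
static void
on_peer_eof (conn_t *c)
{
        pthread_mutex_lock (&c->lock);
        c->disconnected = 1;
        pthread_mutex_unlock (&c->lock);
        conn_unref (c);
}

int
main (void)
{
        conn_t *c = conn_new ();

        request_begin (c);      /* WRITE arrives */
        on_peer_eof (c);        /* peer hangs up mid-request: no free yet */
        request_end (c);        /* reply done: only now is the conn freed */
        return 0;
}

With this scheme the EOF path only marks the connection dead; the memory is reclaimed by whichever side drops the last reference, so a reply path like frames #4..#1 above can never dereference freed memory.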