Created attachment 558618
fuse client log

Description of problem:
I was building glusterfs on the mountpoint while running some profile/top operations on the server. The volume was a stripe-replicate volume with stripe-block-size set to 64MB. After make exited successfully (exit status zero), I took down one brick of a replicate pair, and the mountpoint then became inaccessible.

Version-Release number of selected component (if applicable):
glusterfs-3.3.0qa20

How reproducible:
1/1

Steps to Reproduce:
1. Create and start a stripe-replicate volume.
2. Set stripe-block-size to 64MB and enable profiling.
3. Untar both the Linux kernel source and the glusterfs source, and start building the glusterfs source.
4. Meanwhile, keep running some profile and top operations.
5. After 'make' finishes, take down one of the glusterfsd processes.

Actual results:
The mountpoint became inaccessible.

[root@RHEL6 hosa_dir]# ls
ls: reading directory .: Transport endpoint is not connected
[root@RHEL6 hosa_dir]#

Expected results:
The mountpoint should remain accessible.

Additional info:
The following options were set on the volume.

Volume Name: hosdu
Type: Striped-Replicate
Volume ID: 56528124-1918-4923-a1cd-c02ddf22e671
Status: Started
Number of Bricks: 1 x 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.1.11.113:/data/brick/hosdu_brick1
Brick2: 10.1.11.114:/data/brick/hosdu_brick2
Brick3: 10.1.11.136:/data/brick/hosdu_brick3
Brick4: 10.1.11.137:/data/brick/hosdu_brick4
Options Reconfigured:
cluster.stripe-block-size: 64MB
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on

Entries from the client log:
[2012-01-31 07:21:56.632023] W [client3_1-fops.c:1273:client3_1_finodelk_cbk] 0-hosdu-client-1: remote operation failed: Invalid argument
[2012-01-31 07:21:56.632077] E [afr-lk-common.c:567:afr_unlock_inodelk_cbk] 0-hosdu-replicate-0: /hosa_dir/glusterfs-3.3.0qa20/rpc/rpc-lib/src/rpcsvc.loT: unlock failed on 1, reason: Invalid argument
[2012-01-31 07:28:07.133314] W [socket.c:1510:__socket_proto_state_machine] 0-hosdu-client-0: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.11.113:24009)
[2012-01-31 07:28:07.133443] I [client.c:1885:client_rpc_notify] 0-hosdu-client-0: disconnected
[2012-01-31 07:28:17.351524] E [socket.c:1713:socket_connect_finish] 0-hosdu-client-0: connection to 10.1.11.113:24009 failed (Connection refused)
[2012-01-31 07:28:19.221959] W [fuse-bridge.c:2352:fuse_readdir_cbk] 0-glusterfs-fuse: 1100583: READDIR => -1 (Transport endpoint is not connected)
[2012-01-31 07:28:28.205013] W [fuse-bridge.c:2352:fuse_readdir_cbk] 0-glusterfs-fuse: 1100597: READDIR => -1 (Transport endpoint is not connected)
[2012-01-31 07:28:38.486903] W [fuse-bridge.c:2352:fuse_readdir_cbk] 0-glusterfs-fuse: 1100615: READDIR => -1 (Transport endpoint is not connected)
[2012-01-31 07:28:39.418324] W [fuse-bridge.c:2352:fuse_readdir_cbk] 0-glusterfs-fuse: 1100619: READDIR => -1 (Transport endpoint is not connected)
[2012-01-31 07:28:42.739546] W [fuse-bridge.c:2352:fuse_readdir_cbk] 0-glusterfs-fuse: 1100623: READDIR => -1 (Transport endpoint is not connected)
[2012-01-31 07:29:12.037329] W [fuse-bridge.c:2352:fuse_readdir_cbk] 0-glusterfs-fuse: 1100627: READDIR => -1 (Transport endpoint is not connected)
[2012-01-31 07:34:52.870849] W [fuse-bridge.c:2352:fuse_readdir_cbk] 0-glusterfs-fuse: 1101063: READDIR => -1 (Transport endpoint is not connected)

I have attached the client log.
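For anyone trying to reproduce this, steps 1, 2 and 4 can be scripted with the gluster CLI. This is a minimal sketch under stated assumptions, not the commands the reporter actually ran: the volume name and brick paths are copied from the volume info, the `run` helper and the `DRY_RUN` switch are hypothetical additions that only print each command unless `DRY_RUN=0`, and it assumes `volume profile ... start` is what turned on the two diagnostics.* options listed above. Building the source (step 3) and taking down a glusterfsd (step 5) are left as manual steps.

```shell
#!/bin/sh
# Hypothetical reproduction helper for steps 1, 2 and 4 of this report.
# Assumption: volume name and bricks are taken from the volume info above.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 executes them.
set -eu

VOL=hosdu
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: a 1 x 2 x 2 striped-replicate volume (stripe 2, replica 2).
run gluster volume create "$VOL" stripe 2 replica 2 transport tcp \
    10.1.11.113:/data/brick/hosdu_brick1 \
    10.1.11.114:/data/brick/hosdu_brick2 \
    10.1.11.136:/data/brick/hosdu_brick3 \
    10.1.11.137:/data/brick/hosdu_brick4
run gluster volume start "$VOL"

# Step 2: 64MB stripe block size, then enable profiling
# (assumed to set diagnostics.latency-measurement and diagnostics.count-fop-hits).
run gluster volume set "$VOL" cluster.stripe-block-size 64MB
run gluster volume profile "$VOL" start

# Step 4: profile/top queries to repeat while the build is running.
run gluster volume profile "$VOL" info
run gluster volume top "$VOL" read
```

Running it once with the default `DRY_RUN=1` lets you review the exact commands before pointing it at a test cluster.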
I got a core this time around with glusterfs-3.3.0qa34.

(gdb) bt
#0  0x00007ffda03c4da1 in stripe_readv_cbk (frame=0x7ffda3c190f4, cookie=<value optimized out>, this=<value optimized out>, op_ret=8070, op_errno=<value optimized out>, vector=<value optimized out>, count=1, stbuf=0x7fff71f33e10, iobref=0x5544190, xdata=0x0) at stripe.c:3271
#1  0x00007ffda05e64f1 in afr_readv_cbk (frame=0x7ffda3db7368, cookie=<value optimized out>, this=<value optimized out>, op_ret=8070, op_errno=2, vector=0x7fff71f33c80, count=1, buf=0x7fff71f33e10, iobref=0x5544190, xdata=0x0) at afr-inode-read.c:1298
#2  0x00007ffda085e3fb in client3_1_readv_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, myframe=0x7ffda3da3e58) at client3_1-fops.c:2679
#3  0x00007ffda4d2e515 in rpc_clnt_handle_reply (clnt=0x25fda80, pollin=0x590e6b0) at rpc-clnt.c:797
#4  0x00007ffda4d2ed10 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x25fdab0, event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:916
#5  0x00007ffda4d29e48 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:498
#6  0x00007ffda1693704 in socket_event_poll_in (this=0x260d4e0) at socket.c:1686
#7  0x00007ffda16937e7 in socket_event_handler (fd=<value optimized out>, idx=1, data=0x260d4e0, poll_in=1, poll_out=0, poll_err=<value optimized out>) at socket.c:1801
#8  0x00007ffda4f75884 in event_dispatch_epoll_handler (event_pool=0x2538db0) at event.c:794
#9  event_dispatch_epoll (event_pool=0x2538db0) at event.c:856
#10 0x0000000000406eda in main (argc=<value optimized out>, argv=0x7fff71f34188) at glusterfsd.c:1650
(gdb) p ((stripe_local_t *)(((stripe_local_t *)(frame->local))->orig_frame->local))->fctx->xl_array[0]
$9 = (xlator_t *) 0x25638e0
(gdb) p ((stripe_local_t *)(((stripe_local_t *)(frame->local))->orig_frame->local))->fctx->xl_array[1]
$10 = (xlator_t *) 0x0
Shishir, can you please take a look?
*** This bug has been marked as a duplicate of bug 810450 ***