Description of problem:
==========================
On a distribute-replicate volume, even after a successful remove-brick operation, writes to a file from the mount point fail temporarily with the error message: "Transport endpoint is not connected"

Version-Release number of selected component (if applicable):
============================================================
root@king [Aug-02-2013-16:42:27] >rpm -qa | grep glusterfs-server
glusterfs-server-3.4.0.14rhs-1.el6rhs.x86_64

root@king [Aug-02-2013-16:42:34] >gluster --version
glusterfs 3.4.0.14rhs built on Jul 30 2013 09:09:36

How reproducible:
===================

Steps to Reproduce:
=====================
1. Create a 3 x 2 distribute-replicate volume. Start the volume.

gluster v info vol_dis_rep

Volume Name: vol_dis_rep
Type: Distributed-Replicate
Volume ID: e8fd704d-f0b4-4b68-bfb2-dd19553c1a68
Status: Created
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: king:/rhs/bricks/b0
Brick2: hicks:/rhs/bricks/b1
Brick3: king:/rhs/bricks/b2
Brick4: hicks:/rhs/bricks/b3
Brick5: king:/rhs/bricks/b4
Brick6: hicks:/rhs/bricks/b5

2. Create a FUSE mount. Open an fd on a file.
(touch host.conf ; exec 5>./host.conf)

3. Remove the bricks that contain the file "host.conf":
(
gluster v remove-brick vol_dis_rep replica 2 king:/rhs/bricks/b2 hicks:/rhs/bricks/b3 start
gluster v remove-brick vol_dis_rep replica 2 king:/rhs/bricks/b2 hicks:/rhs/bricks/b3 status
gluster v remove-brick vol_dis_rep replica 2 king:/rhs/bricks/b2 hicks:/rhs/bricks/b3 commit
)

4. As soon as the commit operation completes, write to the file from the mount point:
(
root@darrel [Aug-02-2013-16:11:03] >for i in `seq 1 1000`; do echo "Hello World $i" >&5; sleep 1 ; done
-bash: echo: write error: Transport endpoint is not connected
)

Actual results:
================
Writes to the file failed with "Transport endpoint is not connected", but not permanently; the failure was temporary.
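To record exactly which writes fail in step 4, a loop like the following could be run from the mount point instead of the shell one-liner. This is a minimal sketch, not part of the original reproduction: the function name `write_loop` is hypothetical, and it assumes Python 3 on the client. It opens the file once (truncating it, as `exec 5>./host.conf` does) and writes one line per iteration through that single fd, logging the iterations that fail with ENOTCONN. Unlike the shell loop, which redirects fd 5 for every `echo` (the mount log shows a FLUSH per iteration), this sketch writes to the fd directly, so it may not generate the same FLUSH calls.

```python
import errno
import os
import time


def write_loop(path, count=1000, interval=1.0):
    """Write "Hello World <i>" lines through one open fd; return ENOTCONN failures.

    Hypothetical helper: each failure is recorded as (iteration, unix_time).
    Any error other than ENOTCONN is re-raised.
    """
    failures = []
    # O_TRUNC mirrors the shell's `exec 5>./host.conf`, which truncates the file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for i in range(1, count + 1):
            try:
                os.write(fd, f"Hello World {i}\n".encode())
            except OSError as e:
                if e.errno != errno.ENOTCONN:
                    raise
                failures.append((i, time.time()))
            time.sleep(interval)
    finally:
        os.close(fd)
    return failures
```

Started right after `remove-brick ... commit`, a non-empty return value would show how many consecutive writes fell into the failure window.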
Following are the mount log messages for the failure:
====================================================
[2013-08-02 10:42:16.053969] W [fuse-bridge.c:1612:fuse_err_cbk] 0-glusterfs-fuse: 160: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2013-08-02 10:42:17.055777] W [fuse-bridge.c:1612:fuse_err_cbk] 0-glusterfs-fuse: 161: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2013-08-02 10:42:17.056406] W [fuse-bridge.c:2681:fuse_writev_cbk] 0-glusterfs-fuse: 163: WRITE => -1 (Transport endpoint is not connected)
[2013-08-02 10:42:17.056928] W [fuse-bridge.c:1612:fuse_err_cbk] 0-glusterfs-fuse: 164: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2013-08-02 10:42:17.111462] I [rpc-clnt.c:1675:rpc_clnt_reconfig] 1-vol_dis_rep-client-1: changing port to 49152 (from 0)
[2013-08-02 10:42:17.111560] I [rpc-clnt.c:1675:rpc_clnt_reconfig] 1-vol_dis_rep-client-3: changing port to 49155 (from 0)
[2013-08-02 10:42:17.129567] I [client-handshake.c:1658:select_server_supported_programs] 1-vol_dis_rep-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2013-08-02 10:42:17.129870] I [client-handshake.c:1658:select_server_supported_programs] 1-vol_dis_rep-client-3: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2013-08-02 10:42:17.130166] I [client-handshake.c:1456:client_setvolume_cbk] 1-vol_dis_rep-client-1: Connected to 10.70.34.118:49152, attached to remote volume '/rhs/bricks/b1'.
[2013-08-02 10:42:17.130214] I [client-handshake.c:1468:client_setvolume_cbk] 1-vol_dis_rep-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2013-08-02 10:42:17.130458] I [client-handshake.c:1456:client_setvolume_cbk] 1-vol_dis_rep-client-3: Connected to 10.70.34.118:49155, attached to remote volume '/rhs/bricks/b5'.
[2013-08-02 10:42:17.130542] I [client-handshake.c:1468:client_setvolume_cbk] 1-vol_dis_rep-client-3: Server and Client lk-version numbers are not same, reopening the fds
[2013-08-02 10:42:17.136813] I [fuse-bridge.c:5735:fuse_graph_setup] 0-fuse: switched to graph 1
[2013-08-02 10:42:17.136985] I [client-handshake.c:450:client_set_lk_version_cbk] 1-vol_dis_rep-client-3: Server lk version = 1
[2013-08-02 10:42:17.137040] I [client-handshake.c:450:client_set_lk_version_cbk] 1-vol_dis_rep-client-1: Server lk version = 1
[2013-08-02 10:42:18.059174] W [fuse-bridge.c:1612:fuse_err_cbk] 0-glusterfs-fuse: 165: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2013-08-02 10:42:18.060107] I [afr-common.c:2118:afr_set_root_inode_on_first_lookup] 1-vol_dis_rep-replicate-0: added root inode
[2013-08-02 10:42:18.061862] I [afr-common.c:2118:afr_set_root_inode_on_first_lookup] 1-vol_dis_rep-replicate-1: added root inode
[2013-08-02 10:42:18.062268] W [fuse-bridge.c:5103:fuse_migrate_fd] 0-glusterfs-fuse: syncop_fsync failed (Transport endpoint is not connected) on fd (0x1d9338c)(basefd:0x1d9338c basefd-inode.gfid:7a315be7-683f-4a4c-b6d6-85936bde21a1) (old-subvolume:vol_dis_rep-0 new-subvolume:vol_dis_rep-1)

Expected results:
================
The vol file change should have been transparent and should not have thrown the error to the mount point (even though the error is temporary).
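To quantify how long the failure window lasts, the FUSE mount log can be scanned for "Transport endpoint is not connected" messages. The sketch below is not part of the original report; the function name `enotconn_window` is hypothetical, and it assumes the `[YYYY-MM-DD HH:MM:SS.ffffff] <level> [<file:line:function>] <message>` layout seen in the mount log in this report. It returns the first and last ENOTCONN timestamps, their count, and the time of the first "switched to graph" message.

```python
import re
from datetime import datetime

# Assumed line layout, e.g.:
# [2013-08-02 10:42:16.053969] W [fuse-bridge.c:1612:fuse_err_cbk] 0-glusterfs-fuse: ...
LOG_LINE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\] ([A-Z]) \[([^\]]+)\] (.*)$"
)
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def enotconn_window(lines):
    """Return (first, last, count, graph_switch) for ENOTCONN messages, or None.

    graph_switch is the timestamp of the first "switched to graph" message,
    or None if the log contains no graph switch.
    """
    errors = []
    graph_switch = None
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m:
            continue  # skip continuation lines and anything not in the assumed format
        ts = datetime.strptime(m.group(1), TS_FORMAT)
        message = m.group(4)
        if "Transport endpoint is not connected" in message:
            errors.append(ts)
        elif graph_switch is None and "switched to graph" in message:
            graph_switch = ts
    if not errors:
        return None
    return min(errors), max(errors), len(errors), graph_switch
```

Applied to the mount log above, the six ENOTCONN messages run from 10:42:16.053969 to 10:42:18.062268 (about 2 seconds) and bracket the switch to graph 1 at 10:42:17.136813, which is consistent with the errors being tied to the graph change after the remove-brick commit.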
Created attachment 781940: SOS Reports