Bug 991402

Summary: Write on fuse mount failed with "write error: Transport endpoint is not connected" after a successful remove brick operation
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: spandura
Component: glusterfs Assignee: Bug Updates Notification Mailing List <rhs-bugs>
Status: CLOSED DEFERRED QA Contact: storage-qa-internal <storage-qa-internal>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 2.1 CC: rhs-bugs, spalai, vbellur
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1286108 (view as bug list) Environment:
Last Closed: 2015-11-27 10:47:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1286108    
Attachments:
Description Flags
SOS Reports none

Description spandura 2013-08-02 11:15:57 UTC
Description of problem:
==========================
On a distribute-replicate volume, even after a successful remove-brick operation, writes to a file from the mount point fail temporarily with the error message: "Transport endpoint is not connected"

Version-Release number of selected component (if applicable):
============================================================
root@king [Aug-02-2013-16:42:27] >rpm -qa | grep glusterfs-server
glusterfs-server-3.4.0.14rhs-1.el6rhs.x86_64

root@king [Aug-02-2013-16:42:34] >gluster --version
glusterfs 3.4.0.14rhs built on Jul 30 2013 09:09:36

How reproducible:
===================


Steps to Reproduce:
=====================
1. Create a 3 x 2 distribute-replicate volume. Start the volume.

gluster v info vol_dis_rep
 
Volume Name: vol_dis_rep
Type: Distributed-Replicate
Volume ID: e8fd704d-f0b4-4b68-bfb2-dd19553c1a68
Status: Created
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: king:/rhs/bricks/b0
Brick2: hicks:/rhs/bricks/b1
Brick3: king:/rhs/bricks/b2
Brick4: hicks:/rhs/bricks/b3
Brick5: king:/rhs/bricks/b4
Brick6: hicks:/rhs/bricks/b5


2. Create a FUSE mount. Open an fd on a file.
(touch host.conf ; exec 5>./host.conf )

3. Remove the bricks that hold the file "host.conf".

( gluster v remove-brick vol_dis_rep replica 2 king:/rhs/bricks/b2 hicks:/rhs/bricks/b3 start

gluster v remove-brick vol_dis_rep replica 2 king:/rhs/bricks/b2 hicks:/rhs/bricks/b3 status

gluster v remove-brick vol_dis_rep replica 2 king:/rhs/bricks/b2 hicks:/rhs/bricks/b3 commit )

4. When the commit operation is done, immediately write to the file from the mount point.

( root@darrel [Aug-02-2013-16:11:03] >for i in `seq 1 1000`; do echo "Hello World $i" >&5; sleep 1 ; done
-bash: echo: write error: Transport endpoint is not connected )

Actual results:
================
Writes to the file did not fail forever; the failure was temporary.
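
To put a number on how long the temporary failure lasts, the write loop from step 4 can be wrapped so that it counts the failed attempts instead of only printing the bash error. The following is a minimal sketch, not part of the original report: the function name probe_writes and its arguments are hypothetical, and it assumes it is run against a file on the fuse mount while the remove-brick commit from step 3 is issued.

```shell
# Hypothetical helper (not from the original report): write one line per
# interval through a single held-open fd and count the writes that fail.
probe_writes() {
    local target=$1 count=$2 interval=${3:-1}
    local failed=0 i

    # Hold one fd open for the whole run, as in step 2 of the reproducer.
    exec 5>"$target" || return 1

    for i in $(seq 1 "$count"); do
        # The echo builtin returns non-zero when the write fails.
        if ! echo "Hello World $i" >&5 2>/dev/null; then
            failed=$((failed + 1))
            printf '%s write %d failed\n' "$(date -u '+%F %T')" "$i" >&2
        fi
        sleep "$interval"
    done

    exec 5>&-   # close the fd
    echo "failed=$failed total=$count"
}
```

For example, probe_writes ./host.conf 1000 1 mirrors the loop in step 4. The timestamps printed for the failed writes can then be compared with the timestamps in the mount log.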

The following mount log messages were recorded for the failure:
====================================================
[2013-08-02 10:42:16.053969] W [fuse-bridge.c:1612:fuse_err_cbk] 0-glusterfs-fuse: 160: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2013-08-02 10:42:17.055777] W [fuse-bridge.c:1612:fuse_err_cbk] 0-glusterfs-fuse: 161: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2013-08-02 10:42:17.056406] W [fuse-bridge.c:2681:fuse_writev_cbk] 0-glusterfs-fuse: 163: WRITE => -1 (Transport endpoint is not connected)
[2013-08-02 10:42:17.056928] W [fuse-bridge.c:1612:fuse_err_cbk] 0-glusterfs-fuse: 164: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2013-08-02 10:42:17.111462] I [rpc-clnt.c:1675:rpc_clnt_reconfig] 1-vol_dis_rep-client-1: changing port to 49152 (from 0)
[2013-08-02 10:42:17.111560] I [rpc-clnt.c:1675:rpc_clnt_reconfig] 1-vol_dis_rep-client-3: changing port to 49155 (from 0)
[2013-08-02 10:42:17.129567] I [client-handshake.c:1658:select_server_supported_programs] 1-vol_dis_rep-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2013-08-02 10:42:17.129870] I [client-handshake.c:1658:select_server_supported_programs] 1-vol_dis_rep-client-3: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2013-08-02 10:42:17.130166] I [client-handshake.c:1456:client_setvolume_cbk] 1-vol_dis_rep-client-1: Connected to 10.70.34.118:49152, attached to remote volume '/rhs/bricks/b1'.
[2013-08-02 10:42:17.130214] I [client-handshake.c:1468:client_setvolume_cbk] 1-vol_dis_rep-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2013-08-02 10:42:17.130458] I [client-handshake.c:1456:client_setvolume_cbk] 1-vol_dis_rep-client-3: Connected to 10.70.34.118:49155, attached to remote volume '/rhs/bricks/b5'.
[2013-08-02 10:42:17.130542] I [client-handshake.c:1468:client_setvolume_cbk] 1-vol_dis_rep-client-3: Server and Client lk-version numbers are not same, reopening the fds
[2013-08-02 10:42:17.136813] I [fuse-bridge.c:5735:fuse_graph_setup] 0-fuse: switched to graph 1
[2013-08-02 10:42:17.136985] I [client-handshake.c:450:client_set_lk_version_cbk] 1-vol_dis_rep-client-3: Server lk version = 1
[2013-08-02 10:42:17.137040] I [client-handshake.c:450:client_set_lk_version_cbk] 1-vol_dis_rep-client-1: Server lk version = 1
[2013-08-02 10:42:18.059174] W [fuse-bridge.c:1612:fuse_err_cbk] 0-glusterfs-fuse: 165: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2013-08-02 10:42:18.060107] I [afr-common.c:2118:afr_set_root_inode_on_first_lookup] 1-vol_dis_rep-replicate-0: added root inode
[2013-08-02 10:42:18.061862] I [afr-common.c:2118:afr_set_root_inode_on_first_lookup] 1-vol_dis_rep-replicate-1: added root inode

[2013-08-02 10:42:18.062268] W [fuse-bridge.c:5103:fuse_migrate_fd] 0-glusterfs-fuse: syncop_fsync failed (Transport endpoint is not connected) on fd (0x1d9338c)(basefd:0x1d9338c basefd-inode.gfid:7a315be7-683f-4a4c-b6d6-85936bde21a1) (old-subvolume:vol_dis_rep-0 new-subvolume:vol_dis_rep-1)
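
The warnings above can be tallied per operation to see which fops were hit during the graph switch. This is a rough sketch under stated assumptions: summarize_enotconn is a hypothetical name, the path to the mount log is passed in as an argument, and the matching relies only on the message text shown in the excerpt above.

```shell
# Hypothetical helper (not from the original report): count
# "Transport endpoint is not connected" messages per operation in a log.
summarize_enotconn() {
    grep -F 'Transport endpoint is not connected' "$1" |
        grep -oE 'FLUSH|WRITE|syncop_fsync' |
        LC_ALL=C sort | uniq -c |
        awk '{print $2 "=" $1}'   # uniq -c prints "<count> <name>"
}
```

The function prints one name=count line per operation found, so the result can be compared between runs or between client versions.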

Expected results:
================
The vol file change should have been transparent and should not throw the error to the mount point (even though the error is temporary).

Comment 1 spandura 2013-08-02 11:18:14 UTC
Created attachment 781940 [details]
SOS Reports