Bug 991402 - Write on fuse mount failed with "write error: Transport endpoint is not connected" after a successful remove brick operation
Status: CLOSED DEFERRED
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs
Version: 2.1
Hardware: Unspecified Unspecified
Priority: unspecified Severity: unspecified
Target Milestone: ---
Target Release: ---
Assigned To: Bug Updates Notification Mailing List
QA Contact: storage-qa-internal@redhat.com
Docs Contact:
Depends On:
Blocks: 1286108
Reported: 2013-08-02 07:15 EDT by spandura
Modified: 2015-11-27 05:51 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1286108
Environment:
Last Closed: 2015-11-27 05:47:14 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
SOS Reports (5.59 MB, application/x-gzip)
2013-08-02 07:18 EDT, spandura

Description spandura 2013-08-02 07:15:57 EDT
Description of problem:
==========================
On a distribute-replicate volume, writes to a file from the mount point fail temporarily after a successful remove-brick operation, with the error message: "Transport endpoint is not connected"

Version-Release number of selected component (if applicable):
============================================================
root@king [Aug-02-2013-16:42:27] >rpm -qa | grep glusterfs-server
glusterfs-server-3.4.0.14rhs-1.el6rhs.x86_64

root@king [Aug-02-2013-16:42:34] >gluster --version
glusterfs 3.4.0.14rhs built on Jul 30 2013 09:09:36

How reproducible:
===================


Steps to Reproduce:
=====================
1. Create a 3 x 2 distribute-replicate volume. Start the volume.

gluster v info vol_dis_rep
 
Volume Name: vol_dis_rep
Type: Distributed-Replicate
Volume ID: e8fd704d-f0b4-4b68-bfb2-dd19553c1a68
Status: Created
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: king:/rhs/bricks/b0
Brick2: hicks:/rhs/bricks/b1
Brick3: king:/rhs/bricks/b2
Brick4: hicks:/rhs/bricks/b3
Brick5: king:/rhs/bricks/b4
Brick6: hicks:/rhs/bricks/b5


2. Create a FUSE mount. Open an fd on a file.
(touch host.conf ; exec 5>./host.conf )

3. Remove the bricks that hold the file "host.conf".

( gluster v remove-brick vol_dis_rep replica 2 king:/rhs/bricks/b2 hicks:/rhs/bricks/b3 start

gluster v remove-brick vol_dis_rep replica 2 king:/rhs/bricks/b2 hicks:/rhs/bricks/b3 status

gluster v remove-brick vol_dis_rep replica 2 king:/rhs/bricks/b2 hicks:/rhs/bricks/b3 commit )

4. As soon as the commit operation completes, write to the file from the mount point.

( root@darrel [Aug-02-2013-16:11:03] >for i in `seq 1 1000`; do echo "Hello World $i" >&5; sleep 1 ; done
-bash: echo: write error: Transport endpoint is not connected )
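
The loop in step 4 only shows the first error on the terminal. A minimal sketch of a variant that counts and timestamps every failed write, so the length of the failure window can be read off directly. The function name write_probe is hypothetical and not part of the original report; it assumes fd 5 is already open on the file, as in step 2.

```shell
#!/bin/bash
# Hypothetical helper (not from the original report): write COUNT lines to the
# already-open fd 5, pausing INTERVAL seconds between writes, and report every
# write that fails instead of only printing the shell's error message.
write_probe() {
    local count=$1 interval=${2:-1} failed=0 i
    for i in $(seq 1 "$count"); do
        # echo returns non-zero when the write to fd 5 fails
        if ! echo "Hello World $i" >&5 2>/dev/null; then
            failed=$((failed + 1))
            echo "$(date '+%F %T') write $i failed"
        fi
        sleep "$interval"
    done
    echo "$failed of $count writes failed"
}
```

For example, after `exec 5>./host.conf`, running `write_probe 1000 1` right after the remove-brick commit mirrors the loop used in step 4.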

Actual results:
================
Writes to the file failed with "Transport endpoint is not connected". The writes did not fail forever; the failure was temporary.

The following are the mount log messages for the failure:
====================================================
[2013-08-02 10:42:16.053969] W [fuse-bridge.c:1612:fuse_err_cbk] 0-glusterfs-fuse: 160: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2013-08-02 10:42:17.055777] W [fuse-bridge.c:1612:fuse_err_cbk] 0-glusterfs-fuse: 161: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2013-08-02 10:42:17.056406] W [fuse-bridge.c:2681:fuse_writev_cbk] 0-glusterfs-fuse: 163: WRITE => -1 (Transport endpoint is not connected)
[2013-08-02 10:42:17.056928] W [fuse-bridge.c:1612:fuse_err_cbk] 0-glusterfs-fuse: 164: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2013-08-02 10:42:17.111462] I [rpc-clnt.c:1675:rpc_clnt_reconfig] 1-vol_dis_rep-client-1: changing port to 49152 (from 0)
[2013-08-02 10:42:17.111560] I [rpc-clnt.c:1675:rpc_clnt_reconfig] 1-vol_dis_rep-client-3: changing port to 49155 (from 0)
[2013-08-02 10:42:17.129567] I [client-handshake.c:1658:select_server_supported_programs] 1-vol_dis_rep-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2013-08-02 10:42:17.129870] I [client-handshake.c:1658:select_server_supported_programs] 1-vol_dis_rep-client-3: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2013-08-02 10:42:17.130166] I [client-handshake.c:1456:client_setvolume_cbk] 1-vol_dis_rep-client-1: Connected to 10.70.34.118:49152, attached to remote volume '/rhs/bricks/b1'.
[2013-08-02 10:42:17.130214] I [client-handshake.c:1468:client_setvolume_cbk] 1-vol_dis_rep-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2013-08-02 10:42:17.130458] I [client-handshake.c:1456:client_setvolume_cbk] 1-vol_dis_rep-client-3: Connected to 10.70.34.118:49155, attached to remote volume '/rhs/bricks/b5'.
[2013-08-02 10:42:17.130542] I [client-handshake.c:1468:client_setvolume_cbk] 1-vol_dis_rep-client-3: Server and Client lk-version numbers are not same, reopening the fds
[2013-08-02 10:42:17.136813] I [fuse-bridge.c:5735:fuse_graph_setup] 0-fuse: switched to graph 1
[2013-08-02 10:42:17.136985] I [client-handshake.c:450:client_set_lk_version_cbk] 1-vol_dis_rep-client-3: Server lk version = 1
[2013-08-02 10:42:17.137040] I [client-handshake.c:450:client_set_lk_version_cbk] 1-vol_dis_rep-client-1: Server lk version = 1
[2013-08-02 10:42:18.059174] W [fuse-bridge.c:1612:fuse_err_cbk] 0-glusterfs-fuse: 165: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2013-08-02 10:42:18.060107] I [afr-common.c:2118:afr_set_root_inode_on_first_lookup] 1-vol_dis_rep-replicate-0: added root inode
[2013-08-02 10:42:18.061862] I [afr-common.c:2118:afr_set_root_inode_on_first_lookup] 1-vol_dis_rep-replicate-1: added root inode

[2013-08-02 10:42:18.062268] W [fuse-bridge.c:5103:fuse_migrate_fd] 0-glusterfs-fuse: syncop_fsync failed (Transport endpoint is not connected) on fd (0x1d9338c)(basefd:0x1d9338c basefd-inode.gfid:7a315be7-683f-4a4c-b6d6-85936bde21a1) (old-subvolume:vol_dis_rep-0 new-subvolume:vol_dis_rep-1)
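
In the excerpt above, the "Transport endpoint is not connected" messages run from 10:42:16.053969 to 10:42:18.062268, around the graph switch logged at 10:42:17.136813. A minimal sketch for pulling the first and last such timestamps out of a mount log. The function name failure_window is hypothetical, and the log file path is passed in as an argument because the report does not state it.

```shell
#!/bin/bash
# Hypothetical helper (not from the original report): print the timestamps of
# the first and last log lines that mention the ENOTCONN error text.
failure_window() {
    local log=$1
    # Keep only lines with the error text, strip everything except the leading
    # [timestamp], then print the first and the last remaining line.
    grep -F 'Transport endpoint is not connected' "$log" \
        | sed -n 's/^\[\([^]]*\)].*/\1/p' \
        | sed -n '1p;$p'
}
```

If only one line matches, the same timestamp is printed twice, since that line is both the first and the last.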

Expected results:
================
The volfile change should have been transparent and should not have thrown the error to the mount point (even though the error is temporary).
Comment 1 spandura 2013-08-02 07:18:14 EDT
Created attachment 781940
SOS Reports
