Bug 1330399

Summary: "Transport endpoint is not connected" error on fuse mount when we bring down the legacy brick of a volume after converting it to replicate
Product: Red Hat Gluster Storage
Reporter: Nag Pavan Chilakam <nchilaka>
Component: replicate
Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED WORKSFORME
QA Contact: storage-qa-internal <storage-qa-internal>
Severity: high
Priority: medium
Version: rhgs-3.1
CC: ksubrahm, ravishankar, rhs-bugs
Keywords: ZStream
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2018-02-16 07:35:12 UTC
Type: Bug

Description Nag Pavan Chilakam 2016-04-26 07:22:13 UTC
Description of problem:
=======================
As part of validating bug 1248998 - [AFR]: Files not available in the mount point after converting Distributed volume type to Replicated one,
I came up with a few test cases. The following is one of them:
Testcase: bringing down the original brick must not cause any I/O issue
(as it is now an AFR volume). Test on both fuse and NFS.
Steps:
    1. Create a single-brick volume.

    2. Start the volume, add some files and directories, and note them.

    3. Add a brick so that the volume becomes a 1x2 replica volume, using the command below:
       gluster v add-brick <vname> replica 2 <newbrick>

    4. Check whether the heal is complete using heal info.

    5. After the heal completes, bring down the first brick (the one that was used to create the original volume).

    6. Without doing any lookup on the fuse mount, try to create a new file; make sure the file gets created.


But at step 6, when I create a file (with no lookup such as ls done beforehand),
the first file create fails with "Transport endpoint is not connected".
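
For anyone re-running this, the steps above can be sketched as a command list. This is only a rough sketch, not the exact commands used in the original test: the host names (server1, server2) and the mount point are made up for illustration, and the brick path is borrowed from the mount log. Step 5 is left out because the brick is normally brought down by killing its brick process, whose PID has to be read from the status output first.

```python
import shlex


def repro_commands(volume, old_brick, new_brick, mount_host, mount_point):
    """Return the shell commands for steps 1-4 as strings, in order.

    All arguments are placeholders supplied by the caller; nothing is executed.
    """
    cmds = [
        # Step 1: single-brick volume.
        ["gluster", "volume", "create", volume, old_brick],
        # Step 2: start it and mount it over fuse so files can be added.
        ["gluster", "volume", "start", volume],
        ["mount", "-t", "glusterfs", f"{mount_host}:/{volume}", mount_point],
        # Step 3: convert to a 1x2 replica volume.
        ["gluster", "volume", "add-brick", volume, "replica", "2", new_brick],
        # Step 4: check heal progress.
        ["gluster", "volume", "heal", volume, "info"],
        # Preparation for step 5: the status output lists the brick PIDs.
        ["gluster", "volume", "status", volume],
    ]
    return [shlex.join(c) for c in cmds]


if __name__ == "__main__":
    # Hypothetical hosts and mount point, for illustration only.
    for line in repro_commands(
        "qatp",
        "server1:/rhs/brick1/qatp",
        "server2:/rhs/brick1/qatp",
        "server1",
        "/mnt/qatp",
    ):
        print(line)
```

The function only builds strings, so it can be reviewed or pasted into a shell by hand before anything touches a real cluster.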

Version-Release number of selected component (if applicable):
============================
[root@dhcp35-98 qatp]# rpm -qa|grep gluster
glusterfs-client-xlators-3.7.9-2.el7rhgs.x86_64
glusterfs-server-3.7.9-2.el7rhgs.x86_64
python-gluster-3.7.5-19.el7rhgs.noarch
gluster-nagios-addons-0.2.5-1.el7rhgs.x86_64
vdsm-gluster-4.16.30-1.3.el7rhgs.noarch
glusterfs-3.7.9-2.el7rhgs.x86_64
glusterfs-api-3.7.9-2.el7rhgs.x86_64
glusterfs-cli-3.7.9-2.el7rhgs.x86_64
glusterfs-geo-replication-3.7.9-2.el7rhgs.x86_64
gluster-nagios-common-0.2.3-1.el7rhgs.noarch
glusterfs-libs-3.7.9-2.el7rhgs.x86_64
glusterfs-fuse-3.7.9-2.el7rhgs.x86_64
glusterfs-rdma-3.7.9-2.el7rhgs.x86_64
[root@dhcp35-98 qatp]# 





Fuse mount log errors:
======================
 +------------------------------------------------------------------------------+
[2016-04-26 06:32:39.073633] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 2-qatp-client-1: changing port to 49157 (from 0)
[2016-04-26 06:32:39.078158] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 2-qatp-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-04-26 06:32:39.080152] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 2-qatp-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-04-26 06:32:39.080630] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 2-qatp-client-1: Connected to qatp-client-1, attached to remote volume '/rhs/brick1/qatp'.
[2016-04-26 06:32:39.080665] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 2-qatp-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2016-04-26 06:32:39.080740] I [MSGID: 108005] [afr-common.c:4006:afr_notify] 2-qatp-replicate-0: Subvolume 'qatp-client-1' came back up; going online.
[2016-04-26 06:32:39.080955] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 2-qatp-client-1: Server lk version = 1
[2016-04-26 06:32:39.082195] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 2-qatp-client-0: Connected to qatp-client-0, attached to remote volume '/rhs/brick1/qatp'.
[2016-04-26 06:32:39.082221] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 2-qatp-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2016-04-26 06:32:39.095778] I [fuse-bridge.c:5156:fuse_graph_setup] 0-fuse: switched to graph 2
[2016-04-26 06:32:39.095877] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 2-qatp-client-0: Server lk version = 1
[2016-04-26 06:34:28.786313] W [socket.c:589:__socket_rwv] 2-qatp-client-0: readv on 10.70.35.27:49157 failed (No data available)
[2016-04-26 06:34:28.786313] W [socket.c:589:__socket_rwv] 0-qatp-client-0: readv on 10.70.35.27:49157 failed (No data available)
[2016-04-26 06:34:28.786844] I [MSGID: 114018] [client.c:2030:client_rpc_notify] 0-qatp-client-0: disconnected from qatp-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2016-04-26 06:34:28.786850] I [MSGID: 114018] [client.c:2030:client_rpc_notify] 2-qatp-client-0: disconnected from qatp-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2016-04-26 06:34:39.521692] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 0-qatp-client-0: changing port to 49157 (from 0)
[2016-04-26 06:34:39.525326] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 2-qatp-client-0: changing port to 49157 (from 0)
[2016-04-26 06:34:39.527215] E [socket.c:2279:socket_connect_finish] 0-qatp-client-0: connection to 10.70.35.27:49157 failed (Connection refused)
[2016-04-26 06:34:39.529650] E [socket.c:2279:socket_connect_finish] 2-qatp-client-0: connection to 10.70.35.27:49157 failed (Connection refused)
[2016-04-26 06:34:43.529263] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 0-qatp-client-0: changing port to 49157 (from 0)
[2016-04-26 06:34:43.531599] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 2-qatp-client-0: changing port to 49157 (from 0)
[2016-04-26 06:34:43.534005] E [socket.c:2279:socket_connect_finish] 0-qatp-client-0: connection to 10.70.35.27:49157 failed (Connection refused)
[2016-04-26 06:34:43.536378] E [socket.c:2279:socket_connect_finish] 2-qatp-client-0: connection to 10.70.35.27:49157 failed (Connection refused)
[2016-04-26 06:34:46.642793] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-qatp-client-0: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]
[2016-04-26 06:34:46.642970] W [fuse-bridge.c:763:fuse_attr_cbk] 0-glusterfs-fuse: 2000740: LOOKUP() / => -1 (Transport endpoint is not connected)
[2016-04-26 06:34:46.647018] I [MSGID: 114021] [client.c:2115:notify] 0-qatp-client-0: current graph is no longer active, destroying rpc_client
[2016-04-26 06:34:47.535752] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 2-qatp-client-0: changing port to 49157 (from 0)
[2016-04-26 06:34:47.538778] E [socket.c:2279:socket_connect_finish] 2-qatp-client-0: connection to 10.70.35.27:49157 failed (Connection refused)
[2016-04-26 06:34:51.539068] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 2-qatp-client-0: changing port to 49157 (from 0)
[2016-04-26 06:34:51.541994] E [socket.c:2279:socket_connect_finish] 2-qatp-client-0: connection to 10.70.35.27:49157 failed (Connection refused)
[2016-04-26 06:34:55.542671] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 2-qatp-client-0: changing port to 49157 (from 0)
[2016-04-26 06:34:55.545955] E [socket.c:2279:socket_connect_finish] 2-qatp-client-0: connection to 10.70.35.27:49157 failed (Connection refused)
[2016-04-26 06:34:59.545721] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 2-qatp-client-0: changing port to 49157 (from 0)
[2016-04-26 06:34:59.548577] E [socket.c:2279:socket_connect_finish] 2-qatp-client-0: connection to 10.70.35.27:49157 failed (Connection refused)
[2016-04-26 06:35:03.549121] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 2-qatp-client-0: changing port to 49157 (from 0)
[2016-04-26 06:35:03.552082] E [socket.c:2279:socket_connect_finish] 2-qatp-client-0: connection to 10.70.35.27:49157 failed (Connection refused)
[2016-04-26 06:35:03.908561] I [MSGID: 109063] [dht-layout.c:702:dht_layout_normalize] 2-qatp-dht: Found anomalies in /dir1 (gfid = d072cf5f-b4c4-4591-be1b-9b12ec514841). Holes=1 overlaps=0
[2016-04-26 06:35:03.911394] I [MSGID: 109036] [dht-common.c:8173:dht_log_new_layout_for_dir_selfheal] 2-qatp-dht: Setting layout of /dir1 with [Subvol_name: qatp-replicate-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ],
[2016-04-26 06:35:03.913635] I [MSGID: 109063] [dht-layout.c:702:dht_layout_normalize] 2-qatp-dht: Found anomalies in /dir2 (gfid = f30e2145-a4a8-4c00-a476-08ccf194f7e3). Holes=1 overlaps=0
[2016-04-26 06:35:03.915365] I [MSGID: 109036] [dht-common.c:8173:dht_log_new_layout_for_dir_selfheal] 2-qatp-dht: Setting layout of /dir2 with [Subvol_name: qatp-replicate-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ],
[2016-04-26 06:35:07.553499] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 2-qatp-client-0: changing port to 49157 (from 0)
[2016-04-26 06:35:07.556544] E [socket.c:2279:socket_connect_finish] 2-qatp-client-0: connection to 10.70.35.27:49157 failed (Connection refused)
[2016-04-26 06:35:11.557447] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 2-qatp-client-0: changing port to 49157 (from 0)
[2016-04-26 06:35:11.560322] E [socket.c:2279:socket_connect_finish] 2-qatp-client-0: connection to 10.70.35.27:49157 failed (Connection refused)
[2016-04-26 06:35:15.560325] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 2-qatp-client-0: changing port to 49157 (from 0)
[2016-04-26 06:35:15.563205] E [socket.c:2279:socket_connect_finish] 2-qatp-client-0: connection to 10.70.35.27:49157 failed (Connection refused)
[2016-04-26 06:35:19.563777] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 2-qatp-client-0: changing port to 49157 (from 0)
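
In the log above, the failed LOOKUP on / is logged with a "0-" prefix (0-qatp-client-0, 0-glusterfs-fuse) even though the mount had already logged "switched to graph 2", and it is followed by "current graph is no longer active". To make that easier to spot in a longer log, here is a rough sketch that counts "Transport endpoint is not connected" messages by the numeric prefix in front of the translator name. Treating that prefix as the graph id is an assumption made for this sketch, and the regular expression is fitted only to the line shapes seen in this excerpt.

```python
import re
from collections import Counter

# Fitted to the lines in this report: [timestamp] LEVEL [MSGID: n]? [file:line:func] N-xlator: message
LOG_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"(?:\[MSGID: (?P<msgid>\d+)\]\s+)?"
    r"\[(?P<src>[^\]]+)\]\s+"
    r"(?P<graph>\d+)-(?P<xlator>[^:]+):\s*(?P<msg>.*)"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    m = LOG_RE.search(line)
    return m.groupdict() if m else None


def errors_by_graph(lines, needle="Transport endpoint is not connected"):
    """Count lines whose message contains `needle`, keyed by the numeric prefix."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec and needle in rec["msg"]:
            counts[int(rec["graph"])] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [
        "[2016-04-26 06:34:43.536378] E [socket.c:2279:socket_connect_finish] 2-qatp-client-0: connection to 10.70.35.27:49157 failed (Connection refused)",
        "[2016-04-26 06:34:46.642970] W [fuse-bridge.c:763:fuse_attr_cbk] 0-glusterfs-fuse: 2000740: LOOKUP() / => -1 (Transport endpoint is not connected)",
    ]
    print(errors_by_graph(sample))
```

The pattern uses search rather than match, so leading whitespace or editor line numbers in front of the timestamp do not prevent a line from being parsed.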

Comment 2 Nag Pavan Chilakam 2016-04-26 07:25:18 UTC
However, subsequent file creates succeed after the first one (which failed).
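
To check the "first create fails, later ones pass" behaviour in one go, a small probe like the following could be run against the fuse mount right after the brick is brought down. It is only a sketch: the file name prefix and the mount path in the usage example are hypothetical, and it assumes the failure reaches the application as ENOTCONN, which is the errno behind "Transport endpoint is not connected" on Linux.

```python
import errno
import os


def create_files(directory, count, prefix="probe"):
    """Try to create `count` empty files; return a list of (path, errno or None)."""
    results = []
    for i in range(count):
        path = os.path.join(directory, f"{prefix}{i}")
        try:
            # O_EXCL makes the call fail instead of silently reusing an old file.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            os.close(fd)
            results.append((path, None))
        except OSError as exc:
            results.append((path, exc.errno))
    return results


def not_connected(results):
    """Return the paths whose create failed with ENOTCONN."""
    return [path for path, err in results if err == errno.ENOTCONN]


if __name__ == "__main__":
    # Hypothetical mount point; replace with the real fuse mount.
    outcome = create_files("/mnt/qatp", 5)
    print("failed with ENOTCONN:", not_connected(outcome))
```

If the behaviour described in this bug is present, only the first path would be expected in the ENOTCONN list.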

Comment 5 Karthik U S 2018-02-16 07:35:12 UTC
Not able to hit this on the latest bits and it works as expected. Please feel free to reopen this if you still hit this issue.