Bug 1330399 - "Transport endpoint is not connected" error on fuse mount when we bring down the legacy brick of a volume after converting it to replicate
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Pranith Kumar K
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-04-26 07:22 UTC by Nag Pavan Chilakam
Modified: 2019-04-03 09:28 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-02-16 07:35:12 UTC
Target Upstream Version:



Description Nag Pavan Chilakam 2016-04-26 07:22:13 UTC
Description of problem:
=======================
While validating bug 1248998 ([AFR]: Files not available in the mount point after converting Distributed volume type to Replicated one), I came up with a few test cases. The following is one of them:
Test case: Bringing down the original (legacy) brick must not cause any I/O issue, since the volume is now an AFR volume. Test on both fuse and nfs mounts.
Steps:
    1. Create a single-brick volume.

    2. Start the volume, add some files and directories, and note them.

    3. Add a brick so that the volume becomes a 1x2 replica volume, using the command below:
       gluster v add-brick <vname> replica 2 <newbrick>

    4. Check that the heal is complete using heal info.

    5. After the heal completes, bring down the first brick (the one used to create the original volume).

    6. Without doing any lookup on the fuse mount, try to create a new file; make sure the file gets created.


But at step 6, when I create a file (with no prior lookup such as ls), the first create fails with "Transport endpoint is not connected".
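The steps above can be written down as a command list, which makes a rerun on newer builds easier to compare. This is only a sketch, not the reporter's actual procedure: the host names `server1` and `server2` and the mount point `/mnt/qatp` are hypothetical, while the volume name `qatp`, the brick path `/rhs/brick1/qatp`, and the directories `dir1` and `dir2` are taken from the log in this report. The script only renders the commands and does not run them. Step 5 (killing the brick process) is left as a manual action.

```python
import shlex


def repro_steps(vol, host1, host2, brick, mount):
    """Return (description, argv) pairs for the reproduction steps.

    host1 holds the original (legacy) brick, host2 the brick added later.
    All arguments are placeholders supplied by the caller.
    """
    return [
        ("1. create single-brick volume",
         ["gluster", "volume", "create", vol, f"{host1}:{brick}"]),
        ("2. start volume",
         ["gluster", "volume", "start", vol]),
        ("2. mount it over fuse",
         ["mount", "-t", "glusterfs", f"{host1}:/{vol}", mount]),
        ("2. add some data",
         ["mkdir", f"{mount}/dir1", f"{mount}/dir2"]),
        ("3. convert to 1x2 replica",
         ["gluster", "volume", "add-brick", vol, "replica", "2",
          f"{host2}:{brick}"]),
        ("4. check heal status",
         ["gluster", "volume", "heal", vol, "info"]),
        ("5. find the PID of the first brick, then kill it by hand",
         ["gluster", "volume", "status", vol]),
        ("6. create a file without any prior lookup",
         ["touch", f"{mount}/newfile"]),
    ]


def render(steps):
    """Format each step as a comment line followed by the shell command."""
    return [f"# {desc}\n{shlex.join(argv)}" for desc, argv in steps]


if __name__ == "__main__":
    # Hypothetical hosts and mount point; volume and brick path as in the log.
    steps = repro_steps("qatp", "server1", "server2",
                        "/rhs/brick1/qatp", "/mnt/qatp")
    print("\n".join(render(steps)))
```

Rendering instead of executing keeps the sketch safe to run anywhere; the printed commands can be pasted on the storage nodes one at a time.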

Version-Release number of selected component (if applicable):
============================
[root@dhcp35-98 qatp]# rpm -qa|grep gluster
glusterfs-client-xlators-3.7.9-2.el7rhgs.x86_64
glusterfs-server-3.7.9-2.el7rhgs.x86_64
python-gluster-3.7.5-19.el7rhgs.noarch
gluster-nagios-addons-0.2.5-1.el7rhgs.x86_64
vdsm-gluster-4.16.30-1.3.el7rhgs.noarch
glusterfs-3.7.9-2.el7rhgs.x86_64
glusterfs-api-3.7.9-2.el7rhgs.x86_64
glusterfs-cli-3.7.9-2.el7rhgs.x86_64
glusterfs-geo-replication-3.7.9-2.el7rhgs.x86_64
gluster-nagios-common-0.2.3-1.el7rhgs.noarch
glusterfs-libs-3.7.9-2.el7rhgs.x86_64
glusterfs-fuse-3.7.9-2.el7rhgs.x86_64
glusterfs-rdma-3.7.9-2.el7rhgs.x86_64
[root@dhcp35-98 qatp]# 





Fuse mount log errors:
======================
 +------------------------------------------------------------------------------+
[2016-04-26 06:32:39.073633] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 2-qatp-client-1: changing port to 49157 (from 0)
[2016-04-26 06:32:39.078158] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 2-qatp-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-04-26 06:32:39.080152] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 2-qatp-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-04-26 06:32:39.080630] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 2-qatp-client-1: Connected to qatp-client-1, attached to remote volume '/rhs/brick1/qatp'.
[2016-04-26 06:32:39.080665] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 2-qatp-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2016-04-26 06:32:39.080740] I [MSGID: 108005] [afr-common.c:4006:afr_notify] 2-qatp-replicate-0: Subvolume 'qatp-client-1' came back up; going online.
[2016-04-26 06:32:39.080955] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 2-qatp-client-1: Server lk version = 1
[2016-04-26 06:32:39.082195] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 2-qatp-client-0: Connected to qatp-client-0, attached to remote volume '/rhs/brick1/qatp'.
[2016-04-26 06:32:39.082221] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 2-qatp-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2016-04-26 06:32:39.095778] I [fuse-bridge.c:5156:fuse_graph_setup] 0-fuse: switched to graph 2
[2016-04-26 06:32:39.095877] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 2-qatp-client-0: Server lk version = 1
[2016-04-26 06:34:28.786313] W [socket.c:589:__socket_rwv] 2-qatp-client-0: readv on 10.70.35.27:49157 failed (No data available)
[2016-04-26 06:34:28.786313] W [socket.c:589:__socket_rwv] 0-qatp-client-0: readv on 10.70.35.27:49157 failed (No data available)
[2016-04-26 06:34:28.786844] I [MSGID: 114018] [client.c:2030:client_rpc_notify] 0-qatp-client-0: disconnected from qatp-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2016-04-26 06:34:28.786850] I [MSGID: 114018] [client.c:2030:client_rpc_notify] 2-qatp-client-0: disconnected from qatp-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2016-04-26 06:34:39.521692] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 0-qatp-client-0: changing port to 49157 (from 0)
[2016-04-26 06:34:39.525326] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 2-qatp-client-0: changing port to 49157 (from 0)
[2016-04-26 06:34:39.527215] E [socket.c:2279:socket_connect_finish] 0-qatp-client-0: connection to 10.70.35.27:49157 failed (Connection refused)
[2016-04-26 06:34:39.529650] E [socket.c:2279:socket_connect_finish] 2-qatp-client-0: connection to 10.70.35.27:49157 failed (Connection refused)
[2016-04-26 06:34:43.529263] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 0-qatp-client-0: changing port to 49157 (from 0)
[2016-04-26 06:34:43.531599] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 2-qatp-client-0: changing port to 49157 (from 0)
[2016-04-26 06:34:43.534005] E [socket.c:2279:socket_connect_finish] 0-qatp-client-0: connection to 10.70.35.27:49157 failed (Connection refused)
[2016-04-26 06:34:43.536378] E [socket.c:2279:socket_connect_finish] 2-qatp-client-0: connection to 10.70.35.27:49157 failed (Connection refused)
[2016-04-26 06:34:46.642793] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-qatp-client-0: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]
[2016-04-26 06:34:46.642970] W [fuse-bridge.c:763:fuse_attr_cbk] 0-glusterfs-fuse: 2000740: LOOKUP() / => -1 (Transport endpoint is not connected)
[2016-04-26 06:34:46.647018] I [MSGID: 114021] [client.c:2115:notify] 0-qatp-client-0: current graph is no longer active, destroying rpc_client
[2016-04-26 06:34:47.535752] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 2-qatp-client-0: changing port to 49157 (from 0)
[2016-04-26 06:34:47.538778] E [socket.c:2279:socket_connect_finish] 2-qatp-client-0: connection to 10.70.35.27:49157 failed (Connection refused)
[2016-04-26 06:34:51.539068] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 2-qatp-client-0: changing port to 49157 (from 0)
[2016-04-26 06:34:51.541994] E [socket.c:2279:socket_connect_finish] 2-qatp-client-0: connection to 10.70.35.27:49157 failed (Connection refused)
[2016-04-26 06:34:55.542671] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 2-qatp-client-0: changing port to 49157 (from 0)
[2016-04-26 06:34:55.545955] E [socket.c:2279:socket_connect_finish] 2-qatp-client-0: connection to 10.70.35.27:49157 failed (Connection refused)
[2016-04-26 06:34:59.545721] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 2-qatp-client-0: changing port to 49157 (from 0)
[2016-04-26 06:34:59.548577] E [socket.c:2279:socket_connect_finish] 2-qatp-client-0: connection to 10.70.35.27:49157 failed (Connection refused)
[2016-04-26 06:35:03.549121] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 2-qatp-client-0: changing port to 49157 (from 0)
[2016-04-26 06:35:03.552082] E [socket.c:2279:socket_connect_finish] 2-qatp-client-0: connection to 10.70.35.27:49157 failed (Connection refused)
[2016-04-26 06:35:03.908561] I [MSGID: 109063] [dht-layout.c:702:dht_layout_normalize] 2-qatp-dht: Found anomalies in /dir1 (gfid = d072cf5f-b4c4-4591-be1b-9b12ec514841). Holes=1 overlaps=0
[2016-04-26 06:35:03.911394] I [MSGID: 109036] [dht-common.c:8173:dht_log_new_layout_for_dir_selfheal] 2-qatp-dht: Setting layout of /dir1 with [Subvol_name: qatp-replicate-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ],
[2016-04-26 06:35:03.913635] I [MSGID: 109063] [dht-layout.c:702:dht_layout_normalize] 2-qatp-dht: Found anomalies in /dir2 (gfid = f30e2145-a4a8-4c00-a476-08ccf194f7e3). Holes=1 overlaps=0
[2016-04-26 06:35:03.915365] I [MSGID: 109036] [dht-common.c:8173:dht_log_new_layout_for_dir_selfheal] 2-qatp-dht: Setting layout of /dir2 with [Subvol_name: qatp-replicate-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ],
[2016-04-26 06:35:07.553499] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 2-qatp-client-0: changing port to 49157 (from 0)
[2016-04-26 06:35:07.556544] E [socket.c:2279:socket_connect_finish] 2-qatp-client-0: connection to 10.70.35.27:49157 failed (Connection refused)
[2016-04-26 06:35:11.557447] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 2-qatp-client-0: changing port to 49157 (from 0)
[2016-04-26 06:35:11.560322] E [socket.c:2279:socket_connect_finish] 2-qatp-client-0: connection to 10.70.35.27:49157 failed (Connection refused)
[2016-04-26 06:35:15.560325] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 2-qatp-client-0: changing port to 49157 (from 0)
[2016-04-26 06:35:15.563205] E [socket.c:2279:socket_connect_finish] 2-qatp-client-0: connection to 10.70.35.27:49157 failed (Connection refused)
[2016-04-26 06:35:19.563777] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 2-qatp-client-0: changing port to 49157 (from 0)
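
To see which translator instances reported the error, the log excerpt above can be filtered with a small parser. This is a rough sketch: the regular expression is inferred only from the lines in this excerpt, and reading the leading number in names such as `0-qatp-client-0` as a graph ID is an assumption, not something this report confirms. The function names are made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the excerpt above; other log formats may not match.
LINE_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"(?:\[MSGID: (?P<msgid>\d+)\]\s+)?"
    r"\[(?P<src>[^\]]+)\]\s+"
    r"(?P<graph>\d+)-(?P<xlator>[^:\s]+):\s+"
    r"(?P<msg>.*)"
)

ENOTCONN_TEXT = "Transport endpoint is not connected"


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    rec = match.groupdict()
    # Assumption: the numeric prefix of the translator name is the graph ID.
    rec["graph"] = int(rec["graph"])
    return rec


def enotconn_by_origin(lines):
    """Count ENOTCONN messages per (graph prefix, translator name)."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec and ENOTCONN_TEXT in rec["msg"]:
            counts[(rec["graph"], rec["xlator"])] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage: python3 this_script.py < fuse-mount.log
    for (graph, xlator), n in sorted(enotconn_by_origin(sys.stdin).items()):
        print(f"{graph}-{xlator}: {n}")
```

On this excerpt, both "Transport endpoint is not connected" messages carry the prefix 0, although the log shows "switched to graph 2" about two minutes earlier, while the entries with prefix 2 show only refused reconnect attempts.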

Comment 2 Nag Pavan Chilakam 2016-04-26 07:25:18 UTC
However, subsequent file creates succeed after the first one fails.
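
A quick way to check this pattern (first create fails, later ones pass) is to create several files back to back and record the error for each. The sketch below is an illustration only; `probe_creates` and the `probe` file prefix are hypothetical names, and the directory passed in would be the fuse mount point. On Linux, "Transport endpoint is not connected" is the message for errno ENOTCONN, so a failed attempt would show up as `ENOTCONN` in the result.

```python
import errno
import os


def probe_creates(directory, count=5, prefix="probe"):
    """Create `count` new files in `directory`, one after another.

    Returns a list of (path, error) pairs, where error is None on success
    or the symbolic errno name (for example "ENOTCONN") on failure.
    """
    results = []
    for i in range(count):
        path = os.path.join(directory, f"{prefix}{i}")
        try:
            # O_EXCL makes sure each attempt is a real create, not a reopen.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            os.close(fd)
            results.append((path, None))
        except OSError as exc:
            results.append((path, errno.errorcode.get(exc.errno, str(exc.errno))))
    return results


if __name__ == "__main__":
    import sys
    # Usage: python3 this_script.py /path/to/fuse/mount
    for path, err in probe_creates(sys.argv[1]):
        print(path, "ok" if err is None else err)
```

Running it right after bringing the brick down, before any ls or stat on the mount, matches the condition described in step 6.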

Comment 5 Karthik U S 2018-02-16 07:35:12 UTC
I was not able to hit this on the latest bits; it works as expected. Please feel free to reopen this if you still hit the issue.

