Bug 1233151 - rm command fails with "Transport endpoint is not connected" during add-brick
Summary: rm command fails with "Transport endpoint is not connected" during add-brick
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Sakshi
QA Contact:
URL:
Whiteboard: dht-add-brick
Depends On: 994405 1265890 1273354
Blocks: 1225330 1229270 1229271 1235202
 
Reported: 2015-06-18 10:48 UTC by Sakshi
Modified: 2016-08-01 01:22 UTC (History)
CC List: 9 users

Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Doc Text:
Clone Of: 994405
Environment:
Last Closed: 2016-06-16 13:13:39 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Sakshi 2015-06-18 10:48:25 UTC
+++ This bug was initially created as a clone of Bug #994405 +++

Description of problem:

While removing a directory from the mount point, if an add-brick command is issued, the rm fails with "Transport endpoint is not connected".

Version-Release number of selected component (if applicable):

3.4.0.17rhs-1.el6rhs.x86_64

How reproducible:
always

Steps to Reproduce:
1. Create a distributed volume.
2. Mount the volume and untar a kernel source tree on the mount point.
3. Run rm -rf linux-2.6.32.61 and, while it is in progress, issue an add-brick on the volume (see the sketch below).
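
A minimal sketch of that sequence, with hypothetical host and brick paths (the actual volume layout from the report appears under "Additional info"):

# on a server: create and start a distributed volume
gluster volume create anon server1:/bricks/anon1 server2:/bricks/anon2
gluster volume start anon

# on the client: mount the volume and unpack a kernel tree
mount -t glusterfs server1:/anon /mnt
cd /mnt && tar xf linux-2.6.32.61.tar.bz2
rm -rf linux-2.6.32.61 &

# while the rm is still running, add a brick from a server
gluster volume add-brick anon server1:/bricks/anon3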

Actual results:

After some time, errors appear on the mount point:

[root@gqac024 mnt]# rm -rf linux-2.6.32.61
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-integrator/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-iop13xx/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-iop32x/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-iop33x/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-ixp2000/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-ixp23xx/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-ixp4xx/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-kirkwood/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-ks8695/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-l7200/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-lh7a40x/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-loki/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-mmp/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-msm/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-mv78xx0/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-mx1': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-mx2': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-mx25': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-mx3': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-mxc91231': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-netx/include/mach': Transport endpoint is not connected


Expected results:

The rm -rf should complete successfully; an add-brick issued while it is running should not cause client operations to fail.

Additional info:
================

RHS nodes
=========
gqac022.sbu.lab.eng.bos.redhat.com
gqac023.sbu.lab.eng.bos.redhat.com

Mounted on 
============
gqac024.sbu.lab.eng.bos.redhat.com

mount point 
===========
/mnt

add-brick issued from gqac022.sbu.lab.eng.bos.redhat.com

[root@gqac022 rpm]# gluster v info anon
 
Volume Name: anon
Type: Distribute
Volume ID: 61e3c5b2-cb03-4ea8-9a69-a8762191d296
Status: Started
Number of Bricks: 15
Transport-type: tcp
Bricks:
Brick1: gqac023.sbu.lab.eng.bos.redhat.com:/home/anon1
Brick2: gqac022.sbu.lab.eng.bos.redhat.com:/home/anon2
Brick3: gqac023.sbu.lab.eng.bos.redhat.com:/home/anon3
Brick4: gqac022.sbu.lab.eng.bos.redhat.com:/home/anon4
Brick5: gqac023.sbu.lab.eng.bos.redhat.com:/home/anon5
Brick6: gqac023.sbu.lab.eng.bos.redhat.com:/home/anon6
Brick7: gqac023.sbu.lab.eng.bos.redhat.com:/home/anon7
Brick8: gqac022.sbu.lab.eng.bos.redhat.com:/home/anon8
Brick9: gqac023.sbu.lab.eng.bos.redhat.com:/home/anon9
Brick10: gqac023.sbu.lab.eng.bos.redhat.com:/home/anon10
Brick11: gqac022.sbu.lab.eng.bos.redhat.com:/home/anon11
Brick12: gqac023.sbu.lab.eng.bos.redhat.com:/home/anon12
Brick13: gqac022.sbu.lab.eng.bos.redhat.com:/home/anon13
Brick14: gqac023.sbu.lab.eng.bos.redhat.com:/home/anon14
Brick15: gqac022.sbu.lab.eng.bos.redhat.com:/home/anon15


Rebalance was performed before adding the new bricks



mnt logs
=========
[2013-08-07 08:06:00.201235] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 4-anon-client-14: remote operation failed: No such file or directory. Path: <gfid:6ef0c711-d65c-4cec-90ba-1ba87e1163e0>/virt/kvm (28c128a7-87c1-493e-9aab-713bcbc73221)
[2013-08-07 08:06:00.201274] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 4-anon-client-13: remote operation failed: No such file or directory. Path: <gfid:6ef0c711-d65c-4cec-90ba-1ba87e1163e0>/virt/kvm (28c128a7-87c1-493e-9aab-713bcbc73221)
[2013-08-07 08:06:00.215054] W [client-rpc-fops.c:2316:client3_3_readdirp_cbk] 4-anon-client-13: remote operation failed: No such file or directory
[2013-08-07 08:06:00.215480] W [client-rpc-fops.c:2316:client3_3_readdirp_cbk] 4-anon-client-14: remote operation failed: No such file or directory
[2013-08-07 08:06:00.228461] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 4-anon-client-13: remote operation failed: No such file or directory. Path: <gfid:6ef0c711-d65c-4cec-90ba-1ba87e1163e0>/virt/kvm (28c128a7-87c1-493e-9aab-713bcbc73221)
[2013-08-07 08:06:00.228536] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 4-anon-client-14: remote operation failed: No such file or directory. Path: <gfid:6ef0c711-d65c-4cec-90ba-1ba87e1163e0>/virt/kvm (28c128a7-87c1-493e-9aab-713bcbc73221)
[2013-08-07 08:06:00.229528] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 4-anon-client-14: remote operation failed: No such file or directory
[2013-08-07 08:06:00.229607] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 4-anon-client-13: remote operation failed: No such file or directory
[2013-08-07 08:06:00.230174] W [fuse-bridge.c:1688:fuse_unlink_cbk] 0-glusterfs-fuse: 1154112: RMDIR() <gfid:6ef0c711-d65c-4cec-90ba-1ba87e1163e0>/virt/kvm => -1 (No such file or directory)
[2013-08-07 08:06:00.230932] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 4-anon-client-14: remote operation failed: No such file or directory. Path: <gfid:6ef0c711-d65c-4cec-90ba-1ba87e1163e0>/virt (42ca8074-6470-48c0-9731-eb7e8a5d63ea)
[2013-08-07 08:06:00.231116] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 4-anon-client-13: remote operation failed: No such file or directory. Path: <gfid:6ef0c711-d65c-4cec-90ba-1ba87e1163e0>/virt (42ca8074-6470-48c0-9731-eb7e8a5d63ea)
[2013-08-07 08:06:00.231941] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 4-anon-client-14: remote operation failed: No such file or directory
[2013-08-07 08:06:00.232113] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 4-anon-client-13: remote operation failed: No such file or directory
[2013-08-07 08:06:00.232618] W [fuse-bridge.c:1688:fuse_unlink_cbk] 0-glusterfs-fuse: 1154115: RMDIR() <gfid:6ef0c711-d65c-4cec-90ba-1ba87e1163e0>/virt => -1 (No such file or directory)

--- Additional comment from shylesh on 2013-08-07 04:32:29 EDT ---

sosreports@
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/994405/

Comment 1 Sakshi 2015-06-18 11:06:14 UTC
The main issue is that the volfile change happens irrespective of whether the newly added brick has received a port. So consider a scenario where the newly added brick has requested a port but not yet received one, while the volfile change has already happened: fops are now sent to the newly added brick as well, and since that brick still has no port, they fail on it with "Transport endpoint is not connected". Also, instead of glusterd creating and notifying the volfile immediately, it can notify the clients once the brick has started.

The fix would be to notify the volfile change only after the new brick has been added and has received a port.
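
The window can also be observed from the CLI (a rough sketch with hypothetical names; output abbreviated and illustrative):

# add a brick, then immediately check whether it has a port yet
gluster volume add-brick anon server1:/bricks/anon16
gluster volume status anon
# until the brick process registers a port, the status output may show the
# new brick with Port: N/A and Online: N; a client that has already fetched
# the updated volfile sends fops to it and gets ENOTCONN
# ("Transport endpoint is not connected")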

Comment 2 Anand Avati 2015-08-20 03:51:17 UTC
REVIEW: http://review.gluster.org/11342 (glusterfsd: newly added brick receives fops only after it is started) posted (#2) for review on master by Sakshi Bansal

Comment 3 Anand Avati 2015-08-21 10:55:47 UTC
REVIEW: http://review.gluster.org/11342 (glusterfsd: newly added brick receives fops only after it is started) posted (#3) for review on master by Dan Lambright (dlambrig)

Comment 4 Anand Avati 2015-08-27 17:14:11 UTC
REVIEW: http://review.gluster.org/11342 (glusterfsd: newly added brick receives fops only after it is started) posted (#4) for review on master by Dan Lambright (dlambrig)

Comment 5 Anand Avati 2015-08-28 02:37:42 UTC
REVIEW: http://review.gluster.org/11342 (glusterfsd: newly added brick receives fops only after it is started) posted (#5) for review on master by Dan Lambright (dlambrig)

Comment 6 Vijay Bellur 2015-09-02 16:00:43 UTC
REVIEW: http://review.gluster.org/11342 (glusterfsd: newly added brick receives fops only after it is started) posted (#6) for review on master by Vijay Bellur (vbellur)

Comment 7 Vijay Bellur 2015-09-04 04:56:40 UTC
REVIEW: http://review.gluster.org/11342 (glusterfsd : newly added brick receives fops only after it is started) posted (#7) for review on master by Sakshi Bansal (sabansal)

Comment 8 Vijay Bellur 2015-09-13 01:03:00 UTC
REVIEW: http://review.gluster.org/11342 (glusterfsd : newly added brick receives fops only after it is started) posted (#8) for review on master by Dan Lambright (dlambrig)

Comment 9 Niels de Vos 2016-06-16 13:13:39 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

