Bug 1233151

Summary: rm command fails with "Transport end point not connected" during add brick
Product: [Community] GlusterFS Reporter: Sakshi <sabansal>
Component: distribute Assignee: Sakshi <sabansal>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: mainline CC: amukherj, bugs, dlambrig, nbalacha, rhs-bugs, shmohan, smohan, storage-qa-internal, vbellur
Target Milestone: --- Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: dht-add-brick
Fixed In Version: glusterfs-3.8rc2 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 994405 Environment:
Last Closed: 2016-06-16 13:13:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 994405, 1265890, 1273354    
Bug Blocks: 1225330, 1229270, 1229271, 1235202    

Description Sakshi 2015-06-18 10:48:25 UTC
+++ This bug was initially created as a clone of Bug #994405 +++

Description of problem:

While a directory is being removed from the mount point, issuing an add-brick command causes rm to fail with "Transport endpoint is not connected".

Version-Release number of selected component (if applicable):

3.4.0.17rhs-1.el6rhs.x86_64
How reproducible:
always

Steps to Reproduce:
1. Create a distributed volume.
2. Mount the volume and untar the kernel source on the mount point.
3. Run rm -rf linux-2.6.32.61 on the mount point.
4. While the rm is still running, issue an add-brick on the volume (see the command sketch below).
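
For reference, a minimal command sketch of the reproduction; the volume name, hostnames, brick paths and tarball name below are placeholders, not the exact ones used in this setup:

# on a server node: create and start a plain distribute volume
gluster volume create testvol server1:/bricks/b1 server2:/bricks/b2
gluster volume start testvol

# on the client: mount the volume, untar the kernel source, then start the rm
mount -t glusterfs server1:/testvol /mnt
cd /mnt && tar xf linux-2.6.32.61.tar.xz && rm -rf linux-2.6.32.61 &

# on a server node, while the rm is still running: add bricks to the volume
gluster volume add-brick testvol server1:/bricks/b3 server2:/bricks/b4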

Actual results:

After some time, errors appear on the mount point:

[root@gqac024 mnt]# rm -rf linux-2.6.32.61
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-integrator/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-iop13xx/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-iop32x/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-iop33x/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-ixp2000/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-ixp23xx/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-ixp4xx/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-kirkwood/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-ks8695/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-l7200/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-lh7a40x/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-loki/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-mmp/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-msm/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-mv78xx0/include/mach': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-mx1': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-mx2': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-mx25': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-mx3': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-mxc91231': Transport endpoint is not connected
rm: cannot remove `linux-2.6.32.61/arch/arm/mach-netx/include/mach': Transport endpoint is not connected


Expected results:

The rm -rf should complete successfully, without "Transport endpoint is not connected" errors, even when add-brick is issued while the removal is in progress.

Additional info:
================

RHS nodes
=========
gqac022.sbu.lab.eng.bos.redhat.com
gqac023.sbu.lab.eng.bos.redhat.com

Mounted on 
============
gqac024.sbu.lab.eng.bos.redhat.com

mount point 
===========
/mnt

add-brick issued from gqac022.sbu.lab.eng.bos.redhat.com

[root@gqac022 rpm]# gluster v info anon
 
Volume Name: anon
Type: Distribute
Volume ID: 61e3c5b2-cb03-4ea8-9a69-a8762191d296
Status: Started
Number of Bricks: 15
Transport-type: tcp
Bricks:
Brick1: gqac023.sbu.lab.eng.bos.redhat.com:/home/anon1
Brick2: gqac022.sbu.lab.eng.bos.redhat.com:/home/anon2
Brick3: gqac023.sbu.lab.eng.bos.redhat.com:/home/anon3
Brick4: gqac022.sbu.lab.eng.bos.redhat.com:/home/anon4
Brick5: gqac023.sbu.lab.eng.bos.redhat.com:/home/anon5
Brick6: gqac023.sbu.lab.eng.bos.redhat.com:/home/anon6
Brick7: gqac023.sbu.lab.eng.bos.redhat.com:/home/anon7
Brick8: gqac022.sbu.lab.eng.bos.redhat.com:/home/anon8
Brick9: gqac023.sbu.lab.eng.bos.redhat.com:/home/anon9
Brick10: gqac023.sbu.lab.eng.bos.redhat.com:/home/anon10
Brick11: gqac022.sbu.lab.eng.bos.redhat.com:/home/anon11
Brick12: gqac023.sbu.lab.eng.bos.redhat.com:/home/anon12
Brick13: gqac022.sbu.lab.eng.bos.redhat.com:/home/anon13
Brick14: gqac023.sbu.lab.eng.bos.redhat.com:/home/anon14
Brick15: gqac022.sbu.lab.eng.bos.redhat.com:/home/anon15


Rebalance was performed before adding the new bricks



mnt logs
=========
[2013-08-07 08:06:00.201235] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 4-anon-client-14: remote operation failed: No such file or directory. Path: <gfid:6ef0c711-d65c-4cec-90ba-1ba87e1163e0>/virt/kvm (28c128a7-87c1-493e-9aab-713bcbc73221)
[2013-08-07 08:06:00.201274] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 4-anon-client-13: remote operation failed: No such file or directory. Path: <gfid:6ef0c711-d65c-4cec-90ba-1ba87e1163e0>/virt/kvm (28c128a7-87c1-493e-9aab-713bcbc73221)
[2013-08-07 08:06:00.215054] W [client-rpc-fops.c:2316:client3_3_readdirp_cbk] 4-anon-client-13: remote operation failed: No such file or directory
[2013-08-07 08:06:00.215480] W [client-rpc-fops.c:2316:client3_3_readdirp_cbk] 4-anon-client-14: remote operation failed: No such file or directory
[2013-08-07 08:06:00.228461] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 4-anon-client-13: remote operation failed: No such file or directory. Path: <gfid:6ef0c711-d65c-4cec-90ba-1ba87e1163e0>/virt/kvm (28c128a7-87c1-493e-9aab-713bcbc73221)
[2013-08-07 08:06:00.228536] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 4-anon-client-14: remote operation failed: No such file or directory. Path: <gfid:6ef0c711-d65c-4cec-90ba-1ba87e1163e0>/virt/kvm (28c128a7-87c1-493e-9aab-713bcbc73221)
[2013-08-07 08:06:00.229528] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 4-anon-client-14: remote operation failed: No such file or directory
[2013-08-07 08:06:00.229607] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 4-anon-client-13: remote operation failed: No such file or directory
[2013-08-07 08:06:00.230174] W [fuse-bridge.c:1688:fuse_unlink_cbk] 0-glusterfs-fuse: 1154112: RMDIR() <gfid:6ef0c711-d65c-4cec-90ba-1ba87e1163e0>/virt/kvm => -1 (No such file or directory)
[2013-08-07 08:06:00.230932] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 4-anon-client-14: remote operation failed: No such file or directory. Path: <gfid:6ef0c711-d65c-4cec-90ba-1ba87e1163e0>/virt (42ca8074-6470-48c0-9731-eb7e8a5d63ea)
[2013-08-07 08:06:00.231116] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 4-anon-client-13: remote operation failed: No such file or directory. Path: <gfid:6ef0c711-d65c-4cec-90ba-1ba87e1163e0>/virt (42ca8074-6470-48c0-9731-eb7e8a5d63ea)
[2013-08-07 08:06:00.231941] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 4-anon-client-14: remote operation failed: No such file or directory
[2013-08-07 08:06:00.232113] W [client-rpc-fops.c:695:client3_3_rmdir_cbk] 4-anon-client-13: remote operation failed: No such file or directory
[2013-08-07 08:06:00.232618] W [fuse-bridge.c:1688:fuse_unlink_cbk] 0-glusterfs-fuse: 1154115: RMDIR() <gfid:6ef0c711-d65c-4cec-90ba-1ba87e1163e0>/virt => -1 (No such file or directory)

--- Additional comment from shylesh on 2013-08-07 04:32:29 EDT ---

sosreports@
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/994405/

Comment 1 Sakshi 2015-06-18 11:06:14 UTC
The main issue is that the volume file change happens irrespective of whether the newly added brick has received a port. Consider a scenario where the newly added brick has requested a port but has not yet received one, and the volume file change happens anyway. Fops are now sent to the newly added brick as well, but since that brick does not yet have a port, the fop fails on it with "Transport endpoint is not connected". Instead of glusterd creating the volfile and notifying clients immediately, it can notify the clients only once the brick is fully started.

The fix is to notify the volume file change only after the new brick has been added and has received a port.
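
To make the window described above concrete, one way to look at it from the CLI is to check brick status right after the add-brick; a brick process that has not yet started is reported without a TCP port by 'gluster volume status'. The extra brick path below is a made-up example following the naming used in this volume:

# hypothetical new brick, following the naming used above
gluster volume add-brick anon gqac022.sbu.lab.eng.bos.redhat.com:/home/anon16

# with the bug, clients may already have switched to the new volfile (which
# includes the new brick) while 'volume status' still shows that brick without
# a TCP port / not yet online; fops routed to it then fail with
# "Transport endpoint is not connected" until the brick process comes up
gluster volume status anon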

Comment 2 Anand Avati 2015-08-20 03:51:17 UTC
REVIEW: http://review.gluster.org/11342 (glusterfsd: newly added brick receives fops only after it is started) posted (#2) for review on master by Sakshi Bansal

Comment 3 Anand Avati 2015-08-21 10:55:47 UTC
REVIEW: http://review.gluster.org/11342 (glusterfsd: newly added brick receives fops only after it is started) posted (#3) for review on master by Dan Lambright (dlambrig)

Comment 4 Anand Avati 2015-08-27 17:14:11 UTC
REVIEW: http://review.gluster.org/11342 (glusterfsd: newly added brick receives fops only after it is started) posted (#4) for review on master by Dan Lambright (dlambrig)

Comment 5 Anand Avati 2015-08-28 02:37:42 UTC
REVIEW: http://review.gluster.org/11342 (glusterfsd: newly added brick receives fops only after it is started) posted (#5) for review on master by Dan Lambright (dlambrig)

Comment 6 Vijay Bellur 2015-09-02 16:00:43 UTC
REVIEW: http://review.gluster.org/11342 (glusterfsd: newly added brick receives fops only after it is started) posted (#6) for review on master by Vijay Bellur (vbellur)

Comment 7 Vijay Bellur 2015-09-04 04:56:40 UTC
REVIEW: http://review.gluster.org/11342 (glusterfsd : newly added brick receives fops only after it is started) posted (#7) for review on master by Sakshi Bansal (sabansal)

Comment 8 Vijay Bellur 2015-09-13 01:03:00 UTC
REVIEW: http://review.gluster.org/11342 (glusterfsd : newly added brick receives fops only after it is started) posted (#8) for review on master by Dan Lambright (dlambrig)

Comment 9 Niels de Vos 2016-06-16 13:13:39 UTC
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user