Bug 763215 (GLUSTER-1483)

Summary: after remove-brick, can't access mount point
Product: [Community] GlusterFS
Reporter: Lakshmipathi G <lakshmipathi>
Component: glusterd
Assignee: shishir gowda <sgowda>
Status: CLOSED WONTFIX
Severity: high
Priority: low
Version: 3.1-alpha
CC: gluster-bugs, nsathyan, vijay
Hardware: All
OS: Linux
Doc Type: Bug Fix
Regression: RTP
Mount Type: fuse

Description Lakshmipathi G 2010-08-31 07:07:42 UTC
I have 4 DHT servers; s1 is mounted on c1 and s2 on c2:

c1#mount -t glusterfs s1:vol /mnt
c2#mount -t glusterfs s2:vol /mnt

After removing s1 (10.192.141.187) with the gluster volume remove-brick command, I'm unable to access the mount point on c1:
c1#ls -ltr /mnt
ls: cannot access /mnt: Stale NFS file handle
==========
#gluster volume info
Number of Volumes: 1

Volume Name: dd4
Type: None
Status: Started
Number of Bricks: 3
Bricks:
Brick1: 10.192.134.144:/mnt/d1
Brick2: 10.214.231.112:/mnt/d1
Brick3: 10.198.110.16:/mnt/d1


Client log:

+------------------------------------------------------------------------------+
[2010-08-30 09:50:55.976670] I [fuse-bridge.c:2860:fuse_init] glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.8
[2010-08-30 09:50:59.865067] I [client-handshake.c:649:select_server_supported_programs] dd4-client-1: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2010-08-30 09:50:59.865224] I [client-handshake.c:649:select_server_supported_programs] dd4-client-2: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2010-08-30 09:50:59.865281] I [client-handshake.c:649:select_server_supported_programs] dd4-client-0: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2010-08-30 09:50:59.889901] I [client-handshake.c:649:select_server_supported_programs] dd4-client-3: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2010-08-30 09:50:59.935835] I [client-handshake.c:486:client_setvolume_cbk] dd4-client-1: Connected to 10.192.134.144:6971, attached to remote volume '/mnt/d1'.
[2010-08-30 09:50:59.936189] I [client-handshake.c:486:client_setvolume_cbk] dd4-client-2: Connected to 10.214.231.112:6971, attached to remote volume '/mnt/d1'.
[2010-08-30 09:50:59.946060] I [client-handshake.c:486:client_setvolume_cbk] dd4-client-0: Connected to 10.192.141.187:6971, attached to remote volume '/mnt/d1'.
[2010-08-30 09:50:59.963181] I [client-handshake.c:486:client_setvolume_cbk] dd4-client-3: Connected to 10.198.110.16:6971, attached to remote volume '/mnt/d1'.
[2010-08-31 01:37:02.315068] I [rpc-clnt.c:681:rpc_clnt_handle_cbk] rpc-clnt: RPC XID: 2a, Ver: 2, Program: 52743234, ProgVers: 1, Proc: 1
[2010-08-31 01:37:02.315154] I [glusterfsd-mgmt.c:57:mgmt_cbk_spec] mgmt: Volume file changed
[2010-08-31 01:37:13.75888] E [socket.c:1580:socket_connect_finish] dd4-client-0: connection to 10.192.141.187:6971 failed (Connection refused)
[2010-08-31 01:38:43.545709] W [fuse-bridge.c:416:fuse_attr_cbk] glusterfs-fuse: 7034: LOOKUP() / => -1 (Stale NFS file handle)
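When triaging a client log like the one above, it can help to pull out only the error and warning entries, which here are the refused connection to the removed brick and the stale-handle LOOKUP. A minimal sketch, assuming the log format shown in this report (bracketed timestamp followed by a one-letter level); the function name gluster_log_problems is hypothetical and not part of GlusterFS:

```shell
# Print only error (E) and warning (W) entries from a GlusterFS client log.
# Assumes each entry looks like: [timestamp] LEVEL [file:line:func] message
# Reads the files given as arguments, or standard input if none are given.
gluster_log_problems() {
    grep -E '^\[[^]]+\] [EW] ' "$@"
}
```

It would be called with the path to the client log as its argument; informational (I) entries such as the handshake messages are dropped.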

Comment 1 Amar Tumballi 2010-09-06 02:04:35 UTC
Shishir, can you check this? I tried reproducing it; for me, the error appeared only the first time, and after that the mount point worked fine. Let me know what you get.

Comment 2 shishir gowda 2010-09-07 03:19:49 UTC
The reason seems to be an open socket for the older brick.

[2010-09-07 11:39:41.520839] I [client-handshake.c:489:client_setvolume_cbk] new-client-0: Connected to 192.168.1.131:6972, attached to remote volume '/export/dir1'.  <--------new brick-0 after remove brick succeeded
[2010-09-07 11:39:43.730686] E [socket.c:1609:socket_connect_finish] new-client-0: connection to 192.168.1.131:6971 failed (Connection refused) <---- old brick-0 being accessed. port number is 6971.

root@shishirng-laptop:/home/shishirng# ls /mnt/dht/
ls: cannot access /mnt/dht/linux-2.6.31.14: Input/output error
ls: cannot access /mnt/dht/bacd: Input/output error
ls: cannot access /mnt/dht/new: Input/output error
bacd  checkout.sh  linux-2.6.31.14  linux-2.6.31.14.tar  new

<---- An error message is shown, but the output is still correct. This is expected, as we removed a brick.
root@shishirng-laptop:/home/shishirng# ls /mnt/dht/
bacd  checkout.sh  linux-2.6.31.14  linux-2.6.31.14.tar  new
root@shishirng-laptop:/home/shishirng# cd /mnt/dht/
root@shishirng-laptop:/mnt/dht# ls
bacd  checkout.sh  linux-2.6.31.14  linux-2.6.31.14.tar  new

<--- All following operations succeed; the error is seen only the first time.

Lakshmi, please confirm that you are not seeing the errors every time.
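Since only the first access after the brick removal fails in the session above, a script that touches the mount point right after a remove-brick could tolerate the transient error with a single retry. A minimal sketch of such a workaround, not a fix for the stale socket itself; the helper name retry_once and the one-second pause are assumptions for illustration:

```shell
# Run a command; if it fails, wait briefly and run it one more time.
# Returns the exit status of the last attempt.
retry_once() {
    "$@" && return 0
    sleep 1
    "$@"
}
```

For example, `retry_once ls /mnt/dht` would hide the first failure if the second attempt succeeds, and would still report a non-zero status if the mount point stays inaccessible.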

Comment 3 Lakshmipathi G 2010-09-07 03:28:35 UTC
Yes, this error
"ls: cannot access /mnt: Stale NFS file handle"
happens the first time but not every time; issuing another "ls" command works fine.

ls -l /mnt