Bug 763232 (GLUSTER-1500) - Mount point should not be inaccessible during reconnect to server
Summary: Mount point should not be inaccessible during reconnect to server
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-1500
Product: GlusterFS
Classification: Community
Component: distribute
Version: 3.0.5
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Amar Tumballi
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-09-01 11:47 UTC by Sachidananda Urs
Modified: 2015-12-01 16:45 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Description Sachidananda Urs 2010-09-01 11:47:04 UTC
Scenario:

The setup has two servers running plain distribute, with ucarp configured to start GlusterFS on the backup machine upon failover.

Machine A     Machine B
         UCARP
           |
    CLIENT (/mnt/gluster)

When Machine A goes down, ucarp automatically starts GlusterFS on Machine B and the client reconnects to the volumes on Machine B.

But when I try to access the mount point, I get a 'Stale NFS file handle' error. I have to unmount and remount the client.

Note: Machine A and Machine B are connected to the same SAN backends.

Comment 1 Amar Tumballi 2010-09-13 01:07:41 UTC
Sacchi,

Can you check the same setup with the 3.1.0alpha release and see if it works without any changes? I guess we don't need any changes in the code base to get this working now that we have the 'gfid' feature.

If it works, can you close the bug?

-Amar
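
(Background on the 'gfid' remark: gfid is a persistent per-file UUID that GlusterFS stores with each file starting with the 3.1 release. A client-side handle keyed on the gfid can still be resolved after the client fails over to a different server backed by the same data, whereas a handle keyed on a server-local inode number goes stale. The short C sketch below only illustrates that distinction; it is not GlusterFS code, and every name and value in it is made up.)

/*
 * Illustrative only: why a stable per-file UUID ("gfid") survives a
 * fail-over while a server-local inode number does not.
 */
#include <stdio.h>
#include <string.h>

struct file_entry {
    unsigned long inode;   /* server-local, differs between servers      */
    char gfid[40];         /* stable UUID stored with the file (made up) */
    char path[32];
};

/* The same file as seen by two servers sharing one SAN backend. */
static struct file_entry server_a = { 1042, "9b5c0c2e-fake-gfid", "/data/report" };
static struct file_entry server_b = { 7731, "9b5c0c2e-fake-gfid", "/data/report" };

static int resolve_by_inode(const struct file_entry *srv, unsigned long handle)
{
    return srv->inode == handle;            /* stale after fail-over */
}

static int resolve_by_gfid(const struct file_entry *srv, const char *handle)
{
    return strcmp(srv->gfid, handle) == 0;  /* still resolvable      */
}

int main(void)
{
    /* Handles the client obtained from server A before the fail-over. */
    unsigned long inode_handle = server_a.inode;
    const char *gfid_handle = server_a.gfid;

    printf("after fail-over to server B:\n");
    printf("  inode-based handle resolves: %s\n",
           resolve_by_inode(&server_b, inode_handle) ? "yes" : "no (ESTALE)");
    printf("  gfid-based handle resolves:  %s\n",
           resolve_by_gfid(&server_b, gfid_handle) ? "yes" : "no (ESTALE)");
    return 0;
}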

Comment 2 Sachidananda Urs 2010-09-13 01:38:57 UTC
> Can you check the same setup with the 3.1.0alpha release and see if it works
> without any changes? I guess we don't need any changes in the code base to get
> this working now that we have the 'gfid' feature.
> 
> If it works, can you close the bug?

I will check and update the bug.

Comment 3 Amar Tumballi 2010-09-13 06:59:26 UTC
Bring in a feature to block client writes for 10-20 seconds (or for the value of a 'reconnection-timeout' option) if there is no connection with the server. This helps bring fail-over features into GlusterFS without the application knowing.
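
To illustrate the mechanism described here: while the connection to the server is down, the client would hold incoming write operations in a queue instead of failing them, then replay them once the connection comes back, or fail them after the timeout expires. The following minimal, self-contained C sketch shows that queue-and-replay idea only; it is not the actual quiesce translator that later comments refer to, and names such as reconnect_timeout, enqueue and drain are purely illustrative.

/*
 * Sketch of "hold writes while disconnected, replay on reconnect".
 * Not GlusterFS source; a toy event loop stands in for network callbacks.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

struct op {
    char payload[64];
    struct op *next;
};

static struct op *queue_head, *queue_tail;

static void enqueue(const char *payload)
{
    struct op *op = calloc(1, sizeof(*op));
    snprintf(op->payload, sizeof(op->payload), "%s", payload);
    if (queue_tail)
        queue_tail->next = op;
    else
        queue_head = op;
    queue_tail = op;
}

/* Replay (or fail) everything that was held back while disconnected. */
static void drain(int connected)
{
    struct op *op = queue_head;
    while (op) {
        struct op *next = op->next;
        printf("%s queued op: %s\n",
               connected ? "replaying" : "failing", op->payload);
        free(op);
        op = next;
    }
    queue_head = queue_tail = NULL;
}

int main(void)
{
    int connected = 1;
    int reconnect_timeout = 15;     /* seconds, cf. 'reconnection-timeout' */
    time_t disconnected_at = 0;

    /* A toy sequence of events instead of real transport notifications. */
    const char *events[] = { "write A", "DISCONNECT", "write B",
                             "write C", "RECONNECT", "write D" };

    for (size_t i = 0; i < sizeof(events) / sizeof(events[0]); i++) {
        const char *ev = events[i];
        if (strcmp(ev, "DISCONNECT") == 0) {
            connected = 0;
            disconnected_at = time(NULL);
        } else if (strcmp(ev, "RECONNECT") == 0) {
            connected = 1;
            drain(1);               /* re-transmit the held operations */
        } else if (connected) {
            printf("sending op: %s\n", ev);
        } else if (time(NULL) - disconnected_at < reconnect_timeout) {
            enqueue(ev);            /* hold the op; the application keeps going */
        } else {
            printf("timeout expired, failing op: %s\n", ev);
        }
    }
    drain(connected);
    return 0;
}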

Comment 4 Vijay Bellur 2010-09-29 10:17:41 UTC
PATCH: http://patches.gluster.com/patch/5026 in master (defaults.{c,h}: _resume functions added)

Comment 5 Vijay Bellur 2010-09-29 10:17:45 UTC
PATCH: http://patches.gluster.com/patch/5039 in master (features/quiesce: new translator)

Comment 6 Amar Tumballi 2010-10-04 02:13:39 UTC
The quiesce translator has been developed for this very reason. We will enable it after further testing, post 3.1.0.

Comment 7 Mirek Kratochvil 2010-10-05 04:23:58 UTC
Is there some good temporary workaround or patch that would allow nodes to reconnect without receiving 'Stale NFS file handle' errors until this fix gets released?

thx
-mk

Comment 8 Richard Scott 2010-12-09 09:33:43 UTC
I get a 'cannot access /mnt/glusterfs: Stale NFS file handle' warning when adding a new node to a cluster... will this patch help resolve that?

Comment 9 Anand Avati 2010-12-09 09:34:55 UTC
(In reply to comment #8)
> I get a 'cannot access /mnt/glusterfs: Stale NFS file handle' warning when
> adding a new node to a cluster... will this patch help resolve that?

What version are you trying with? Can you see if the latest git head works fine for you?

Avati

Comment 10 Richard Scott 2010-12-09 09:39:21 UTC
I'm using 3.1.1... will try and see if I can get the git release.
Rich

Comment 11 Richard Scott 2010-12-09 11:06:35 UTC
The git code I've just downloaded is unusable for me...

# gluster volume create biostar transport tcp 172.16.0.1:/mnt/storage
Creating Volume biostar failed
# gluster volume help
unrecognized word: help (position 1)
biostar glusterd # gluster volume create help
Segmentation fault (core dumped)

Rich

Comment 12 Richard Scott 2010-12-09 17:27:03 UTC
Update: I only get the 'Stale NFS file handle' warning when adding the first new node to a cluster.

For example, in a distributed volume I have node1, node2 and node3. When I create the volume with node1 and then add node2, I get the error, but adding node3 is OK.

Likewise, if I have a replicated-distributed cluster with node1+node2 and then add node3+node4, I get the error, but when I then add node5+node6 there is no error.

Hope this helps,

Rich

Comment 13 Anand Avati 2010-12-10 01:54:28 UTC
Can you try the latest git head? Some fixes related to the errors you are facing have gone into the code. It is very possible your issue has already been addressed in the repository.

Comment 14 Richard Scott 2010-12-10 06:04:02 UTC
I've just downloaded the latest git release and it's still unusable for me, as per comment #11.

I'll keep trying though, and open another bug report if it's still a problem for me.

Comment 15 Anand Avati 2010-12-29 15:01:07 UTC
PATCH: http://patches.gluster.com/patch/5648 in master (quiesce: bring in feature to re-transmit the frames)

