Bug 763232 (GLUSTER-1500) - Mount point should not be inaccessible during reconnect to server
Summary: Mount point should not be inaccessible during reconnect to server
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-1500
Product: GlusterFS
Classification: Community
Component: distribute
Version: 3.0.5
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Amar Tumballi
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-09-01 11:47 UTC by Sachidananda Urs
Modified: 2015-12-01 16:45 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Description Sachidananda Urs 2010-09-01 11:47:04 UTC
Scenario:

The setup has two servers running plain distribute, with ucarp configured to start GlusterFS on the backup machine upon failover.

Machine A     Machine B
         UCARP
           |
    CLIENT (/mnt/gluster)

When Machine A goes down, ucarp automatically starts GlusterFS on Machine B and the client reconnects to the volumes on Machine B.

But when I try to access the mount point, I get a 'Stale NFS file handle' error. I have to unmount and remount the client.

Note: Machine A and Machine B are connected to the same SAN backends.

Comment 1 Amar Tumballi 2010-09-13 01:07:41 UTC
Sacchi,

Can you check the same setup with the 3.1.0alpha release and see if it works without any changes? I guess we don't need any changes in the code base to get this working now that we have the 'gfid' feature.

If it works, can you close the bug?

-Amar
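
(Background on the 'gfid' remark: gfid is a persistent per-file UUID that GlusterFS stores with each file starting with the 3.1 release. A client-side handle keyed on the gfid can still be resolved after the client fails over to a different server backed by the same data, whereas a handle keyed on a server-local inode number goes stale. The short C sketch below only illustrates that distinction; it is not GlusterFS code, and every name and value in it is made up.)

/*
 * Illustrative only: why a stable per-file UUID ("gfid") survives a
 * fail-over while a server-local inode number does not.
 */
#include <stdio.h>
#include <string.h>

struct file_entry {
    unsigned long inode;   /* server-local, differs between servers      */
    char gfid[40];         /* stable UUID stored with the file (made up) */
    char path[32];
};

/* The same file as seen by two servers sharing one SAN backend. */
static struct file_entry server_a = { 1042, "9b5c0c2e-fake-gfid", "/data/report" };
static struct file_entry server_b = { 7731, "9b5c0c2e-fake-gfid", "/data/report" };

static int resolve_by_inode(const struct file_entry *srv, unsigned long handle)
{
    return srv->inode == handle;            /* stale after fail-over */
}

static int resolve_by_gfid(const struct file_entry *srv, const char *handle)
{
    return strcmp(srv->gfid, handle) == 0;  /* still resolvable      */
}

int main(void)
{
    /* Handles the client obtained from server A before the fail-over. */
    unsigned long inode_handle = server_a.inode;
    const char *gfid_handle = server_a.gfid;

    printf("after fail-over to server B:\n");
    printf("  inode-based handle resolves: %s\n",
           resolve_by_inode(&server_b, inode_handle) ? "yes" : "no (ESTALE)");
    printf("  gfid-based handle resolves:  %s\n",
           resolve_by_gfid(&server_b, gfid_handle) ? "yes" : "no (ESTALE)");
    return 0;
}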

Comment 2 Sachidananda Urs 2010-09-13 01:38:57 UTC
> Can you check the same setup with the 3.1.0alpha release and see if it works
> without any changes? I guess we don't need any changes in the code base to get
> this working now that we have the 'gfid' feature.
> 
> If it works, can you close the bug?

I will check and update the bug.

Comment 3 Amar Tumballi 2010-09-13 06:59:26 UTC
Bring in a feature to block client writes for 10-20 seconds (or for the value of a 'reconnection-timeout' option) if there is no connection with the server. This helps bring fail-over features into GlusterFS without the application knowing.
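
To illustrate the mechanism described here: while the connection to the server is down, the client would hold incoming write operations in a queue instead of failing them, then replay them once the connection comes back, or fail them after the timeout expires. The following minimal, self-contained C sketch shows that queue-and-replay idea only; it is not the actual quiesce translator that later comments refer to, and names such as reconnect_timeout, enqueue and drain are purely illustrative.

/*
 * Sketch of "hold writes while disconnected, replay on reconnect".
 * Not GlusterFS source; a toy event loop stands in for network callbacks.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

struct op {
    char payload[64];
    struct op *next;
};

static struct op *queue_head, *queue_tail;

static void enqueue(const char *payload)
{
    struct op *op = calloc(1, sizeof(*op));
    snprintf(op->payload, sizeof(op->payload), "%s", payload);
    if (queue_tail)
        queue_tail->next = op;
    else
        queue_head = op;
    queue_tail = op;
}

/* Replay (or fail) everything that was held back while disconnected. */
static void drain(int connected)
{
    struct op *op = queue_head;
    while (op) {
        struct op *next = op->next;
        printf("%s queued op: %s\n",
               connected ? "replaying" : "failing", op->payload);
        free(op);
        op = next;
    }
    queue_head = queue_tail = NULL;
}

int main(void)
{
    int connected = 1;
    int reconnect_timeout = 15;     /* seconds, cf. 'reconnection-timeout' */
    time_t disconnected_at = 0;

    /* A toy sequence of events instead of real transport notifications. */
    const char *events[] = { "write A", "DISCONNECT", "write B",
                             "write C", "RECONNECT", "write D" };

    for (size_t i = 0; i < sizeof(events) / sizeof(events[0]); i++) {
        const char *ev = events[i];
        if (strcmp(ev, "DISCONNECT") == 0) {
            connected = 0;
            disconnected_at = time(NULL);
        } else if (strcmp(ev, "RECONNECT") == 0) {
            connected = 1;
            drain(1);               /* re-transmit the held operations */
        } else if (connected) {
            printf("sending op: %s\n", ev);
        } else if (time(NULL) - disconnected_at < reconnect_timeout) {
            enqueue(ev);            /* hold the op; the application keeps going */
        } else {
            printf("timeout expired, failing op: %s\n", ev);
        }
    }
    drain(connected);
    return 0;
}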

Comment 4 Vijay Bellur 2010-09-29 10:17:41 UTC
PATCH: http://patches.gluster.com/patch/5026 in master (defaults.{c,h}: _resume functions added)

Comment 5 Vijay Bellur 2010-09-29 10:17:45 UTC
PATCH: http://patches.gluster.com/patch/5039 in master (features/quiesce: new translator)

Comment 6 Amar Tumballi 2010-10-04 02:13:39 UTC
The quiesce translator has been developed for this very reason. We will enable it after further testing, post 3.1.0.

Comment 7 Mirek Kratochvil 2010-10-05 04:23:58 UTC
Is there some good temporary workaround or patch that would allow nodes to reconnect without receiving 'Stale NFS file handle' errors until this fix gets released?

thx
-mk

Comment 8 Richard Scott 2010-12-09 09:33:43 UTC
I get a 'cannot access /mnt/glusterfs: Stale NFS file handle' warning when adding a new node to a cluster... will this patch help resolve that?

Comment 9 Anand Avati 2010-12-09 09:34:55 UTC
(In reply to comment #8)
> I get a 'cannot access /mnt/glusterfs: Stale NFS file handle' warning when
> adding a new node to a cluster... will this patch help resolve that?

What version are you trying with? Can you see if the latest git head works fine for you?

Avati

Comment 10 Richard Scott 2010-12-09 09:39:21 UTC
I'm using 3.1.1... will try and see if I can get the git release.
Rich

Comment 11 Richard Scott 2010-12-09 11:06:35 UTC
The git code I've just downloaded is unusable for me...

# gluster volume create biostar transport tcp 172.16.0.1:/mnt/storage
Creating Volume biostar failed
# gluster volume help
unrecognized word: help (position 1)
biostar glusterd # gluster volume create help
Segmentation fault (core dumped)

Rich

Comment 12 Richard Scott 2010-12-09 17:27:03 UTC
Update: I only get the 'Stale NFS file handle' warning when adding the first new node to a cluster.

For example, in a distributed volume I have node1, node2 and node3. When I create the volume with node1 and then add node2, I get the error, but adding node3 is OK.

Likewise, if I have a replicated-distributed cluster with node1+node2 and then add node3+node4, I get the error, but when I then add node5+node6 there is no error.

Hope this helps,

Rich

Comment 13 Anand Avati 2010-12-10 01:54:28 UTC
Can you try the latest git head? Some fixes related to the errors you are facing have gone into the code. It is very possible your issue has already been addressed in the repository.

Comment 14 Richard Scott 2010-12-10 06:04:02 UTC
I've just downloaded the latest git release and it's still unusable for me, as per comment #11.

I'll keep trying though, and open another bug report if it's still a problem for me.

Comment 15 Anand Avati 2010-12-29 15:01:07 UTC
PATCH: http://patches.gluster.com/patch/5648 in master (quiesce: bring in feature to re-transmit the frames)

